MANTA¶
Call structural variants with manta.
Example¶
This wrapper can be used in the following way:
rule manta:
input:
ref="human_g1k_v37_decoy.small.fasta",
samples=["mapped/a.bam"],
index=["mapped/a.bam.bai"],
bed="test.bed.gz", # optional
output:
vcf="results/out.bcf",
idx="results/out.bcf.csi",
cand_indel_vcf="results/small_indels.vcf.gz",
cand_indel_idx="results/small_indels.vcf.gz.tbi",
cand_sv_vcf="results/cand_sv.vcf.gz",
cand_sv_idx="results/cand_sv.vcf.gz.tbi",
params:
extra_cfg="", # optional
extra_run="", # optional
log:
"logs/manta.log",
threads: 2
resources:
mem_mb=4096,
wrapper:
"v2.6.0/bio/manta"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra_cfg param allows for additional program arguments to configManta.py.
- The extra_run param allows for additional program arguments to runWorkflow.py.
- The runDir is created using pythons tempfile, meaning that all intermediate files are deleted on job completion
- For more information see, https://github.com/Illumina/manta
Software dependencies¶
manta=1.6.0
bcftools=1.17
Input/Output¶
Input:
- BAM/CRAM file(s)
- reference genome
- BED file (optional)
Output:
- SVs and indels scored and genotyped under a diploid model (diploidSV.vcf.gz).
- Unfiltered SV and indel candidates (candidateSV.vcf.gz).
- Subset of the previous file containing only simple insertion and deletion variants less than the minimum scored variant size (candidateSmallIndels.vcf.gz).
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import math
from snakemake.shell import shell
from pathlib import Path
from tempfile import TemporaryDirectory
extra_cfg = snakemake.params.get("extra_cfg", "")
extra_run = snakemake.params.get("extra_run", "")
bed = snakemake.input.get("bed", "")
if bed:
bed = f"--callRegions {bed}"
mem_gb = snakemake.resources.get("mem_gb", "")
if not mem_gb:
# 20 Gb of mem by default
mem_gb = math.ceil(snakemake.resources.get("mem_mb", 20480) / 1024)
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
with TemporaryDirectory() as tempdir:
tempdir = Path(tempdir)
run_dir = tempdir / "runDir"
bams = []
# Symlink BAM/CRAM files to avoid problems with filenames
for aln, idx in zip(snakemake.input.samples, snakemake.input.index):
aln = Path(aln)
idx = Path(idx)
(tempdir / aln.name).symlink_to(aln.resolve())
bams.append(tempdir / aln.name)
if idx.name.endswith(".bam.bai") or idx.name.endswith(".cram.crai"):
(tempdir / idx.name).symlink_to(idx.resolve())
if idx.name.endswith(".bai"):
(tempdir / idx.name).with_suffix(".bam.bai").symlink_to(idx.resolve())
elif idx.name.endswith(".crai"):
(tempdir / idx.name).with_suffix(".cram.crai").symlink_to(idx.resolve())
else:
raise ValueError(f"invalid index file name provided: {idx}")
bams = list(map("--normalBam {}".format, bams))
shell(
# Configure Manta
"configManta.py {extra_cfg} {bams} --referenceFasta {snakemake.input.ref} {bed} --runDir {run_dir} {log}; "
# Run Manta
"python2 {run_dir}/runWorkflow.py {extra_run} --jobs {snakemake.threads} --memGb {mem_gb} {log}; "
)
# Copy outputs into proper position.
def infer_vcf_ext(vcf):
if vcf.endswith(".vcf.gz"):
return "z"
elif vcf.endswith(".bcf"):
return "b"
else:
raise ValueError(
"invalid VCF extension. Only '.vcf.gz' and '.bcf' are supported."
)
def copy_vcf(origin_vcf, dest_vcf, dest_idx):
if dest_vcf and dest_vcf != origin_vcf:
dest_vcf_format = infer_vcf_ext(dest_vcf)
shell(
"bcftools view --threads {snakemake.threads} --output {dest_vcf:q} --output-type {dest_vcf_format} {origin_vcf:q} {log}"
)
origin_idx = str(origin_vcf) + ".tbi"
if dest_idx and dest_idx != origin_idx:
shell(
"bcftools index --threads {snakemake.threads} --output {dest_idx:q} {dest_vcf:q} {log}"
)
results_base = run_dir / "results" / "variants"
# Copy main VCF output
vcf_temp = results_base / "diploidSV.vcf.gz"
vcf_final = snakemake.output.get("vcf")
idx_final = snakemake.output.get("idx")
copy_vcf(vcf_temp, vcf_final, idx_final)
# Copy candidate small indels VCF
cand_indel_vcf_temp = results_base / "candidateSmallIndels.vcf.gz"
cand_indel_vcf_final = snakemake.output.get("cand_indel_vcf")
cand_indel_idx_final = snakemake.output.get("cand_indel_idx")
copy_vcf(cand_indel_vcf_temp, cand_indel_vcf_final, cand_indel_idx_final)
# Copy candidates structural variants VCF
cand_sv_vcf_temp = results_base / "candidateSV.vcf.gz"
cand_sv_vcf_final = snakemake.output.get("cand_sv_vcf")
cand_sv_idx_final = snakemake.output.get("cand_sv_idx")
copy_vcf(cand_sv_vcf_temp, cand_sv_vcf_final, cand_sv_idx_final)