GATK MUTECT2
Call somatic SNVs and indels via local assembly of haplotypes
URL: https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2
Example
This wrapper can be used in the following way:
rule mutect2:
input:
fasta="genome/genome.fasta",
map="mapped/{sample}.bam",
output:
vcf="variant/{sample}.vcf",
message:
"Testing Mutect2 with {wildcards.sample}"
threads: 1
resources:
mem_mb=1024,
log:
"logs/mutect_{sample}.log",
wrapper:
"v3.9.0-14-g476823b/bio/gatk/mutect"
rule mutect2_bam:
input:
fasta="genome/genome.fasta",
map="mapped/{sample}.bam",
output:
vcf="variant_bam/{sample}.vcf",
bam="variant_bam/{sample}.bam",
message:
"Testing Mutect2 with {wildcards.sample}"
threads: 1
resources:
mem_mb=1024,
log:
"logs/mutect_{sample}.log",
wrapper:
"v3.9.0-14-g476823b/bio/gatk/mutect"
rule mutect2_complete:
input:
fasta="genome/genome.fasta",
map="mapped/{sample}.bam",
intervals="genome/intervals.bed",
output:
vcf="variant_complete/{sample}.vcf",
bam="variant_complete/{sample}.bam",
f1r2="counts/{sample}.f1r2.tar.gz",
message:
"Testing Mutect2 with {wildcards.sample}"
threads: 1
resources:
mem_mb=1024,
log:
"logs/mutect_{sample}.log",
wrapper:
"v3.9.0-14-g476823b/bio/gatk/mutect"
rule mutect2_list:
input:
fasta="genome/genome.fasta",
map=expand("mapped/{sample}.bam", sample=["a", "b"]),
output:
vcf="variant_list/a_b.vcf",
threads: 1
resources:
mem_mb=1024,
log:
"logs/mutect2_list.log",
params:
extra="--tumor-sample a_normal --normal-sample b_normal",
wrapper:
"v3.9.0-14-g476823b/bio/gatk/mutect"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
gatk4=4.5.0.0
snakemake-wrapper-utils=0.6.2
Input/Output
Input:
map
: Mapped reads (SAM/BAM/CRAM)fasta
: Reference Fasta fileintervals
: Optional path to a BED interval filepon
: Optional path to Panel of Normals (flagged as BETA)germline
: Optional path to known germline variants
Output:
vcf
: Path to variant filebam
: Optional path to output bam filef1r2
: Optional path to f1r2 count file
Params
extra
: Optional parameters for GATK Mutect2use_parallelgc
: Automatically add “-XX:ParallelGCThreads={snakemake.threads}” to your command line. Set to True if your architecture supports ParallelGCThreads.use_omp
: Automatically set OMP_NUM_THREADS environment variable. Set to True if your java architecture uses OMP threads.java_opts
: allows for additional arguments to be passed to the java compiler (not for -XmX or -Djava.io.tmpdir, -XX:ParallelGCThreads, since they are handled automatically).
Code
"""Snakemake wrapper for GATK4 Mutect2"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# On non-omp systems, and in case OMP_NUM_THREADS
# was not set, define OMP_NUM_THREADS through python
if "OMP_NUM_THREADS" not in os.environ.keys() and snakemake.params.get(
"use_omp", False
):
os.environ["OMP_NUM_THREADS"] = snakemake.threads
bam_output = snakemake.output.get("bam", "")
if bam_output:
bam_output = f"--bam-output {bam_output }"
germline_resource = snakemake.input.get("germline", "")
if germline_resource:
germline_resource = f"--germline-resource {germline_resource}"
intervals = snakemake.input.get("intervals", "")
if intervals:
intervals = f"--intervals {intervals}"
f1r2 = snakemake.output.get("f1r2", "")
if f1r2:
f1r2 = f"--f1r2-tar-gz {f1r2}"
pon = snakemake.input.get("pon", "")
if pon:
pon = f"--panel-of-normals {pon}"
aln = snakemake.input.map
# If aln is a string, then no change are done
if isinstance(aln, list):
# Do not add the first `--input` since it will
# be added later in the shell command
aln = " --input ".join(aln)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
# In case Java execution environment suits GC parallel
# calls, these must be given as optional java parameters
if snakemake.params.get("use_parallelgc", False):
if "UseParallelGC" not in java_opts:
java_opts += " -XX:+UseParallelGC "
if "ParallelGCThreads" not in java_opts:
java_opts += f" -XX:ParallelGCThreads={snakemake.threads}"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' Mutect2" # Tool and its subprocess
" --native-pair-hmm-threads {snakemake.threads}"
" --input {aln}" # Path to input mapping file
" --reference {snakemake.input.fasta}" # Path to reference fasta file
" {f1r2}" # Optional path to output f1r2 count file
" {germline_resource}" # Optional path to optional germline resource VCF
" {intervals}" # Optional path to optional bed intervals
" {pon} " # Optional path to panel of normals
" {extra}" # Extra parameters
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}" # Path to output vcf file
" {bam_output}" # Path to output bam file, optional
" {log}" # Logging behaviour
)