GATK MUTECT2

https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gatk/mutect?label=version%20update%20pull%20requests

Call somatic SNVs and indels via local assembly of haplotypes

URL: https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2

Example

This wrapper can be used in the following way:

rule mutect2:
    input:
        fasta="genome/genome.fasta",
        map="mapped/{sample}.bam",
    output:
        vcf="variant/{sample}.vcf",
    message:
        "Testing Mutect2 with {wildcards.sample}"
    threads: 1
    resources:
        mem_mb=1024,
    log:
        "logs/mutect_{sample}.log",
    wrapper:
        "v3.8.0-1-g149ef14/bio/gatk/mutect"


rule mutect2_bam:
    input:
        fasta="genome/genome.fasta",
        map="mapped/{sample}.bam",
    output:
        vcf="variant_bam/{sample}.vcf",
        bam="variant_bam/{sample}.bam",
    message:
        "Testing Mutect2 with {wildcards.sample}"
    threads: 1
    resources:
        mem_mb=1024,
    log:
        "logs/mutect_{sample}.log",
    wrapper:
        "v3.8.0-1-g149ef14/bio/gatk/mutect"


rule mutect2_complete:
    input:
        fasta="genome/genome.fasta",
        map="mapped/{sample}.bam",
        intervals="genome/intervals.bed",
    output:
        vcf="variant_complete/{sample}.vcf",
        bam="variant_complete/{sample}.bam",
        f1r2="counts/{sample}.f1r2.tar.gz",
    message:
        "Testing Mutect2 with {wildcards.sample}"
    threads: 1
    resources:
        mem_mb=1024,
    log:
        "logs/mutect_{sample}.log",
    wrapper:
        "v3.8.0-1-g149ef14/bio/gatk/mutect"

rule mutect2_list:
    input:
        fasta="genome/genome.fasta",
        map=expand("mapped/{sample}.bam", sample=["a", "b"]),
    output:
        vcf="variant_list/a_b.vcf",
    threads: 1
    resources:
        mem_mb=1024,
    log:
        "logs/mutect2_list.log",
    params:
        extra="--tumor-sample a_normal --normal-sample b_normal",
    wrapper:
        "v3.8.0-1-g149ef14/bio/gatk/mutect"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • gatk4=4.5.0.0

  • snakemake-wrapper-utils=0.6.2

Input/Output

Input:

  • map: Mapped reads (SAM/BAM/CRAM)

  • fasta: Reference Fasta file

  • intervals: Optional path to a BED interval file

  • pon: Optional path to Panel of Normals (flagged as BETA)

  • germline: Optional path to known germline variants

Output:

  • vcf: Path to variant file

  • bam: Optional path to output bam file

  • f1r2: Optional path to f1r2 count file

Params

  • extra: Optional parameters for GATK Mutect2

  • use_parallelgc: Automatically add “-XX:ParallelGCThreads={snakemake.threads}” to your command line. Set to True if your architecture supports ParallelGCThreads.

  • use_omp: Automatically set OMP_NUM_THREADS environment variable. Set to True if your java architecture uses OMP threads.

  • java_opts: allows for additional arguments to be passed to the java compiler (not for -XmX or -Djava.io.tmpdir, -XX:ParallelGCThreads, since they are handled automatically).

Authors

  • Thibault Dayris

  • Filipe G. Vieira

Code

"""Snakemake wrapper for GATK4 Mutect2"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# On non-omp systems, and in case OMP_NUM_THREADS
# was not set, define OMP_NUM_THREADS through python
if "OMP_NUM_THREADS" not in os.environ.keys() and snakemake.params.get(
    "use_omp", False
):
    os.environ["OMP_NUM_THREADS"] = snakemake.threads


bam_output = snakemake.output.get("bam", "")
if bam_output:
    bam_output = f"--bam-output {bam_output }"


germline_resource = snakemake.input.get("germline", "")
if germline_resource:
    germline_resource = f"--germline-resource {germline_resource}"

intervals = snakemake.input.get("intervals", "")
if intervals:
    intervals = f"--intervals {intervals}"

f1r2 = snakemake.output.get("f1r2", "")
if f1r2:
    f1r2 = f"--f1r2-tar-gz {f1r2}"

pon = snakemake.input.get("pon", "")
if pon:
    pon = f"--panel-of-normals {pon}"


aln = snakemake.input.map
# If aln is a string, then no change are done
if isinstance(aln, list):
    # Do not add the first `--input` since it will
    # be added later in the shell command
    aln = " --input ".join(aln)


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

# In case Java execution environment suits GC parallel
# calls, these must be given as optional java parameters
if snakemake.params.get("use_parallelgc", False):
    if "UseParallelGC" not in java_opts:
        java_opts += " -XX:+UseParallelGC "
    if "ParallelGCThreads" not in java_opts:
        java_opts += f" -XX:ParallelGCThreads={snakemake.threads}"


with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "gatk --java-options '{java_opts}' Mutect2"  # Tool and its subprocess
        " --native-pair-hmm-threads {snakemake.threads}"
        " --input {aln}"  # Path to input mapping file
        " --reference {snakemake.input.fasta}"  # Path to reference fasta file
        " {f1r2}"  # Optional path to output f1r2 count file
        " {germline_resource}"  # Optional path to optional germline resource VCF
        " {intervals}"  # Optional path to optional bed intervals
        " {pon} "  # Optional path to panel of normals
        " {extra}"  # Extra parameters
        " --tmp-dir {tmpdir}"
        " --output {snakemake.output.vcf}"  # Path to output vcf file
        " {bam_output}"  # Path to output bam file, optional
        " {log}"  # Logging behaviour
    )