.. _`bio/gatk/haplotypecaller`:

GATK HAPLOTYPECALLER
====================

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gatk/haplotypecaller?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gatk/haplotypecaller

Run GATK HaplotypeCaller on one or more BAM files, writing either a VCF or a GVCF.

**URL**: https://gatk.broadinstitute.org/hc/en-us/articles/9570334998171-HaplotypeCaller

Example
-------

This wrapper can be used in the following way. The first rule writes a regular VCF; the second writes a GVCF.

.. code-block:: python

    rule haplotype_caller:
        input:
            # single or list of bam files
            bam="mapped/{sample}.bam",
            ref="genome.fasta",
            # known="dbsnp.vcf"  # optional
        output:
            vcf="calls/{sample}.vcf",
            # bam="{sample}.assemb_haplo.bam",
        log:
            "logs/gatk/haplotypecaller/{sample}.log",
        params:
            extra="",  # optional
            java_opts="",  # optional
        threads: 4
        resources:
            mem_mb=1024,
        wrapper:
            "v3.0.1/bio/gatk/haplotypecaller"


    rule haplotype_caller_gvcf:
        input:
            # single or list of bam files
            bam="mapped/{sample}.bam",
            ref="genome.fasta",
            # known="dbsnp.vcf"  # optional
        output:
            gvcf="calls/{sample}.g.vcf",
            # bam="{sample}.assemb_haplo.bam",
        log:
            "logs/gatk/haplotypecaller/{sample}.log",
        params:
            extra="",  # optional
            java_opts="",  # optional
        threads: 4
        resources:
            mem_mb=1024,
        wrapper:
            "v3.0.1/bio/gatk/haplotypecaller"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

* The ``java_opts`` param passes additional arguments to the Java virtual machine (JVM), e.g. ``-XX:ParallelGCThreads=10``. Do not use it for ``-Xmx`` or ``-Djava.io.tmpdir``, since these are handled automatically.
* The ``extra`` param passes additional command-line arguments to ``gatk HaplotypeCaller``.
* Set exactly one of the ``vcf`` or ``gvcf`` outputs; the wrapper raises a ``ValueError`` otherwise. A ``gvcf`` output adds ``--emit-ref-confidence GVCF`` to the command.
* Optional inputs and outputs are translated as follows: ``known`` becomes ``--dbsnp``, ``intervals`` (given as input or param) becomes ``--intervals``, and a ``bam`` output becomes ``--bam-output``. The rule's ``threads`` value is passed as ``--native-pair-hmm-threads``.
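
The restriction on ``java_opts`` can be checked before launching a run. The following is a minimal sketch, assuming you only want to list the JVM options that the wrapper already sets on its own; ``find_managed_java_opts`` and ``MANAGED_PREFIXES`` are hypothetical names and are not part of the wrapper or of ``snakemake-wrapper-utils``.

.. code-block:: python

    import shlex
    from typing import List

    # Hypothetical constant (not part of the wrapper): prefixes of JVM options
    # that the wrapper handles automatically and that should therefore not be
    # set through the ``java_opts`` param.
    MANAGED_PREFIXES = ("-Xmx", "-Djava.io.tmpdir")


    def find_managed_java_opts(java_opts: str) -> List[str]:
        """Return the tokens of java_opts that start with a managed prefix."""
        # shlex.split tokenizes the string the way a POSIX shell would.
        return [tok for tok in shlex.split(java_opts) if tok.startswith(MANAGED_PREFIXES)]


    print(find_managed_java_opts("-XX:ParallelGCThreads=10"))
    # []
    print(find_managed_java_opts("-XX:ParallelGCThreads=10 -Xmx4g -Djava.io.tmpdir=/scratch"))
    # ['-Xmx4g', '-Djava.io.tmpdir=/scratch']

A non-empty result means the rule's ``java_opts`` overlaps with settings the wrapper manages itself, so those tokens should be removed from the param.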
Software dependencies
---------------------

* ``gatk4=4.4.0.0``
* ``snakemake-wrapper-utils=0.6.2``

Input/Output
------------

**Input:**

* BAM file(s)
* reference genome (FASTA)
* known variants VCF (optional, ``known``)
* intervals (optional, ``intervals``)

**Output:**

* VCF or GVCF file
* BAM file with assembled haplotypes (optional, ``bam``)

Authors
-------

* Johannes Köster
* Jake VanCampen
* Filipe G. Vieira

Code
----

.. code-block:: python

    __author__ = "Johannes Köster"
    __copyright__ = "Copyright 2018, Johannes Köster"
    __email__ = "johannes.koester@protonmail.com"
    __license__ = "MIT"


    import os
    import tempfile
    from snakemake.shell import shell
    from snakemake_wrapper_utils.java import get_java_opts

    extra = snakemake.params.get("extra", "")
    java_opts = get_java_opts(snakemake)

    bams = snakemake.input.bam
    if isinstance(bams, str):
        bams = [bams]
    bams = list(map("--input {}".format, bams))

    intervals = snakemake.input.get("intervals", "")
    if not intervals:
        intervals = snakemake.params.get("intervals", "")
    if intervals:
        intervals = "--intervals {}".format(intervals)

    known = snakemake.input.get("known", "")
    if known:
        known = "--dbsnp " + str(known)

    vcf_output = snakemake.output.get("vcf", "")
    if vcf_output:
        output = " --output " + str(vcf_output)

    gvcf_output = snakemake.output.get("gvcf", "")
    if gvcf_output:
        output = " --emit-ref-confidence GVCF " + " --output " + str(gvcf_output)

    if (vcf_output and gvcf_output) or (not gvcf_output and not vcf_output):
        if vcf_output and gvcf_output:
            raise ValueError(
                "please set vcf or gvcf as output, not both! It's not supported by gatk"
            )
        else:
            raise ValueError("please set one of vcf or gvcf as output (not both)!")

    bam_output = snakemake.output.get("bam", "")
    if bam_output:
        bam_output = " --bam-output " + str(bam_output)

    log = snakemake.log_fmt_shell(stdout=True, stderr=True)

    with tempfile.TemporaryDirectory() as tmpdir:
        shell(
            "gatk --java-options '{java_opts}' HaplotypeCaller"
            " --native-pair-hmm-threads {snakemake.threads}"
            " {bams}"
            " --reference {snakemake.input.ref}"
            " {intervals}"
            " {known}"
            " {extra}"
            " --tmp-dir {tmpdir}"
            " {output}"
            " {bam_output}"
            " {log}"
        )
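
To see how the wrapper turns rule inputs and outputs into HaplotypeCaller arguments without running GATK, its argument logic can be re-expressed as a pure function. This is a sketch under stated assumptions: ``build_hc_args`` is a hypothetical name, it returns an argument list instead of a shell string, it folds the two output checks into a single "exactly one of" test, and it omits ``extra``, ``--tmp-dir`` and the ``--java-options`` prefix that the wrapper adds.

.. code-block:: python

    from typing import List, Sequence, Union


    def build_hc_args(
        bams: Union[str, Sequence[str]],
        ref: str,
        vcf: str = "",
        gvcf: str = "",
        known: str = "",
        intervals: str = "",
        bam_output: str = "",
        threads: int = 1,
    ) -> List[str]:
        """Assemble HaplotypeCaller arguments the way the wrapper above does.

        Illustrative sketch only; this function is not part of the wrapper.
        """
        # Exactly one of vcf/gvcf must be given (both or neither is an error).
        if bool(vcf) == bool(gvcf):
            raise ValueError("set exactly one of vcf or gvcf as output")
        # A single BAM path is accepted as well as a list of paths.
        if isinstance(bams, str):
            bams = [bams]
        args = ["HaplotypeCaller", "--native-pair-hmm-threads", str(threads)]
        for bam in bams:
            args += ["--input", bam]  # one --input flag per BAM file
        args += ["--reference", ref]
        if intervals:
            args += ["--intervals", intervals]
        if known:
            args += ["--dbsnp", known]
        if gvcf:
            args += ["--emit-ref-confidence", "GVCF", "--output", gvcf]
        else:
            args += ["--output", vcf]
        if bam_output:
            args += ["--bam-output", bam_output]
        return args


    args = build_hc_args(["a.bam", "b.bam"], "genome.fasta", gvcf="calls/s.g.vcf", threads=4)
    print(" ".join(args))
    # HaplotypeCaller --native-pair-hmm-threads 4 --input a.bam --input b.bam --reference genome.fasta --emit-ref-confidence GVCF --output calls/s.g.vcf

Returning a list rather than a formatted string keeps paths containing spaces intact if the arguments are later passed to something like ``subprocess.run(["gatk", *args])``.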