DEEPVARIANT

Call genetic variants using a deep neural network. Copyright 2017 Google LLC. Licensed under the BSD 3-Clause “New” or “Revised” License. https://github.com/google/deepvariant

Example

This wrapper can be used in the following way:

rule deepvariant:
    input:
        bam="mapped/{sample}.bam",
        ref="genome/genome.fasta"
    output:
        vcf="calls/{sample}.vcf.gz"
    params:
        model="wgs",   # {wgs, wes, pacbio, hybrid}
        sample_name=lambda w: w.sample, # optional
        extra=""
    threads: 2
    log:
        "logs/deepvariant/{sample}/stdout.log"
    wrapper:
        "0.75.0-13-g0997adf/bio/deepvariant"


rule deepvariant_gvcf:
    input:
        bam="mapped/{sample}.bam",
        ref="genome/genome.fasta"
    output:
        vcf="gvcf_calls/{sample}.vcf.gz",
        gvcf="gvcf_calls/{sample}.g.vcf.gz"
    params:
        model="wgs",   # {wgs, wes, pacbio, hybrid}
        extra=""
    threads: 2
    log:
        "logs/deepvariant_gvcf/{sample}/stdout.log"
    wrapper:
        "0.75.0-13-g0997adf/bio/deepvariant"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
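
To request calls for several samples at once, the {sample} wildcard in the output pattern has to be filled in once per sample. The following is a minimal sketch in plain Python, assuming hypothetical sample names A and B; the helper name target_files is illustrative only. Inside a Snakefile, Snakemake's expand() helper serves the same purpose.

```python
# Hypothetical sample names; replace with the samples in your project.
SAMPLES = ["A", "B"]

# Same pattern as the output of the deepvariant rule above.
VCF_PATTERN = "calls/{sample}.vcf.gz"


def target_files(pattern, samples):
    """Fill the {sample} wildcard of pattern once per sample."""
    return [pattern.format(sample=s) for s in samples]


targets = target_files(VCF_PATTERN, SAMPLES)
print(" ".join(targets))
```

The printed paths can then be given to Snakemake as targets, for example snakemake --use-conda --cores 2 followed by the list of files.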

Software dependencies

  • deepvariant==1.1.0

Input/Output

Input:

  • fasta
  • bam

Output:

  • vcf
  • gvcf (optional)
  • visual report html

Notes

  • The extra param allows for additional program arguments; the wrapper passes them to dv_make_examples.py only.
  • This Snakemake wrapper uses the bioconda deepvariant package. Copyright 2018 Brad Chapman.

Authors

  • Tetsuro Hisayoshi
  • Nikos Tsardakas Renhuldt

Code

__author__ = "Tetsuro Hisayoshi"
__copyright__ = "Copyright 2020, Tetsuro Hisayoshi"
__email__ = "hisayoshi0530@gmail.com"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

log_dir = os.path.dirname(snakemake.log[0])
output_dir = os.path.dirname(snakemake.output[0])

# sample name defaults to basename
sample_name = snakemake.params.get(
    "sample_name", os.path.splitext(os.path.basename(snakemake.input.bam))[0]
)


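# gVCF arguments stay empty unless the rule declares a "gvcf" output
make_examples_gvcf = postprocess_gvcf = ""
gvcf = snakemake.output.get("gvcf", None)
if gvcf:
    # placeholders such as {tmp_dir} are filled in later, when shell() formats the command
    make_examples_gvcf = "--gvcf {tmp_dir} "
    postprocess_gvcf = (
        "--gvcf_infile {tmp_dir}/{sample_name}.gvcf.tfrecord@{snakemake.threads}.gz "
        "--gvcf_outfile {snakemake.output.gvcf} "
    )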

with tempfile.TemporaryDirectory() as tmp_dir:
    shell(
        "(dv_make_examples.py "
        "--cores {snakemake.threads} "
        "--ref {snakemake.input.ref} "
        "--reads {snakemake.input.bam} "
        "--sample {sample_name} "
        "--examples {tmp_dir} "
        "--logdir {log_dir} " + make_examples_gvcf + "{extra} \n"
        "dv_call_variants.py "
        "--cores {snakemake.threads} "
        "--outfile {tmp_dir}/{sample_name}.tmp "
        "--sample {sample_name} "
        "--examples {tmp_dir} "
        "--model {snakemake.params.model} \n"
        "dv_postprocess_variants.py "
        "--ref {snakemake.input.ref} "
        + postprocess_gvcf
        + "--infile {tmp_dir}/{sample_name}.tmp "
        "--outfile {snakemake.output.vcf} ) {log}"
    )
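
Two optional behaviours of the script above, the default sample name and the additional gVCF arguments, can be illustrated in isolation. The sketch below re-implements them as plain functions so they can be tried without Snakemake; the function names default_sample_name and gvcf_arguments, as well as the example paths, are hypothetical and not part of the wrapper.

```python
import os


def default_sample_name(bam_path):
    """Return the BAM file name without its directory and final extension."""
    return os.path.splitext(os.path.basename(bam_path))[0]


def gvcf_arguments(gvcf_out, tmp_dir, sample_name, threads):
    """Build the extra gVCF arguments for the make_examples and postprocess steps.

    Returns two empty strings when no gVCF output is requested.
    """
    if not gvcf_out:
        return "", ""
    make_examples = f"--gvcf {tmp_dir} "
    postprocess = (
        f"--gvcf_infile {tmp_dir}/{sample_name}.gvcf.tfrecord@{threads}.gz "
        f"--gvcf_outfile {gvcf_out} "
    )
    return make_examples, postprocess


# Example paths are made up for illustration.
name = default_sample_name("mapped/A.sorted.bam")
print(name)
print(gvcf_arguments("gvcf_calls/A.g.vcf.gz", "/tmp/dv", name, 2))
print(gvcf_arguments(None, "/tmp/dv", name, 2))
```

Only the last extension is stripped, so a file such as mapped/A.sorted.bam yields the sample name A.sorted. Setting the sample_name param explicitly avoids this when the file name carries extra suffixes.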