The Snakemake Wrappers repository

https://img.shields.io/badge/snakemake-≥4.5.0-brightgreen.svg?style=flat-square

The Snakemake Wrapper Repository is a collection of reusable wrappers that allow to quickly use popular tools from Snakemake rules and workflows.

Usage

The general strategy is to include a wrapper into your workflow via the wrapper directive, e.g.

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.2.0/bio/samtools/sort"

Here, Snakemake will automatically download the corresponding wrapper from https://bitbucket.org/snakemake/snakemake-wrappers/src/0.2.0/bio/samtools/sort/wrapper.py. Thereby, 0.2.0 can be replaced with the version tag you want to use, or a commit id (see here). This ensures reproducibility since changes in the wrapper implementation won’t be propagated automatically to your workflow. Alternatively, e.g., for development, the wrapper directive can also point to full URLs, including the local file://.

Each wrapper defines required software packages and versions. In combination with the --use-conda flag of Snakemake, these will be deployed automatically.

Contribute

We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute we suggest the following procedure:

  • fork the repository
  • develop your contribution
  • perform a pull request

The pull request will be reviewed and included as fast as possible. Thereby, contributions should follow the coding style of the already present examples, i.e.

  • provide a meta.yaml with name, description and author of the wrapper,
  • provide an environment.yaml which lists all required software packages (the packages shall be available via https://anaconda.org),
  • provide an example Snakefile that shows how to use the wrapper,
  • follow the python style guide,
  • use 4 spaces for indentation.

BCFTOOLS

Wrappers

BCFTOOLS CALL

Call variants with bcftools.

Software dependencies
  • samtools ==1.5
  • bcftools ==1.5
Example

This wrapper can be used in the following way:

rule bcftools_call:
    input:
        ref="genome.fasta",
        samples=expand("mapped/{sample}.sorted.bam", sample=config["samples"]),
        indexes=expand("mapped/{sample}.sorted.bam.bai", sample=config["samples"])
    output:
        # Here, we optionally use a region as wildcard and constrain it to the
        # format accepted by samtools mpileup.
        "called/{region,.+(:[0-9]+-[0-9]+)?}.bcf"
    params:
        # Optional parameters for samtools mpileup (except -g, -f).
        # In this example, we forward the region wildcard from the output file to mpileup.
        mpileup="--region {region}",
        # Optional parameters for bcftools call (except -v, -o, -m).
        call=""
    log:
        "logs/bcftools_call/{region}.log"
    wrapper:
        "0.30.0/bio/bcftools/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "(samtools mpileup {snakemake.params.mpileup} {snakemake.input.samples} "
    "--fasta-ref {snakemake.input.ref} --BCF --uncompressed | "
    "bcftools call -m {snakemake.params.call} -o {snakemake.output[0]} -v -) 2> {snakemake.log}")
BCFTOOLS CONCAT

Concatenate vcf/bcf files with bcftools.

Software dependencies
  • bcftools ==1.6
Example

This wrapper can be used in the following way:

rule bcftools_concat:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools concat (except -o)
    wrapper:
        "0.30.0/bio/bcftools/concat"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools concat {snakemake.params} -o {snakemake.output[0]} "
    "{snakemake.input.calls}")
BCFTOOLS MERGE

Merge vcf/bcf files with bcftools.

Software dependencies
  • bcftools ==1.6
Example

This wrapper can be used in the following way:

rule bcftools_merge:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools concat (except -o)
    wrapper:
        "0.30.0/bio/bcftools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell(
    "bcftools merge {snakemake.params} -o {snakemake.output[0]} "
    "{snakemake.input.calls}")
BCFTOOLS VIEW

View vcf/bcf file in a different format.

Software dependencies
  • bcftools ==1.5
Example

This wrapper can be used in the following way:

rule bcf_to_vcf:
    input:
        "{prefix}.bcf"
    output:
        "{prefix}.vcf"
    params:
        ""  # optional parameters for bcftools view (except -o)
    wrapper:
        "0.30.0/bio/bcftools/view"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools view {snakemake.params} {snakemake.input[0]} "
    "-o {snakemake.output[0]}")

BOWTIE2

Wrappers

BOWTIE2

Map reads with bowtie2.

Software dependencies
  • bowtie2 ==2.3.2
  • samtools ==1.5
Example

This wrapper can be used in the following way:

rule bowtie2:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bowtie2/{sample}.log"
    params:
        index="index/genome",  # prefix of reference genome index (built with bowtie2-build)
        extra=""  # optional parameters
    threads: 8
    wrapper:
        "0.30.0/bio/bowtie2/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert n == 1 or n == 2, "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "-U {}".format(*snakemake.input.sample)
else:
    reads = "-1 {} -2 {}".format(*snakemake.input.sample)

shell(
    "(bowtie2 --threads {snakemake.threads} {snakemake.params.extra} "
    "-x {snakemake.params.index} {reads} "
    "| samtools view -Sbh -o {snakemake.output[0]} -) {log}")

BUSCO

Assess assembly and annotation completeness with BUSCO

Software dependencies

  • python ==3.6
  • busco

Example

This wrapper can be used in the following way:

rule run_busco:
    input:
        "sample_data/target.fa"
    output:
        "txome_busco/full_table_txome_busco.tsv",
    log:
        "logs/quality/transcriptome_busco.log"
    threads: 8
    params:
        mode="transcriptome",
        lineage_path="sample_data/example",
        # optional parameters
        extra=""
    wrapper:
        "0.30.0/bio/busco"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Tessa Pierce

Code

"""Snakemake wrapper for BUSCO assessment"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode is not None, "please input a run mode: genome, transcriptome or proteins"
lineage = snakemake.params.get("lineage_path")
assert lineage is not None, "please input the path to a lineage for busco assessment"

# busco does not allow you to direct output location: handle this by moving output
outdir = path.dirname(snakemake.output[0])
if '/' in outdir:
    out_name = path.basename(outdir)
else:
    out_name = outdir

#note: --force allows snakemake to handle rewriting files as necessary
# without needing to specify *all* busco outputs as snakemake outputs
shell("run_busco --in {snakemake.input} --out {out_name} --force "
      " --cpu {snakemake.threads} --mode {mode} --lineage {lineage} "
      " {extra} {log}" )

busco_outname = 'run_' + out_name

# move to intended location
shell("cp -r {busco_outname}/* {outdir}")
shell("rm -rf {busco_outname}")

BWA

Wrappers

BWA ALN

Map reads with bwa aln.

Software dependencies
  • bwa ==0.7.15
Example

This wrapper can be used in the following way:

rule bwa_aln:
    input:
        "reads/{sample}.{pair}.fastq"
    output:
        "sai/{sample}.{pair}.sai"
    params:
        index="genome",
        extra=""
    log:
        "logs/bwa_aln/{sample}.{pair}.log"
    threads: 8
    wrapper:
        "0.30.0/bio/bwa/aln"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa aln."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get('extra', '')
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "bwa aln"
    " {extra}"
    " -t {snakemake.threads}"
    " {snakemake.params.index}"
    " {snakemake.input[0]}"
    " > {snakemake.output[0]} {log}")
BWA INDEX

Creates a BWA index.

Software dependencies
  • bwa ==0.7.15
Example

This wrapper can be used in the following way:

rule bwa_index:
    input:
        "{genome}.fasta"
    output:
        "{genome}.amb",
        "{genome}.ann",
        "{genome}.bwt",
        "{genome}.pac",
        "{genome}.sa"
    log:
        "logs/bwa_index/{genome}.log"
    params:
        prefix="{genome}",
        algorithm="bwtsw"
    wrapper:
        "0.30.0/bio/bwa/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

#Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
    raise ValueError("Only one reference genome can be inputed!")

#Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")

if len(prefix) > 0:
    prefix = "-p " + prefix

#Contrunction algorithm that will be used to build the database, default is bwtsw
construction_algorithm = snakemake.params.get("algorithm", "")

if len(construction_algorithm) != 0:
    construction_algorithm = "-a " + construction_algorithm

shell(
    "bwa index"
    " {prefix}"
    " {construction_algorithm}"
    " {snakemake.input[0]}"
    " {log}")
BWA MEM

Map reads using bwa mem, with optional sorting using samtools or picard.

Software dependencies
  • bwa ==0.7.15
  • samtools ==1.5
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="genome",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="none",             # Can be 'none', 'samtools' or 'picard'.
        sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
        sort_extra=""            # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "0.30.0/bio/bwa/mem"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Julian de Ruiter
Code
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {1, 2}:
    raise ValueError("input must have 1 (single-end) or "
                     "2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}")
BWA SAMPE

Map paired-end reads with bwa sampe.

Software dependencies
  • bwa ==0.7.15
  • samtools ==1.3
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule bwa_sampe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"]
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
        sort="none",                                 # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'`
        sort_order="queryname",                      # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""                                # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_sampe/{sample}.log"
    wrapper:
        "0.30.0/bio/bwa/sampe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa sampe."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Check inputs.
if not len(snakemake.input.sai) == 2:
    raise ValueError('input.sai must have 2 elements')

if not len(snakemake.input.fastq) == 2:
    raise ValueError('input.fastq must have 2 elements')

# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa sampe"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}")
BWA SAMSE

Map single-end reads with bwa samse.

Software dependencies
  • bwa ==0.7.15
  • samtools ==1.3
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule bwa_samse:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai"
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
        sort="none",                                 # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'`
        sort_order="queryname",                      # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""                                # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_samse/{sample}.log"
    wrapper:
        "0.30.0/bio/bwa/samse"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa sampe."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa samse"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}")

CUTADAPT

Wrappers

CUTADAPT-PE

Trim paired-end reads using cutadapt.

Software dependencies
  • cutadapt ==1.13
Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        fastq1="trimmed/{sample}.1.fastq",
        fastq2="trimmed/{sample}.2.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.30.0/bio/cutadapt/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cutadapt"
    " {snakemake.params}"
    " -o {snakemake.output.fastq1}"
    " -p {snakemake.output.fastq2}"
    " {snakemake.input}"
    " > {snakemake.output.qc} {log}")
CUTADAPT-SE

Trim single-end reads using cutadapt.

Software dependencies
  • cutadapt ==1.13
Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        "reads/{sample}.fastq"
    output:
        fastq="trimmed/{sample}.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.30.0/bio/cutadapt/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cutadapt"
    " {snakemake.params}"
    " -o {snakemake.output.fastq}"
    " {snakemake.input[0]}"
    " > {snakemake.output.qc} {log}")

DELLY

Call variants with delly.

Software dependencies

  • delly ==0.7.8

Example

This wrapper can be used in the following way:

rule delly:
    input:
        ref="genome.fasta",
        samples=["mapped/a.bam"],
        # optional exclude template (see https://github.com/dellytools/delly)
        exclude="human.hg19.excl.tsv"
    output:
        "sv/calls.bcf"
    params:
        extra=""  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log"
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "0.30.0/bio/delly"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Johannes Köster

Code

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


try:
    exclude = "-x " + snakemake.input.exclude
except AttributeError:
    exclude = ""


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "OMP_NUM_THREADS={snakemake.threads} delly call {extra} "
    "{exclude} -g {snakemake.input.ref} "
    "-o {snakemake.output[0]} {snakemake.input.samples} {log}")

EPIC

Wrappers

EPIC

Find broad enriched domains in ChIP-Seq data with epic

Software dependencies
  • epic =0.2.7
  • pandas =0.22.0
Example

This wrapper can be used in the following way:

rule epic:
    input:
      treatment = "bed/test.bed",
      background = "bed/control.bed"
    output:
      enriched_regions = "epic/enriched_regions.csv", # required
      bed = "epic/enriched_regions.bed", # optional
      matrix = "epic/matrix.gz" # optional
    log:
        "logs/epic/epic.log"
    params:
      genome = "hg19", # optional, default hg19
      extra="-g 3 -w 200" # "--bigwig epic/bigwigs"
    threads: 1 # optional, defaults to 1
    wrapper:
        "0.30.0/bio/epic/peaks"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • All/any of the different bigwig options must be given as extra parameters
Authors
  • Endre Bakken Stovner
Code
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1

treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")

# Executed shell command
enriched_regions = snakemake.output.get("enriched_regions")

bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")

if len(snakemake.log) > 0:
    log = snakemake.log[0]

genome = snakemake.params.get("genome")

cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"

if bed:
    cmd += " -b {bed}"
if matrix:
    cmd += " -sm {matrix}"
if log:
    cmd += " -l {log}"

cmd += " {extra}"

shell(cmd)

FASTQ_SCREEN

fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

This wrapper allows the configuration to be passed as a filename or as a dictionary in the rule’s params.fastq_screen_config of the rule. So the following configuration file:

DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE2
DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE
DATABASE      hg19    /data/hg19/Bowtie2Index/genome  BOWTIE2
DATABASE      mm10    /data/mm10/Bowtie2Index/genome  BOWTIE2
BOWTIE        /path/to/bowtie
BOWTIE2       /path/to/bowtie2

becomes:

fastq_screen_config = {
 'database': {
   'ecoli': {
     'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
     'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
   'hg19': {
     'bowtie2': '/data/hg19/Bowtie2Index/genome'},
   'mm10': {
     'bowtie2': '/data/mm10/Bowtie2Index/genome'}
 },
 'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}

By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".

Software dependencies

  • fastq-screen ==0.5.2
  • bowtie2 ==2.2.6
  • bowtie ==1.1.2

Example

This wrapper can be used in the following way:

rule fastq_screen:
    input:
        "samples/{sample}.fastq.gz"
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png"
    params:
        fastq_screen_config=fastq_screen_config,
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "0.30.0/bio/fastq_screen"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
  • While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.
  • When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environmental variable is set (see the tempfile docs) for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the snakefile calling this wrapper.

Authors

  • Ryan Dale

Code

import os
from snakemake.shell import shell
import tempfile

__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"

_config = snakemake.params['fastq_screen_config']

subset = snakemake.params.get('subset', 100000)
aligner = snakemake.params.get('aligner', 'bowtie2')
extra = snakemake.params.get('extra', '')
log = snakemake.log_fmt_shell()

# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, 'w') as fout:
        for label, indexes in _config['database'].items():
            for aligner, index in indexes.items():
                fout.write('\t'.join([
                    'DATABASE', label, index, aligner.upper()]) + '\n')
        for aligner, path in _config['aligner_paths'].items():
            fout.write('\t'.join([aligner.upper(), path]) + '\n')
    config_file = tmp
else:
    config_file = _config

# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = os.path.basename(snakemake.input[0].split('.fastq')[0])
tempdir = tempfile.mkdtemp()

shell(
    "fastq_screen --outdir {tempdir} "
    "--force "
    "--aligner {aligner} "
    "--conf {config_file} "
    "--subset {subset} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{snakemake.input[0]} "
    "{log}"
)

# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")

# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
    shell("rm {tmp}")

FASTQC

Generate fastq qc statistics using fastqc.

Software dependencies

  • fastqc ==0.11.7

Example

This wrapper can be used in the following way:

rule fastqc:
    input:
        "reads/{sample}.fastq"
    output:
        html="qc/fastqc/{sample}.html",
        zip="qc/fastqc/{sample}.zip"
    params: ""
    log:
        "logs/fastqc/{sample}.log"
    wrapper:
        "0.30.0/bio/fastqc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Julian de Ruiter

Code

"""Snakemake wrapper for fastqc."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)

    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell("fastqc {snakemake.params} --quiet "
          "--outdir {tempdir} {snakemake.input[0]}"
          " {log}")

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path} {snakemake.output.html}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path} {snakemake.output.zip}")

FGBIO

Wrappers

FGBIO ANNOTATEBAMWITHUMIS

Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule AnnotateBam:
    input:
        bam="mapped/{sample}.bam",
        umi="umi/{sample}.fastq"
    output:
        "mapped/{sample}.annotated.bam"
    params: ""
    log:
        "logs/fgbio/annotate_bam/{sample}.log"
    wrapper:
        "0.30.0/bio/fgbio/annotatebamwithumis"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input.bam

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not isinstance(bam_input, str):
    raise ValueError("Input bam should be a string: " + str(bam_input) + "!")

umi_input = snakemake.input.umi

if umi_input is None:
    raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
    raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")

if not len(snakemake.output) == 1:
    raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
output_file = snakemake.output[0]


if output_file is None:
    raise ValueError("Missing output file!")
elif not isinstance(output_file, str):
    raise ValueError("Output bam-file should be a string: " + str(output_file) + "!")

shell("fgbio AnnotateBamWithUmis"
      " -i {bam_input}"
      " -f {umi_input}"
      " -o {output_file}"
      " {extra_params}"
      " {log}")
FGBIO CALLMOLECULARCONSENSUSREADS

Calls consensus sequences from reads with the same unique molecular tag.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule ConsensusReads:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.m3.bam"
    params:
        extra="-M 3"
    log:
        "logs/fgbio/consensus_reads/{sample}.log"
    wrapper:
        "0.30.0/bio/fgbio/callmolecularconsensusreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

if not isinstance(output_file, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell("fgbio CallMolecularConsensusReads"
      " -i {bam_input}"
      " -o {output_file}"
      " {extra_params}"
      " {log}")
FGBIO GROUPREADSBYUMI

Groups reads together that appear to have come from the same original molecule.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule GroupReads:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.gu.bam"
    params:
        extra="-s adjacency --edits 1"
    log:
        "logs/fgbio/group_reads/{sample}.log"
    wrapper:
        "0.30.0/bio/fgbio/groupreadsbyumi"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

if not isinstance(output_file, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell("fgbio GroupReadsByUmi"
      " -i {bam_input}"
      " -o {output_file}"
      " {extra_params}"
      " {log}")
FGBIO SETMATEINFORMATION

Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), MC (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule SetMateInfo:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.mi.bam"
    params: ""
    log:
        "logs/fgbio/set_mate_info/{sample}.log"
    wrapper:
        "0.30.0/bio/fgbio/setmateinformation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

if not isinstance(output_file, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell("fgbio SetMateInformation"
      " -i {bam_input}"
      " -o {output_file}"
      " {extra_params}"
      " {log}")

FREEBAYES

Call small genomic variants with freebayes.

Software dependencies

  • freebayes ==1.1.0
  • bcftools ==1.5
  • parallel ==20170422

Example

This wrapper can be used in the following way:

rule freebayes:
    input:
        ref="genome.fasta",
        # you can have a list of samples here
        samples="mapped/{sample}.bam"
    output:
        "calls/{sample}.vcf"  # either .vcf or .bcf
    log:
        "logs/freebayes/{sample}.log"
    params:
        extra="",         # optional parameters
        chunksize=100000  # reference genome chunk size for parallelization (default: 100000)
    threads: 2
    wrapper:
        "0.30.0/bio/freebayes"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Johannes Köster

Code

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = snakemake.params.get("extra", "")

pipe = ""
if snakemake.output[0].endswith(".bcf"):
    pipe = "| bcftools view -Ob -"

if snakemake.threads == 1:
    freebayes = "freebayes"
else:
    chunksize = snakemake.params.get("chunksize", 100000)
    freebayes = ("freebayes-parallel <(fasta_generate_regions.py "
                 "{snakemake.input.ref}.fai {chunksize}) "
                 "{snakemake.threads}").format(snakemake=snakemake,
                                               chunksize=chunksize)

shell("({freebayes} {params} -f {snakemake.input.ref}"
      " {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}")

GATK

Wrappers

GATK BASERECALIBRATOR

Run gatk BaseRecalibrator and ApplyBQSR in one step.

Software dependencies
  • gatk4 ==4.0.5.1
Example

This wrapper can be used in the following way:

rule gatk_bqsr:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz"
    output:
        bam="recal/{sample}.bam"
    log:
        "logs/gatk/bqsr/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.30.0/bio/gatk/baserecalibrator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from tempfile import TemporaryDirectory
import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

with TemporaryDirectory() as tmpdir:
    recal_table = os.path.join(tmpdir, "recal_table.grp")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
    shell("gatk --java-options '{java_opts}' BaseRecalibrator {extra} "
          "-R {snakemake.input.ref} -I {snakemake.input.bam} "
          "-O {recal_table} --known-sites {snakemake.input.known} {log}")

    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    shell("gatk --java-options '{java_opts}' ApplyBQSR -R {snakemake.input.ref} -I {snakemake.input.bam} "
          "--bqsr-recal-file {recal_table} "
          "-O {snakemake.output.bam} {log}")
GATK COMBINEGVCFS

Run gatk CombineGVCFs.

Software dependencies
  • gatk4 ==4.0.5.1
Example

This wrapper can be used in the following way:

rule genotype_gvcfs:
    input:
        gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
        ref="genome.fasta"
    output:
        gvcf="calls/all.g.vcf",
    log:
        "logs/gatk/combinegvcfs.log"
    params:
        extra="",  # optional
        java_opts="",  # optional
    wrapper:
        "0.30.0/bio/gatk/combinegvcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
gvcfs = list(map("-V {}".format, snakemake.input.gvcfs))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' CombineGVCFs {extra} "
      "{gvcfs} "
      "-R {snakemake.input.ref} "
      "-O {snakemake.output.gvcf} {log}")
GATK GENOTYPEGVCFS

Run gatk GenotypeGVCFs.

Software dependencies
  • gatk4 ==4.0.5.1
Example

This wrapper can be used in the following way:

rule genotype_gvcfs:
    input:
        gvcf="calls/all.g.vcf",  # combined gvcf over multiple samples
        ref="genome.fasta"
    output:
        vcf="calls/all.vcf",
    log:
        "logs/gatk/genotypegvcfs.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.30.0/bio/gatk/genotypegvcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' GenotypeGVCFs {extra} "
      "-V {snakemake.input.gvcf} "
      "-R {snakemake.input.ref} "
      "-O {snakemake.output.vcf} {log}")
GATK HAPLOTYPECALLER

Run gatk HaplotypeCaller.

Software dependencies
  • gatk4 ==4.0.5.1
Example

This wrapper can be used in the following way:

rule haplotype_caller:
    input:
        # single or list of bam files
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
        # known="dbsnp.vcf"  # optional
    output:
        gvcf="calls/{sample}.g.vcf",
    log:
        "logs/gatk/haplotypecaller/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.30.0/bio/gatk/haplotypecaller"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell

known = snakemake.input.get("known", "")
if known:
    known = "--dbsnp " + known

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
bams = snakemake.input.bam
if isinstance(bams, str):
    bams = [bams]
bams = list(map("-I {}".format, bams))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' HaplotypeCaller {extra} "
      "-R {snakemake.input.ref} {bams} "
      "-ERC GVCF "
      "-O {snakemake.output.gvcf} {known} {log}")
GATK SELECTVARIANTS

Run gatk SelectVariants.

Software dependencies
  • gatk4 ==4.0.5.1
Example

This wrapper can be used in the following way:

rule gatk_select:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.vcf"
    log:
        "logs/gatk/select/snvs.log"
    params:
        extra="--select-type-to-include SNP",  # optional filter arguments, see GATK docs
        java_opts="", # optional
    wrapper:
        "0.30.0/bio/gatk/selectvariants"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' SelectVariants -R {snakemake.input.ref} -V {snakemake.input.vcf} "
      "{extra} -O {snakemake.output.vcf} {log}")
GATK VARIANTFILTRATION

Run gatk VariantFiltration.

Software dependencies
  • gatk4 ==4.0.5.1
Example

This wrapper can be used in the following way:

rule gatk_filter:
    input:
        vcf="calls/snvs.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.filtered.vcf"
    log:
        "logs/gatk/filter/snvs.log"
    params:
        filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
        extra="",  # optional arguments, see GATK docs
        java_opts="", # optional
    wrapper:
        "0.30.0/bio/gatk/variantfiltration"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
filters = ["--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
           for name, expr in snakemake.params.filters.items()]

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' VariantFiltration -R {snakemake.input.ref} -V {snakemake.input.vcf} "
      "{extra} {filters} -O {snakemake.output.vcf} {log}")
GATK VARIANTRECALIBRATOR

Run gatk VariantRecalibrator.

Software dependencies
  • gatk4 ==4.0.5.1
Example

This wrapper can be used in the following way:

from snakemake.remote import GS

# GATK resource bundle files can be either directly obtained from google storage (like here), or
# from FTP. You can also use local files.
GS = GS.RemoteProvider()


def gatk_bundle(f):
    return GS.remote("genomics-public-data/resources/broad/hg38/v0/{}".format(f))


rule haplotype_caller:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
        # resources have to be given as named input files
        hapmap=gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz"),
        omni=gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz"),
        g1k=gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
        dbsnp=gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
        # use aux to e.g. download other necessary file
        aux=[gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"),
             gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi")]
    output:
        vcf="calls/all.recal.vcf",
        tranches="calls/all.tranches"
    log:
        "logs/gatk/variantrecalibrator.log"
    params:
        mode="SNP",  # set mode, must be either SNP, INDEL or BOTH
        # resource parameter definition. Key must match named input files from above.
        resources={"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
                   "omni":   {"known": False, "training": True, "truth": False, "prior": 12.0},
                   "g1k":   {"known": False, "training": True, "truth": False, "prior": 10.0},
                   "dbsnp":  {"known": True, "training": False, "truth": False, "prior": 2.0}},
        annotation=["QD", "FisherStrand"],  # which fields to use with -an (see VariantRecalibrator docs)
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.30.0/bio/gatk/haplotypecaller"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

def fmt_res(resname, resparams):
    fmt_bool = lambda b: str(b).lower()
    try:
        f = snakemake.input.get(resname)
    except KeyError:
        raise RuntimeError("There must be a named input file for every resource (missing: {})".format(resname))
    return "{},known={},training={},truth={},prior={}:{}".format(
           resname, fmt_bool(resparams["known"]), fmt_bool(resparams["training"]),
           fmt_bool(resparams["truth"]), resparams["prior"], f)

resources = ["--resource {}".format(fmt_res(resname, resparams))
             for resname, resparams in snakemake.params["resources"].items()]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = ""
if snakemake.output.tranches:
    tranches = "--tranches-file " + snakemake.output.tranches

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' VariantRecalibrator {extra} {resources} "
      "-R {snakemake.input.ref} -V {snakemake.input.vcf} "
      "-mode {snakemake.params.mode} "
      "--output {snakemake.output.vcf} "
      "{tranches} {annotation} {log}")

HISAT2

Map reads with hisat2.

Software dependencies

  • hisat2 ==2.1.0
  • samtools ==1.5

Example

This wrapper can be used in the following way:

rule hisat2:
    input:
      reads=["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"],
    output:
      "mapped/{sample}.bam"
    log:                                # optional
      "logs/hisat2/{sample}.log"
    params:                             # idx is required, extra is optional
      idx="genome.fa",
      extra="--min-intronlen 1000"
    threads: 8                          # optional, defaults to 1
    wrapper:
      "0.30.0/bio/hisat2"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • The -S flag must not be used since output is already directly piped to samtools for compression.
  • The –threads/-p flag must not be used since threads is set separately via the snakemake threads directive.
  • The wrapper does not yet handle SRA input accessions.
  • No reference index files checking is done since the actual number of files may differ depending on the reference sequence size. This is also why the index is supplied in the params directive instead of the input directive.

Authors

  • Wibowo Arindrarto

Code

__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()

# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
    input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
    input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
    input_flags = "-1 {0} -2 {1}".format(*reads)
else:
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2"
        " input files.")

# Executed shell command
shell(
    "(hisat2 {extra} --threads {snakemake.threads}"
    " -x {snakemake.params.idx} {input_flags}"
    " | samtools view -Sbh -o {snakemake.output[0]} -)"
    " {log}")

JANNOVAR

Annotate predicted effect of nucleotide changes with Jannovar

Software dependencies

  • jannovar-cli ==0.25

Example

This wrapper can be used in the following way:

rule jannovar:
    input:
        vcf="{sample}.vcf",
        pedigree="pedigree_ar.ped" # optional, contains familial relationships
    output:
        "jannovar/{sample}.vcf.gz"
    log:
        "logs/jannovar/{sample}.log"
    params:
        database="hg19_small.ser", # path to jannovar reference dataset
        extra="--show-all"         # optional parameters
    wrapper:
        "0.30.0/bio/jannovar"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Bradford Powell

Code

__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

pedigree = snakemake.input.get("pedigree", "")
if pedigree:
      pedigree = '--pedigree-file "%s"'%pedigree

shell("jannovar annotate-vcf --database {snakemake.params.database}"
      " --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
      " {pedigree} {extra} {log}")

LOFREQ

Wrappers

LOFREQ CALL

simply call variants

Software dependencies
  • samtools ==1.6
  • lofreq ==2.1.3.1
Example

This wrapper can be used in the following way:

rule lofreq:
    input:
        "data/{sample}.bam"
    output:
        "calls/{sample}.vcf"
    log:
        "logs/lofreq_call/{sample}.log"
    params:
        ref="data/genome.fasta",
        extra=""
    threads: 8
    wrapper:
        "0.30.0/bio/lofreq/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
ref = snakemake.params.get("ref", None)

if ref is None:
    raise ValueError("A reference must be provided")

bam_input = snakemake.input[0]

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not len(snakemake.input) == 1:
    raise ValueError("Only expecting one input file: " + str(snakemake.input) + "!")

output_file = snakemake.output[0]

if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(output_file) + "!")

shell(
    "lofreq call-parallel "
    " --pp-threads {snakemake.threads}"
    " -f {ref}"
    " {bam_input}"
    " -o {output_file}"
    " {extra}"
    " {log}")

MINIMAP2

Wrappers

MINIMAP2

A versatile pairwise aligner for genomic and spliced nucleotide sequences https://lh3.github.io/minimap2

Software dependencies
  • minimap2 ==2.5
Example

This wrapper can be used in the following way:

rule minimap2:
    input:
        target="target/{input1}.mmi", # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"]
    output:
        "aligned/{input1}_aln.paf"
    log:
        "logs/minimap2/{input1}.log"
    params:
        extra="-x map-pb"  # optional
    threads: 3
    wrapper:
        "0.30.0/bio/minimap2/aligner"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tom Poorten
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

inputQuery = " ".join(snakemake.input.query)

shell("(minimap2 -t {snakemake.threads} {extra} "
      "{snakemake.input.target} {inputQuery} >"
      "{snakemake.output[0]}) {log}")
MINIMAP2 INDEX

creates a minimap2 index

Software dependencies
  • minimap2 ==2.5
Example

This wrapper can be used in the following way:

rule minimap2_index:
    input:
        target="target/{input1}.fasta"
    output:
        "{input1}.mmi"
    log:
        "logs/minimap2_index/{input1}.log"
    params:
        extra=""  # optional additional args
    threads: 3
    wrapper:
        "0.30.0/bio/minimap2/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tom Poorten
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("(minimap2 -t {snakemake.threads} {extra} "
      "-d {snakemake.output[0]} {snakemake.input.target}) {log}")

MULTIQC

Generate qc report using multiqc.

Software dependencies

  • multiqc ==1.2
  • networkx <2.0

Example

This wrapper can be used in the following way:

rule multiqc:
    input:
        expand("samtools_stats/{sample}.txt", sample=["a", "b"])
    output:
        "qc/multiqc.html"
    params:
        ""  # Optional: extra parameters for multiqc.
    log:
        "logs/multiqc.log"
    wrapper:
        "0.30.0/bio/multiqc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Julian de Ruiter

Code

"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {snakemake.params}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_dirs}"
    " {log}")

NGS-DISAMBIGUATE

Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.

Software dependencies

  • ngs-disambiguate ==2016.11.10
  • bamtools ==2.4.0

Example

This wrapper can be used in the following way:

rule disambiguate:
    input:
        a="mapped/{sample}.a.bam",
        b="mapped/{sample}.b.bam"
    output:
        a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
        b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
        a_disambiguated='disambiguate/{sample}.graft.bam',
        b_disambiguated='disambiguate/{sample}.host.bam',
        summary='qc/disambiguate/{sample}.txt'
    params:
        algorithm="bwa",
        # optional: Prefix to use for output. If omitted, a
        # suitable value is guessed from the output paths. Prefix
        # is used for the intermediate output paths, as well as
        # sample name in summary file.
        prefix="{sample}",
        extra=""
    wrapper:
        "0.30.0/bio/ngs-disambiguate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Julian de Ruiter

Code

"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")

output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain)
# additional suffixes. This is better than using a random id as prefix,
# the prefix is also used as the sample name in the summary file.
if prefix is None:
    prefix = path.splitext(path.basename(snakemake.output.summary))[0]

# Run command.
shell(
    "ngs_disambiguate"
    " {extra}"
    " -o {output_dir}"
    " -s {prefix}"
    " -a {snakemake.params.algorithm}"
    " {snakemake.input.a}"
    " {snakemake.input.b}")

# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)

output_map = {
    output_base + ".ambiguousSpeciesA.bam":
        snakemake.output.a_ambiguous,
    output_base + ".ambiguousSpeciesB.bam":
        snakemake.output.b_ambiguous,
    output_base + ".disambiguatedSpeciesA.bam":
        snakemake.output.a_disambiguated,
    output_base + ".disambiguatedSpeciesB.bam":
        snakemake.output.b_disambiguated,
    output_base + "_summary.txt":
        snakemake.output.summary
}

for src, dest in output_map.items():
    if src != dest:
        shell('mv {src} {dest}')

PICARD

Wrappers

PICARD ADDORREPLACEREADGROUPS

Add or replace read groups with picard tools.

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule replace_rg:
    input:
        "mapped/{sample}.bam"
    output:
        "fixed-rg/{sample}.bam"
    log:
        "logs/picard/replace_rg/{sample}.log"
    params:
        "RGLB=lib1 RGPL=illumina RGPU={sample} RGSM={sample}"
    wrapper:
        "0.30.0/bio/picard/addorreplacereadgroups"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("picard AddOrReplaceReadGroups {snakemake.params} I={snakemake.input} "
      "O={snakemake.output} &> {snakemake.log}")
PICARD COLLECTALIGNMENTSUMMARYMETRICS

Collect metrics on aligned reads with picard tools.

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule alignment_summary:
    input:
        ref="genome.fasta",
        bam="mapped/{sample}.bam"
    output:
        "stats/{sample}.summary.txt"
    log:
        "logs/picard/alignment-summary/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    wrapper:
        "0.30.0/bio/picard/collectalignmentsummarymetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell("picard CollectAlignmentSummaryMetrics {snakemake.params} "
      "INPUT={snakemake.input.bam} OUTPUT={snakemake.output[0]} "
      "REFERENCE_SEQUENCE={snakemake.input.ref} {log}")
PICARD COLLECTHSMETRICS

Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule picard_collect_hs_metrics:
    input:
        bam="mapped/{sample}.bam",
        reference="genome.fasta",
        # Baits and targets should be given as interval lists. These can
        # be generated from bed files using picard BedToIntervalList.
        bait_intervals="regions.intervals",
        target_intervals="regions.intervals"
    output:
        "stats/hs_metrics/{sample}.txt"
    params:
        # Optional extra arguments. Here we reduce sample size
        # to reduce the runtime in our unit test.
        "SAMPLE_SIZE=1000"
    log:
        "logs/picard_collect_hs_metrics/{sample}.log"
    wrapper:
        "0.30.0/bio/picard/collecthsmetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard CollectHSMetrics."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard CollectHsMetrics"
    " {extra}"
    " INPUT={snakemake.input.bam}"
    " OUTPUT={snakemake.output[0]}"
    " REFERENCE_SEQUENCE={snakemake.input.reference}"
    " BAIT_INTERVALS={snakemake.input.bait_intervals}"
    " TARGET_INTERVALS={snakemake.input.target_intervals}"
    " {log}")
PICARD COLLECTINSERTSIZEMETRICS

Collect metrics on insert size of paired end reads with picard tools.

Software dependencies
  • picard ==2.9.2
  • r-base ==3.3.2
Example

This wrapper can be used in the following way:

rule insert_size:
    input:
        "mapped/{sample}.bam"
    output:
        txt="stats/{sample}.isize.txt",
        pdf="stats/{sample}.isize.pdf"
    log:
        "logs/picard/insert_size/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    wrapper:
        "0.30.0/bio/picard/collectinsertsizemetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell("picard CollectInsertSizeMetrics {snakemake.params} "
      "INPUT={snakemake.input} OUTPUT={snakemake.output.txt} "
      "HISTOGRAM_FILE={snakemake.output.pdf} {log}")
PICARD CREATESEQUENCEDICTIONARY

Create a .dict file for a given FASTA file

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule create_dict:
    input:
        "genome.fasta"
    output:
        "genome.dict"
    log:
        "logs/picard/create_dict.log"
    params:
        extra=""  # optional: extra arguments for picard.
    wrapper:
        "0.30.0/bio/picard/createsequencedictionary"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    'picard '
    'CreateSequenceDictionary '
    '{extra} '
    'R={snakemake.input[0]} '
    'O={snakemake.output[0]} '
    '{log}')
PICARD MARKDUPLICATES

Mark PCR and optical duplicates with picard tools.

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        metrics="dedup/{sample}.metrics.txt"
    log:
        "logs/picard/dedup/{sample}.log"
    params:
        "REMOVE_DUPLICATES=true"
    wrapper:
        "0.30.0/bio/picard/markduplicates"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
      "OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
      "&> {snakemake.log}")
PICARD MERGESAMFILES

Merge sam/bam files using picard tools.

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule merge_bams:
    input:
        expand("mapped/{sample}.bam", sample=["a", "b"])
    output:
        "merged.bam"
    log:
        "logs/picard_mergesamfiles.log"
    params:
        "VALIDATION_STRINGENCY=LENIENT"
    wrapper:
        "0.30.0/bio/picard/mergesamfiles"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard MergeSamFiles."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " MergeSamFiles"
    " {snakemake.params}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}")
PICARD MERGEVCFS

Merge vcf files using picard tools.

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule merge_vcfs:
    input:
        ["snvs.chr1.vcf", "snvs.chr2.vcf"]
    output:
        "snvs.vcf"
    log:
        "logs/picard/mergevcfs.log"
    params:
        extra=""
    wrapper:
        "0.30.0/bio/picard/mergevcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
"""Snakemake wrapper for picard MergeSamFiles."""

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(f) for f in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "picard"
    " MergeVcfs"
    " {extra}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}")
PICARD REVERTSAM

Reverts SAM or BAM files to a previous state. .

Software dependencies
  • picard ==2.18.16
Example

This wrapper can be used in the following way:

rule revert_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "revert/{sample}.bam"
    log:
        "logs/picard/revert_sam/{sample}.log"
    params:
        extra="SANITIZE=true" # optional: Extra arguments for picard.
    wrapper:
        "0.30.0/bio/picard/revertsam"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard RevertSam."""

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    'picard'
    ' RevertSam'
    ' {extra}'
    ' INPUT={snakemake.input[0]}'
    ' OUTPUT={snakemake.output[0]}'
    ' {log}')
PICARD SOMTOFASTQ

Converts a SAM or BAM file to FASTQ.

Software dependencies
  • picard ==2.18.16
Example

This wrapper can be used in the following way:

rule bam_to_fastq:
    input:
        "mapped/{sample}.bam"
    output:
        fastq1="reads/{sample}.R1.fastq",
        fastq2="reads/{sample}.R2.fastq"
    log:
        "logs/picard/sam_to_fastq/{sample}.log"
    params:
        extra="" # optional: Extra arguments for picard.
    wrapper:
        "0.30.0/bio/picard/samtofastq"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard SortSam."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

fastq1 = snakemake.output.fastq1
fastq2 = snakemake.output.get("fastq2", None)
fastq_unpaired = snakemake.output.get("unpaired_fastq", None)

if not isinstance(fastq1, str):
    raise ValueError("f1 needs to be provided")

output = " FASTQ=" + fastq1

if isinstance(fastq2, str):
     output += " SECOND_END_FASTQ=" + fastq2

if isinstance(fastq_unpaired, str):
    if not isinstance(fastq2, str):
        raise ValueError("f2 is required if fastq_unpaired is set")
    else:
        output += " UNPAIRED_FASTQ=" + fastq_unpaired

shell(
    'picard'
    ' SamToFastq'
    ' {extra}'
    ' INPUT={snakemake.input[0]}'
    ' {output}'
    ' {log}')
PICARD SORTSAM

Sort sam/bam files using picard tools.

Software dependencies
  • picard ==2.9.2
Example

This wrapper can be used in the following way:

rule sort_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "sorted/{sample}.bam"
    log:
        "logs/picard/sort_sam/{sample}.log"
    params:
        sort_order="coordinate",
        extra="VALIDATION_STRINGENCY=LENIENT" # optional: Extra arguments for picard.
    wrapper:
        "0.30.0/bio/picard/sortsam"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard SortSam."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    'picard'
    ' SortSam'
    ' {extra}'
    ' INPUT={snakemake.input[0]}'
    ' OUTPUT={snakemake.output[0]}'
    ' SORT_ORDER={snakemake.params.sort_order}'
    ' {log}')

PINDEL

Wrappers

PINDEL

Call variants with pindel.

Software dependencies
  • pindel ==0.2.5b8
Example

This wrapper can be used in the following way:

pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]


rule pindel:
    input:
        ref="genome.fasta",
        # samples to call
        samples=["mapped/a.bam"],
        # bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
        config="pindel_config.txt"
    output:
        expand("pindel/all_{type}", type=pindel_types)
    params:
        # prefix must be consistent with output files
        prefix="pindel/all",
        extra=""  # optional parameters (except -i, -f, -o)
    log:
        "logs/pindel.log"
    threads: 4
    wrapper:
        "0.30.0/bio/pindel/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("pindel -T {snakemake.threads} {snakemake.params.extra} -i {snakemake.input.config} "
      "-f {snakemake.input.ref} -o {snakemake.params.prefix} {log}")
PINDEL2VCF

Convert pindel output to vcf.

Software dependencies
  • pindel ==0.2.5b8
Example

This wrapper can be used in the following way:

rule pindel2vcf:
    input:
        ref="genome.fasta",
        pindel="pindel/all_{type}"
    output:
        "pindel/all_{type}.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.{type}.log"
    wrapper:
        "0.30.0/bio/pindel/pindel2vcf"

rule pindel2vcf_multi_input:
    input:
        ref="genome.fasta",
        pindel=["pindel/all_D", "pindel/all_INV"]
    output:
        "pindel/all.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.log"
    wrapper:
        "0.30.0/bio/pindel/pindel2vcf"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

expected_endings = ['INT', 'D', 'SI', 'INV', 'INV_final' 'TD', 'LI', 'BP', 'CloseEndMapped','RP']

def split_file_name(file_parts, file_ending_index):
    return "_".join(file_parts[:file_ending_index]), "_".join(file_parts[file_ending_index])

def process_input_path(input_file):
    """
        :params input_file: Input file from rule, ex /path/to/file/all_D or /path/to/file/all_INV_final
        :return: ""/path/to/file", "all"

    """
    file_path, file_name = os.path.split(input_file)
    file_parts = file_name.split("_")
    #seperate ending and name, to name: all ending: D or name: all ending: INV_final
    file_name, file_ending = split_file_name(file_parts, -2 if file_name.endswith("_final") else -1)
    if not file_ending in expected_endings:
        raise Exception("Unexpected variant type: " + file_ending)
    return  file_path, file_name

with tempfile.TemporaryDirectory() as tmpdirname:
    input_flag = "-p"
    input_file = snakemake.input.get("pindel")
    if  isinstance(input_file, list) and len(input_file) > 1:
        input_flag = "-P"
        input_path, input_name = process_input_path(input_file[0])
        input_file = os.path.join(input_path,input_name)
        for variant_input in snakemake.input.pindel:
            if not variant_input.startswith(input_file):
                raise Exception("Unable to extract common path from multi file input, expect path is: " + input_file)
            if not os.path.isfile(variant_input):
                raise Exception("Input \"" + input_file + "\" is not a file!")
            os.symlink(os.path.abspath(variant_input),os.path.join(tmpdirname, os.path.basename(variant_input)))
        input_file = os.path.join(tmpdirname,input_name)
    shell("pindel2vcf {snakemake.params.extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}")

RUBIC

RUBIC detects recurrent copy number alterations using copy number breaks.

Software dependencies

  • r-base =3.4.1
  • r-rubic =1.0.3
  • r-data.table =1.10.4
  • r-pracma =2.0.4
  • r-ggplot2 =2.2.1
  • r-gtable =0.2.0
  • r-codetools =0.2_15
  • r-digest =0.6.12

Example

This wrapper can be used in the following way:

rule rubic:
    input:
        seg="{samples}/segments.txt",
        markers="{samples}/markers.txt"
    output:
        out_gains="{samples}/gains.txt",
        out_losses="{samples}/losses.txt",
        out_plots=directory("{samples}/plots") #only possible to provide output directory for plots
    params:
        fdr="",
        genefile=""
    wrapper:
        "0.30.0/bio/rubic"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Beatrice F. Tan

Code

# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"

library(RUBIC)

all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]

rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])

SALMON

Wrappers

SALMON_INDEX

Index a transcriptome assembly with salmon

Software dependencies
  • salmon ==0.10.1
Example

This wrapper can be used in the following way:

rule salmon_index:
    input:
        "assembly/transcriptome.fasta"
    output:
        directory("salmon/transcriptome_index")
    log:
        "logs/salmon/transcriptome_index.log"
    threads: 2
    params:
        # optional parameters
        extra=""
    wrapper:
        "0.30.0/bio/salmon/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Index."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("salmon index -t {snakemake.input} -i {snakemake.output} "
      " --threads {snakemake.threads} {extra} {log}" )
SALMON_QUANT

Quantify transcripts with salmon

Software dependencies
  • salmon ==0.10.0
Example

This wrapper can be used in the following way:

rule salmon_quant_reads:
    input:
        # If you have multiple fastq files for a single sample (e.g. technical replicates)
        # use a list for r1 and r2.
        r1 = "reads/{sample}_1.fq.gz",
        r2 = "reads/{sample}_2.fq.gz",
        index = "salmon/transcriptome_index"
    output:
        quant = 'salmon/{sample}/quant.sf',
        lib = 'salmon/{sample}/lib_format_counts.json'
    log:
        'logs/salmon/{sample}.log'
    params:
        # optional parameters
        libtype ="A",
        #zip_ext = bz2 # req'd for bz2 files ('bz2'); optional for gz files('gz')
        extra=""
    threads: 2
    wrapper:
        "0.30.0/bio/salmon/quant"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Quant"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

def manual_decompression (reads, zip_ext):
    """ Allow *.bz2 input into salmon. Also provide same
    decompression for *gz files, as salmon devs mention
    it may be faster in some cases."""
    if zip_ext and reads:
        if zip_ext == 'bz2':
            reads = ' < (bunzip2 -c ' + reads + ')'
        elif zip_ext == 'gz':
            reads = ' < (gunzip -c ' + reads + ')'
    return reads

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
zip_extension = snakemake.params.get("zip_extension", "")
libtype = snakemake.params.get("libtype", "A")

r1 = snakemake.input.get("r1")
r2 =  snakemake.input.get("r2")
r = snakemake.input.get("r")

assert (r1 is not None and r2 is not None) or r is not None, "either r1 and r2 (paired), or r (unpaired) are required as input"
if r1:
    r1 = [snakemake.input.r1] if isinstance(snakemake.input.r1, str) else snakemake.input.r1
    r2 = [snakemake.input.r2] if isinstance(snakemake.input.r2, str) else snakemake.input.r2
    assert len(r1) == len(r2), "input-> equal number of files required for r1 and r2"
    r1_cmd = ' -1 ' + manual_decompression(" ".join(r1), zip_extension)
    r2_cmd = ' -2 ' + manual_decompression(" ".join(r2), zip_extension)
    read_cmd = " ".join([r1_cmd,r2_cmd])
if r:
    assert r1 is None and r2 is None, "Salmon cannot quantify mixed paired/unpaired input files. Please input either r1,r2 (paired) or r (unpaired)"
    r = [snakemake.input.r] if isinstance(snakemake.input.r, str) else snakemake.input.r
    read_cmd = ' -r ' + manual_decompression(" ".join(r), zip_extension)

outdir = path.dirname(snakemake.output.get('quant'))

shell("salmon quant -i {snakemake.input.index} "
      " -l {libtype} {read_cmd} -o {outdir} "
      " -p {snakemake.threads} {extra} {log} ")

SAMBAMBA

Wrappers

SAMBAMBA SORT

Sort bam file with sambamba

Software dependencies
  • sambamba ==0.6.6
Example

This wrapper can be used in the following way:

rule sambamba_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        ""  # optional parameters
    threads: 8
    wrapper:
        "0.30.0/bio/sambamba/sort"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

shell(
    "sambamba sort {snakemake.params} -t {snakemake.threads} "
    "-o {snakemake.output[0]} {snakemake.input[0]}")

SAMTOOLS

Wrappers

SAMTOOLS FLAGSTAT

Use samtools to create a flagstat file from a bam or sam file.

Software dependencies
  • samtools ==1.6
Example

This wrapper can be used in the following way:

rule samtools_flagstat:
    input: "mapped/{sample}.bam"
    output: "mapped/{sample}.bam.flagstat"
    wrapper:
        "0.30.0/bio/samtools/flagstat"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Preusch
Code
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools flagstat {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS INDEX

Index bam file with samtools.

Software dependencies
  • samtools ==1.6
Example

This wrapper can be used in the following way:

rule samtools_index:
    input: "mapped/{sample}.sorted.bam"
    output: "mapped/{sample}.sorted.bam.bai"
    params:
        "" # optional params string
    wrapper:
        "0.30.0/bio/samtools/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}")
SAMTOOLS MERGE

Merge two bam files with samtools.

Software dependencies
  • samtools ==1.6
Example

This wrapper can be used in the following way:

rule samtools_merge:
    input:
        ["mapped/A.bam", "mapped/B.bam"]
    output:
        "merged.bam"
    params:
        "" # optional additional parameters as string
    threads: 8
    wrapper:
        "0.30.0/bio/samtools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools merge --threads {snakemake.threads} {snakemake.params} "
      "{snakemake.output[0]} {snakemake.input}")
SAMTOOLS SORT

Sort bam file with samtools.

Software dependencies
  • samtools ==1.6
Example

This wrapper can be used in the following way:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.30.0/bio/samtools/sort"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell


prefix = os.path.splitext(snakemake.output[0])[0]

shell(
    "samtools sort {snakemake.params} -@ {snakemake.threads} -o {snakemake.output[0]} "
    "-T {prefix} {snakemake.input[0]}")
SAMTOOLS STATS

Generate stats using samtools.

Software dependencies
  • samtools ==1.6
Example

This wrapper can be used in the following way:

rule samtools_stats:
    input:
        "mapped/{sample}.bam"
    output:
        "samtools_stats/{sample}.txt"
    params:
        extra="",                       # Optional: extra arguments.
        region="1:1000000-2000000"      # Optional: region string.
    log:
        "logs/samtools_stats/{sample}.log"
    wrapper:
        "0.30.0/bio/samtools/stats"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell("samtools stats {extra} {snakemake.input}"
      " {region} > {snakemake.output} {log}")
SAMTOOLS VIEW

Convert or filter SAM/BAM.

Software dependencies
  • samtools ==1.6
Example

This wrapper can be used in the following way:

rule samtools_view:
    input:
        "{sample}.sam"
    output:
        "{sample}.bam"
    params:
        "-b" # optional params string
    wrapper:
        "0.30.0/bio/samtools/view"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools view {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")

SICKLE

Wrappers

SICKLE PE

Trim paired-end reads with sickle.

Software dependencies
  • sickle-trim ==1.33
Example

This wrapper can be used in the following way:

rule sickle_pe:
  input:
    r1="input_R1.fq",
    r2="input_R2.fq"
  output:
    r1="output_R1.fq",
    r2="output_R2.fq",
    rs="output_single.fq",
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    # optional log file
    "logs/sickle/job.log"
  wrapper:
    "0.30.0/bio/sickle/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
    "-o {snakemake.output.r1} -p {snakemake.output.r2} "
    "-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
    "{extra}) {log}"
)
SICKLE SE

Trim single-end reads with sickle.

Software dependencies
  • sickle-trim ==1.33
Example

This wrapper can be used in the following way:

rule sickle_pe:
  input:
    "input_R1.fq"
  output:
    "output_R1.fq"
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    "logs/sickle/job.log"
  wrapper:
    "0.30.0/bio/sickle/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
    "-t {snakemake.params.qual_type} {extra}) {log}"
)

SNPEFF

Annotate predicted effect of nucleotide changes with SnpEff

Software dependencies

  • snpeff ==4.3.1t

Example

This wrapper can be used in the following way:

rule snpeff:
    input:
        "{sample}.vcf",
    output:
        vcf="snpeff/{sample}.vcf",    # the main output file, required
        stats="snpeff/{sample}.html", # summary statistics (in HTML), optional
        csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
    log:
        "logs/snpeff/{sample}.log"
    params:
        reference="ebola_zaire", # reference name (from `snpeff databases`)
        extra="-Xmx4g"                 # optional parameters (e.g., max memory 4g)
    wrapper:
        "0.30.0/bio/snpeff"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Bradford Powell

Code

__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell
from os import path
import shutil
import tempfile

shell.executable("bash")
shell_command =("(snpEff {data_dir} {stats_opt} {csvstats_opt} {extra}"
                " {snakemake.params.reference} {snakemake.input}"
                " > {snakemake.output.vcf}) {log}")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

data_dir = snakemake.params.get("data_dir", "")
if data_dir:
    data_dir = '-dataDir "%s"'%data_dir

stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = '' if not csvstats else '-csvStats {}'.format(csvstats)
stats_opt = '-noStats' if not stats  else '-stats {}'.format(stats)

shell(shell_command)
    #if stats:
    #    shutil.copy(path.join(stats_tempdir, 'stats'), stats)
    #if genes:
    #    shutil.copy(path.join(stats_tempdir, 'stats.genes.txt'), genes)

SOURMASH

Wrappers

SOURMASH_COMPUTE

Build a MinHash signature for a transcriptome, genome, or reads

Software dependencies
  • sourmash==2.0.0a7
Example

This wrapper can be used in the following way:

rule sourmash_reads:
    input:
        "reads/a.fastq"
    output:
        "reads.sig"
    log:
        "logs/sourmash/sourmash_compute_reads.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "0.30.0/bio/sourmash/compute"


rule sourmash_transcriptome:
    input:
        "assembly/transcriptome.fasta"
    output:
        "transcriptome.sig"
    log:
        "logs/sourmash/sourmash_compute_transcriptome.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "0.30.0/bio/sourmash/compute"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Lisa K. Johnson
Code
"""Snakemake wrapper for sourmash compute."""

__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled","1000")
k = snakemake.params.get("k","31")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
      " {extra} {log}" )

STAR

Wrappers

STAR

Map reads with STAR.

Software dependencies
  • star ==2.5.3a
Example

This wrapper can be used in the following way:

rule star_pe_multi:
    input:
        # use a list for multiple fastq files for one sample
        # usually technical replicates across lanes/flowcells
        fq1 = ["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
        # paired end reads needs to be ordered so each item in the two lists match
        fq2 = ["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"] #optional
    output:
        # see STAR manual for additional output files
        "star/pe/{sample}/Aligned.out.bam"
    log:
        "logs/star/pe/{sample}.log"
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra=""
    threads: 8
    wrapper:
        "0.30.0/bio/star/align"

rule star_se:
    input:
        fq1 = "reads/{sample}_R1.1.fastq"
    output:
        # see STAR manual for additional output files
        "star/{sample}/Aligned.out.bam"
    log:
        "logs/star/{sample}.log"
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra=""
    threads: 8
    wrapper:
        "0.30.0/bio/star/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = [snakemake.input.fq1] if isinstance(snakemake.input.fq1, str) else snakemake.input.fq1
fq2 =  snakemake.input.get("fq2")
if fq2:
    fq2 = [snakemake.input.fq2] if isinstance(snakemake.input.fq2, str) else snakemake.input.fq2
    assert len(fq1) == len(fq2), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str =  " ".join([input_str_fq1, input_str_fq2])

if fq1[0].endswith(".gz"):
    readcmd = "--readFilesCommand zcat"
else:
    readcmd = ""

outprefix = os.path.dirname(snakemake.output[0]) + "/"

shell(
    "STAR "
    "{extra} "
    "--runThreadN {snakemake.threads} "
    "--genomeDir {snakemake.params.index} "
    "--readFilesIn {input_str} "
    "{readcmd} "
    "--outSAMtype BAM Unsorted "
    "--outFileNamePrefix {outprefix} "
    "--outStd Log "
    "{log}")

TRIM_GALORE

Wrappers

TRIM_GALORE-PE

Trim paired-end reads using trim_galore.

Software dependencies
  • trim-galore ==0.4.5
Example

This wrapper can be used in the following way:

rule trim_galore_pe:
    input:
        ["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"]
    output:
        "trimmed/{sample}.1_val_1.fq.gz",
         "trimmed/{sample}.1.fastq.gz_trimming_report.txt",
         "trimmed/{sample}.2_val_2.fq.gz",
         "trimmed/{sample}.2.fastq.gz_trimming_report.txt"
    params:
        extra="--illumina -q 20"
    log:
        "logs/trim_galore/{sample}.log"
    wrapper:
        "0.30.0/bio/trim_galore/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • It is expected that the fastqc Snakemake wrapper be used in place of the –fastqc option.
  • All output files must be placed in the same directory.
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError("The trim_galore Snakemake wrapper cannot "
                       "be run with the `--fastqc` flag. Please "
                       "remove the flag from extra params. "
                       "You can use the fastqc Snakemake wrapper on "
                       "the input and output files instead.")

# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), \
        "trim_galore can only output files to a single directory." \
        " Please indicate only one directory for the output files."

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " --paired"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}")
TRIM_GALORE-SE

Trim unpaired reads using trim_galore.

Software dependencies
  • trim-galore ==0.4.3
Example

This wrapper can be used in the following way:

rule trim_galore_se:
    input:
        "reads/{sample}.fastq.gz"
    output:
        "trimmed/{sample}_trimmed.fq.gz",
         "trimmed/{sample}.fastq.gz_trimming_report.txt"
    params:
        extra="--illumina -q 20"
    log:
        "logs/trim_galore/{sample}.log"
    wrapper:
        "0.30.0/bio/trim_galore/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • It is expected that the fastqc Snakemake wrapper be used in place of the –fastqc option.
  • All output files must be placed in the same directory.
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError("The trim_galore Snakemake wrapper cannot "
                       "be run with the `--fastqc` flag. Please "
                       "remove the flag from extra params. "
                       "You can use the fastqc Snakemake wrapper on "
                       "the input and output files instead.")

# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), \
        "trim_galore can only output files to a single directory." \
        " Please indicate only one directory for the output files."

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}")

TRIMMOMATIC

Wrappers

TRIMMOMATIC PE

Trim paired-end reads with trimmomatic. (De)compress with pigz.

Software dependencies
  • trimmomatic ==0.36
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule trimmomatic_pe:
    input:
        r1="reads/{sample}.1.fastq.gz",
        r2="reads/{sample}.2.fastq.gz"
    output:
        r1="trimmed/{sample}.1.fastq.gz",
        r2="trimmed/{sample}.2.fastq.gz",
        # reads where trimming entirely removed the mate
        r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
        r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        compression_level="-9"
    wrapper:
        "0.30.0/bio/trimmomatic/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/pe

Snakemake wrapper to trim reads with trimmomatic in PE mode with help of pigz.
pigz is the parallel implementation of gz. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


def compose_input_gz(filename):
    if filename.endswith(".gz"):
        filename = "<(pigz --decompress --stdout {filename})".format(
            filename=filename
        )
    return filename


def compose_output_gz(filename, compression_level="-5"):
    if filename.endswith(".gz"):
        return ">(pigz {compression_level} > {filename})".format(
            compression_level=compression_level,
            filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)


# Collect files
input_r1 = compose_input_gz(snakemake.input.r1)
input_r2 = compose_input_gz(snakemake.input.r2)

output_r1 = compose_output_gz(snakemake.output.r1, compression_level)
output_r1_unp = compose_output_gz(snakemake.output.r1_unpaired, compression_level)
output_r2 = compose_output_gz(snakemake.output.r2, compression_level)
output_r2_unp = compose_output_gz(snakemake.output.r2_unpaired, compression_level)

shell(
    "trimmomatic PE {extra} "
    "{input_r1} {input_r2} "
    "{output_r1} {output_r1_unp} "
    "{output_r2} {output_r2_unp} "
    "{trimmer} "
    "{log}"
)
TRIMMOMATIC SE

Trim single-end reads with trimmomatic. (De)compress with pigz.

Software dependencies
  • trimmomatic ==0.36
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule trimmomatic:
    input:
        "reads/{sample}.fastq.gz"  # input and output can be uncompressed or compressed
    output:
        "trimmed/{sample}.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        # optional compression levels from -0 to -9 and -11
        compression_level="-9"
    threads:
        32
    wrapper:
        "0.30.0/bio/trimmomatic/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/se

Snakemake wrapper to trim reads with trimmomatic in SE mode with help of pigz.
pigz is the parallel implementation of gz. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


def compose_input_gz(filename):
    if filename.endswith(".gz"):
        filename = "<(pigz --decompress --stdout {filename})".format(
            filename=filename
        )
    return filename


def compose_output_gz(filename, compression_level="-5"):
    if filename.endswith(".gz"):
        return ">(pigz {compression_level} > {filename})".format(
            compression_level=compression_level,
            filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Collect files
input = compose_input_gz(snakemake.input[0])
output = compose_output_gz(snakemake.output[0], compression_level)

shell("trimmomatic SE {extra} {input} {output} {trimmer} {log}")

TRINITY

Generate transcriptome assembly with Trinity

Software dependencies

  • trinity ==2.6.6

Example

This wrapper can be used in the following way:

rule trinity:
    input:
        left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
        right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
    output:
        "trinity_out_dir/Trinity.fasta"
    log:
        'logs/trinity/trinity.log'
    params:
        extra=""
    threads: 4
    wrapper:
        "0.30.0/bio/trinity"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • Tessa Pierce

Code

"""Snakemake wrapper for Trinity."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
max_memory =  snakemake.params.get("max_memory", "10G")

#allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = [snakemake.input.left] if isinstance(snakemake.input.left, str) else snakemake.input.left
right =  snakemake.input.get("right")
if right:
    right = [snakemake.input.right] if isinstance(snakemake.input.right, str) else snakemake.input.right
    assert len(left) >= len(right), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
    input_str_left = ' --left ' + ",".join(left)
    input_str_right = ' --right ' + ",".join(right)
else:
    input_str_left = ' --single ' + ",".join(left)
    input_str_right = ''

input_cmd =  " ".join([input_str_left, input_str_right])

# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
    if 'fq' in left[0] or 'fastq' in left[0]:
        seqtype = 'fq'
    elif 'fa' in left[0] or 'fasta' in left[0]:
        seqtype = 'fa'
    else: # assertion is redundant - warning or error instead?
        assert seqtype is not None, "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"

outdir = path.dirname(snakemake.output[0])
assert 'trinity' in outdir, "output directory name must contain 'trinity'"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("Trinity {input_cmd} --CPU {snakemake.threads} "
      " --max_memory {max_memory} --seqType {seqtype} "
      " --output {outdir} {snakemake.params.extra} "
      " {log}")

VCF

Wrappers

COMPRESS VCF

Compress and index vcf file with bgzip and tabix.

Software dependencies
  • htslib ==1.5
Example

This wrapper can be used in the following way:

rule compress_vcf:
    input:
        "{prefix}.vcf"
    output:
        "{prefix}.vcf.gz"
    wrapper:
        "0.30.0/bio/vcf/compress"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("bgzip --stdout {snakemake.input} > {snakemake.output} && tabix -p vcf {snakemake.output}")
UNCOMPRESS VCF

Uncompress vcf file with bgzip.

Software dependencies
  • htslib ==1.5
Example

This wrapper can be used in the following way:

rule uncompress_vcf:
    input:
        "{prefix}.vcf.gz"
    output:
        "{prefix}.vcf"
    wrapper:
        "0.30.0/bio/vcf/uncompress"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("bgzip --decompress --stdout {snakemake.input} > {snakemake.output}")

VCFTOOLS

Wrappers

VCFTOOLS FILTER

Filter vcf files using vcftools

Software dependencies
  • vcftools ==0.1.15
Example

This wrapper can be used in the following way:

rule filter_vcf:
    input:
        "{sample}.vcf"
    output:
        "{sample}.filtered.vcf"
    params:
        extra="--chr 1 --recode-INFO-all"
    wrapper:
        "0.30.0/bio/vcftools/filter"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
    input_flag = "--gzvcf"

output = " > " + snakemake.output[0]
if output.endswith(".gz"):
    output = " | gzip" + output

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell("vcftools "
      "{input_flag} "
      "{snakemake.input} "
      "{extra} "
      "--recode "
      "--stdout "
      "{output} "
       "{log}")