The Snakemake Wrappers repository

Requires Snakemake ≥5.7.0.

The Snakemake Wrapper Repository is a collection of reusable wrappers that let you quickly use popular tools from Snakemake rules and workflows.

Usage

The general strategy is to include a wrapper in your workflow via the wrapper directive, for example:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.2.0/bio/samtools/sort"

Here, Snakemake will automatically download the corresponding wrapper from https://github.com/snakemake/snakemake-wrappers/tree/0.2.0/bio/samtools/sort. The prefix 0.2.0 can be replaced with the version tag or commit id you want to use. Pinning the version ensures reproducibility, since changes in the wrapper implementation are not propagated automatically to your workflow. Alternatively, e.g. for development, the wrapper directive can point to a full URL, including a local file:// URL.
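
To illustrate how a wrapper specification relates to a location in the repository, here is a minimal Python sketch. The helper name wrapper_source_url is hypothetical, and this is not Snakemake's internal resolution logic; it only mirrors the URL pattern shown above and passes full URLs through unchanged.

```python
# Hypothetical helper: mirrors the URL pattern described above,
# it is not taken from Snakemake's own code.
REPO_TREE = "https://github.com/snakemake/snakemake-wrappers/tree"


def wrapper_source_url(spec: str) -> str:
    """Return the location that a wrapper specification refers to."""
    # Full URLs (including local file:// URLs) are used as given.
    if spec.startswith(("http://", "https://", "file://")):
        return spec
    # Otherwise the first path component is a version tag or commit id.
    return f"{REPO_TREE}/{spec}"


print(wrapper_source_url("0.2.0/bio/samtools/sort"))
```

Swapping 0.2.0 for another tag or a commit id changes only the first path component of the resulting URL.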

Each wrapper defines required software packages and versions. In combination with the --use-conda flag of Snakemake, these will be deployed automatically.

Contribute

We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, we suggest the following procedure:

  1. Fork the repository: https://github.com/snakemake/snakemake-wrappers
  2. Clone your fork locally.
  3. Locally, create a new branch: git checkout -b my-new-snakemake-wrapper
  4. Commit your contributions to that branch and push them to your fork: git push -u origin my-new-snakemake-wrapper
  5. Create a pull request.

The pull request will be reviewed and included as quickly as possible. Contributions should follow the coding style of the existing wrappers, i.e.:

  • provide a meta.yaml with name, description and author(s) of the wrapper
  • provide an environment.yaml which lists all required software packages. The packages should be installable via the default anaconda channels or via the conda channels bioconda or conda-forge; other sustainable, community-maintained channels are possible as well.
  • provide a minimal test case in a subfolder called test, with an example Snakefile that shows how to use the wrapper and some minimal testing data (also check existing wrappers for suitable data), and add an invocation of the test in test.py
  • follow the Python style guide, using 4 spaces for indentation.
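
Before opening a pull request, it can help to check that the files named in the list above are present. The following is a minimal sketch under that assumption; the helper missing_wrapper_files is hypothetical and not part of the repository, and it checks only for meta.yaml, environment.yaml and test/Snakefile.

```python
from pathlib import Path

# Files named in the contribution checklist; the helper itself is hypothetical.
REQUIRED = ("meta.yaml", "environment.yaml", "test/Snakefile")


def missing_wrapper_files(wrapper_dir):
    """Return the required files that are absent from wrapper_dir."""
    root = Path(wrapper_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]
```

An empty list means that all three files exist; for example, missing_wrapper_files("bio/samtools/sort") could be called from the main directory of a local clone.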

Testing locally

If you want to debug your contribution locally before creating a pull request, we recommend adding your test case to the start of the list in test.py, so that it runs first. Then install Miniconda, configure the channels as described for bioconda, create an environment with the necessary dependencies, and activate it:

conda create -n test-snakemake-wrappers snakemake pytest conda
conda activate test-snakemake-wrappers

Afterwards, from the main directory of the repo, you can run the tests with:

pytest test.py -v

If you send a keyboard interrupt after your test has failed, all the relevant stdout and stderr messages are printed.

If you also want to test the docs generation locally, create another environment and activate it:

conda create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml
conda activate test-snakemake-wrapper-docs

Then, enter the respective directory and build the docs:

cd docs
make html

If the build succeeds, you can open the main page at docs/_build/html/index.html in a web browser. If you want to start fresh, you can clean up the build with make clean.

ARRIBA

Detect gene fusions from chimeric STAR output

Software dependencies
  • arriba ==1.1.0
Example

This wrapper can be used in the following way:

rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv" # optional
    log:
        "logs/arriba/{sample}.log"
    params:
        # arriba blacklist file
        blacklist="blacklist.tsv", # strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
        # file containing known fusions
        known_fusions="", # optional
        # file containing information from structural variant analysis
        sv_file="", # optional
        # optional parameters
        extra="-T -P -i 1,2"
    threads: 1
    wrapper:
        "0.65.0/bio/arriba"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""

blacklist = snakemake.params.get("blacklist")
if blacklist:
    blacklist_cmd = "-b " + blacklist
else:
    blacklist_cmd = ""

known_fusions = snakemake.params.get("known_fusions")
if known_fusions:
    known_cmd = "-k " + known_fusions
else:
    known_cmd = ""

sv_file = snakemake.params.get("sv_file")
if sv_file:
    sv_cmd = "-d " + sv_file
else:
    sv_cmd = ""

shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)
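
The wrapper code above repeats one pattern for every optional argument: emit the flag only when a value is set. A minimal sketch of that pattern as a single helper follows. The name optional_flag and the example values are hypothetical; the flags -b, -k and -d are the ones used in the wrapper code.

```python
def optional_flag(flag, value):
    """Return 'flag value' when value is set, otherwise an empty string."""
    return f"{flag} {value}" if value else ""


# Example values standing in for snakemake.params (assumed for illustration).
params = {"blacklist": "blacklist.tsv", "known_fusions": "", "sv_file": None}

arguments = [
    optional_flag("-b", params.get("blacklist")),
    optional_flag("-k", params.get("known_fusions")),
    optional_flag("-d", params.get("sv_file")),
]
# Drop the empty entries so the command line has no stray spaces.
command_part = " ".join(arg for arg in arguments if arg)
print(command_part)
```

Both an empty string and None count as "not set", which matches the truthiness checks in the wrapper.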

ART

For art, the following wrappers are available:

ART_PROFILER_ILLUMINA

Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.

Software dependencies
  • art ==2016.06.05
Example

This wrapper can be used in the following way:

rule art_profiler_illumina:
    input:
        "data/{sample}.fq",
    output:
        "profiles/{sample}.txt"
    log:
        "logs/art_profiler_illumina/{sample}.log"
    params: ""
    threads: 2
    wrapper:
        "0.65.0/bio/art/profiler_illumina"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • David Laehnemann
  • Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


from snakemake.shell import shell
import os
import tempfile
import re


# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina cli
with tempfile.TemporaryDirectory() as temp_input:
    # ensure that .fastq and .fastq.gz input files work, as well
    filename = os.path.basename(snakemake.input[0]).replace(".fastq", ".fq")

    # figure out the exact file extension after the above substitution
    ext = re.search(r"fq(\.gz)?$", filename)
    if ext:
        fq_extension = ext.group(0)
    else:
        raise IOError(
            "Incompatible extension: This art_profiler_illumina "
            "wrapper requires input files with one of the following "
            "extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
            "your input and the invocation of the wrapper accordingly."
        )

    os.symlink(
        # snakemake paths are relative, but the symlink needs to be absolute
        os.path.abspath(snakemake.input[0]),
        # the following awkward file name generation has reasons:
        # * the file name needs to be unique to the execution of the
        #   rule, as art will create and mv temporary files with its basename
        #   in the output directory, which causes utter confusion when
        #   executing instances of the rule in parallel
        # * temp file name cannot have any read infixes before the file
        #   extension, because otherwise art does read enumeration magic
        #   that messes up output file naming
        os.path.join(
            temp_input,
            filename.replace(
                "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
            ),
        ),
    )

    # include output folder name in the profile_name command line argument and
    # strip off the file extension, as art will add its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(snakemake.output[0]), filename.replace("." + fq_extension, "")
    )

    shell(
        "( art_profiler_illumina {snakemake.params} {profile_name}"
        " {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}"
    )
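
The file name handling in the wrapper code can be followed more easily in isolation. The sketch below reproduces only the name derivation as a pure function, without creating a symlink or calling art; the function name art_names is hypothetical.

```python
import os
import re


def art_names(input_path, output_path):
    """Return (fq_extension, link_name, profile_name) as derived in the wrapper."""
    # Normalise .fastq to .fq so that one pattern covers all four extensions.
    filename = os.path.basename(input_path).replace(".fastq", ".fq")
    match = re.search(r"fq(\.gz)?$", filename)
    if match is None:
        raise IOError("expected one of: fastq, fastq.gz, fq, fq.gz")
    fq_extension = match.group(0)
    suffix = "." + fq_extension
    # Spacer before the extension, as in the wrapper's symlink name.
    link_name = filename.replace(suffix, "_preventing_art_magic_spacer" + suffix)
    # Output directory plus base name; art appends ".txt" itself.
    profile_name = os.path.join(
        os.path.dirname(output_path), filename.replace(suffix, "")
    )
    return fq_extension, link_name, profile_name


print(art_names("data/a.fastq.gz", "profiles/a.txt"))
```

For the input data/a.fastq.gz the extension passed to art is fq.gz, and the symlink in the temporary directory is named a_preventing_art_magic_spacer.fq.gz.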

BAMTOOLS

For bamtools, the following wrappers are available:

BAMTOOLS FILTER

Filters BAM files. For more information about bamtools see bamtools documentation and bamtools source code.

Software dependencies
  • bamtools ==2.5.1
Example

This wrapper can be used in the following way:

rule bamtools_filter:
    input:
        "{sample}.bam"
    output:
        "filtered/{sample}.bam"
    params:
        # optional parameters
        tags = [ "NM:<4", "MQ:>=10" ],    # list of key:value pair strings
        min_size = "-2000",
        max_size = "2000",
        min_length = "10",
        max_length = "20",
        # to add more optional parameters (see bamtools filter --help):
        additional_params = "-mapQuality \">=0\" -isMapped \"true\""
    log:
        "logs/bamtools/filtered/{sample}.log"
    wrapper:
        "0.65.0/bio/bamtools/filter"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# extract arguments
params = ""
extra_limits = ""
tags = snakemake.params.get("tags")
min_size = snakemake.params.get("min_size")
max_size = snakemake.params.get("max_size")
min_length = snakemake.params.get("min_length")
max_length = snakemake.params.get("max_length")
additional_params = snakemake.params.get("additional_params")

if tags:
    params = params + " " + " ".join(map('-tag "{}"'.format, tags))

if min_size:
    params = params + ' -insertSize ">=' + min_size + '"'
    if max_size:
        extra_limits = extra_limits + ' -insertSize "<=' + max_size + '"'
elif max_size:
    params = params + ' -insertSize "<=' + max_size + '"'

if min_length:
    params = params + ' -length ">=' + min_length + '"'
    if max_length:
        extra_limits = extra_limits + ' -length "<=' + max_length + '"'
elif max_length:
    params = params + ' -length "<=' + max_length + '"'

if additional_params:
    params = params + " " + additional_params

if extra_limits:
    params = params + " | bamtools filter" + extra_limits

shell(
    "(bamtools filter"
    " -in {snakemake.input[0]}" + params + " -out {snakemake.output[0]}) {log}"
)
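
When both a lower and an upper bound are given, the wrapper code places the upper bound in a second, piped bamtools filter call. The following sketch shows the resulting command for the insert size only; the function name build_filter_command is hypothetical, and log redirection is left out for brevity.

```python
def build_filter_command(in_bam, out_bam, min_size=None, max_size=None):
    """Build the bamtools filter command line for insert-size bounds."""
    params = ""
    extra_limits = ""
    if min_size:
        params += f' -insertSize ">={min_size}"'
        if max_size:
            # The upper bound goes to a second, piped bamtools filter call.
            extra_limits += f' -insertSize "<={max_size}"'
    elif max_size:
        params += f' -insertSize "<={max_size}"'
    if extra_limits:
        params += " | bamtools filter" + extra_limits
    return f"bamtools filter -in {in_bam}{params} -out {out_bam}"


print(build_filter_command("a.bam", "filtered/a.bam", "-2000", "2000"))
```

With only one bound set, a single bamtools filter call is produced.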

BAMTOOLS FILTER WITH JSON

Filters BAM files with JSON-script for filtering parameters and rules. For more information about bamtools see bamtools documentation and bamtools source code.

Software dependencies
  • bamtools ==2.5.1
Example

This wrapper can be used in the following way:

rule bamtools_filter_json:
    input:
        "{sample}.bam"
    output:
        "filtered/{sample}.bam"
    params:
        json="filtering-rules.json",
        region="" # optional parameter for defining a specific region, e.g. "chr1:500..chr3:750"
    log:
        "logs/bamtools/filtered/{sample}.log"
    wrapper:
        "0.65.0/bio/bamtools/filter_json"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

region = snakemake.params.get("region")
region_param = ""

if region:
    region_param = ' -region "' + region + '"'

shell(
    "(bamtools filter"
    " -in {snakemake.input[0]}"
    " -out {snakemake.output[0]}"
    + region_param
    + " -script {snakemake.params.json}) {log}"
)

BAMTOOLS STATS

Use bamtools to collect statistics from a BAM file. For more information about bamtools see bamtools documentation and bamtools source code.

Software dependencies
  • bamtools ==2.5.1
Example

This wrapper can be used in the following way:

rule bamtools_stats:
    input:
        "{sample}.bam"
    output:
        "{sample}.bamstats"
    params:
        "-insert" # optional summarize insert size data
    log:
        "logs/bamtools/stats/{sample}.log"
    wrapper:
        "0.65.0/bio/bamtools/stats"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "(bamtools stats {snakemake.params} -in {snakemake.input[0]} > {snakemake.output[0]}) {log}"
)

BCFTOOLS

For bcftools, the following wrappers are available:

BCFTOOLS CALL

Call variants with bcftools call.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_call:
    input:
        pileup="{sample}.pileup.bcf",
    output:
        calls="{sample}.calls.bcf",
    params:
        caller="-m", # valid options include -c/--consensus-caller or -m/--multiallelic-caller
        options="--ploidy 1 --prior 0.001",
    log:
        "logs/bcftools_call/{sample}.log",
    wrapper:
        "0.65.0/bio/bcftools/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Michael Hall
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


class CallerOptionError(Exception):
    pass


valid_caller_opts = {"-c", "--consensus-caller", "-m", "--multiallelic-caller"}

caller_opt = snakemake.params.get("caller", "")
if caller_opt.strip() not in valid_caller_opts:
    raise CallerOptionError(
        "bcftools call expects either -m/--multiallelic-caller or "
        "-c/--consensus-caller as caller option."
    )

options = snakemake.params.get("options", "")

shell(
    "bcftools call {options} {caller_opt} --threads {snakemake.threads} "
    "-o {snakemake.output.calls} {snakemake.input.pileup} 2> {snakemake.log}"
)
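
The caller check in the wrapper code can be tried out on its own. This is a minimal sketch with a hypothetical function name, check_caller, and a plain ValueError in place of the wrapper's CallerOptionError.

```python
VALID_CALLER_OPTS = {"-c", "--consensus-caller", "-m", "--multiallelic-caller"}


def check_caller(caller_opt):
    """Return the stripped caller option, or raise if it is not allowed."""
    opt = caller_opt.strip()
    if opt not in VALID_CALLER_OPTS:
        raise ValueError(f"unsupported caller option: {caller_opt!r}")
    return opt


print(check_caller(" -m "))
```

Because the wrapper falls back to an empty string when no caller is given, and the empty string is not in the set of valid options, the caller parameter is effectively mandatory.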

BCFTOOLS CONCAT

Concatenate vcf/bcf files with bcftools.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_concat:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools concat (except -o)
    wrapper:
        "0.65.0/bio/bcftools/concat"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools concat {snakemake.params} -o {snakemake.output[0]} "
    "{snakemake.input.calls}"
)

BCFTOOLS INDEX

Index vcf/bcf file.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_index:
    input:
        "a.bcf"
    output:
        "a.bcf.csi"
    params:
        extra=""  # optional parameters for bcftools index
    wrapper:
        "0.65.0/bio/bcftools/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

shell("bcftools index" " {extra}" " {snakemake.input[0]}")

BCFTOOLS MERGE

Merge vcf/bcf files with bcftools.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_merge:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools merge (except -o)
    wrapper:
        "0.65.0/bio/bcftools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell(
    "bcftools merge {snakemake.params} -o {snakemake.output[0]} "
    "{snakemake.input.calls}"
)

BCFTOOLS MPILEUP

Generate VCF or BCF containing genotype likelihoods for one or multiple alignment (BAM or CRAM) files with bcftools mpileup.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_mpileup:
    input:
        index="genome.fasta.fai",
        ref="genome.fasta", # this can be left out if --no-reference is in options
        alignments="mapped/{sample}.bam",
    output:
        pileup="pileups/{sample}.pileup.bcf",
    params:
        options="--max-depth 100 --min-BQ 15",
    log:
        "logs/bcftools_mpileup/{sample}.log",
    wrapper:
        "0.65.0/bio/bcftools/mpileup"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell


class MissingReferenceError(Exception):
    pass


options = snakemake.params.get("options", "")

# determine if a fasta reference is provided or not and add to options
if "--no-reference" not in options:
    ref = snakemake.input.get("ref", "")
    if not ref:
        raise MissingReferenceError(
            "The --no-reference option was not given, but no fasta reference was "
            "provided."
        )
    options += " --fasta-ref {}".format(ref)

shell(
    "bcftools mpileup {options} --threads {snakemake.threads} "
    "--output {snakemake.output.pileup} "
    "{snakemake.input.alignments} 2> {snakemake.log}"
)
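
The reference handling in the wrapper code reduces to a small decision: either --no-reference is among the options, or a fasta reference must be supplied. Here is a minimal sketch of that decision; the function name mpileup_options is hypothetical and a plain ValueError stands in for MissingReferenceError.

```python
def mpileup_options(options="", ref=""):
    """Append --fasta-ref to options unless --no-reference was requested."""
    if "--no-reference" in options:
        return options
    if not ref:
        raise ValueError("no fasta reference given and --no-reference not set")
    return f"{options} --fasta-ref {ref}"


print(mpileup_options("--max-depth 100 --min-BQ 15", "genome.fasta"))
```

In a rule, this means that the ref input can be dropped only when --no-reference is part of params.options.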

BCFTOOLS NORM

Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule norm_vcf:
    input:
        "{prefix}.vcf"
    output:
        "{prefix}.norm.vcf"
    params:
        ""  # optional parameters for bcftools norm (except -o)
    wrapper:
        "0.65.0/bio/bcftools/norm"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Dayne Filer
Code
__author__ = "Dayne Filer"
__copyright__ = "Copyright 2019, Dayne Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools norm {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}"
)

BCFTOOLS REHEADER

Change header or sample names of vcf/bcf file.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_reheader:
    input:
        vcf="a.bcf",
        ## new header, can be omitted if "samples" is set
        header="header.txt",
        ## file containing new sample names, can be omitted if "header" is set
        samples="samples.tsv"
    output:
        "a.reheader.bcf"
    params:
        extra="",  # optional parameters for bcftools reheader
        view_extra="-O b"  # add output format for internal bcftools view call
    wrapper:
        "0.65.0/bio/bcftools/reheader"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


from snakemake.shell import shell

## Extract arguments
header = snakemake.input.get("header", "")
if header:
    header_cmd = "-h " + header
else:
    header_cmd = ""

samples = snakemake.input.get("samples", "")
if samples:
    samples_cmd = "-s " + samples
else:
    samples_cmd = ""

extra = snakemake.params.get("extra", "")
view_extra = snakemake.params.get("view_extra", "")

shell(
    "bcftools reheader "
    "{extra} "
    "{header_cmd} "
    "{samples_cmd} "
    "{snakemake.input.vcf} "
    "| bcftools view "
    "{view_extra} "
    "> {snakemake.output}"
)

BCFTOOLS VIEW

View vcf/bcf file in a different format.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcf_to_vcf:
    input:
        "{prefix}.bcf"
    output:
        "{prefix}.vcf"
    params:
        ""  # optional parameters for bcftools view (except -o)
    wrapper:
        "0.65.0/bio/bcftools/view"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools view {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}"
)

BEDTOOLS

For bedtools, the following wrappers are available:

COVERAGEBED

Returns the depth and breadth of coverage of features from B on the intervals in A.

Software dependencies
  • bedtools ==2.29.0
Example

This wrapper can be used in the following way:

rule coverageBed:
    input:
        a="bed/{sample}.bed",
        b="mapped/{sample}.bam"
    output:
        "stats/{sample}.cov"
    log:
        "logs/coveragebed/{sample}.log"
    params:
        extra=""  # optional parameters
    threads: 8
    wrapper:
        "0.65.0/bio/bedtools/coveragebed"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

input_a = snakemake.input.a
input_b = snakemake.input.b

output_file = snakemake.output[0]

if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one file: " + str(output_file) + "!")

shell(
    "coverageBed"
    " -a {input_a}"
    " -b {input_b}"
    " {extra_params}"
    " > {output_file}"
    " {log}"
)

BEDTOOLS GENOMECOVERAGEBED

The genomeCoverageBed tool from bedtools computes the coverage of a feature file across a given genome as histograms, per-base reports or BEDGRAPH summaries. For usage information about genomeCoverageBed, please see the bedtools documentation. For more information about bedtools, also see the source code.

Software dependencies
  • bedtools ==2.29.2
Example

This wrapper can be used in the following way:

rule genomecov_bam:
    input:
        "bam_input/{sample}.sorted.bam"
    output:
        "genomecov_bam/{sample}.genomecov"
    log:
        "logs/genomecov_bam/{sample}.log"
    params:
        "-bg"  # optional parameters
    wrapper:
        "0.65.0/bio/bedtools/genomecov"

rule genomecov_bed:
    input:
        # for genome file format please see:
        # https://bedtools.readthedocs.io/en/latest/content/general-usage.html#genome-file-format
        bed="bed_input/{sample}.sorted.bed",
        ref="bed_input/genome_file"
    output:
        "genomecov_bed/{sample}.genomecov"
    log:
        "logs/genomecov_bed/{sample}.log"
    params:
        "-bg"  # optional parameters
    wrapper:
        "0.65.0/bio/bedtools/genomecov"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

genome = ""
input_file = ""

if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
    input_file = "-ibam " + snakemake.input[0]

if len(snakemake.input) > 1:
    if (os.path.splitext(snakemake.input[0])[-1]) == ".bed":
        input_file = "-i " + snakemake.input.get("bed")
        genome = "-g " + snakemake.input.get("ref")

shell(
    "(genomeCoverageBed"
    " {snakemake.params}"
    " {input_file}"
    " {genome}"
    " > {snakemake.output[0]}) {log}"
)
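
The wrapper code chooses its command-line arguments from the extension of the first input file. The sketch below isolates that choice; the function name genomecov_arguments is hypothetical, and the bed and ref arguments stand in for the named inputs of the rule.

```python
import os


def genomecov_arguments(inputs, bed=None, ref=None):
    """Return (input_argument, genome_argument) for a list of input paths."""
    first_ext = os.path.splitext(inputs[0])[-1]
    if first_ext == ".bam":
        # BAM input: passed with -ibam, no genome file argument.
        return "-ibam " + inputs[0], ""
    if first_ext == ".bed" and len(inputs) > 1:
        # BED input: needs the genome file as a second input.
        return "-i " + bed, "-g " + ref
    # Anything else leaves both arguments empty, as in the wrapper.
    return "", ""


print(genomecov_arguments(["bam_input/a.sorted.bam"]))
```

Note that a BED file given without a genome file results in empty arguments, so the second input is required in that case.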

BEDTOOLS INTERSECT

Intersect BED/BAM/VCF files with bedtools.

Software dependencies
  • bedtools =2.29.0
Example

This wrapper can be used in the following way:

rule bedtools_intersect:
    input:
        left="A.bed",
        right="B.bed"
    output:
        "A_B.intersected.bed"
    params:
        ## Add optional parameters
        extra="-wa -wb" ## In this example, we want to write original entries in A and B for each overlap.
    log:
        "logs/intersect/A_B.log"
    wrapper:
        "0.65.0/bio/bedtools/intersect"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(bedtools intersect"
    " {extra}"
    " -a {snakemake.input.left}"
    " -b {snakemake.input.right}"
    " > {snakemake.output})"
    " {log}"
)

BEDTOOLS MERGE

Merge entries in one or multiple BED/BAM/VCF/GFF files with bedtools.

Software dependencies
  • bedtools =2.29.0
Example

This wrapper can be used in the following way:

rule bedtools_merge:
    input:
        # Multiple bed-files can be added as list
        "A.bed"
    output:
        "A.merged.bed"
    params:
        ## Add optional parameters
        extra="-c 1 -o count" ## In this example, we want to count how many input lines we merged per output line
    log:
        "logs/merge/A.log"
    wrapper:
        "0.65.0/bio/bedtools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster, Felix Mölder"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de, felix.moelder@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if len(snakemake.input) > 1:
    if all(f.endswith(".gz") for f in snakemake.input):
        cat = "zcat"
    elif all(not f.endswith(".gz") for f in snakemake.input):
        cat = "cat"
    else:
        raise ValueError("Input files must be all compressed or uncompressed.")
    shell(
        "({cat} {snakemake.input} | "
        "sort -k1,1 -k2,2n | "
        "bedtools merge {extra} "
        "-i stdin > {snakemake.output}) "
        " {log}"
    )
else:
    shell(
        "( bedtools merge"
        " {extra}"
        " -i {snakemake.input}"
        " > {snakemake.output})"
        " {log}"
    )
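
For multiple input files, the wrapper code picks zcat or cat depending on whether all files are compressed or none is. A minimal sketch of that selection follows; the function name choose_cat is hypothetical.

```python
def choose_cat(paths):
    """Pick the concatenation tool for a list of input files."""
    if all(p.endswith(".gz") for p in paths):
        return "zcat"
    if not any(p.endswith(".gz") for p in paths):
        return "cat"
    # A mix of compressed and uncompressed files is rejected.
    raise ValueError("Input files must be all compressed or uncompressed.")


print(choose_cat(["A.bed", "B.bed"]))
```

The concatenated stream is then sorted by chromosome and start position before it is passed to bedtools merge, as shown in the wrapper code.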

BEDTOOLS SLOP

Increase the size of each feature in a BED/BAM/VCF by a specified factor.

Software dependencies
  • bedtools =2.29.0
Example

This wrapper can be used in the following way:

rule bedtools_slop:
    input:
        "A.bed"
    output:
        "A.slop.bed"
    params:
        ## Genome file, tab-separated file defining the length of every contig
        genome="genome.txt",
        ## Add optional parameters
        extra = "-b 10" ## in this example, we want to increase the feature by 10 bases to both sides
    log:
        "logs/slop/A.log"
    wrapper:
        "0.65.0/bio/bedtools/slop"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(bedtools slop"
    " {extra}"
    " -i {snakemake.input[0]}"
    " -g {snakemake.params.genome}"
    " > {snakemake.output})"
    " {log}"
)

BENCHMARK

For benchmark, the following wrappers are available:

CHM-EVAL

Evaluate given VCF file with chm-eval (https://github.com/lh3/CHM-eval) for benchmarking variant calling.

Software dependencies
  • perl =5.26
Example

This wrapper can be used in the following way:

rule chm_eval:
    input:
        kit="resources/chm-eval-kit",
        vcf="{sample}.vcf"
    output:
        summary="chm-eval/{sample}.summary", # summary statistics
        bed="chm-eval/{sample}.err.bed.gz" # bed file with errors
    params:
        extra="",
        build="38"
    log:
        "logs/chm-eval/{sample}.log"
    wrapper:
        "0.65.0/bio/benchmark/chm-eval"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

kit = snakemake.input.kit
vcf = snakemake.input.vcf
build = snakemake.params.build
extra = snakemake.params.get("extra", "")

if not snakemake.output[0].endswith(".summary"):
    raise ValueError("Output file must end with .summary")
out = snakemake.output[0][:-8]

shell("({kit}/run-eval -g {build} -o {out} {extra} {vcf} | sh) {log}")
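
The wrapper above requires the first output to end in .summary and passes the path without that suffix to run-eval via -o. A minimal sketch of that prefix derivation, using a hypothetical helper name chm_eval_prefix and computing the suffix length instead of hard-coding 8:

```python
def chm_eval_prefix(summary_path):
    """Return the output prefix for run-eval, i.e. the path minus '.summary'."""
    suffix = ".summary"
    if not summary_path.endswith(suffix):
        raise ValueError("Output file must end with .summary")
    return summary_path[: -len(suffix)]


print(chm_eval_prefix("chm-eval/NA12878.summary"))  # chm-eval/NA12878
```

With the example rule, the bed output chm-eval/{sample}.err.bed.gz shares this prefix with the summary file.
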
CHM-EVAL-KIT

Download CHM-eval kit (https://github.com/lh3/CHM-eval) for benchmarking variant calling.

Software dependencies
  • curl
Example

This wrapper can be used in the following way:

rule chm_eval_kit:
    output:
        directory("resources/chm-eval-kit")
    params:
        # Tag and version must match, see https://github.com/lh3/CHM-eval/releases.
        tag="v0.5",
        version="20180222"
    log:
        "logs/chm-eval-kit.log"
    cache: True
    wrapper:
        "0.65.0/bio/benchmark/chm-eval-kit"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
url = (
    "https://github.com/lh3/CHM-eval/releases/"
    "download/{tag}/CHM-evalkit-{version}.tar"
).format(version=snakemake.params.version, tag=snakemake.params.tag)

os.makedirs(snakemake.output[0])
shell("(curl -L {url} | tar --strip-components 1 -C {snakemake.output[0]} -xf -) {log}")
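
As a quick check of how the tag and version parameters combine, the sketch below reproduces the URL formatting from the wrapper in a standalone function. The name chm_eval_kit_url is hypothetical; the URL template is the one used in the wrapper code above.

```python
def chm_eval_kit_url(tag, version):
    """Build the release tarball URL from a tag and a kit version."""
    return (
        "https://github.com/lh3/CHM-eval/releases/"
        "download/{tag}/CHM-evalkit-{version}.tar"
    ).format(tag=tag, version=version)


# Values taken from the example rule above.
print(chm_eval_kit_url("v0.5", "20180222"))
```

If tag and version do not belong to the same release, the resulting URL will not point at an existing file, which is why the example rule notes that they must match.
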
CHM-EVAL-SAMPLE

Download CHM-eval sample (https://github.com/lh3/CHM-eval) for benchmarking variant calling.

Software dependencies
  • samtools =1.10
  • curl
Example

This wrapper can be used in the following way:

rule chm_eval_sample:
    output:
        bam="resources/chm-eval-sample.bam",
        bai="resources/chm-eval-sample.bam.bai"
    params:
        # Optionally only grab the first 100 records.
        # This is for testing, remove next line to grab all records.
        first_n=100
    log:
        "logs/chm-eval-sample.log"
    wrapper:
        "0.65.0/bio/benchmark/chm-eval-sample"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

url = "ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR134/ERR1341796/CHM1_CHM13_2.bam"

pipefail = ""
fmt = "-b"
prefix = snakemake.params.get("first_n", "")
if prefix:
    prefix = "| head -n {} | samtools view -h -b".format(prefix)
    fmt = "-h"
    pipefail = "set +o pipefail"

    shell(
        """
        {pipefail}
        {{
            samtools view {fmt} {url} {prefix} > {snakemake.output.bam}
            samtools index {snakemake.output.bam}
        }} {log}
        """
    )
else:
    shell(
        """
        {{
            curl -L {url} > {snakemake.output.bam}
            samtools index {snakemake.output.bam}
        }} {log}
        """
    )

BISMARK

For bismark, the following wrappers are available:

BAM2NUC

Calculate mono- and di-nucleotide coverage of the reads and compare it with the average genomic sequence composition (see https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Nucleotide stats for genome is required for further stats for BAM file
rule bam2nuc_for_genome:
    input:
        genome_fa="indexes/{genome}/{genome}.fa.gz"
    output:
        "indexes/{genome}/genomic_nucleotide_frequencies.txt"
    log:
        "logs/indexes/{genome}/genomic_nucleotide_frequencies.txt.log"
    wrapper:
        "0.65.0/bio/bismark/bam2nuc"

# Nucleotide stats for BAM file
rule bam2nuc_for_bam:
    input:
        genome_fa="indexes/{genome}/{genome}.fa.gz",
        bam="bams/{sample}_{genome}.bam"
    output:
        report="bams/{sample}_{genome}.nucleotide_stats.txt"
    log:
        "logs/{sample}_{genome}.nucleotide_stats.txt.log"
    wrapper:
        "0.65.0/bio/bismark/bam2nuc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for bam2nuc tool that calculates mono- and di-nucleotide coverage of the reads and compares them with average genomic sequence
composition."""
# https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
cmdline_args = ["bam2nuc {extra}"]

genome_fa = snakemake.input.get("genome_fa", None)
if not genome_fa:
    raise ValueError("bismark/bam2nuc: Error 'genome_fa' input not specified.")
genome_folder = os.path.dirname(genome_fa)
cmdline_args.append("--genome_folder {genome_folder:q}")


bam = snakemake.input.get("bam", None)
if bam:
    cmdline_args.append("{bam}")
    bams = bam if isinstance(bam, list) else [bam]

    report = snakemake.output.get("report", None)
    if not report:
        raise ValueError("bismark/bam2nuc: Error 'report' output isn't specified.")

    reports = report if isinstance(report, list) else [report]
    if len(reports) != len(bams):
        raise ValueError(
            "bismark/bam2nuc: Error number of paths in output:report ({} files)"
            " should be same as in input:bam ({} files).".format(
                len(reports), len(bams)
            )
        )
    output_dir = os.path.dirname(reports[0])
    if any(output_dir != os.path.dirname(p) for p in reports):
        raise ValueError(
            "bismark/bam2nuc: Error all reports should be in same directory:"
            " {}".format(output_dir)
        )
    if output_dir:
        cmdline_args.append("--dir {output_dir:q}")
else:
    cmdline_args.append("--genomic_composition_only")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))


# Move outputs into proper position.
if bam:
    log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)

    expected_2_actual_paths = []
    for bam_path, report_path in zip(bams, reports):
        bam_name = os.path.basename(bam_path)
        bam_basename = os.path.splitext(bam_name)[0]
        expected_2_actual_paths.append(
            (
                report_path,
                os.path.join(
                    output_dir, "{}.nucleotide_stats.txt".format(bam_basename)
                ),
            )
        )

    for (exp_path, actual_path) in expected_2_actual_paths:
        if exp_path and (exp_path != actual_path):
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
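
The final step of the wrapper above renames the files written by bam2nuc to the paths requested in output:report. The sketch below isolates that mapping so it can be inspected without running the tool. The function names are hypothetical, and the naming pattern <bam basename>.nucleotide_stats.txt is the one assumed by the wrapper code.

```python
import os


def bam2nuc_actual_report(bam_path, output_dir):
    """Path where the wrapper expects bam2nuc to have written its report."""
    bam_basename = os.path.splitext(os.path.basename(bam_path))[0]
    return os.path.join(output_dir, "{}.nucleotide_stats.txt".format(bam_basename))


def planned_moves(bams, reports):
    """List (actual, expected) pairs that would need an `mv`."""
    output_dir = os.path.dirname(reports[0])
    moves = []
    for bam_path, report_path in zip(bams, reports):
        actual = bam2nuc_actual_report(bam_path, output_dir)
        if actual != report_path:
            moves.append((actual, report_path))
    return moves


# The report name matches the default pattern: nothing to move.
print(planned_moves(["bams/a_hg38.bam"], ["bams/a_hg38.nucleotide_stats.txt"]))
# A custom report name: one rename is planned.
print(planned_moves(["bams/a_hg38.bam"], ["qc/a.nuc.txt"]))
```
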
BISMARK

Align BS-Seq reads using Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Example: Paired-end reads
rule bismark_pe:
    input:
        fq_1="reads/{sample}.1.fastq",
        fq_2="reads/{sample}.2.fastq",
        genome="indexes/{genome}/{genome}.fa",
        bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
        genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
    output:
        bam="bams/{sample}_{genome}_pe.bam",
        report="bams/{sample}_{genome}_PE_report.txt",
        nucleotide_stats="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
        bam_unmapped_1="bams/{sample}_{genome}_unmapped_reads_1.fq.gz",
        bam_unmapped_2="bams/{sample}_{genome}_unmapped_reads_2.fq.gz",
        ambiguous_1="bams/{sample}_{genome}_ambiguous_reads_1.fq.gz",
        ambiguous_2="bams/{sample}_{genome}_ambiguous_reads_2.fq.gz"
    log:
        "logs/bams/{sample}_{genome}.log"
    params:
        # optional params string, e.g: -L32 -N0 -X400 --gzip
        # Useful options to tune:
        # (for bowtie2)
        # -N: The maximum number of mismatches permitted in the "seed", i.e. the first L base pairs
        # of the read (default: 1)
        # -L: The "seed length" (default: 28)
        # -I: The minimum insert size for valid paired-end alignments. ~ min fragment size filter (for
        # PE reads)
        # -X: The maximum insert size for valid paired-end alignments. ~ max fragment size filter (for
        # PE reads)
        # --gzip: Gzip intermediate fastq files
        # --ambiguous --unmapped
        # -p: bowtie2 parallel execution
        # --multicore: bismark parallel execution
        # --temp_dir: tmp dir for intermediate files instead of output directory
        extra=' --ambiguous --unmapped --nucleotide_coverage',
        basename='{sample}_{genome}'
    wrapper:
        "0.65.0/bio/bismark/bismark"

# Example: Single-end reads
rule bismark_se:
    input:
        fq="reads/{sample}.fq.gz",
        genome="indexes/{genome}/{genome}.fa",
        bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
        genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
    output:
        bam="bams/{sample}_{genome}.bam",
        report="bams/{sample}_{genome}_SE_report.txt",
        nucleotide_stats="bams/{sample}_{genome}.nucleotide_stats.txt",
        bam_unmapped="bams/{sample}_{genome}_unmapped_reads.fq.gz",
        ambiguous="bams/{sample}_{genome}_ambiguous_reads.fq.gz"
    log:
        "logs/bams/{sample}_{genome}.log",
    params:
        # optional params string
        extra=' --ambiguous --unmapped --nucleotide_coverage',
        basename='{sample}_{genome}'
    wrapper:
        "0.65.0/bio/bismark/bismark"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for aligning methylation BS-Seq data using Bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os

from snakemake.shell import shell
from tempfile import TemporaryDirectory


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = os.path.basename(file_path)

    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark {extra} --bowtie2"]

outdir = os.path.dirname(snakemake.output.bam)
if outdir:
    cmdline_args.append("--output_dir {outdir}")

genome_indexes_dir = os.path.dirname(snakemake.input.bismark_indexes_dir)
cmdline_args.append("{genome_indexes_dir}")

if not snakemake.output.get("bam", None):
    raise ValueError("bismark/bismark: Error 'bam' output file isn't specified.")
if not snakemake.output.get("report", None):
    raise ValueError("bismark/bismark: Error 'report' output file isn't specified.")

# basename
if snakemake.params.get("basename", None):
    cmdline_args.append("--basename {snakemake.params.basename:q}")
    basename = snakemake.params.basename
else:
    basename = None

# reads input
single_end_mode = snakemake.input.get("fq", None)
if single_end_mode:
    # Single-end mode: the reads file is passed to Bismark via --se.
    cmdline_args.append("--se {snakemake.input.fq:q}")
    mode_prefix = "se"
    if basename is None:
        basename = basename_without_ext(snakemake.input.fq)
else:
    # Paired-end mode: the mate files are passed to Bismark via -1 and -2.
    cmdline_args.append("-1 {snakemake.input.fq_1:q} -2 {snakemake.input.fq_2:q}")
    mode_prefix = "pe"

    if basename is None:
        # default basename
        basename = basename_without_ext(snakemake.input.fq_1) + "_bismark_bt2"

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))

# Move outputs into proper position.
expected_2_actual_paths = [
    (
        snakemake.output.bam,
        os.path.join(
            outdir, "{}{}.bam".format(basename, "" if single_end_mode else "_pe")
        ),
    ),
    (
        snakemake.output.report,
        os.path.join(
            outdir,
            "{}_{}_report.txt".format(basename, "SE" if single_end_mode else "PE"),
        ),
    ),
    (
        snakemake.output.get("nucleotide_stats", None),
        os.path.join(
            outdir,
            "{}{}.nucleotide_stats.txt".format(
                basename, "" if single_end_mode else "_pe"
            ),
        ),
    ),
]
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (exp_path, actual_path) in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")
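
To make the renaming step above easier to follow, this sketch computes the file names the wrapper expects Bismark to produce for a given basename and mode. basename_without_ext is copied from the wrapper; bismark_actual_outputs is a hypothetical helper that only mirrors the naming rules in the wrapper code and does not run Bismark.

```python
import os


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = os.path.basename(file_path)
    # Drop two components for "*.gz" files (e.g. ".fq.gz"), one otherwise.
    split_ind = 2 if base.endswith(".gz") else 1
    return ".".join(base.split(".")[:-split_ind])


def bismark_actual_outputs(outdir, basename, single_end):
    """Paths the wrapper looks for after the Bismark run."""
    suffix = "" if single_end else "_pe"
    mode = "SE" if single_end else "PE"
    return {
        "bam": os.path.join(outdir, "{}{}.bam".format(basename, suffix)),
        "report": os.path.join(outdir, "{}_{}_report.txt".format(basename, mode)),
        "nucleotide_stats": os.path.join(
            outdir, "{}{}.nucleotide_stats.txt".format(basename, suffix)
        ),
    }


print(basename_without_ext("reads/a.fq.gz"))    # a
print(basename_without_ext("reads/a.1.fastq"))  # a.1
for key, path in bismark_actual_outputs("bams", "a_hg38", single_end=False).items():
    print(key, path)
```

With basename='{sample}_{genome}' as in the example rules, these names coincide with the declared outputs, so no files need to be moved.
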
BISMARK2BEDGRAPH

Generate bedGraph and coverage files from positional methylation files created by bismark_methylation_extractor (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Example for CHG+CHH summary coverage:
rule bismark2bedGraph_noncpg:
    input:
        "meth/CHG_context_{sample}.txt.gz",
        "meth/CHH_context_{sample}.txt.gz"
    output:
        bedGraph="meth_non_cpg/{sample}_non_cpg.bedGraph.gz",
        cov="meth_non_cpg/{sample}_non_cpg.bismark.cov.gz"
    log:
        "logs/meth_non_cpg/{sample}_non_cpg.log"
    params:
        extra="--CX"
    wrapper:
        "0.65.0/bio/bismark/bismark2bedGraph"

# Example for CpG only coverage
rule bismark2bedGraph_cpg:
    input:
        "meth/CpG_context_{sample}.txt.gz"
    output:
        bedGraph="meth_cpg/{sample}_CpG.bedGraph.gz",
        cov="meth_cpg/{sample}_CpG.bismark.cov.gz"
    log:
        "logs/meth_cpg/{sample}_CpG.log"
    wrapper:
        "0.65.0/bio/bismark/bismark2bedGraph"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark bismark2bedGraph tool."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

bedGraph = snakemake.output.get("bedGraph", "")
if not bedGraph:
    raise ValueError("bismark/bismark2bedGraph: Please specify bedGraph output path")

params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark2bedGraph {params_extra}"]

dir_name = os.path.dirname(bedGraph)
if dir_name:
    cmdline_args.append("--dir {dir_name}")

fname = os.path.basename(bedGraph)
cmdline_args.append("--output {fname}")

cmdline_args.append("{snakemake.input}")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))
BISMARK2REPORT

Generate graphical HTML report from Bismark reports (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2report).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Example: Paired-end reads
rule bismark2report_pe:
    input:
        alignment_report="bams/{sample}_{genome}_PE_report.txt",
        nucleotide_report="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}_pe.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}_pe.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}_pe.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "0.65.0/bio/bismark/bismark2report"

# Example: Single-end reads
rule bismark2report_se:
    input:
        alignment_report="bams/{sample}_{genome}_SE_report.txt",
        nucleotide_report="bams/{sample}_{genome}.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "0.65.0/bio/bismark/bismark2report"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper to generate graphical HTML report from Bismark reports."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2report

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell


def answer2bool(v):
    return str(v).lower() in ("yes", "true", "t", "1")


extra = snakemake.params.get("extra", "")
cmds = ["bismark2report {extra}"]

# output
html_file = snakemake.output.get("html", "")
output_dir = snakemake.output.get("html_dir", None)
if output_dir is None:
    if html_file:
        output_dir = os.path.dirname(html_file)
else:
    if html_file:
        raise ValueError(
            "bismark/bismark2report: Choose one: 'html=...' for a single dir or 'html_dir=...' for batch processing."
        )

if output_dir is None:
    raise ValueError(
        "bismark/bismark2report: Output file or directory not specified. "
        "Use 'html=...' for a single dir or 'html_dir=...' for batch "
        "processing."
    )

if output_dir:
    cmds.append("--dir {output_dir:q}")

if html_file:
    html_file_name = os.path.basename(html_file)
    cmds.append("--output {html_file_name:q}")

# reports
reports = [
    "alignment_report",
    "dedup_report",
    "splitting_report",
    "mbias_report",
    "nucleotide_report",
]
skip_optional_reports = answer2bool(
    snakemake.params.get("skip_optional_reports", False)
)
for report_name in reports:
    path = snakemake.input.get(report_name, "")
    if path:
        locals()[report_name] = path
        cmds.append("--{0} {{{1}:q}}".format(report_name, report_name))
    elif skip_optional_reports:
        cmds.append("--{0} 'none'".format(report_name))

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")

# run shell command:
shell(" ".join(cmds))
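
The effect of skip_optional_reports is easiest to see on the generated arguments. The sketch below rebuilds only the report-related part of the command line as plain strings. It is a simplified stand-in for the wrapper logic: report_args is a hypothetical name, and shlex.quote is used here in place of the :q format specifier that the wrapper relies on.

```python
import shlex

REPORT_KEYS = [
    "alignment_report",
    "dedup_report",
    "splitting_report",
    "mbias_report",
    "nucleotide_report",
]


def answer2bool(v):
    return str(v).lower() in ("yes", "true", "t", "1")


def report_args(inputs, skip_optional_reports=False):
    """Return one command-line fragment per report that is set or skipped."""
    skip = answer2bool(skip_optional_reports)
    args = []
    for key in REPORT_KEYS:
        path = inputs.get(key, "")
        if path:
            args.append("--{} {}".format(key, shlex.quote(path)))
        elif skip:
            # Explicitly disable reports that were not provided.
            args.append("--{} 'none'".format(key))
    return args


inputs = {"alignment_report": "bams/a_SE_report.txt"}
print(report_args(inputs, skip_optional_reports=True))
print(report_args(inputs, skip_optional_reports=False))
```

When skip_optional_reports is false, reports that are not listed in the input are simply left out of the command line.
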
BISMARK2SUMMARY

Generate a summary graphical HTML report from several Bismark text report files (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

import os

rule bismark2summary:
    input:
        bam=["bams/a_genome_pe.bam", "bams/b_genome.bam"],

        # Bismark `bismark2summary` discovers reports automatically based
        # on the files available in the folder containing the BAM file.
        #
        # If your per-BAM reports aren't in the same folder,
        # you will need an additional rule which symlinks all reports
        # (e.g. your splitting report generated by the `bismark_methylation_extractor`
        # tool is in the `meth` folder, and alignment-related reports are in the `bams` folder).

        # These dependencies are here just to ensure that the corresponding rules
        # have already finished at rule execution time; otherwise some reports
        # will be missing.
        dependencies=[
            "bams/a_genome_PE_report.txt",
            "bams/a_genome_pe.deduplication_report.txt",
            # for example splitting report is missing for 'a' sample

            "bams/b_genome_SE_report.txt",
            "bams/b_genome.deduplication_report.txt",
            "bams/b_genome.deduplicated_splitting_report.txt"
        ]
    output:
        html="qc/{experiment}.bismark2summary.html",
        txt="qc/{experiment}.bismark2summary.txt"
    log:
        "logs/qc/{experiment}.bismark2summary.log"
    wrapper:
        "0.65.0/bio/bismark/bismark2summary"

rule bismark2summary_prepare_symlinks:
    input:
        "meth/b_genome.deduplicated_splitting_report.txt",
    output:
        temp("bams/b_genome.deduplicated_splitting_report.txt"),
    log:
        "qc/bismark2summary_prepare_symlinks.symlinks.log"
    run:
        wd = os.getcwd()
        shell("echo 'Making symlinks' > {log}")
        for source, target in zip(input, output):
            target_dir = os.path.dirname(target)
            target_name = os.path.basename(target)
            log_path = os.path.join(wd, log[0])
            abs_src_path = os.path.abspath(source)
            shell("cd {target_dir} && ln -f -s {abs_src_path} {target_name} >> {log_path} 2>&1")

        shell("echo 'Done' >> {log}")

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper to generate summary graphical HTML report from several Bismark text report files."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
cmds = ["bismark2summary {extra}"]

# basename
bam = snakemake.input.get("bam", None)
if not bam:
    raise ValueError(
        "bismark/bismark2summary: Please specify aligned BAM file path"
        " (one or several) using 'bam=..'"
    )

html = snakemake.output.get("html", None)
txt = snakemake.output.get("txt", None)
if not html or not txt:
    raise ValueError(
        "bismark/bismark2summary: Please specify both 'html=..' and"
        " 'txt=..' paths in output section"
    )

basename, ext = os.path.splitext(html)
if ext.lower() != ".html":
    raise ValueError(
        "bismark/bismark2summary: HTML report file should end"
        " with suffix '.html' but was {} ({})".format(ext, html)
    )

suggested_txt = basename + ".txt"
if suggested_txt != txt:
    raise ValueError(
        "bismark/bismark2summary: Expected '{}' TXT report, "
        "but was: '{}'".format(suggested_txt, txt)
    )

cmds.append("--basename {basename:q}")

# title
title = snakemake.params.get("title", None)
if title:
    cmds.append("--title {title:q}")

cmds.append("{bam}")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")

# run shell command:
shell(" ".join(cmds))
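
The wrapper above accepts the html and txt outputs only if they share the same path up to the extension, because a single --basename is passed to bismark2summary. A minimal sketch of that check, with a hypothetical function name and shortened error messages:

```python
import os


def summary_basename(html, txt):
    """Validate the html/txt output pair and return the shared basename."""
    basename, ext = os.path.splitext(html)
    if ext.lower() != ".html":
        raise ValueError("HTML report file should end with '.html': {}".format(html))
    suggested_txt = basename + ".txt"
    if suggested_txt != txt:
        raise ValueError(
            "Expected '{}' TXT report, but was: '{}'".format(suggested_txt, txt)
        )
    return basename


print(summary_basename("qc/exp1.bismark2summary.html", "qc/exp1.bismark2summary.txt"))
# qc/exp1.bismark2summary
```
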
BISMARK_GENOME_PREPARATION

Generate indexes for Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# For *.fa file
rule bismark_genome_preparation_fa:
    input:
        "indexes/{genome}/{genome}.fa"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        ""  # optional params string
    wrapper:
        "0.65.0/bio/bismark/bismark_genome_preparation"

# For *.fa.gz file:
rule bismark_genome_preparation_fa_gz:
    input:
        "indexes/{genome}/{genome}.fa.gz"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        ""  # optional params string
    wrapper:
        "0.65.0/bio/bismark/bismark_genome_preparation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark indexes preparing using bismark_genome_preparation."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


from os import path
from snakemake.shell import shell

input_dir = path.dirname(snakemake.input[0])

params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("bismark_genome_preparation --verbose --bowtie2 {params_extra} {input_dir} {log}")
BISMARK_METHYLATION_EXTRACTOR

Call methylation counts from Bismark alignment results (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
  • perl-gdgraph == 1.54
Example

This wrapper can be used in the following way:

rule bismark_methylation_extractor:
    input: "bams/{sample}.bam"
    output:
        mbias_r1="qc/meth/{sample}.M-bias_R1.png",
        # Only for PE BAMS:
        # mbias_r2="qc/meth/{sample}.M-bias_R2.png",

        mbias_report="meth/{sample}.M-bias.txt",
        splitting_report="meth/{sample}_splitting_report.txt",

        # 1-based start, 1-based end ('inclusive') methylation info: % and counts
        methylome_CpG_cov="meth_cpg/{sample}.bismark.cov.gz",
        # BedGraph with methylation percentage: 0-based start, end exclusive
        methylome_CpG_mlevel_bedGraph="meth_cpg/{sample}.bedGraph.gz",

        # Primary output files: methylation status at each read cytosine position: (extremely large)
        read_base_meth_state_cpg="meth/CpG_context_{sample}.txt.gz",
        # * You could merge CHG, CHH using: --merge_non_CpG
        read_base_meth_state_chg="meth/CHG_context_{sample}.txt.gz",
        read_base_meth_state_chh="meth/CHH_context_{sample}.txt.gz"
    log:
        "logs/meth/{sample}.log"
    params:
        output_dir="meth",  # optional output dir
        extra="--gzip --comprehensive --bedGraph"  # optional params string
    wrapper:
        "0.65.0/bio/bismark/bismark_methylation_extractor"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark methylation extractor tool: bismark_methylation_extractor."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark_methylation_extractor {params_extra}"]

# output dir
output_dir = snakemake.params.get("output_dir", "")
if output_dir:
    cmdline_args.append("-o {output_dir:q}")

# trimming options
trimming_options = [
    "ignore",  # meth_bias_r1_5end
    "ignore_3prime",  # meth_bias_r1_3end
    "ignore_r2",  # meth_bias_r2_5end
    "ignore_3prime_r2",  # meth_bias_r2_3end
]
for key in trimming_options:
    value = snakemake.params.get(key, None)
    if value:
        cmdline_args.append("--{} {}".format(key, value))

# Input
cmdline_args.append("{snakemake.input}")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))

key2prefix_suffix = [
    ("mbias_report", ("", ".M-bias.txt")),
    ("mbias_r1", ("", ".M-bias_R1.png")),
    ("mbias_r2", ("", ".M-bias_R2.png")),
    ("splitting_report", ("", "_splitting_report.txt")),
    ("methylome_CpG_cov", ("", ".bismark.cov.gz")),
    ("methylome_CpG_mlevel_bedGraph", ("", ".bedGraph.gz")),
    ("read_base_meth_state_cpg", ("CpG_context_", ".txt.gz")),
    ("read_base_meth_state_chg", ("CHG_context_", ".txt.gz")),
    ("read_base_meth_state_chh", ("CHH_context_", ".txt.gz")),
]

log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (key, (prefix, suffix)) in key2prefix_suffix:
    exp_path = snakemake.output.get(key, None)
    if exp_path:
        if len(snakemake.input) != 1:
            raise ValueError(
                "bismark/bismark_methylation_extractor: Error: only one BAM file is"
                " expected in input, but was <{}>".format(snakemake.input)
            )
        bam_file = snakemake.input[0]
        bam_name = os.path.basename(bam_file)
        bam_wo_ext = os.path.splitext(bam_name)[0]

        actual_path = os.path.join(output_dir, prefix + bam_wo_ext + suffix)
        if exp_path != actual_path:
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
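
The table key2prefix_suffix above determines where the wrapper looks for each file written by bismark_methylation_extractor before moving it to the declared output path. The sketch below evaluates that lookup for a few keys. The helper name extractor_actual_path is hypothetical, and only a subset of the table is repeated here.

```python
import os

# Subset of the wrapper's key2prefix_suffix table, as a dictionary.
KEY2PREFIX_SUFFIX = {
    "mbias_report": ("", ".M-bias.txt"),
    "mbias_r1": ("", ".M-bias_R1.png"),
    "splitting_report": ("", "_splitting_report.txt"),
    "read_base_meth_state_cpg": ("CpG_context_", ".txt.gz"),
}


def extractor_actual_path(key, bam_file, output_dir=""):
    """Path where the wrapper expects the extractor output for `key`."""
    prefix, suffix = KEY2PREFIX_SUFFIX[key]
    bam_wo_ext = os.path.splitext(os.path.basename(bam_file))[0]
    return os.path.join(output_dir, prefix + bam_wo_ext + suffix)


for key in KEY2PREFIX_SUFFIX:
    print(key, extractor_actual_path(key, "bams/a.bam", output_dir="meth"))
```

In the example rule, mbias_r1 is declared under qc/meth while the extractor writes into meth, so that file is one of those the wrapper moves after the run.
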
DEDUPLICATE_BISMARK

Deduplicate Bismark BAM files and save the result as a *.bam file (see https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

rule deduplicate_bismark:
    input: "bams/a_genome_pe.bam"
    output:
        bam="bams/{sample}.deduplicated.bam",
        report="bams/{sample}.deduplication_report.txt",
    log:
        "logs/bams/{sample}.deduplicated.log",
    params:
        extra=""  # optional params string
    wrapper:
        "0.65.0/bio/bismark/deduplicate_bismark"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark aligned reads deduplication using deduplicate_bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

bam_path = snakemake.output.get("bam", None)
report_path = snakemake.output.get("report", None)
if not bam_path or not report_path:
    raise ValueError(
        "bismark/deduplicate_bismark: Please specify both 'bam=..' and 'report=..' paths in output section"
    )

output_dir = os.path.dirname(bam_path)
if output_dir != os.path.dirname(report_path):
    raise ValueError(
        "bismark/deduplicate_bismark: BAM and Report files expected to have the same parent directory"
        " but was {} and {}".format(bam_path, report_path)
    )

arg_output_dir = "--output_dir '{}'".format(output_dir) if output_dir else ""
arg_multiple = "--multiple" if len(snakemake.input) > 1 else ""

params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
    "deduplicate_bismark {params_extra} --bam {arg_multiple}"
    " {arg_output_dir} {snakemake.input} {log}"
)

# Move outputs into proper position.
fst_input_filename = os.path.basename(snakemake.input[0])
fst_input_basename = os.path.splitext(fst_input_filename)[0]
prefix = os.path.join(output_dir, fst_input_basename)

deduplicated_bam_actual_name = prefix + ".deduplicated.bam"
if arg_multiple:
    # bismark does it exactly like this:
    deduplicated_bam_actual_name = deduplicated_bam_actual_name.replace(
        "deduplicated", "multiple.deduplicated", 1
    )

expected_2_actual_paths = [
    (bam_path, deduplicated_bam_actual_name),
    (
        report_path,
        prefix + (".multiple" if arg_multiple else "") + ".deduplication_report.txt",
    ),
]
for (exp_path, actual_path) in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")
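
The renaming step at the end of this wrapper can be illustrated in isolation. The following is a minimal sketch, not part of the wrapper: the helper name expected_dedup_outputs is hypothetical, and the file names simply mirror the wrapper code above rather than being checked against Bismark itself.

```python
import os


def expected_dedup_outputs(inputs, output_dir):
    """Return the (bam, report) paths the wrapper expects the tool to have written.

    Hypothetical helper: it repeats the naming logic of the wrapper code above.
    """
    # Names are derived from the first input file only, without its extension.
    first = os.path.splitext(os.path.basename(inputs[0]))[0]
    prefix = os.path.join(output_dir, first)
    # The wrapper passes --multiple when more than one input is given,
    # and then expects a ".multiple" infix in both file names.
    infix = ".multiple" if len(inputs) > 1 else ""
    return (
        prefix + infix + ".deduplicated.bam",
        prefix + infix + ".deduplication_report.txt",
    )


if __name__ == "__main__":
    print(expected_dedup_outputs(["bams/a_genome_pe.bam"], "bams"))
```

If the paths returned here differ from the paths declared in the rule's output section, the wrapper moves the files with mv.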

BOWTIE2

For bowtie2, the following wrappers are available:

BOWTIE2

Map reads with bowtie2.

Software dependencies
  • bowtie2 ==2.4.1
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule bowtie2:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bowtie2/{sample}.log"
    params:
        index="index/genome",  # prefix of reference genome index (built with bowtie2-build)
        extra=""  # optional parameters
    threads: 8  # Use at least two threads
    wrapper:
        "0.65.0/bio/bowtie2/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "-U {}".format(*snakemake.input.sample)
else:
    reads = "-1 {} -2 {}".format(*snakemake.input.sample)

shell(
    "(bowtie2 --threads {snakemake.threads} {extra} "
    "-x {snakemake.params.index} {reads} "
    "| samtools view -Sbh -o {snakemake.output[0]} -) {log}"
)
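
The choice between single-end and paired-end read arguments can be tried outside of Snakemake. This is a minimal sketch under the assumption that the sample list behaves like a plain Python list; the function name bowtie2_read_args is hypothetical and does not exist in the wrapper.

```python
def bowtie2_read_args(samples):
    """Build the read arguments the way the wrapper code above does."""
    n = len(samples)
    if n == 1:
        # Single-end: one unpaired FASTQ file.
        return "-U {}".format(*samples)
    if n == 2:
        # Paired-end: first and second mate files.
        return "-1 {} -2 {}".format(*samples)
    raise ValueError(
        "input->sample must have 1 (single-end) or 2 (paired-end) elements."
    )


if __name__ == "__main__":
    print(bowtie2_read_args(["reads/a.1.fastq", "reads/a.2.fastq"]))
```

The sketch raises a ValueError where the wrapper uses assert, so the check would also hold when Python runs with assertions disabled.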

BUSCO

Assess assembly and annotation completeness with BUSCO.

Software dependencies
  • python ==3.6
  • busco
Example

This wrapper can be used in the following way:

rule run_busco:
    input:
        "sample_data/target.fa"
    output:
        "txome_busco/full_table_txome_busco.tsv",
    log:
        "logs/quality/transcriptome_busco.log"
    threads: 8
    params:
        mode="transcriptome",
        lineage_path="sample_data/example",
        # optional parameters
        extra=""
    wrapper:
        "0.65.0/bio/busco"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for BUSCO assessment"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode is not None, "please input a run mode: genome, transcriptome or proteins"
lineage = snakemake.params.get("lineage_path")
assert lineage is not None, "please input the path to a lineage for busco assessment"

# busco does not allow you to direct output location: handle this by moving output
outdir = path.dirname(snakemake.output[0])
if "/" in outdir:
    out_name = path.basename(outdir)
else:
    out_name = outdir

# note: --force allows snakemake to handle rewriting files as necessary
# without needing to specify *all* busco outputs as snakemake outputs
shell(
    "run_busco --in {snakemake.input} --out {out_name} --force "
    " --cpu {snakemake.threads} --mode {mode} --lineage {lineage} "
    " {extra} {log}"
)

busco_outname = "run_" + out_name

# move to intended location
shell("cp -r {busco_outname}/* {outdir}")
shell("rm -rf {busco_outname}")
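
How the wrapper derives the directory names can be checked with a small stand-alone sketch. The function name busco_dirs is hypothetical; the logic repeats the wrapper code above, which assumes that the tool writes its results to a directory named "run_" plus the output name in the current working directory.

```python
from os import path


def busco_dirs(first_output):
    """Return (outdir, out_name, busco_outname) as computed by the wrapper."""
    outdir = path.dirname(first_output)
    # Only the last path component is passed to --out.
    out_name = path.basename(outdir) if "/" in outdir else outdir
    # Directory that the wrapper copies results from and then removes.
    busco_outname = "run_" + out_name
    return outdir, out_name, busco_outname


if __name__ == "__main__":
    print(busco_dirs("txome_busco/full_table_txome_busco.tsv"))
```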

BWA

For bwa, the following wrappers are available:

BWA ALN

Map reads with bwa aln.

Software dependencies
  • bwa ==0.7.17
Example

This wrapper can be used in the following way:

rule bwa_aln:
    input:
        "reads/{sample}.{pair}.fastq"
    output:
        "sai/{sample}.{pair}.sai"
    params:
        index="genome",
        extra=""
    log:
        "logs/bwa_aln/{sample}.{pair}.log"
    threads: 8
    wrapper:
        "0.65.0/bio/bwa/aln"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa aln."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "bwa aln"
    " {extra}"
    " -t {snakemake.threads}"
    " {snakemake.params.index}"
    " {snakemake.input[0]}"
    " > {snakemake.output[0]} {log}"
)
BWA INDEX

Creates a BWA index.

Software dependencies
  • bwa ==0.7.17
Example

This wrapper can be used in the following way:

rule bwa_index:
    input:
        "{genome}.fasta"
    output:
        "{genome}.amb",
        "{genome}.ann",
        "{genome}.bwt",
        "{genome}.pac",
        "{genome}.sa"
    log:
        "logs/bwa_index/{genome}.log"
    params:
        prefix="{genome}",
        algorithm="bwtsw"
    wrapper:
        "0.65.0/bio/bwa/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
    raise ValueError("Only one reference genome can be provided!")

# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")

if len(prefix) > 0:
    prefix = "-p " + prefix

# Construction algorithm that will be used to build the database, default is bwtsw
construction_algorithm = snakemake.params.get("algorithm", "")

if len(construction_algorithm) != 0:
    construction_algorithm = "-a " + construction_algorithm

shell(
    "bwa index"
    " {prefix}"
    " {construction_algorithm}"
    " {snakemake.input[0]}"
    " {log}"
)
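
The way optional parameters turn into command line flags can be sketched without Snakemake. The function name bwa_index_command is hypothetical; only the -p and -a flags that appear in the wrapper code above are used.

```python
def bwa_index_command(reference, prefix="", algorithm=""):
    """Assemble the bwa index command line from optional settings."""
    parts = ["bwa", "index"]
    if prefix:
        # Prefix of the index files to be written.
        parts += ["-p", prefix]
    if algorithm:
        # Construction algorithm, e.g. "bwtsw".
        parts += ["-a", algorithm]
    parts.append(reference)
    return " ".join(parts)


if __name__ == "__main__":
    print(bwa_index_command("genome.fasta", prefix="genome", algorithm="bwtsw"))
```

Empty settings are skipped entirely, so the command falls back to the tool's own defaults.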
BWA MEM

Map reads using bwa mem, with optional sorting using samtools or picard.

Software dependencies
  • bwa ==0.7.17
  • samtools ==1.9
  • picard ==2.20.1
Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="genome",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="none",             # Can be 'none', 'samtools' or 'picard'.
        sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
        sort_extra=""            # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "0.65.0/bio/bwa/mem"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Julian de Ruiter
Code
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}"
)
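
The selection of the downstream command depends on three parameters (sort, sort_order and sort_extra). The following stand-alone sketch reproduces that decision; the function name build_pipe_cmd is hypothetical, and unlike the wrapper it substitutes the values immediately instead of leaving placeholders for shell() to fill in.

```python
from os import path


def build_pipe_cmd(output, sort="none", sort_order="coordinate", sort_extra=""):
    """Return the command that receives the SAM stream from the aligner."""
    if sort_order not in {"coordinate", "queryname"}:
        raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

    if sort == "none":
        # Only convert SAM to BAM.
        parts = ["samtools view -Sbh -o", output, "-"]
    elif sort == "samtools":
        # -n sorts by read name; -T sets the prefix for temporary files.
        name_flag = "-n" if sort_order == "queryname" else ""
        tmp_prefix = path.splitext(output)[0] + ".tmp"
        parts = ["samtools sort", sort_extra, name_flag, "-T", tmp_prefix,
                 "-o", output, "-"]
    elif sort == "picard":
        parts = ["picard SortSam", sort_extra, "INPUT=/dev/stdin",
                 "OUTPUT=" + output, "SORT_ORDER=" + sort_order]
    else:
        raise ValueError("Unexpected value for params.sort ({})".format(sort))

    # Drop empty pieces so the command contains no double spaces.
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(build_pipe_cmd("mapped/a.bam", sort="samtools", sort_order="queryname"))
```

Note that sort_order only has an effect when sort is 'samtools' or 'picard'; with sort="none" the alignments keep the order in which the aligner emits them.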
BWA MEM SAMBLASTER

Map reads using bwa mem, mark duplicates by samblaster and sort and index by sambamba.

Software dependencies
  • bwa ==0.7.17
  • sambamba ==0.7.1
  • samblaster ==0.1.24
Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        bam="mapped/{sample}.bam",
        index="mapped/{sample}.bam.bai"
    log:
        "logs/bwa_mem_sambamba/{sample}.log"
    params:
        index="genome",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort_extra="" # Extra args for sambamba.
    threads: 8
    wrapper:
        "0.65.0/bio/bwa/mem-samblaster"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | samblaster"
    " {samblaster_extra}"
    " | sambamba view -S -f bam /dev/stdin"
    " -t {snakemake.threads}"
    " | sambamba sort /dev/stdin"
    " -t {snakemake.threads}"
    " -o {snakemake.output.bam}"
    " {sort_extra}"
    ") {log}"
)
BWA SAMPE

Map paired-end reads with bwa sampe.

Software dependencies
  • bwa ==0.7.17
  • samtools ==1.9
  • picard ==2.20.1
Example

This wrapper can be used in the following way:

rule bwa_sampe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"]
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
        sort="none",                                 # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",                      # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""                                # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_sampe/{sample}.log"
    wrapper:
        "0.65.0/bio/bwa/sampe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa sampe."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Check inputs.
if not len(snakemake.input.sai) == 2:
    raise ValueError("input.sai must have 2 elements")

if not len(snakemake.input.fastq) == 2:
    raise ValueError("input.fastq must have 2 elements")

# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa sampe"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}"
)
BWA SAMSE

Map single-end reads with bwa samse.

Software dependencies
  • bwa ==0.7.17
  • samtools ==1.9
  • picard ==2.20.1
Example

This wrapper can be used in the following way:

rule bwa_samse:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai"
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
        sort="none",                                 # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",                      # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""                                # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_samse/{sample}.log"
    wrapper:
        "0.65.0/bio/bwa/samse"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa samse."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa samse"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}"
)

BWA-MEM2

For bwa-mem2, the following wrappers are available:

BWA-MEM2 INDEX

Creates a bwa-mem2 index.

Software dependencies
  • bwa-mem2 ==2.0
Example

This wrapper can be used in the following way:

rule bwa_mem2_index:
    input:
        "{genome}"
    output:
        "{genome}.0123",
        "{genome}.amb",
        "{genome}.ann",
        "{genome}.bwt.2bit.64",
        "{genome}.bwt.8bit.32",
        "{genome}.pac",
    log:
        "logs/bwa-mem2_index/{genome}.log"
    params:
        prefix="{genome}"
    wrapper:
        "0.65.0/bio/bwa-mem2/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
  • Patrik Smeds
Code
__author__ = "Christopher Schröder, Patrik Smeds"
__copyright__ = "Copyright 2020, Christopher Schröder, Patrik Smeds"
__email__ = "christopher.schroeder@tu-dortmund.de, patrik.smeds@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided.")
elif len(snakemake.input) > 1:
    raise ValueError("Please provide exactly one reference genome as input.")

# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")

if len(prefix) > 0:
    prefix = "-p " + prefix

shell("bwa-mem2 index" " {prefix}" " {snakemake.input[0]}" " {log}")
BWA-MEM2

Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to bwa and is ~1.3-3.1x faster, depending on the use case, dataset and machine. Sorting with samtools or picard is optional.

Software dependencies
  • bwa-mem2 ==2.0
  • samtools ==1.10
  • picard ==2.23
Example

This wrapper can be used in the following way:

rule bwa_mem2_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bwa_mem2/{sample}.log"
    params:
        index="genome.fasta",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="none",             # Can be 'none', 'samtools' or 'picard'.
        sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
        sort_extra=""            # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "0.65.0/bio/bwa-mem2/mem"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
  • Johannes Köster
  • Julian de Ruiter
Code
__author__ = "Christopher Schröder, Johannes Köster, Julian de Ruiter"
__copyright__ = (
    "Copyright 2020, Christopher Schröder, Johannes Köster and Julian de Ruiter"
)
__email__ = "christopher.schroeder@tu-dortmund.de, koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

shell(
    "(bwa-mem2 mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}"
)
BWA MEM SAMBLASTER

Map reads using bwa-mem2, mark duplicates by samblaster and sort and index by sambamba.

Software dependencies
  • bwa-mem2 ==2.0
  • sambamba ==0.7.1
  • samblaster ==0.1.24
Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        bam="mapped/{sample}.bam",
        index="mapped/{sample}.bam.bai"
    log:
        "logs/bwa_mem2_sambamba/{sample}.log"
    params:
        index="genome.fasta",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort_extra="-q" # Extra args for sambamba.
    threads: 8
    wrapper:
        "0.65.0/bio/bwa-mem2/mem-samblaster"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")

shell(
    "(bwa-mem2 mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | samblaster"
    " {samblaster_extra}"
    " | sambamba view -S -f bam /dev/stdin"
    " -t {snakemake.threads}"
    " | sambamba sort /dev/stdin"
    " -t {snakemake.threads}"
    " -o {snakemake.output.bam}"
    " {sort_extra}"
    ") {log}"
)

CAIROSVG

Convert SVG files with cairosvg.

Software dependencies
  • cairosvg =2.4.2
Example

This wrapper can be used in the following way:

rule:
    input:
        "{prefix}.svg"
    output:
        "{prefix}.{fmt,(pdf|png)}"
    wrapper:
        "0.65.0/utils/cairosvg"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

_, ext = os.path.splitext(snakemake.output[0])

if ext not in (".png", ".pdf", ".ps", ".svg"):
    raise ValueError("invalid file extension: '{}'".format(ext))
fmt = ext[1:]

shell("cairosvg -f {fmt} {extra} {snakemake.input[0]} -o {snakemake.output[0]} {log}")
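
The output format is inferred from the file extension of the first output file. A minimal sketch of that check, with a hypothetical function name output_format and the same list of extensions as the wrapper code above:

```python
import os

# Extensions accepted by the wrapper code above.
ALLOWED_EXTENSIONS = (".png", ".pdf", ".ps", ".svg")


def output_format(output_path):
    """Return the format name passed to -f, derived from the extension."""
    _, ext = os.path.splitext(output_path)
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("invalid file extension: '{}'".format(ext))
    # Strip the leading dot: ".pdf" becomes "pdf".
    return ext[1:]


if __name__ == "__main__":
    print(output_format("figures/plot.pdf"))
```

The comparison is case-sensitive, so an output named plot.PDF would be rejected by this check.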

CLUSTALO

Multiple alignment of nucleic acid and protein sequences.

Software dependencies
  • clustalo ==1.2.4
Example

This wrapper can be used in the following way:

rule clustalo:
    input:
        "{sample}.fa"
    output:
        "{sample}.msa.fa"
    params:
        extra=""
    log:
        "logs/clustalo/test/{sample}.log"
    threads: 8
    wrapper:
        "0.65.0/bio/clustalo"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
"""Snakemake wrapper for clustal omega."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "clustalo {extra}"
    " --threads={snakemake.threads}"
    " --in {snakemake.input[0]}"
    " --out {snakemake.output[0]} "
    " {log}"
)

CUTADAPT

For cutadapt, the following wrappers are available:

CUTADAPT-PE

Trim paired-end reads using cutadapt.

Software dependencies
  • cutadapt ==2.10
Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        fastq1="trimmed/{sample}.1.fastq",
        fastq2="trimmed/{sample}.2.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        # https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
        adapters = "-a AGAGCACACGTCTGAACTCCAGTCAC -g AGATCGGAAGAGCACACGT -A AGAGCACACGTCTGAACTCCAGTCAC -G AGATCGGAAGAGCACACGT",
        # https://cutadapt.readthedocs.io/en/stable/guide.html#
        others = "--minimum-length 1 -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    threads: 4 # set desired number of threads here
    wrapper:
        "0.65.0/bio/cutadapt/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
  • David Laehnemann
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cutadapt"
    " {snakemake.params.adapters}"
    " {snakemake.params.others}"
    " -o {snakemake.output.fastq1}"
    " -p {snakemake.output.fastq2}"
    " -j {snakemake.threads}"
    " {snakemake.input}"
    " > {snakemake.output.qc} {log}"
)
CUTADAPT-SE

Trim single-end reads using cutadapt.

Software dependencies
  • cutadapt ==2.10
Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        "reads/{sample}.fastq"
    output:
        fastq="trimmed/{sample}.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    threads: 4 # set desired number of threads here
    wrapper:
        "0.65.0/bio/cutadapt/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming single-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cutadapt"
    " {snakemake.params}"
    " -j {snakemake.threads}"
    " -o {snakemake.output.fastq}"
    " {snakemake.input[0]}"
    " > {snakemake.output.qc} {log}"
)

DEEPTOOLS

For deeptools, the following wrappers are available:

DEEPTOOLS COMPUTEMATRIX

deepTools computeMatrix calculates scores per genomic region. The resulting matrix file can be used as input for other tools or to generate a deepTools plotHeatmap or plotProfile plot. For usage information about deepTools computeMatrix, please see the documentation. For more information about deepTools, also see the source code.

computeMatrix option | Output format | Name of output variable to be used | Recommended extension
--outFileName, -out, -o | gzipped matrix file | matrix_gz (required) | ".gz"
--outFileNameMatrix | tab-separated table of matrix file | matrix_tab | ".tab"
--outFileSortedRegions | BED matrix file with sorted regions after skipping zeros or min/max threshold values | matrix_bed | ".bed"

Software dependencies
  • deeptools ==3.4.3
Example

This wrapper can be used in the following way:

rule compute_matrix:
    input:
        # Please note that the -R and -S options are defined via input files
        bed=expand("{sample}.bed", sample=["a", "b"]),
        bigwig=expand("{sample}.bw", sample=["a", "b"])
    output:
        # Please note that --outFileName, --outFileNameMatrix and --outFileSortedRegions are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/computematrix.html.
        matrix_gz="matrix_files/matrix.gz",   # required
        # optional output files
        matrix_tab="matrix_files/matrix.tab",
        matrix_bed="matrix_files/matrix.bed"
    log:
        "logs/deeptools/compute_matrix.log"
    params:
        # required argument, choose "scale-regions" or "reference-point"
        command="scale-regions",
        # optional parameters
        extra="--regionBodyLength 200 --verbose"
    wrapper:
        "0.65.0/bio/deeptools/computematrix"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_tab = snakemake.output.get("matrix_tab")
out_bed = snakemake.output.get("matrix_bed")

optional_output = ""

if out_tab:
    optional_output += " --outFileNameMatrix {out_tab} ".format(out_tab=out_tab)

if out_bed:
    optional_output += " --outFileSortedRegions {out_bed} ".format(out_bed=out_bed)

shell(
    "(computeMatrix "
    "{snakemake.params.command} "
    "{snakemake.params.extra} "
    "-R {snakemake.input.bed} "
    "-S {snakemake.input.bigwig} "
    "-o {snakemake.output.matrix_gz} "
    "{optional_output}) {log}"
)
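
The optional-output handling above follows a pattern shared by the deepTools wrappers: a command-line option is appended only when the corresponding named output is declared in the rule. The following is a minimal standalone sketch of that pattern, assuming a plain dict in place of snakemake.output. The names OPTIONAL_OUTPUTS and optional_flags are illustrative and not part of the wrapper.

```python
# Hypothetical stand-in for the wrapper's optional-output logic.
# Maps output variable names to the computeMatrix options they trigger.
OPTIONAL_OUTPUTS = {
    "matrix_tab": "--outFileNameMatrix",
    "matrix_bed": "--outFileSortedRegions",
}


def optional_flags(output):
    """Return the command-line fragment for the optional outputs that are set."""
    parts = []
    for name, flag in OPTIONAL_OUTPUTS.items():
        path = output.get(name)
        if path:  # skip outputs the rule did not declare
            parts.append("{} {}".format(flag, path))
    return " ".join(parts)


# Only the required output is declared: no optional flags are produced.
only_required = optional_flags({"matrix_gz": "matrix_files/matrix.gz"})

# Declaring matrix_tab adds the matching option.
with_table = optional_flags(
    {"matrix_gz": "matrix_files/matrix.gz", "matrix_tab": "matrix_files/matrix.tab"}
)

print(repr(only_required))
print(with_table)
```

In a real rule, removing an optional output from the output section is therefore enough to drop the matching option from the command.
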
DEEPTOOLS PLOTFINGERPRINT

deepTools plotFingerprint plots a profile of cumulative read coverages from a list of indexed BAM files. For usage information about deepTools plotFingerprint, please see the documentation. For more information about deepTools, also see the source code.

In addition to required output, an optional output file of read counts can be generated by setting the output variable “counts” (see example Snakemake rule below).

plotFingerprint option  Output                                      Name of output variable  Recommended extension(s)
----------------------  ------------------------------------------  -----------------------  ------------------------------------
--plotFile, -plot, -o   coverage plot                               fingerprint (required)   ".png", ".eps", ".pdf" or ".svg"
--outRawCounts          tab-separated table of read counts per bin  counts                   ".tab"

Software dependencies
  • deeptools ==3.4.3
Example

This wrapper can be used in the following way:

rule plot_fingerprint:
    input:
        bam_files=expand("samples/{sample}.bam", sample=["a", "b"]),
        bam_idx=expand("samples/{sample}.bam.bai", sample=["a", "b"])
    output:
        # Please note that --plotFile and --outRawCounts are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotfingerprint.html.
        fingerprint="plot_fingerprint/plot_fingerprint.png",  # required
        # optional output
        counts="plot_fingerprint/raw_counts.tab"
    log:
        "logs/deeptools/plot_fingerprint.log"
    params:
        # optional parameters
        "--numberOfSamples 200 "
    wrapper:
        "0.65.0/bio/deeptools/plotfingerprint"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_counts = snakemake.output.get("counts")

optional_output = ""

if out_counts:
    optional_output += " --outRawCounts {out_counts} ".format(out_counts=out_counts)

shell(
    "(plotFingerprint "
    "-b {snakemake.input.bam_files} "
    "-o {snakemake.output.fingerprint} "
    "{optional_output} "
    "{snakemake.params}) {log}"
)
DEEPTOOLS PLOTHEATMAP

deepTools plotHeatmap creates a heatmap for scores associated with genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotHeatmap, please see the documentation. For more information about deepTools, also see the source code.

You can select which optional output files are generated by adding the respective output variable with the recommended extension(s) for them (see example Snakemake rule below).

plotHeatmap option       Output                                                 Name of output variable  Recommended extension(s)
-----------------------  -----------------------------------------------------  -----------------------  --------------------------------
--outFileName, -out, -o  plot image                                             heatmap_img (required)   ".png", ".eps", ".pdf" or ".svg"
--outFileSortedRegions   BED file with sorted regions                           regions                  ".bed"
--outFileNameMatrix      tab-separated matrix of values underlying the heatmap  heatmap_matrix           ".tab"

Software dependencies
  • deeptools ==3.4.3
Example

This wrapper can be used in the following way:

rule plot_heatmap:
    input:
        # matrix file from deepTools computeMatrix tool
        "matrix.gz"
    output:
        # Please note that --outFileSortedRegions and --outFileNameMatrix are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotheatmap.html.
        heatmap_img="plot_heatmap/heatmap.png",  # required
        # optional output files
        regions="plot_heatmap/heatmap_regions.bed",
        heatmap_matrix="plot_heatmap/heatmap_matrix.tab"
    log:
        "logs/deeptools/heatmap.log"
    params:
        # optional parameters
        "--plotType=fill "
    wrapper:
        "0.65.0/bio/deeptools/plotheatmap"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_region = snakemake.output.get("regions")
out_matrix = snakemake.output.get("heatmap_matrix")

optional_output = ""

if out_region:
    optional_output += " --outFileSortedRegions {out_region} ".format(
        out_region=out_region
    )

if out_matrix:
    optional_output += " --outFileNameMatrix {out_matrix} ".format(
        out_matrix=out_matrix
    )

shell(
    "(plotHeatmap "
    "-m {snakemake.input[0]} "
    "-o {snakemake.output.heatmap_img} "
    "{optional_output} "
    "{snakemake.params}) {log}"
)
DEEPTOOLS PLOTPROFILE

deepTools plotProfile plots scores over sets of genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotProfile, please see the documentation. For more information about deepTools, also see the source code.

You can select which optional output files are generated by adding the respective output variable with the recommended extension for them (see example Snakemake rule below).

plotProfile option       Output                                   Name of output variable  Recommended extension(s)
-----------------------  ---------------------------------------  -----------------------  --------------------------------
--outFileName, -out, -o  profile plot                             plot_img (required)      ".png", ".eps", ".pdf" or ".svg"
--outFileSortedRegions   BED file with sorted regions             regions                  ".bed"
--outFileNameData        tab-separated table for average profile  data                     ".tab"

Software dependencies
  • deeptools ==3.4.3
Example

This wrapper can be used in the following way:

rule plot_profile:
    input:
        # matrix file from deepTools computeMatrix tool
        "matrix.gz"
    output:
        # Please note that --outFileSortedRegions and --outFileNameData are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotprofile.html.
        # Through the output variables image file and more output options for plot profile can be selected.
        plot_img="plot_profile/plot.png",  # required
        # optional output files
        regions="plot_profile/regions.bed",
        data="plot_profile/data.tab"
    log:
        "logs/deeptools/plot_profile.log"
    params:
        # optional parameters
        "--plotType=fill "
        "--perGroup "
        "--colors red yellow blue "
        "--dpi 150 "
    wrapper:
        "0.65.0/bio/deeptools/plotprofile"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_region = snakemake.output.get("regions")
out_data = snakemake.output.get("data")

optional_output = ""

if out_region:
    optional_output += " --outFileSortedRegions {out_region} ".format(
        out_region=out_region
    )

if out_data:
    optional_output += " --outFileNameData {out_data} ".format(out_data=out_data)

shell(
    "(plotProfile "
    "-m {snakemake.input[0]} "
    "-o {snakemake.output.plot_img} "
    "{optional_output} "
    "{snakemake.params}) {log}"
)

DEEPVARIANT

Call genetic variants using a deep neural network with DeepVariant. DeepVariant is Copyright 2017 Google LLC and is licensed under the BSD 3-Clause “New” or “Revised” License; see https://github.com/google/deepvariant

Software dependencies
  • deepvariant=0.10.0
  • tensorflow-estimator=2.0.0
  • unzip=6.0
Example

This wrapper can be used in the following way:

rule deepvariant:
    input:
        bam="mapped/{sample}.bam",
        ref="genome/genome.fasta"
    output:
        vcf="calls/{sample}.vcf.gz"
    params:
        model="wgs",   # {wgs, wes}
        extra=""
    threads: 2
    log:
        "logs/deepvariant/{sample}/stdout.log"
    wrapper:
        "0.65.0/bio/deepvariant"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The extra param allows for additional program arguments.
  • This Snakemake wrapper uses the bioconda deepvariant package (Copyright 2018 Brad Chapman).
Authors
  • Tetsuro Hisayoshi
Code
__author__ = "Tetsuro Hisayoshi"
__copyright__ = "Copyright 2020, Tetsuro Hisayoshi"
__email__ = "hisayoshi0530@gmail.com"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

log_dir = os.path.dirname(snakemake.log[0])
output_dir = os.path.dirname(snakemake.output[0])

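# sample basename: file name of the BAM input without directory and extension
# (snakemake.input.bam is a single path string in the example rule)
basename = os.path.splitext(os.path.basename(snakemake.input.bam))[0]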


with tempfile.TemporaryDirectory() as tmp_dir:
    shell(
        "(dv_make_examples.py "
        "--cores {snakemake.threads} "
        "--ref {snakemake.input.ref} "
        "--reads {snakemake.input.bam} "
        "--sample {basename} "
        "--examples {tmp_dir} "
        "--logdir {log_dir} "
        "{extra} \n"
        "dv_call_variants.py "
        "--cores {snakemake.threads} "
        "--outfile {tmp_dir}/{basename}.tmp "
        "--sample {basename} "
        "--examples {tmp_dir} "
        "--model {snakemake.params.model} \n"
        "dv_postprocess_variants.py "
        "--ref {snakemake.input.ref} "
        "--infile {tmp_dir}/{basename}.tmp "
        "--outfile {snakemake.output.vcf} ) {log}"
    )
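
The value passed to --sample is derived from the BAM file name. A minimal sketch of that derivation on plain strings, assuming the BAM input is a single path (sample_name is an illustrative helper, not part of the wrapper):

```python
import os


def sample_name(bam_path):
    """Return the BAM file name without its directory and final extension."""
    return os.path.splitext(os.path.basename(bam_path))[0]


# Only the last extension is removed, so intermediate suffixes are kept.
print(sample_name("mapped/a.bam"))
print(sample_name("mapped/a.sorted.bam"))
```

The helper expects the complete path. A single character taken from the path, such as "mapped/a.bam"[0], would yield "m" rather than the sample name.
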

DELLY

Call variants with delly.

Software dependencies
  • delly ==0.8.1
Example

This wrapper can be used in the following way:

rule delly:
    input:
        ref="genome.fasta",
        samples=["mapped/a.bam"],
        # optional exclude template (see https://github.com/dellytools/delly)
        exclude="human.hg19.excl.tsv"
    output:
        "sv/calls.bcf"
    params:
        extra=""  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log"
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "0.65.0/bio/delly"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


# The optional exclude template is passed via the named input "exclude"
exclude = (
    "-x {}".format(snakemake.input.exclude)
    if snakemake.input.get("exclude", "")
    else ""
)

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "OMP_NUM_THREADS={snakemake.threads} delly call {extra} "
    "{exclude} -g {snakemake.input.ref} "
    "-o {snakemake.output[0]} {snakemake.input.samples} {log}"
)

EPIC

For epic, the following wrappers are available:

EPIC

Find broad enriched domains in ChIP-Seq data with epic.

Software dependencies
  • epic =0.2.7
  • pandas =0.22.0
Example

This wrapper can be used in the following way:

rule epic:
    input:
        treatment="bed/test.bed",
        background="bed/control.bed"
    output:
        enriched_regions="epic/enriched_regions.csv",  # required
        bed="epic/enriched_regions.bed",  # optional
        matrix="epic/matrix.gz"  # optional
    log:
        "logs/epic/epic.log"
    params:
        genome="hg19",  # optional, default hg19
        extra="-g 3 -w 200"  # "--bigwig epic/bigwigs"
    threads: 1  # optional, defaults to 1
    wrapper:
        "0.65.0/bio/epic/peaks"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Any of the bigwig options must be given via the extra parameter.
Authors
  • Endre Bakken Stovner
Code
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

# Optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1
# Genome name; "hg19" matches the default stated in the example rule
genome = snakemake.params.get("genome", "hg19")

# Input files
treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")

# Output files (bed and matrix are optional)
enriched_regions = snakemake.output.get("enriched_regions")
bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")

# Use a log file only if the rule declares one; otherwise leave it unset
log = snakemake.log[0] if len(snakemake.log) > 0 else None

# Executed shell command

cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"

if bed:
    cmd += " -b {bed}"
if matrix:
    cmd += " -sm {matrix}"
if log:
    cmd += " -l {log}"

cmd += " {extra}"

shell(cmd)

FASTP

Trim and QC FASTQ reads with fastp.

Software dependencies
  • fastp ==0.20.0
Example

This wrapper can be used in the following way:

rule fastp_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        trimmed="trimmed/se/{sample}.fastq",
        html="report/se/{sample}.html",
        json="report/se/{sample}.json"
    log:
        "logs/fastp/se/{sample}.log"
    params:
        extra=""
    threads: 1
    wrapper:
        "0.65.0/bio/fastp"


rule fastp_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
        html="report/pe/{sample}.html",
        json="report/pe/{sample}.json"
    log:
        "logs/fastp/pe/{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "0.65.0/bio/fastp"

rule fastp_pe_wo_trimming:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        html="report/pe_wo_trimming/{sample}.html",
        json="report/pe_wo_trimming/{sample}.json"
    log:
        "logs/fastp/pe_wo_trimming/{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "0.65.0/bio/fastp"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Sebastian Kurscheid
Code
__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "sebastian.kurscheid@anu.edu.au"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "--in1 {}".format(snakemake.input.sample)
else:
    reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)

trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths is not None:
    if n == 1:
        trimmed = "--out1 {}".format(snakemake.output.trimmed)
    else:
        trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)
else:
    trimmed = ""

html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)

# Use the `extra` value read above so that a missing params.extra falls back to ""
shell(
    "(fastp --thread {snakemake.threads} {extra} "
    "{reads} "
    "{trimmed} "
    "{json} "
    "{html} ) {log}"
)
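
The single-end versus paired-end selection above depends only on the number of files in input.sample, and trimmed output is optional. A minimal sketch of the same decision on plain Python lists, assuming lists stand in for the snakemake input and output objects (fastp_io_args is an illustrative helper, not part of the wrapper):

```python
def fastp_io_args(sample, trimmed=None):
    """Build fastp read arguments for 1 (single-end) or 2 (paired-end) files."""
    n = len(sample)
    if n not in (1, 2):
        raise ValueError("sample must have 1 (single-end) or 2 (paired-end) elements")

    args = []
    # Pair each input file with --in1 / --in2 in order.
    for flag, path in zip(["--in1", "--in2"], sample):
        args += [flag, path]

    # Trimmed output is optional; without it only the reports are written.
    if trimmed is not None:
        if isinstance(trimmed, str):
            trimmed = [trimmed]
        for flag, path in zip(["--out1", "--out2"], trimmed):
            args += [flag, path]

    return " ".join(args)


print(fastp_io_args(["reads/se/a.fastq"], "trimmed/se/a.fastq"))
print(fastp_io_args(["reads/pe/a.1.fastq", "reads/pe/a.2.fastq"]))
```

The second call corresponds to the fastp_pe_wo_trimming rule: with no trimmed output declared, no --out1 or --out2 arguments are produced.
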

FASTQ_SCREEN

fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

This wrapper allows the configuration to be passed either as a filename or as a dictionary via the rule’s params.fastq_screen_config. For example, the following configuration file:

DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE2
DATABASE      ecoli   /data/Escherichia_coli/BowtieIndex/genome       BOWTIE
DATABASE      hg19    /data/hg19/Bowtie2Index/genome  BOWTIE2
DATABASE      mm10    /data/mm10/Bowtie2Index/genome  BOWTIE2
BOWTIE        /path/to/bowtie
BOWTIE2       /path/to/bowtie2

becomes:

fastq_screen_config = {
 'database': {
   'ecoli': {
     'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
     'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
   'hg19': {
     'bowtie2': '/data/hg19/Bowtie2Index/genome'},
   'mm10': {
     'bowtie2': '/data/mm10/Bowtie2Index/genome'}
 },
 'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}
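
The correspondence between the dictionary and the configuration file can be sketched in a few lines of plain Python. This is an illustration only: render_fastq_screen_config is a hypothetical helper name, and the tab-separated line layout mirrors what the wrapper code on this page writes to its temporary file.

```python
def render_fastq_screen_config(config):
    """Render the dictionary form as fastq_screen configuration text."""
    lines = []
    # One DATABASE line per (label, aligner) pair.
    for label, indexes in config["database"].items():
        for aligner_name, index in indexes.items():
            lines.append("\t".join(["DATABASE", label, index, aligner_name.upper()]))
    # One line per aligner executable.
    for aligner_name, path in config["aligner_paths"].items():
        lines.append("\t".join([aligner_name.upper(), path]))
    return "\n".join(lines) + "\n"


example = {
    "database": {"hg19": {"bowtie2": "/data/hg19/Bowtie2Index/genome"}},
    "aligner_paths": {"bowtie2": "bowtie2"},
}
text = render_fastq_screen_config(example)
print(text, end="")
```
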

By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".

Software dependencies
  • fastq-screen ==0.5.2
  • bowtie2 ==2.2.6
  • bowtie ==1.1.2
Example

This wrapper can be used in the following way:

rule fastq_screen:
    input:
        "samples/{sample}.fastq"
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png"
    params:
        fastq_screen_config="fastq_screen.conf",
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "0.65.0/bio/fastq_screen"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
  • While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.
  • When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environment variable is set (see the tempfile docs for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the snakefile calling this wrapper.
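
The effect of TMPDIR on Python’s tempfile module can be checked with a short standalone script. This is a sketch under assumptions: the scratch directory is created on the fly as a stand-in for a site-specific location such as /scratch, and the cached default is reset only because this script has already used tempfile before changing the variable.

```python
import os
import shutil
import tempfile

# Stand-in for a site-specific scratch directory (hypothetical location).
scratch = tempfile.mkdtemp()
old_tmpdir = os.environ.get("TMPDIR")

os.environ["TMPDIR"] = scratch
tempfile.tempdir = None  # drop the cached default so TMPDIR is read again

# The two calls used by the wrapper now create their files below scratch.
handle = tempfile.NamedTemporaryFile(delete=False)
handle.close()
config_in_scratch = os.path.dirname(handle.name) == scratch

workdir = tempfile.mkdtemp()
workdir_in_scratch = os.path.dirname(workdir) == scratch

print(config_in_scratch, workdir_in_scratch)

# Restore the previous state and clean up.
if old_tmpdir is None:
    del os.environ["TMPDIR"]
else:
    os.environ["TMPDIR"] = old_tmpdir
tempfile.tempdir = None
shutil.rmtree(scratch)
```
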
Authors
  • Ryan Dale
Code
import os
import re
from snakemake.shell import shell
import tempfile

__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"

_config = snakemake.params["fastq_screen_config"]

subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, "w") as fout:
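        for label, indexes in _config["database"].items():
            # Use distinct loop names so that the `aligner` parameter read
            # above is not overwritten before fastq_screen is called.
            for db_aligner, index in indexes.items():
                fout.write(
                    "\t".join(["DATABASE", label, index, db_aligner.upper()]) + "\n"
                )
        for aligner_name, path in _config["aligner_paths"].items():
            fout.write("\t".join([aligner_name.upper(), path]) + "\n")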
    config_file = tmp
else:
    config_file = _config

# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(r"\.fastq|\.fq|\.txt|\.seq", os.path.basename(snakemake.input[0]))[0]

tempdir = tempfile.mkdtemp()

shell(
    "fastq_screen --outdir {tempdir} "
    "--force "
    "--aligner {aligner} "
    "--conf {config_file} "
    "--subset {subset} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{snakemake.input[0]} "
    "{log}"
)

# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")

# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
    shell("rm {tmp}")

FASTQC

Generate fastq qc statistics using fastqc.

Software dependencies
  • fastqc ==0.11.9
Example

This wrapper can be used in the following way:

rule fastqc:
    input:
        "reads/{sample}.fastq"
    output:
        html="qc/fastqc/{sample}.html",
        zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
    params: ""
    log:
        "logs/fastqc/{sample}.log"
    threads: 1
    wrapper:
        "0.65.0/bio/fastqc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for fastqc."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)

    split_ind = 2 if base.endswith(".fastq.gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} --quiet -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log:q}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
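
The wrapper has to predict the file names that fastqc writes into the temporary directory before it can move them. The sketch below reuses the basename_without_ext logic from the code above on plain strings; expected_fastqc_outputs is an illustrative helper that is not part of the wrapper.

```python
from os import path


def basename_without_ext(file_path):
    """Return the basename of file_path without the file extension."""
    base = path.basename(file_path)
    # ".fastq.gz" counts as one extension; anything else loses only its last suffix.
    split_ind = 2 if base.endswith(".fastq.gz") else 1
    return ".".join(base.split(".")[:-split_ind])


def expected_fastqc_outputs(input_path, outdir):
    """Return the (html, zip) paths the wrapper looks for in outdir."""
    base = basename_without_ext(input_path)
    return (
        path.join(outdir, base + "_fastqc.html"),
        path.join(outdir, base + "_fastqc.zip"),
    )


for name in ["reads/a.fastq", "reads/a.fastq.gz", "reads/a.fq.gz"]:
    print(name, "->", basename_without_ext(name))
```

Only the ".fastq.gz" double extension is treated specially by this helper, so an input such as reads/a.fq.gz keeps its inner ".fq" in the predicted base name.
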

FGBIO

For fgbio, the following wrappers are available:

FGBIO ANNOTATEBAMWITHUMIS

Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule AnnotateBam:
    input:
        bam="mapped/{sample}.bam",
        umi="umi/{sample}.fastq"
    output:
        "mapped/{sample}.annotated.bam"
    params: ""
    log:
        "logs/fgbio/annotate_bam/{sample}.log"
    wrapper:
        "0.65.0/bio/fgbio/annotatebamwithumis"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input.bam

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not isinstance(bam_input, str):
    raise ValueError("Input bam should be a string: " + str(bam_input) + "!")

umi_input = snakemake.input.umi

if umi_input is None:
    raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
    raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")

if not len(snakemake.output) == 1:
    raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
output_file = snakemake.output[0]


if output_file is None:
    raise ValueError("Missing output file!")
elif not isinstance(output_file, str):
    raise ValueError("Output bam-file should be a string: " + str(output_file) + "!")

shell(
    "fgbio AnnotateBamWithUmis"
    " -i {bam_input}"
    " -f {umi_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)
FGBIO CALLMOLECULARCONSENSUSREADS

Calls consensus sequences from reads with the same unique molecular tag.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule ConsensusReads:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.m3.bam"
    params:
        extra="-M 3"
    log:
        "logs/fgbio/consensus_reads/{sample}.log"
    wrapper:
        "0.65.0/bio/fgbio/callmolecularconsensusreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

# Reject anything other than exactly one input BAM file
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

# Reject anything other than exactly one output BAM file
if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell(
    "fgbio CallMolecularConsensusReads"
    " -i {bam_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)
FGBIO COLLECTDUPLEXSEQMETRICS

Collects a suite of metrics to QC duplex sequencing data.

Software dependencies
  • fgbio ==0.6.1
  • r-ggplot2
Example

This wrapper can be used in the following way:

rule CollectDuplexSeqMetrics:
    input:
        "mapped/{sample}.gu.bam"
    output:
        family_sizes="stats/{sample}.family_sizes.txt",
        duplex_family_sizes="stats/{sample}.duplex_family_sizes.txt",
        duplex_yield_metrics="stats/{sample}.duplex_yield_metrics.txt",
        umi_counts="stats/{sample}.umi_counts.txt",
        duplex_qc="stats/{sample}.duplex_qc.pdf",
        duplex_umi_counts="stats/{sample}.duplex_umi_counts.txt",
    params:
        extra=lambda wildcards: "-d " + wildcards.sample
    log:
        "logs/fgbio/collectduplexseqmetrics/{sample}.log"
    wrapper:
        "0.65.0/bio/fgbio/collectduplexseqmetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from os import path

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

family_sizes = snakemake.output.family_sizes
duplex_family_sizes = snakemake.output.duplex_family_sizes
duplex_yield_metrics = snakemake.output.duplex_yield_metrics
umi_counts = snakemake.output.umi_counts
duplex_qc = snakemake.output.duplex_qc
duplex_umi_counts = snakemake.output.get("duplex_umi_counts", None)

file_path = str(path.dirname(family_sizes))
name = str(path.basename(family_sizes)).split(".")[0]
path_name_prefix = str(path.join(file_path, name))

if not family_sizes == path_name_prefix + ".family_sizes.txt":
    raise Exception(
        "Unexpected family_sizes path/name format, expected {}, got {}.".format(
            path_name_prefix + ".family_sizes.txt", family_sizes
        )
    )
if not duplex_family_sizes == path_name_prefix + ".duplex_family_sizes.txt":
    raise Exception(
        "Unexpected duplex_family_sizes path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_family_sizes.txt", duplex_family_sizes
        )
    )
if not duplex_yield_metrics == path_name_prefix + ".duplex_yield_metrics.txt":
    raise Exception(
        "Unexpected duplex_yield_metrics path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_yield_metrics.txt", duplex_yield_metrics
        )
    )
if not umi_counts == path_name_prefix + ".umi_counts.txt":
    raise Exception(
        "Unexpected umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".umi_counts.txt", umi_counts
        )
    )
if not duplex_qc == path_name_prefix + ".duplex_qc.pdf":
    raise Exception(
        "Unexpected duplex_qc path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_qc.pdf", duplex_qc
        )
    )
if (
    duplex_umi_counts is not None
    and not duplex_umi_counts == path_name_prefix + ".duplex_umi_counts.txt"
):
    raise Exception(
        "Unexpected duplex_umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_umi_counts.txt", duplex_umi_counts
        )
    )

duplex_umi_counts_flag = ""
if duplex_umi_counts is not None:
    duplex_umi_counts_flag = "-u "

# snakemake.input[0] is always a string, so only the input count needs checking.
if len(snakemake.input) != 1:
    raise ValueError(
        "Input bam should be one bam file: " + ", ".join(map(str, snakemake.input)) + "!"
    )

shell(
    "fgbio CollectDuplexSeqMetrics"
    " -i {bam_input}"
    " -o {path_name_prefix}"
    " {duplex_umi_counts_flag}"
    " {extra_params}"
    " {log}"
)
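The output-name checks in the code above all follow from one rule: every output path must equal the prefix derived from family_sizes plus a fixed suffix. A minimal sketch of that derivation follows. The helper name `expected_outputs` is hypothetical and not part of the wrapper, and `posixpath` is used only to keep the example platform-independent.

```python
import posixpath


def expected_outputs(family_sizes, with_duplex_umi_counts=False):
    """Return the prefix passed to -o and the output paths implied by it."""
    directory = posixpath.dirname(family_sizes)
    # Everything before the first dot of the file name is taken as the name.
    name = posixpath.basename(family_sizes).split(".")[0]
    prefix = posixpath.join(directory, name)
    suffixes = [
        ".family_sizes.txt",
        ".duplex_family_sizes.txt",
        ".duplex_yield_metrics.txt",
        ".umi_counts.txt",
        ".duplex_qc.pdf",
    ]
    if with_duplex_umi_counts:
        suffixes.append(".duplex_umi_counts.txt")
    return prefix, [prefix + suffix for suffix in suffixes]
```

Because the name is cut at the first dot, a file such as stats/a.b.family_sizes.txt yields the prefix stats/a, which no longer matches the given path. Sample names containing dots would therefore fail the wrapper's check.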
FGBIO FILTERCONSENSUSREADS

Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule FilterConsensusReads:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.filtered.bam"
    params:
        extra="",
        min_base_quality=2,
        min_reads=[2, 2, 2],
        ref="genome.fasta"
    log:
        "logs/fgbio/filterconsensusreads/{sample}.log"
    threads: 1
    wrapper:
        "0.65.0/bio/fgbio/filterconsensusreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • min_base_quality: a single value (Int). Mask (make N) consensus bases with quality less than this threshold. (default: 5)
  • min_reads: an array of Ints, max length 3, min length 1. Number of reads that need to support a UMI. For filtering bam files processed with CallMolecularConsensusReads, one value is required. Up to 3 values can be provided for bam files processed with CallDuplexConsensusReads; if fewer than 3 are provided, the last value will be repeated. The first value is for the final consensus sequence and the last two are for each strand's consensus.
  • For more information, see http://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
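The min_reads rules in the notes above can be illustrated with a short sketch. The helper `effective_min_reads` is hypothetical. According to the note, the repetition of the last value is done by fgbio itself, so the padded list here only shows which three thresholds a shorter list stands for.

```python
def effective_min_reads(min_reads):
    """Validate min_reads and return (implied thresholds, command-line form)."""
    if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
        raise ValueError("min_reads must be a list of 1 to 3 Ints")
    if not all(isinstance(value, int) for value in min_reads):
        raise ValueError("min_reads must contain only Ints")
    # Per the note, the last value is repeated when fewer than 3 are given.
    padded = min_reads + [min_reads[-1]] * (3 - len(min_reads))
    # On the command line the values follow --min-reads, separated by spaces.
    cli = "--min-reads " + " ".join(str(value) for value in min_reads)
    return padded, cli
```

For example, `effective_min_reads([3, 1])` describes a final-consensus threshold of 3 and a threshold of 1 for each strand's consensus.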
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

min_base_quality = snakemake.params.get("min_base_quality", None)
if not isinstance(min_base_quality, int):
    raise ValueError("min_base_quality needs to be provided as an Int!")

min_reads = snakemake.params.get("min_reads", None)
if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
    raise ValueError(
        "min_reads needs to be provided as list of Ints, min length 1, max length 3!"
    )

ref = snakemake.params.get("ref", None)
if ref is None:
    raise ValueError("A reference needs to be provided!")

bam_input = snakemake.input[0]

# snakemake.input[0] is always a string, so only the input count needs checking.
if len(snakemake.input) != 1:
    raise ValueError(
        "Input bam should be one bam file: " + ", ".join(map(str, snakemake.input)) + "!"
    )

bam_output = snakemake.output[0]

if len(snakemake.output) != 1:
    raise ValueError(
        "Output should be one bam file: " + ", ".join(map(str, snakemake.output)) + "!"
    )

shell(
    "fgbio FilterConsensusReads"
    " -i {bam_input}"
    " -o {bam_output}"
    " -r {ref}"
    " --min-reads {min_reads}"
    " --min-base-quality {min_base_quality}"
    " {extra_params}"
    " {log}"
)
FGBIO GROUPREADSBYUMI

Groups reads together that appear to have come from the same original molecule.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule GroupReads:
    input:
        "mapped/a.bam"
    output:
        bam="mapped/{sample}.gu.bam",
        hist="mapped/{sample}.gu.histo.tsv",
    params:
        extra="-s adjacency --edits 1"
    log:
        "logs/fgbio/group_reads/{sample}.log"
    wrapper:
        "0.65.0/bio/fgbio/groupreadsbyumi"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

# snakemake.input[0] is always a string, so only the input count needs checking.
if len(snakemake.input) != 1:
    raise ValueError(
        "Input bam should be one bam file: " + ", ".join(map(str, snakemake.input)) + "!"
    )

output_bam_file = snakemake.output.bam

# A named output holding several files is not a plain string.
if not isinstance(output_bam_file, str):
    raise ValueError("Bam output should be one bam file: " + str(output_bam_file) + "!")

output_histo_file = snakemake.output.hist

if not isinstance(output_histo_file, str):
    raise ValueError(
        "Histo output should be one histogram file path: "
        + str(output_histo_file)
        + "!"
    )

shell(
    "fgbio GroupReadsByUmi"
    " -i {bam_input}"
    " -o {output_bam_file}"
    " -f {output_histo_file}"
    " {extra_params}"
    " {log}"
)
FGBIO SETMATEINFORMATION

Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), MC (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule SetMateInfo:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.mi.bam"
    params:
        extra=""  # optional
    log:
        "logs/fgbio/set_mate_info/{sample}.log"
    wrapper:
        "0.65.0/bio/fgbio/setmateinformation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

# snakemake.input[0] is always a string, so only the input count needs checking.
if len(snakemake.input) != 1:
    raise ValueError(
        "Input bam should be one bam file: " + ", ".join(map(str, snakemake.input)) + "!"
    )

output_file = snakemake.output[0]

if len(snakemake.output) != 1:
    raise ValueError(
        "Output should be one bam file: " + ", ".join(map(str, snakemake.output)) + "!"
    )

shell(
    "fgbio SetMateInformation"
    " -i {bam_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)

FILTLONG

Quality filtering tool for long reads.

Software dependencies
  • filtlong=0.2.0=he941832_2
Example

This wrapper can be used in the following way:

rule filtlong:
    input:
        reads = "{sample}.fastq"
    output:
        "{sample}.filtered.fastq"
    params:
        extra=" --mean_q_weight 5.0",
        target_bases = 10
    log:
        "logs/filtlong/test/{sample}.log"
    wrapper:
        "0.65.0/bio/filtlong"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
"""Snakemake wrapper for filtlong."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
target_bases = int(snakemake.params.get("target_bases", 0))
if target_bases > 0:
    extra += " --target_bases {}".format(target_bases)

# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Executed shell command
shell("filtlong {extra}" " {snakemake.input.reads} > {snakemake.output} {log}")
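How the optional parameters end up on the command line can be sketched as follows. The function `filtlong_command` is a hypothetical stand-in for the wrapper's string formatting. It omits the log redirection and collapses repeated whitespace for readability.

```python
def filtlong_command(reads, output, extra="", target_bases=0):
    """Assemble the filtlong call the way the wrapper above does."""
    target_bases = int(target_bases)
    # --target_bases is only appended when a positive value is given.
    if target_bases > 0:
        extra += " --target_bases {}".format(target_bases)
    command = "filtlong {} {} > {}".format(extra, reads, output)
    # Collapse repeated whitespace so the result is easy to read and compare.
    return " ".join(command.split())
```

With the params from the example rule, this gives `filtlong --mean_q_weight 5.0 --target_bases 10 a.fastq > a.filtered.fastq` for the sample a.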

FREEBAYES

Call small genomic variants with freebayes.

Software dependencies
  • freebayes ==1.3.1
  • bcftools ==1.10
  • parallel ==20190522
  • bedtools >=2.29
  • sed ==4.7
Example

This wrapper can be used in the following way:

rule freebayes:
    input:
        ref="genome.fasta",
        # you can have a list of samples here
        samples="mapped/{sample}.bam",
        # the matching BAI indexes have to be present for freebayes
        indexes="mapped/{sample}.bam.bai",
        # optional BED file specifying chromosomal regions on which freebayes
        # should run, e.g. all regions that show coverage
        #regions="/path/to/region-file.bed"
    output:
        "calls/{sample}.vcf"  # either .vcf or .bcf
    log:
        "logs/freebayes/{sample}.log"
    params:
        extra="",         # optional parameters
        chunksize=100000  # reference genome chunk size for parallelization (default: 100000)
    threads: 2
    wrapper:
        "0.65.0/bio/freebayes"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Felix Mölder
Code
__author__ = "Johannes Köster, Felix Mölder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = snakemake.params.get("extra", "")

pipe = ""
if snakemake.output[0].endswith(".bcf"):
    pipe = "| bcftools view -Ob -"

if snakemake.threads == 1:
    freebayes = "freebayes"
else:
    chunksize = snakemake.params.get("chunksize", 100000)
    regions = "<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format(
        snakemake=snakemake, chunksize=chunksize
    )
    if snakemake.input.get("regions", ""):
        regions = (
            "<(bedtools intersect -a "
            r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
            "{regions}) -b {snakemake.input.regions} | "
            r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
        ).format(regions=regions, snakemake=snakemake)
    freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format(
        snakemake=snakemake, regions=regions
    )

shell(
    "({freebayes} {params} -f {snakemake.input.ref}"
    " {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}"
)
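When a regions BED file is given, the wrapper converts the generated chrom:start-end regions to tab-separated BED lines for bedtools intersect and converts the result back afterwards. The two sed expressions can be mirrored in Python as a sketch. The function names are hypothetical, and the regular expressions are direct translations of the sed patterns above.

```python
import re


def region_to_bed(region):
    """'chr1:0-100000' -> 'chr1<TAB>0<TAB>100000' (first sed expression)."""
    return re.sub(r":([0-9]*)-([0-9]*)$", r"\t\1\t\2", region)


def bed_to_region(line):
    """'chr1<TAB>0<TAB>100000' -> 'chr1:0-100000' (second sed expression)."""
    return re.sub(r"\t([0-9]*)\t([0-9]*)$", r":\1-\2", line)
```

The two functions are inverses of each other on well-formed input, which is why the regions handed to freebayes-parallel keep the chrom:start-end form after intersection.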

GATK

For gatk, the following wrappers are available:

GATK APPLYBQSR

Run gatk ApplyBQSR.

Software dependencies
  • gatk4 ==4.1.4.1
  • openjdk =8
Example

This wrapper can be used in the following way:

rule gatk_applybqsr:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
        recal_table="recal/{sample}.grp"
    output:
        bam="recal/{sample}.bam"
    log:
        "logs/gatk/gatk_applybqsr/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.65.0/bio/gatk/applybqsr"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
    "gatk --java-options '{java_opts}' ApplyBQSR {extra} -R {snakemake.input.ref} -I {snakemake.input.bam} "
    "--bqsr-recal-file {snakemake.input.recal_table} "
    "-O {snakemake.output.bam} {log}"
)
GATK BASERECALIBRATOR

Run gatk BaseRecalibrator.

Software dependencies
  • gatk4 ==4.1.4.1
  • openjdk =8
Example

This wrapper can be used in the following way:

rule gatk_baserecalibrator:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
        known="dbsnp.vcf.gz"  # optional known sites
    output:
        recal_table="recal/{sample}.grp"
    log:
        "logs/gatk/baserecalibrator/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.65.0/bio/gatk/baserecalibrator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
    known = "--known-sites {}".format(known)

shell(
    "gatk --java-options '{java_opts}' BaseRecalibrator {extra} "
    "-R {snakemake.input.ref} -I {snakemake.input.bam} "
    "-O {snakemake.output.recal_table} {known} {log}"
)
GATK BASERECALIBRATORSPARK

Run gatk BaseRecalibratorSpark.

Software dependencies
  • gatk4 ==4.1.4.1
  • openjdk =8
Example

This wrapper can be used in the following way:

rule gatk_baserecalibratorspark:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
        known="dbsnp.vcf.gz"  # optional known sites
    output:
        recal_table="recal/{sample}.grp"
    log:
        "logs/gatk/baserecalibrator/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
        #spark_runner="",  # optional, local by default
        #spark_master="",  # optional
        #spark_extra="", # optional
    threads: 8
    wrapper:
        "0.65.0/bio/gatk/baserecalibratorspark"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows additional program arguments to be passed to BaseRecalibratorSpark.
  • The spark_runner param (“LOCAL”|“SPARK”|“GCS”) sets the Spark runner. Set it to “LOCAL”, or don’t set it at all, to run on the local machine.
  • The spark_master param sets the URL of the Spark Master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or don’t set it at all for local execution with the number of cores determined by Snakemake.
  • The spark_extra param allows additional Spark arguments to be passed.
  • For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA-
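The defaults described in these notes can be sketched as a small function. The name `spark_arguments` is hypothetical, and a plain dict stands in for snakemake.params.

```python
def spark_arguments(params, threads):
    """Build the Spark part of the command line from optional params."""
    runner = params.get("spark_runner", "LOCAL")
    # Without spark_master, run locally on as many cores as the rule has threads.
    master = params.get("spark_master", "local[{}]".format(threads))
    extra = params.get("spark_extra", "")
    arguments = "-- --spark-runner {} --spark-master {} {}".format(
        runner, master, extra
    )
    return arguments.strip()
```

With no Spark params set and threads: 8, as in the example rule, this yields `-- --spark-runner LOCAL --spark-master local[8]`.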
Authors
  • Christopher Schröder
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
    "spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
    known = "--known-sites {}".format(known)

shell(
    "gatk --java-options '{java_opts}' BaseRecalibratorSpark {extra} "
    "-R {snakemake.input.ref} -I {snakemake.input.bam} "
    "-O {snakemake.output.recal_table} {known} "
    "-- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra} "
    "{log}"
)
GATK COMBINEGVCFS

Run gatk CombineGVCFs.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule combine_gvcfs:
    input:
        gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
        ref="genome.fasta"
    output:
        gvcf="calls/all.g.vcf",
    log:
        "logs/gatk/combinegvcfs.log"
    params:
        extra="",  # optional
        java_opts="",  # optional
    wrapper:
        "0.65.0/bio/gatk/combinegvcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
gvcfs = list(map("-V {}".format, snakemake.input.gvcfs))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' CombineGVCFs {extra} "
    "{gvcfs} "
    "-R {snakemake.input.ref} "
    "-O {snakemake.output.gvcf} {log}"
)
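The gvcfs list in the code above becomes one -V flag per input file. As far as we understand, the wrapper relies on Snakemake's shell formatting to join the list with spaces. The sketch below does the join explicitly. The helper `gvcf_flags` is hypothetical and additionally accepts a single path given as a string.

```python
def gvcf_flags(gvcfs):
    """Return one '-V <file>' flag per GVCF, separated by spaces."""
    # Wrap a single path so that it is not iterated character by character.
    if isinstance(gvcfs, str):
        gvcfs = [gvcfs]
    return " ".join("-V {}".format(gvcf) for gvcf in gvcfs)
```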
GATK GENOTYPEGVCFS

Run gatk GenotypeGVCFs.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule genotype_gvcfs:
    input:
        gvcf="calls/all.g.vcf",  # combined gvcf over multiple samples
        ref="genome.fasta"
    output:
        vcf="calls/all.vcf",
    log:
        "logs/gatk/genotypegvcfs.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.65.0/bio/gatk/genotypegvcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' GenotypeGVCFs {extra} "
    "-V {snakemake.input.gvcf} "
    "-R {snakemake.input.ref} "
    "-O {snakemake.output.vcf} {log}"
)
GATK HAPLOTYPECALLER

Run gatk HaplotypeCaller.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule haplotype_caller:
    input:
        # single or list of bam files
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
        # known="dbsnp.vcf"  # optional
    output:
        gvcf="calls/{sample}.g.vcf",
    log:
        "logs/gatk/haplotypecaller/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.65.0/bio/gatk/haplotypecaller"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell

known = snakemake.input.get("known", "")
if known:
    known = "--dbsnp " + known

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
bams = snakemake.input.bam
if isinstance(bams, str):
    bams = [bams]
bams = list(map("-I {}".format, bams))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' HaplotypeCaller {extra} "
    "-R {snakemake.input.ref} {bams} "
    "-ERC GVCF "
    "-O {snakemake.output.gvcf} {known} {log}"
)
GATK MUTECT2

Call somatic SNVs and indels via local assembly of haplotypes

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule mutect2:
    input:
        fasta = "genome/genome.fasta",
        map = "mapped/{sample}.bam"
    output:
        vcf = "variant/{sample}.vcf"
    message:
        "Testing Mutect2 with {wildcards.sample}"
    threads:
        1
    log:
        "logs/mutect_{sample}.log"
    wrapper:
        "0.65.0/bio/gatk/mutect"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for GATK4 Mutect2"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "gatk Mutect2 "  # Tool and its subcommand
    "--input {snakemake.input.map} "  # Path to input mapping file
    "--output {snakemake.output.vcf} "  # Path to output vcf file
    "--reference {snakemake.input.fasta} "  # Path to reference fasta file
    "{extra} "  # Extra parameters
    "{log}"  # Logging behaviour
)
GATK SELECTVARIANTS

Run gatk SelectVariants.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule gatk_select:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.vcf"
    log:
        "logs/gatk/select/snvs.log"
    params:
        extra="--select-type-to-include SNP",  # optional filter arguments, see GATK docs
        java_opts="", # optional
    wrapper:
        "0.65.0/bio/gatk/selectvariants"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' SelectVariants -R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "{extra} -O {snakemake.output.vcf} {log}"
)
GATK SPLITNCIGARREADS

Run gatk SplitNCigarReads.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule splitncigarreads:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        "split/{sample}.bam"
    log:
        "logs/gatk/splitNCIGARreads/{sample}.log"
    params:
        extra="",  # optional
        java_opts="",  # optional
    wrapper:
        "0.65.0/bio/gatk/splitncigarreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' SplitNCigarReads {extra} "
    " -R {snakemake.input.ref} -I {snakemake.input.bam} "
    "-O {snakemake.output} {log}"
)
GATK VARIANTFILTRATION

Run gatk VariantFiltration.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule gatk_filter:
    input:
        vcf="calls/snvs.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.filtered.vcf"
    log:
        "logs/gatk/filter/snvs.log"
    params:
        filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
        extra="",  # optional arguments, see GATK docs
        java_opts="", # optional
    wrapper:
        "0.65.0/bio/gatk/variantfiltration"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
filters = [
    "--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
    for name, expr in snakemake.params.filters.items()
]

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' VariantFiltration -R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "{extra} {filters} -O {snakemake.output.vcf} {log}"
)
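The filters dict from the example rule maps filter names to expressions. A standalone sketch of the formatting step is shown below. The function name `filter_arguments` is hypothetical, and the body mirrors the list comprehension in the wrapper code.

```python
def filter_arguments(filters):
    """Turn {name: expression} into VariantFiltration filter arguments."""
    return [
        "--filter-name {} --filter-expression '{}'".format(
            name, expression.replace("'", "\\'")
        )
        for name, expression in filters.items()
    ]
```

For the example rule this gives a single argument string, `--filter-name myfilter --filter-expression 'AB < 0.2 || MQ0 > 50'`.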
GATK VARIANTRECALIBRATOR

Run gatk VariantRecalibrator.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

from snakemake.remote import GS

# GATK resource bundle files can be either directly obtained from google storage (like here), or
# from FTP. You can also use local files.
GS = GS.RemoteProvider()


def gatk_bundle(f):
    return GS.remote("genomics-public-data/resources/broad/hg38/v0/{}".format(f))


rule variant_recalibrator:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
        # resources have to be given as named input files
        hapmap=gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz"),
        omni=gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz"),
        g1k=gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
        dbsnp=gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
        # use aux to e.g. download other necessary file
        aux=[gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"),
             gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi")]
    output:
        vcf="calls/all.recal.vcf",
        tranches="calls/all.tranches"
    log:
        "logs/gatk/variantrecalibrator.log"
    params:
        mode="SNP",  # set mode, must be either SNP, INDEL or BOTH
        # resource parameter definition. Key must match named input files from above.
        resources={"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
                   "omni":   {"known": False, "training": True, "truth": False, "prior": 12.0},
                   "g1k":   {"known": False, "training": True, "truth": False, "prior": 10.0},
                   "dbsnp":  {"known": True, "training": False, "truth": False, "prior": 2.0}},
        annotation=["QD", "FisherStrand"],  # which fields to use with -an (see VariantRecalibrator docs)
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.65.0/bio/gatk/variantrecalibrator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")


def fmt_res(resname, resparams):
    fmt_bool = lambda b: str(b).lower()
    # .get() yields None for a missing name rather than raising KeyError,
    # so the missing-resource case is checked explicitly.
    f = snakemake.input.get(resname)
    if f is None:
        raise RuntimeError(
            "There must be a named input file for every resource (missing: {})".format(
                resname
            )
        )
    return "{},known={},training={},truth={},prior={}:{}".format(
        resname,
        fmt_bool(resparams["known"]),
        fmt_bool(resparams["training"]),
        fmt_bool(resparams["truth"]),
        resparams["prior"],
        f,
    )


resources = [
    "--resource {}".format(fmt_res(resname, resparams))
    for resname, resparams in snakemake.params["resources"].items()
]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = ""
# The tranches output is optional, so look it up without assuming it exists.
if snakemake.output.get("tranches", ""):
    tranches = "--tranches-file " + snakemake.output.tranches

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' VariantRecalibrator {extra} {resources} "
    "-R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "-mode {snakemake.params.mode} "
    "--output {snakemake.output.vcf} "
    "{tranches} {annotation} {log}"
)
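To make the resource formatting concrete, here is a standalone sketch of the string built for one entry of the resources param. The function `format_resource` is hypothetical. It takes the file path as an explicit argument instead of reading it from snakemake.input.

```python
def format_resource(name, settings, path):
    """Format one --resource value as name,known=..,training=..,truth=..,prior=..:path."""

    def fmt_bool(value):
        # The flags are written as lower-case true/false.
        return str(value).lower()

    return "{},known={},training={},truth={},prior={}:{}".format(
        name,
        fmt_bool(settings["known"]),
        fmt_bool(settings["training"]),
        fmt_bool(settings["truth"]),
        settings["prior"],
        path,
    )
```

For the hapmap entry of the example rule and a local file hapmap.vcf.gz, the result is `hapmap,known=false,training=true,truth=true,prior=15.0:hapmap.vcf.gz`.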

GATK3

For gatk3, the following wrappers are available:

GATK3 BASERECALIBRATOR

Run gatk3 BaseRecalibrator.

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule baserecalibrator:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz"
    output:
        "{sample}.recal_data_table"
    log:
        "logs/gatk3/bqsr/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    threads: 16
    wrapper:
        "0.65.0/bio/gatk3/baserecalibrator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows additional program arguments to be passed.
  • For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

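# A single known-sites file arrives as a plain string; wrap it in a list so
# that the loop below iterates over files rather than characters.
if isinstance(input_known, str):
    input_known = [input_known]

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + "  --knownSites {}".format(known)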

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T BaseRecalibrator"
    " -nct {snakemake.threads}"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    " {bed}"
    " -o {snakemake.output}"
    " {log}"
)
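
The wrapper code above emits one `--knownSites` flag per known-sites file. The same idea can be sketched as a standalone function that accepts either one path or several. The name `known_sites_args` and the second file name `mills.vcf.gz` are made up for this example.

```python
def known_sites_args(known, flag="--knownSites"):
    """Return one "flag path" pair per known-sites file.

    `known` may be a single path (str) or an iterable of paths.
    """
    if isinstance(known, str):
        # A single path is wrapped in a list so that we do not iterate
        # over its characters.
        known = [known]
    return " ".join("{} {}".format(flag, path) for path in known)


single = known_sites_args("dbsnp.vcf.gz")
several = known_sites_args(["dbsnp.vcf.gz", "mills.vcf.gz"], flag="-known")
print(single)
print(several)
```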
GATK3 INDELREALIGNER

Run gatk3 IndelRealigner.

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule indelrealigner:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz",
        target_intervals="{sample}.intervals"
    output:
        bam="realigned/{sample}.bam"
    log:
        "logs/gatk3/indelrealigner/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    threads: 16
    wrapper:
        "bio/gatk3/indelrealigner"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows for additional program arguments.
  • For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
input_target_intervals = snakemake.input.target_intervals

bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

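# A single known-sites file arrives as a plain string; wrap it in a list so
# that the loop below iterates over files rather than characters.
if isinstance(input_known, str):
    input_known = [input_known]

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + " -known {}".format(known)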

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T IndelRealigner"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    " {bed}"
    " --targetIntervals {input_target_intervals}"
    " -o {snakemake.output}"
    " {log}"
)
GATK3 PRINTREADS

Run gatk3 PrintReads.

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule printreads:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        recal_data="{sample}.recal_data_table"
    output:
        "alignment/{sample}.bqsr.bam"
    log:
        "logs/gatk/bqsr/{sample}.log"
    params:
        extra="",  # optional
        java_opts="",
    threads: 16
    wrapper:
        "bio/gatk3/printreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows for additional program arguments.
  • For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_recal_data = snakemake.input.recal_data
input_ref = snakemake.input.ref

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T PrintReads"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " -BQSR {input_recal_data}"
    " -o {snakemake.output}"
    " {log}"
)
GATK3 REALIGNERTARGETCREATOR

Run gatk3 RealignerTargetCreator.

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule realignertargetcreator:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz"
    output:
        "{sample}.intervals"
    log:
        "logs/gatk/realignertargetcreator/{sample}.log"
    params:
        extra="",  # optional
        java_opts="",
    threads: 16
    wrapper:
        "bio/gatk3/realignertargetcreator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows for additional program arguments.
  • For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

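# A single known-sites file arrives as a plain string; wrap it in a list so
# that the loop below iterates over files rather than characters.
if isinstance(input_known, str):
    input_known = [input_known]

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + " --known {}".format(known)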

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "gatk3 {java_opts} -T RealignerTargetCreator"
    " -nt {snakemake.threads}"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    " {bed}"
    " -o {snakemake.output}"
    " {log}"
)

GDC-API

For gdc-api, the following wrappers are available:

GDC API-BASED DATA DOWNLOAD OF BAM SLICES

Download slices of GDC BAM files using curl and the GDC API for BAM Slicing.

Software dependencies
  • curl ==7.69.1
Example

This wrapper can be used in the following way:

rule gdc_api_bam_slice_download:
    output:
        bam="raw/{sample}.bam",
    log:
        "logs/gdc-api/bam-slicing/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="092c8a6d-aad5-41bf-b186-e68e613c0e89",
        # a gdc_token is required for controlled access and all BAM files
        # on GDC seem to be controlled access (adjust if this changes)
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # provide wanted `region=` or `gencode=` slices joined with `&`
        slices="region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2",
        # extra command line arguments passed to curl
        extra=""
    wrapper:
        "0.65.0/bio/gdc-api/bam-slicing"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
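
The `slices` param in the rule above is a single string of `region=` and `gencode=` entries joined with `&`. A minimal sketch of building that string from Python lists follows. The helper `build_slices` is hypothetical and not provided by the wrapper.

```python
def build_slices(regions=(), genes=()):
    """Join `region=` and `gencode=` entries with `&`, as `slices` expects."""
    parts = ["region={}".format(region) for region in regions]
    parts += ["gencode={}".format(gene) for gene in genes]
    if not parts:
        raise ValueError("At least one region or gene is required.")
    return "&".join(parts)


slices = build_slices(
    regions=["chr22", "chr5:1000-2000", "unmapped"],
    genes=["BRCA2"],
)
print(slices)
```

The resulting string matches the one used in the example rule and could be passed as `slices=build_slices(...)`.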

Authors
  • David Lähnemann
Code
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

uuid = snakemake.params.get("uuid", "")
if uuid == "":
    raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

token_file = snakemake.params.get("gdc_token", "")
if token_file == "":
    raise ValueError(
        "You need to provide a GDC data access token file via the 'gdc_token' in 'params'."
    )
token = ""
with open(token_file) as tf:
    # strip the trailing newline so that the header value stays on one line
    token = tf.read().strip()
os.environ["CURL_HEADER_TOKEN"] = "X-Auth-Token: {}".format(token)

slices = snakemake.params.get("slices", "")
if slices == "":
    raise ValueError(
        "You need to provide 'region=chr1:1000-2000' or 'gencode=BRCA2' slice(s) via the 'slices' in 'params'."
    )

extra = snakemake.params.get("extra", "")

shell(
    "curl --silent"
    # quote the expansion so that the header reaches curl as one argument
    ' --header "$CURL_HEADER_TOKEN"'
    " 'https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}'"
    " {extra}"
    " --output {snakemake.output.bam} {log}"
)

if os.path.getsize(snakemake.output.bam) < 100000:
    with open(snakemake.output.bam) as f:
        if "error" in f.read():
            shell("cat {snakemake.output.bam} {log}")
            raise RuntimeError(
                "Your GDC API request returned an error, check your log file for the error message."
            )
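
The final check above treats a small download that contains the word "error" as a failed request. The sketch below restates that heuristic as a function. It reads the file as bytes, which is a variation on the wrapper code rather than a copy of it. The function name `looks_like_api_error` and the payloads written to the temporary files are invented for the example and do not reproduce real GDC responses.

```python
import os
import tempfile


def looks_like_api_error(path, max_size=100000):
    """Return True for a small download that contains the word "error"."""
    if os.path.getsize(path) >= max_size:
        return False
    # Read as bytes so that a small but valid binary file cannot trigger
    # a text decoding error.
    with open(path, "rb") as handle:
        return b"error" in handle.read()


with tempfile.TemporaryDirectory() as tmp:
    failed = os.path.join(tmp, "failed.bam")
    with open(failed, "wb") as handle:
        handle.write(b'{"error": "placeholder message"}')  # made-up payload
    ok = os.path.join(tmp, "ok.bam")
    with open(ok, "wb") as handle:
        handle.write(b"\x1f\x8b\x08\x04" + b"\x00" * 64)  # arbitrary binary data
    failed_flag = looks_like_api_error(failed)
    ok_flag = looks_like_api_error(ok)

print(failed_flag, ok_flag)
```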

GDC-CLIENT

For gdc-client, the following wrappers are available:

GDC DATA TRANSFER TOOL DATA DOWNLOAD

Download GDC data files with the gdc-client.

Software dependencies
  • gdc-client ==1.5.0
Example

This wrapper can be used in the following way:

rule gdc_download:
    output:
        # the file extension (up to two components, here .maf.gz), has
        # to uniquely map to one of the files downloaded for that UUID
        "raw/{sample}.maf.gz"
    log:
        "logs/gdc-client/download/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
        # a gdc_token is only required for controlled access samples,
        # leave blank otherwise (`gdc_token=""`) or skip this param entirely
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # for valid extra command line arguments, check command line help or:
        # https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
        extra = ""
    threads: 4
    wrapper:
        "0.65.0/bio/gdc-client/download"

rule gdc_download_bam:
    output:
        # specify all the downloaded files you want to keep, as all other
        # downloaded files will be removed automatically e.g. for
        # BAM data this could be
        "raw/{sample}.bam",
        "raw/{sample}.bam.bai",
        "raw/{sample}.annotations.txt",
        directory("raw/{sample}/logs")
    log:
        "logs/gdc-client/download/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
        # a gdc_token is only required for controlled access samples,
        # leave blank otherwise (`gdc_token=""`) or skip this param entirely
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # for valid extra command line arguments, check command line help or:
        # https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
        extra = ""
    threads: 4
    wrapper:
        "0.65.0/bio/gdc-client/download"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
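
The comments in the rules above suggest making `uuid` a function that maps sample names to GDC UUIDs. One possible sketch follows, assuming a tab-separated sample sheet with the hypothetical columns `sample` and `id`. The sample name `tumor_A` is made up and only the UUID is taken from the example rule.

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical sample sheet; in a real workflow this would be a file on disk.
SAMPLE_SHEET = io.StringIO(
    "sample\tid\n"
    "tumor_A\t34b80c89-c41e-47be-84fb-0c0ea493b5bb\n"
)

UUIDS = {
    row["sample"]: row["id"]
    for row in csv.DictReader(SAMPLE_SHEET, delimiter="\t")
}


def get_uuid(wildcards):
    """Params function: look up the GDC UUID for the `sample` wildcard."""
    return UUIDS[wildcards.sample]


# Stand-in for the wildcards object that Snakemake passes to params functions.
example = get_uuid(SimpleNamespace(sample="tumor_A"))
print(example)
```

In the rule, the function would be referenced without calling it, i.e. `uuid=get_uuid`.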

Authors
  • David Lähnemann
Code
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
from tempfile import TemporaryDirectory
import glob

uuid = snakemake.params.get("uuid", "")
if uuid == "":
    raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

extra = snakemake.params.get("extra", "")
token = snakemake.params.get("gdc_token", "")
if token != "":
    token = "--token-file {}".format(token)

with TemporaryDirectory() as tempdir:
    shell(
        "gdc-client download"
        " {token}"
        " {extra}"
        " -n {snakemake.threads} "
        " --log-file {snakemake.log} "
        " --dir {tempdir}"
        " {uuid}"
    )

    for out_path in snakemake.output:
        tmp_path = path.join(tempdir, uuid, path.basename(out_path))
        if not path.exists(tmp_path):
            (root, ext1) = path.splitext(out_path)
            paths = glob.glob(path.join(tempdir, uuid, "*" + ext1))
            if len(paths) > 1:
                (root, ext2) = path.splitext(root)
                paths = glob.glob(path.join(tempdir, uuid, "*" + ext2 + ext1))
            if len(paths) == 0:
                raise ValueError(
                    "{} file extension {} does not match any downloaded file.\n"
                    "Are you sure that UUID {} provides a file of such format?\n".format(
                        out_path, ext1, uuid
                    )
                )
            if len(paths) > 1:
                raise ValueError(
                    "Found more than one downloaded file with extension '{}':\n"
                    "{}\n"
                    "Cannot match requested output file {} unambiguously.\n".format(
                        ext2 + ext1, paths, out_path
                    )
                )
            tmp_path = paths[0]
        shell("mv {tmp_path} {out_path}")
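
The extension matching in the loop above can be hard to follow inline. The sketch below isolates it: it first tries the last extension component and falls back to the last two if that is ambiguous. It works on a list of file names instead of a glob over the download directory and omits the initial exact-name check. `match_downloaded` and all file names are hypothetical.

```python
import fnmatch
import os.path as path


def match_downloaded(out_path, downloaded):
    """Pick the downloaded file whose extension matches `out_path`."""
    root, ext1 = path.splitext(out_path)
    pattern = "*" + ext1
    matches = fnmatch.filter(downloaded, pattern)
    if len(matches) > 1:
        # Ambiguous: retry with the last two extension components.
        ext2 = path.splitext(root)[1]
        pattern = "*" + ext2 + ext1
        matches = fnmatch.filter(downloaded, pattern)
    if len(matches) != 1:
        raise ValueError(
            "{} file(s) match pattern {} for {}".format(
                len(matches), pattern, out_path
            )
        )
    return matches[0]


maf = match_downloaded("raw/s1.maf.gz", ["abc.maf.gz", "abc.vcf.gz"])
bam = match_downloaded("raw/s1.bam", ["x.bam", "x.bam.bai", "x.annotations.txt"])
bai = match_downloaded("raw/s1.bam.bai", ["x.bam", "x.bam.bai", "x.annotations.txt"])
print(maf, bam, bai)
```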

GENOMEPY

Download genomes the easy way: https://github.com/vanheeringen-lab/genomepy

Software dependencies
  • bioconda::genomepy==0.8.3
Example

This wrapper can be used in the following way:

rule genomepy:
    output:
        multiext("{assembly}/{assembly}", ".fa", ".fa.fai", ".fa.sizes", ".gaps.bed",
                 ".annotation.gtf.gz", ".blacklist.bed")
    log:
        "logs/genomepy_{assembly}.log"
    params:
        provider="UCSC"  # optional, defaults to ucsc. Choose from ucsc, ensembl, and ncbi
    cache: True  # mark as eligible for between workflow caching
    wrapper:
        "0.65.0/bio/genomepy"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Maarten van der Sande
Code
__author__ = "Maarten van der Sande"
__copyright__ = "Copyright 2020, Maarten van der Sande"
__email__ = "M.vanderSande@science.ru.nl"
__license__ = "MIT"


from snakemake.shell import shell

# Optional parameters
provider = snakemake.params.get("provider", "UCSC")

# set options for plugins
all_plugins = "blacklist,bowtie2,bwa,gmap,hisat2,minimap2,star"
req_plugins = ","
if any(["blacklist" in out for out in snakemake.output]):
    req_plugins = "blacklist,"

annotation = ""
if any(["annotation" in out for out in snakemake.output]):
    annotation = "--annotation"

# parse the genome dir
genome_dir = "./"
if snakemake.output[0].count("/") > 1:
    genome_dir = "/".join(snakemake.output[0].split("/")[:-1])

log = snakemake.log

# Finally execute genomepy
shell(
    """
    # set a trap so we can reset to original user's settings
    active_plugins=$(genomepy config show | grep -Po '(?<=- ).*' | paste -s -d, -) || echo ""
    trap "genomepy plugin disable {{{all_plugins}}} >> {log} 2>&1;\
          genomepy plugin enable {{$active_plugins,}} >> {log} 2>&1" EXIT

    # disable all, then enable the ones we need
    genomepy plugin disable {{{all_plugins}}} >  {log} 2>&1
    genomepy plugin enable  {{{req_plugins}}} >> {log} 2>&1

    # install the genome
    genomepy install {snakemake.wildcards.assembly} \
    {provider} {annotation} -g {genome_dir} >> {log} 2>&1
    """
)
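
The wrapper derives its genomepy options from the names of the requested output files. The following sketch isolates that step so the mapping is easy to see. `genomepy_options` is a hypothetical helper and `hg38` is just an example assembly name.

```python
def genomepy_options(outputs):
    """Derive the plugin list and annotation flag from the output file names."""
    req_plugins = ","
    if any("blacklist" in out for out in outputs):
        req_plugins = "blacklist,"
    annotation = ""
    if any("annotation" in out for out in outputs):
        annotation = "--annotation"
    return req_plugins, annotation


full = genomepy_options(
    ["hg38/hg38.fa", "hg38/hg38.annotation.gtf.gz", "hg38/hg38.blacklist.bed"]
)
minimal = genomepy_options(["hg38/hg38.fa", "hg38/hg38.fa.fai"])
print(full)
print(minimal)
```

Dropping `.annotation.gtf.gz` or `.blacklist.bed` from the rule's outputs therefore also skips the corresponding download step.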

GRIDSS

For gridss, the following wrappers are available:

GRIDSS ASSEMBLE

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The assemble step performs GRIDSS breakend assembly. Documentation at: https://github.com/PapenfussLab/gridss

Software dependencies
  • gridss ==2.9.4
Example

This wrapper can be used in the following way:

WORKING_DIR = "working_dir"
samples = ["A", "B"]

preprocess_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".idsv_metrics",
    ".insert_size_histogram.pdf",
    ".insert_size_metrics",
    ".mapq_metrics",
    ".sv.bam",
    ".sv.bam.bai",
    ".sv_metrics",
    ".tag_metrics",
    )

assembly_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".downsampled_0.bed",
    ".excluded_0.bed",
    ".idsv_metrics",
    ".mapq_metrics",
    ".quality_distribution.pdf",
    ".quality_distribution_metrics",
    ".subsetCalled_0.bed",
    ".sv.bam",
    ".sv.bam.bai",
    ".tag_metrics",
    )

reference_index_endings = (".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")

rule gridss_assemble:
    input:
        bams=expand("mapped/{sample}.bam", sample=samples),
        bais=expand("mapped/{sample}.bam.bai", sample=samples),
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        indices=multiext("reference/genome.fasta", *reference_index_endings),
        preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings)
    output:
        assembly="assembly/group.bam",
        assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/assemble/group.log"
    threads:
        100
    wrapper:
        "0.65.0/bio/gridss/assemble"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
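
The `preprocess` input in the rule above expands to one file per combination of sample and ending. The plain-Python sketch below shows the kind of list this describes. It does not use Snakemake's `expand`, the order of entries is not meant to mirror it, and the list of endings is shortened for readability.

```python
from itertools import product

WORKING_DIR = "working_dir"
samples = ["A", "B"]
preprocess_endings = (".sv.bam", ".sv.bam.bai")  # shortened for the example

pattern = "{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}"
preprocess_files = [
    pattern.format(working_dir=WORKING_DIR, sample=sample, ending=ending)
    for sample, ending in product(samples, preprocess_endings)
]

for file_name in preprocess_files:
    print(file_name)
```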

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss assemble"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference")

if not snakemake.params.workingdir:
    raise ValueError("Please set params.workingdir to provide a working directory.")

if not snakemake.input.reference:
    raise ValueError("Please set input.reference to provide reference genome.")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(gridss -s assemble "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "  # Threads
    "--workingdir {snakemake.params.workingdir} "  # Working directory
    "--assembly {snakemake.output.assembly} "  # Assembly output
    "{snakemake.input.bams} "
    "{extra}) {log}"
)
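
The GRIDSS wrappers check that the bwa index files and the sequence dictionary exist before calling the tool. A sketch of that check as a function that reports all missing files at once follows. The helper `missing_reference_files` is hypothetical and the example runs against a temporary directory.

```python
import os
import tempfile

BWA_INDEX_ENDINGS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_reference_files(reference, endings=BWA_INDEX_ENDINGS):
    """Return the expected companion files of `reference` that do not exist."""
    expected = [reference + ending for ending in endings]
    # The dictionary replaces the FASTA extension instead of extending it.
    expected.append(os.path.splitext(reference)[0] + ".dict")
    return [file_name for file_name in expected if not os.path.exists(file_name)]


with tempfile.TemporaryDirectory() as tmp:
    reference = os.path.join(tmp, "genome.fasta")
    # Create everything except the .sa index and the dictionary.
    for ending in (".amb", ".ann", ".bwt", ".pac"):
        open(reference + ending, "w").close()
    missing = [os.path.basename(f) for f in missing_reference_files(reference)]

print(missing)
```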
GRIDSS CALL

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The call step performs variant calling. Documentation at: https://github.com/PapenfussLab/gridss

Software dependencies
  • gridss ==2.9.4
  • cpulimit =0.2
Example

This wrapper can be used in the following way:

WORKING_DIR = "working_dir"
samples = ["A", "B"]

preprocess_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".idsv_metrics",
    ".insert_size_histogram.pdf",
    ".insert_size_metrics",
    ".mapq_metrics",
    ".sv.bam",
    ".sv.bam.bai",
    ".sv_metrics",
    ".tag_metrics",
    )

assembly_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".downsampled_0.bed",
    ".excluded_0.bed",
    ".idsv_metrics",
    ".mapq_metrics",
    ".quality_distribution.pdf",
    ".quality_distribution_metrics",
    ".subsetCalled_0.bed",
    ".sv.bam",
    ".sv.bam.bai",
    ".tag_metrics",
    )

reference_index_endings = (".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")

rule gridss_call:
    input:
        bams=expand("mapped/{sample}.bam", sample=samples),
        bais=expand("mapped/{sample}.bam.bai", sample=samples),
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        indices=multiext("reference/genome.fasta", *reference_index_endings),
        preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings),
        assembly="assembly/group.bam",
        assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
    output:
        vcf="vcf/group.vcf",
        idx="vcf/group.vcf.idx",
        tmpidx=temp(WORKING_DIR + "/group.vcf.gridss.working/group.vcf.allocated.vcf.idx") # be aware the group occurs two times here
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/call/group.log"
    threads:
        100
    wrapper:
        "0.65.0/bio/gridss/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss call"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
    raise ValueError("Please set params.workingdir to provide a working directory.")

if not snakemake.input.reference:
    raise ValueError("Please set input.reference to provide reference genome.")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(export JAVA_OPTS='-XX:ActiveProcessorCount={snakemake.threads}' && "
    "gridss -s call "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "  # Threads
    "--workingdir {snakemake.params.workingdir} "  # Working directory
    "--assembly {snakemake.input.assembly} "  # Assembly input from gridss assemble
    "--output {snakemake.output.vcf} "  # Output VCF
    "{snakemake.input.bams} "
    "{extra}) {log}"
)
GRIDSS PREPROCESS

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The preprocess step pre-processes input BAM files and can be run per input file. Documentation at: https://github.com/PapenfussLab/gridss

Software dependencies
  • gridss ==2.9.4
Example

This wrapper can be used in the following way:

WORKING_DIR="working_dir"

rule gridss_preprocess:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        refindex=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
    output:
        multiext("{WORKING_DIR}/{sample}.bam.gridss.working/{sample}.bam", ".cigar_metrics", ".coverage.blacklist.bed", ".idsv_metrics", ".insert_size_histogram.pdf", ".insert_size_metrics", ".mapq_metrics", ".sv.bam", ".sv.bam.bai", ".sv_metrics", ".tag_metrics")
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/preprocess/{WORKING_DIR}/{sample}.preprocess.log"
    threads:
        8
    wrapper:
        "0.65.0/bio/gridss/preprocess"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss preprocess"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
    raise ValueError("Please set params.workingdir to provide a working directory.")

if not snakemake.input.reference:
    raise ValueError("Please set input.reference to provide reference genome.")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(gridss -s preprocess "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "
    "--workingdir {snakemake.params.workingdir} "
    "{snakemake.input.bam} "
    "{extra}) {log}"
)
GRIDSS SETUPREFERENCE

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. setupreference is a one-off setup step that generates additional files in the same directory as the reference. WARNING: multiple instances of GRIDSS attempting to perform setupreference at the same time will result in file corruption. Make sure these files are generated before running parallel GRIDSS jobs. Documentation at: https://github.com/PapenfussLab/gridss

Software dependencies
  • gridss ==2.9.4
Example

This wrapper can be used in the following way:

rule gridss_setupreference:
    input:
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        indices=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa")
    output:
        multiext("reference/genome.fasta", ".gridsscache", ".img")
    params:
        extra="--jvmheap 1g"
    log:
        "log/gridss/setupreference.log"
    wrapper:
        "0.65.0/bio/gridss/setupreference"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss setupreference"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference", None)

if not snakemake.input.reference:
    raise ValueError("A reference genome has to be provided!")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(gridss -s setupreference "  # Tool
    "--reference {reference} "  # Reference
    "{extra}) {log}"
)

HAP.PY

For hap.py, the following wrappers are available:

PRE.PY

Preprocessing/normalisation of vcf/bcf files. Part of the hap.py suite by Illumina (see https://github.com/Illumina/hap.py/blob/master/doc/normalisation.md).

Software dependencies
  • hap.py =0.3.10
Example

This wrapper can be used in the following way:

rule preprocess_variants:
    input:
        ##vcf/bcf
        variants="variants.vcf"
    output:
        "normalized/variants.vcf"
    params:
        ## path to reference genome
        genome="genome.fasta",
        ## parameters such as -L to left-align variants
        extra="-L"
    threads: 2
    wrapper:
        "0.65.0/bio/hap.py/pre.py"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "(pre.py"
    " --threads {snakemake.threads}"
    " -r {snakemake.params.genome}"
    " {extra}"
    " {snakemake.input.variants}"
    " {snakemake.output})"
    " {log}"
)
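
Most wrappers on this page append a log redirection to a command grouped in parentheses. The sketch below is a simplified stand-in for the string that snakemake.log_fmt_shell is understood to produce for the two combinations used here (stdout and stderr together, or stderr only); it is an assumption for illustration, not Snakemake's own code, and the names log_redirect and wrap_command are hypothetical.

```python
def log_redirect(log, stdout=True, stderr=True):
    """Build a shell redirection to the log file, or "" if no log is set."""
    if not log:
        return ""
    if stdout and stderr:
        return "> {} 2>&1".format(log)
    if stderr:
        return "2> {}".format(log)
    if stdout:
        return "> {}".format(log)
    raise ValueError("at least one of stdout and stderr must be captured")


def wrap_command(command, log, stdout=True, stderr=True):
    """Group the command in a subshell so the redirection covers all of it."""
    return "({}) {}".format(command, log_redirect(log, stdout, stderr)).rstrip()


print(wrap_command("pre.py in.vcf out.vcf", "logs/pre.log", stdout=False))
```

The parentheses matter most for piped commands: without them, the redirection would only apply to the last command of the pipe.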

HISAT2

For hisat2, the following wrappers are available:

HISAT2 ALIGN

Map reads with hisat2.

Software dependencies
  • hisat2 ==2.1.0
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule hisat2_align:
    input:
        idx="index/",
        reads=["reads/{sample}_R1.fastq", "reads/{sample}_R2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/hisat2_align_{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "0.65.0/bio/hisat2/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The -S flag must not be used since output is already directly piped to samtools for compression.
  • The --threads/-p flag must not be used since threads is set separately via the snakemake threads directive.
  • The wrapper does not yet handle SRA input accessions.
  • The reference index files are not checked, since the actual number of files may differ depending on the reference sequence size. In the example and code shown here, the index is passed as idx in the input directive.
Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()

# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
    input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
    input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
    input_flags = "-1 {0} -2 {1}".format(*reads)
else:
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2 input files."
    )

# Executed shell command
shell(
    "(hisat2 {extra} "
    "--threads {snakemake.threads} "
    " -x {snakemake.input.idx} {input_flags} "
    " | samtools view -Sbh -o {snakemake.output[0]} -) "
    " {log}"
)
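
The choice between unpaired and paired input made by this wrapper can be sketched as a small function. This is a minimal illustration, assuming reads is either a single path or a list of paths; the name hisat2_input_flags is hypothetical.

```python
def hisat2_input_flags(reads):
    """Return -U for one read file and -1/-2 for a pair of read files."""
    if isinstance(reads, str):
        reads = [reads]
    if len(reads) == 1:
        return "-U {0}".format(reads[0])
    if len(reads) == 2:
        return "-1 {0} -2 {1}".format(*reads)
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2 input files."
    )


print(hisat2_input_flags("reads/a.fastq"))
print(hisat2_input_flags(["reads/a_R1.fastq", "reads/a_R2.fastq"]))
```
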
HISAT2 INDEX

Create index with hisat2.

Software dependencies
  • hisat2 ==2.1.0
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule hisat2_index:
    input:
        fasta = "{genome}.fasta"
    output:
        directory("index_{genome}")
    params:
        prefix = "index_{genome}/"
    log:
        "logs/hisat2_index_{genome}.log"
    threads: 2
    wrapper:
        "0.65.0/bio/hisat2/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for HISAT2 index"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

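# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file or a sequence is required"
fasta_entries = [fasta] if isinstance(fasta, str) else list(fasta)
input_seq = ""
# Entries without a file extension are treated as literal sequences (-c)
if not any("." in entry for entry in fasta_entries):
    input_seq += "-c "
input_seq += ",".join(fasta_entries)

hisat_dir = snakemake.params.get("prefix", "")
if hisat_dir:
    # Do not fail if the directory already exists
    os.makedirs(hisat_dir, exist_ok=True)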

shell(
    "hisat2-build {extra} "
    "-p {snakemake.threads} "
    "{input_seq} "
    "{snakemake.params.prefix} "
    "{log}"
)

HMMER

For hmmer, the following wrappers are available:

HMMBUILD

hmmbuild: construct profile HMM(s) from multiple sequence alignment(s)

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmbuild_profile:
    input:
        "test-profile.sto"
    output:
        "test-profile.hmm"
    log:
        "logs/test-profile-hmmbuild.log"
    params:
        extra="",
    threads: 4
    wrapper:
        "0.65.0/bio/hmmer/hmmbuild"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmbuild"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    " hmmbuild {extra} --cpu {snakemake.threads} "
    " {snakemake.output} {snakemake.input} {log} "
)
HMMPRESS

Format an HMM database into a binary format for hmmscan.

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmpress_profile:
    input:
        "test-profile.hmm"
    output:
        "test-profile.hmm.h3f",
        "test-profile.hmm.h3i",
        "test-profile.hmm.h3m",
        "test-profile.hmm.h3p"
    log:
        "logs/hmmpress.log"
    params:
        extra="",
    threads: 4
    wrapper:
        "0.65.0/bio/hmmer/hmmpress"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmpress"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# -f Force; overwrites any previous hmmpress-ed datafiles. By default, hmmpress complains about existing files and asks you to delete them first.

shell("hmmpress -f {snakemake.input} {log}")
HMMSCAN

Search protein sequence(s) against a protein profile database.

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmscan_profile:
    input:
        fasta="test-protein.fa",
        profile="test-profile.hmm.h3f",
    output:
        # only one of these is required
        tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
        domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
        pfamtblout="test-prot-pfamtbl.txt", # save table of hits and domains to file, in Pfam format <f>
        outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
    log:
        "logs/hmmscan.log"
    params:
        evalue_threshold=0.00001,
        # if bitscore threshold provided, hmmscan will use that instead
        #score_threshold=50,
        extra="",
    threads: 4
    wrapper:
        "0.65.0/bio/hmmer/hmmscan"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmscan"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")

profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '

# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)

# save table of hits and domains to file, in Pfam format <f>
pfamtblout = snakemake.output.get("pfamtblout", "")
if pfamtblout:
    out_cmd += " --pfamtblout {} ".format(pfamtblout)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report models <= this E-value threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
# report models >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))

# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "hmmscan {out_cmd} {thresh_cmd} --cpu {snakemake.threads}"
    " {extra} {profile} {snakemake.input.fasta} {log}"
)
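
The threshold handling above can be sketched in isolation: a bit score threshold, if given, takes precedence over the E-value threshold. The function name threshold_flag is hypothetical and only serves this illustration.

```python
def threshold_flag(evalue_threshold=0.00001, score_threshold=""):
    """Return '-T <score>' if a bit score threshold is set, else '-E <evalue>'."""
    if score_threshold:
        return "-T {}".format(float(score_threshold))
    return "-E {}".format(float(evalue_threshold))


print(threshold_flag())
print(threshold_flag(score_threshold=50))
```

Because the check relies on truthiness, a score threshold of 0 is treated like a missing value and the E-value threshold is used instead.
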
HMMSEARCH

Search profile(s) against a sequence database.

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmsearch_profile:
    input:
        fasta="test-protein.fa",
        profile="test-profile.hmm.h3f",
    output:
        # only one of these is required
        tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
        domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
        alignment_hits="test-prot-alignment-hits.txt", # Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
        outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
    log:
        "logs/hmmsearch.log"
    params:
        evalue_threshold=0.00001,
        # if bitscore threshold provided, hmmsearch will use that instead
        #score_threshold=50,
        extra="",
    threads: 4
    wrapper:
        "0.65.0/bio/hmmer/hmmsearch"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmsearch"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")

profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '

# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)

# Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
alignment_hits = snakemake.output.get("alignment_hits", "")
if alignment_hits:
    out_cmd += " -A {} ".format(alignment_hits)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report sequences <= this E-value threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
# report sequences >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))

# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    " hmmsearch --cpu {snakemake.threads} "
    " {out_cmd} {thresh_cmd} {extra} {profile} "
    " {snakemake.input.fasta} {log}"
)
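
The repeated if-blocks that assemble the output options can also be expressed as a lookup table. The following is a sketch under the assumption that the named outputs are available as a plain dictionary; OUTPUT_FLAGS and build_out_cmd are hypothetical names, while the keywords and flags are the ones used by the wrapper above.

```python
# Output keyword -> hmmsearch flag, as used in the wrapper code.
OUTPUT_FLAGS = {
    "outfile": "-o",
    "tblout": "--tblout",
    "domtblout": "--domtblout",
    "alignment_hits": "-A",
}


def build_out_cmd(outputs):
    """Join a flag/path pair for every named output that was requested."""
    parts = []
    for key, flag in OUTPUT_FLAGS.items():
        target = outputs.get(key, "")
        if target:
            parts.append("{} {}".format(flag, target))
    return " ".join(parts)


print(build_out_cmd({"tblout": "hits.tbl", "alignment_hits": "hits.sto"}))
```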

HOMER

For homer, the following wrappers are available:

HOMER FINDPEAKS

Find ChIP- or ATAC-Seq peaks with the HOMER suite. For more information, please see the documentation.

Software dependencies
  • homer ==4.11
Example

This wrapper can be used in the following way:

rule homer_findPeaks:
    input:
        # tagDirectory of sample
        tag="tagDir/{sample}",
        # tagDirectory of control background sample - optional
        control="tagDir/control"
    output:
        "{sample}_peaks.txt"
    params:
        # one of 7 basic modes of operation, see homer manual
        style="histone",
        extra=""  # optional params, see homer manual
    log:
        "logs/findPeaks/{sample}.log"
    wrapper:
        "0.65.0/bio/homer/findPeaks"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

control = snakemake.input.get("control", "")
if control == "":
    control_command = ""
else:
    control_command = "-i " + control

shell(
    "(findPeaks"
    " {snakemake.input.tag}"
    " -style {snakemake.params.style}"
    " {extra}"
    " {control_command}"
    " -o {snakemake.output})"
    " {log}"
)
HOMER GETDIFFERENTIALPEAKS

Detect differentially bound ChIP peaks between samples. For more information, please see the documentation.

Software dependencies
  • homer ==4.11
Example

This wrapper can be used in the following way:

rule homer_getDifferentialPeaks:
    input:
        # peak/bed file to be tested
        peaks="{sample}.peaks.bed",
        # tagDirectory of first sample
        first="tagDir/{sample}",
        # tagDirectory of sample to compare
        second="tagDir/second"
    output:
        "{sample}_diffPeaks.txt"
    params:
        extra=""  # optional params, see homer manual
    log:
        "logs/diffPeaks/{sample}.log"
    wrapper:
        "0.65.0/bio/homer/getDifferentialPeaks"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(getDifferentialPeaks"
    " {snakemake.input.peaks}"
    " {snakemake.input.first}"
    " {snakemake.input.second}"
    " {extra}"
    " > {snakemake.output})"
    " {log}"
)
HOMER MAKETAGDIRECTORY

Create a tag directory with the HOMER suite. For more information, please see the documentation.

Software dependencies
  • homer ==4.11
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule homer_makeTagDir:
    input:
        # input bam, can be one or a list of files
        bam="{sample}.bam",
    output:
        directory("tagDir/{sample}")
    params:
        extra=""  # optional params, see homer manual
    log:
        "logs/makeTagDir/{sample}.log"
    wrapper:
        "0.65.0/bio/homer/makeTagDirectory"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(makeTagDirectory {snakemake.output} {extra} {snakemake.input}) {log}"
)
HOMER MERGEPEAKS

Merge ChIP-Seq peaks from multiple peak files. For more information, please see the documentation. Please be aware that this wrapper does not yet support use of the -prefix parameter.

Software dependencies
  • homer ==4.11
Example

This wrapper can be used in the following way:

rule homer_mergePeaks:
    input:
        # input peak files
        "peaks/{sample1}.peaks",
        "peaks/{sample2}.peaks"
    output:
        "merged/{sample1}_{sample2}.peaks"
    params:
        extra="-d given"  # optional params, see homer manual
    log:
        "logs/mergePeaks/{sample1}_{sample2}.log"
    wrapper:
        "0.65.0/bio/homer/mergePeaks"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


class PrefixNotSupportedError(Exception):
    pass


if "-prefix" in extra:
    raise PrefixNotSupportedError(
        "The use of the -prefix parameter is not yet supported in this wrapper"
    )

shell("(mergePeaks" " {snakemake.input}" " {extra}" " > {snakemake.output})" " {log}")
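
The guard above uses a substring test, which also rejects any extra argument that merely contains the text -prefix. A token-based variant is sketched below; it swaps in shlex.split from the Python standard library and is a variation for illustration, not the wrapper's behavior. The function name validate_extra is hypothetical.

```python
import shlex


class PrefixNotSupportedError(Exception):
    """Raised when the unsupported -prefix option is requested."""


def validate_extra(extra):
    """Reject -prefix only when it appears as an option of its own."""
    if "-prefix" in shlex.split(extra):
        raise PrefixNotSupportedError(
            "The use of the -prefix parameter is not yet supported in this wrapper"
        )
    return extra


# The second token only contains the text "-prefix" and is accepted here.
print(validate_extra("-d given some-prefixed-name"))
```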

IGV-REPORTS

Create self-contained igv.js HTML pages.

Software dependencies
  • igv-reports =0.9.1
Example

This wrapper can be used in the following way:

rule igv_report:
    input:
        fasta="minigenome.fa",
        vcf="variants.vcf",
        # any number of additional optional tracks, see igv-reports manual
        tracks=["alignments.bam"]
    output:
        "igv-report.html"
    params:
        extra=""  # optional params, see igv-reports manual
    log:
        "logs/igv-report.log"
    wrapper:
        "0.65.0/bio/igv-reports"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
"""Snakemake wrapper for igv-reports."""

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

tracks = snakemake.input.get("tracks", [])
if tracks:
    if isinstance(tracks, str):
        tracks = [tracks]
    tracks = "--tracks {}".format(" ".join(tracks))

shell(
    "create_report {extra} --standalone --output {snakemake.output[0]} {snakemake.input.vcf} {snakemake.input.fasta} {tracks} {log}"
)
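
How the optional tracks end up on the command line can be sketched as follows, assuming tracks is empty, a single path, or a list of paths. The function name tracks_argument is hypothetical.

```python
def tracks_argument(tracks):
    """Return '--tracks <file> ...' or an empty string if no tracks are given."""
    if not tracks:
        return ""
    if isinstance(tracks, str):
        tracks = [tracks]
    return "--tracks {}".format(" ".join(tracks))


print(tracks_argument(["alignments.bam"]))
```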

INFERNAL

For infernal, the following wrappers are available:

INFERNAL CMPRESS

Starting from a CM database <cmfile> in standard Infernal-1.1 format, construct binary compressed datafiles for cmscan. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

Software dependencies
  • infernal=1.1.2
Example

This wrapper can be used in the following way:

rule infernal_cmpress:
    input:
        "test-covariance-model.cm"
    output:
        "test-covariance-model.cm.i1i",
        "test-covariance-model.cm.i1f",
        "test-covariance-model.cm.i1m",
        "test-covariance-model.cm.i1p"
    log:
        "logs/cmpress.log"
    params:
        extra="",
    wrapper:
        "0.65.0/bio/infernal/cmpress"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Infernal CMpress"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# -F enables overwrite of old (otherwise cmpress will fail if old versions exist)
shell("cmpress -F {snakemake.input} {log}")
INFERNAL CMSCAN

cmscan is used to search sequences against collections of covariance models that have been prepared with cmpress. The output format is designed to be human-readable, but is often so voluminous that reading it is impractical, and parsing it is a pain. The --tblout option saves output in a simple tabular format that is concise and easier to parse. The -o option allows redirecting the main output, including throwing it away in /dev/null. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

Software dependencies
  • infernal=1.1.2
Example

This wrapper can be used in the following way:

rule cmscan_profile:
    input:
        fasta="test-transcript.fa",
        profile="test-covariance-model.cm.i1i"
    output:
        tblout="tr-infernal-tblout.txt",
    log:
        "logs/cmscan.log"
    params:
        evalue_threshold=10, # In the per-target output, report target sequences with an E-value of <= <x>. default=10.0 (on average, ~10 false positives reported per query)
        extra= "",
        #score_threshold=50, # Instead of thresholding per-CM output on E-value, report target sequences with a bit score of >= <x>.
    threads: 4
    wrapper:
        "0.65.0/bio/infernal/cmscan"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Infernal CMscan"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")
profile = profile.rsplit(".i", 1)[0]

assert profile.endswith(".cm"), 'your profile file should end with ".cm"'

# direct output to file <f>, not stdout
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of hits to file <s>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)

# report <= this evalue threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 10)  # use cmscan default
# report >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = f" -T {float(score_threshold)} "
else:
    thresh_cmd = f" -E {float(evalue_threshold)} "

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cmscan {out_cmd} {thresh_cmd} {extra} --cpu {snakemake.threads} {profile} {snakemake.input.fasta} {log}"
)
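
The rule declares one of the files written by cmpress as input, while cmscan itself is called with the path of the original .cm file. The derivation of that path can be sketched as below; the function name cm_path_from_index is hypothetical.

```python
def cm_path_from_index(index_file):
    """Strip the cmpress suffix (.i1i, .i1f, .i1m, .i1p) to get the .cm path."""
    profile = index_file.rsplit(".i", 1)[0]
    assert profile.endswith(".cm"), 'your profile file should end with ".cm"'
    return profile


print(cm_path_from_index("test-covariance-model.cm.i1i"))
```

Splitting from the right means that a ".i" occurring earlier in the file name does not interfere with the result.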

JANNOVAR

Annotate predicted effect of nucleotide changes with Jannovar

Software dependencies
  • jannovar-cli ==0.31
Example

This wrapper can be used in the following way:

rule jannovar:
    input:
        vcf="{sample}.vcf",
        pedigree="pedigree_ar.ped" # optional, contains familial relationships
    output:
        "jannovar/{sample}.vcf.gz"
    log:
        "logs/jannovar/{sample}.log"
    params:
        database="hg19_small.ser", # path to jannovar reference dataset
        extra="--show-all"         # optional parameters
    wrapper:
        "0.65.0/bio/jannovar"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Bradford Powell
Code
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

pedigree = snakemake.input.get("pedigree", "")
if pedigree:
    pedigree = '--pedigree-file "%s"' % pedigree

shell(
    "jannovar annotate-vcf --database {snakemake.params.database}"
    " --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
    " {pedigree} {extra} {log}"
)
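
The wrapper puts the pedigree path into double quotes so that paths containing spaces survive the shell. A variation using shlex.quote from the Python standard library is sketched below; it is an alternative for illustration, not what the wrapper does, and the function name pedigree_argument is hypothetical.

```python
import shlex


def pedigree_argument(pedigree=""):
    """Return the --pedigree-file option, or "" if no pedigree is given."""
    if not pedigree:
        return ""
    # shlex.quote only adds quotes when the path needs them.
    return "--pedigree-file {}".format(shlex.quote(pedigree))


print(pedigree_argument("pedigree_ar.ped"))
print(pedigree_argument("my family.ped"))
```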

KALLISTO

For kallisto, the following wrappers are available:

KALLISTO INDEX

Index a transcriptome using kallisto.

Software dependencies
  • kallisto ==0.45.0
Example

This wrapper can be used in the following way:

rule kallisto_index:
    input:
        fasta = "{transcriptome}.fasta"
    output:
        index = "{transcriptome}.idx"
    params:
        extra = "--kmer-size=5"
    log:
        "logs/kallisto_index_{transcriptome}.log"
    threads: 1
    wrapper:
        "0.65.0/bio/kallisto/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for Kallisto index"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file is required"
fasta = " ".join(fasta) if isinstance(fasta, list) else fasta

shell(
    "kallisto index "  # Tool
    "{extra} "  # Optional parameters
    "--index={snakemake.output.index} "  # Output file
    "{fasta} "  # Input FASTA files
    "{log}"  # Logging
)
KALLISTO QUANT

Pseudoalign reads and quantify transcripts using kallisto.

Software dependencies
  • kallisto ==0.45.0
Example

This wrapper can be used in the following way:

rule kallisto_quant:
    input:
        fastq = ["reads/{exp}_R1.fastq", "reads/{exp}_R2.fastq"],
        index = "index/transcriptome.idx"
    output:
        directory('quant_results_{exp}')
    params:
        extra = ""
    log:
        "logs/kallisto_quant_{exp}.log"
    threads: 1
    wrapper:
        "0.65.0/bio/kallisto/quant"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for Kallisto quant"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Allowing for multiple FASTQ files
fastq = snakemake.input.get("fastq")
assert fastq is not None, "input-> a FASTQ-file is required"
fastq = " ".join(fastq) if isinstance(fastq, list) else fastq

shell(
    "kallisto quant "  # Tool
    "{extra} "  # Optional parameters
    "--threads={snakemake.threads} "  # Number of threads
    "--index={snakemake.input.index} "  # Input file
    "--output-dir={snakemake.output} "  # Output directory
    "{fastq} "  # Input FASTQ files
    "{log}"  # Logging
)
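
To see which command line results from a given rule, the assembly above can be reproduced as a plain function. This is a sketch with a hypothetical function name, kallisto_quant_command, and without the log redirection; the options are the ones used by the wrapper.

```python
def kallisto_quant_command(fastq, index, output_dir, threads=1, extra=""):
    """Assemble the kallisto quant call in the same order as the wrapper."""
    fastq = " ".join(fastq) if isinstance(fastq, list) else fastq
    parts = [
        "kallisto quant",
        extra,  # optional parameters
        "--threads={}".format(threads),
        "--index={}".format(index),
        "--output-dir={}".format(output_dir),
        fastq,  # one or more FASTQ files
    ]
    # Skip empty parts so that no double spaces appear in the command.
    return " ".join(part for part in parts if part)


print(kallisto_quant_command(["r1.fq", "r2.fq"], "tx.idx", "quant", threads=4))
```

For single-end reads, kallisto additionally expects --single together with -l and -s, which would be passed via extra, for example extra="--single -l 200 -s 20".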

LAST

For last, the following wrappers are available:

LASTAL

LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)

Software dependencies
  • last=874
Example

This wrapper can be used in the following way:

rule lastal_nucl_x_nucl:
    input:
        data="test-transcript.fa",
        lastdb="test-transcript.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-transcript.maf",
        #tab="test-transcript.tab",
        #blasttab="test-transcript.blasttab",
        #blasttabplus="test-transcript.blasttabplus",
    params:
        #Report alignments that are expected by chance at most once per LENGTH query letters. By default, LAST reports alignments that are expected by chance at most once per million query letters (for a given database). http://last.cbrc.jp/doc/last-evalues.html
        D_length=1000000,
        extra=""
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "0.65.0/bio/last/lastal"

rule lastal_nucl_x_prot:
    input:
        data="test-transcript.fa",
        lastdb="test-protein.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-tr-x-prot.maf"
        #tab="test-tr-x-prot.tab",
        #blasttab="test-tr-x-prot.blasttab",
        #blasttabplus="test-tr-x-prot.blasttabplus",
    params:
        frameshift_cost=15, # Align DNA queries to protein reference sequences using the specified frameshift cost. 15 is reasonable. As a special case, -F0 means DNA-versus-protein alignment without frameshifts, which is faster.
        extra="",
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "0.65.0/bio/last/lastal"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
""" Snakemake wrapper for lastal """

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# http://last.cbrc.jp/doc/last-evalues.html
d_len = float(snakemake.params.get("D_length", 1000000))  # last default

# set output file formats
maf_out = snakemake.output.get("maf", "")
tab_out = snakemake.output.get("tab", "")
btab_out = snakemake.output.get("blasttab", "")
btabplus_out = snakemake.output.get("blasttabplus", "")
outfiles = [maf_out, tab_out, btab_out, btabplus_out]
# TAB, MAF, BlastTab, BlastTab+ (default=MAF)
assert (
    list(map(bool, outfiles)).count(True) == 1
), "please specify ONE output file using one of: 'maf', 'tab', 'blasttab', or 'blasttabplus' keywords in the output field)"

out_cmd = ""

if maf_out:
    out_cmd = "-f {}".format("MAF")
    outF = maf_out
elif tab_out:
    out_cmd = "-f {}".format("TAB")
    outF = tab_out
elif btab_out:
    out_cmd = "-f {}".format("BlastTab")
    outF = btab_out
elif btabplus_out:
    out_cmd = "-f {}".format("BlastTab+")
    outF = btabplus_out

# optional frameshift cost for DNA-versus-protein alignment
f_cmd = ""
frameshift_cost = snakemake.params.get("frameshift_cost", "")
if frameshift_cost:
    f_cmd = f"-F {frameshift_cost}"


lastdb_name = str(snakemake.input["lastdb"]).rsplit(".", 1)[0]

shell(
    "lastal -D {d_len} -P {snakemake.threads} {out_cmd} {f_cmd} {extra} {lastdb_name} {snakemake.input.data} > {outF} {log}"
)
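
The wrapper accepts exactly one of the output keywords maf, tab, blasttab or blasttabplus and maps it to a value for lastal's -f option. The selection can be sketched outside of Snakemake as follows. This is a minimal illustration, not the wrapper's code: the function name pick_output_format is hypothetical, and a plain dict stands in for snakemake.output.

```python
# Hypothetical helper; a plain dict stands in for snakemake.output.
FORMATS = {
    "maf": "MAF",
    "tab": "TAB",
    "blasttab": "BlastTab",
    "blasttabplus": "BlastTab+",
}


def pick_output_format(outputs):
    """Return (format_flag, path) for the single output keyword that is set."""
    chosen = [(key, path) for key, path in outputs.items() if key in FORMATS and path]
    if len(chosen) != 1:
        raise ValueError("specify exactly one of: " + ", ".join(FORMATS))
    key, path = chosen[0]
    return "-f {}".format(FORMATS[key]), path


flag, out_path = pick_output_format({"tab": "test-transcript.tab"})
print(flag, out_path)
```

Passing two keywords at once, or none, raises a ValueError, which corresponds to the assertion in the wrapper.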
LASTDB

LAST finds similar regions between sequences and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads).

Software dependencies
  • last=874
Example

This wrapper can be used in the following way:

rule lastdb_transcript:
    input:
        "test-transcript.fa"
    output:
        "test-transcript.fa.prj",
    params:
        protein_input=False,
        extra=""
    log:
        "logs/lastdb/test-transcript.log"
    wrapper:
        "0.65.0/bio/last/lastdb"

rule lastdb_protein:
    input:
        "test-protein.fa"
    output:
        "test-protein.fa.prj",
    params:
        protein_input=True,
        extra=""
    log:
        "logs/lastdb/test-protein.log"
    wrapper:
        "0.65.0/bio/last/lastdb"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

protein_cmd = ""
protein = snakemake.params.get("protein_input", False)

if protein:
    protein_cmd = " -p "

shell("lastdb {extra} {protein_cmd} -P {snakemake.threads} {snakemake.input} {log}")
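
lastdb writes several files that share one base name, and the example rules track only the .prj file. The lastal wrapper recovers that base name by stripping the final extension from its lastdb input. A small sketch of this step, using a hypothetical helper name:

```python
def lastdb_base_name(prj_path):
    """Strip the final extension (e.g. '.prj') to obtain the LAST database name."""
    return str(prj_path).rsplit(".", 1)[0]


print(lastdb_base_name("test-transcript.fa.prj"))
```

Only the last extension is removed, so a database built from test-transcript.fa keeps the .fa part of its name.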

LOFREQ

For lofreq, the following wrappers are available:

LOFREQ CALL

Simply call variants.

Software dependencies
  • samtools ==1.6
  • lofreq ==2.1.3.1
Example

This wrapper can be used in the following way:

rule lofreq:
    input:
        bam="data/{sample}.bam",
        bai="data/{sample}.bai"
    output:
        "calls/{sample}.vcf"
    log:
        "logs/lofreq_call/{sample}.log"
    params:
        ref="data/genome.fasta",
        extra=""
    threads: 8
    wrapper:
        "0.65.0/bio/lofreq/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
ref = snakemake.params.get("ref", None)

if ref is None:
    raise ValueError("A reference must be provided")

bam_input = snakemake.input.bam
bai_input = snakemake.input.bai

if bam_input is None:
    raise ValueError("Missing bam input file!")

if bai_input is None:
    raise ValueError("Missing bai input file!")

output_file = snakemake.output[0]

if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(output_file) + "!")

shell(
    "lofreq call-parallel "
    " --pp-threads {snakemake.threads}"
    " -f {ref}"
    " {bam_input}"
    " -o {output_file}"
    " {extra}"
    " {log}"
)

MACS2

For macs2, the following wrappers are available:

MACS2 CALLPEAK

MACS2 callpeak is a model-based analysis tool for ChIP-sequencing that calls peaks from alignment results. For usage information about MACS2 callpeak, please see the documentation and the command line help. For more information about MACS2, also see the source code and published article. Depending on the selected output extension(s), the corresponding option(s) will be set automatically (please see the table below). Please note that some extensions are incompatible with each other, because they require the --broad option to be either enabled or disabled.

Extension for the output files  Description                                                                        Format    Option
NAME_peaks.xls                  a table with information about called peaks                                        excel
NAME_control_lambda.bdg         local biases estimated for each genomic location from the control sample           bedGraph  --bdg or -B
NAME_treat_pileup.bdg           pileup signals from treatment sample                                               bedGraph  --bdg or -B
NAME_peaks.broadPeak            similar to _peaks.narrowPeak file, except for missing the annotating peak summits  BED 6+3   --broad
NAME_peaks.gappedPeak           contains the broad region and narrow peaks                                         BED 12+3  --broad
NAME_peaks.narrowPeak           contains the peak locations, peak summit, p-value and q-value                      BED 6+4   if --broad is not set
NAME_summits.bed                peak summits locations for every peak                                              BED       if --broad is not set
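
The relation between output extensions and options in the table can be sketched as a small standalone function. This is an illustration under stated assumptions rather than the wrapper's implementation: the name implied_options and the constants are hypothetical, and a plain list of file names stands in for snakemake.output.

```python
# Hypothetical helper mirroring the table; not part of the wrapper itself.
BROAD_EXTS = ("_peaks.broadPeak", "_peaks.gappedPeak")
NARROW_EXTS = ("_peaks.narrowPeak", "_summits.bed")
BDG_EXTS = ("_treat_pileup.bdg", "_control_lambda.bdg")


def implied_options(outputs):
    """Return the MACS2 options implied by a list of output file names."""
    wants_broad = any(o.endswith(BROAD_EXTS) for o in outputs)
    wants_narrow = any(o.endswith(NARROW_EXTS) for o in outputs)
    if wants_broad and wants_narrow:
        raise ValueError(
            "narrowPeak/summits outputs cannot be combined with broadPeak/gappedPeak"
        )
    options = []
    if wants_broad:
        options.append("--broad")
    if any(o.endswith(BDG_EXTS) for o in outputs):
        options.append("--bdg")
    return options


print(
    implied_options(
        [
            "callpeak/basename_peaks.xls",
            "callpeak/basename_peaks.broadPeak",
            "callpeak/basename_treat_pileup.bdg",
        ]
    )
)
```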
Software dependencies
  • macs2>=2.2
Example

This wrapper can be used in the following way:

rule callpeak:
    input:
        treatment="samples/a.bam",   # required: treatment sample(s)
        control="samples/b.bam"      # optional: control sample(s)
    output:
        # all output files must share the same basename and differ only by their extension
        # Usable extensions (and which tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
        multiext("callpeak/basename",
                 "_peaks.xls",   ### required
                 ### optional output files
                 "_peaks.narrowPeak",
                 "_summits.bed"
                 )
    log:
        "logs/macs2/callpeak.log"
    params:
        "-f BAM -g hs --nomodel"
    wrapper:
        "0.65.0/bio/macs2/callpeak"

rule callpeak_options:
    input:
        treatment="samples/a.bam",   # required: treatment sample(s)
        control="samples/b.bam"      # optional: control sample(s)
    output:
        # all output files must share the same basename and differ only by their extension
        # Usable extensions (and which tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
        multiext("callpeak_options/basename",
                 "_peaks.xls",   ### required
                 ### optional output files
                 # these output extensions internally set the --bdg or -B option:
                 "_treat_pileup.bdg",
                 "_control_lambda.bdg",
                 # these output extensions internally set the --broad option:
                 "_peaks.broadPeak",
                 "_peaks.gappedPeak"
                 )
    log:
        "logs/macs2/callpeak.log"
    params:
        "-f BAM -g hs --nomodel"
    wrapper:
        "0.65.0/bio/macs2/callpeak"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import os
import sys
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

in_contr = snakemake.input.get("control")
params = "{}".format(snakemake.params)
opt_input = ""
out_dir = ""

ext = "_peaks.xls"
out_file = [o for o in snakemake.output if o.endswith(ext)][0]
out_name = os.path.basename(out_file[: -len(ext)])
out_dir = os.path.dirname(out_file)

if in_contr:
    opt_input = "-c {contr}".format(contr=in_contr)

if out_dir:
    out_dir = "--outdir {dir}".format(dir=out_dir)

if any(out.endswith(("_peaks.narrowPeak", "_summits.bed")) for out in snakemake.output):
    if any(
        out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak"))
        for out in snakemake.output
    ):
        sys.exit(
            "Output files with _peaks.narrowPeak and/or _summits.bed extensions cannot be created together with _peaks.broadPeak and/or _peaks.gappedPeak extended output files.\n"
            "For usable extensions please see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.\n"
        )
    else:
        if " --broad" in params:
            sys.exit(
                "If --broad option in params is given, the _peaks.narrowPeak and _summits.bed files will not be created. \n"
                "Remove --broad option from params if these files are needed.\n"
            )

if any(
    out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak")) for out in snakemake.output
):
    if "--broad" not in params:
        params += " --broad "

if any(
    out.endswith(("_treat_pileup.bdg", "_control_lambda.bdg"))
    for out in snakemake.output
):
    if all(p not in params for p in ["--bdg", "-B"]):
        params += " --bdg "
else:
    if any(p in params for p in ["--bdg", "-B"]):
        sys.exit(
            "If --bdg or -B option in params is given, the _control_lambda.bdg and _treat_pileup.bdg extended files must be specified in output. \n"
        )

shell(
    "(macs2 callpeak "
    "-t {snakemake.input.treatment} "
    "{opt_input} "
    "{out_dir} "
    "-n {out_name} "
    "{params}) {log}"
)

MINIMAP2

For minimap2, the following wrappers are available:

MINIMAP2

A versatile pairwise aligner for genomic and spliced nucleotide sequences https://lh3.github.io/minimap2

Software dependencies
  • minimap2 ==2.17
Example

This wrapper can be used in the following way:

rule minimap2:
    input:
        target="target/{input1}.mmi", # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"]
    output:
        "aligned/{input1}_aln.paf"
    log:
        "logs/minimap2/{input1}.log"
    params:
        extra="-x map-pb"  # optional
    threads: 3
    wrapper:
        "0.65.0/bio/minimap2/aligner"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tom Poorten
  • Michael Hall
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

inputQuery = " ".join(snakemake.input.query)

shell(
    "(minimap2 -t {snakemake.threads} {extra} -o {snakemake.output[0]} "
    "{snakemake.input.target} {inputQuery}) {log}"
)
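
The wrapper joins the query files with spaces. In the example rule, query is a list. If a single query were passed as a plain string, str.join would iterate over its characters instead. A defensive variant could look like the sketch below; the helper name join_queries is hypothetical, and the sketch assumes that a single query arrives as a plain str.

```python
def join_queries(query):
    """Accept a single path or a list of paths; return a space-separated string."""
    if isinstance(query, str):
        return query
    return " ".join(query)


print(join_queries(["query/reads1.fasta", "query/reads2.fasta"]))
print(join_queries("query/reads1.fasta"))
```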
MINIMAP2 INDEX

Creates a minimap2 index.

Software dependencies
  • minimap2 ==2.17
Example

This wrapper can be used in the following way:

rule minimap2_index:
    input:
        target="target/{input1}.fasta"
    output:
        "{input1}.mmi"
    log:
        "logs/minimap2_index/{input1}.log"
    params:
        extra=""  # optional additional args
    threads: 3
    wrapper:
        "0.65.0/bio/minimap2/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tom Poorten
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(minimap2 -t {snakemake.threads} {extra} "
    "-d {snakemake.output[0]} {snakemake.input.target}) {log}"
)

MSISENSOR

For msisensor, the following wrappers are available:

MSISENSOR MSI

Score your MSI with MSIsensor

Software dependencies
  • msisensor ==0.5
Example

This wrapper can be used in the following way:

rule test_msisensor_msi:
    input:
        normal = "example.normal.bam",
        tumor = "example.tumor.bam",
        microsat = "example.microsate.sites"
    output:
        "example.msi",
        "example.msi_dis",
        "example.msi_germline",
        "example.msi_somatic"
    message:
        "Testing MSIsensor msi"
    threads:
        1
    log:
        "example.log"
    params:
        out_prefix = "example.msi"
    wrapper:
        "0.65.0/bio/msisensor/msi"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake script for MSISensor msi"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from os.path import commonprefix
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

# Determining common prefix in output files
# to fill the requested parameter '-o'
prefix = commonprefix(snakemake.output)

shell(
    "msisensor msi"  # Tool and its sub-command
    " -d {snakemake.input.microsat}"  # Path to homopolymer/microsat file
    " -n {snakemake.input.normal}"  # Path to normal bam
    " -t {snakemake.input.tumor}"  # Path to tumor bam
    " -o {prefix}"  # Path to output distribution file
    " -b {snakemake.threads}"  # Maximum number of threads used
    " {extra}"  # Optional extra parameters
    " {log}"  # Logging behavior
)
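
The -o value is derived with os.path.commonprefix, which compares the output paths character by character. The following sketch shows what that yields for the example rule, and why the declared outputs should all start with the intended prefix. The second list of file names is made up for illustration.

```python
from os.path import commonprefix

outputs = [
    "example.msi",
    "example.msi_dis",
    "example.msi_germline",
    "example.msi_somatic",
]
prefix = commonprefix(outputs)
print(prefix)

# The comparison is per character, not per path component, so names that
# diverge early give a shorter prefix than one might expect.
partial = commonprefix(["results/a.msi", "results/ab.msi_dis"])
print(partial)
```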
MSISENSOR SCAN

Scan homopolymers and microsatellites with MSIsensor

Software dependencies
  • msisensor ==0.5
Example

This wrapper can be used in the following way:

rule test_msisensor_scan:
    input:
        "genome.fasta"
    output:
        "microsat.list"
    message:
        "Testing MSISensor scan"
    threads:
        1
    params:
        extra = ""
    log:
        "logs/msisensor_scan.log"
    wrapper:
        "0.65.0/bio/msisensor/scan"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake script for MSISensor Scan"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

shell(
    "msisensor scan "  # Tool and its sub-command
    "-d {snakemake.input} "  # Path to fasta file
    "-o {snakemake.output} "  # Path to output file
    "{extra} "  # Optional extra parameters
    "{log}"  # Logging behavior
)

MULTIQC

Generate qc report using multiqc.

Software dependencies
  • multiqc ==1.9
Example

This wrapper can be used in the following way:

rule multiqc:
    input:
        expand("samtools_stats/{sample}.txt", sample=["a", "b"])
    output:
        "qc/multiqc.html"
    params:
        ""  # Optional: extra parameters for multiqc.
    log:
        "logs/multiqc.log"
    wrapper:
        "0.65.0/bio/multiqc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for MultiQC."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {snakemake.params}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_dirs}"
    " {log}"
)
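
The wrapper hands MultiQC the set of directories that contain the inputs rather than the individual files, and it splits the output path into a directory (-o) and a file name (-n). A minimal sketch of that path handling is shown below. The third input path is a made-up example, and sorted() is used here only to make the printed order deterministic.

```python
from os import path

inputs = [
    "samtools_stats/a.txt",
    "samtools_stats/b.txt",
    "fastqc/a_fastqc.zip",  # hypothetical extra input
]
output = "qc/multiqc.html"

# Each directory appears once, however many inputs it contains.
input_dirs = sorted(set(path.dirname(fp) for fp in inputs))
output_dir = path.dirname(output)
output_name = path.basename(output)

print(input_dirs, output_dir, output_name)
```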

NANOSIM-H

NanoSim-H is a simulator of Oxford Nanopore reads that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of Nanopore sequencing technology.

Software dependencies
  • nanosim-h ==1.1.0.4
Example

This wrapper can be used in the following way:

rule nanosimh:
    input:
        "{sample}.fa"
    output:
        reads = "{sample}.simulated.fa",
        log = "{sample}.simulated.log",
        errors = "{sample}.simulated.errors.txt"
    params:
        extra = "",
        num_reads = 10,
        perfect_reads = True,
        min_read_len = 10,
    log:
        "logs/nanosim-h/test/{sample}.log"
    wrapper:
        "0.65.0/bio/nanosim-h"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
"""Snakemake wrapper for NanoSim-H."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


def is_header(query):
    return query.startswith(">")


def get_length_of_longest_sequence(fh):
    current_length = 0
    all_lengths = []
    for line in fh:
        if not is_header(line):
            current_length += len(line.rstrip())
        else:
            all_lengths.append(current_length)
            current_length = 0
    all_lengths.append(current_length)

    return max(all_lengths)


# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
prefix = snakemake.params.get("prefix", snakemake.output.reads.rpartition(".")[0])
num_reads = snakemake.params.get("num_reads", 10000)
profile = snakemake.params.get("profile", "ecoli_R9_2D")
perfect_reads = snakemake.params.get("perfect_reads", False)
min_read_len = snakemake.params.get("min_read_len", 50)
max_read_len = snakemake.params.get("max_read_len", 0)

# need to do this as the default read length of infinity can cause nanosim-h to
# hang if the reference is short
if max_read_len == 0:
    with open(snakemake.input[0]) as fh:
        max_read_len = get_length_of_longest_sequence(fh)

perfect_reads_flag = "--perfect " if perfect_reads else ""
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "nanosim-h {extra} "
    "{perfect_reads_flag} "
    "--max-len {max_read_len} "
    "--min-len {min_read_len} "
    "--profile {profile} "
    "--number {num_reads} "
    "--out-pref {prefix} "
    "{snakemake.input} {log}"
)
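
When max_read_len is left at 0, the wrapper caps the read length at the length of the longest reference sequence. The helper can be tried on an in-memory FASTA as sketched below. The two functions are copied from the wrapper, and the sequences are made up for illustration.

```python
import io


def is_header(query):
    return query.startswith(">")


def get_length_of_longest_sequence(fh):
    current_length = 0
    all_lengths = []
    for line in fh:
        if not is_header(line):
            # Sequence lines may be wrapped, so lengths are accumulated.
            current_length += len(line.rstrip())
        else:
            all_lengths.append(current_length)
            current_length = 0
    all_lengths.append(current_length)
    return max(all_lengths)


# seq1 spans two lines (4 + 2 bases), seq2 has 10 bases.
fasta = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACGTACGTAC\n")
longest = get_length_of_longest_sequence(fasta)
print(longest)
```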

NGS-DISAMBIGUATE

Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.

Software dependencies
  • ngs-disambiguate ==2016.11.10
  • bamtools ==2.4.0
Example

This wrapper can be used in the following way:

rule disambiguate:
    input:
        a="mapped/{sample}.a.bam",
        b="mapped/{sample}.b.bam"
    output:
        a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
        b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
        a_disambiguated='disambiguate/{sample}.graft.bam',
        b_disambiguated='disambiguate/{sample}.host.bam',
        summary='qc/disambiguate/{sample}.txt'
    params:
        algorithm="bwa",
        # optional: Prefix to use for output. If omitted, a
        # suitable value is guessed from the output paths. Prefix
        # is used for the intermediate output paths, as well as
        # sample name in summary file.
        prefix="{sample}",
        extra=""
    wrapper:
        "0.65.0/bio/ngs-disambiguate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")

output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain
# additional suffixes). This is better than using a random id as prefix,
# because the prefix is also used as the sample name in the summary file.
if prefix is None:
    prefix = path.splitext(path.basename(snakemake.output.summary))[0]

# Run command.
shell(
    "ngs_disambiguate"
    " {extra}"
    " -o {output_dir}"
    " -s {prefix}"
    " -a {snakemake.params.algorithm}"
    " {snakemake.input.a}"
    " {snakemake.input.b}"
)

# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)

output_map = {
    output_base + ".ambiguousSpeciesA.bam": snakemake.output.a_ambiguous,
    output_base + ".ambiguousSpeciesB.bam": snakemake.output.b_ambiguous,
    output_base + ".disambiguatedSpeciesA.bam": snakemake.output.a_disambiguated,
    output_base + ".disambiguatedSpeciesB.bam": snakemake.output.b_disambiguated,
    output_base + "_summary.txt": snakemake.output.summary,
}

for src, dest in output_map.items():
    if src != dest:
        shell("mv {src} {dest}")
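
The renaming step relies on the fixed file names that the tool derives from the output directory and the prefix. The sketch below shows how the fallback prefix and the expected intermediate names are built when no prefix is given. The sample name sampleA and the paths are illustrative values only.

```python
from os import path

# Illustrative values standing in for snakemake.output entries.
summary = "qc/disambiguate/sampleA.txt"
a_ambiguous = "disambiguate/sampleA.graft.ambiguous.bam"

output_dir = path.dirname(a_ambiguous)
# Fallback used when params.prefix is not set: summary file name without extension.
prefix = path.splitext(path.basename(summary))[0]
output_base = path.join(output_dir, prefix)

print(prefix)
print(output_base + ".ambiguousSpeciesA.bam")
```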

OPTITYPE

Precision 4-digit HLA-I-typing from NGS data based on integer linear programming. Use razers3 beforehand to generate input fastq files only mapping to HLA-regions. Please see https://github.com/FRED-2/OptiType

Software dependencies
  • optitype ==1.3.4
Example

This wrapper can be used in the following way:

rule optitype:
    input:
        # list of input reads
        reads=["reads/{sample}_1.fished.fastq", "reads/{sample}_2.fished.fastq"]
    output:
        multiext("optitype/{sample}", "_coverage_plot.pdf", "_result.tsv")
    log:
        "logs/optitype/{sample}.log"
    params:
        # Type of sequencing data. Can be 'dna' or 'rna'. Default is 'dna'.
        sequencing_type="dna",
        # optitype config file, optional
        config="",
        # additional parameters
        extra=""
    wrapper:
        "0.65.0/bio/optitype"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
outdir = os.path.dirname(snakemake.output[0])

# get sequencing type
seq_type = snakemake.params.get("sequencing_type", "dna")
seq_type = "--{}".format(seq_type)

# check if non-default config.ini is used
config = snakemake.params.get("config", "")
if config:
    config = "--config {}".format(config)

shell(
    "(OptiTypePipeline.py"
    " --input {snakemake.input.reads}"
    " --outdir {outdir}"
    " --prefix {snakemake.wildcards.sample}"
    " {seq_type}"
    " {config}"
    " {extra})"
    " {log}"
)

PALADIN

For paladin, the following wrappers are available:

PALADIN ALIGN

Align nucleotide reads to a protein fasta file (that has been indexed with paladin index). PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Example

This wrapper can be used in the following way:

rule paladin_align:
    input:
        reads=["reads/reads.left.fq.gz"],
        index="index/prot.fasta.bwt",
    output:
        "paladin_mapped/{sample}.bam" # will output BAM format if output file ends with ".bam", otherwise SAM format
    log:
        "logs/paladin/{sample}.log"
    threads: 4
    wrapper:
        "0.65.0/bio/paladin/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for PALADIN alignment"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

r = snakemake.input.get("reads")
assert (
    r is not None
), "reads are required as input. If you have paired end reads, please merge them first (e.g. with PEAR)"
index = snakemake.input.get("index")
assert (
    index is not None
), "please index your assembly and provide the basename (with '.bwt' extension) via the 'index' input param"

index_base = str(index).rsplit(".bwt")[0]

outfile = snakemake.output

# if bam output, pipe to bam!
output_cmd = "  | samtools view -Sb - > " if str(outfile).endswith(".bam") else " -o "

min_orf_len = snakemake.params.get("f", "250")

shell(
    "paladin align -f {min_orf_len} -t {snakemake.threads} {extra} {index_base} {r} {output_cmd} {outfile}"
)
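
Two small string operations drive this wrapper: the index base name is obtained by cutting off the .bwt suffix, and the output redirection depends on whether the output file ends with .bam. The sketch below reproduces both outside of Snakemake. The function name build_output_cmd is hypothetical and the paths are examples.

```python
def build_output_cmd(outfile):
    """Pipe through samtools for BAM output, otherwise pass the file to -o."""
    outfile = str(outfile)
    if outfile.endswith(".bam"):
        return " | samtools view -Sb - > " + outfile
    return " -o " + outfile


index_base = "index/prot.fasta.bwt".rsplit(".bwt")[0]
print(index_base)
print(build_output_cmd("paladin_mapped/sample.bam"))
print(build_output_cmd("paladin_mapped/sample.sam"))
```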
PALADIN INDEX

Index a protein fasta file for mapping with paladin. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Example

This wrapper can be used in the following way:

rule paladin_index:
    input:
        "prot.fasta",
    output:
        "index/prot.fasta.bwt"
    log:
        "logs/paladin/prot_index.log"
    params:
        reference_type=3
    wrapper:
        "0.65.0/bio/paladin/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Paladin Index."""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"


# this wrapper temporarily copies your assembly into the output dir
# so that all the paladin output files end up in the desired spot

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

input_assembly = snakemake.input
annotation = snakemake.input.get("gff", "")
paladin_index = str(snakemake.output)
reference_type = snakemake.params.get("reference_type", "3")
assert int(reference_type) in [1, 2, 3, 4]
ref_type_cmd = "-r" + str(reference_type)

output_base = paladin_index.rsplit(".bwt")[0]

shell("cp {input_assembly} {output_base}")
shell("paladin index {ref_type_cmd} {output_base} {annotation} {extra} {log}")
shell("rm -f {output_base}")
PALADIN PREPARE

Download and prepare uniprot refs for paladin mapping. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Example

This wrapper can be used in the following way:

rule paladin_prepare:
    output:
        "uniprot_sprot.fasta.gz",
        "uniprot_sprot.fasta.gz.pro"
    log:
        "logs/paladin/prepare_sprot.log"
    params:
        reference_type=1, # 1=swiss-prot, 2=uniref90
    wrapper:
        "0.65.0/bio/paladin/prepare"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Paladin Prepare"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

reference_type = snakemake.params.get(
    "reference_type", "1"
)  # download swissprot as default
assert int(reference_type) in [1, 2]
ref_type_cmd = "-r" + str(reference_type)

shell("paladin prepare {ref_type_cmd} {extra} {log}")

PEAR

PEAR is an ultrafast, memory-efficient and highly accurate pair-end read merger

Software dependencies
  • pear=0.9.6
Example

This wrapper can be used in the following way:

rule pear_merge:
    input:
        read1="reads/reads.left.fq.gz",
        read2="reads/reads.right.fq.gz"
    output:
        assembled="pear/reads_pear_assembled.fq.gz",
        discarded="pear/reads_pear_discarded.fq.gz",
        unassembled_read1="pear/reads_pear_unassembled_r1.fq.gz",
        unassembled_read2="pear/reads_pear_unassembled_r2.fq.gz",
    log:
        'logs/pear.log'
    params:
        pval=".01",
        extra=""
    threads: 4
    resources:
        mem_mb=4000 # define amount of memory to be used by pear
    wrapper:
        "0.65.0/bio/pear"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

r1 = snakemake.input.get("read1")
r2 = snakemake.input.get("read2")
assert r1 is not None and r2 is not None, "r1 and r2 files are required as input"

assembled = snakemake.output.get("assembled")
assert assembled is not None, "require 'assembled' outfile"
gzip = assembled.endswith(".gz")

# split once at the start of the fastq extension (".fq", ".fastq", ...)
out_base, out_end = assembled.rsplit(".f", 1)
out_end = ".f" + out_end

df_assembled = out_base + ".assembled.fastq"
df_discarded = out_base + ".discarded.fastq"
df_unassembled_r1 = out_base + ".unassembled.forward.fastq"
df_unassembled_r2 = out_base + ".unassembled.reverse.fastq"

df_outputs = [df_assembled, df_discarded, df_unassembled_r1, df_unassembled_r2]

discarded = snakemake.output.get("discarded", out_base + ".discarded" + out_end)
unassembled_r1 = snakemake.output.get(
    "unassembled_read1", out_base + ".unassembled_r1" + out_end
)
unassembled_r2 = snakemake.output.get(
    "unassembled_read2", out_base + ".unassembled_r2" + out_end
)

final_outputs = [assembled, discarded, unassembled_r1, unassembled_r2]


def move_files(in_list, out_list, gzip):
    for f, o in zip(in_list, out_list):
        if f != o:
            if gzip:
                shell("gzip -9 -c {f} > {o}")
                shell("rm -f {f}")
            else:
                shell("cp {f} {o}")
                shell("rm -f {f}")
        elif gzip:
            shell("gzip -9 {f}")


pval = float(snakemake.params.get("pval", ".01"))
max_mem = snakemake.resources.get("mem_mb", "4000")
extra = snakemake.params.get("extra", "")

shell(
    "pear -f {r1} -r {r2} -p {pval} -j {snakemake.threads} -y {max_mem} {extra} -o {out_base} {log}"
)

move_files(df_outputs, final_outputs, gzip)
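
The file-name handling in the wrapper above can be tried out in isolation. The following is a minimal sketch, not part of the wrapper: the helper name pear_output_names is hypothetical, and it splits at the last ".f" only (rsplit with a maximum of one split), an assumption made here so that paths containing several ".f" substrings still unpack into two parts.

```python
def pear_output_names(assembled):
    """Return (out_base, out_end, default PEAR output paths) for an 'assembled' path."""
    # Split at the last ".f" only; "x.fq.gz" becomes ("x", "q.gz")
    out_base, out_end = assembled.rsplit(".f", 1)
    out_end = ".f" + out_end
    # File names the wrapper expects PEAR to write for the prefix passed via -o
    defaults = [
        out_base + ".assembled.fastq",
        out_base + ".discarded.fastq",
        out_base + ".unassembled.forward.fastq",
        out_base + ".unassembled.reverse.fastq",
    ]
    return out_base, out_end, defaults


out_base, out_end, defaults = pear_output_names("pear/reads_pear_assembled.fq.gz")
print(out_base)
print(out_end)
for name in defaults:
    print(name)
```

The four default names are what move_files later copies or compresses to the output paths requested in the rule.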

PICARD

For picard, the following wrappers are available:

PICARD ADDORREPLACEREADGROUPS

Add or replace read groups with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule replace_rg:
    input:
        "mapped/{sample}.bam"
    output:
        "fixed-rg/{sample}.bam"
    log:
        "logs/picard/replace_rg/{sample}.log"
    params:
        "RGLB=lib1 RGPL=illumina RGPU={sample} RGSM={sample}"
    wrapper:
        "0.65.0/bio/picard/addorreplacereadgroups"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "picard AddOrReplaceReadGroups {snakemake.params} I={snakemake.input} "
    "O={snakemake.output} &> {snakemake.log}"
)
PICARD BEDTOINTERVALLIST

picard BedToIntervalList converts a BED file to Picard Interval List format.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule bed_to_interval_list:
    input:
        bed="resources/a.bed",
        dict="resources/genome.dict"
    output:
        "a.interval_list"
    log:
        "logs/picard/bedtointervallist/a.log"
    params:
        # optional parameters
        "SORT=true " # sort output interval list before writing
    wrapper:
        "0.65.0/bio/picard/bedtointervallist"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Fabian Kilpert
Code
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "picard BedToIntervalList "
    "{snakemake.params} "
    "INPUT={snakemake.input.bed} "
    "SEQUENCE_DICTIONARY={snakemake.input.dict} "
    "OUTPUT={snakemake.output} "
    "{log} "
)
PICARD COLLECTALIGNMENTSUMMARYMETRICS

Collect metrics on aligned reads with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule alignment_summary:
    input:
        ref="genome.fasta",
        bam="mapped/{sample}.bam"
    output:
        "stats/{sample}.summary.txt"
    log:
        "logs/picard/alignment-summary/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    wrapper:
        "0.65.0/bio/picard/collectalignmentsummarymetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "picard CollectAlignmentSummaryMetrics {snakemake.params} "
    "INPUT={snakemake.input.bam} OUTPUT={snakemake.output[0]} "
    "REFERENCE_SEQUENCE={snakemake.input.ref} {log}"
)
PICARD COLLECTHSMETRICS

Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule picard_collect_hs_metrics:
    input:
        bam="mapped/{sample}.bam",
        reference="genome.fasta",
        # Baits and targets should be given as interval lists. These can
        # be generated from bed files using picard BedToIntervalList.
        bait_intervals="regions.intervals",
        target_intervals="regions.intervals"
    output:
        "stats/hs_metrics/{sample}.txt"
    params:
        # Optional extra arguments. Here we reduce sample size
        # to reduce the runtime in our unit test.
        extra="SAMPLE_SIZE=1000"
    log:
        "logs/picard_collect_hs_metrics/{sample}.log"
    wrapper:
        "0.65.0/bio/picard/collecthsmetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard CollectHSMetrics."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard CollectHsMetrics"
    " {extra}"
    " INPUT={snakemake.input.bam}"
    " OUTPUT={snakemake.output[0]}"
    " REFERENCE_SEQUENCE={snakemake.input.reference}"
    " BAIT_INTERVALS={snakemake.input.bait_intervals}"
    " TARGET_INTERVALS={snakemake.input.target_intervals}"
    " {log}"
)
PICARD COLLECTINSERTSIZEMETRICS

Collect metrics on insert size of paired end reads with picard tools.

Software dependencies
  • picard ==2.22.1
  • r-base ==3.6.2
Example

This wrapper can be used in the following way:

rule insert_size:
    input:
        "mapped/{sample}.bam"
    output:
        txt="stats/{sample}.isize.txt",
        pdf="stats/{sample}.isize.pdf"
    log:
        "logs/picard/insert_size/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    wrapper:
        "0.65.0/bio/picard/collectinsertsizemetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "picard CollectInsertSizeMetrics {snakemake.params} "
    "INPUT={snakemake.input} OUTPUT={snakemake.output.txt} "
    "HISTOGRAM_FILE={snakemake.output.pdf} {log}"
)
PICARD COLLECTMULTIPLEMETRICS

A picard meta-metrics tool that collects multiple classes of metrics. For usage information about CollectMultipleMetrics, please see picard’s documentation. For more information about picard, also see the source code.

You can select which tool(s) to run by adding the respective extension(s) (see table below) to the requested output of the wrapper invocation (see example Snakemake rule below).

Tool                                Extension(s) for the output files
CollectAlignmentSummaryMetrics      “.alignment_summary_metrics”
CollectInsertSizeMetrics            “.insert_size_metrics”, “.insert_size_histogram.pdf”
QualityScoreDistribution            “.quality_distribution_metrics”, “.quality_distribution.pdf”
MeanQualityByCycle                  “.quality_by_cycle_metrics”, “.quality_by_cycle.pdf”
CollectBaseDistributionByCycle      “.base_distribution_by_cycle_metrics”, “.base_distribution_by_cycle.pdf”
CollectGcBiasMetrics                “.gc_bias.detail_metrics”, “.gc_bias.summary_metrics”, “.gc_bias.pdf”
RnaSeqMetrics                       “.rna_metrics”
CollectSequencingArtifactMetrics    “.bait_bias_detail_metrics”, “.bait_bias_summary_metrics”, “.error_summary_metrics”, “.pre_adapter_detail_metrics”, “.pre_adapter_summary_metrics”
CollectQualityYieldMetrics          “.quality_yield_metrics”

Software dependencies
  • picard ==2.23.0
Example

This wrapper can be used in the following way:

rule collect_multiple_metrics:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        # Through the output file extensions the different tools for the metrics can be selected
        # so that it is not necessary to specify them under params with the "PROGRAM" option.
        # Usable extensions (and which tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html.
        multiext("stats/{sample}",
                 ".alignment_summary_metrics",
                 ".insert_size_metrics",
                 ".insert_size_histogram.pdf",
                 ".quality_distribution_metrics",
                 ".quality_distribution.pdf",
                 ".quality_by_cycle_metrics",
                 ".quality_by_cycle.pdf",
                 ".base_distribution_by_cycle_metrics",
                 ".base_distribution_by_cycle.pdf",
                 ".gc_bias.detail_metrics",
                 ".gc_bias.summary_metrics",
                 ".gc_bias.pdf",
                 ".rna_metrics",
                 ".bait_bias_detail_metrics",
                 ".bait_bias_summary_metrics",
                 ".error_summary_metrics",
                 ".pre_adapter_detail_metrics",
                 ".pre_adapter_summary_metrics",
                 ".quality_yield_metrics"
                 )
    resources:
        # This parameter (default 3 GB) can be used to limit the total resources a pipeline is allowed to use, see:
        #     https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
        mem_gb=3
    log:
        "logs/picard/multiple_metrics/{sample}.log"
    params:
        # optional parameters
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE "
        "REF_FLAT=ref_flat.txt "   # is required if RnaSeqMetrics are used
    wrapper:
        "0.65.0/bio/picard/collectmultiplemetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • David Laehnemann
  • Antonie Vietor
Code
__author__ = "David Laehnemann, Antonie Vietor"
__copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import sys
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

res = snakemake.resources.get("mem_gb", "3")
if not res:
    res = 3

exts_to_prog = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
    ".quality_distribution_metrics": "QualityScoreDistribution",
    ".quality_distribution.pdf": "QualityScoreDistribution",
    ".quality_by_cycle_metrics": "MeanQualityByCycle",
    ".quality_by_cycle.pdf": "MeanQualityByCycle",
    ".base_distribution_by_cycle_metrics": "CollectBaseDistributionByCycle",
    ".base_distribution_by_cycle.pdf": "CollectBaseDistributionByCycle",
    ".gc_bias.detail_metrics": "CollectGcBiasMetrics",
    ".gc_bias.summary_metrics": "CollectGcBiasMetrics",
    ".gc_bias.pdf": "CollectGcBiasMetrics",
    ".rna_metrics": "RnaSeqMetrics",
    ".bait_bias_detail_metrics": "CollectSequencingArtifactMetrics",
    ".bait_bias_summary_metrics": "CollectSequencingArtifactMetrics",
    ".error_summary_metrics": "CollectSequencingArtifactMetrics",
    ".pre_adapter_detail_metrics": "CollectSequencingArtifactMetrics",
    ".pre_adapter_summary_metrics": "CollectSequencingArtifactMetrics",
    ".quality_yield_metrics": "CollectQualityYieldMetrics",
}
progs = set()

for file in snakemake.output:
    matched = False
    for ext in exts_to_prog:
        if file.endswith(ext):
            progs.add(exts_to_prog[ext])
            matched = True
    if not matched:
        sys.exit(
            "Unknown type of metrics file requested, for possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
        )

programs = " PROGRAM=" + " PROGRAM=".join(progs)

out = str(snakemake.wildcards.sample)  # as default
output_file = str(snakemake.output[0])
for ext in exts_to_prog:
    if output_file.endswith(ext):
        out = output_file[: -len(ext)]
        break

shell(
    "(picard -Xmx{res}g CollectMultipleMetrics "
    "I={snakemake.input.bam} "
    "O={out} "
    "R={snakemake.input.ref} "
    "{snakemake.params}{programs}) {log}"
)
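
How the requested output files select the picard programs, and how the common output prefix is derived, can be sketched without Snakemake. This is an illustration under stated assumptions: the mapping is shortened to three entries, the function names select_programs and output_prefix are hypothetical, and programs are collected in first-seen order, whereas the wrapper uses an unordered set.

```python
# Shortened copy of the wrapper's extension table (assumption: three entries suffice here)
EXTS_TO_PROG = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
}


def select_programs(outputs, exts_to_prog=EXTS_TO_PROG):
    """Return the programs implied by the output file extensions, without duplicates."""
    progs = []
    for path in outputs:
        matches = [prog for ext, prog in exts_to_prog.items() if path.endswith(ext)]
        if not matches:
            raise ValueError("Unknown type of metrics file requested: " + path)
        for prog in matches:
            if prog not in progs:
                progs.append(prog)
    return progs


def output_prefix(first_output, exts_to_prog=EXTS_TO_PROG):
    """Strip the known extension from the first output to get the value passed as O=."""
    for ext in exts_to_prog:
        if first_output.endswith(ext):
            return first_output[: -len(ext)]
    return None


outputs = [
    "stats/a.alignment_summary_metrics",
    "stats/a.insert_size_metrics",
    "stats/a.insert_size_histogram.pdf",
]
progs = select_programs(outputs)
flags = "".join(" PROGRAM=" + prog for prog in progs)
print(flags)
print(output_prefix(outputs[0]))
```

Requesting both insert size files still yields a single PROGRAM=CollectInsertSizeMetrics flag, since both extensions map to the same program.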
PICARD COLLECTTARGETEDPCRMETRICS

Collect metrics for targeted PCR runs with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule CollectTargetedPcrMetrics:
    input:
        bam="mapped/{sample}.bam",
        amplicon_intervals="amplicon.list",
        target_intervals="target.list"
    output:
        "stats/{sample}.pcr.txt"
    log:
        "logs/picard/collecttargetedpcrmetrics/{sample}.log"
    params:
        extra=""
    wrapper:
        "0.65.0/bio/picard/collecttargetedpcrmetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@mail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()

extra = snakemake.params.get("extra", "")

shell(
    "picard CollectTargetedPcrMetrics "
    "{extra} "
    "INPUT={snakemake.input.bam} "
    "OUTPUT={snakemake.output[0]} "
    "AMPLICON_INTERVALS={snakemake.input.amplicon_intervals} "
    "TARGET_INTERVALS={snakemake.input.target_intervals} "
    "{log}"
)
PICARD CREATESEQUENCEDICTIONARY

Create a .dict file for a given FASTA file.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule create_dict:
    input:
        "genome.fasta"
    output:
        "genome.dict"
    log:
        "logs/picard/create_dict.log"
    params:
        extra=""  # optional: extra arguments for picard.
    wrapper:
        "0.65.0/bio/picard/createsequencedictionary"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard "
    "CreateSequenceDictionary "
    "{extra} "
    "R={snakemake.input[0]} "
    "O={snakemake.output[0]} "
    "{log}"
)
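
The {log} placeholder used by the picard wrappers expands to a shell redirection built from the rule's log file. The following is a simplified stand-in written for illustration, not Snakemake's implementation: the function name log_redirect is hypothetical, and the exact strings returned by snakemake.log_fmt_shell are assumed here rather than taken from this documentation.

```python
def log_redirect(log_path, stdout=True, stderr=True, append=False):
    """Build a shell redirection suffix for the chosen streams (illustrative only)."""
    if not log_path:
        # No log file defined in the rule: nothing is redirected
        return ""
    op = ">>" if append else ">"
    if stdout and stderr:
        return " {} {} 2>&1".format(op, log_path)
    if stdout:
        return " {} {}".format(op, log_path)
    if stderr:
        return " 2{} {}".format(op, log_path)
    raise ValueError("at least one of stdout or stderr must be redirected")


# stderr only, as requested by log_fmt_shell(stdout=False, stderr=True)
print("picard CreateSequenceDictionary R=genome.fasta O=genome.dict"
      + log_redirect("logs/picard/create_dict.log", stdout=False, stderr=True))
```

Wrappers whose tool writes its result to stdout would redirect stderr only, so that log messages do not end up in the result.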
PICARD MARKDUPLICATES

Mark PCR and optical duplicates with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        metrics="dedup/{sample}.metrics.txt"
    log:
        "logs/picard/dedup/{sample}.log"
    params:
        "REMOVE_DUPLICATES=true"
    wrapper:
        "0.65.0/bio/picard/markduplicates"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
    "OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
    "{log}"
)
PICARD MERGESAMFILES

Merge sam/bam files using picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule merge_bams:
    input:
        expand("mapped/{sample}.bam", sample=["a", "b"])
    output:
        "merged.bam"
    log:
        "logs/picard_mergesamfiles.log"
    params:
        "VALIDATION_STRINGENCY=LENIENT"
    wrapper:
        "0.65.0/bio/picard/mergesamfiles"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard MergeSamFiles."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " MergeSamFiles"
    " {snakemake.params}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
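
The wrapper above passes every input file to picard as its own INPUT= argument. A minimal sketch of that string building, using a hypothetical helper name repeat_option:

```python
def repeat_option(name, values):
    """Render one NAME=value argument per value, separated by spaces."""
    return " ".join("{}={}".format(name, value) for value in values)


inputs = repeat_option("INPUT", ["mapped/a.bam", "mapped/b.bam"])
print(inputs)
```

With the example rule, expand("mapped/{sample}.bam", sample=["a", "b"]) provides exactly these two paths.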
PICARD MERGEVCFS

Merge vcf files using picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule merge_vcfs:
    input:
        vcfs=["snvs.chr1.vcf", "snvs.chr2.vcf"]
    output:
        "snvs.vcf"
    log:
        "logs/picard/mergevcfs.log"
    params:
        extra=""
    wrapper:
        "0.65.0/bio/picard/mergevcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
"""Snakemake wrapper for picard MergeVcfs."""

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(f) for f in snakemake.input.vcfs)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "picard"
    " MergeVcfs"
    " {extra}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD REVERTSAM

Reverts SAM or BAM files to a previous state.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule revert_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "revert/{sample}.bam"
    log:
        "logs/picard/revert_sam/{sample}.log"
    params:
        extra="SANITIZE=true" # optional: Extra arguments for picard.
    wrapper:
        "0.65.0/bio/picard/revertsam"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard RevertSam."""

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " RevertSam"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD SAMTOFASTQ

Converts a SAM or BAM file to FASTQ.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule bam_to_fastq:
    input:
        "mapped/{sample}.bam"
    output:
        fastq1="reads/{sample}.R1.fastq",
        fastq2="reads/{sample}.R2.fastq"
    log:
        "logs/picard/sam_to_fastq/{sample}.log"
    params:
        extra="" # optional: Extra arguments for picard.
    wrapper:
        "0.65.0/bio/picard/samtofastq"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard SamToFastq."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

fastq1 = snakemake.output.fastq1
fastq2 = snakemake.output.get("fastq2", None)
fastq_unpaired = snakemake.output.get("unpaired_fastq", None)

if not isinstance(fastq1, str):
    raise ValueError("fastq1 needs to be provided")

output = " FASTQ=" + fastq1

if isinstance(fastq2, str):
    output += " SECOND_END_FASTQ=" + fastq2

if isinstance(fastq_unpaired, str):
    if not isinstance(fastq2, str):
        raise ValueError("fastq2 is required if unpaired_fastq is set")
    else:
        output += " UNPAIRED_FASTQ=" + fastq_unpaired

shell(
    "picard"
    " SamToFastq"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " {output}"
    " {log}"
)
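
The output handling of this wrapper (fastq1 mandatory, fastq2 optional, unpaired_fastq only together with fastq2) can be sketched as a plain function. The function name samtofastq_output_args is hypothetical and the error messages are worded for this example.

```python
def samtofastq_output_args(fastq1, fastq2=None, unpaired_fastq=None):
    """Build the FASTQ-related picard arguments from the named outputs."""
    if not isinstance(fastq1, str):
        raise ValueError("fastq1 needs to be provided")
    args = " FASTQ=" + fastq1
    if isinstance(fastq2, str):
        args += " SECOND_END_FASTQ=" + fastq2
    if isinstance(unpaired_fastq, str):
        # picard only separates unpaired reads when a second-end file is written
        if not isinstance(fastq2, str):
            raise ValueError("fastq2 is required if unpaired_fastq is set")
        args += " UNPAIRED_FASTQ=" + unpaired_fastq
    return args


paired = samtofastq_output_args("reads/a.R1.fastq", "reads/a.R2.fastq")
print(paired)
```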
PICARD SORTSAM

Sort sam/bam files using picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule sort_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "sorted/{sample}.bam"
    log:
        "logs/picard/sort_sam/{sample}.log"
    params:
        sort_order="coordinate",
        extra="VALIDATION_STRINGENCY=LENIENT" # optional: Extra arguments for picard.
    wrapper:
        "0.65.0/bio/picard/sortsam"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard SortSam."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " SortSam"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " OUTPUT={snakemake.output[0]}"
    " SORT_ORDER={snakemake.params.sort_order}"
    " {log}"
)

PINDEL

For pindel, the following wrappers are available:

PINDEL

Call variants with pindel.

Software dependencies
  • pindel ==0.2.5b8
Example

This wrapper can be used in the following way:

pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]


rule pindel:
    input:
        ref="genome.fasta",
        # samples to call
        samples=["mapped/a.bam"],
        # bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
        config="pindel_config.txt"
    output:
        expand("pindel/all_{type}", type=pindel_types)
    params:
        # prefix must be consistent with output files
        prefix="pindel/all",
        extra=""  # optional parameters (except -i, -f, -o)
    log:
        "logs/pindel.log"
    threads: 4
    wrapper:
        "0.65.0/bio/pindel/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "pindel -T {snakemake.threads} {extra} -i {snakemake.input.config} "
    "-f {snakemake.input.ref} -o {snakemake.params.prefix} {log}"
)
PINDEL2VCF

Convert pindel output to vcf.

Software dependencies
  • pindel ==0.2.5b8
Example

This wrapper can be used in the following way:

rule pindel2vcf:
    input:
        ref="genome.fasta",
        pindel="pindel/all_{type}"
    output:
        "pindel/all_{type}.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.{type}.log"
    wrapper:
        "0.65.0/bio/pindel/pindel2vcf"

rule pindel2vcf_multi_input:
    input:
        ref="genome.fasta",
        pindel=["pindel/all_D", "pindel/all_INV"]
    output:
        "pindel/all.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.log"
    wrapper:
        "0.65.0/bio/pindel/pindel2vcf"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

expected_endings = [
    "INT",
    "D",
    "SI",
    "INV",
    "INV_final",
    "TD",
    "LI",
    "BP",
    "CloseEndMapped",
    "RP",
]


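def split_file_name(file_parts, file_ending_index):
    # slice (not index) the ending so that multi-character endings such as
    # "INV" or "INV_final" are rejoined as whole parts, not character by character
    return (
        "_".join(file_parts[:file_ending_index]),
        "_".join(file_parts[file_ending_index:]),
    )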


def process_input_path(input_file):
    """
    :param input_file: Input file from rule, e.g. /path/to/file/all_D or /path/to/file/all_INV_final
    :return: "/path/to/file", "all"
    """
    file_path, file_name = os.path.split(input_file)
    file_parts = file_name.split("_")
    # separate name and ending, e.g. name: all, ending: D or name: all, ending: INV_final
    file_name, file_ending = split_file_name(
        file_parts, -2 if file_name.endswith("_final") else -1
    )
    if file_ending not in expected_endings:
        raise Exception("Unexpected variant type: " + file_ending)
    return file_path, file_name


with tempfile.TemporaryDirectory() as tmpdirname:
    input_flag = "-p"
    input_file = snakemake.input.get("pindel")
    if isinstance(input_file, list) and len(input_file) > 1:
        input_flag = "-P"
        input_path, input_name = process_input_path(input_file[0])
        input_file = os.path.join(input_path, input_name)
        for variant_input in snakemake.input.pindel:
            if not variant_input.startswith(input_file):
                raise Exception(
                    "Unable to extract common path from multi file input, expect path is: "
                    + input_file
                )
            if not os.path.isfile(variant_input):
                raise Exception('Input "' + variant_input + '" is not a file!')
            os.symlink(
                os.path.abspath(variant_input),
                os.path.join(tmpdirname, os.path.basename(variant_input)),
            )
        input_file = os.path.join(tmpdirname, input_name)
    shell(
        "pindel2vcf {extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}"
    )
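
For multi-file input, the wrapper derives a common prefix from the first pindel file by splitting off the variant-type ending. The following sketch shows that step on its own. The function name split_pindel_path is hypothetical, and the ending is taken as a slice of the name parts so that multi-character endings such as INV or INV_final are kept whole.

```python
import os

EXPECTED_ENDINGS = [
    "INT", "D", "SI", "INV", "INV_final", "TD", "LI", "BP", "CloseEndMapped", "RP",
]


def split_pindel_path(input_file):
    """Return (directory, prefix, variant ending) for a pindel output path."""
    file_path, file_name = os.path.split(input_file)
    parts = file_name.split("_")
    # "_final" belongs to the ending, so two trailing parts are taken in that case
    index = -2 if file_name.endswith("_final") else -1
    prefix = "_".join(parts[:index])
    ending = "_".join(parts[index:])
    if ending not in EXPECTED_ENDINGS:
        raise ValueError("Unexpected variant type: " + ending)
    return file_path, prefix, ending


print(split_pindel_path("pindel/all_D"))
print(split_pindel_path("pindel/all_INV_final"))
```

All files given under pindel are expected to share the directory and prefix returned for the first one. Only the ending may differ.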

PLASS

Plass (Protein-Level ASSembler) is software to assemble short read sequencing data on a protein level. The main purpose of Plass is the assembly of complex metagenomic datasets.

Software dependencies
  • plass=2.c7e35
Example

This wrapper can be used in the following way:

rule plass_paired:
    input:
        left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
        right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
    output:
        "plass/prot.fasta"
    log:
        "logs/plass.log"
    params:
        extra=""
    threads: 4
    wrapper:
        "0.65.0/bio/plass"

rule plass_single:
    input:
        single=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
    output:
        "plass/prot_single.fasta"
    log:
        "logs/plass_single.log"
    params:
        extra=""
    threads: 4
    wrapper:
        "0.65.0/bio/plass"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for PLASS Protein-Level Assembler."""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2018, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

# allow multiple input files for single assembly
left = snakemake.input.get("left")
single = snakemake.input.get("single")
assert (
    left is not None or single is not None
), "please check read inputs: either left/right or single read file inputs are required"
if left:
    left = (
        [snakemake.input.left]
        if isinstance(snakemake.input.left, str)
        else snakemake.input.left
    )
    right = snakemake.input.get("right")
    assert (
        right is not None
    ), "please input 'right' reads or specify that the reads are 'single'"
    right = (
        [snakemake.input.right]
        if isinstance(snakemake.input.right, str)
        else snakemake.input.right
    )
    assert len(left) == len(
        right
    ), "left input needs to contain the same number of files as the right input"
    input_str_left = " " + " ".join(left)
    input_str_right = " " + " ".join(right)
    input_cmd = input_str_left + " " + input_str_right
else:
    single = (
        [snakemake.input.single]
        if isinstance(snakemake.input.single, str)
        else snakemake.input.single
    )
    input_cmd = " " + " ".join(single)


outdir = path.dirname(snakemake.output[0])
tmpdir = path.join(outdir, "tmp")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "plass assemble {input_cmd} {snakemake.output} {tmpdir} --threads {snakemake.threads} {extra} {log}"
)
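
The input handling above accepts either a single path or a list of paths for left, right and single. A minimal sketch of that normalization, assuming hypothetical helper names as_list and plass_input_cmd and raising ValueError instead of using assertions:

```python
def as_list(value):
    """Wrap a single path in a list; copy any other sequence into a list."""
    return [value] if isinstance(value, str) else list(value)


def plass_input_cmd(left=None, right=None, single=None):
    """Return the read files in the order they are handed to plass assemble."""
    if left:
        if right is None:
            raise ValueError("please input 'right' reads or specify 'single' reads")
        left, right = as_list(left), as_list(right)
        if len(left) != len(right):
            raise ValueError("left and right need the same number of files")
        # all left files first, then all right files, as in the wrapper
        return " ".join(left + right)
    if single:
        return " ".join(as_list(single))
    raise ValueError("either left/right or single read inputs are required")


paired_cmd = plass_input_cmd(
    left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
    right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"],
)
print(paired_cmd)
```

Unlike the wrapper, the sketch emits no leading or doubled spaces. The shell ignores them, so the resulting command is equivalent.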

PRESEQ

For preseq, the following wrappers are available:

PRESEQ LC_EXTRAP

preseq estimates the library complexity of existing sequencing data to then estimate the yield of future experiments based on their design. For usage information, please see preseq’s command line help (this seems more up to date than the available documentation from 2014). For more information about preseq, also see the source code.

Software dependencies
  • preseq ==2.0.3
Example

This wrapper can be used in the following way:

rule preseq_lc_extrap_bam:
    input:
        "samples/{sample}.sorted.bam"
    output:
        "test_bam/{sample}.lc_extrap"
    params:
        "-v"   #optional parameters
    log:
        "logs/test_bam/{sample}.log"
    wrapper:
        "0.65.0/bio/preseq/lc_extrap"

rule preseq_lc_extrap_bed:
    input:
        "samples/{sample}.sorted.bed"
    output:
        "test_bed/{sample}.lc_extrap"
    params:
        "-v"   #optional parameters
    log:
        "logs/test_bed/{sample}.log"
    wrapper:
        "0.65.0/bio/preseq/lc_extrap"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = ""
# add -bam for BAM input unless the user already passed it via params
if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
    if "-bam" not in " ".join(map(str, snakemake.params)):
        params = "-bam "

shell(
    "(preseq lc_extrap {params} {snakemake.params} {snakemake.input[0]} -output {snakemake.output[0]}) {log}"
)
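
The flag selection can be tried outside of Snakemake. The following is a sketch only: `bam_flag` is a hypothetical helper, and it assumes the intent is to add `-bam` for BAM input unless the user-supplied parameters already contain it.

```python
import os


def bam_flag(input_path, user_params=""):
    """Return "-bam " for BAM input when -bam is not already among the user parameters."""
    is_bam = os.path.splitext(input_path)[-1] == ".bam"
    if is_bam and "-bam" not in user_params.split():
        return "-bam "
    return ""


print(repr(bam_flag("samples/a.sorted.bam", "-v")))
print(repr(bam_flag("samples/a.sorted.bed", "-v")))
```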

PRIMERCLIP

Primer trimming on sam file, https://github.com/swiftbiosciences/primerclip

Software dependencies
  • samtools ==1.9
  • primerclip ==0.3.8
Example

This wrapper can be used in the following way:

rule primerclip:
    input:
        master_file="master_file",
        alignment_file="mapped/{sample}.bam"
    output:
        alignment_file="mapped/{sample}.trimmed.bam"
    log:
        "logs/primerclip/{sample}.log"
    params:
        extra=""
    wrapper:
        "0.65.0/bio/primerclip"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

master_file = snakemake.input.master_file
in_alignment_file = snakemake.input.alignment_file
out_alignment_file = snakemake.output.alignment_file

# Check inputs/arguments.
if not isinstance(master_file, str):
    raise ValueError("master_file must be a single path to the master file")

if not isinstance(in_alignment_file, str):
    raise ValueError("alignment_file input must be a single path to the input alignment file")

if not isinstance(out_alignment_file, str):
    raise ValueError("alignment_file output must be a single path to the output file")

samtools_input_command = "samtools view -h " + in_alignment_file

samtools_output_command = " | head -n -3 | samtools view -Sh"

if out_alignment_file.endswith(".cram"):
    samtools_output_command += "C -o " + out_alignment_file
elif out_alignment_file.endswith(".sam"):
    samtools_output_command += " -o " + out_alignment_file
else:
    samtools_output_command += "b -o " + out_alignment_file

shell(
    "{samtools_input_command} |"
    " primerclip"
    " {master_file}"
    " /dev/stdin"
    " /dev/stdout"
    " {samtools_output_command}"
    " {log}"
)
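
The output format is chosen from the file extension of the output path. A small sketch of that mapping, assuming the same three cases as the wrapper code above (`samtools_view_flags` is a hypothetical name, not part of the wrapper):

```python
def samtools_view_flags(out_path):
    """Pick the samtools view flags for CRAM, SAM, or (by default) BAM output."""
    if out_path.endswith(".cram"):
        return "-ShC"
    if out_path.endswith(".sam"):
        return "-Sh"
    # anything else is written as BAM
    return "-Shb"


for name in ("mapped/a.trimmed.bam", "mapped/a.trimmed.sam", "mapped/a.trimmed.cram"):
    print(name, samtools_view_flags(name))
```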

PROSOLO

For prosolo, the following wrappers are available:

PROSOLO FDR CONTROL

ProSolo can control the false discovery rate of any combination of its defined single cell events (like the presence of an alternative allele or the dropout of an allele).

Software dependencies
  • prosolo ==0.6.1
Example

This wrapper can be used in the following way:

rule prosolo_fdr_control:
    input:
         "variant_calling/{sc}.{bulk}.prosolo.bcf"
    output:
         "fdr_control/{sc}.{bulk}.prosolo.fdr.bcf"
    threads:
        1
    params:
        # comma-separated set of events for whose (joint)
        # false discovery rate you want to control
        events = "ADO_TO_REF,HET",
        # false discovery rate to control for
        fdr = 0.05
    log:
        "logs/prosolo_{sc}_{bulk}.fdr.log"
    wrapper:
        "0.65.0/bio/prosolo/control-fdr"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • David Lähnemann
Code
"""Snakemake wrapper for ProSolo FDR control"""

__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "( prosolo control-fdr"
    " {snakemake.input}"
    " --events {snakemake.params.events}"
    " --var SNV"
    " --fdr {snakemake.params.fdr}"
    " --output {snakemake.output} )"
    "{log} "
)
PROSOLO

ProSolo calls variants or other events (like allele dropout) in a single cell sample against a bulk background sample. The single cell should stem from the same population of cells as the bulk background sample. The single cell sample should be amplified using multiple displacement amplification to match ProSolo’s statistical model.

Software dependencies
  • prosolo ==0.6.1
Example

This wrapper can be used in the following way:

rule prosolo_calling:
    input:
        single_cell = "data/mapped/{sc}.sorted.bam",
        single_cell_index = "data/mapped/{sc}.sorted.bam.bai",
        bulk = "data/mapped/{bulk}.sorted.bam",
        bulk_index = "data/mapped/{bulk}.sorted.bam.bai",
        ref = "data/genome.fa",
        ref_idx = "data/genome.fa.fai",
        candidates = "data/{sc}.{bulk}.prosolo_candidates.bcf",
    output:
        "variant_calling/{sc}.{bulk}.prosolo.bcf"
    params:
        extra = ""
    threads:
        1
    log:
        "logs/prosolo_{sc}_{bulk}.log"
    wrapper:
        "0.65.0/bio/prosolo/single-cell-bulk"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • David Lähnemann
Code
"""Snakemake wrapper for ProSolo single-cell-bulk calling"""

__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "( prosolo single-cell-bulk "
    "--omit-indels "
    " {snakemake.params.extra} "
    "--candidates {snakemake.input.candidates} "
    "--output {snakemake.output} "
    "{snakemake.input.single_cell} "
    "{snakemake.input.bulk} "
    "{snakemake.input.ref} ) "
    "{log} "
)

PTRIMMER

Tool to trim off primer sequences from multiplex amplicon sequencing reads

Software dependencies
  • ptrimmer ==1.3.3
Example

This wrapper can be used in the following way:

rule ptrimmer_pe:
    input:
        r1="resources/a.lane1_R1.fastq.gz",
        r2="resources/a.lane1_R2.fastq.gz",
        primers="resources/primers.txt"
    output:
        r1="results/a.lane1_R1.fq.gz",
        r2="results/a.lane1_R2.fq.gz"
    log:
        "logs/ptrimmer/a.lane.log"
    wrapper:
        "0.65.0/bio/ptrimmer"

rule ptrimmer_se:
    input:
        r1="resources/a.lane1_R1.fastq.gz",
        primers="resources/primers.txt"
    output:
        r1="results/a.lane1_R1.fq",
    log:
        "logs/ptrimmer/a.lane1.log"
    wrapper:
        "0.65.0/bio/ptrimmer"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Felix Mölder
Code
__author__ = "Felix Mölder"
__copyright__ = "Copyright 2020, Felix Mölder"
__email__ = "felix.moelder@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path
import ntpath

input_reads = "-f {r1}".format(r1=snakemake.input.r1)

out_r1 = ntpath.basename(snakemake.output.r1)
output_reads = "-d {o1}".format(o1=out_r1)

if snakemake.input.get("r2", ""):
    seqmode = "pair"
    input_reads = "{reads} -r {r2}".format(reads=input_reads, r2=snakemake.input.r2)
    out_r2 = ntpath.basename(snakemake.output.r2)
    output_reads = "{reads} -e {o2}".format(reads=output_reads, o2=out_r2)
else:
    seqmode = "single"

primers = snakemake.input.primers

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

ptrimmer_params = "-t {mode} {in_reads} -a {primers} {out_reads}".format(
    mode=seqmode, in_reads=input_reads, primers=primers, out_reads=output_reads
)

process_r1 = "mv {out_read} {final_output_path}".format(
    out_read=out_r1, final_output_path=snakemake.output.r1
)

process_r2 = ""
if snakemake.input.get("r2", ""):
    process_r2 = "&& mv {out_read} {final_output_path}".format(
        out_read=out_r2, final_output_path=snakemake.output.r2
    )

shell("(ptrimmer {ptrimmer_params} && {process_r1} {process_r2}) {log}")
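
To see which command line results from a given rule, the string assembly can be reproduced without Snakemake. This is a sketch under the assumption that only the options used in the wrapper above (`-t`, `-f`, `-r`, `-a`, `-d`, `-e`) matter; `build_ptrimmer_command` is a hypothetical helper, and `posixpath` stands in for the `ntpath` import of the wrapper.

```python
import posixpath


def build_ptrimmer_command(r1, primers, out_r1, r2=None, out_r2=None):
    """Assemble the ptrimmer call plus the mv steps that place the results."""
    paired = r2 is not None
    if paired and out_r2 is None:
        raise ValueError("paired-end mode needs an output path for the second mate")
    # ptrimmer writes to the bare file name; the result is moved afterwards
    tmp_r1 = posixpath.basename(out_r1)
    args = ["ptrimmer", "-t", "pair" if paired else "single", "-f", r1]
    if paired:
        args += ["-r", r2]
    args += ["-a", primers, "-d", tmp_r1]
    moves = ["mv {} {}".format(tmp_r1, out_r1)]
    if paired:
        tmp_r2 = posixpath.basename(out_r2)
        args += ["-e", tmp_r2]
        moves.append("mv {} {}".format(tmp_r2, out_r2))
    return " && ".join([" ".join(args)] + moves)


print(
    build_ptrimmer_command(
        "resources/a.lane1_R1.fastq.gz",
        "resources/primers.txt",
        "results/a.lane1_R1.fq",
    )
)
```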

PYFASTAQ

For pyfastaq, the following wrappers are available:

PYFASTAQ REPLACE_BASES

Replaces all occurrences of one letter with another.

Software dependencies
  • pyfastaq ==3.17.0
Example

This wrapper can be used in the following way:

rule replace_bases:
    input:
        "{sample}.rna.fa"
    output:
        "{sample}.dna.fa",
    params:
        old_base = "U",
        new_base = "T",
    log:
        "logs/fastaq/replace_bases/test/{sample}.log"
    wrapper:
        "0.65.0/bio/pyfastaq/replace_bases"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "fastaq replace_bases"
    " {snakemake.input[0]}"
    " {snakemake.output[0]}"
    " {snakemake.params.old_base}"
    " {snakemake.params.new_base}"
    " {log}"
)

RAZERS3

Mapping (short) reads against a reference sequence. Can have multiple output formats, please see https://github.com/seqan/seqan/tree/master/apps/razers3

Software dependencies
  • razers3 ==3.5.8
Example

This wrapper can be used in the following way:

rule razers3:
    input:
        # list of input reads
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        # output format is automatically inferred from file extension. Can be bam/sam or other formats.
        "mapped/{sample}.bam"
    log:
        "logs/razers3/{sample}.log"
    params:
        # the reference genome
        genome="genome.fasta",
        # additional parameters
        extra=""
    threads: 8
    wrapper:
        "0.65.0/bio/razers3"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "(razers3"
    " -tc {snakemake.threads}"
    " {extra}"
    " -o {snakemake.output[0]}"
    " {snakemake.params.genome}"
    " {snakemake.input.reads})"
    " {log}"
)

REFERENCE

For reference, the following wrappers are available:

ENSEMBL-ANNOTATION

Download annotation of genomic sites (e.g. transcripts) from ENSEMBL FTP servers, and store them in a single .gtf or .gff3 file.

Software dependencies
  • curl
Example

This wrapper can be used in the following way:

rule get_annotation:
    output:
        "refs/annotation.gtf"
    params:
        species="homo_sapiens",
        release="87",
        build="GRCh37",
        fmt="gtf",
        flavor="" # optional, e.g. chr_patch_hapl_scaff, see Ensembl FTP.
    log:
        "logs/get_annotation.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "0.65.0/bio/reference/ensembl-annotation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess
import sys
from snakemake.shell import shell

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
fmt = snakemake.params.fmt
build = snakemake.params.build
flavor = snakemake.params.get("flavor", "")

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

if flavor:
    flavor += "."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

suffix = ""
if fmt == "gtf":
    suffix = "gtf.gz"
elif fmt == "gff3":
    suffix = "gff3.gz"

url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/{species_cap}.{build}.{release}.{flavor}{suffix}".format(
    release=release,
    build=build,
    species=species,
    fmt=fmt,
    species_cap=species.capitalize(),
    suffix=suffix,
    flavor=flavor,
    branch=branch,
)

try:
    shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
except subprocess.CalledProcessError as e:
    if snakemake.log:
        sys.stderr = open(snakemake.log[0], "a")
    print(
        "Unable to download annotation data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    exit(1)
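
When a download fails, it can help to print the URL that the wrapper would request and check it by hand. The following sketch mirrors the URL construction above as a plain function; `ensembl_annotation_url` is a hypothetical name, and unlike the wrapper it rejects formats other than gtf and gff3 instead of leaving the suffix empty.

```python
def ensembl_annotation_url(species, release, build, fmt, flavor=""):
    """Build the Ensembl FTP URL for an annotation file (sketch of the logic above)."""
    species = species.lower()
    release = int(release)
    if fmt not in ("gtf", "gff3"):
        raise ValueError("fmt must be gtf or gff3")
    # GRCh37 lives on a dedicated branch from release 81 onwards
    branch = "grch37/" if release >= 81 and build == "GRCh37" else ""
    if flavor:
        flavor += "."
    return (
        "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/"
        "{species_cap}.{build}.{release}.{flavor}{fmt}.gz"
    ).format(
        branch=branch,
        release=release,
        fmt=fmt,
        species=species,
        species_cap=species.capitalize(),
        build=build,
        flavor=flavor,
    )


print(ensembl_annotation_url("homo_sapiens", "87", "GRCh37", "gtf"))
```
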
ENSEMBL-SEQUENCE

Download sequences (e.g. genome) from ENSEMBL FTP servers, and store them in a single .fasta file.

Software dependencies
  • curl
Example

This wrapper can be used in the following way:

rule get_genome:
    output:
        "refs/genome.fasta"
    params:
        species="saccharomyces_cerevisiae",
        datatype="dna",
        build="R64-1-1",
        release="98"
    log:
        "logs/get_genome.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "0.65.0/bio/reference/ensembl-sequence"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess as sp
import sys
from itertools import product
from snakemake.shell import shell

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

spec = ("{build}" if int(release) > 75 else "{build}.{release}").format(
    build=build, release=release
)

suffixes = ""
datatype = snakemake.params.get("datatype", "")
if datatype == "dna":
    suffixes = ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"]
elif datatype == "cdna":
    suffixes = ["cdna.all.fa.gz"]
elif datatype == "cds":
    suffixes = ["cds.all.fa.gz"]
elif datatype == "ncrna":
    suffixes = ["ncrna.fa.gz"]
elif datatype == "pep":
    suffixes = ["pep.all.fa.gz"]
else:
    raise ValueError("invalid datatype, must be one of dna, cdna, cds, ncrna, pep")

success = False
for suffix in suffixes:
    url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/{species}/{datatype}/{species_cap}.{spec}.{suffix}".format(
        release=release,
        species=species,
        datatype=datatype,
        spec=spec.format(build=build, release=release),
        suffix=suffix,
        species_cap=species.capitalize(),
        branch=branch,
    )

    try:
        shell("curl -sSf {url} > /dev/null 2> /dev/null")
    except sp.CalledProcessError:
        continue

    shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
    success = True
    break

if not success:
    print(
        "Unable to download requested sequence data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    exit(1)
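
The wrapper tries several candidate files in order and downloads the first one that exists. The candidate list can be reproduced as follows; this is a sketch of the logic above, with `ensembl_sequence_urls` and `SEQUENCE_SUFFIXES` as hypothetical names.

```python
SEQUENCE_SUFFIXES = {
    "dna": ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"],
    "cdna": ["cdna.all.fa.gz"],
    "cds": ["cds.all.fa.gz"],
    "ncrna": ["ncrna.fa.gz"],
    "pep": ["pep.all.fa.gz"],
}


def ensembl_sequence_urls(species, release, build, datatype):
    """Return the candidate URLs in the order in which they would be tried."""
    species = species.lower()
    release = int(release)
    if datatype not in SEQUENCE_SUFFIXES:
        raise ValueError("invalid datatype, must be one of dna, cdna, cds, ncrna, pep")
    branch = "grch37/" if release >= 81 and build == "GRCh37" else ""
    # file names up to release 75 carry the release number after the build
    spec = build if release > 75 else "{}.{}".format(build, release)
    template = (
        "ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/"
        "{species}/{datatype}/{species_cap}.{spec}.{suffix}"
    )
    return [
        template.format(
            branch=branch,
            release=release,
            species=species,
            datatype=datatype,
            species_cap=species.capitalize(),
            spec=spec,
            suffix=suffix,
        )
        for suffix in SEQUENCE_SUFFIXES[datatype]
    ]


for url in ensembl_sequence_urls("saccharomyces_cerevisiae", "98", "R64-1-1", "dna"):
    print(url)
```
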
ENSEMBL-VARIATION

Download known genomic variants from ENSEMBL FTP servers, and store them in a single .vcf.gz file.

Software dependencies
  • bcftools =1.10
  • curl
Example

This wrapper can be used in the following way:

rule get_variation:
    # Optional: add the fai as input to get a VCF with annotated contig lengths
    # (as required by GATK) and properly sorted VCFs.
    # input:
    #     fai="refs/genome.fasta.fai"
    output:
        vcf="refs/variation.vcf.gz"
    params:
        species="saccharomyces_cerevisiae",
        release="98", # releases <98 are unsupported
        build="R64-1-1",
        type="all" # one of "all", "somatic", "structural_variations"
    log:
        "logs/get_variation.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "0.65.0/bio/reference/ensembl-variation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import tempfile
import subprocess
import sys
import os
from snakemake.shell import shell
from snakemake.exceptions import WorkflowError

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build
type = snakemake.params.type

if release < 98:
    print("Ensembl releases <98 are unsupported.", file=open(snakemake.log[0], "w"))
    exit(1)

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

if type == "all":
    if species == "homo_sapiens" and release >= 93:
        suffixes = [
            "-chr{}".format(chrom) for chrom in list(range(1, 23)) + ["X", "Y", "MT"]
        ]
    else:
        suffixes = [""]
elif type == "somatic":
    suffixes = ["_somatic"]
elif type == "structural_variations":
    suffixes = ["_structural_variations"]
else:
    raise ValueError(
        "Unsupported type {} (only all, somatic, structural_variations are allowed)".format(
            type
        )
    )

species_filename = species if release >= 91 else species.capitalize()

urls = [
    "ftp://ftp.ensembl.org/pub/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.{ext}".format(
        release=release,
        species=species,
        suffix=suffix,
        species_filename=species_filename,
        branch=branch,
        ext=ext,
    )
    for suffix in suffixes
    for ext in ["vcf.gz", "vcf.gz.csi"]
]
names = [os.path.basename(url) for url in urls if url.endswith(".gz")]

try:
    gather = "curl {urls}".format(urls=" ".join(map("-O {}".format, urls)))
    workdir = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        if snakemake.input.get("fai"):
            shell(
                "(cd {tmpdir}; {gather} && "
                "bcftools concat -Oz --naive {names} > concat.vcf.gz && "
                "bcftools reheader --fai {workdir}/{snakemake.input.fai} concat.vcf.gz "
                "> {workdir}/{snakemake.output}) {log}"
            )
        else:
            shell(
                "(cd {tmpdir}; {gather} && "
                "bcftools concat -Oz --naive {names} "
                "> {workdir}/{snakemake.output}) {log}"
            )
except subprocess.CalledProcessError as e:
    if snakemake.log:
        sys.stderr = open(snakemake.log[0], "a")
    print(
        "Unable to download variation data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided? ",
        file=sys.stderr,
    )
    exit(1)
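
Which files are fetched depends on the species, the release, and the requested type. A standalone sketch of the suffix selection above (the function name `variation_suffixes` is hypothetical):

```python
def variation_suffixes(species, release, variation_type):
    """Return the file name suffixes that the wrapper logic above would download."""
    if variation_type == "all":
        if species == "homo_sapiens" and release >= 93:
            # human variation data is split per chromosome from release 93 onwards
            return [
                "-chr{}".format(chrom)
                for chrom in list(range(1, 23)) + ["X", "Y", "MT"]
            ]
        return [""]
    if variation_type in ("somatic", "structural_variations"):
        return ["_" + variation_type]
    raise ValueError(
        "Unsupported type {} (only all, somatic, structural_variations are allowed)".format(
            variation_type
        )
    )


print(variation_suffixes("saccharomyces_cerevisiae", 98, "all"))
print(len(variation_suffixes("homo_sapiens", 98, "all")))
```
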

REFGENIE

Deploy biomedical reference datasets via refgenie.

Software dependencies
  • refgenie =0.9.2
  • refgenconf =0.9.0
Example

This wrapper can be used in the following way:

rule obtain_asset:
    output:
        # the name refers to the refgenie seek key (see attributes on http://refgenomes.databio.org)
        fai="refs/genome.fasta"
        # Multiple outputs/seek keys are possible here.
    params:
        genome="human_alu",
        asset="fasta",
        tag="default"
    wrapper:
        "0.65.0/bio/refgenie"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import refgenconf

genome = snakemake.params.genome
asset = snakemake.params.asset
tag = snakemake.params.tag

conf_path = os.environ["REFGENIE"]

rgc = refgenconf.RefGenConf(conf_path, writable=True)

# pull asset if necessary
gat, archive_data, server_url = rgc.pull(genome, asset, tag, force=False)

for seek_key, out in snakemake.output.items():
    path = rgc.seek(genome, asset, tag_name=tag, seek_key=seek_key, strict_exists=True)
    os.symlink(path, out)

RUBIC

RUBIC detects recurrent copy number alterations using copy number breaks.

Software dependencies
  • r-base =3.4.1
  • r-rubic =1.0.3
  • r-data.table =1.10.4
  • r-pracma =2.0.4
  • r-ggplot2 =2.2.1
  • r-gtable =0.2.0
  • r-codetools =0.2_15
  • r-digest =0.6.12
Example

This wrapper can be used in the following way:

rule rubic:
    input:
        seg="{samples}/segments.txt",
        markers="{samples}/markers.txt"
    output:
        out_gains="{samples}/gains.txt",
        out_losses="{samples}/losses.txt",
        out_plots=directory("{samples}/plots") #only possible to provide output directory for plots
    params:
        fdr="",
        genefile=""
    wrapper:
        "0.65.0/bio/rubic"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Beatrice F. Tan
Code
# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"

library(RUBIC)

all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]

rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])

SALMON

For salmon, the following wrappers are available:

SALMON_INDEX

Index a transcriptome assembly with salmon

Software dependencies
  • salmon ==0.14.1
Example

This wrapper can be used in the following way:

rule salmon_index:
    input:
        "assembly/transcriptome.fasta"
    output:
        directory("salmon/transcriptome_index")
    log:
        "logs/salmon/transcriptome_index.log"
    threads: 2
    params:
        # optional parameters
        extra=""
    wrapper:
        "0.65.0/bio/salmon/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Index."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "salmon index -t {snakemake.input} -i {snakemake.output} "
    " --threads {snakemake.threads} {extra} {log}"
)
SALMON_QUANT

Quantify transcripts with salmon

Software dependencies
  • salmon ==0.14.1
Example

This wrapper can be used in the following way:

rule salmon_quant_reads:
    input:
        # If you have multiple fastq files for a single sample (e.g. technical replicates)
        # use a list for r1 and r2.
        r1 = "reads/{sample}_1.fq.gz",
        r2 = "reads/{sample}_2.fq.gz",
        index = "salmon/transcriptome_index"
    output:
        quant = 'salmon/{sample}/quant.sf',
        lib = 'salmon/{sample}/lib_format_counts.json'
    log:
        'logs/salmon/{sample}.log'
    params:
        # optional parameters
        libtype="A",
        # zip_extension="bz2", # required for bz2 files ("bz2"); optional for gz files ("gz")
        extra=""
    threads: 2
    wrapper:
        "0.65.0/bio/salmon/quant"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Quant"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell


def manual_decompression(reads, zip_ext):
    """ Allow *.bz2 input into salmon. Also provide same
    decompression for *gz files, as salmon devs mention
    it may be faster in some cases."""
    if zip_ext and reads:
        if zip_ext == "bz2":
            reads = " <(bunzip2 -c " + reads + ")"
        elif zip_ext == "gz":
            reads = " <(gunzip -c " + reads + ")"
    return reads


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
zip_extension = snakemake.params.get("zip_extension", "")
libtype = snakemake.params.get("libtype", "A")

r1 = snakemake.input.get("r1")
r2 = snakemake.input.get("r2")
r = snakemake.input.get("r")

assert (
    r1 is not None and r2 is not None
) or r is not None, "either r1 and r2 (paired), or r (unpaired) are required as input"
if r1:
    r1 = (
        [snakemake.input.r1]
        if isinstance(snakemake.input.r1, str)
        else snakemake.input.r1
    )
    r2 = (
        [snakemake.input.r2]
        if isinstance(snakemake.input.r2, str)
        else snakemake.input.r2
    )
    assert len(r1) == len(r2), "input-> equal number of files required for r1 and r2"
    r1_cmd = " -1 " + manual_decompression(" ".join(r1), zip_extension)
    r2_cmd = " -2 " + manual_decompression(" ".join(r2), zip_extension)
    read_cmd = " ".join([r1_cmd, r2_cmd])
if r:
    assert (
        r1 is None and r2 is None
    ), "Salmon cannot quantify mixed paired/unpaired input files. Please input either r1,r2 (paired) or r (unpaired)"
    r = [snakemake.input.r] if isinstance(snakemake.input.r, str) else snakemake.input.r
    read_cmd = " -r " + manual_decompression(" ".join(r), zip_extension)

outdir = path.dirname(snakemake.output.get("quant"))

shell(
    "salmon quant -i {snakemake.input.index} "
    " -l {libtype} {read_cmd} -o {outdir} "
    " -p {snakemake.threads} {extra} {log} "
)
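
The read arguments can be assembled and inspected without running salmon. The sketch below assumes bash process substitution, written `<(...)` without a space, for the optional manual decompression; `decompress` and `build_read_cmd` are hypothetical helper names and not part of the wrapper.

```python
def decompress(reads, zip_ext):
    """Wrap the read files in a bash process substitution for bz2 or gz input."""
    if zip_ext == "bz2":
        return "<(bunzip2 -c {})".format(reads)
    if zip_ext == "gz":
        return "<(gunzip -c {})".format(reads)
    return reads


def as_list(value):
    return [value] if isinstance(value, str) else list(value)


def build_read_cmd(r1=None, r2=None, r=None, zip_ext=""):
    """Return the -1/-2 (paired) or -r (unpaired) part of a salmon quant call."""
    if r is not None:
        if r1 is not None or r2 is not None:
            raise ValueError("use either r1 and r2 (paired) or r (unpaired), not both")
        return "-r " + decompress(" ".join(as_list(r)), zip_ext)
    if r1 is None or r2 is None:
        raise ValueError("either r1 and r2 (paired), or r (unpaired) are required")
    r1, r2 = as_list(r1), as_list(r2)
    if len(r1) != len(r2):
        raise ValueError("equal number of files required for r1 and r2")
    return "-1 {} -2 {}".format(
        decompress(" ".join(r1), zip_ext), decompress(" ".join(r2), zip_ext)
    )


print(build_read_cmd(r1="reads/a_1.fq.gz", r2="reads/a_2.fq.gz", zip_ext="gz"))
```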

SAMBAMBA

For sambamba, the following wrappers are available:

SAMBAMBA SORT

Sort bam file with sambamba

Software dependencies
  • sambamba ==0.6.6
Example

This wrapper can be used in the following way:

rule sambamba_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        ""  # optional parameters
    threads: 8
    wrapper:
        "0.65.0/bio/sambamba/sort"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

shell(
    "sambamba sort {snakemake.params} -t {snakemake.threads} "
    "-o {snakemake.output[0]} {snakemake.input[0]}"
)

SAMTOOLS

For samtools, the following wrappers are available:

SAMTOOLS BAM2FQ INTERLEAVED

Convert a bam file back to unaligned reads in a single fastq file with samtools. For paired end reads, this results in an unsorted interleaved file.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_bam2fq_interleaved:
    input:
        "mapped/{sample}.bam"
    output:
        "reads/{sample}.fq"
    params:
        " "
    threads: 3
    wrapper:
        "0.65.0/bio/samtools/bam2fq/interleaved"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • David Laehnemann
  • Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


import os
from snakemake.shell import shell


prefix = os.path.splitext(snakemake.output[0])[0]

shell(
    "samtools bam2fq {snakemake.params} "
    " -@ {snakemake.threads} "
    " {snakemake.input[0]}"
    " >{snakemake.output[0]} "
)
SAMTOOLS BAM2FQ SEPARATE

Convert a bam file with paired end reads back to unaligned reads in two separate fastq files with samtools. Reads that are not properly paired are discarded (READ_OTHER and singleton reads in samtools bam2fq documentation), as are secondary (0x100) and supplementary reads (0x800).

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_bam2fq_separate:
    input:
        "mapped/{sample}.bam"
    output:
        "reads/{sample}.1.fq",
        "reads/{sample}.2.fq"
    params:
        sort = "-m 4G",
        bam2fq = "-n"
    threads:  # Remember, this is the number of samtools' additional threads
        3     # At least 2 threads have to be requested on cluster submission.
              # Thus, this value - 2 will be sent to samtools sort -@ argument.
    wrapper:
        "0.65.0/bio/samtools/bam2fq/separate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Samtools -@/--threads takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors
  • David Laehnemann
  • Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

prefix = os.path.splitext(snakemake.output[0])[0]

# Samtools takes additional threads through its option -@
# One thread is used by Samtools sort
# One thread is used by Samtools bam2fq
# So snakemake.threads has to take them into account
# before allowing additional threads through samtools sort -@
threads = "" if snakemake.threads <= 2 else " -@ {} ".format(snakemake.threads - 2)

shell(
    "samtools sort -n "
    " {threads} "
    " -T {prefix} "
    " {snakemake.params.sort} "
    " {snakemake.input[0]} | "
    "samtools bam2fq "
    " {snakemake.params.bam2fq} "
    " -1 {snakemake.output[0]} "
    " -2 {snakemake.output[1]} "
    " -0 /dev/null "
    " -s /dev/null "
    " -F 0x900 "
    " - "
)
SAMTOOLS DEPTH

Compute the read depth at each position or region using samtools.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_depth:
    input:
        bams=["mapped/A.bam", "mapped/B.bam"],
        bed="regionToCalcDepth.bed", # optional
    output:
        "depth.txt"
    params:
        # the optional bed file given under input is passed to -b
        extra="" # optional additional parameters as string
    wrapper:
        "0.65.0/bio/samtools/depth"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Dayne Filer
Code
"""Snakemake wrapper for running samtools depth."""

__author__ = "Dayne L Filer"
__copyright__ = "Copyright 2020, Dayne L Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

params = snakemake.params.get("extra", "")

# check for optional bed file
bed = snakemake.input.get("bed", "")
if bed:
    bed = "-b " + bed

shell(
    "samtools depth {params} {bed} " "-o {snakemake.output[0]} {snakemake.input.bams}"
)
SAMTOOLS FAIDX

Index a reference sequence in FASTA format with samtools.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_index:
    input:
        "{sample}.fa"
    output:
        "{sample}.fa.fai"
    params:
        "" # optional params string
    wrapper:
        "0.65.0/bio/samtools/faidx"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Chambers
Code
__author__ = "Michael Chambers"
__copyright__ = "Copyright 2019, Michael Chambers"
__email__ = "greenkidneybean@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools faidx {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS FIXMATE

Use samtools to correct mate information after BWA mapping.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_fixmate:
    input:
        "mapped/{input}"
    output:
        "fixed/{input}"
    message:
        "Fixing mate information in {wildcards.input}"
    threads:
        1
    params:
        extra = ""
    wrapper:
        "0.65.0/bio/samtools/fixmate/"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for samtools fixmate"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")

# Samtools' threads parameter lists ADDITIONAL threads.
# that is why threads - 1 has to be given to the -@ parameter
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

makedirs(op.dirname(snakemake.output[0]))

shell(
    "samtools fixmate {extra} {threads}" " {snakemake.input[0]} {snakemake.output[0]}"
)
SAMTOOLS FLAGSTAT

Use samtools to create a flagstat file from a bam or sam file.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_flagstat:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.bam.flagstat"
    wrapper:
        "0.65.0/bio/samtools/flagstat"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Preusch
Code
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools flagstat {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS IDXSTATS

Use samtools to retrieve and print stats from indexed bam, sam or cram files.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_idxstats:
    input:
        bam="mapped/{sample}.bam",
        idx="mapped/{sample}.bam.bai"
    output:
        "mapped/{sample}.bam.idxstats"
    log:
        "logs/samtools/idxstats/{sample}.log"
    wrapper:
        "0.65.0/bio/samtools/idxstats"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
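
Because samtools idxstats prints its result to stdout, the wrapper code below sends only stderr to the log file. The following sketch approximates the kind of redirection string that is appended to the command. `log_redirect` is a hypothetical stand-in written for illustration, not Snakemake's own `log_fmt_shell`, and it leaves out details such as append mode.

```python
def log_redirect(log_path: str, stdout: bool = True, stderr: bool = True) -> str:
    """Return a shell redirection suffix for the requested streams."""
    if not log_path:
        return ""  # no log file declared: leave the streams alone
    if stdout and stderr:
        return "> {} 2>&1".format(log_path)
    if stdout:
        return "> {}".format(log_path)
    if stderr:
        return "2> {}".format(log_path)
    return ""


# stdout already goes to the output file, so only stderr is logged.
log = log_redirect("logs/samtools/idxstats/A.log", stdout=False, stderr=True)
command = "samtools idxstats mapped/A.bam > mapped/A.bam.idxstats " + log
print(command)
```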

Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("samtools idxstats {snakemake.input.bam} > {snakemake.output[0]} {log}")
SAMTOOLS INDEX

Index bam file with samtools.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_index:
    input:
        "mapped/{sample}.sorted.bam"
    output:
        "mapped/{sample}.sorted.bam.bai"
    params:
        "" # optional params string
    wrapper:
        "0.65.0/bio/samtools/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}")
SAMTOOLS MERGE

Merge two bam files with samtools.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_merge:
    input:
        ["mapped/A.bam", "mapped/B.bam"]
    output:
        "merged.bam"
    params:
        "" # optional additional parameters as string
    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@
    wrapper:
        "0.65.0/bio/samtools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Samtools -@/--threads takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

# Samtools takes additional threads through its option -@
# One thread for samtools merge
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools merge {threads} {snakemake.params} "
    "{snakemake.output[0]} {snakemake.input}"
)
SAMTOOLS MPILEUP

Generate pileup using samtools.

Software dependencies
  • samtools ==1.10
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule mpilup:
    input:
        # single or list of bam files
        bam="mapped/{sample}.bam",
        reference_genome="genome.fasta"
    output:
        "mpileup/{sample}.mpileup.gz"
    log:
        "logs/samtools/mpileup/{sample}.log"
    params:
        extra="-d 10000",  # optional
    wrapper:
        "0.65.0/bio/samtools/mpileup"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for running mpileup."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

bam_input = snakemake.input.bam
reference_genome = snakemake.input.reference_genome

extra = snakemake.params.get("extra", "")

if not snakemake.output[0].endswith(".gz"):
    raise Exception(
        'output file will be compressed and therefore filename should end with ".gz"'
    )

# stdout is piped to pigz, so only stderr is redirected to the log
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "samtools mpileup "
    "{extra} "
    "-f {reference_genome} "
    "{bam_input}  "
    " | pigz > {snakemake.output} "
    "{log}"
)
SAMTOOLS SORT

Sort bam file with samtools.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@.
    wrapper:
        "0.65.0/bio/samtools/sort"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Samtools -@/--threads takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell


prefix = os.path.splitext(snakemake.output[0])[0]

# Samtools takes additional threads through its option -@
# One thread for samtools
# Other threads are *additional* threads passed to the argument -@
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools sort {snakemake.params} {threads} -o {snakemake.output[0]} "
    "-T {prefix} {snakemake.input[0]}"
)
SAMTOOLS STATS

Generate stats using samtools.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_stats:
    input:
        "mapped/{sample}.bam"
    output:
        "samtools_stats/{sample}.txt"
    params:
        extra="",                       # Optional: extra arguments.
        region="xx:1000000-2000000"      # Optional: region string.
    log:
        "logs/samtools_stats/{sample}.log"
    wrapper:
        "0.65.0/bio/samtools/stats"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for generating stats using samtools."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell("samtools stats {extra} {snakemake.input} {region} > {snakemake.output} {log}")
SAMTOOLS VIEW

Convert or filter SAM/BAM.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_view:
    input:
        "{sample}.sam"
    output:
        "{sample}.bam"
    params:
        "-b" # optional params string
    wrapper:
        "0.65.0/bio/samtools/view"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools view {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")

SEQTK

For seqtk, the following wrappers are available:

SEQTK-SUBSAMPLE-PE

Subsample reads from paired FASTQ files

Software dependencies
  • seqtk ==1.3
Example

This wrapper can be used in the following way:

rule seqtk_subsample_pe:
    input:
        f1="{sample}.1.fastq.gz",
        f2="{sample}.2.fastq.gz"
    output:
        f1="{sample}.1.subsampled.fastq.gz",
        f2="{sample}.2.subsampled.fastq.gz"
    params:
        n=3,
        seed=12345
    log:
        "logs/seqtk_subsample/{sample}.log"
    threads:
        1
    wrapper:
        "0.65.0/bio/seqtk/subsample/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
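
The rule passes the same seed and the same n for both mate files. As the seqtk documentation recommends, using an identical seed keeps the two subsampled files paired. The sketch below illustrates the idea with Python's random module on toy data. It is an analogy only: seqtk uses its own sampling routine, and `sample_indices` is a hypothetical helper.

```python
import random


def sample_indices(n_records: int, n_keep: int, seed: int) -> list:
    """Pick `n_keep` record positions reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_records), n_keep))


# Toy stand-ins for the two mate files; record i in each belongs to pair i.
mate1 = ["pair{}/1".format(i) for i in range(10)]
mate2 = ["pair{}/2".format(i) for i in range(10)]

# Same seed and same n for both files, as in the rule's params.
keep1 = sample_indices(len(mate1), 3, seed=12345)
keep2 = sample_indices(len(mate2), 3, seed=12345)

sub1 = [mate1[i] for i in keep1]
sub2 = [mate2[i] for i in keep2]

# Every sampled mate 1 still lines up with its mate 2.
assert [r[:-2] for r in sub1] == [r[:-2] for r in sub2]
```

With different seeds the two index lists would generally differ, and the output files would no longer contain matching pairs.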

Authors
  • Fabian Kilpert
Code
"""Snakemake wrapper for subsampling reads from paired FASTQ files using seqtk."""

__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "( "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f1} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f1} "
    "&& "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f2} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f2} "
    ") {log} "
)
SEQTK-SUBSAMPLE-SE

Subsample reads from FASTQ file

Software dependencies
  • seqtk ==1.3
Example

This wrapper can be used in the following way:

rule seqtk_subsample_se:
    input:
        "{sample}.fastq.gz"
    output:
        "{sample}.subsampled.fastq.gz"
    params:
        n=3,
        seed=12345
    log:
        "logs/seqtk_subsample/{sample}.log"
    threads:
        1
    wrapper:
        "0.65.0/bio/seqtk/subsample/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Fabian Kilpert
Code
"""Snakemake wrapper for subsampling reads from FASTQ file using seqtk."""

__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "( "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output} "
    ") {log} "
)

SHOVILL

Assemble bacterial isolate genomes from Illumina paired-end reads.

Software dependencies
  • shovill ==1.1.0
Example

This wrapper can be used in the following way:

rule shovill:
  input:
    r1="reads/{sample}_R1.fq.gz",
    r2="reads/{sample}_R2.fq.gz"
  output:
    raw_assembly="assembly/{sample}.{assembler}.assembly.fa",
    contigs="assembly/{sample}.{assembler}.contigs.fa"
  params:
    extra=""
  log:
    "logs/shovill/{sample}.{assembler}.log"
  threads: 1
  wrapper:
    "0.65.0/bio/shovill"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Sangram Keshari Sahu
Code
"""Snakemake wrapper for shovill."""

__author__ = "Sangram Keshari Sahu"
__copyright__ = "Copyright 2020, Sangram Keshari Sahu"
__email__ = "sangramsahu15@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from tempfile import TemporaryDirectory

# Placeholder for optional parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
params = snakemake.params.get("extra", "")

with TemporaryDirectory() as tempdir:
    shell(
        "(shovill"
        " --assembler {snakemake.wildcards.assembler}"
        " --outdir {tempdir} --force"
        " --R1 {snakemake.input.r1}"
        " --R2 {snakemake.input.r2}"
        " --cpus {snakemake.threads}"
        " {params}) {log}"
    )

    shell(
        "mv {tempdir}/{snakemake.wildcards.assembler}.fasta {snakemake.output.raw_assembly}"
        " && mv {tempdir}/contigs.fa {snakemake.output.contigs}"
    )

SICKLE

For sickle, the following wrappers are available:

SICKLE PE

Trim paired-end reads with sickle.

Software dependencies
  • sickle-trim ==1.33
Example

This wrapper can be used in the following way:

rule sickle_pe:
  input:
    r1="input_R1.fq",
    r2="input_R2.fq"
  output:
    r1="output_R1.fq",
    r2="output_R2.fq",
    rs="output_single.fq",
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    # optional log file
    "logs/sickle/job.log"
  wrapper:
    "0.65.0/bio/sickle/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
    "-o {snakemake.output.r1} -p {snakemake.output.r2} "
    "-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
    "{extra}) {log}"
)
SICKLE SE

Trim single-end reads with sickle.

Software dependencies
  • sickle-trim ==1.33
Example

This wrapper can be used in the following way:

rule sickle_se:
  input:
    "input_R1.fq"
  output:
    "output_R1.fq"
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    "logs/sickle/job.log"
  wrapper:
    "0.65.0/bio/sickle/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
    "-t {snakemake.params.qual_type} {extra}) {log}"
)

SNP-MUTATOR

Generate mutated sequence files from a reference genome.

Software dependencies
  • snp-mutator ==1.2.0
Example

This wrapper can be used in the following way:

NUM_SIMULATIONS = 2

rule snpmutator:
    input:
        "{sample}.fa"
    output:
        vcf = "{sample}.mutated.vcf",
        sequences = expand(
            "{{sample}}_mutated_{simulation_number}.fasta",
            simulation_number=range(1, NUM_SIMULATIONS + 1)
        )
    params:
        num_simulations = NUM_SIMULATIONS,
        extra = " ".join([
            "--num-substitutions 2",
            "--num-insertions 2",
            "--num-deletions 0"
        ]),
    log:
        "logs/snp-mutator/test/{sample}.log"
    wrapper:
        "0.65.0/bio/snp-mutator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
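
In the example above, the doubled braces around sample keep that wildcard intact while simulation_number is filled in. The effect can be sketched with plain str.format, which treats doubled braces the same way. This is an illustration that does not import Snakemake's expand.

```python
NUM_SIMULATIONS = 2

# Doubled braces are an escape: they survive formatting as single braces.
pattern = "{{sample}}_mutated_{simulation_number}.fasta"

sequences = [
    pattern.format(simulation_number=i) for i in range(1, NUM_SIMULATIONS + 1)
]
print(sequences)
# ['{sample}_mutated_1.fasta', '{sample}_mutated_2.fasta']
```

The resulting patterns still contain the {sample} wildcard, which Snakemake resolves per job.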

Authors
  • Michael Hall
Code
"""Snakemake wrapper for SNP Mutator."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from pathlib import Path

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
num_simulations = snakemake.params.get("num_simulations", 100)
fasta_outdir = Path(snakemake.output.sequences[0]).absolute().parent
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "snpmutator {extra} "
    "--num-simulations {num_simulations} "
    "--vcf {snakemake.output.vcf} "
    "-F {fasta_outdir} "
    "{snakemake.input} {log} "
)

SNPEFF

For snpeff, the following wrappers are available:

SNPEFF

Annotate predicted effect of nucleotide changes with SnpEff

Software dependencies
  • snpeff ==4.3.1t
  • bcftools =1.10
Example

This wrapper can be used in the following way:

rule snpeff:
    input:
        calls="{sample}.vcf", # (vcf, bcf, or vcf.gz)
        db="resources/snpeff/ebola_zaire" # path to reference db downloaded with the snpeff download wrapper
    output:
        calls="snpeff/{sample}.vcf",   # annotated calls (vcf, bcf, or vcf.gz)
        stats="snpeff/{sample}.html",  # summary statistics (in HTML), optional
        csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
    log:
        "logs/snpeff/{sample}.log"
    params:
        extra="-Xmx4g"           # optional parameters (e.g., max memory 4g)
    wrapper:
        "0.65.0/bio/snpeff/annotate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
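
The db input is a single path, but the wrapper code below splits it into two snpEff arguments: the parent directory becomes -dataDir and the last path component becomes the database name. A minimal sketch of that split follows; `split_db_path` is a hypothetical helper, and it skips the conversion to an absolute path that the wrapper performs.

```python
from pathlib import PurePosixPath


def split_db_path(db: str) -> tuple:
    """Split a database path into (data directory, reference name)."""
    path = PurePosixPath(db)
    return str(path.parent), path.name


data_dir, reference = split_db_path("resources/snpeff/ebola_zaire")
print(data_dir)   # resources/snpeff
print(reference)  # ebola_zaire
```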

Authors
  • Bradford Powell
Code
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell
from os import path
import shutil
import tempfile
from pathlib import Path

outcalls = snakemake.output.calls
if outcalls.endswith(".vcf.gz"):
    outprefix = "| bcftools view -Oz"
elif outcalls.endswith(".bcf"):
    outprefix = "| bcftools view -Ob"
else:
    outprefix = ""

incalls = snakemake.input[0]
if incalls.endswith(".bcf"):
    incalls = "< <(bcftools view {})".format(incalls)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

data_dir = Path(snakemake.input.db).parent.resolve()

stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = "" if not csvstats else "-csvStats {}".format(csvstats)
stats_opt = "-noStats" if not stats else "-stats {}".format(stats)

reference = path.basename(snakemake.input.db)

shell(
    "snpEff -dataDir {data_dir} {stats_opt} {csvstats_opt} {extra} "
    "{reference} {incalls} "
    "{outprefix} > {outcalls} {log}"
)
SNPEFF DOWNLOAD

Download snpeff DB for a given species.

Software dependencies
  • snpeff ==4.3.1t
  • bcftools =1.10
Example

This wrapper can be used in the following way:

rule snpeff_download:
    output:
        # wildcard {reference} may be anything listed in `snpeff databases`
        directory("resources/snpeff/{reference}")
    log:
        "logs/snpeff/download/{reference}.log"
    params:
        reference="{reference}"
    wrapper:
        "0.65.0/bio/snpeff/download"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path

reference = snakemake.params.reference
outdir = Path(snakemake.output[0]).parent.resolve()
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("snpEff download -dataDir {outdir} {reference} {log}")

SNPSIFT

For snpsift, the following wrappers are available:

SNPSIFT ANNOTATE

Annotate using fields from another VCF file.

Software dependencies
  • snpsift ==4.3.1t
  • bcftools ==1.10.2
  • pbgzip ==2016.08.04
Example

This wrapper can be used in the following way:

rule test_snpsift_annotate:
    input:
        call="in.vcf",
        database="annotation.vcf"
    output:
        call="annotated/out.vcf"
    log:
        "annotate.log"
    wrapper:
        "0.65.0/bio/snpsift/annotate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift annotate"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1

incall = snakemake.input["call"]
if snakemake.input["call"].endswith("bcf"):
    min_threads += 1
    incall = "< <(bcftools view {})".format(incall)
elif snakemake.input["call"].endswith("gz"):
    min_threads += 1
    incall = "< <(gunzip -c {})".format(incall)

outcall = snakemake.output["call"]
if snakemake.output["call"].endswith("gz"):
    min_threads += 1
    outcall = "| gzip -c > {}".format(outcall)
elif snakemake.output["call"].endswith("bcf"):
    min_threads += 1
    outcall = "| bcftools view > {}".format(outcall)
else:
    outcall = "> {}".format(outcall)

if snakemake.threads < min_threads:
    raise ValueError(
        "At least {} threads required, {} provided".format(
            min_threads, snakemake.threads
        )
    )

shell(
    "SnpSift annotate"  # Tool and its subcommand
    " {extra}"  # Extra parameters
    " {snakemake.input.database}"  # Path to annotation vcf file
    " {incall} "  # Path to input vcf file
    " {outcall} "  # Path to output vcf file
    " {log}"  # Logging behaviour
)
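
The thread check in the wrapper code above can be summarized as a small function: one thread for SnpSift itself, plus one for each helper process needed to decode the input or encode the output. `min_threads_for` is a hypothetical name used only for this sketch.

```python
def min_threads_for(in_path: str, out_path: str) -> int:
    """Count the processes that run side by side for given file types."""
    needed = 1  # SnpSift itself
    if in_path.endswith("bcf") or in_path.endswith("gz"):
        needed += 1  # bcftools view or gunzip -c feeding the input
    if out_path.endswith("gz") or out_path.endswith("bcf"):
        needed += 1  # gzip -c or bcftools view writing the output
    return needed


print(min_threads_for("in.vcf", "annotated/out.vcf"))     # 1
print(min_threads_for("in.vcf.gz", "annotated/out.vcf"))  # 2
print(min_threads_for("in.bcf", "annotated/out.vcf.gz"))  # 3
```

A rule that reads or writes compressed files therefore needs a threads value of at least 2 or 3, otherwise the wrapper raises a ValueError.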
SNPSIFT VARTYPE

Add an INFO field denoting variant type.

Software dependencies
  • snpsift =4.3.1t
Example

This wrapper can be used in the following way:

rule test_snpsift_vartype:
    input:
        vcf="in.vcf"
    output:
        vcf="annotated/out.vcf"
    message:
        "Testing SnpSift varType"
    log:
        "varType.log"
    wrapper:
        "0.65.0/bio/snpsift/varType"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift varType"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "SnpSift varType"  # Tool and its subcommand
    " {extra}"  # Extra parameters
    " {snakemake.input.vcf}"  # Path to input vcf file
    " > {snakemake.output.vcf}"  # Path to output vcf file
    " {log}"  # Logging behaviour
)

SOURMASH

For sourmash, the following wrappers are available:

SOURMASH_COMPUTE

Build a MinHash signature for a transcriptome, genome, or reads

Software dependencies
  • sourmash ==2.0.0a7
Example

This wrapper can be used in the following way:

rule sourmash_reads:
    input:
        "reads/a.fastq"
    output:
        "reads.sig"
    log:
        "logs/sourmash/sourmash_compute_reads.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "0.65.0/bio/sourmash/compute"


rule sourmash_transcriptome:
    input:
        "assembly/transcriptome.fasta"
    output:
        "transcriptome.sig"
    log:
        "logs/sourmash/sourmash_compute_transcriptome.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "0.65.0/bio/sourmash/compute"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Lisa K. Johnson
Code
"""Snakemake wrapper for sourmash compute."""

__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled", "1000")
k = snakemake.params.get("k", "31")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
    " {extra} {log}"
)

SRA-TOOLS

For sra-tools, the following wrappers are available:

SRA-TOOLS FASTERQ-DUMP

Download FASTQ files from SRA.

Software dependencies
  • sra-tools >2.9.1
Example

This wrapper can be used in the following way:

rule get_fastq_pe:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/{accession}_1.fastq",
        "data/{accession}_2.fastq"
    params:
        # optional extra arguments
        extra=""
    threads: 6  # defaults to 6
    wrapper:
        "0.65.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se:
    output:
        "data/{accession}.fastq"
    params:
        extra=""
    threads: 6
    wrapper:
        "0.65.0/bio/sra-tools/fasterq-dump"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Derek Croote
Code
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = "--outdir {}".format(outdir)

extra = snakemake.params.get("extra", "")

with tempfile.TemporaryDirectory() as tmp:
    shell(
        "fasterq-dump --temp {tmp} --threads {snakemake.threads} "
        "{extra} {outdir} {snakemake.wildcards.accession} {log}"
    )

STAR

For star, the following wrappers are available:

STAR

Map reads with STAR.

Software dependencies
  • star ==2.7.3a
Example

This wrapper can be used in the following way:

rule star_pe_multi:
    input:
        # use a list for multiple fastq files for one sample
        # usually technical replicates across lanes/flowcells
        fq1 = ["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
        # paired-end reads need to be ordered so that each item in the two lists matches
        fq2 = ["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"] #optional
    output:
        # see STAR manual for additional output files
        "star/pe/{sample}/Aligned.out.sam"
    log:
        "logs/star/pe/{sample}.log"
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra=""
    threads: 8
    wrapper:
        "0.65.0/bio/star/align"

rule star_se:
    input:
        fq1 = "reads/{sample}_R1.1.fastq"
    output:
        # see STAR manual for additional output files
        "star/{sample}/Aligned.out.sam"
    log:
        "logs/star/{sample}.log"
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra=""
    threads: 8
    wrapper:
        "0.65.0/bio/star/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Tomás Di Domenico
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = (
    [snakemake.input.fq1]
    if isinstance(snakemake.input.fq1, str)
    else snakemake.input.fq1
)
fq2 = snakemake.input.get("fq2")
if fq2:
    fq2 = (
        [snakemake.input.fq2]
        if isinstance(snakemake.input.fq2, str)
        else snakemake.input.fq2
    )
    assert len(fq1) == len(
        fq2
    ), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])

if fq1[0].endswith(".gz"):
    readcmd = "--readFilesCommand zcat"
else:
    readcmd = ""

outprefix = os.path.dirname(snakemake.output[0]) + "/"

shell(
    "STAR "
    "{extra} "
    "--runThreadN {snakemake.threads} "
    "--genomeDir {snakemake.params.index} "
    "--readFilesIn {input_str} "
    "{readcmd} "
    "--outFileNamePrefix {outprefix} "
    "--outStd Log "
    "{log}"
)
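
The way the wrapper turns fq1 and fq2 into the value of --readFilesIn can be sketched in isolation. This is an illustrative sketch under assumptions: build_read_files_arg is a hypothetical helper, not part of the wrapper, and unlike the wrapper it raises ValueError instead of using assert and omits the trailing space for single-end input.

```python
def build_read_files_arg(fq1, fq2=None):
    """Hypothetical helper: build the value passed to STAR's --readFilesIn."""
    # Accept either a single path or a list of paths, as the wrapper does.
    fq1 = [fq1] if isinstance(fq1, str) else list(fq1)
    parts = [",".join(fq1)]
    if fq2:
        fq2 = [fq2] if isinstance(fq2, str) else list(fq2)
        if len(fq1) != len(fq2):
            raise ValueError("equal number of files required for fq1 and fq2")
        parts.append(",".join(fq2))
    # Files of one mate are comma-separated; the two mates are space-separated.
    return " ".join(parts)


print(build_read_files_arg("reads/a_R1.1.fastq"))
print(
    build_read_files_arg(
        ["reads/a_R1.1.fastq", "reads/a_R1.2.fastq"],
        ["reads/a_R2.1.fastq", "reads/a_R2.2.fastq"],
    )
)
```

Because the two lists are joined position by position, the order of fq1 and fq2 must correspond, as noted in the example rule.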
STAR INDEX

Index FASTA sequences with STAR.

Software dependencies
  • star ==2.7.3a
Example

This wrapper can be used in the following way:

rule star_index:
    input:
        fasta = "{genome}.fasta"
    output:
        directory("{genome}")
    message:
        "Testing STAR index"
    threads:
        1
    params:
        extra = ""
    log:
        "logs/star_index_{genome}.log"
    wrapper:
        "0.65.0/bio/star/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
  • Tomás Di Domenico
Code
"""Snakemake wrapper for STAR index"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")
sjdb_overhang = snakemake.params.get("sjdbOverhang", "100")

gtf = snakemake.input.get("gtf")
if gtf is not None:
    gtf = "--sjdbGTFfile " + gtf
    # str() so that sjdbOverhang may be given as an int or a string in params
    sjdb_overhang = "--sjdbOverhang " + str(sjdb_overhang)
else:
    gtf = sjdb_overhang = ""

makedirs(snakemake.output)

shell(
    "STAR "  # Tool
    "--runMode genomeGenerate "  # Indexation mode
    "{extra} "  # Optional parameters
    "--runThreadN {snakemake.threads} "  # Number of threads
    "--genomeDir {snakemake.output} "  # Path to output
    "--genomeFastaFiles {snakemake.input.fasta} "  # Path to fasta files
    "{sjdb_overhang} "  # Read-len - 1
    "{gtf} "  # Highly recommended GTF
    "{log}"  # Logging
)
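
The example rule does not show it, but the wrapper code also reads an optional gtf input and an optional sjdbOverhang parameter. A minimal sketch of that logic follows; build_sjdb_args is a hypothetical helper and genome.gtf is a placeholder file name.

```python
def build_sjdb_args(gtf=None, sjdb_overhang=100):
    """Hypothetical helper: splice-junction options are only emitted with a GTF."""
    if gtf is None:
        # Without an annotation, neither option is passed to STAR.
        return ""
    # str() lets the overhang be given as an int or a string.
    return "--sjdbOverhang {} --sjdbGTFfile {}".format(str(sjdb_overhang), gtf)


print(repr(build_sjdb_args()))
print(build_sjdb_args("genome.gtf"))
print(build_sjdb_args("genome.gtf", "75"))
```

In a rule, this would presumably correspond to adding gtf="genome.gtf" to the input block and sjdbOverhang to params.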

STRELKA

For strelka, the following wrappers are available:

STRELKA GERMLINE

Call germline variants with Strelka.

Software dependencies
  • strelka ==2.9.10
Example

This wrapper can be used in the following way:

rule strelka_germline:
    input:
        # the required bam file
        bam="mapped/{sample}.bam",
        # path to reference genome fasta and index
        fasta="genome.fasta",
        fasta_index="genome.fasta.fai"
    output:
        # Strelka results - either use directory or complete file path
        directory("strelka/{sample}")
    log:
        "logs/strelka/germline/{sample}.log"
    params:
        # optional parameters
        config_extra="",
        run_extra=""
    threads: 8
    wrapper:
        "0.65.0/bio/strelka/germline"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"


import os
from pathlib import Path
from snakemake.shell import shell

config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

bam = snakemake.input.get("bam")  # input bam file, required
assert bam is not None, "input-> bam is a required input parameter"

if snakemake.output[0].endswith(".vcf.gz"):
    run_dir = Path(snakemake.output[0]).parents[2]
else:
    run_dir = snakemake.output

shell(
    "configureStrelkaGermlineWorkflow.py "  # configure the strelka run
    "--bam {bam} "  # input bam
    "--referenceFasta {snakemake.input.fasta} "  # reference genome
    "--runDir {run_dir} "  # output directory
    "{config_extra} "  # additional parameters for the configuration
    "&& {run_dir}/runWorkflow.py "  # run the strelka workflow
    "-m local "  # run in local mode
    "-j {snakemake.threads} "  # number of threads
    "{run_extra} "  # additional parameters for the run
    "{log}"
)  # logging
STRELKA

Strelka calls somatic and germline small variants from mapped sequencing reads.

Software dependencies
  • strelka ==2.9.10
Example

This wrapper can be used in the following way:

rule strelka:
    input:
        # The normal bam and its index
        # are optional input
        # normal = "data/b.bam",
        # normal_index = "data/b.bam.bai"
        tumor = "data/{tumor}.bam",
        tumor_index = "data/{tumor}.bam.bai",
        fasta = "data/genome.fasta",
        fasta_index = "data/genome.fasta.fai"
    output:
        # Strelka output - can be directory or full file path
        directory("{tumor}_vcf")
    threads:
        1
    params:
        run_extra = "",
        config_extra = ""
    log:
        "logs/strelka_{tumor}.log"
    wrapper:
        "0.65.0/bio/strelka/somatic"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Strelka"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")

# If a normal bam is available, it should be
# declared in the input block, so that Snakemake
# checks that the file exists.
normal = (
    "--normalBam {}".format(snakemake.input["normal"])
    if "normal" in snakemake.input.keys()
    else ""
)

if snakemake.output[0].endswith("vcf.gz"):
    run_dir = Path(snakemake.output[0]).parents[2]
else:
    run_dir = snakemake.output

shell(
    "(configureStrelkaSomaticWorkflow.py "  # Configuration script
    "{normal} "  # Path to normal bam (if any)
    "--tumorBam {snakemake.input.tumor} "  # Path to tumor bam
    "--referenceFasta {snakemake.input.fasta} "  # Path to fasta file
    "--runDir {run_dir} "  # Path to output directory
    "{config_extra} "  # Extra parameters for configuration
    " && "
    "{run_dir}/runWorkflow.py "  # Run the pipeline
    "--mode local "  # Stop internal job submission
    "--jobs {snakemake.threads} "  # Number of threads
    "{run_extra}) "  # Extra parameters for runWorkflow
    "{log}"  # Logging behaviour
)
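
Both Strelka wrappers accept either a directory or a full path to a VCF file as output. The sketch below illustrates how the run directory is derived in the second case. It is a simplified illustration: strelka_run_dir is a hypothetical helper, and the path results/variants/somatic.snvs.vcf.gz is an assumption about Strelka's output layout, not something the wrapper checks.

```python
from pathlib import Path


def strelka_run_dir(first_output):
    """Hypothetical helper mirroring how the wrappers pick --runDir."""
    if first_output.endswith("vcf.gz"):
        # parents[0] is the file's directory; parents[2] is two levels above it.
        return Path(first_output).parents[2].as_posix()
    # Otherwise the output itself is taken as the run directory.
    return first_output


print(strelka_run_dir("sample_vcf"))
print(strelka_run_dir("sample_vcf/results/variants/somatic.snvs.vcf.gz"))
```

Both calls yield the same run directory, so a rule that names a VCF file must place it exactly two directory levels below the intended run directory.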

STRLING

For strling, the following wrappers are available:

STRLING CALL

STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The call subcommand calls genotypes and estimates allele sizes for all loci in each sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html

Software dependencies
  • strling ==0.3
Example

This wrapper can be used in the following way:

rule strling_call:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        bin="extract/{sample}.bin",
        reference="reference/genome.fasta",
        fai="reference/genome.fasta.fai",
        bounds="merged/group-bounds.txt" # optional, produced by strling merge
    output:
        "call/{sample}-bounds.txt", # must end with -bounds.txt
        "call/{sample}-genotype.txt", # must end with -genotype.txt
        "call/{sample}-unplaced.txt" # must end with -unplaced.txt
    params:
        extra="" # optional extra command line arguments
    log:
        "log/strling/call/{sample}.log"
    wrapper:
        "0.65.0/bio/strling/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling call"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
bin = snakemake.input.get("bin", None)
reference = snakemake.input.get("reference", None)
bounds = snakemake.input.get("bounds", None)

if not bam or (isinstance(bam, list) and len(bam) != 1):
    raise ValueError("Please provide exactly one 'bam' as input.")

if not path.exists(bam + ".bai"):
    raise ValueError(
        "Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
    )

if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")

if not bounds:  # optional
    bounds_string = ""
else:
    bounds_string = "-b {}".format(bounds)

if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )

if not any(o.endswith("-bounds.txt") for o in snakemake.output):
    raise ValueError("Please provide a file that ends with -bounds.txt in the output.")

for filename in snakemake.output:
    if filename.endswith("-bounds.txt"):
        prefix = filename[: -len("-bounds.txt")]
        break

if not any(o == "{}-genotype.txt".format(prefix) for o in snakemake.output):
    raise ValueError(
        "Please provide an output file that ends with -genotype.txt and has the same prefix as -bounds.txt"
    )

if not any(o == "{}-unplaced.txt".format(prefix) for o in snakemake.output):
    raise ValueError(
        "Please provide an output file that ends with -unplaced.txt and has the same prefix as -bounds.txt"
    )

shell(
    "(strling call "
    "{bam} "
    "{bin} "
    "{bounds_string} "
    "-o {prefix} "
    "{extra}) {log}"
)
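
The output checks above all serve to derive a single prefix for the -o option. A minimal sketch of that derivation, using a hypothetical helper strling_output_prefix that is not part of the wrapper:

```python
def strling_output_prefix(outputs):
    """Hypothetical helper: derive the -o prefix and check the sibling outputs."""
    suffix = "-bounds.txt"
    bounds = [o for o in outputs if o.endswith(suffix)]
    if not bounds:
        raise ValueError("an output ending with -bounds.txt is required")
    # The prefix is the bounds file name with its suffix removed.
    prefix = bounds[0][: -len(suffix)]
    for required in ("-genotype.txt", "-unplaced.txt"):
        if prefix + required not in outputs:
            raise ValueError("missing output {}{}".format(prefix, required))
    return prefix


outputs = ["call/A-bounds.txt", "call/A-genotype.txt", "call/A-unplaced.txt"]
print(strling_output_prefix(outputs))
```

All three output files therefore have to share one prefix, including the directory part.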
STRLING EXTRACT

STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The extract subcommand retrieves informative read pairs into a binary format for a single sample; the resulting bin file serves as input to both strling merge and strling call. Documentation at: https://strling.readthedocs.io/en/latest/run.html

Software dependencies
  • strling ==0.3
Example

This wrapper can be used in the following way:

rule strling_extract:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        reference="reference/genome.fasta",
        fai="reference/genome.fasta.fai",
        index="reference/genome.fasta.str" # optional
    output:
        "extract/{sample}.bin"
    log:
        "log/strling/extract/{sample}.log"
    params:
        extra="" # optionally add further command line arguments
    wrapper:
        "0.65.0/bio/strling/extract"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling extract"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
reference = snakemake.input.get("reference", None)
index = snakemake.input.get("index", None)

if not bam or (isinstance(bam, list) and len(bam) != 1):
    raise ValueError("Please provide exactly one 'bam' input.")

if not path.exists(bam + ".bai"):
    raise ValueError(
        "Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
    )

if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")

if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )

if not index:  # optional
    index_string = ""
else:
    index_string = "-g {}".format(index)

if len(snakemake.output) != 1:
    raise ValueError("Please provide exactly one output file (.bin).")

shell(
    "(strling extract "
    "{bam} "
    "{snakemake.output[0]} "
    "-f {reference} "
    "{index_string} "
    "{extra}) {log}"
)
STRLING INDEX

STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. index creates a bed file of large STR regions in the reference genome. This step is performed automatically as part of strling extract. However, when running multiple samples, it is more efficient to do it once, then pass the file to strling extract using the -g option. Documentation at: https://strling.readthedocs.io/en/latest/run.html

Software dependencies
  • strling ==0.3
Example

This wrapper can be used in the following way:

rule strling_index:
    input:
        "reference/genome.fasta"
    output:
        index="reference/genome.fasta.str",
        fai="reference/genome.fasta.fai"
    params:
        extra="" # optionally add further command line arguments
    log:
        "log/strling/index.log"
    wrapper:
        "0.65.0/bio/strling/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling index"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
if len(snakemake.input) != 1:
    raise ValueError("Please provide exactly one reference genome.")

shell(
    "(strling index {snakemake.input[0]} "
    "-g {snakemake.output.index} "
    "{extra}) {log}"
)
STRLING MERGE

STRling (pronounced “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. merge prepares joint calling of STR loci across all given samples. Requires minimum read evidence from at least one sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html

Software dependencies
  • strling ==0.3
Example

This wrapper can be used in the following way:

rule strling_merge:
    input:
        bins=["extract/A.bin", "extract/B.bin"],
        reference="reference/genome.fasta",
        fai="reference/genome.fasta.fai",
    output:
        "merged/group-bounds.txt" # must end with "-bounds.txt"
    params:
        extra="" # optionally add further command line arguments
    log:
        "log/strling/merge/group.log"
    wrapper:
        "0.65.0/bio/strling/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling merge"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
bins = snakemake.input.get("bins", None)
reference = snakemake.input.get("reference", None)
fai = snakemake.input.get("fai", None)

if not bins or len(bins) < 2:
    raise ValueError("Please provide at least two 'bins' as input.")

if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")

if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )

if len(snakemake.output) != 1:
    raise ValueError("Please provide exactly one output file (-bounds.txt).")

if not snakemake.output[0].endswith("-bounds.txt"):
    raise ValueError(
        "Output file must end with '-bounds.txt'. Please change the output file name."
    )

prefix = snakemake.output[0][: -len("-bounds.txt")]

shell("(strling merge " "{bins} " "-o {prefix} " "{extra}) {log}")

TABIX

Process given file with tabix (e.g., create index).

Software dependencies
  • htslib ==1.10
Example

This wrapper can be used in the following way:

rule tabix:
    input:
        "{prefix}.vcf.gz"
    output:
        "{prefix}.vcf.gz.tbi"
    params:
        # pass arguments to tabix (e.g. index a vcf)
        "-p vcf"
    log:
        "logs/tabix/{prefix}.log"
    wrapper:
        "0.65.0/bio/tabix"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("tabix {snakemake.params} {snakemake.input[0]} {log}")

TRANSDECODER

For transdecoder, the following wrappers are available:

TRANSDECODER LONGORFS

TransDecoder.LongOrfs will identify coding regions within transcript sequences (ORFs) that are at least 100 amino acids long. You can lower this via the ‘-m’ parameter, but know that the rate of false positive ORF predictions increases drastically with shorter minimum length criteria.

Software dependencies
  • transdecoder=5.5.0
Example

This wrapper can be used in the following way:

rule transdecoder_longorfs:
    input:
        fasta="test.fa.gz", # required
        gene_trans_map="test.gtm" # optional gene-to-transcript identifier mapping file (tab-delimited, gene_id<tab>trans_id<return> )
    output:
        "test.fa.transdecoder_dir/longest_orfs.pep"
    log:
        "logs/transdecoder/test-longorfs.log"
    params:
        extra=""
    wrapper:
        "0.65.0/bio/transdecoder/longorfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Transdecoder LongOrfs"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

gtm_cmd = ""
gtm = snakemake.input.get("gene_trans_map", "")
if gtm:
    gtm_cmd = " --gene_trans_map " + gtm

output_dir = path.dirname(str(snakemake.output))

# transdecoder fails if output already exists. No force option available
shell("rm -rf {output_dir}")

input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith(".gz"):
    # strip only the trailing ".gz" suffix
    input_fa = input_fasta[: -len(".gz")]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta

shell("TransDecoder.LongOrfs -t {input_fa} {gtm_cmd} {extra} {log}")
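
The relation between a gzipped input and the file TransDecoder actually reads can be sketched as follows. This is a rough sketch: uncompressed_name and expected_longorfs_output are hypothetical helpers, and the .transdecoder_dir/longest_orfs.pep naming is taken from the example rule rather than verified against TransDecoder for other inputs.

```python
def uncompressed_name(fasta):
    """Hypothetical helper: name of the FASTA after gunzip, or unchanged."""
    if fasta.endswith(".gz"):
        return fasta[: -len(".gz")]
    return fasta


def expected_longorfs_output(fasta):
    """Assumed output naming, following the example rule above."""
    return uncompressed_name(fasta) + ".transdecoder_dir/longest_orfs.pep"


print(uncompressed_name("test.fa.gz"))
print(expected_longorfs_output("test.fa.gz"))
```

Note that the decompressed copy is written next to the gzipped input, so that directory needs to be writable.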
TRANSDECODER PREDICT

Predict the likely coding regions from the ORFs identified by TransDecoder.LongOrfs. Optionally include results from homology searches (blast/hmmer results) as ORF retention criteria.

Software dependencies
  • transdecoder=5.5.0
Example

This wrapper can be used in the following way:

rule transdecoder_predict:
    input:
        fasta="test.fa.gz", # required input; optionally gzipped
        pfam_hits="pfam_hits.txt", # optionally retain ORFs with hits by inputting pfam results here (run separately)
        blastp_hits="blastp_hits.txt", # optionally retain ORFs with hits by inputting blastp results here (run separately)
        # you may also want to add your transdecoder longorfs result here - predict will fail if you haven't first run longorfs
        #longorfs="test.fa.transdecoder_dir/longest_orfs.pep"
    output:
        "test.fa.transdecoder.bed",
        "test.fa.transdecoder.cds",
        "test.fa.transdecoder.pep",
        "test.fa.transdecoder.gff3"
    log:
        "logs/transdecoder/test-predict.log"
    params:
        extra=""
    wrapper:
        "0.65.0/bio/transdecoder/predict"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Transdecoder Predict"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

addl_outputs = ""
pfam = snakemake.input.get("pfam_hits", "")
if pfam:
    addl_outputs += " --retain_pfam_hits " + pfam

blast = snakemake.input.get("blastp_hits", "")
if blast:
    addl_outputs += " --retain_blastp_hits " + blast

input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith(".gz"):
    # strip only the trailing ".gz" suffix
    input_fa = input_fasta[: -len(".gz")]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta

shell("TransDecoder.Predict -t {input_fa} {addl_outputs} {extra} {log}")

TRIM_GALORE

For trim_galore, the following wrappers are available:

TRIM_GALORE-PE

Trim paired-end reads using trim_galore.

Software dependencies
  • trim-galore ==0.4.5
Example

This wrapper can be used in the following way:

rule trim_galore_pe:
    input:
        ["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"]
    output:
        "trimmed/{sample}.1_val_1.fq.gz",
        "trimmed/{sample}.1.fastq.gz_trimming_report.txt",
        "trimmed/{sample}.2_val_2.fq.gz",
        "trimmed/{sample}.2.fastq.gz_trimming_report.txt"
    params:
        extra="--illumina -q 20"
    log:
        "logs/trim_galore/{sample}.log"
    wrapper:
        "0.65.0/bio/trim_galore/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Use the fastqc Snakemake wrapper instead of the --fastqc option; this wrapper raises an error if --fastqc appears in extra.
  • All output files must be placed in the same directory.
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError(
        "The trim_galore Snakemake wrapper cannot "
        "be run with the `--fastqc` flag. Please "
        "remove the flag from extra params. "
        "You can use the fastqc Snakemake wrapper on "
        "the input and output files instead."
    )

# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), (
        "trim_galore can only output files to a single directory."
        " Please indicate only one directory for the output files."
    )

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " --paired"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}"
)
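
The single-directory requirement can be checked before running a workflow. A minimal sketch, assuming a hypothetical helper common_output_dir that raises ValueError where the wrapper uses assert:

```python
import os.path


def common_output_dir(outputs):
    """Hypothetical helper: return the single directory shared by all outputs."""
    dirs = {os.path.dirname(p) for p in outputs}
    if len(dirs) != 1:
        raise ValueError(
            "all outputs must be in one directory, got: {}".format(sorted(dirs))
        )
    return dirs.pop()


outputs = [
    "trimmed/a.1_val_1.fq.gz",
    "trimmed/a.1.fastq.gz_trimming_report.txt",
    "trimmed/a.2_val_2.fq.gz",
    "trimmed/a.2.fastq.gz_trimming_report.txt",
]
print(common_output_dir(outputs))
```

The returned directory is what the wrapper passes to trim_galore via -o.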
TRIM_GALORE-SE

Trim unpaired reads using trim_galore.

Software dependencies
  • trim-galore ==0.4.3
Example

This wrapper can be used in the following way:

rule trim_galore_se:
    input:
        "reads/{sample}.fastq.gz"
    output:
        "trimmed/{sample}_trimmed.fq.gz",
        "trimmed/{sample}.fastq.gz_trimming_report.txt"
    params:
        extra="--illumina -q 20"
    log:
        "logs/trim_galore/{sample}.log"
    wrapper:
        "0.65.0/bio/trim_galore/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Use the fastqc Snakemake wrapper instead of the --fastqc option; this wrapper raises an error if --fastqc appears in extra.
  • All output files must be placed in the same directory.
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError(
        "The trim_galore Snakemake wrapper cannot "
        "be run with the `--fastqc` flag. Please "
        "remove the flag from extra params. "
        "You can use the fastqc Snakemake wrapper on "
        "the input and output files instead."
    )

# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), (
        "trim_galore can only output files to a single directory."
        " Please indicate only one directory for the output files."
    )

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}"
)

TRIMMOMATIC

For trimmomatic, the following wrappers are available:

TRIMMOMATIC PE

Trim paired-end reads with trimmomatic. (De)compress with pigz.

Software dependencies
  • trimmomatic ==0.36
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule trimmomatic_pe:
    input:
        r1="reads/{sample}.1.fastq.gz",
        r2="reads/{sample}.2.fastq.gz"
    output:
        r1="trimmed/{sample}.1.fastq.gz",
        r2="trimmed/{sample}.2.fastq.gz",
        # reads where trimming entirely removed the mate
        r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
        r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        compression_level="-9"
    threads:
        32
    wrapper:
        "0.65.0/bio/trimmomatic/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/pe

Snakemake wrapper to trim reads with trimmomatic in PE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of its time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least one extra thread for each gzipped
input or output file.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_files, output_files, available_threads):
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
    gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
input_files = [snakemake.input.r1, snakemake.input.r2]
output_files = [
    snakemake.output.r1,
    snakemake.output.r1_unpaired,
    snakemake.output.r2,
    snakemake.output.r2_unpaired,
]

trimmomatic_threads, input_threads, output_threads = distribute_threads(
    input_files, output_files, snakemake.threads
)

input_r1, input_r2 = [
    compose_input_gz(filename, input_threads) for filename in input_files
]

output_r1, output_r1_unp, output_r2, output_r2_unp = [
    compose_output_gz(filename, output_threads, compression_level)
    for filename in output_files
]

shell(
    "trimmomatic PE -threads {trimmomatic_threads} {extra} "
    "{input_r1} {input_r2} "
    "{output_r1} {output_r1_unp} "
    "{output_r2} {output_r2_unp} "
    "{trimmer} "
    "{log}"
)
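
To see how the thread budget is split in practice, the following sketch copies the distribute_threads function from the wrapper code above and calls it with a few thread counts. The file names are hypothetical placeholders; only the ".gz" suffix matters to the function.

```python
def distribute_threads(input_files, output_files, available_threads):
    # Same logic as in the wrapper code above.
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
    gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


# Hypothetical file names: two gzipped inputs, four gzipped outputs.
inputs = ["r1.fastq.gz", "r2.fastq.gz"]
outputs = ["o1.fastq.gz", "o1u.fastq.gz", "o2.fastq.gz", "o2u.fastq.gz"]

for threads in (32, 8, 4):
    print(threads, distribute_threads(inputs, outputs, threads))
# 32 -> (8, 4, 4): 8 trimmomatic threads, 4 per pigz reader, 4 per pigz writer
# 8  -> (2, 1, 1)
# 4  -> (4, 0, 0): too few threads, so no pigz process is started
```

With fewer threads than processes, the pigz thread counts drop to zero and the wrapper passes the plain file names to trimmomatic.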
TRIMMOMATIC SE

Trim single-end reads with trimmomatic. (De)compress with pigz.

Software dependencies
  • trimmomatic ==0.36
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule trimmomatic:
    input:
        "reads/{sample}.fastq.gz"  # input and output can be uncompressed or compressed
    output:
        "trimmed/{sample}.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        # optional compression levels from -0 to -9 and -11
        compression_level="-9"
    threads:
        32
    wrapper:
        "0.65.0/bio/trimmomatic/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/se

Snakemake wrapper to trim reads with trimmomatic in SE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least 1 extra thread per each gzipped
input or output file.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_file, output_file, available_threads):
    gzipped_input_files = 1 if input_file.endswith(".gz") else 0
    gzipped_output_files = 1 if output_file.endswith(".gz") else 0
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
trimmomatic_threads, input_threads, output_threads = distribute_threads(
    snakemake.input[0], snakemake.output[0], snakemake.threads
)

# Collect files
input = compose_input_gz(snakemake.input[0], input_threads)
output = compose_output_gz(snakemake.output[0], output_threads, compression_level)

shell(
    "trimmomatic SE -threads {trimmomatic_threads} {extra} {input} {output} {trimmer} {log}"
)
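
As an illustration of what the two compose helpers above hand to the shell, the sketch below copies them and prints the resulting strings. The paths and thread counts are hypothetical and chosen only for this example.

```python
def compose_input_gz(filename, threads):
    # Wrap gzipped inputs in a bash process substitution that decompresses them.
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    # Wrap gzipped outputs in a process substitution that compresses them.
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


print(compose_input_gz("reads/a.fastq.gz", 4))
# <(pigz -p 4 --decompress --stdout reads/a.fastq.gz)
print(compose_output_gz("trimmed/a.fastq.gz", 14, "-9"))
# >(pigz -p 14 -9 > trimmed/a.fastq.gz)
print(compose_input_gz("reads/a.fastq", 4))
# reads/a.fastq  (uncompressed files are passed through unchanged)
```

The returned strings only make sense in a shell that supports process substitution, such as bash.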

TRINITY

Generate transcriptome assembly with Trinity

Software dependencies
  • trinity ==2.8.4
Example

This wrapper can be used in the following way:

rule trinity:
    input:
        left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
        right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
    output:
        "trinity_out_dir/Trinity.fasta"
    log:
        'logs/trinity/trinity.log'
    params:
        extra=""
    threads: 4
    wrapper:
        "0.65.0/bio/trinity"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Trinity."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
max_memory = snakemake.params.get("max_memory", "10G")

# allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = (
    [snakemake.input.left]
    if isinstance(snakemake.input.left, str)
    else snakemake.input.left
)
right = snakemake.input.get("right")
if right:
    right = (
        [snakemake.input.right]
        if isinstance(snakemake.input.right, str)
        else snakemake.input.right
    )
    assert len(left) >= len(
        right
    ), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
    input_str_left = " --left " + ",".join(left)
    input_str_right = " --right " + ",".join(right)
else:
    input_str_left = " --single " + ",".join(left)
    input_str_right = ""

input_cmd = " ".join([input_str_left, input_str_right])

# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
    if "fq" in left[0] or "fastq" in left[0]:
        seqtype = "fq"
    elif "fa" in left[0] or "fasta" in left[0]:
        seqtype = "fa"
    else:
        raise ValueError(
            "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"
        )

outdir = path.dirname(snakemake.output[0])
assert "trinity" in outdir, "output directory name must contain 'trinity'"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "Trinity {input_cmd} --CPU {snakemake.threads} "
    " --max_memory {max_memory} --seqType {seqtype} "
    " --output {outdir} {extra} "
    " {log}"
)
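
The seqtype inference above matches substrings anywhere in the path, so a FASTA file inside a directory named "fastq_dir" would be classified as "fq". A minimal sketch of a stricter, extension-based check follows. The helper infer_seqtype and the two suffix sets are hypothetical and not part of the wrapper.

```python
from pathlib import Path

# Hypothetical helper, not part of the wrapper: infer the Trinity --seqType
# value from the file extension instead of a substring match on the whole path.
FASTQ_SUFFIXES = {".fq", ".fastq"}
FASTA_SUFFIXES = {".fa", ".fasta"}


def infer_seqtype(filename):
    suffixes = Path(filename).suffixes
    # Ignore a trailing compression suffix such as ".gz".
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    last = suffixes[-1].lower() if suffixes else ""
    if last in FASTQ_SUFFIXES:
        return "fq"
    if last in FASTA_SUFFIXES:
        return "fa"
    raise ValueError("cannot infer 'fq' or 'fa' seqtype from {}".format(filename))


print(infer_seqtype("reads/reads.left.fq.gz"))  # fq
print(infer_seqtype("fastq_dir/contigs.fa"))    # fa
```

Setting the seqtype parameter explicitly in the rule avoids the inference altogether.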

TXIMPORT

Import and summarize transcript-level estimates for both transcript-level and gene-level analysis.

Software dependencies
  • bioconductor-tximport==1.14.0
  • r-readr==1.3.1
  • r-jsonlite==1.6
Example

This wrapper can be used in the following way:

rule tximport:
    input:
        quant = expand("quant/A/quant.sf"),
        # Optional transcript/gene links as described in tximport
        # tx_to_gene = "/path/to/tx2gene"
    output:
        txi = "txi.RDS"
    params:
        extra = "type='salmon', txOut=TRUE"
    wrapper:
        "0.65.0/bio/tximport"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Add any tximport options to the extra parameter; the R wrapper passes them through to the tximport call. Options that tximport does not recognize will cause an unknown parameter error.

Authors
  • Thibault Dayris
Code
#!/bin/R

# Loading library
base::library("tximport");   # Perform actual count importation in R
base::library("readr");      # Read faster!
base::library("jsonlite");   # Importing inferential replicates

# Cast input paths as character to avoid errors
samples_paths <- sapply(               # Sequentially apply
  snakemake@input[["quant"]],          # ... to all quantification paths
  function(quant) as.character(quant)  # ... a cast as character
);

# Collapse paths into a single string of quoted, comma-separated entries
samples_paths <- base::paste0(samples_paths, collapse = '", "');

# Building function arguments
extra <- base::paste0('files = c("', samples_paths, '")');

# Check if user provided optional transcript to gene table
if ("tx_to_gene" %in% names(snakemake@input)) {
  tx2gene <- readr::read_tsv(snakemake@input[["tx_to_gene"]]);
  extra <- base::paste(
    extra,                 # Forward existing arguments
    ", tx2gene = ",        # Argument name
    "tx2gene"              # Add tx2gene to parameters
  );
}

# Add user defined arguments
if ("extra" %in% names(snakemake@params)) {
  if (snakemake@params[["extra"]] != "") {
    extra <- base::paste(
      extra,                       # Forward existing parameters
      snakemake@params[["extra"]], # Add user parameters
      sep = ", "                   # Field separator
    );
  }
}


print(extra);
# Perform tximport work
txi <- base::eval(                        # Evaluate the following
  base::parse(                            # ... parsed expression
    text = base::paste0(
      "tximport::tximport(", extra, ");"  # ... of tximport and its arguments
    )
  )
);

# Save results
base::saveRDS(                       # Save R object
  object = txi,                      # The txi object
  file = snakemake@output[["txi"]]   # Output path is provided by Snakemake
);
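
The R code above assembles the tximport call as a string before evaluating it. The following Python sketch mirrors that string assembly so the resulting call is easy to inspect. The function build_tximport_call and its arguments are hypothetical and exist only for illustration; the wrapper itself runs in R.

```python
def build_tximport_call(quant_paths, extra="", has_tx2gene=False):
    # Mirrors the string assembly of the R wrapper above, for illustration only.
    files = '", "'.join(str(path) for path in quant_paths)
    args = 'files = c("' + files + '")'
    if has_tx2gene:
        # R's paste() joins its arguments with single spaces by default.
        args = " ".join([args, ", tx2gene = ", "tx2gene"])
    if extra != "":
        # User-supplied options are appended after a comma separator.
        args = ", ".join([args, extra])
    return "tximport::tximport(" + args + ");"


print(build_tximport_call(["quant/A/quant.sf"], "type='salmon', txOut=TRUE"))
# tximport::tximport(files = c("quant/A/quant.sf"), type='salmon', txOut=TRUE);
```

Because the extra string is pasted into the call verbatim, it has to be valid R argument syntax.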

UCSC

For ucsc, the following wrappers are available:

BEDGRAPHTOBIGWIG

Convert *.bedGraph file to *.bw file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-bedgraphtobigwig == 377
Example

This wrapper can be used in the following way:

rule bedGraphToBigWig:
    input:
        bedGraph="{sample}.bedGraph",
        chromsizes="genome.chrom.sizes"
    output:
        "{sample}.bw"
    log:
        "logs/{sample}.bed-graph_to_big-wig.log"
    params:
        extra="" # optional params string
    wrapper:
        "0.65.0/bio/ucsc/bedGraphToBigWig"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.bedGraph to *.bw conversion using UCSC bedGraphToBigWig tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "bedGraphToBigWig {extra}"
    " {snakemake.input.bedGraph} {snakemake.input.chromsizes}"
    " {snakemake.output} {log}"
)
FATOTWOBIT

Convert *.fa file to *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-fatotwobit == 377
Example

This wrapper can be used in the following way:

# Example: from *.fa file
rule faToTwoBit_fa:
    input:
        "{sample}.fa"
    output:
        "{sample}.2bit"
    log:
        "logs/{sample}.fa_to_2bit.log"
    params:
        extra="" # optional params string
    wrapper:
        "0.65.0/bio/ucsc/faToTwoBit"

# Example: from *.fa.gz file
rule faToTwoBit_fa_gz:
    input:
        "{sample}.fa.gz"
    output:
        "{sample}.2bit"
    log:
        "logs/{sample}.fa-gz_to_2bit.log"
    params:
        extra="" # optional params string
    wrapper:
        "0.65.0/bio/ucsc/faToTwoBit"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.fa to *.2bit conversion using UCSC faToTwoBit tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("faToTwoBit {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITINFO

Generate *.chrom.sizes file from *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-twobitinfo == 377
Example

This wrapper can be used in the following way:

rule twoBitInfo:
    input:
        "{sample}.2bit"
    output:
        "{sample}.chrom.sizes"
    log:
        "logs/{sample}.chrom.sizes.log"
    params:
        extra="" # optional params string
    wrapper:
        "0.65.0/bio/ucsc/twoBitInfo"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.2bit to *.chrom.sizes conversion using UCSC twoBitInfo tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("twoBitInfo {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITTOFA

Convert *.2bit file to *.fa file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-twobittofa == 377
Example

This wrapper can be used in the following way:

rule twoBitToFa:
    input:
        "{sample}.2bit"
    output:
        "{sample}.fa"
    log:
        "logs/{sample}.2bit_to_fa.log"
    params:
        extra="" # optional params string
    wrapper:
        "0.65.0/bio/ucsc/twoBitToFa"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC twoBitToFa tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("twoBitToFa {extra} {snakemake.input} {snakemake.output} {log}")

UMIS

For umis, the following wrappers are available:

UMIS BAMTAG

Convert a BAM/SAM with fastqtransformed read names to have UMI and cellular barcode tags.

Software dependencies
  • umis ==1.0.3
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule umis_bamtag:
    input:
        "data/{sample}.bam"
    output:
        "data/{sample}.annotated.bam"
    log:
        "logs/umis/bamtag/{sample}.log"
    params:
        extra=""
    threads: 1
    wrapper:
        "0.65.0/bio/umis/bamtag"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not len(snakemake.input) == 1:
    raise ValueError("Only expecting one input file: " + str(snakemake.input) + "!")

output_file = snakemake.output[0]

if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(snakemake.output) + "!")

in_pipe = ""
if bam_input.endswith(".sam"):
    in_pipe = "cat "
else:
    in_pipe = "samtools view -h "

out_pipe = ""
if not output_file.endswith(".sam"):
    out_pipe = " | samtools view -S -b - "

shell(
    " {in_pipe} {bam_input} | "
    " umis bamtag -"
    " {out_pipe} > {output_file}"
    " {log}"
)
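
The pipeline that this wrapper builds depends on the input and output extensions. A minimal sketch of that decision, written as a standalone function, is given below. The name build_bamtag_command is hypothetical, and the whitespace of the real command differs slightly.

```python
def build_bamtag_command(bam_input, output_file):
    # SAM input can be streamed as-is; other input is decoded to SAM with header.
    if bam_input.endswith(".sam"):
        reader = "cat"
    else:
        reader = "samtools view -h"
    command = "{} {} | umis bamtag -".format(reader, bam_input)
    # Anything other than a .sam output is re-encoded as BAM.
    if not output_file.endswith(".sam"):
        command += " | samtools view -S -b -"
    return "{} > {}".format(command, output_file)


print(build_bamtag_command("data/a.bam", "data/a.annotated.bam"))
# samtools view -h data/a.bam | umis bamtag - | samtools view -S -b - > data/a.annotated.bam
print(build_bamtag_command("data/a.sam", "data/a.annotated.sam"))
# cat data/a.sam | umis bamtag - > data/a.annotated.sam
```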

VARSCAN

For varscan, the following wrappers are available:

VARSCAN MPILEUP2INDEL

Detect indels in NGS data from mpileup files

Software dependencies
  • varscan ==2.4.3
Example

This wrapper can be used in the following way:

rule mpileup_to_vcf:
    input:
        "mpileup/{sample}.mpileup.gz"
    output:
        "vcf/{sample}.vcf"
    message:
        "Calling Indel with Varscan2"
    threads:  # Varscan does not take any threading information
        1     # However, mpileup might have to be unzipped.
              # Keep threading value to one for unzipped mpileup input
              # Set it to two for zipped mpileup files
    log:
        "logs/varscan_{sample}.log"
    wrapper:
        "0.65.0/bio/varscan/mpileup2indel"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.

If so, it’s recommended to use two threads:

  • 1 for varscan itself
  • 1 for zcat
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Varscan2 mpileup2indel"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs

# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
    " cat {} ".format(snakemake.input[0])
    if not snakemake.input[0].endswith("gz")
    else " zcat {} ".format(snakemake.input[0])
)

# Building output directories
makedirs(op.dirname(snakemake.output[0]))

shell(
    "varscan mpileup2indel "  # Tool and its subprocess
    "{extra} "  # Extra parameters
    "<( {pileup} ) "
    "> {snakemake.output[0]} "  # Path to vcf file
    "{log}"  # Logging behaviour
)
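
The choice between cat and zcat above can be sketched as a small standalone function. The names pileup_reader and build_command are hypothetical, and the example paths are placeholders.

```python
def pileup_reader(path):
    # Mirrors the wrapper: any path ending in "gz" is streamed through zcat.
    if path.endswith("gz"):
        return "zcat {}".format(path)
    return "cat {}".format(path)


def build_command(path, output, extra=""):
    # The reader runs inside a process substitution that varscan reads from.
    parts = [
        "varscan mpileup2indel",
        extra,
        "<( {} )".format(pileup_reader(path)),
        ">",
        output,
    ]
    return " ".join(part for part in parts if part)


print(build_command("mpileup/a.mpileup.gz", "vcf/a.vcf"))
# varscan mpileup2indel <( zcat mpileup/a.mpileup.gz ) > vcf/a.vcf
```

The zcat process runs alongside varscan, which is why two threads are suggested for gzipped input.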
VARSCAN MPILEUP2SNP

Detect variants in NGS data from Samtools mpileup

Software dependencies
  • varscan ==2.4.3
Example

This wrapper can be used in the following way:

rule mpileup_to_vcf:
    input:
        "mpileup/{sample}.mpileup.gz"
    output:
        "vcf/{sample}.vcf"
    message:
        "Calling SNP with Varscan2"
    threads:  # Varscan does not take any threading information
        1     # However, mpileup might have to be unzipped.
              # Keep threading value to one for unzipped mpileup input
              # Set it to two for zipped mpileup files
    log:
        "logs/varscan_{sample}.log"
    wrapper:
        "0.65.0/bio/varscan/mpileup2snp"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.

If so, it’s recommended to use two threads:

  • 1 for varscan itself
  • 1 for zcat
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Varscan2 mpileup2snp"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs

# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
    " cat {} ".format(snakemake.input[0])
    if not snakemake.input[0].endswith("gz")
    else " zcat {} ".format(snakemake.input[0])
)

# Building output directories
makedirs(op.dirname(snakemake.output[0]))

shell(
    "varscan mpileup2snp "  # Tool and its subprocess
    "{extra} "  # Extra parameters
    "<( {pileup} ) "
    "> {snakemake.output[0]} "  # Path to vcf file
    "{log}"  # Logging behaviour
)
VARSCAN SOMATIC

Varscan Somatic calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.

Software dependencies
  • varscan ==2.4.3
Example

This wrapper can be used in the following way:

rule varscan_somatic:
    input:
        # A pair of pileup files can be used *instead* of the mpileup
        # normal_pileup = ""
        # tumor_pileup = ""
        mpileup = "mpileup/{sample}.mpileup.gz"
    output:
        snp = "vcf/{sample}.snp.vcf",
        indel = "vcf/{sample}.indel.vcf"
    message:
        "Calling somatic variants {wildcards.sample}"
    threads:
        1
    params:
        extra = ""
    wrapper:
        "0.65.0/bio/varscan/somatic"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for varscan somatic"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"


import os.path as op

from snakemake.shell import shell
from snakemake.utils import makedirs

# Defining logging and gathering extra parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

# Building output dirs
makedirs(op.dirname(snakemake.output.snp))
makedirs(op.dirname(snakemake.output.indel))

# Output prefix
prefix = op.splitext(snakemake.output.snp)[0]

# Searching for input files
pileup_pair = ["normal_pileup", "tumor_pileup"]

in_pileup = ""
mpileup = ""
if "mpileup" in snakemake.input.keys():
    # Case there is a mpileup with both normal and tumor
    in_pileup = snakemake.input.mpileup
    mpileup = "--mpileup 1"
elif all(pileup in snakemake.input.keys() for pileup in pileup_pair):
    # Case there are two separate pileup files
    in_pileup = "{} {}".format(
        snakemake.input.normal_pileup, snakemake.input.tumor_pileup
    )
else:
    raise KeyError("Could not find either a mpileup, or a pair of pileup files")

shell(
    "varscan somatic"  # Tool and its subcommand
    " {in_pileup}"  # Path to input file(s)
    " {prefix}"  # Path to output
    " {extra}"  # Extra parameters
    " {mpileup}"
    " --output-snp {snakemake.output.snp}"  # Path to snp output file
    " --output-indel {snakemake.output.indel}"  # Path to indel output file
    " {log}"  # Logging behaviour
)
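
The wrapper accepts either a single mpileup or a normal/tumor pair of pileup files. A minimal sketch of that selection follows, using a plain dict in place of the rule's named inputs. The function select_somatic_inputs is hypothetical.

```python
def select_somatic_inputs(inputs):
    # `inputs` is a plain dict standing in for the named inputs of the rule.
    if "mpileup" in inputs:
        # One mpileup file holding both normal and tumor samples.
        return inputs["mpileup"], "--mpileup 1"
    if all(key in inputs for key in ("normal_pileup", "tumor_pileup")):
        # Two separate pileup files: normal first, then tumor.
        pair = "{} {}".format(inputs["normal_pileup"], inputs["tumor_pileup"])
        return pair, ""
    raise KeyError("Could not find either a mpileup, or a pair of pileup files")


print(select_somatic_inputs({"mpileup": "mpileup/a.mpileup"}))
# ('mpileup/a.mpileup', '--mpileup 1')
print(select_somatic_inputs({"normal_pileup": "n.pileup", "tumor_pileup": "t.pileup"}))
# ('n.pileup t.pileup', '')
```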

VCFTOOLS

For vcftools, the following wrappers are available:

VCFTOOLS FILTER

Filter vcf files using vcftools

Software dependencies
  • vcftools ==0.1.16
Example

This wrapper can be used in the following way:

rule filter_vcf:
    input:
        "{sample}.vcf"
    output:
        "{sample}.filtered.vcf"
    params:
        extra="--chr 1 --recode-INFO-all"
    wrapper:
        "0.65.0/bio/vcftools/filter"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
    input_flag = "--gzvcf"

output = " > " + snakemake.output[0]
if output.endswith(".gz"):
    output = " | gzip" + output

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "vcftools "
    "{input_flag} "
    "{snakemake.input} "
    "{extra} "
    "--recode "
    "--stdout "
    "{output} "
    "{log}"
)
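
How the input flag and the output redirection depend on the file extensions can be sketched as follows. The function build_vcftools_command is hypothetical; it returns the command as a string instead of running it and omits the log redirection.

```python
def build_vcftools_command(vcf_in, vcf_out, extra=""):
    # Gzipped input needs --gzvcf, plain VCF input needs --vcf.
    input_flag = "--gzvcf" if vcf_in.endswith(".gz") else "--vcf"
    # vcftools writes to stdout; compress on the fly for .gz outputs.
    redirect = "> " + vcf_out
    if vcf_out.endswith(".gz"):
        redirect = "| gzip " + redirect
    parts = ["vcftools", input_flag, vcf_in, extra, "--recode", "--stdout", redirect]
    return " ".join(part for part in parts if part)


print(build_vcftools_command("a.vcf.gz", "a.filtered.vcf.gz", "--chr 1"))
# vcftools --gzvcf a.vcf.gz --chr 1 --recode --stdout | gzip > a.filtered.vcf.gz
print(build_vcftools_command("a.vcf", "a.filtered.vcf"))
# vcftools --vcf a.vcf --recode --stdout > a.filtered.vcf
```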

VEMBRANE

Vembrane filters variants simultaneously based on any INFO field, CHROM, POS, REF, ALT, QUAL, and the annotation field ANN. When filtering based on ANN, annotation entries are filtered first. If no annotation entry remains, the entire variant is deleted. https://github.com/vembrane/vembrane

Software dependencies
  • vembrane =0.4.1
Example

This wrapper can be used in the following way:

rule vembrane:
    input:
        vcf="in.vcf",
    output:
        vcf="filtered/out.vcf"
    params:
        expression="POS > 4000",
        extra=""
    log:
        "logs/vembrane.log"
    wrapper:
        "0.65.0/bio/vembrane"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for vembrane"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "vembrane"  # Tool
    " {extra}"  # Extra parameters
    ' "{snakemake.params.expression}"'
    " {snakemake.input.vcf}"  # Path to input vcf file
    " > {snakemake.output.vcf}"  # Path to output vcf file
    " {log}"  # Logging behaviour
)

VEP

For vep, the following wrappers are available:

VEP ANNOTATE

Annotate variant calls with VEP.

Software dependencies
  • ensembl-vep =100
  • bcftools =1.9
Example

This wrapper can be used in the following way:

rule annotate_variants:
    input:
        calls="variants.bcf",  # .vcf, .vcf.gz or .bcf
        cache="resources/vep/cache",
        plugins="resources/vep/plugins",
    output:
        calls="variants.annotated.bcf",  # .vcf, .vcf.gz or .bcf
        stats="variants.html"
    params:
        # Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
        # Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
        plugins=["LoFtool"],
        extra="--everything"  # optional: extra arguments
    log:
        "logs/vep/annotate.log"
    threads: 4
    wrapper:
        "0.65.0/bio/vep/annotate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell


def get_only_child_dir(path):
    children = [child for child in path.iterdir() if child.is_dir()]
    assert (
        len(children) == 1
    ), "Invalid VEP cache directory, only a single entry is allowed, make sure that cache was created with the snakemake VEP cache wrapper"
    return children[0]


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

fork = "--fork {}".format(snakemake.threads) if snakemake.threads > 1 else ""
stats = snakemake.output.stats
cache = snakemake.input.cache
plugins = snakemake.input.plugins

entrypath = get_only_child_dir(get_only_child_dir(Path(cache)))
species = entrypath.parent.name
release, build = entrypath.name.split("_")

load_plugins = " ".join(map("--plugin {}".format, snakemake.params.plugins))

if snakemake.output.calls.endswith(".vcf.gz"):
    fmt = "z"
elif snakemake.output.calls.endswith(".bcf"):
    fmt = "b"
else:
    fmt = "v"

shell(
    "(bcftools view {snakemake.input.calls} | "
    "vep {extra} {fork} "
    "--format vcf "
    "--vcf "
    "--cache "
    "--cache_version {release} "
    "--species {species} "
    "--assembly {build} "
    "--dir_cache {cache} "
    "--dir_plugins {plugins} "
    "--offline "
    "{load_plugins} "
    "--output_file STDOUT "
    "--stats_file {stats} | "
    "bcftools view -O{fmt} > {snakemake.output.calls}) {log}"
)
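
The wrapper derives species, release and build from the cache directory layout, which it expects to be cache/species/release_build. The sketch below reproduces that parsing on a throwaway directory tree. The function parse_cache_layout and the directory contents are hypothetical; get_only_child_dir follows the wrapper code with a shortened assertion message.

```python
import tempfile
from pathlib import Path


def get_only_child_dir(path):
    children = [child for child in path.iterdir() if child.is_dir()]
    assert len(children) == 1, "expected exactly one subdirectory in {}".format(path)
    return children[0]


def parse_cache_layout(cache):
    # Expected layout: <cache>/<species>/<release>_<build>
    entrypath = get_only_child_dir(get_only_child_dir(Path(cache)))
    release, build = entrypath.name.split("_")
    return entrypath.parent.name, release, build


# Build a throwaway directory tree with hypothetical contents to try it out.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "saccharomyces_cerevisiae" / "98_R64-1-1").mkdir(parents=True)
    layout = parse_cache_layout(tmp)
    print(layout)  # ('saccharomyces_cerevisiae', '98', 'R64-1-1')

# Plugin entries are turned into repeated --plugin flags.
plugin_args = " ".join(map("--plugin {}".format, ["LoFtool", "MyPlugin,1,FOO"]))
print(plugin_args)  # --plugin LoFtool --plugin MyPlugin,1,FOO
```

A cache directory with more than one species or more than one release fails the assertion, which is why the cache should come from the VEP cache wrapper.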
VEP DOWNLOAD CACHE

Download VEP cache for given species, build and release.

Software dependencies
  • ensembl-vep
Example

This wrapper can be used in the following way:

rule get_vep_cache:
    output:
        directory("resources/vep/cache")
    params:
        species="saccharomyces_cerevisiae",
        build="R64-1-1",
        release="98"
    log:
        "logs/vep/cache.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "0.65.0/bio/vep/cache"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "vep_install --AUTO cf "
    "--SPECIES {snakemake.params.species} "
    "--ASSEMBLY {snakemake.params.build} "
    "--CACHE_VERSION {snakemake.params.release} "
    "--CACHEDIR {snakemake.output} "
    "--CONVERT "
    "--NO_UPDATE "
    "{extra} {log}"
)
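
To check which command a given set of params would produce without downloading anything, the call can be assembled as a plain string first. This is only a sketch: build_vep_install_command is a hypothetical helper, SimpleNamespace stands in for snakemake.params, and only the flags that appear in the wrapper code are used.

```python
import shlex
from types import SimpleNamespace


def build_vep_install_command(params, cachedir, extra=""):
    """Return the vep_install call as a single string, without executing it."""
    parts = [
        "vep_install --AUTO cf",
        "--SPECIES", shlex.quote(str(params.species)),
        "--ASSEMBLY", shlex.quote(str(params.build)),
        "--CACHE_VERSION", shlex.quote(str(params.release)),
        "--CACHEDIR", shlex.quote(str(cachedir)),
        "--CONVERT",
        "--NO_UPDATE",
    ]
    if extra:
        # Optional additional arguments, appended only when non-empty.
        parts.append(extra)
    return " ".join(parts)


# Stand-in for snakemake.params, using the values from the example rule.
params = SimpleNamespace(
    species="saccharomyces_cerevisiae", build="R64-1-1", release="98"
)
command = build_vep_install_command(params, "resources/vep/cache")
print(command)
```

Quoting each value with shlex.quote keeps species or path names containing spaces or shell metacharacters from being split by the shell.
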
VEP DOWNLOAD PLUGINS

Download VEP plugins.

Software dependencies
  • python =3
Example

This wrapper can be used in the following way:

rule download_vep_plugins:
    output:
        directory("resources/vep/plugins")
    params:
        release=100
    wrapper:
        "0.65.0/bio/vep/plugins"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import sys
from pathlib import Path
from urllib.request import urlretrieve
from zipfile import ZipFile
from tempfile import NamedTemporaryFile

if snakemake.log:
    sys.stderr = open(snakemake.log[0], "w")

outdir = Path(snakemake.output[0])
outdir.mkdir()

with NamedTemporaryFile() as tmp:
    urlretrieve(
        "https://github.com/Ensembl/VEP_plugins/archive/release/{release}.zip".format(
            release=snakemake.params.release
        ),
        tmp.name,
    )

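    with ZipFile(tmp.name) as f:
        for member in f.infolist():
            memberpath = Path(member.filename)
            if len(memberpath.parts) == 1:
                # skip the archive's root dir
                continue
            # strip the root dir so that plugins land directly in outdir
            targetpath = outdir / memberpath.relative_to(memberpath.parts[0])
            if member.is_dir():
                targetpath.mkdir(parents=True, exist_ok=True)
            else:
                # do not rely on directory entries preceding their files
                targetpath.parent.mkdir(parents=True, exist_ok=True)
                with open(targetpath, "wb") as out:
                    out.write(f.read(member.filename))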
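
The extraction loop above drops the single top-level directory of the downloaded archive so that its contents end up directly in the output directory. The sketch below reproduces that idea on a small in-memory archive, so it can be tried without network access. The function name extract_without_root and the archive contents (plugins-root, Example.pm, config/settings.txt) are made up for illustration.

```python
import io
import tempfile
from pathlib import Path, PurePosixPath
from zipfile import ZipFile


def extract_without_root(archive, outdir):
    """Extract a zip archive into outdir, dropping its top-level directory."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    with ZipFile(archive) as zf:
        for member in zf.infolist():
            # Zip member names always use forward slashes.
            memberpath = PurePosixPath(member.filename)
            if len(memberpath.parts) == 1:
                # The root directory entry itself: nothing to extract.
                continue
            targetpath = outdir.joinpath(*memberpath.parts[1:])
            if member.is_dir():
                targetpath.mkdir(parents=True, exist_ok=True)
            else:
                targetpath.parent.mkdir(parents=True, exist_ok=True)
                targetpath.write_bytes(zf.read(member))


# Build a tiny archive in memory with one root directory (made-up contents).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("plugins-root/", "")
    zf.writestr("plugins-root/Example.pm", "1;\n")
    zf.writestr("plugins-root/config/settings.txt", "x=1\n")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    outdir = Path(tmp) / "plugins"
    extract_without_root(buffer, outdir)
    extracted = sorted(p.relative_to(outdir).as_posix() for p in outdir.rglob("*"))

print(extracted)
```

Note that this sketch assumes the archive has exactly one root directory, as in the example archive. A file stored at the top level of the archive would be skipped along with the root entry.
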