The Snakemake Wrappers repository

Requires Snakemake ≥5.7.0.

The Snakemake Wrapper Repository is a collection of reusable wrappers that let you quickly use popular tools from Snakemake rules and workflows.

Usage

The general strategy is to include a wrapper in your workflow via the wrapper directive, for example:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.2.0/bio/samtools/sort"

Here, Snakemake will automatically download and use the corresponding wrapper files from https://github.com/snakemake/snakemake-wrappers/tree/0.2.0/bio/samtools/sort. The 0.2.0 prefix can be replaced with the version tag you want to use, or with a commit ID. This ensures reproducibility, since changes in the wrapper implementation are only propagated to your workflow when you update that version tag.
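
As a rough illustration of how such a specification breaks down, the sketch below splits it into the version (or commit) part and the wrapper path, and rebuilds the GitHub URL quoted above. The helper names parse_wrapper_spec and wrapper_source_url are hypothetical and are not part of Snakemake, whose own resolution logic may differ.

```python
# Repository URL as quoted in the text above.
REPO_URL = "https://github.com/snakemake/snakemake-wrappers"


def parse_wrapper_spec(spec: str) -> tuple:
    """Split '<version-or-commit>/<path>' into its two parts (hypothetical helper)."""
    version, _, path = spec.partition("/")
    if not version or not path:
        raise ValueError(f"expected '<version>/<path>', got: {spec!r}")
    return version, path


def wrapper_source_url(spec: str) -> str:
    """Build the GitHub 'tree' URL for a short wrapper specification."""
    version, path = parse_wrapper_spec(spec)
    return f"{REPO_URL}/tree/{version}/{path}"


print(wrapper_source_url("0.2.0/bio/samtools/sort"))
```

Changing only the first component of the specification is what pins a workflow to a different wrapper release.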

Each wrapper defines its required software packages and versions in an environment.yaml file. When Snakemake is run with the --use-conda flag, these packages are deployed automatically.

Alternatively, for example during development, the wrapper directive can also point to a full URL, including the local file:// scheme. For this to work, you need to provide the (remote) path to the directory containing the wrapper.* and environment.yaml files. For the above example, the explicit GitHub URL would need to be the /raw/ version of the directory:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "https://github.com/snakemake/snakemake-wrappers/raw/0.2.0/bio/samtools/sort"

Contributing

We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, refer to the contributing guide.

Wrappers

Wrappers let you quickly use popular tools and libraries in Snakemake workflows.


ADAPTERREMOVAL

Rapid adapter trimming, identification, and read merging.

URL:

Example

This wrapper can be used in the following way:

rule adapterremoval_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        fq="trimmed/se/{sample}.fastq.gz",                               # trimmed reads
        discarded="trimmed/se/{sample}.discarded.fastq.gz",              # reads that did not pass filters
        settings="stats/se/{sample}.settings"                            # parameters as well as overall statistics
    log:
        "logs/adapterremoval/se/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA",
        extra="",
    threads: 1
    wrapper:
        "v0.87.0/bio/adapterremoval"


rule adapterremoval_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        fq1="trimmed/pe/{sample}_R1.fastq.gz",                           # trimmed mate1 reads
        fq2="trimmed/pe/{sample}_R2.fastq.gz",                           # trimmed mate2 reads
        collapsed="trimmed/pe/{sample}.collapsed.fastq.gz",              # overlapping mate-pairs which have been merged into a single read
        collapsed_trunc="trimmed/pe/{sample}.collapsed_trunc.fastq.gz",  # collapsed reads that were quality trimmed
        singleton="trimmed/pe/{sample}.singleton.fastq.gz",              # mate-pairs for which the mate has been discarded
        discarded="trimmed/pe/{sample}.discarded.fastq.gz",              # reads that did not pass filters
        settings="stats/pe/{sample}.settings"                            # parameters as well as overall statistics
    log:
        "logs/adapterremoval/pe/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA --adapter2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="--collapse --collapse-deterministic",
    threads: 2
    wrapper:
        "v0.87.0/bio/adapterremoval"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • adapterremoval=2.3
Input/Output

Input:

  • raw fastq file with R1 reads
  • raw fastq file with R2 reads (PE only)

Output:

  • trimmed fastq file with R1 reads
  • trimmed fastq file with R2 reads (PE only)
  • fastq file with singleton reads (PE only; PE reads for which the mate has been discarded)
  • fastq file with collapsed reads (PE only; overlapping mate-pairs which have been merged into a single read)
  • fastq file with collapsed truncated reads (PE only; collapsed reads that were quality trimmed)
  • fastq file with discarded reads (reads that did not pass filters)
  • settings and stats file
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path
import re

extra = snakemake.params.get("extra", "") + " "
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Check input files
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."


# Input files
if n == 1 or "--interleaved " in extra or "--interleaved-input " in extra:
    reads = "--file1 {}".format(snakemake.input.sample)
else:
    reads = "--file1 {} --file2 {}".format(*snakemake.input.sample)


# Gzip or Bzip compressed output?
compress_out = ""
if all(
    [
        Path(value).suffix == ".gz"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--gzip"
elif all(
    [
        Path(value).suffix == ".bz2"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--bzip2"
else:
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )


# Output files
if n == 1 or "--interleaved " in extra or "--interleaved-output " in extra:
    trimmed = f"--output1 {snakemake.output.fq}"
else:
    trimmed = f"--output1 {snakemake.output.fq1} --output2 {snakemake.output.fq2}"

    # Output singleton files
    singleton = snakemake.output.get("singleton", None)
    if singleton:
        trimmed += f" --singleton {singleton}"

    # Output collapsed PE reads
    collapsed = snakemake.output.get("collapsed", None)
    if collapsed:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsed {collapsed}"

    # Output collapsed and truncated PE reads
    collapsed_trunc = snakemake.output.get("collapsed_trunc", None)
    if collapsed_trunc:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed_trunc specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsedtruncated {collapsed_trunc}"


shell(
    "(AdapterRemoval --threads {snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{compress_out} "
    "{trimmed} "
    "--discarded {snakemake.output.discarded} "
    "--settings {snakemake.output.settings}"
    ") {log}"
)
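
The compression check in the wrapper above can be read in isolation as follows. This is a simplified sketch, not the wrapper's code: compression_flag is a hypothetical helper that takes a plain dict of output names to paths instead of snakemake.output.

```python
from pathlib import Path


def compression_flag(outputs: dict) -> str:
    """Pick the AdapterRemoval compression flag from the output file suffixes.

    The 'settings' file is a plain-text report and is excluded from the check.
    """
    suffixes = {Path(p).suffix for key, p in outputs.items() if key != "settings"}
    if suffixes == {".gz"}:
        return "--gzip"
    if suffixes == {".bz2"}:
        return "--bzip2"
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )


print(compression_flag({
    "fq": "trimmed/se/a.fastq.gz",
    "discarded": "trimmed/se/a.discarded.fastq.gz",
    "settings": "stats/se/a.settings",
}))
```

As in the wrapper, output names that are uncompressed or that mix .gz and .bz2 are rejected, so all read outputs of a rule need the same compressed suffix.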

ARRIBA

Detect gene fusions from chimeric STAR output

URL:

Example

This wrapper can be used in the following way:

rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv" # optional
    log:
        "logs/arriba/{sample}.log"
    params:
        # arriba blacklist file
        blacklist="blacklist.tsv", # strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
        # file containing known fusions
        known_fusions="", # optional
        # file containing information from structural variant analysis
        sv_file="", # optional
        # optional parameters
        extra="-T -P -i 1,2"
    threads: 1
    wrapper:
        "v0.87.0/bio/arriba"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • arriba==1.1.0
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""

blacklist = snakemake.params.get("blacklist")
if blacklist:
    blacklist_cmd = "-b " + blacklist
else:
    blacklist_cmd = ""

known_fusions = snakemake.params.get("known_fusions")
if known_fusions:
    known_cmd = "-k " + known_fusions
else:
    known_cmd = ""

sv_file = snakemake.params.get("sv_file")
if sv_file:
    sv_cmd = "-d " + sv_file
else:
    sv_cmd = ""

shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)
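
The repeated if/else blocks above all follow one pattern: emit a flag only when its parameter is set. A compact sketch of that pattern, using a plain dict in place of snakemake.params; optional_flag and arriba_optional_args are hypothetical helpers, not part of the wrapper:

```python
import shlex


def optional_flag(flag: str, value) -> str:
    """Return '<flag> <value>' when a value is set, else an empty string."""
    return f"{flag} {shlex.quote(str(value))}" if value else ""


def arriba_optional_args(params: dict) -> str:
    """Assemble the optional arriba arguments used by the wrapper above."""
    parts = [
        optional_flag("-b", params.get("blacklist")),
        optional_flag("-k", params.get("known_fusions")),
        optional_flag("-d", params.get("sv_file")),
    ]
    # Drop unset options so the command line has no stray gaps.
    return " ".join(part for part in parts if part)


print(arriba_optional_args(
    {"blacklist": "blacklist.tsv", "known_fusions": "", "sv_file": ""}
))
```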

ART

For art, the following wrappers are available:

ART_PROFILER_ILLUMINA

Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.

URL:

Example

This wrapper can be used in the following way:

rule art_profiler_illumina:
    input:
        "data/{sample}.fq",
    output:
        "profiles/{sample}.txt"
    log:
        "logs/art_profiler_illumina/{sample}.log"
    params: ""
    threads: 2
    wrapper:
        "v0.87.0/bio/art/profiler_illumina"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • art==2016.06.05
Authors
  • David Laehnemann
  • Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


from snakemake.shell import shell
import os
import tempfile
import re


# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina cli
with tempfile.TemporaryDirectory() as temp_input:
    # ensure that .fastq and .fastq.gz input files work, as well
    filename = os.path.basename(snakemake.input[0]).replace(".fastq", ".fq")

    # figure out the exact file extension after the above substitution
    ext = re.search(r"fq(\.gz)?$", filename)
    if ext:
        fq_extension = ext.group(0)
    else:
        raise IOError(
            "Incompatible extension: This art_profiler_illumina "
            "wrapper requires input files with one of the following "
            "extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
            "your input and the invocation of the wrapper accordingly."
        )

    os.symlink(
        # snakemake paths are relative, but the symlink needs to be absolute
        os.path.abspath(snakemake.input[0]),
        # the following awkward file name generation has reasons:
        # * the file name needs to be unique to the execution of the
        #   rule, as art will create and mv temporary files with its basename
        #   in the output directory, which causes utter confusion when
        #   executing instances of the rule in parallel
        # * temp file name cannot have any read infixes before the file
        #   extension, because otherwise art does read enumeration magic
        #   that messes up output file naming
        os.path.join(
            temp_input,
            filename.replace(
                "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
            ),
        ),
    )

    # include output folder name in the profile_name command line argument and
    # strip off the file extension, as art will add its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(snakemake.output[0]), filename.replace("." + fq_extension, "")
    )

    shell(
        "( art_profiler_illumina {snakemake.params} {profile_name}"
        " {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}"
    )
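
The extension handling at the top of the wrapper can be tried on its own. The sketch below mirrors it with a hypothetical helper, fq_extension, which is not part of the wrapper: .fastq is first normalized to .fq, then the trailing fq or fq.gz is extracted.

```python
import re


def fq_extension(filename: str) -> str:
    """Return 'fq' or 'fq.gz' for a FASTQ file name, treating .fastq like .fq."""
    normalized = filename.replace(".fastq", ".fq")
    match = re.search(r"fq(\.gz)?$", normalized)
    if match is None:
        raise ValueError(
            f"unsupported extension in {filename!r}: "
            "expected fastq, fastq.gz, fq or fq.gz"
        )
    return match.group(0)


for name in ("data/a.fastq.gz", "data/a.fq"):
    print(name, fq_extension(name))
```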

ASSEMBLY-STATS

Generates a report of summary statistics for a genome assembly.

URL:

Example

This wrapper can be used in the following way:

rule run_assembly_stats:
    input:
        #Input assembly
        assembly="{sample}.fasta",
    output:
        #Assembly statistics
        assembly_stats="{sample}_stats.txt",
    params:
        # Tab delimited output, with a header, is set as the default. Other options are available:
        #   -l <int>
        #       Minimum length cutoff for each sequence.
        #       Sequences shorter than the cutoff will be ignored [1]
        #   -s
        #       Print 'grep friendly' output
        #   -t
        #       Print tab-delimited output
        #   -u
        #       Print tab-delimited output with no header line
        # If you want to add multiple options just delimit them with a space.
        # Note that you can only pick one output format
        # Check https://github.com/sanger-pathogens/assembly-stats for more details
        extra="-t",
    log:
        "logs/{sample}.assembly-stats.log",
    threads: 1
    wrapper:
        "v0.87.0/bio/assembly-stats"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • assembly-stats=1.0
Input/Output

Input:

  • Genomic assembly (fasta format)

Output:

  • Assembly statistics (format of your choosing, default = tab-delimited)
Notes
Authors
  • Pathogen Informatics, Wellcome Sanger Institute (assembly-stats tool) - https://github.com/sanger-pathogens
  • Max Cummins (Snakemake wrapper [unaffiliated with Wellcome Sanger Institute])
Code
__author__ = "Max Cummins"
__copyright__ = "Copyright 2021, Max Cummins"
__email__ = "max.l.cummins@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "assembly-stats"
    " {snakemake.params.extra}"
    " {snakemake.input.assembly}"
    " > {snakemake.output.assembly_stats}"
    " {log}"
)
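
The comments in the example rule note that only one output format may be picked. A minimal sketch of a check for this, assuming the three format flags listed there (-s, -t, -u); check_single_format is a hypothetical helper and not part of the wrapper:

```python
import shlex

# Output format flags as listed in the example rule's comments.
FORMAT_FLAGS = {"-s", "-t", "-u"}


def check_single_format(extra: str) -> str:
    """Return `extra` unchanged, or raise if it selects several output formats."""
    chosen = [token for token in shlex.split(extra) if token in FORMAT_FLAGS]
    if len(chosen) > 1:
        raise ValueError(f"pick only one output format, got: {' '.join(chosen)}")
    return extra


print(check_single_format("-l 500 -t"))
```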

BAMTOOLS

For bamtools, the following wrappers are available:

BAMTOOLS FILTER

Filters BAM files. For more information about bamtools see bamtools documentation and bamtools source code.

URL:

Example

This wrapper can be used in the following way:

rule bamtools_filter:
    input:
        "{sample}.bam"
    output:
        "filtered/{sample}.bam"
    params:
        # optional parameters
        tags = [ "NM:<4", "MQ:>=10" ],    # list of key:value pair strings
        min_size = "-2000",
        max_size = "2000",
        min_length = "10",
        max_length = "20",
        # to add more optional parameters (see bamtools filter --help):
        additional_params = "-mapQuality \">=0\" -isMapped \"true\""
    log:
        "logs/bamtools/filtered/{sample}.log"
    wrapper:
        "v0.87.0/bio/bamtools/filter"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bamtools==2.5.1
Input/Output

Input:

  • bam files (.bam)

Output:

  • bam file (.bam)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# extract arguments
params = ""
extra_limits = ""
tags = snakemake.params.get("tags")
min_size = snakemake.params.get("min_size")
max_size = snakemake.params.get("max_size")
min_length = snakemake.params.get("min_length")
max_length = snakemake.params.get("max_length")
additional_params = snakemake.params.get("additional_params")

if tags and tags is not None:
    params = params + " " + " ".join(map('-tag "{}"'.format, tags))

if min_size and min_size is not None:
    params = params + ' -insertSize ">=' + min_size + '"'
    if max_size and max_size is not None:
        extra_limits = extra_limits + ' -insertSize "<=' + max_size + '"'
else:
    if max_size and max_size is not None:
        params = params + ' -insertSize "<=' + max_size + '"'

if min_length and min_length is not None:
    params = params + ' -length ">=' + min_length + '"'
    if max_length and max_length is not None:
        extra_limits = extra_limits + ' -length "<=' + max_length + '"'
else:
    if max_length and max_length is not None:
        params = params + ' -length "<=' + max_length + '"'

if additional_params and additional_params is not None:
    params = params + " " + additional_params

if extra_limits:
    params = params + " | bamtools filter" + extra_limits

shell(
    "(bamtools filter"
    " -in {snakemake.input[0]}" + params + " -out {snakemake.output[0]}) {log}"
)
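
When both a minimum and a maximum are given, the wrapper above applies the lower bound in the first bamtools filter call and the upper bound in a second call reached through a pipe. The sketch below reproduces that split for one property; range_filter_args is a hypothetical helper, not part of the wrapper.

```python
def range_filter_args(option: str, minimum=None, maximum=None) -> tuple:
    """Return (first_stage, second_stage) argument strings for one property.

    With both bounds set, the lower bound goes to the first call and the upper
    bound to the second; with a single bound, only the first call is used.
    """
    first, second = "", ""
    if minimum:
        first = f' -{option} ">={minimum}"'
        if maximum:
            second = f' -{option} "<={maximum}"'
    elif maximum:
        first = f' -{option} "<={maximum}"'
    return first, second


first, second = range_filter_args("insertSize", "-2000", "2000")
command = "bamtools filter -in in.bam" + first
if second:
    command += " | bamtools filter" + second
command += " -out out.bam"
print(command)
```
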
BAMTOOLS FILTER WITH JSON

Filters BAM files with JSON-script for filtering parameters and rules. For more information about bamtools see bamtools documentation and bamtools source code.

URL:

Example

This wrapper can be used in the following way:

rule bamtools_filter_json:
    input:
        "{sample}.bam"
    output:
        "filtered/{sample}.bam"
    params:
        json="filtering-rules.json",
        region="" # optional parameter for defining a specific region, e.g. "chr1:500..chr3:750"
    log:
        "logs/bamtools/filtered/{sample}.log"
    wrapper:
        "v0.87.0/bio/bamtools/filter_json"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bamtools==2.5.1
Input/Output

Input:

  • bam files (.bam)
  • json file (.json)

Output:

  • bam file (.bam)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

region = snakemake.params.get("region")
region_param = ""

if region and region is not None:
    region_param = ' -region "' + region + '"'

shell(
    "(bamtools filter"
    " -in {snakemake.input[0]}"
    " -out {snakemake.output[0]}"
    + region_param
    + " -script {snakemake.params.json}) {log}"
)
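
The wrapper above only passes the JSON file through to bamtools filter via -script. As a sketch of how such a file could be generated, the snippet below serializes a flat set of properties. It assumes that the property names mirror the command-line options used in the previous wrapper's example (mapQuality, isMapped) and that a flat JSON object is an accepted script; check the bamtools documentation before relying on either. render_filter_script is a hypothetical helper.

```python
import json


def render_filter_script(properties: dict) -> str:
    """Serialize filter properties as JSON text for 'bamtools filter -script'."""
    return json.dumps(properties, indent=2)


# Assumed property names, taken from the -mapQuality / -isMapped options.
script = render_filter_script({"mapQuality": ">=0", "isMapped": "true"})
print(script)
```

The resulting text would be written to the file named in params.json, for example filtering-rules.json in the rule above.
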
BAMTOOLS SPLIT

Split a bam file into sub-files, by reference by default.

URL:

Example

This wrapper can be used in the following way:

rule bamtools_split:
    input:
        "mapped/{sample}.bam",
    output:
        "mapped/{sample}.REF_xx.bam",
    params:
        extra="-reference",
    log:
        "logs/bamtools_split/{sample}.log",
    wrapper:
        "v0.87.0/bio/bamtools/split"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bamtools==2.5.1
Input/Output

Input:

  • bam file

Output:

  • multiple bam files
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

if len(snakemake.input) != 1:
    raise ValueError("One bam input file expected, got: " + str(len(snakemake.input)))

shell("bamtools split -in {snakemake.input} {extra} {log}")
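
Note that the wrapper above never passes an output path to bamtools split, so the rule's output has to match the file names bamtools generates on its own. The sketch below lists the names to expect, assuming the pattern suggested by the example output mapped/{sample}.REF_xx.bam (input name without .bam, then .REF_ and the reference name); split_outputs is a hypothetical helper.

```python
def split_outputs(bam: str, references: list) -> list:
    """List the per-reference file names expected from 'bamtools split -reference'."""
    stub = bam[: -len(".bam")] if bam.endswith(".bam") else bam
    return [f"{stub}.REF_{ref}.bam" for ref in references]


print(split_outputs("mapped/a.bam", ["chr1", "chr2"]))
```
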
BAMTOOLS STATS

Use bamtools to collect statistics from a BAM file. For more information about bamtools see bamtools documentation and bamtools source code.

URL:

Example

This wrapper can be used in the following way:

rule bamtools_stats:
    input:
        "{sample}.bam"
    output:
        "{sample}.bamstats"
    params:
        "-insert" # optional summarize insert size data
    log:
        "logs/bamtools/stats/{sample}.log"
    wrapper:
        "v0.87.0/bio/bamtools/stats"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bamtools==2.5.1
Input/Output

Input:

  • bam files (.bam)

Output:

  • bamstats file (.bamstats)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "(bamtools stats {snakemake.params} -in {snakemake.input[0]} > {snakemake.output[0]}) {log}"
)

BBTOOLS

For bbtools, the following wrappers are available:

BBDUK

Run BBDuk.

URL:

Example

This wrapper can be used in the following way:

rule bbduk_se:
    input:
        sample=["reads/se/{sample}.fastq"],
        adapters="reads/adapt.fas",
    output:
        trimmed="trimmed/se/{sample}.fastq.gz",
        singleton="trimmed/se/{sample}.single.fastq.gz",
        discarded="trimmed/se/{sample}.discarded.fastq.gz",
        stats="trimmed/se/{sample}.stats.txt",
    log:
        "logs/bbduk/se/{sample}.log"
    params:
        extra = lambda w, input: "ref={},adapters,artifacts ktrim=r k=23 mink=11 hdist=1 tpe tbo trimpolygright=10 minlen=25 maxns=30 entropy=0.5 entropywindow=50 entropyk=5".format(input.adapters),
    threads: 7
    wrapper:
        "v0.87.0/bio/bbtools/bbduk"


rule bbduk_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"],
        adapters="reads/adapt.fas",
    output:
        trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
        singleton="trimmed/pe/{sample}.single.fastq",
        discarded="trimmed/pe/{sample}.discarded.fastq",
        stats="trimmed/pe/{sample}.stats.txt",
    log:
        "logs/bbduk/pe/{sample}.log"
    params:
        extra = lambda w, input: "ref={},adapters,artifacts ktrim=r k=23 mink=11 hdist=1 tpe tbo trimpolygright=10 minlen=25 maxns=30 entropy=0.5 entropywindow=50 entropyk=5".format(input.adapters),
    threads: 7
    wrapper:
        "v0.87.0/bio/bbtools/bbduk"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bbmap==38.90
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • raw fastq file with R1 reads
  • raw fastq file with R2 reads (optional)

Output:

  • trimmed fastq file with R1 reads
  • trimmed fastq file with R2 reads (optional)
  • fastq file with singleton reads (optional)
  • fastq file with discarded reads (optional)
  • stats file (optional)
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)
extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "in={}".format(snakemake.input.sample)
    trimmed = "out={}".format(snakemake.output.trimmed)
else:
    reads = "in={} in2={}".format(*snakemake.input.sample)
    trimmed = "out={} out2={}".format(*snakemake.output.trimmed)


singleton = snakemake.output.get("singleton", "")
if singleton:
    singleton = f"outs={singleton}"


discarded = snakemake.output.get("discarded", "")
if discarded:
    discarded = f"outm={discarded}"


stats = snakemake.output.get("stats", "")
if stats:
    stats = f"stats={stats}"


shell(
    "bbduk.sh {java_opts} t={snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{trimmed} {singleton} {discarded} "
    "{stats} "
    "{log}"
)
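
The single-end versus paired-end branching above can be sketched on plain lists. bbduk_io_args is a hypothetical helper, not part of the wrapper; unlike the wrapper, it expects the trimmed output as a list in both cases.

```python
def bbduk_io_args(inputs: list, trimmed: list) -> str:
    """Build the in=/out= arguments for single-end (1 file) or paired-end (2 files)."""
    if len(inputs) not in (1, 2):
        raise ValueError("expected 1 (single-end) or 2 (paired-end) input files")
    if len(trimmed) != len(inputs):
        raise ValueError("number of trimmed outputs must match number of inputs")
    if len(inputs) == 1:
        return f"in={inputs[0]} out={trimmed[0]}"
    return f"in={inputs[0]} in2={inputs[1]} out={trimmed[0]} out2={trimmed[1]}"


print(bbduk_io_args(
    ["reads/pe/a.1.fastq", "reads/pe/a.2.fastq"],
    ["trimmed/pe/a.1.fastq", "trimmed/pe/a.2.fastq"],
))
```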

BCFTOOLS

For bcftools, the following wrappers are available:

BCFTOOLS CALL

Call variants with bcftools call.

URL:

Example

This wrapper can be used in the following way:

rule bcftools_call:
    input:
        pileup="{sample}.pileup.bcf",
    output:
        calls="{sample}.calls.bcf",
    params:
        caller="-m", # valid options include -c/--consensus-caller or -m/--multiallelic-caller
        options="--ploidy 1 --prior 0.001",
    log:
        "logs/bcftools_call/{sample}.log",
    wrapper:
        "v0.87.0/bio/bcftools/call"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.11
Authors
  • Johannes Köster
  • Michael Hall
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


class CallerOptionError(Exception):
    pass


valid_caller_opts = {"-c", "--consensus-caller", "-m", "--multiallelic-caller"}

caller_opt = snakemake.params.get("caller", "")
if caller_opt.strip() not in valid_caller_opts:
    raise CallerOptionError(
        "bcftools call expects either -m/--multiallelic-caller or "
        "-c/--consensus-caller as caller option."
    )

options = snakemake.params.get("options", "")

shell(
    "bcftools call {options} {caller_opt} --threads {snakemake.threads} "
    "-o {snakemake.output.calls} {snakemake.input.pileup} "
    "{log}"
)
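
The caller check above can be exercised outside of Snakemake. In this sketch, validate_caller is a hypothetical stand-in for the inline check and raises ValueError instead of the wrapper's CallerOptionError:

```python
VALID_CALLER_OPTS = frozenset(
    {"-c", "--consensus-caller", "-m", "--multiallelic-caller"}
)


def validate_caller(caller_opt: str) -> str:
    """Return the stripped caller option, or raise if it is not a known caller."""
    option = caller_opt.strip()
    if option not in VALID_CALLER_OPTS:
        raise ValueError(
            "bcftools call expects either -m/--multiallelic-caller or "
            "-c/--consensus-caller as caller option."
        )
    return option


print(validate_caller(" -m "))
```

Because the default for a missing caller param is an empty string, a rule that omits params.caller fails this check rather than falling back to a default caller.
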
BCFTOOLS CONCAT

Concatenate vcf/bcf files with bcftools. For more information see BCFtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule bcftools_concat:
    input:
        calls=["a.bcf", "b.bcf"],
    output:
        "all.bcf",
    log:
        "logs/all.log",
    params:
        uncompressed_bcf=False,
        extra="",  # optional parameters for bcftools concat (except -o)
    threads: 4
    resources:
        mem_mb=10,
    wrapper:
        "v0.87.0/bio/bcftools/concat"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.12
  • snakemake-wrapper-utils==0.2.0
Input/Output

Input:

  • vcf files

Output:

  • Concatenated VCF/BCF file
Notes
  • The uncompressed_bcf param specifies that a BCF output should be uncompressed (ignored otherwise).
  • The extra param allows for additional program arguments (not --threads, -O/--output-type, -m/--max-mem, or -T/--temp-dir).
  • For more information, see https://samtools.github.io/bcftools/bcftools.html
Authors
  • Johannes Köster
  • Filipe G. Vieira
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts


bcftools_opts = get_bcftools_opts(snakemake, parse_memory=False)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "bcftools concat {snakemake.params.extra} {bcftools_opts} -o {snakemake.output[0]} "
    "{snakemake.input.calls} "
    "{log}"
)
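
The notes for this wrapper list options that must not appear in the extra param. A small sketch of a check for them follows; reserved_options is a hypothetical helper, not part of the wrapper or of snakemake-wrapper-utils, and it only inspects the start of each token.

```python
import shlex

# Options the notes reserve for the wrapper itself.
RESERVED = {"--threads", "-O", "--output-type", "-m", "--max-mem", "-T", "--temp-dir"}


def reserved_options(extra: str) -> list:
    """Return the reserved options found in `extra`, in order of appearance."""
    found = []
    for token in shlex.split(extra):
        if token.startswith("--"):
            name = token.split("=", 1)[0]  # handles --threads=4
        elif token.startswith("-"):
            name = token[:2]  # handles attached values such as -Ob
        else:
            continue
        if name in RESERVED:
            found.append(name)
    return found


print(reserved_options("--allow-overlaps -Ob --threads=4"))
```
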
BCFTOOLS FILTER

Filter a vcf/bcf file.

URL:

Example

This wrapper can be used in the following way:

rule bcf_filter_sample:
    input:
        "{prefix}.bcf",  # input bcf/vcf needs to be first input
        samples="samples.txt",  # other inputs, e.g. sample files, are optional
    output:
        "{prefix}.filter_sample.vcf",
    log:
        "log/{prefix}.filter_sample.vcf.log",
    params:
        filter=lambda w, input: f"--exclude 'GT[@{input.samples}]=\"0/1\"'",
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/filter"


rule bcf_filter_o_vcf:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.filter.vcf",
    log:
        "log/{prefix}.filter.vcf.log",
    params:
        filter="-i 'QUAL > 5'",
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/filter"


rule bcf_filter_o_vcf_gz:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.filter.vcf.gz",
    log:
        "log/{prefix}.filter.vcf.gz.log",
    params:
        filter="-i 'QUAL > 5'",
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/filter"


rule bcf_filter_o_bcf:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.filter.bcf",
    log:
        "log/{prefix}.filter.bcf.log",
    params:
        filter="-i 'QUAL > 5'",
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/filter"


rule bcf_filter_o_uncompressed_bcf:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.filter.uncompressed.bcf",
    log:
        "log/{prefix}.filter.uncompressed.bcf.log",
    params:
        uncompressed_bcf=True,
        filter="-i 'QUAL > 5'",
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/filter"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools==1.12
  • snakemake-wrapper-utils==0.2
Input/Output

Input:

  • VCF/BCF file

Output:

  • Filtered VCF/BCF file
Notes
  • The uncompressed_bcf param specifies that a BCF output should be uncompressed (ignored otherwise).
  • The bcftools_use_mem param controls whether resources.mem_mb is passed to bcftools.
  • The extra param allows for additional program arguments (not --threads, -O/--output-type, -m/--max-mem, or -T/--temp-dir).
  • For more information, see https://samtools.github.io/bcftools/bcftools.html
Authors
  • Patrik Smeds
  • Nikos Tsardakas Renhuldt
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts

bcftools_opts = get_bcftools_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

if len(snakemake.output) > 1:
    raise Exception("Only one output file expected, got: " + str(len(snakemake.output)))

filter = snakemake.params.get("filter", "")
extra = snakemake.params.get("extra", "")

shell(
    "bcftools filter {filter} {extra} {snakemake.input[0]} "
    "{bcftools_opts} "
    "-o {snakemake.output[0]} "
    "{log}"
)
BCFTOOLS INDEX

Index vcf/bcf file. For more information see BCFtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule bcftools_index:
    input:
        "a.bcf"
    output:
        "a.bcf.csi"
    params:
        extra=""  # optional parameters for bcftools index
    wrapper:
        "v0.87.0/bio/bcftools/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.11
Authors
  • Jan Forster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

## Extract arguments
extra = snakemake.params.get("extra", "")

shell("bcftools index {extra} {snakemake.input[0]} {log}")
BCFTOOLS MERGE

Merge vcf/bcf files with bcftools. For more information see BCFtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule bcftools_merge:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools merge (except -o)
    wrapper:
        "v0.87.0/bio/bcftools/merge"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.11
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "bcftools merge {snakemake.params} -o {snakemake.output[0]} "
    "{snakemake.input.calls} "
    "{log}"
)
BCFTOOLS MPILEUP

Generate VCF or BCF containing genotype likelihoods for one or multiple alignment (BAM or CRAM) files with bcftools mpileup.

URL:

Example

This wrapper can be used in the following way:

rule bcftools_mpileup:
    input:
        index="genome.fasta.fai",
        ref="genome.fasta", # this can be left out if --no-reference is in options
        alignments="mapped/{sample}.bam",
    output:
        pileup="pileups/{sample}.pileup.bcf",
    params:
        options="--max-depth 100 --min-BQ 15",
    log:
        "logs/bcftools_mpileup/{sample}.log",
    wrapper:
        "v0.87.0/bio/bcftools/mpileup"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.11
Authors
  • Michael Hall
Code
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


class MissingReferenceError(Exception):
    pass


options = snakemake.params.get("options", "")

# determine if a fasta reference is provided or not and add to options
if "--no-reference" not in options:
    ref = snakemake.input.get("ref", "")
    if not ref:
        raise MissingReferenceError(
            "The --no-reference option was not given, but no fasta reference was "
            "provided."
        )
    options += " --fasta-ref {}".format(ref)

shell(
    "bcftools mpileup {options} --threads {snakemake.threads} "
    "--output {snakemake.output.pileup} "
    "{snakemake.input.alignments} "
    "{log}"
)
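
The reference handling in the wrapper code above can be isolated into a small pure function, which makes the rule easier to reason about. This is only a sketch: build_mpileup_options is a hypothetical helper name that is not part of the wrapper, and a plain ValueError stands in for the wrapper's MissingReferenceError.

```python
def build_mpileup_options(options: str = "", ref: str = "") -> str:
    """Append --fasta-ref to the options unless --no-reference was requested.

    Hypothetical helper mirroring the wrapper logic; not part of the wrapper.
    """
    if "--no-reference" in options:
        # The user explicitly opted out of a reference, so nothing is added.
        return options
    if not ref:
        raise ValueError(
            "The --no-reference option was not given, but no fasta reference was "
            "provided."
        )
    return options + " --fasta-ref {}".format(ref)


print(build_mpileup_options("--max-depth 100 --min-BQ 15", "genome.fasta"))
```

With this logic, a rule either lists ref in its input or puts --no-reference into params.options; doing neither raises an error before bcftools is called.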
BCFTOOLS NORM

Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows. For more information see BCFtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule norm_vcf:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.norm.vcf",
    log:
        "{prefix}.norm.log",
    params:
        extra="--rm-dup none",  # optional
    wrapper:
        "v0.87.0/bio/bcftools/norm"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.11
  • snakemake-wrapper-utils=0.2
Authors
  • Dayne Filer
  • Filipe G. Vieira
Code
__author__ = "Dayne Filer"
__copyright__ = "Copyright 2019, Dayne Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts

bcftools_opts = get_bcftools_opts(snakemake, parse_memory=False, parse_temp_dir=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "bcftools norm {bcftools_opts} {extra} {snakemake.input[0]} -o {snakemake.output[0]} {log}"
)
BCFTOOLS REHEADER

Change header or sample names of vcf/bcf file. For more information see BCFtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule bcftools_reheader:
    input:
        vcf="a.bcf",
        ## new header, can be omitted if "samples" is set
        header="header.txt",
        ## file containing new sample names, can be omitted if "header" is set
        samples="samples.tsv"
    output:
        "a.reheader.bcf"
    params:
        extra="",  # optional parameters for bcftools reheader
        view_extra="-O b"  # add output format for internal bcftools view call
    wrapper:
        "v0.87.0/bio/bcftools/reheader"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.11
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

## Extract arguments
header = snakemake.input.get("header", "")
if header:
    header_cmd = "-h " + header
else:
    header_cmd = ""

samples = snakemake.input.get("samples", "")
if samples:
    samples_cmd = "-s " + samples
else:
    samples_cmd = ""

extra = snakemake.params.get("extra", "")
view_extra = snakemake.params.get("view_extra", "")

shell(
    "bcftools reheader "
    "{extra} "
    "{header_cmd} "
    "{samples_cmd} "
    "{snakemake.input.vcf} "
    "| bcftools view "
    "{view_extra} "
    "> {snakemake.output} "
    "{log}"
)
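
As the wrapper code above shows, the header and samples inputs are both optional and each maps to one bcftools reheader flag. A minimal sketch of that mapping follows; reheader_flags is a hypothetical name used only for illustration and does not exist in the wrapper.

```python
def reheader_flags(header: str = "", samples: str = "") -> str:
    """Build the -h/-s part of a bcftools reheader call from optional inputs.

    Hypothetical helper; the wrapper builds these strings inline.
    """
    parts = []
    if header:
        parts.append("-h " + header)  # replace the whole header
    if samples:
        parts.append("-s " + samples)  # rename samples only
    return " ".join(parts)


print(reheader_flags(header="header.txt", samples="samples.tsv"))
```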
BCFTOOLS SORT

Sort vcf/bcf file. For more information see BCFtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule bcftools_sort:
    input:
        "{sample}.bcf"
    output:
        "{sample}.sorted.bcf"
    log:
        "logs/bcftools/sort/{sample}.log"
    params:
        tmp_dir = "`mktemp -d`",
        # Set to True, in case you want uncompressed BCF output
        uncompressed_bcf = False,
        # Extra arguments
        extra = ""
    resources:
        mem_mb = 8000
    wrapper:
        "v0.87.0/bio/bcftools/sort"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools==1.11
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"


from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


max_mem = snakemake.resources.get("mem_mb", "")
if max_mem:
    max_mem = "--max-mem {}M".format(max_mem)
else:
    max_mem = snakemake.resources.get("mem_gb", "")
    if max_mem:
        max_mem = "--max-mem {}G".format(max_mem)
    else:
        max_mem = ""


tmp_dir = snakemake.params.get("tmp_dir", "")
if tmp_dir:
    tmp_dir = "--temp-dir {}".format(tmp_dir)
else:
    tmp_dir = ""


uncompressed_bcf = snakemake.params.get("uncompressed_bcf", False)


out_name, out_ext = path.splitext(snakemake.output[0])
if out_ext == ".vcf":
    out_format = "v"
elif out_ext == ".bcf":
    if uncompressed_bcf:
        out_format = "u"
    else:
        out_format = "b"
elif out_ext == ".gz":
    out_name, out_ext = path.splitext(out_name)
    if out_ext == ".vcf":
        out_format = "z"
    else:
        raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")
else:
    raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")


shell(
    "bcftools sort {max_mem} {tmp_dir} {extra} --output-type {out_format} --output-file {snakemake.output[0]} {snakemake.input[0]} {log}"
)
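
The wrapper code above derives the bcftools --output-type value from the extension of the output file. The same decision can be sketched as a standalone function; infer_output_type is a hypothetical name chosen for this example, not something the wrapper exports.

```python
from os import path


def infer_output_type(output_file: str, uncompressed_bcf: bool = False) -> str:
    """Map an output file name to a bcftools --output-type letter.

    Hypothetical helper mirroring the extension checks in the wrapper.
    """
    name, ext = path.splitext(output_file)
    if ext == ".vcf":
        return "v"  # uncompressed VCF
    if ext == ".bcf":
        # uncompressed_bcf only matters for BCF output
        return "u" if uncompressed_bcf else "b"
    if ext == ".gz" and path.splitext(name)[1] == ".vcf":
        return "z"  # compressed VCF
    raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")


for candidate in ["a.sorted.vcf", "a.sorted.vcf.gz", "a.sorted.bcf"]:
    print(candidate, infer_output_type(candidate))
```

Any other extension, including a .gz file that is not a .vcf.gz, is rejected with a ValueError.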
BCFTOOLS STATS

Generate VCF stats using bcftools stats.

URL: https://github.com/samtools/bcftools

Example

This wrapper can be used in the following way:

rule bcf_stats:
    input:
        "{prefix}"
    output:
        "{prefix}.stats.txt"
    log:
        "{prefix}.bcftools.stats.log"
    params:
        ""
    wrapper:
        "v0.87.0/bio/bcftools/stats"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools==1.12
Input/Output

Input:

  • BCF, VCF, or VCF.gz input

Output:

  • stats text file
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"


from snakemake.shell import shell

# bcftools takes additional decompression threads through --threads
# Other threads are *additional* threads passed to the '--threads' argument
threads = (
    "" if snakemake.threads <= 1 else " --threads {} ".format(snakemake.threads - 1)
)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "bcftools stats {threads} {snakemake.params} {snakemake.input} > {snakemake.output} {log}"
)
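
The comments in the wrapper code above note that --threads counts additional threads, so the wrapper passes one less than the number of threads Snakemake assigned to the job. A minimal sketch of that arithmetic follows; additional_threads_flag is a hypothetical name, and the sketch omits the padding spaces the wrapper adds around the flag.

```python
def additional_threads_flag(threads: int) -> str:
    """Return the --threads flag for a job that was given `threads` threads.

    Hypothetical helper: one thread is the main thread, the rest are
    passed to bcftools as additional threads.
    """
    if threads <= 1:
        return ""
    return "--threads {}".format(threads - 1)


print(repr(additional_threads_flag(1)))
print(repr(additional_threads_flag(4)))
```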
BCFTOOLS VIEW

View vcf/bcf file in a different format.

URL:

Example

This wrapper can be used in the following way:

rule bcf_view_sample_file:
    input:
        "{prefix}.bcf",  # input bcf/vcf needs to be first input
        index="{prefix}.bcf.csi",  # other inputs are optional
        samples="samples.txt",
    output:
        "{prefix}.view_sample.vcf",
    log:
        "log/{prefix}.view_sample.vcf.log",
    params:
        # optional extra parameters
        extra=lambda w, input: f"-S {input.samples}",
    wrapper:
        "v0.87.0/bio/bcftools/view"


rule bcf_view_o_vcf:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.view.vcf",
    log:
        "log/{prefix}.view.vcf.log",
    params:
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/view"


rule bcf_view_o_vcf_gz:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.view.vcf.gz",
    log:
        "log/{prefix}.view.vcf.gz.log",
    params:
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/view"


rule bcf_view_o_bcf:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.view.bcf",
    log:
        "log/{prefix}.view.bcf.log",
    params:
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/view"


rule bcf_view_o_uncompressed_bcf:
    input:
        "{prefix}.bcf",
    output:
        "{prefix}.view.uncompressed.bcf",
    log:
        "log/{prefix}.view.uncompressed.bcf.log",
    params:
        uncompressed_bcf=True,
        extra="",
    wrapper:
        "v0.87.0/bio/bcftools/view"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools==1.12
  • snakemake-wrapper-utils==0.2
Input/Output

Input:

  • VCF/BCF file

Output:

  • Filtered VCF/BCF file
Notes
  • The uncompressed_bcf param specifies that BCF output should be uncompressed (it is ignored for other output formats).
  • The bcftools_use_mem param controls whether resources.mem_mb is passed to bcftools.
  • The extra param allows for additional program arguments (except --threads, -O/--output-type, -m/--max-mem, or -T/--temp-dir).
  • For more information, see https://samtools.github.io/bcftools/bcftools.html
Authors
  • Johannes Köster
  • Nikos Tsardakas Renhuldt
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts

bcftools_opts = get_bcftools_opts(snakemake)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "bcftools view {bcftools_opts} "
    "{extra} "
    "{snakemake.input[0]} "
    "-o {snakemake.output} "
    "{log}"
)

BEDTOOLS

For bedtools, the following wrappers are available:

BEDTOOLS COMPLEMENT

Bedtools complement maps all regions of the genome which are not covered by the input.

URL:

Example

This wrapper can be used in the following way:

rule bedtools_complement_bed:
    input:
        in_file="a.bed",
        genome="dummy.genome"
    output:
        "results/bed-complement/a.complement.bed"
    params:
        ## Add optional parameters
        extra="-L"
    log:
        "logs/a.complement.bed.log"
    wrapper:
        "v0.87.0/bio/bedtools/complement"

rule bedtools_complement_vcf:
    input:
        in_file="a.vcf",
        genome="dummy.genome"
    output:
        "results/vcf-complement/a.complement.vcf"
    params:
        ## Add optional parameters
        extra="-L"
    log:
        "logs/a.complement.vcf.log"
    wrapper:
        "v0.87.0/bio/bedtools/complement"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bedtools=2.29
Input/Output

Input:

Output:

  • complemented BED/GFF/VCF file
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(bedtools complement"
    " {extra}"
    " -i {snakemake.input.in_file}"
    " -g {snakemake.input.genome}"
    " > {snakemake.output[0]})"
    " {log}"
)
COVERAGEBED

Returns the depth and breadth of coverage of features from B on the intervals in A.

URL:

Example

This wrapper can be used in the following way:

rule coverageBed:
    input:
        a="bed/{sample}.bed",
        b="mapped/{sample}.bam"
    output:
        "stats/{sample}.cov"
    log:
        "logs/coveragebed/{sample}.log"
    params:
        extra=""  # optional parameters
    threads: 8
    wrapper:
        "v0.87.0/bio/bedtools/coveragebed"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bedtools==2.29.0
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

input_a = snakemake.input.a
input_b = snakemake.input.b

output_file = snakemake.output[0]

if len(snakemake.output) != 1:
    raise ValueError("Output should be one file: " + str(snakemake.output) + "!")

shell(
    "coverageBed"
    " -a {input_a}"
    " -b {input_b}"
    " {extra_params}"
    " > {output_file}"
    " {log}"
)
BEDTOOLS GENOMECOVERAGEBED

bedtools’s genomeCoverageBed computes the coverage of a feature file across a given genome, reported as histograms, per-base reports or BEDGRAPH summaries. For usage information about genomeCoverageBed, please see bedtools’s documentation. For more information about bedtools, also see the source code.

URL:

Example

This wrapper can be used in the following way:

rule genomecov_bam:
    input:
        "bam_input/{sample}.sorted.bam"
    output:
        "genomecov_bam/{sample}.genomecov"
    log:
        "logs/genomecov_bam/{sample}.log"
    params:
        "-bg"  # optional parameters
    wrapper:
        "v0.87.0/bio/bedtools/genomecov"

rule genomecov_bed:
    input:
        # for genome file format please see:
        # https://bedtools.readthedocs.io/en/latest/content/general-usage.html#genome-file-format
        bed="bed_input/{sample}.sorted.bed",
        ref="bed_input/genome_file"
    output:
        "genomecov_bed/{sample}.genomecov"
    log:
        "logs/genomecov_bed/{sample}.log"
    params:
        "-bg"  # optional parameters
    wrapper:
        "v0.87.0/bio/bedtools/genomecov"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bedtools==2.29.2
Input/Output

Input:

  • BED/GFF/VCF files grouped by chromosome and genome file (genome file format) OR
  • BAM files sorted by position.

Output:

  • genomecov (.genomecov)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

genome = ""
input_file = ""

if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
    input_file = "-ibam " + snakemake.input[0]

if len(snakemake.input) > 1:
    if (os.path.splitext(snakemake.input[0])[-1]) == ".bed":
        input_file = "-i " + snakemake.input.get("bed")
        genome = "-g " + snakemake.input.get("ref")

shell(
    "(genomeCoverageBed"
    " {snakemake.params}"
    " {input_file}"
    " {genome}"
    " > {snakemake.output[0]}) {log}"
)
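
The wrapper code above chooses between BAM and BED mode based on the extension of the first input file. The sketch below restates that choice as a function over a plain list; genomecov_inputs is a hypothetical name, and it assumes the BED file comes first and the genome file second, whereas the wrapper itself looks up the named inputs bed and ref.

```python
import os


def genomecov_inputs(files):
    """Return (input_flag, genome_flag) for genomeCoverageBed.

    Hypothetical helper. Assumes files[0] is the BAM or BED file and,
    for BED input, files[1] is the genome file.
    """
    input_flag, genome_flag = "", ""
    ext = os.path.splitext(files[0])[-1]
    if ext == ".bam":
        input_flag = "-ibam " + files[0]
    if len(files) > 1 and ext == ".bed":
        # BED input is only accepted together with a genome file
        input_flag = "-i " + files[0]
        genome_flag = "-g " + files[1]
    return input_flag, genome_flag


print(genomecov_inputs(["bam_input/a.sorted.bam"]))
print(genomecov_inputs(["bed_input/a.sorted.bed", "bed_input/genome_file"]))
```

Note that a BED file given without a genome file yields two empty strings, so genomeCoverageBed would be called without any input.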
BEDTOOLS INTERSECT

Intersect BED/BAM/VCF files with bedtools.

URL:

Example

This wrapper can be used in the following way:

rule bedtools_intersect:
    input:
        left="A.bed",
        right="B.bed"
    output:
        "A_B.intersected.bed"
    params:
        ## Add optional parameters
        extra="-wa -wb" ## In this example, we want to write original entries in A and B for each overlap.
    log:
        "logs/intersect/A_B.log"
    wrapper:
        "v0.87.0/bio/bedtools/intersect"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bedtools=2.29.0
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(bedtools intersect"
    " {extra}"
    " -a {snakemake.input.left}"
    " -b {snakemake.input.right}"
    " > {snakemake.output})"
    " {log}"
)
BEDTOOLS MERGE

Merge entries in one or multiple BED/BAM/VCF/GFF files with bedtools.

URL:

Example

This wrapper can be used in the following way:

rule bedtools_merge:
    input:
        # Multiple bed-files can be added as list
        "A.bed"
    output:
        "A.merged.bed"
    params:
        ## Add optional parameters
        extra="-c 1 -o count" ## In this example, we want to count how many input lines we merged per output line
    log:
        "logs/merge/A.log"
    wrapper:
        "v0.87.0/bio/bedtools/merge"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bedtools=2.29.0
Authors
  • Jan Forster
Code
__author__ = "Jan Forster, Felix Mölder"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de, felix.moelder@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if len(snakemake.input) > 1:
    if all(f.endswith(".gz") for f in snakemake.input):
        cat = "zcat"
    elif all(not f.endswith(".gz") for f in snakemake.input):
        cat = "cat"
    else:
        raise ValueError("Input files must be all compressed or uncompressed.")
    shell(
        "({cat} {snakemake.input} | "
        "sort -k1,1 -k2,2n | "
        "bedtools merge {extra} "
        "-i stdin > {snakemake.output}) "
        " {log}"
    )
else:
    shell(
        "( bedtools merge"
        " {extra}"
        " -i {snakemake.input}"
        " > {snakemake.output})"
        " {log}"
    )
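
When several input files are given, the wrapper code above concatenates them with cat or zcat before sorting and merging, and refuses a mix of compressed and uncompressed files. That selection can be sketched on its own; choose_cat is a hypothetical name used only in this example.

```python
def choose_cat(inputs):
    """Pick the concatenation command for a list of BED files.

    Hypothetical helper mirroring the wrapper's check: all inputs must be
    gzip-compressed (zcat) or all must be uncompressed (cat).
    """
    if all(f.endswith(".gz") for f in inputs):
        return "zcat"
    if all(not f.endswith(".gz") for f in inputs):
        return "cat"
    raise ValueError("Input files must be all compressed or uncompressed.")


print(choose_cat(["A.bed", "B.bed"]))
print(choose_cat(["A.bed.gz", "B.bed.gz"]))
```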
BEDTOOLS SLOP

Increase the size of each feature in a BED/BAM/VCF by a specified factor.

URL:

Example

This wrapper can be used in the following way:

rule bedtools_slop:
    input:
        "A.bed"
    output:
        "A.slop.bed"
    params:
        ## Genome file, tab-separated file defining the length of every contig
        genome="genome.txt",
        ## Add optional parameters
        extra = "-b 10" ## in this example, we want to increase the feature by 10 bases to both sides
    log:
        "logs/slop/A.log"
    wrapper:
        "v0.87.0/bio/bedtools/slop"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bedtools=2.29.0
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(bedtools slop"
    " {extra}"
    " -i {snakemake.input[0]}"
    " -g {snakemake.params.genome}"
    " > {snakemake.output})"
    " {log}"
)
BEDTOOLS SORT

Sorts bed, vcf or gff files by chromosome and other criteria. For more information, please see the bedtools sort documentation.

URL:

Example

This wrapper can be used in the following way:

rule bedtools_sort:
    input:
        in_file="a.bed"
    output:
        "results/bed-sorted/a.sorted.bed"
    params:
        ## Add optional parameters for sorting order
        extra="-sizeA"
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "v0.87.0/bio/bedtools/sort"

rule bedtools_sort_bed:
    input:
        in_file="a.bed",
        # an optional sort file can be set as genomefile by the variable genome or
        # as fasta index file by the variable faidx
        genome="dummy.genome"
    output:
        "results/bed-sorted/a.sorted_by_file.bed"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.bed.log"
    wrapper:
        "v0.87.0/bio/bedtools/sort"

rule bedtools_sort_vcf:
    input:
        in_file="a.vcf",
        # an optional sort file can be set either as genomefile by the variable genome or
        # as fasta index file by the variable faidx
        faidx="genome.fasta.fai"
    output:
        "results/vcf-sorted/a.sorted_by_file.vcf"
    params:
        ## Add optional parameters
        extra=""
    log:
        "logs/a.sorted.vcf.log"
    wrapper:
        "v0.87.0/bio/bedtools/sort"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bedtools=2.29
Input/Output

Input:

  • BED/GFF/VCF files
  • optionally, a tab-separated file that determines the sorting order and contains the chromosome names in the first column
  • optionally, a fasta index file

Output:

  • sorted BED/GFF/VCF file
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
genome = snakemake.input.get("genome", "")
faidx = snakemake.input.get("faidx", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

if genome:
    extra += " -g {}".format(genome)
elif faidx:
    extra += " -faidx {}".format(faidx)

shell(
    "(bedtools sort"
    " {extra}"
    " -i {snakemake.input.in_file}"
    " > {snakemake.output[0]})"
    " {log}"
)
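
In the wrapper code above, a genome file takes precedence over a fasta index when both are supplied as inputs. The following sketch isolates that precedence rule; sort_extra is a hypothetical helper name and not part of the wrapper.

```python
def sort_extra(extra: str = "", genome: str = "", faidx: str = "") -> str:
    """Extend the extra arguments with the sort-order file, if any.

    Hypothetical helper: genome wins over faidx when both are given.
    """
    if genome:
        extra += " -g {}".format(genome)
    elif faidx:
        extra += " -faidx {}".format(faidx)
    return extra


print(repr(sort_extra("-sizeA")))
print(repr(sort_extra(genome="dummy.genome", faidx="genome.fasta.fai")))
```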

BENCHMARK

For benchmark, the following wrappers are available:

CHM-EVAL

Evaluate given VCF file with chm-eval (https://github.com/lh3/CHM-eval) for benchmarking variant calling.

URL:

Example

This wrapper can be used in the following way:

rule chm_eval:
    input:
        kit="resources/chm-eval-kit",
        vcf="{sample}.vcf"
    output:
        summary="chm-eval/{sample}.summary", # summary statistics
        bed="chm-eval/{sample}.err.bed.gz" # bed file with errors
    params:
        extra="",
        build="38"
    log:
        "logs/chm-eval/{sample}.log"
    wrapper:
        "v0.87.0/bio/benchmark/chm-eval"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • perl=5.26
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

kit = snakemake.input.kit
vcf = snakemake.input.vcf
build = snakemake.params.build
extra = snakemake.params.get("extra", "")

if not snakemake.output[0].endswith(".summary"):
    raise ValueError("Output file must end with .summary")
out = snakemake.output[0][:-8]

shell("({kit}/run-eval -g {build} -o {out} {extra} {vcf} | sh) {log}")
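
The wrapper code above passes run-eval an output prefix obtained by stripping the .summary suffix from the first output file. A small sketch of that step follows; summary_prefix is a hypothetical name, and it uses len(suffix) rather than the literal 8 that appears in the wrapper.

```python
def summary_prefix(summary_path: str) -> str:
    """Strip the .summary suffix to obtain the run-eval output prefix.

    Hypothetical helper mirroring the wrapper's check and slicing.
    """
    suffix = ".summary"
    if not summary_path.endswith(suffix):
        raise ValueError("Output file must end with .summary")
    return summary_path[: -len(suffix)]


print(summary_prefix("chm-eval/sample1.summary"))
```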
CHM-EVAL-KIT

Download CHM-eval kit (https://github.com/lh3/CHM-eval) for benchmarking variant calling.

URL:

Example

This wrapper can be used in the following way:

rule chm_eval_kit:
    output:
        directory("resources/chm-eval-kit")
    params:
        # Tag and version must match, see https://github.com/lh3/CHM-eval/releases.
        tag="v0.5",
        version="20180222"
    log:
        "logs/chm-eval-kit.log"
    cache: True
    wrapper:
        "v0.87.0/bio/benchmark/chm-eval-kit"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • curl
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
url = (
    "https://github.com/lh3/CHM-eval/releases/"
    "download/{tag}/CHM-evalkit-{version}.tar"
).format(version=snakemake.params.version, tag=snakemake.params.tag)

os.makedirs(snakemake.output[0])
shell(
    "(curl -L {url} | tar --strip-components 1 -C {snakemake.output[0]} -xf - &&"
    "(cd {snakemake.output[0]}; chmod +x htsbox run-eval k8)) {log}"
)
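
The download URL in the wrapper code above is assembled from the tag and version params, which is why the two must match a published release. The sketch below reproduces only the string formatting; chm_eval_kit_url is a hypothetical name and no download is performed.

```python
def chm_eval_kit_url(tag: str, version: str) -> str:
    """Format the CHM-eval kit download URL as the wrapper does.

    Hypothetical helper; only builds the string.
    """
    return (
        "https://github.com/lh3/CHM-eval/releases/"
        "download/{tag}/CHM-evalkit-{version}.tar"
    ).format(tag=tag, version=version)


print(chm_eval_kit_url("v0.5", "20180222"))
```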
CHM-EVAL-SAMPLE

Download CHM-eval sample (https://github.com/lh3/CHM-eval) for benchmarking variant calling.

URL:

Example

This wrapper can be used in the following way:

rule chm_eval_sample:
    output:
        bam="resources/chm-eval-sample.bam",
        bai="resources/chm-eval-sample.bam.bai"
    params:
        # Optionally only grab the first 100 records.
        # This is for testing, remove next line to grab all records.
        first_n=100
    log:
        "logs/chm-eval-sample.log"
    wrapper:
        "v0.87.0/bio/benchmark/chm-eval-sample"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools=1.10
  • curl
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

url = "ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR134/ERR1341796/CHM1_CHM13_2.bam"

pipefail = ""
fmt = "-b"
prefix = snakemake.params.get("first_n", "")
if prefix:
    prefix = "| head -n {} | samtools view -h -b".format(prefix)
    fmt = "-h"
    pipefail = "set +o pipefail"

    shell(
        """
        {pipefail}
        {{
            samtools view {fmt} {url} {prefix} > {snakemake.output.bam}
            samtools index {snakemake.output.bam}
        }} {log}
        """
    )
else:
    shell(
        """
        {{
            curl -L {url} > {snakemake.output.bam}
            samtools index {snakemake.output.bam}
        }} {log}
        """
    )

BGZIP

Block compression/decompression utility

URL: https://github.com/samtools/htslib

Example

This wrapper can be used in the following way:

rule bgzip:
    input:
        "{prefix}.vcf",
    output:
        "{prefix}.vcf.gz",
    params:
        extra="", # optional
    threads: 1
    log:
        "logs/bgzip/{prefix}.log",
    wrapper:
        "v0.87.0/bio/bgzip"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • htslib==1.12
Input/Output

Input:

  • file to be compressed or decompressed

Output:

  • compressed or decompressed output
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    """
    (bgzip -c {extra} --threads {snakemake.threads} \
        {snakemake.input} > {snakemake.output}) {log}
    """
)

BIOBAMBAM2

For biobambam2, the following wrappers are available:

BIOBAMBAM2 BAMSORMADUP

Mark PCR and optical duplicates, followed by sorting, with BioBamBam2 tools.

URL:

Example

This wrapper can be used in the following way:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        index="dedup/{sample}.bai",
        metrics="dedup/{sample}.metrics.txt",
    log:
        "logs/{sample}.log"
    params:
        extra="SO=coordinate"
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/biobambam2/bamsormadup"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • biobambam=2.0
Input/Output

Input:

  • SAM/BAM/CRAM file
  • reference (for CRAM output)

Output:

  • SAM/BAM/CRAM file with marked duplicates
  • BAM index file (optional)
  • metrics file (optional)
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


import os
from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=True)
extra = snakemake.params.get("extra", "")


# File formats
in_name, in_format = os.path.splitext(snakemake.input[0])
in_format = in_format.lstrip(".")
out_name, out_format = os.path.splitext(snakemake.output[0])
out_format = out_format.lstrip(".")


index = snakemake.output.get("index", "")
if index:
    index = f"indexfilename={index}"


metrics = snakemake.output.get("metrics", "")
if metrics:
    metrics = f"M={metrics}"


shell(
    "bamsormadup threads={snakemake.threads} inputformat={in_format} outputformat={out_format} {index} {metrics} {extra} < {snakemake.input[0]} > {snakemake.output[0]} {log}"
)
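
To see how the input and output formats and the optional arguments end up on the command line, the wrapper logic can be sketched as a plain function. `build_bamsormadup_cmd` is a hypothetical name used only for illustration; the sketch mirrors the code above but leaves out the log redirection and does no shell quoting.

```python
import os


def build_bamsormadup_cmd(in_path, out_path, threads=1, index="", metrics="", extra=""):
    """Assemble the bamsormadup command line the way the wrapper above does."""
    # Formats are taken from the file extensions, without the leading dot.
    in_format = os.path.splitext(in_path)[1].lstrip(".")
    out_format = os.path.splitext(out_path)[1].lstrip(".")

    parts = [
        "bamsormadup",
        f"threads={threads}",
        f"inputformat={in_format}",
        f"outputformat={out_format}",
    ]
    # Optional outputs only contribute an argument when they are requested.
    if index:
        parts.append(f"indexfilename={index}")
    if metrics:
        parts.append(f"M={metrics}")
    if extra:
        parts.append(extra)
    parts.append(f"< {in_path} > {out_path}")
    return " ".join(parts)


cmd = build_bamsormadup_cmd(
    "mapped/a.bam",
    "dedup/a.bam",
    threads=4,
    index="dedup/a.bai",
    metrics="dedup/a.metrics.txt",
    extra="SO=coordinate",
)
print(cmd)
```

Because the formats come from the file extensions, the input and output files need a .sam, .bam or .cram suffix.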

BISMARK

For bismark, the following wrappers are available:

BAM2NUC

Calculate mono- and di-nucleotide coverage of the reads and compare it with the average genomic sequence composition (see https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc).

URL:

Example

This wrapper can be used in the following way:

# Nucleotide stats for the genome are required before computing stats for a BAM file
rule bam2nuc_for_genome:
    input:
        genome_fa="indexes/{genome}/{genome}.fa.gz"
    output:
        "indexes/{genome}/genomic_nucleotide_frequencies.txt"
    log:
        "logs/indexes/{genome}/genomic_nucleotide_frequencies.txt.log"
    wrapper:
        "v0.87.0/bio/bismark/bam2nuc"

# Nucleotide stats for BAM file
rule bam2nuc_for_bam:
    input:
        genome_fa="indexes/{genome}/{genome}.fa.gz",
        bam="bams/{sample}_{genome}.bam"
    output:
        report="bams/{sample}_{genome}.nucleotide_stats.txt"
    log:
        "logs/{sample}_{genome}.nucleotide_stats.txt.log"
    wrapper:
        "v0.87.0/bio/bismark/bam2nuc"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
Input/Output

Input:

  • genome_fa: Path to genome in FastA format (e.g. *.fa, *.fasta, *.fa.gz, *.fasta.gz). All genome FastA files in its parent folder will be used
  • bam: Optional BAM or CRAM file (or multiple space-separated files). If bam isn’t provided, the option --genomic_composition_only is used to generate the genomic composition table genomic_nucleotide_frequencies.txt.

Output:

  • Genome nucleotide frequencies file genomic_nucleotide_frequencies.txt, generated in the ‘genome_fa’ directory (optional output).
  • report: Report file (or space-separated files), pattern ‘{bam_file_name}.nucleotide_stats.txt’.
Params
  • extra: Any additional args
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for bam2nuc tool that calculates mono- and di-nucleotide coverage of the reads and compares them with average genomic sequence
composition."""
# https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
cmdline_args = ["bam2nuc {extra}"]

genome_fa = snakemake.input.get("genome_fa", None)
if not genome_fa:
    raise ValueError("bismark/bam2nuc: Error 'genome_fa' input not specified.")
genome_folder = os.path.dirname(genome_fa)
cmdline_args.append("--genome_folder {genome_folder:q}")


bam = snakemake.input.get("bam", None)
if bam:
    cmdline_args.append("{bam}")
    bams = bam if isinstance(bam, list) else [bam]

    report = snakemake.output.get("report", None)
    if not report:
        raise ValueError("bismark/bam2nuc: Error 'report' output isn't specified.")

    reports = report if isinstance(report, list) else [report]
    if len(reports) != len(bams):
        raise ValueError(
            "bismark/bam2nuc: Error number of paths in output:report ({} files)"
            " should be same as in input:bam ({} files).".format(
                len(reports), len(bams)
            )
        )
    output_dir = os.path.dirname(reports[0])
    if any(output_dir != os.path.dirname(p) for p in reports):
        raise ValueError(
            "bismark/bam2nuc: Error all reports should be in same directory:"
            " {}".format(output_dir)
        )
    if output_dir:
        cmdline_args.append("--dir {output_dir:q}")
else:
    cmdline_args.append("--genomic_composition_only")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))


# Move outputs into proper position.
if bam:
    log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)

    expected_2_actual_paths = []
    for bam_path, report_path in zip(bams, reports):
        bam_name = os.path.basename(bam_path)
        bam_basename = os.path.splitext(bam_name)[0]
        expected_2_actual_paths.append(
            (
                report_path,
                os.path.join(
                    output_dir, "{}.nucleotide_stats.txt".format(bam_basename)
                ),
            )
        )

    for (exp_path, actual_path) in expected_2_actual_paths:
        if exp_path and (exp_path != actual_path):
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
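
The renaming step at the end of the wrapper can be sketched in isolation. This is an illustrative reading of the code above, not a replacement for it: `planned_report_moves` is a hypothetical helper that only computes which files would be moved, without touching the file system.

```python
import os


def planned_report_moves(bams, reports):
    """Return (actual_path, expected_path) pairs that would need an `mv`."""
    if len(bams) != len(reports):
        raise ValueError("one report path is needed per BAM file")
    # bam2nuc writes every report into one directory (passed via --dir).
    output_dir = os.path.dirname(reports[0])
    moves = []
    for bam_path, report_path in zip(bams, reports):
        bam_basename = os.path.splitext(os.path.basename(bam_path))[0]
        actual = os.path.join(output_dir, f"{bam_basename}.nucleotide_stats.txt")
        # Only reports whose requested name differs from the default are moved.
        if actual != report_path:
            moves.append((actual, report_path))
    return moves


print(planned_report_moves(["bams/a_hg19.bam"], ["bams/a_hg19.nucleotide_stats.txt"]))
print(planned_report_moves(["bams/a.bam"], ["stats/a_custom.txt"]))
```

The first call needs no move because the requested name already matches the default pattern; the second would rename the generated file.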
BISMARK

Align BS-Seq reads using Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark).

URL:

Example

This wrapper can be used in the following way:

# Example: Paired-end reads
rule bismark_pe:
    input:
        fq_1="reads/{sample}.1.fastq",
        fq_2="reads/{sample}.2.fastq",
        genome="indexes/{genome}/{genome}.fa",
        bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
        genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
    output:
        bam="bams/{sample}_{genome}_pe.bam",
        report="bams/{sample}_{genome}_PE_report.txt",
        nucleotide_stats="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
        bam_unmapped_1="bams/{sample}_{genome}_unmapped_reads_1.fq.gz",
        bam_unmapped_2="bams/{sample}_{genome}_unmapped_reads_2.fq.gz",
        ambiguous_1="bams/{sample}_{genome}_ambiguous_reads_1.fq.gz",
        ambiguous_2="bams/{sample}_{genome}_ambiguous_reads_2.fq.gz"
    log:
        "logs/bams/{sample}_{genome}.log"
    params:
        # optional params string, e.g: -L32 -N0 -X400 --gzip
        # Useful options to tune:
        # (for bowtie2)
        # -N: The maximum number of mismatches permitted in the "seed", i.e. the first L base pairs
        # of the read (default: 1)
        # -L: The "seed length" (default: 28)
        # -I: The minimum insert size for valid paired-end alignments. ~ min fragment size filter (for
        # PE reads)
        # -X: The maximum insert size for valid paired-end alignments. ~ max fragment size filter (for
        # PE reads)
        # --gzip: Gzip intermediate fastq files
        # --ambiguous --unmapped
        # -p: bowtie2 parallel execution
        # --multicore: bismark parallel execution
        # --temp_dir: tmp dir for intermediate files instead of output directory
        extra=' --ambiguous --unmapped --nucleotide_coverage',
        basename='{sample}_{genome}'
    wrapper:
        "v0.87.0/bio/bismark/bismark"

# Example: Single-end reads
rule bismark_se:
    input:
        fq="reads/{sample}.fq.gz",
        genome="indexes/{genome}/{genome}.fa",
        bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
        genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
    output:
        bam="bams/{sample}_{genome}.bam",
        report="bams/{sample}_{genome}_SE_report.txt",
        nucleotide_stats="bams/{sample}_{genome}.nucleotide_stats.txt",
        bam_unmapped="bams/{sample}_{genome}_unmapped_reads.fq.gz",
        ambiguous="bams/{sample}_{genome}_ambiguous_reads.fq.gz"
    log:
        "logs/bams/{sample}_{genome}.log",
    params:
        # optional params string
        extra=' --ambiguous --unmapped --nucleotide_coverage',
        basename='{sample}_{genome}'
    wrapper:
        "v0.87.0/bio/bismark/bismark"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
Input/Output

Input:

  • In SE mode one reads file with key ‘fq=…’
  • In PE mode two reads files with keys ‘fq_1=…’, ‘fq_2=…’
  • bismark_indexes_dir: The path to the folder Bisulfite_Genome created by the Bismark_Genome_Preparation script, e.g. ‘indexes/hg19/Bisulfite_Genome’

Output:

  • bam: BAM file. The output file is renamed if it differs from the default NAME_pe.bam (paired-end) or NAME.bam (single-end)
  • report: Alignment report file. The output file is renamed if it differs from the default NAME_PE_report.txt or NAME_SE_report.txt
  • nucleotide_stats: Optional nucleotide report file. The output file is renamed if it differs from the default NAME_pe.nucleotide_stats.txt (paired-end) or NAME.nucleotide_stats.txt (single-end)
Params
  • basename: File base name
  • extra: Any additional args
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for aligning methylation BS-Seq data using Bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os

from snakemake.shell import shell
from tempfile import TemporaryDirectory


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = os.path.basename(file_path)

    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark {extra} --bowtie2"]

outdir = os.path.dirname(snakemake.output.bam)
if outdir:
    cmdline_args.append("--output_dir {outdir}")

genome_indexes_dir = os.path.dirname(snakemake.input.bismark_indexes_dir)
cmdline_args.append("{genome_indexes_dir}")

if not snakemake.output.get("bam", None):
    raise ValueError("bismark/bismark: Error 'bam' output file isn't specified.")
if not snakemake.output.get("report", None):
    raise ValueError("bismark/bismark: Error 'report' output file isn't specified.")

# basename
if snakemake.params.get("basename", None):
    cmdline_args.append("--basename {snakemake.params.basename:q}")
    basename = snakemake.params.basename
else:
    basename = None

# reads input
single_end_mode = snakemake.input.get("fq", None)
if single_end_mode:
    # For SE data, the single reads file is passed to bismark with --se.
    cmdline_args.append("--se {snakemake.input.fq:q}")
    mode_prefix = "se"
    if basename is None:
        basename = basename_without_ext(snakemake.input.fq)
else:
    # For PE data, the two mate files are passed to bismark with -1 and -2.
    cmdline_args.append("-1 {snakemake.input.fq_1:q} -2 {snakemake.input.fq_2:q}")
    mode_prefix = "pe"

    if basename is None:
        # default basename
        basename = basename_without_ext(snakemake.input.fq_1) + "_bismark_bt2"

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))

# Move outputs into proper position.
expected_2_actual_paths = [
    (
        snakemake.output.bam,
        os.path.join(
            outdir, "{}{}.bam".format(basename, "" if single_end_mode else "_pe")
        ),
    ),
    (
        snakemake.output.report,
        os.path.join(
            outdir,
            "{}_{}_report.txt".format(basename, "SE" if single_end_mode else "PE"),
        ),
    ),
    (
        snakemake.output.get("nucleotide_stats", None),
        os.path.join(
            outdir,
            "{}{}.nucleotide_stats.txt".format(
                basename, "" if single_end_mode else "_pe"
            ),
        ),
    ),
]
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (exp_path, actual_path) in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")
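
The file names the wrapper expects Bismark to produce follow from the basename and the read mode. The sketch below restates that naming logic from the code above; `default_bismark_outputs` is a hypothetical helper introduced only for illustration.

```python
import os


def basename_without_ext(file_path):
    """Basename without extension; a trailing '.gz' is stripped together with it."""
    base = os.path.basename(file_path)
    split_ind = 2 if base.endswith(".gz") else 1
    return ".".join(base.split(".")[:-split_ind])


def default_bismark_outputs(outdir, basename, single_end):
    """Paths the wrapper looks for before renaming them to the rule outputs."""
    suffix = "" if single_end else "_pe"
    mode = "SE" if single_end else "PE"
    return {
        "bam": os.path.join(outdir, f"{basename}{suffix}.bam"),
        "report": os.path.join(outdir, f"{basename}_{mode}_report.txt"),
        "nucleotide_stats": os.path.join(
            outdir, f"{basename}{suffix}.nucleotide_stats.txt"
        ),
    }


print(basename_without_ext("reads/a.fq.gz"))
print(default_bismark_outputs("bams", "a_hg19", single_end=False))
```

Choosing output paths that already match these defaults, as the example rules do, means no file has to be moved after the alignment.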
BISMARK2BEDGRAPH

Generate bedGraph and coverage files from positional methylation files created by bismark_methylation_extractor (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph).

URL:

Example

This wrapper can be used in the following way:

# Example for CHG+CHH summary coverage:
rule bismark2bedGraph_noncpg:
    input:
        "meth/CHG_context_{sample}.txt.gz",
        "meth/CHH_context_{sample}.txt.gz"
    output:
        bedGraph="meth_non_cpg/{sample}_non_cpg.bedGraph.gz",
        cov="meth_non_cpg/{sample}_non_cpg.bismark.cov.gz"
    log:
        "logs/meth_non_cpg/{sample}_non_cpg.log"
    params:
        extra="--CX"
    wrapper:
        "v0.87.0/bio/bismark/bismark2bedGraph"

# Example for CpG only coverage
rule bismark2bedGraph_cpg:
    input:
        "meth/CpG_context_{sample}.txt.gz"
    output:
        bedGraph="meth_cpg/{sample}_CpG.bedGraph.gz",
        cov="meth_cpg/{sample}_CpG.bismark.cov.gz"
    log:
        "logs/meth_cpg/{sample}_CpG.log"
    wrapper:
        "v0.87.0/bio/bismark/bismark2bedGraph"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
Input/Output

Input:

  • Files generated by bismark_methylation_extractor, e.g. CpG_context*.txt.gz, CHG_context*.txt.gz, CHH_context*.txt.gz. By default only the CpG file is required; if the ‘--CX’ option is given, the output is built from the merged input files.

Output:

  • bedGraph: Bismark methylation level track, *.bedGraph.gz (0-based start, 1-based end coordinates, i.e. end offset exclusive)
  • cov: Optional bismark coverage file *.bismark.cov.gz; the file name is derived from the bedGraph name (1-based start and end, i.e. end offset inclusive)
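
The two outputs use different coordinate conventions, which is easy to get wrong when comparing them. The following sketch shows the conversion between a 1-based inclusive interval (coverage file) and a 0-based, end-exclusive interval (bedGraph); the function names are hypothetical and not part of Bismark or the wrapper.

```python
def cov_to_bedgraph_interval(start_1based, end_1based_inclusive):
    """Convert a 1-based inclusive interval to 0-based, end-exclusive."""
    return start_1based - 1, end_1based_inclusive


def bedgraph_to_cov_interval(start_0based, end_exclusive):
    """Convert a 0-based, end-exclusive interval to 1-based inclusive."""
    return start_0based + 1, end_exclusive


# A single cytosine at 1-based position 100 spans (99, 100) in bedGraph terms.
print(cov_to_bedgraph_interval(100, 100))
print(bedgraph_to_cov_interval(99, 100))
```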
Params
  • extra: Any additional args, e.g. ‘--CX’, ‘--ample_memory’, ‘--buffer_size 10G’, etc.
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark bismark2bedGraph tool."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

bedGraph = snakemake.output.get("bedGraph", "")
if not bedGraph:
    raise ValueError("bismark/bismark2bedGraph: Please specify bedGraph output path")

params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark2bedGraph {params_extra}"]

dir_name = os.path.dirname(bedGraph)
if dir_name:
    cmdline_args.append("--dir {dir_name}")

fname = os.path.basename(bedGraph)
cmdline_args.append("--output {fname}")

cmdline_args.append("{snakemake.input}")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))
BISMARK2REPORT

Generate graphical HTML report from Bismark reports (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2report).

URL:

Example

This wrapper can be used in the following way:

# Example: Paired-end reads
rule bismark2report_pe:
    input:
        alignment_report="bams/{sample}_{genome}_PE_report.txt",
        nucleotide_report="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}_pe.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}_pe.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}_pe.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "v0.87.0/bio/bismark/bismark2report"

# Example: Single-end reads
rule bismark2report_se:
    input:
        alignment_report="bams/{sample}_{genome}_SE_report.txt",
        nucleotide_report="bams/{sample}_{genome}.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "v0.87.0/bio/bismark/bismark2report"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
Input/Output

Input:

  • alignment_report: Alignment report (if not specified, bismark will try to find it in the current directory)
  • nucleotide_report: Optional Bismark nucleotide coverage report (if not specified, bismark will try to find it in the current directory)
  • dedup_report: Optional deduplication report (if not specified, bismark will try to find it in the current directory)
  • splitting_report: Optional splitting report from the Bismark methylation extractor (if not specified, bismark will try to find it in the current directory)
  • mbias_report: Optional M-bias report from the Bismark methylation extractor (if not specified, bismark will try to find it in the current directory)

Output:

  • html: Output HTML file path, if batch mode isn’t used.
  • html_dir: Output dir path for HTML reports if batch mode is used
Params
  • skip_optional_reports: Set to ‘true’ or ‘false’. If true, optional reports not listed in the input section are not searched for (passes ‘none’ to bismark2report)
  • extra: Any additional args
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper to generate graphical HTML report from Bismark reports."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2report

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell


def answer2bool(v):
    return str(v).lower() in ("yes", "true", "t", "1")


extra = snakemake.params.get("extra", "")
cmds = ["bismark2report {extra}"]

# output
html_file = snakemake.output.get("html", "")
output_dir = snakemake.output.get("html_dir", None)
if output_dir is None:
    if html_file:
        output_dir = os.path.dirname(html_file)
else:
    if html_file:
        raise ValueError(
            "bismark/bismark2report: Choose one: 'html=...' for a single dir or 'html_dir=...' for batch processing."
        )

if output_dir is None:
    raise ValueError(
        "bismark/bismark2report: Output file or directory not specified. "
        "Use 'html=...' for a single dir or 'html_dir=...' for batch "
        "processing."
    )

if output_dir:
    cmds.append("--dir {output_dir:q}")

if html_file:
    html_file_name = os.path.basename(html_file)
    cmds.append("--output {html_file_name:q}")

# reports
reports = [
    "alignment_report",
    "dedup_report",
    "splitting_report",
    "mbias_report",
    "nucleotide_report",
]
skip_optional_reports = answer2bool(
    snakemake.params.get("skip_optional_reports", False)
)
for report_name in reports:
    path = snakemake.input.get(report_name, "")
    if path:
        locals()[report_name] = path
        cmds.append("--{0} {{{1}:q}}".format(report_name, report_name))
    elif skip_optional_reports:
        cmds.append("--{0} 'none'".format(report_name))

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")

# run shell command:
shell(" ".join(cmds))
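
The effect of skip_optional_reports is easiest to see on the generated options. The sketch below mirrors the loop in the wrapper above; `report_options` is a hypothetical helper, and `shlex.quote` stands in for Snakemake's `:q` quoting as an approximation.

```python
import shlex

REPORT_KEYS = [
    "alignment_report",
    "dedup_report",
    "splitting_report",
    "mbias_report",
    "nucleotide_report",
]


def answer2bool(v):
    return str(v).lower() in ("yes", "true", "t", "1")


def report_options(inputs, skip_optional_reports=False):
    """Build the per-report options passed to bismark2report."""
    skip = answer2bool(skip_optional_reports)
    opts = []
    for name in REPORT_KEYS:
        path = inputs.get(name, "")
        if path:
            opts.append(f"--{name} {shlex.quote(path)}")
        elif skip:
            # Tell bismark2report not to search for this report itself.
            opts.append(f"--{name} 'none'")
    return opts


print(report_options({"alignment_report": "bams/a_PE_report.txt"}, True))
```

With skip_optional_reports left at its default, missing reports produce no option at all, so bismark2report falls back to its own search.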
BISMARK2SUMMARY

Generate a graphical HTML summary report from several Bismark text report files (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary).

URL:

Example

This wrapper can be used in the following way:

import os

rule bismark2summary:
    input:
        bam=["bams/a_genome_pe.bam", "bams/b_genome.bam"],

        # Bismark `bismark2summary` discovers reports automatically based
        # on the files available in the folder containing the BAM file
        #
        # If your per BAM file reports aren't in the same folder
        # you will need an additional task which symlinks all reports
        # (E.g. your splitting report generated by `bismark_methylation_extractor`
        # tool is in `meth` folder, and alignment related reports in `bams` folder)

        # These dependencies are here just to ensure that the corresponding rules
        # have already finished at rule execution time, otherwise some reports
        # will be missing.
        dependencies=[
            "bams/a_genome_PE_report.txt",
            "bams/a_genome_pe.deduplication_report.txt",
            # for example splitting report is missing for 'a' sample

            "bams/b_genome_SE_report.txt",
            "bams/b_genome.deduplication_report.txt",
            "bams/b_genome.deduplicated_splitting_report.txt"
        ]
    output:
        html="qc/{experiment}.bismark2summary.html",
        txt="qc/{experiment}.bismark2summary.txt"
    log:
        "logs/qc/{experiment}.bismark2summary.log"
    wrapper:
        "v0.87.0/bio/bismark/bismark2summary"

rule bismark2summary_prepare_symlinks:
    input:
        "meth/b_genome.deduplicated_splitting_report.txt",
    output:
        temp("bams/b_genome.deduplicated_splitting_report.txt"),
    log:
        "qc/bismark2summary_prepare_symlinks.symlinks.log"
    run:
        wd = os.getcwd()
        shell("echo 'Making symlinks' > {log}")
        for source, target in zip(input, output):
           target_dir = os.path.dirname(target)
           target_name = os.path.basename(target)
           log_path = os.path.join(wd, log[0])
           abs_src_path = os.path.abspath(source)
           shell("cd {target_dir} && ln -f -s {abs_src_path} {target_name} >> {log_path} 2>&1")

        shell("echo 'Done' >> {log}")

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
Input/Output

Input:

  • bam: One or several (space-separated) BAM file paths (aligned BAM files with Bismark reports in the same folder). It is also recommended to add dependencies for all required reports, either through rule order or by specifying them in the input section under any other keys. E.g. the deduplication report could be missing if the rule only depends on the aligned BAM file. If you add a dependency on the deduplicated BAM file, bismark2report will fail because it expects the input files to be the initial aligned files with the alignment report in the same directory.

Output:

  • html: Output HTML report path (e.g. ‘bismark_summary_report.html’).
  • txt: Output txt table path (e.g. ‘bismark_summary_report.txt’). Should have the same name as the ‘html’ report but with the suffix ‘.txt’.
Params
  • extra: Any additional args
  • title: Optional report custom title.
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper to generate summary graphical HTML report from several Bismark text report files."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
cmds = ["bismark2summary {extra}"]

# basename
bam = snakemake.input.get("bam", None)
if not bam:
    raise ValueError(
        "bismark/bismark2summary: Please specify aligned BAM file path"
        " (one or several) using 'bam=..'"
    )

html = snakemake.output.get("html", None)
txt = snakemake.output.get("txt", None)
if not html or not txt:
    raise ValueError(
        "bismark/bismark2summary: Please specify both 'html=..' and"
        " 'txt=..' paths in output section"
    )

basename, ext = os.path.splitext(html)
if ext.lower() != ".html":
    raise ValueError(
        "bismark/bismark2summary: HTML report file should end"
        " with suffix '.html' but was {} ({})".format(ext, html)
    )

suggested_txt = basename + ".txt"
if suggested_txt != txt:
    raise ValueError(
        "bismark/bismark2summary: Expected '{}' TXT report, "
        "but was: '{}'".format(suggested_txt, txt)
    )

cmds.append("--basename {basename:q}")

# title
title = snakemake.params.get("title", None)
if title:
    cmds.append("--title {title:q}")

cmds.append("{bam}")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")

# run shell command:
shell(" ".join(cmds))
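
The constraint that the two outputs share one basename can be checked before running a workflow. The sketch below reproduces the validation from the wrapper above as a standalone function; `summary_basename` is a hypothetical name and the error messages are shortened.

```python
import os


def summary_basename(html, txt):
    """Return the value passed to --basename, or raise if the paths do not match."""
    basename, ext = os.path.splitext(html)
    if ext.lower() != ".html":
        raise ValueError(f"HTML report should end with '.html' but was {ext} ({html})")
    suggested_txt = basename + ".txt"
    if suggested_txt != txt:
        raise ValueError(f"Expected '{suggested_txt}' TXT report, but was: '{txt}'")
    return basename


print(summary_basename("qc/exp.bismark2summary.html", "qc/exp.bismark2summary.txt"))
```

A pair such as qc/a.html and qc/b.txt would be rejected, since bismark2summary derives both file names from a single basename.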
BISMARK_GENOME_PREPARATION

Generate indexes for Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation).

URL:

Example

This wrapper can be used in the following way:

# For *.fa file
rule bismark_genome_preparation_fa:
    input:
        "indexes/{genome}/{genome}.fa"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        extra=""  # optional params string
    wrapper:
        "v0.87.0/bio/bismark/bismark_genome_preparation"

# For *.fa.gz file:
rule bismark_genome_preparation_fa_gz:
    input:
        "indexes/{genome}/{genome}.fa.gz"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        extra=""  # optional params string
    wrapper:
        "v0.87.0/bio/bismark/bismark_genome_preparation"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
Input/Output

Input:

  • path to genome *.fa (or *.fasta, *.fa.gz, *.fasta.gz) file

Output:

  • No output, generates bismark indexes in parent directory of input file
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark indexes preparing using bismark_genome_preparation."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


from os import path
from snakemake.shell import shell

input_dir = path.dirname(snakemake.input[0])

params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("bismark_genome_preparation --verbose --bowtie2 {params_extra} {input_dir} {log}")
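
Since the wrapper only passes the parent directory of the input file to bismark_genome_preparation, the location of the index follows from the FASTA path. A minimal sketch, assuming the index folder is named Bisulfite_Genome as in the example rules; `expected_index_dir` is a hypothetical helper.

```python
import os


def expected_index_dir(fasta_path):
    """Directory in which the Bismark index is expected for a given genome FASTA."""
    return os.path.join(os.path.dirname(fasta_path), "Bisulfite_Genome")


print(expected_index_dir("indexes/hg19/hg19.fa.gz"))
```

This is why the example rules declare directory("indexes/{genome}/Bisulfite_Genome") as output next to the genome file.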
BISMARK_METHYLATION_EXTRACTOR

Call methylation counts from Bismark alignment results (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor).

URL:

Example

This wrapper can be used in the following way:

rule bismark_methylation_extractor:
    input: "bams/{sample}.bam"
    output:
        mbias_r1="qc/meth/{sample}.M-bias_R1.png",
        # Only for PE BAMS:
        # mbias_r2="qc/meth/{sample}.M-bias_R2.png",

        mbias_report="meth/{sample}.M-bias.txt",
        splitting_report="meth/{sample}_splitting_report.txt",

        # 1-based start, 1-based end ('inclusive') methylation info: % and counts
        methylome_CpG_cov="meth_cpg/{sample}.bismark.cov.gz",
        # BedGraph with methylation percentage: 0-based start, end exclusive
        methylome_CpG_mlevel_bedGraph="meth_cpg/{sample}.bedGraph.gz",

        # Primary output files: methylation status at each read cytosine position: (extremely large)
        read_base_meth_state_cpg="meth/CpG_context_{sample}.txt.gz",
        # * You could merge CHG, CHH using: --merge_non_CpG
        read_base_meth_state_chg="meth/CHG_context_{sample}.txt.gz",
        read_base_meth_state_chh="meth/CHH_context_{sample}.txt.gz"
    log:
        "logs/meth/{sample}.log"
    params:
        output_dir="meth",  # optional output dir
        extra="--gzip --comprehensive --bedGraph"  # optional params string
    wrapper:
        "v0.87.0/bio/bismark/bismark_methylation_extractor"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
  • perl-gdgraph==1.54
Input/Output

Input:

  • Input BAM file aligned by Bismark

Output:

  • Depends on bismark options passed to params.extra, optional for this wrapper
  • mbias_report: M-bias report, *.M-bias.txt (if key is provided, the out file will be renamed to this name)
  • mbias_r1: M-Bias plot for R1, *.M-bias_R1.png (if key is provided, the out file will be renamed to this name)
  • mbias_r2: M-Bias plot for R2, *.M-bias_R2.png (if key is provided, the out file will be renamed to this name)
  • splitting_report: Splitting report, *_splitting_report.txt (if key is provided, the out file will be renamed to this name)
  • methylome_CpG_cov: Bismark coverage file for CpG context, *.bismark.cov.gz (if key is provided, the out file will be renamed to this name)
  • methylome_CpG_mlevel_bedGraph: Bismark methylation level track, *.bedGraph.gz
  • read_base_meth_state_cpg: Per read CpG base methylation info, CpG_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
  • read_base_meth_state_chg: Per read CHG base methylation info, CHG_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
  • read_base_meth_state_chh: Per read CHH base methylation info, CHH_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
Params
  • output_dir: Output directory (current dir is used if not specified)
  • ignore: Number of bases to trim at 5’ end in R1 (see bismark_methylation_extractor documentation), optional argument
  • ignore_3prime: Number of bases to trim at 3’ end in R1 (see bismark_methylation_extractor documentation), optional argument
  • ignore_r2: Number of bases to trim at 5’ end in R2 (see bismark_methylation_extractor documentation), optional argument
  • ignore_3prime_r2: Number of bases to trim at 3’ end in R2 (see bismark_methylation_extractor documentation), optional argument
  • extra: Any additional args
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark methylation extractor tool: bismark_methylation_extractor."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark_methylation_extractor {params_extra}"]

# output dir
output_dir = snakemake.params.get("output_dir", "")
if output_dir:
    cmdline_args.append("-o {output_dir:q}")

# trimming options
trimming_options = [
    "ignore",  # meth_bias_r1_5end
    "ignore_3prime",  # meth_bias_r1_3end
    "ignore_r2",  # meth_bias_r2_5end
    "ignore_3prime_r2",  # meth_bias_r2_3end
]
for key in trimming_options:
    value = snakemake.params.get(key, None)
    if value:
        cmdline_args.append("--{} {}".format(key, value))

# Input
cmdline_args.append("{snakemake.input}")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))

key2prefix_suffix = [
    ("mbias_report", ("", ".M-bias.txt")),
    ("mbias_r1", ("", ".M-bias_R1.png")),
    ("mbias_r2", ("", ".M-bias_R2.png")),
    ("splitting_report", ("", "_splitting_report.txt")),
    ("methylome_CpG_cov", ("", ".bismark.cov.gz")),
    ("methylome_CpG_mlevel_bedGraph", ("", ".bedGraph.gz")),
    ("read_base_meth_state_cpg", ("CpG_context_", ".txt.gz")),
    ("read_base_meth_state_chg", ("CHG_context_", ".txt.gz")),
    ("read_base_meth_state_chh", ("CHH_context_", ".txt.gz")),
]

log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (key, (prefix, suffix)) in key2prefix_suffix:
    exp_path = snakemake.output.get(key, None)
    if exp_path:
        if len(snakemake.input) != 1:
            raise ValueError(
                "bismark/bismark_methylation_extractor: Error: only one BAM file is"
                " expected in input, but was <{}>".format(snakemake.input)
            )
        bam_file = snakemake.input[0]
        bam_name = os.path.basename(bam_file)
        bam_wo_ext = os.path.splitext(bam_name)[0]

        actual_path = os.path.join(output_dir, prefix + bam_wo_ext + suffix)
        if exp_path != actual_path:
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
DEDUPLICATE_BISMARK

Deduplicates Bismark BAM files and saves the result as a *.bam file (see https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark).

URL:

Example

This wrapper can be used in the following way:

rule deduplicate_bismark:
    input: "bams/{sample}.bam"
    output:
        bam="bams/{sample}.deduplicated.bam",
        report="bams/{sample}.deduplication_report.txt",
    log:
        "logs/bams/{sample}.deduplicated.log",
    params:
        extra=""  # optional params string
    wrapper:
        "v0.87.0/bio/bismark/deduplicate_bismark"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
Input/Output

Input:

  • path to one or multiple *.bam files aligned by Bismark; if multiple files are passed, the ‘--multiple’ argument is added automatically.

Output:

  • bam: Result BAM file path. The file will be renamed if the path differs from NAME.deduplicated.bam for the given ‘NAME.bam’ input.
  • report: Result report path. The file will be renamed if the path differs from NAME.deduplication_report.txt for the given ‘NAME.bam’ input.
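
The default names mentioned above can be sketched as follows. This is an illustrative helper (default_dedup_outputs is a hypothetical name, not part of the wrapper) that follows the renaming logic of the wrapper code in this section, including the extra ".multiple" infix used when several BAM files are passed.

```python
import os


def default_dedup_outputs(inputs, output_dir=""):
    """Hypothetical helper: (bam, report) names expected from deduplicate_bismark."""
    # The names are derived from the first input file only.
    base = os.path.splitext(os.path.basename(inputs[0]))[0]
    prefix = os.path.join(output_dir, base)
    # With more than one input the wrapper adds --multiple, which changes the names.
    multiple = ".multiple" if len(inputs) > 1 else ""
    return (
        prefix + multiple + ".deduplicated.bam",
        prefix + multiple + ".deduplication_report.txt",
    )


print(default_dedup_outputs(["bams/a.bam"], "bams"))
print(default_dedup_outputs(["bams/a.bam", "bams/b.bam"], "bams"))
```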
Params
  • extra: Additional deduplicate_bismark args
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark aligned reads deduplication using deduplicate_bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

bam_path = snakemake.output.get("bam", None)
report_path = snakemake.output.get("report", None)
if not bam_path or not report_path:
    raise ValueError(
        "bismark/deduplicate_bismark: Please specify both 'bam=..' and 'report=..' paths in output section"
    )

output_dir = os.path.dirname(bam_path)
if output_dir != os.path.dirname(report_path):
    raise ValueError(
        "bismark/deduplicate_bismark: BAM and Report files expected to have the same parent directory"
        " but was {} and {}".format(bam_path, report_path)
    )

arg_output_dir = "--output_dir '{}'".format(output_dir) if output_dir else ""
arg_multiple = "--multiple" if len(snakemake.input) > 1 else ""

params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
    "deduplicate_bismark {params_extra} --bam {arg_multiple}"
    " {arg_output_dir} {snakemake.input} {log}"
)

# Move outputs into proper position.
fst_input_filename = os.path.basename(snakemake.input[0])
fst_input_basename = os.path.splitext(fst_input_filename)[0]
prefix = os.path.join(output_dir, fst_input_basename)

deduplicated_bam_actual_name = prefix + ".deduplicated.bam"
if arg_multiple:
    # bismark does it exactly like this:
    deduplicated_bam_actual_name = deduplicated_bam_actual_name.replace(
        "deduplicated", "multiple.deduplicated", 1
    )

expected_2_actual_paths = [
    (bam_path, deduplicated_bam_actual_name),
    (
        report_path,
        prefix + (".multiple" if arg_multiple else "") + ".deduplication_report.txt",
    ),
]
for (exp_path, actual_path) in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")

BLAST

For blast, the following wrappers are available:

BLAST BLASTN

Blastn performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information please see BLAST documentation.

Different output formatting options and format specifiers (see tables below) can be selected via the ‘format’ parameter, as shown in the example Snakemake rule below.

Alignment view options                  Formatting output option  Format specifiers
Pairwise                                0
Query-anchored showing identities       1
Query-anchored no identities            2
Flat query-anchored showing identities  3
Flat query-anchored no identities       4
BLAST XML                               5
Tabular                                 6                         available
Tabular with comment lines              7                         available
Seqalign (Text ASN.1)                   8
Seqalign (Binary ASN.1)                 9
Comma-separated values                  10                        available
BLAST archive (ASN.1)                   11
Seqalign (JSON)                         12
Multiple-file BLAST JSON                13
Multiple-file BLAST XML2                14
Single-file BLAST JSON                  15
Single-file BLAST XML2                  16
Sequence Alignment/Map (SAM)            17
Organism Report                         18

Specifiers for formatting options 6, 7 and 10:

Format specifier  Description
qseqid Query Seq-id
qgi Query GI
qacc Query accession
qaccver Query accession.version
qlen Query sequence length
sseqid Subject Seq-id
sallseqid All subject Seq-id(s), separated by a ‘;’
sgi Subject GI
sallgi All subject GIs
sacc Subject accession
saccver Subject accession.version
sallacc All subject accessions
slen Subject sequence length
qstart Start of alignment in query
qend End of alignment in query
sstart Start of alignment in subject
send End of alignment in subject
qseq Aligned part of query sequence
sseq Aligned part of subject sequence
evalue Expect value
bitscore Bit score
score Raw score
length Alignment length
pident Percentage of identical matches
nident Number of identical matches
mismatch Number of mismatches
positive Number of positive-scoring matches
gapopen Number of gap openings
gaps Total number of gaps
ppos Percentage of positive-scoring matches
frames Query and subject frames separated by a ‘/’
qframe Query frame
sframe Subject frame
btop Blast traceback operations (BTOP)
staxid Subject Taxonomy ID
ssciname Subject Scientific Name
scomname Subject Common Name
sblastname Subject Blast Name
sskingdom Subject Super Kingdom
staxids unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames unique Subject Scientific Name(s), separated by a ‘;’
scomnames unique Subject Common Name(s), separated by a ‘;’
sblastnames unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle Subject Title
salltitles All Subject Title(s), separated by a ‘<>’
sstrand Subject Strand
qcovs Query Coverage Per Subject
qcovhsp Query Coverage Per HSP
qcovus Query Coverage Per Unique Subject (blastn only)
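
With the tabular format (option 6), each output line holds the requested specifiers as tab-separated fields, in the order given. A minimal sketch of reading such output in Python follows; the function name and the sample rows are made up for illustration, and the sketch assumes that the specifiers are listed explicitly in the format string.

```python
import csv
import io


def parse_blast_tabular(handle, outfmt):
    """Return one dict per hit, keyed by the specifiers listed in `outfmt`.

    `outfmt` is the string passed as the 'format' param,
    e.g. "6 qseqid sseqid evalue".
    """
    code, *columns = outfmt.split()
    if code != "6" or not columns:
        raise ValueError("this sketch expects format 6 with explicit specifiers")
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(columns, row)) for row in reader]


# Made-up rows standing in for the content of an output file.
example = io.StringIO("read1\tchr1\t1e-30\nread2\tchr2\t0.002\n")
hits = parse_blast_tabular(example, "6 qseqid sseqid evalue")
print(hits[0]["sseqid"], float(hits[1]["evalue"]))
```

All values are returned as strings; numeric fields such as evalue need to be converted by the caller.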

URL:

Example

This wrapper can be used in the following way:

rule blast_nucleotide:
    input:
        query = "{sample}.fasta",
        blastdb=multiext("blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    output:
        "{sample}.blast.txt"
    log:
        "logs/{sample}.blast.log"
    threads:
        2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra=""
    wrapper:
        "v0.87.0/bio/blast/blastn"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • blast==2.11
Input/Output

Input:

Output:

  • depending on the formatting option, different output files can be generated (see tables above)
Authors
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

format = snakemake.params.get("format", "")
blastdb = snakemake.input.get("blastdb", "")[0]
db_name = path.splitext(blastdb)[0]

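# Only pass -outfmt when a format was requested. Defaulting to an empty string
# keeps the command below valid when the 'format' param is not set.
out_format = " -outfmt '{}'".format(format) if format else ""

shell(
    "blastn"
    " -query {snakemake.input.query}"
    " {out_format}"
    " {snakemake.params.extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
    " {log}"  # redirect stderr to the log file
)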
BLAST MAKEBLASTDB FOR FASTA FILES

Makeblastdb produces local BLAST databases from nucleotide or protein FASTA files. For more information please see BLAST documentation.

URL:

Example

This wrapper can be used in the following way:

rule blast_makedatabase_nucleotide:
    input:
        fasta="genome/{genome}.fasta"
    output:
        multiext("results/{genome}.fasta",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    log:
        "logs/{genome}.log"
    params:
        "-input_type fasta -blastdb_version 5 -parse_seqids"
    wrapper:
        "v0.87.0/bio/blast/makeblastdb"

rule blast_makedatabase_protein:
    input:
        fasta="protein/{protein}.fasta"
    output:
        multiext("results/{protein}.fasta",
            ".pdb",
            ".phr",
            ".pin",
            ".pot",
            ".psq",
            ".ptf",
            ".pto"
        )
    log:
        "logs/{protein}.log"
    params:
        "-input_type fasta -blastdb_version 5"
    wrapper:
        "v0.87.0/bio/blast/makeblastdb"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • blast==2.11.0
Input/Output

Input:

  • FASTA file

Output:

  • multiple files with different extensions (e.g. .nin, .nsq, .nhr for nucleotides or .pin, .psq, .phr for proteins)
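
The wrapper code in this section decides between a nucleotide and a protein database by looking at the extension of the first output file. A small sketch of that logic follows; infer_db_type is a hypothetical name, and raising an error for an unknown extension is an assumption of this sketch (the wrapper itself leaves the type empty in that case).

```python
from os import path


def infer_db_type(first_output):
    """Hypothetical helper: return (database name, -dbtype value)."""
    out_name, ext = path.splitext(first_output)
    if ext.startswith(".n"):
        return out_name, "nucl"
    if ext.startswith(".p"):
        return out_name, "prot"
    # Assumption of this sketch: fail early instead of passing an empty -dbtype.
    raise ValueError("cannot infer database type from extension {!r}".format(ext))


print(infer_db_type("results/genome.fasta.ndb"))
print(infer_db_type("results/protein.fasta.pdb"))
```

The first output listed in the rule therefore has to carry one of the database extensions.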
Authors
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log
out = snakemake.output[0]

db_type = ""
(out_name, ext) = path.splitext(out)

if ext.startswith(".n"):
    db_type = "nucl"
elif ext.startswith(".p"):
    db_type = "prot"

shell(
    "makeblastdb"
    " -in {snakemake.input.fasta}"
    " -dbtype {db_type}"
    " {snakemake.params}"
    " -logfile {log}"
    " -out {out_name}"
)

BOWTIE2

For bowtie2, the following wrappers are available:

BOWTIE2

Map reads with bowtie2.

URL:

Example

This wrapper can be used in the following way:

rule bowtie2:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bowtie2/{sample}.log"
    params:
        index="index/genome",  # prefix of reference genome index (built with bowtie2-build)
        extra=""  # optional parameters
    threads: 8  # Use at least two threads
    wrapper:
        "v0.87.0/bio/bowtie2/align"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.4
  • samtools==1.10
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "-U {}".format(*snakemake.input.sample)
else:
    reads = "-1 {} -2 {}".format(*snakemake.input.sample)

shell(
    "(bowtie2 --threads {snakemake.threads} {extra} "
    "-x {snakemake.params.index} {reads} "
    "| samtools view -Sbh -o {snakemake.output[0]} -) {log}"
)
BOWTIE2_BUILD

Build a bowtie2 index from a reference genome with bowtie2-build.

URL:

Example

This wrapper can be used in the following way:

rule bowtie2_build:
    input:
        reference="genome.fasta"
    output:
        multiext(
            "genome",
            ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2",
        ),
    log:
        "logs/bowtie2_build/build.log"
    params:
        extra=""  # optional parameters
    threads: 8
    wrapper:
        "v0.87.0/bio/bowtie2/build"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.4
  • samtools==1.10
Authors
  • Daniel Standage
Code
__author__ = "Daniel Standage"
__copyright__ = "Copyright 2020, Daniel Standage"
__email__ = "daniel.standage@nbacc.dhs.gov"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Derive the index prefix from the first output file (expected to end in .1.bt2).
indexbase = snakemake.output[0].replace(".1.bt2", "")
shell(
    "bowtie2-build --threads {snakemake.threads} {extra} "
    "{snakemake.input.reference} {indexbase} {log}"
)
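
The relation between the index prefix and the output files can be sketched in plain Python. The helper names are hypothetical; the six suffixes are the ones listed in the example rule above, and the prefix is recovered the same way as in the wrapper code, which means the first output must be the .1.bt2 file.

```python
# Suffixes as listed in the example rule of this section.
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def bowtie2_index_files(prefix):
    """Hypothetical helper: index files expected for a given prefix."""
    return [prefix + suffix for suffix in BT2_SUFFIXES]


def bowtie2_index_prefix(first_output):
    """Recover the prefix from the first output, as the wrapper does."""
    return first_output.replace(".1.bt2", "")


files = bowtie2_index_files("index/genome")
print(files)
print(bowtie2_index_prefix(files[0]))
```

The recovered prefix is the value to pass as params.index to the bowtie2 alignment wrapper.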

BUSCO

Assess assembly and annotation completeness with BUSCO

URL:

Example

This wrapper can be used in the following way:

rule run_busco:
    input:
        "transcripts.fasta"
    output:
        directory("txome_busco")
    log:
        "logs/quality/transcriptome_busco.log"
    threads: 8
    params:
        mode="transcriptome",
        lineage="stramenopiles_odb10",
        download_path="resources/busco_downloads",
        # optional parameters
        extra=""
    wrapper:
        "v0.87.0/bio/busco"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • busco==5.1.2
Input/Output

Input:

  • assembly fasta

Output:

  • annotation quality files
Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for BUSCO assessment"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path
import tempfile

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode is not None, "please input a run mode: genome, transcriptome or proteins"
lineage = snakemake.params.get("lineage")
assert lineage is not None, "please input the path to a lineage for busco assessment"

stripped_output = snakemake.output[0].rstrip("/")
out = path.basename(stripped_output)
out_dirname = path.dirname(stripped_output)
out_path = " --out_path {} ".format(out_dirname) if out_dirname else ""

download_path_dir = snakemake.params.get("download_path", "")
download_path = (
    " --download_path {} ".format(download_path_dir) if download_path_dir else ""
)

# note: --force allows snakemake to handle rewriting files as necessary
# without needing to specify *all* busco outputs as snakemake outputs
shell(
    "busco --in {snakemake.input} --out {out} --force "
    "{out_path} "
    "--cpu {snakemake.threads} --mode {mode} --lineage {lineage} "
    "{download_path} "
    "{extra} {log}"
)
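
The way the wrapper splits the output directory into the --out and --out_path arguments can be sketched as follows (busco_output_args is a hypothetical name used only for this illustration):

```python
from os import path


def busco_output_args(output_dir):
    """Hypothetical helper: split an output directory into busco arguments."""
    stripped = output_dir.rstrip("/")
    out = path.basename(stripped)  # run name, passed to --out
    out_dirname = path.dirname(stripped)  # parent directory, passed to --out_path
    args = ["--out", out]
    if out_dirname:
        args += ["--out_path", out_dirname]
    return args


print(busco_output_args("txome_busco"))
print(busco_output_args("results/qc/txome_busco/"))
```

For an output directory without a parent, only --out is set and busco writes into the current working directory.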

BWA

For bwa, the following wrappers are available:

BWA ALN

Map reads with bwa aln. For more information about BWA see BWA documentation.

URL:

Example

This wrapper can be used in the following way:

rule bwa_aln:
    input:
        fastq="reads/{sample}.{pair}.fastq",
        # Index can be a list of (all) files created by bwa, or one of them
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        "sai/{sample}.{pair}.sai",
    params:
        extra="",
    log:
        "logs/bwa_aln/{sample}.{pair}.log",
    threads: 8
    wrapper:
        "v0.87.0/bio/bwa/aln"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa==0.7.17
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa aln."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

index = snakemake.input.idx
if isinstance(index, str):
    index = path.splitext(snakemake.input.idx)[0]
else:
    index = path.splitext(snakemake.input.idx[0])[0]

shell(
    "bwa aln"
    " {extra}"
    " -t {snakemake.threads}"
    " {index}"
    " {snakemake.input.fastq}"
    " > {snakemake.output[0]} {log}"
)
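
The comment in the example rule says that idx may be a single index file or the list of all of them. In both cases the wrapper strips the extension of the (first) file to obtain the prefix passed to bwa. A small sketch of that logic (bwa_index_prefix is a hypothetical name):

```python
from os import path


def bwa_index_prefix(idx):
    """Hypothetical helper: index prefix from one index file or a list of them."""
    first = idx if isinstance(idx, str) else idx[0]
    return path.splitext(first)[0]


print(bwa_index_prefix("ref/hg38.fa.bwt"))
print(bwa_index_prefix(["genome.amb", "genome.ann", "genome.bwt"]))
```

Only the last extension is removed, so an index built for ref/hg38.fa keeps the prefix ref/hg38.fa.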
BWA INDEX

Creates a BWA index. For more information about BWA see BWA documentation.

URL:

Example

This wrapper can be used in the following way:

rule bwa_index:
    input:
        "{genome}.fasta",
    output:
        idx=multiext("{genome}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    log:
        "logs/bwa_index/{genome}.log",
    params:
        algorithm="bwtsw",
    wrapper:
        "v0.87.0/bio/bwa/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa==0.7.17
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

from os.path import splitext

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
    raise ValueError("Only one reference genome can be provided!")

# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", splitext(snakemake.output.idx[0])[0])

if len(prefix) > 0:
    prefix = "-p " + prefix

# Construction algorithm that will be used to build the database, default is bwtsw
construction_algorithm = snakemake.params.get("algorithm", "")

if len(construction_algorithm) != 0:
    construction_algorithm = "-a " + construction_algorithm

shell(
    "bwa index" " {prefix}" " {construction_algorithm}" " {snakemake.input[0]}" " {log}"
)
BWA MEM

Map reads using bwa mem, with optional sorting using samtools or picard.

URL:

Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        # Index can be a list of (all) files created by bwa, or one of them
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        "mapped/{sample}.bam",
    log:
        "logs/bwa_mem/{sample}.log",
    params:
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sorting="none",  # Can be 'none', 'samtools' or 'picard'.
        sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
        sort_extra="",  # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "v0.87.0/bio/bwa/mem"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa==0.7.17
  • samtools=1.12
  • picard=2.25
Input/Output

Input:

  • FASTQ file(s)
  • reference genome

Output:

  • SAM/BAM/CRAM file
Notes
  • The extra param allows for additional arguments for bwa-mem.
  • The sorting param enables sorting, and can be either ‘none’, ‘samtools’ or ‘picard’.
  • The sort_extra param allows for extra arguments for samtools/picard.
  • The tmp_dir param defines the path to the temp dir.
  • For more information, see http://bio-bwa.sourceforge.net/bwa.shtml
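
How the sorting and sort_order params select the command that receives the output of bwa mem can be sketched as below. This is an illustration that mirrors the wrapper code in this section, not the wrapper itself; sort_command is a hypothetical name, and the command is returned as an argument list for readability.

```python
def sort_command(sorting, sort_order, output, tmp_dir, sort_extra=""):
    """Hypothetical helper: command that consumes the SAM stream of bwa mem."""
    if sort_order not in {"coordinate", "queryname"}:
        raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
    extra = sort_extra.split()
    if sorting == "none":
        # No sorting: only convert SAM to BAM.
        return ["samtools", "view", "-Sbh", "-o", output, "-"]
    if sorting == "samtools":
        if sort_order == "queryname":
            extra.append("-n")  # sort by read name instead of coordinate
        return ["samtools", "sort", "-T", tmp_dir, *extra, "-o", output, "-"]
    if sorting == "picard":
        return [
            "picard", "SortSam", *extra,
            "--INPUT", "/dev/stdin", "--OUTPUT", output,
            "--SORT_ORDER", sort_order, "--TMP_DIR", tmp_dir,
        ]
    raise ValueError("Unexpected value for sorting ({})".format(sorting))


print(" ".join(sort_command("samtools", "queryname", "mapped/a.bam", "/tmp/sort")))
```

With sorting set to ‘none’, sort_order and sort_extra have no effect on the command.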
Authors
  • Johannes Köster
  • Julian de Ruiter
  • Filipe G. Vieira
Code
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
import re
import sys
import tempfile
from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sorting", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

index = snakemake.input.idx
if isinstance(index, str):
    index = path.splitext(snakemake.input.idx)[0]
else:
    index = path.splitext(snakemake.input.idx[0])[0]


if re.search(r"-T\b", sort_extra) or re.search(r"--TMP_DIR\b", sort_extra):
    sys.exit(
        "You have specified temp dir (`-T` or `--TMP_DIR`) in params.sort_extra; this is automatically set from params.tmp_dir."
    )

log = snakemake.log_fmt_shell(stdout=False, stderr=True)


# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort -T {tmp} {sort_extra} -o {snakemake.output[0]} -"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} --INPUT /dev/stdin"
        " --OUTPUT {snakemake.output[0]} --SORT_ORDER {sort_order} --TMP_DIR {tmp}"
    )

else:
    raise ValueError("Unexpected value for params.sorting ({})".format(sort))

with tempfile.TemporaryDirectory() as tmp:
    shell(
        "(bwa mem"
        " -t {snakemake.threads}"
        " {extra}"
        " {index}"
        " {snakemake.input.reads}"
        " | " + pipe_cmd + ") {log}"
    )
BWA MEM SAMBLASTER

Map reads using bwa mem, mark duplicates with samblaster, and sort and index with sambamba.

URL:

Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        # Index can be a list of (all) files created by bwa, or one of them
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        bam="mapped/{sample}.bam",
        index="mapped/{sample}.bam.bai",
    log:
        "logs/bwa_mem_sambamba/{sample}.log",
    params:
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort_extra="",  # Extra args for sambamba.
    threads: 8
    wrapper:
        "v0.87.0/bio/bwa/mem-samblaster"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa==0.7.17
  • sambamba==0.7.1
  • samblaster==0.1.24
Authors
  • Christopher Schröder
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")

index = snakemake.input.get("idx", "")
if isinstance(index, str):
    index = path.splitext(snakemake.input.idx)[0]
else:
    index = path.splitext(snakemake.input.idx[0])[0]

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {index}"
    " {snakemake.input.reads}"
    " | samblaster"
    " {samblaster_extra}"
    " | sambamba view -S -f bam /dev/stdin"
    " -t {snakemake.threads}"
    " | sambamba sort /dev/stdin"
    " -t {snakemake.threads}"
    " -o {snakemake.output.bam}"
    " {sort_extra}"
    ") {log}"
)
BWA SAMPE

Map paired-end reads with bwa sampe. For more information about BWA see BWA documentation.

URL:

Example

This wrapper can be used in the following way:

rule bwa_sampe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"],
        # Index can be a list of (all) files created by bwa, or one of them
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        "mapped/{sample}.bam",
    params:
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",  # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",  # optional: Sort by 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_sampe/{sample}.log",
    wrapper:
        "v0.87.0/bio/bwa/sampe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa==0.7.17
  • samtools==1.9
  • picard==2.20.1
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa sampe."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell

index = snakemake.input.get("idx", "")
if isinstance(index, str):
    index = path.splitext(snakemake.input.idx)[0]
else:
    index = path.splitext(snakemake.input.idx[0])[0]

# Check inputs.
if not len(snakemake.input.sai) == 2:
    raise ValueError("input.sai must have 2 elements")

if not len(snakemake.input.fastq) == 2:
    raise ValueError("input.fastq must have 2 elements")

# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa sampe"
    " {extra}"
    " {index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}"
)
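
In the samtools branch above, pipe_cmd is assigned before sort_extra is extended. This works because pipe_cmd is only a template: the {sort_extra} placeholder is filled in when shell() formats the final command. A minimal sketch of the same pattern in plain Python, using str.format in place of Snakemake's shell() (the file names are placeholders):

```python
# Template defined first; the placeholders are not resolved yet.
pipe_cmd = "samtools sort {sort_extra} -o {output} -"

# Values that are extended after the template has been written.
sort_extra = ""
sort_extra += " -n"  # queryname sorting
sort_extra += " -T mapped/a.tmp"  # temp file prefix next to the output

# Formatting happens last, so the additions above are part of the command.
command = pipe_cmd.format(sort_extra=sort_extra, output="mapped/a.bam")
print(command.split())
```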
BWA SAMSE

Map single-end reads with bwa samse. For more information about BWA see BWA documentation.

URL:

Example

This wrapper can be used in the following way:

rule bwa_samse:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai",
        # Index can be a list of (all) files created by bwa, or one of them
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        "mapped/{sample}.bam",
    params:
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",  # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",  # optional: Sort by 'queryname' or 'coordinate'
        sort_extra="",  # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_samse/{sample}.log",
    wrapper:
        "v0.87.0/bio/bwa/samse"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa==0.7.17
  • samtools==1.9
  • picard==2.20.1
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa samse."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell

index = snakemake.input.get("idx", "")
if isinstance(index, str):
    index = path.splitext(snakemake.input.idx)[0]
else:
    index = path.splitext(snakemake.input.idx[0])[0]

# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa samse"
    " {extra}"
    " {index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}"
)
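
The wrapper derives the index prefix that bwa expects by stripping the extension from the first index file. A minimal sketch of that step in isolation, assuming plain strings or lists stand in for `snakemake.input.idx`; the helper name `index_prefix` is hypothetical and not part of the wrapper:

```python
from os import path


def index_prefix(idx):
    """Return the bwa index prefix for one index file or a list of them."""
    # A single path is used directly; for a list, the first entry is enough
    # because all index files share the same prefix.
    first = idx if isinstance(idx, str) else idx[0]
    return path.splitext(first)[0]


print(index_prefix("genome.amb"))
print(index_prefix(["genome.amb", "genome.ann", "genome.bwt"]))
```

Both calls print the same prefix, which is why the rule may pass either one index file or all of them.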
BWA SAM(SE/PE)

Map single-end or paired-end reads with bwa samse or sampe, respectively. For more information about BWA see BWA documentation.

URL:

Example

This wrapper can be used in the following way:

rule bwa_sam_pe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"],
    output:
        "mapped/{sample}.pe.sam",
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",
    log:
        "logs/bwa_sam_pe/{sample}.log",
    wrapper:
        "v0.87.0/bio/bwa/samxe"


rule bwa_sam_se:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai",
    output:
        "mapped/{sample}.se.sam",
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",
    log:
        "logs/bwa_sam_se/{sample}.log",
    wrapper:
        "v0.87.0/bio/bwa/samxe"


rule bwa_bam_pe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"],
    output:
        "mapped/{sample}.pe.bam",
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",
    log:
        "logs/bwa_bam_pe/{sample}.log",
    wrapper:
        "v0.87.0/bio/bwa/samxe"


rule bwa_bam_se:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai",
    output:
        "mapped/{sample}.se.bam",
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",
    log:
        "logs/bwa_bam_se/{sample}.log",
    wrapper:
        "v0.87.0/bio/bwa/samxe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa==0.7.17
  • samtools==1.9
  • picard==2.20.1
Authors
  • Filipe G. Vieira
Code
"""Snakemake wrapper for both bwa samse and sampe."""

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Check inputs.
fastq = (
    snakemake.input.fastq
    if isinstance(snakemake.input.fastq, list)
    else [snakemake.input.fastq]
)
sai = (
    snakemake.input.sai
    if isinstance(snakemake.input.sai, list)
    else [snakemake.input.sai]
)
if len(fastq) == 1 and len(sai) == 1:
    alg = "samse"
elif len(fastq) == 2 and len(sai) == 2:
    alg = "sampe"
else:
    raise ValueError("input.fastq and input.sai must have 1 or 2 elements each")

# Extract output format
out_name, out_ext = path.splitext(snakemake.output[0])
out_ext = out_ext[1:].upper()

# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to output format using samtools view.
    pipe_cmd = (
        "samtools view -h --output-fmt " + out_ext + " -o {snakemake.output[0]} -"
    )

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

    # Define output format
    sort_extra += " --output-fmt {}".format(out_ext)

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa {alg}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}"
)
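
The two decisions this wrapper makes before calling bwa can be sketched on their own: the algorithm follows from the number of inputs, and the output format follows from the output extension. This is an illustrative sketch, assuming plain strings and lists in place of the `snakemake` object; the function names `choose_algorithm` and `output_format` are hypothetical:

```python
from os import path


def choose_algorithm(fastq, sai):
    """Pick 'samse' or 'sampe' from the number of FASTQ and SAI inputs."""
    fastq = fastq if isinstance(fastq, list) else [fastq]
    sai = sai if isinstance(sai, list) else [sai]
    if len(fastq) == 1 and len(sai) == 1:
        return "samse"
    if len(fastq) == 2 and len(sai) == 2:
        return "sampe"
    raise ValueError("input.fastq and input.sai must have 1 or 2 elements each")


def output_format(output_path):
    """Derive the value passed to samtools --output-fmt from the extension."""
    return path.splitext(output_path)[1][1:].upper()


print(choose_algorithm("reads/a.1.fastq", "sai/a.1.sai"))
print(output_format("mapped/a.pe.sam"))
```

Under these assumptions, an output ending in .sam or .bam selects the matching format without any extra parameter.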

BWA-MEM2

For bwa-mem2, the following wrappers are available:

BWA-MEM2 INDEX

Creates a bwa-mem2 index.

URL:

Example

This wrapper can be used in the following way:

rule bwa_mem2_index:
    input:
        "{genome}",
    output:
        "{genome}.0123",
        "{genome}.amb",
        "{genome}.ann",
        "{genome}.bwt.2bit.64",
        "{genome}.pac",
    log:
        "logs/bwa-mem2_index/{genome}.log",
    params:
        prefix=lambda w: w.genome,
    wrapper:
        "v0.87.0/bio/bwa-mem2/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa-mem2==2.2.1
Authors
  • Christopher Schröder
  • Patrik Smeds
Code
__author__ = "Christopher Schröder, Patrik Smeds"
__copyright__ = "Copyright 2020, Christopher Schröder, Patrik Smeds"
__email__ = "christopher.schroeder@tu-dortmund.de, patrik.smeds@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided.")
elif len(snakemake.input) > 1:
    raise ValueError("Please provide exactly one reference genome as input.")

# Prefix that should be used for the database
prefix = None
if "prefix" in snakemake.params.keys():
    prefix = snakemake.params["prefix"]
else:
    prefix = path.splitext(snakemake.output[0])[0]

if len(prefix) > 0:
    prefix = "-p " + prefix

shell("bwa-mem2 index" " {prefix}" " {snakemake.input[0]}" " {log}")
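
How the prefix ends up on the command line can be illustrated without Snakemake. The following is a rough sketch, assuming plain strings for the genome and the first output file; `index_command` is a hypothetical helper, and only the `-p` option shown in the wrapper is used:

```python
from os import path


def index_command(genome, first_output, prefix=None):
    """Build the bwa-mem2 index command line as a single string."""
    # Without an explicit prefix, fall back to the first output file
    # minus its extension, as the wrapper does.
    if prefix is None:
        prefix = path.splitext(first_output)[0]
    prefix_arg = "-p " + prefix if len(prefix) > 0 else ""
    parts = ["bwa-mem2 index", prefix_arg, genome]
    return " ".join(part for part in parts if part)


print(index_command("genome.fasta", "genome.fasta.0123"))
```

An empty prefix drops the `-p` option entirely, so bwa-mem2 falls back to its own default naming.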
BWA-MEM2

Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to bwa and is ~1.3-3.1x faster depending on the use case, dataset and machine. Sorting with samtools or picard is optional.

URL:

Example

This wrapper can be used in the following way:

rule bwa_mem2_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        # Index can be a list of (all) files created by bwa, or one of them
        idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
    output:
        "mapped/{sample}.bam",
    log:
        "logs/bwa_mem2/{sample}.log",
    params:
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="none",  # Can be 'none', 'samtools' or 'picard'.
        sort_order="coordinate",  # Can be 'coordinate' (default) or 'queryname'.
        sort_extra="",  # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "v0.87.0/bio/bwa-mem2/mem"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa-mem2==2.2.1
  • samtools==1.12
  • picard==2.23
Authors
  • Christopher Schröder
  • Johannes Köster
  • Julian de Ruiter
Code
__author__ = "Christopher Schröder, Johannes Köster, Julian de Ruiter"
__copyright__ = (
    "Copyright 2020, Christopher Schröder, Johannes Köster and Julian de Ruiter"
)
__email__ = "christopher.schroeder@tu-dortmund.de, koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

index = snakemake.input.get("idx", "")
if isinstance(index, str):
    index = path.splitext(snakemake.input.idx)[0]
else:
    index = path.splitext(snakemake.input.idx[0])[0]

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

shell(
    "(bwa-mem2 mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}"
)
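
The effect of the sort, sort_order and sort_extra parameters is easier to see when the branch logic is pulled into one function. This is a sketch only, assuming a plain output path instead of `snakemake.output[0]`; `build_pipe_cmd` is a hypothetical name, and the command strings are copied from the wrapper above:

```python
from os import path


def build_pipe_cmd(output, sort="none", sort_order="coordinate", sort_extra=""):
    """Return the command that reads SAM from stdin and writes `output`."""
    if sort == "none":
        # No sorting: only convert the stream to BAM.
        return "samtools view -Sbh -o {} -".format(output)
    if sort == "samtools":
        if sort_order == "queryname":
            sort_extra += " -n"
        # Keep temporary files next to the output file.
        prefix = path.splitext(output)[0]
        sort_extra += " -T " + prefix + ".tmp"
        return "samtools sort {} -o {} -".format(sort_extra, output)
    if sort == "picard":
        return "picard SortSam {} INPUT=/dev/stdin OUTPUT={} SORT_ORDER={}".format(
            sort_extra, output, sort_order
        )
    raise ValueError("Unexpected value for params.sort ({})".format(sort))


print(build_pipe_cmd("mapped/a.bam", sort="samtools", sort_order="queryname"))
```

Note that sort_order only changes the samtools command for 'queryname'; coordinate order is what samtools sort does without extra flags.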
BWA MEM SAMBLASTER

Map reads using bwa-mem2, mark duplicates with samblaster, then sort and index with sambamba.

URL:

Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        # Index can be a list of (all) files created by bwa, or one of them
        idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
    output:
        bam="mapped/{sample}.bam",
        index="mapped/{sample}.bam.bai",
    log:
        "logs/bwa_mem2_sambamba/{sample}.log",
    params:
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort_extra="-q",  # Extra args for sambamba.
    threads: 8
    wrapper:
        "v0.87.0/bio/bwa-mem2/mem-samblaster"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bwa-mem2==2.2.1
  • sambamba==0.7.1
  • samblaster==0.1.24
Authors
  • Christopher Schröder
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")

index = snakemake.input.get("idx", "")
if isinstance(index, str):
    index = path.splitext(snakemake.input.idx)[0]
else:
    index = path.splitext(snakemake.input.idx[0])[0]

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")

shell(
    "(bwa-mem2 mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {index}"
    " {snakemake.input.reads}"
    " | samblaster"
    " {samblaster_extra}"
    " | sambamba view -S -f bam /dev/stdin"
    " -t {snakemake.threads}"
    " | sambamba sort /dev/stdin"
    " -t {snakemake.threads}"
    " -o {snakemake.output.bam}"
    " {sort_extra}"
    ") {log}"
)
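
The shell call above chains four stages with pipes. A small sketch of the same composition, assuming plain Python values instead of the `snakemake` object (the function name `samblaster_pipeline` is hypothetical; the stage commands are taken from the wrapper):

```python
def samblaster_pipeline(index, reads, bam, threads=1, extra="",
                        samblaster_extra="", sort_extra=""):
    """Join the mapping, duplicate-marking, conversion and sorting stages."""
    threads = str(threads)
    stages = [
        ["bwa-mem2 mem", "-t", threads, extra, index, *reads],
        ["samblaster", samblaster_extra],
        ["sambamba view -S -f bam /dev/stdin", "-t", threads],
        ["sambamba sort /dev/stdin", "-t", threads, "-o", bam, sort_extra],
    ]
    # Empty optional arguments are skipped so no stray spaces remain.
    return " | ".join(" ".join(tok for tok in stage if tok) for stage in stages)


print(samblaster_pipeline("genome.fasta", ["r1.fastq", "r2.fastq"], "out.bam", threads=8))
```

Each stage reads the previous stage's standard output, so no intermediate SAM file is written to disk.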

CAIROSVG

Convert SVG files with cairosvg.

URL:

Example

This wrapper can be used in the following way:

rule:
    input:
        "{prefix}.svg"
    output:
        "{prefix}.{fmt,(pdf|png)}"
    wrapper:
        "v0.87.0/utils/cairosvg"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • cairosvg=2.4.2
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

_, ext = os.path.splitext(snakemake.output[0])

if ext not in (".png", ".pdf", ".ps", ".svg"):
    raise ValueError("invalid file extension: '{}'".format(ext))
fmt = ext[1:]

shell("cairosvg {extra} -f {fmt} {snakemake.input[0]} -o {snakemake.output[0]} {log}")
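
The wrapper infers the cairosvg output format from the extension of the output file and rejects anything else. A minimal sketch of that check, assuming a plain path string; `cairosvg_format` is a hypothetical helper name:

```python
import os

# Extensions accepted by the wrapper above.
SUPPORTED = (".png", ".pdf", ".ps", ".svg")


def cairosvg_format(output_path):
    """Return the format name for `cairosvg -f`, or raise for other extensions."""
    _, ext = os.path.splitext(output_path)
    if ext not in SUPPORTED:
        raise ValueError("invalid file extension: '{}'".format(ext))
    return ext[1:]


print(cairosvg_format("figures/plot.pdf"))
```

The example rule restricts the `fmt` wildcard to pdf and png, which is a subset of what this check allows.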

CLUSTALO

Multiple alignment of nucleic acid and protein sequences.

URL:

Example

This wrapper can be used in the following way:

rule clustalo:
    input:
        "{sample}.fa"
    output:
        "{sample}.msa.fa"
    params:
        extra=""
    log:
        "logs/clustalo/test/{sample}.log"
    threads: 8
    wrapper:
        "v0.87.0/bio/clustalo"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • clustalo==1.2.4
Authors
  • Michael Hall
Code
"""Snakemake wrapper for clustal omega."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "clustalo {extra}"
    " --threads={snakemake.threads}"
    " --in {snakemake.input[0]}"
    " --out {snakemake.output[0]} "
    " {log}"
)

CUTADAPT

For cutadapt, the following wrappers are available:

CUTADAPT-PE

Trim paired-end reads using cutadapt.

URL:

Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        fastq1="trimmed/{sample}.1.fastq",
        fastq2="trimmed/{sample}.2.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        # https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
        adapters="-a AGAGCACACGTCTGAACTCCAGTCAC -g AGATCGGAAGAGCACACGT -A AGAGCACACGTCTGAACTCCAGTCAC -G AGATCGGAAGAGCACACGT",
        # https://cutadapt.readthedocs.io/en/stable/guide.html#
        extra="--minimum-length 1 -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    threads: 4 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/cutadapt/pe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • cutadapt==3.4
Input/Output

Input:

  • two (paired-end) fastq files

Output:

  • two trimmed (paired-end) fastq files
  • text file containing trimming statistics
Authors
  • Julian de Ruiter
  • David Laehnemann
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."

extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

assert (
    extra != "" or adapters != ""
), "No options provided to cutadapt. Please use 'params: adapters=' or 'params: extra='."

shell(
    "cutadapt"
    " {adapters}"
    " {extra}"
    " -o {snakemake.output.fastq1}"
    " -p {snakemake.output.fastq2}"
    " -j {snakemake.threads}"
    " {snakemake.input}"
    " > {snakemake.output.qc} {log}"
)
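
To see what the wrapper hands to the shell, the same checks and argument order can be reproduced as a string builder. This is a sketch under the assumption that plain values replace the `snakemake` object; `cutadapt_pe_command` is a hypothetical name, and the log redirection is left out:

```python
def cutadapt_pe_command(inputs, fastq1, fastq2, qc, threads=1, adapters="", extra=""):
    """Build the paired-end cutadapt command line as a single string."""
    if len(inputs) != 2:
        raise ValueError("Input must contain 2 (paired-end) elements.")
    if extra == "" and adapters == "":
        raise ValueError("No options provided to cutadapt.")
    parts = ["cutadapt", adapters, extra,
             "-o", fastq1, "-p", fastq2,
             "-j", str(threads), *inputs]
    # The trimming report goes to stdout and is redirected to the qc file.
    return " ".join(part for part in parts if part) + " > " + qc


print(cutadapt_pe_command(["r1.fastq", "r2.fastq"], "t1.fastq", "t2.fastq",
                          "qc.txt", threads=4, adapters="-a ACGT"))
```

Because the report is captured from stdout, only stderr is sent to the log file in the wrapper.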
CUTADAPT-SE

Trim single-end reads using cutadapt.

URL:

Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        "reads/{sample}.fastq"
    output:
        fastq="trimmed/{sample}.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        adapters="-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="-q 20"
    log:
        "logs/cutadapt/{sample}.log"
    threads: 4 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/cutadapt/se"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • cutadapt==3.4
Input/Output

Input:

  • fastq file

Output:

  • trimmed fastq file
  • text file containing trimming statistics
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming single-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


n = len(snakemake.input)
assert n == 1, "Input must contain 1 (single-end) element."

extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

assert (
    extra != "" or adapters != ""
), "No options provided to cutadapt. Please use 'params: adapters=' or 'params: extra='."

shell(
    "cutadapt"
    " {adapters}"
    " {extra}"
    " -j {snakemake.threads}"
    " -o {snakemake.output.fastq}"
    " {snakemake.input[0]}"
    " > {snakemake.output.qc} {log}"
)

DADA2

For dada2, the following wrappers are available:

DADA2_ADD_SPECIES

DADA2 Adding species-level annotation using dada2 addSpecies function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_add_species:
    input:
        taxtab="results/dada2/taxa.RDS", # Taxonomic assignments
        refFasta="resources/example_species_assignment.fa.gz" # Reference FASTA
    output:
        "results/dada2/taxa-sp.RDS", # Taxonomic + Species assignments
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={"name1": arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/add-species/add-species.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/add-species"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • taxtab: RDS file containing the taxonomic assignments
  • refFasta: A string with the path to the FASTA reference database

Output:

  • The input RDS file augmented by the species-level annotation
Params
  • optional arguments for addSpecies(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for adding species-level
# annotation using dada2 addSpecies function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           taxtab = readRDS(snakemake@input[["taxtab"]]),
           refFasta = snakemake@input[["refFasta"]]
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::addSpecies()")
}

# Add species-level annotation
taxa.sp<-do.call(addSpecies, args)

# Store the taxonomic assignments as a RDS file
saveRDS(taxa.sp, snakemake@output[[1]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
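
The R code above keeps only the named entries of the rule's params and passes them on as extra function arguments. The same idea, sketched in Python for illustration: params is modelled here as a list of (name, value) pairs, which is an assumption of this sketch rather than Snakemake's actual data structure, and `build_call_args` is a hypothetical name:

```python
def build_call_args(required, params):
    """Combine required arguments with the named entries of `params`.

    An empty name marks a positional entry, which is dropped just as the
    R code drops unnamed list elements.
    """
    named = {name: value for name, value in params if name != ""}
    # Unlike R's c(), a dict keeps one value per name, so a named param
    # would override a required argument of the same name here.
    return {**required, **named}


args = build_call_args(
    {"taxtab": "results/dada2/taxa.RDS", "refFasta": "resources/ref.fa.gz"},
    [("", "ignored positional value"), ("verbose", True)],
)
print(sorted(args))
```

In a rule this corresponds to writing `params: verbose=True`; positional params are silently ignored by the wrapper.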
DADA2_ASSIGN_SPECIES

DADA2 Classifying sequences against a reference database using dada2 assignSpecies function. Optional parameters are documented in the manual and an example of the function can be found in the dedicated section of the DADA2 website.

URL:

Example

This wrapper can be used in the following way:

rule dada2_assign_species:
    input:
        seqs="results/dada2/seqTab.nochim.RDS", # Chimera-free sequence table
        refFasta="resources/species.fasta" # Reference FASTA for Genus-Species taxonomy
    output:
        "results/dada2/genus-species-taxa.RDS" # Genus-Species taxonomic assignments
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={"name1": arg1}`).
    #params:
    #    allowMultiple=True
    log:
        "logs/dada2/assign-species/assign-species.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/assign-species"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • seqs: RDS file with the chimera-free sequence table
  • refFasta: A string with the path to the genus-species FASTA reference database

Output:

  • RDS file containing the genus and species taxonomic assignments
Params
  • optional arguments for assignSpecies(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for exact matching of sequences against
# a genus-species reference database using dada2 assignSpecies function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           seqs = readRDS(snakemake@input[["seqs"]]),
           refFasta = snakemake@input[["refFasta"]]
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::assignSpecies()")
}

# Perform Genus-Species taxonomic assignments
taxa<-do.call(assignSpecies, args)

# Store the taxonomic assignments as a RDS file
saveRDS(taxa, snakemake@output[[1]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_ASSIGN_TAXONOMY

DADA2 Classifying sequences against a reference database using dada2 assignTaxonomy function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_assign_taxonomy:
    input:
        seqs="results/dada2/seqTab.nochim.RDS", # Chimera-free sequence table
        refFasta="resources/example_train_set.fa.gz" # Reference FASTA for taxonomy
    output:
        "results/dada2/taxa.RDS" # Taxonomic assignments
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={"name1": arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/assign-taxonomy/assign-taxonomy.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/assign-taxonomy"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • seqs: RDS file with the chimera-free sequence table
  • refFasta: A string with the path to the FASTA reference database

Output:

  • RDS file containing the taxonomic assignments
Params
  • optional arguments for assignTaxonomy(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for classifying sequences against
# a reference database using dada2 assignTaxonomy function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           seqs = readRDS(snakemake@input[["seqs"]]),
           refFasta = snakemake@input[["refFasta"]],
           multithread=snakemake@threads
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::assignTaxonomy()")
}

# Assign taxonomy
taxa<-do.call(assignTaxonomy, args)

# Store the taxonomic assignments as a RDS file
saveRDS(taxa, snakemake@output[[1]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_COLLAPSE_NOMISMATCH

DADA2 Combine together sequences that are identical up to shifts and/or indels using dada2 collapseNoMismatch function. Optional parameters are documented in the manual. While the function is not included in the tutorial, feel free to browse the dada2 issues for showcases.

URL:

Example

This wrapper can be used in the following way:

rule dada2_collapse_nomismatch:
    input:
        "results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
    output:
        "results/dada2/seqTab.collapsed.RDS"
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={"name1": arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/collapse-nomismatch"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • RDS file with the chimera-free sequence table

Output:

  • RDS file with the sequence table where the needed sequences were collapsed
Params
  • optional arguments for collapseNoMismatch(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for combining together sequences that are identical
# up to shifts and/or indels using dada2 collapseNoMismatch function

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           seqtab = readRDS(snakemake@input[[1]])
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::collapseNoMismatch()")
}

# Collapse sequences
taxa<-do.call(collapseNoMismatch, args)

# Store the resulting table as a RDS file
saveRDS(taxa, snakemake@output[[1]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_DEREPLICATE_FASTQ

DADA2 Dereplication of FASTQ files using dada2 derepFastq function. Optional parameters are documented in the manual. Although the function is not introduced explicitly in the tutorial, it is used under the hood in the learnErrors section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_dereplicate_fastq:
    input:
    # Quality filtered FASTQ file
        "filtered/{fastq}.fastq"
    output:
    # Dereplicated sequences stored as `derep-class` object in a RDS file
        "uniques/{fastq}.RDS"
    log:
        "logs/dada2/dereplicate-fastq/{fastq}.log"
    wrapper:
        "v0.87.0/bio/dada2/dereplicate-fastq"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • a FASTQ file

Output:

  • RDS file containing a derep-class object
Params
  • optional arguments for derepFastq(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for dereplicating FASTQ files using dada2 derepFastq function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list( fls = unlist(snakemake@input))
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::derepFastq()")
}
# Dereplicate
uniques<-do.call(derepFastq, args)

# Store as RDS file
saveRDS(uniques,snakemake@output[[1]])

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
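
The handling of named parameters in the wrapper code above can also be illustrated outside R. The following Python sketch only mirrors the logic: `merge_named_params` and `fake_derep` are hypothetical names, and the (name, value) pairs stand in for the R list `snakemake@params`, where positional entries have an empty name.

```python
def merge_named_params(base_args, params):
    """Return base_args extended with the named entries of params.

    `params` is a list of (name, value) pairs. An empty name marks a
    positional entry, which is dropped, as the R wrapper does before
    calling do.call().
    """
    extra = {name: value for name, value in params if name != ""}
    merged = dict(base_args)
    # Note: dict.update() overrides duplicate keys, whereas R's c() would
    # keep both entries; this sketch assumes no duplicates are passed.
    merged.update(extra)
    return merged


def fake_derep(fls, n=1000, verbose=False):
    """Hypothetical stand-in for the wrapped function; it only echoes its arguments."""
    return {"fls": fls, "n": n, "verbose": verbose}


# One positional (unnamed) entry and one named entry
params = [("", "ignored positional value"), ("verbose", True)]
args = merge_named_params({"fls": ["filtered/a.fastq"]}, params)

# Python analogue of do.call(f, args): unpack the dict as keyword arguments
result = fake_derep(**args)
print(result)
```

Only the named entry reaches the function call, which is why extra options must be given as key=value pairs in the `params` section of the rule.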
DADA2_FILTER_TRIM

DADA2 Quality filtering of single or paired-end reads using dada2 filterAndTrim function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_filter_trim_se:
    input:
        # Single-end files without primer sequences
        fwd="trimmed/{sample}.1.fastq.gz"
    output:
        filt="filtered-se/{sample}.1.fastq.gz",
        stats="reports/dada2/filter-trim-se/{sample}.tsv"
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    params:
        # Set the maximum expected errors tolerated in filtered reads
        maxEE=1,
        # Set the number of kept bases to 7 for the toy example
        truncLen=7,
        # Set minLen to 1 for the toy example but default is 20
        minLen=1
    log:
        "logs/dada2/filter-trim-se/{sample}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/filter-trim"

rule dada2_filter_trim_pe:
    input:
        # Paired-end files without primer sequences
        fwd="trimmed/{sample}.1.fastq",
        rev="trimmed/{sample}.2.fastq"
    output:
        filt="filtered-pe/{sample}.1.fastq.gz",
        filt_rev="filtered-pe/{sample}.2.fastq.gz",
        stats="reports/dada2/filter-trim-pe/{sample}.tsv"
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    params:
        # Set the maximum expected errors tolerated in filtered reads
        maxEE=1,
        # Set the number of kept bases in forward and reverse reads
        # respectively to 7 for the toy example
        truncLen=[7,6],
        # Set minLen to 1 for the toy example but default is 20
        minLen=1
    log:
        "logs/dada2/filter-trim-pe/{sample}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/filter-trim"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • fwd: a forward FASTQ file (potentially compressed) without primer sequences
  • rev: an (optional) reverse FASTQ file (potentially compressed) without primer sequences

Output:

  • filt: a compressed filtered forward FASTQ file
  • filt_rev: an (optional) compressed filtered reverse FASTQ file
  • stats: a .tsv file with the number of processed and filtered reads per sample
Params
  • optional arguments for filterAndTrim(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for filtering single or paired-end reads using dada2 filterAndTrim function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
        fwd = snakemake@input[["fwd"]],
        filt = snakemake@output[["filt"]],
        multithread=snakemake@threads
)
# Test if paired end input is passed
if(!is.null(snakemake@input[["rev"]]) && !is.null(snakemake@output[["filt_rev"]])){
        args<-c(args,
            rev = snakemake@input[["rev"]],
            filt.rev = snakemake@output[["filt_rev"]]
            )
}
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
    # Keeping only the named elements of the list for do.call()
    extra<-snakemake@params[ names(snakemake@params) != "" ]
    # Check if 'compress=' option is passed
    if(!is.null(extra[["compress"]])){
        stop("Remove the `compress=` option from `params`.\n",
            "The `compress` option is implicitly set here from the file extension.")
    } else {
        # Check if output files are given as compressed files
        # ex: in se version, all(TRUE, NULL) gives TRUE
        compressed <- c(
            endsWith(args[["filt"]], '.gz'),
            if(is.null(args[["filt.rev"]])) NULL else {endsWith(args[["filt.rev"]], '.gz')}
        )
        if ( all(compressed) ) {
            extra[["compress"]] <- TRUE
        } else if ( any(compressed) ) {
            stop("Either all or no fastq output should be compressed. Please check `output.filt` and `output.filt_rev` for consistency.")
        } else {
            extra[["compress"]] <- FALSE
        }
    }
    # Add them to the list of arguments
    args<-c(args, extra)
} else {
    message("No optional parameters. Using default parameters from dada2::filterAndTrim()")
}

# Call the function with arguments
filt.stats<-do.call(filterAndTrim, args)

# Write processed reads report
write.table(filt.stats, snakemake@output[["stats"]], sep="\t", quote=F)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
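
The rule used above to derive the `compress` option from the output file names can be summarised in a few lines. This is a minimal Python sketch of the same decision, not the wrapper itself; `infer_compress` is a hypothetical helper name.

```python
def infer_compress(filt, filt_rev=None):
    """Decide whether the filtered FASTQ output should be gzip-compressed.

    Returns True if all outputs end with '.gz', False if none do, and
    raises ValueError when the two outputs disagree.
    """
    outputs = [filt] if filt_rev is None else [filt, filt_rev]
    compressed = [path.endswith(".gz") for path in outputs]
    if all(compressed):
        return True
    if any(compressed):
        raise ValueError(
            "Either all or no FASTQ outputs should be compressed."
        )
    return False


# Single-end: only the forward output is checked
print(infer_compress("filtered-se/a.1.fastq.gz"))

# Paired-end: both outputs must agree
print(infer_compress("filtered-pe/a.1.fastq", "filtered-pe/a.2.fastq"))
```

Mixing a compressed and an uncompressed output is treated as an error rather than guessed at, which matches the behaviour of the wrapper code.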
DADA2_LEARN_ERRORS

DADA2 Learning error rates separately on paired-end data using dada2 learnErrors function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule learn_pe:
    # Run dada2_learn_errors twice: on forward and on reverse reads
    input: expand("results/dada2/model_{orientation}.RDS", orientation=[1,2])

rule dada2_learn_errors:
    input:
    # Quality filtered and trimmed forward FASTQ files (potentially compressed)
        expand("filtered/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
    output:
        err="results/dada2/model_{orientation}.RDS",# save the error model
        plot="reports/dada2/errors_{orientation}.png",# plot observed and estimated rates
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    #params:
    #    randomize=True
    log:
        "logs/dada2/learn-errors/learn-errors_{orientation}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/learn-errors"
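
The doubled braces in `{{orientation}}` in the rule above keep that wildcard unexpanded while `sample` is filled in. The sketch below illustrates the escaping with plain `str.format` as a stand-in; it is an assumption-laden illustration, not Snakemake's own implementation of `expand()` or of wildcard resolution.

```python
# Template as written in the rule: `sample` is filled now,
# `orientation` is escaped with doubled braces and survives as a wildcard.
template = "filtered/{sample}.{{orientation}}.fastq.gz"
samples = ["a", "b"]

# Step 1: expansion over the sample names (what expand() does conceptually)
inputs = [template.format(sample=s) for s in samples]
print(inputs)

# Step 2: later, the remaining wildcard is resolved per job,
# here for the forward reads (orientation 1)
resolved = [path.format(orientation=1) for path in inputs]
print(resolved)
```

With single braces around `orientation`, the first step would already require a value for it, so the rule could no longer be reused for both read orientations.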

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • A list of quality filtered and trimmed forward FASTQ files (potentially compressed)

Output:

  • err: RDS file with the stored error model
  • plot: plot of observed vs. estimated error rates
Params
  • optional arguments for learnErrors(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for learning error rates on sequence data using dada2 learnErrors function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           fls = snakemake@input,
           multithread=snakemake@threads
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::learnErrors()")
}

# Learn the error rates from the input reads
err<-do.call(learnErrors, args)

# Plot estimated versus observed error rates to validate models
perr<-plotErrors(err, nominalQ = TRUE)

# Save the plots
library(ggplot2)
ggsave(snakemake@output[["plot"]], perr, width = 8, height = 8, dpi = 300)

# Store the estimated errors as RDS files
saveRDS(err, snakemake@output[["err"]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_MAKE_TABLE

DADA2 Build a sequence - sample table from denoised samples using dada2 makeSequenceTable function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_make_table_se:
    input:
    # Inferred composition
        expand("denoised/{sample}.1.RDS", sample=['a','b'])
    output:
        "results/dada2/seqTab-se.RDS"
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    params:
        names=['a','b'] # Sample names instead of paths
    log:
        "logs/dada2/make-table/make-table-se.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/make-table"

rule dada2_make_table_pe:
    input:
    # Merged composition
        expand("merged/{sample}.RDS", sample=['a','b'])
    output:
        "results/dada2/seqTab-pe.RDS"
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    params:
        names=['a','b'], # Sample names instead of paths
        orderBy="nsamples" # Change the ordering of samples
    log:
        "logs/dada2/make-table/make-table-pe.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/make-table"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • A list of RDS files with denoised samples (se), or denoised and merged samples (pe)

Output:

  • RDS file with the table
Params
  • names: A list of sample names instead of paths
  • params: Any other optional arguments for makeSequenceTable(), please provide them as python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for building a sequence - sample table from denoised samples using dada2 makeSequenceTable function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# If names are provided use them
nm<-if(is.null(snakemake@params[["names"]])) NULL else snakemake@params[["names"]]

# From a list of n lists to one named list of n elements
smps<-setNames(
               object=unlist(snakemake@input),
               nm=nm
               )
# Read the RDS into the list
smps<-lapply(smps, readRDS)

# Prepare arguments (no matter the order)
args<-list( samples = smps)
# Check if extra params are passed (apart from [["names"]])
if(length(snakemake@params) > 1 ){
       # Keeping only the named elements of the list for do.call() (apart from [["names"]])
       extra<-snakemake@params[ names(snakemake@params) != "" & names(snakemake@params) != "names" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::makeSequenceTable()")
}

# Make table
seqTab<-do.call(makeSequenceTable, args)

# Store the table as a RDS file
saveRDS(seqTab, snakemake@output[[1]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_MERGE_PAIRS

DADA2 Merging denoised forward and reverse reads using dada2 mergePairs function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_merge_pairs:
    input:
      dadaF="denoised/{sample}.1.RDS",# Inferred composition
      dadaR="denoised/{sample}.2.RDS",
      derepF="uniques/{sample}.1.RDS",# Dereplicated sequences
      derepR="uniques/{sample}.2.RDS"
    output:
        "merged/{sample}.RDS"
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/merge-pairs/{sample}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/merge-pairs"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • dadaF: RDS file with the inferred sample composition from forward reads
  • dadaR: RDS file with the inferred sample composition from reverse reads
  • derepF: RDS file with the dereplicated forward reads
  • derepR: RDS file with the dereplicated reverse reads

Output:

  • RDS file with the merged pairs
Params
  • optional arguments for mergePairs(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for merging denoised forward and reverse reads using dada2 mergePairs function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           dadaF = snakemake@input[["dadaF"]],
           derepF = snakemake@input[["derepF"]],
           dadaR = snakemake@input[["dadaR"]],
           derepR = snakemake@input[["derepR"]]
           )
# Read RDS from the list (lapply always returns a list, whereas sapply may simplify)
args<-lapply(args,readRDS)

# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::mergePairs()")
}

# Merge pairs
merger<-do.call(mergePairs, args)

# Store the merged pairs as an RDS file
saveRDS(merger, snakemake@output[[1]],compress = T)

# Close the connection for the log file
sink(type="message")
sink()
DADA2_QUALITY_PROFILES

DADA2 Plotting the quality profile of reads using dada2 plotQualityProfile function. The function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_quality_profile_se:
    input:
        # FASTQ file without primer sequences
        "trimmed/{sample}.{orientation}.fastq"
    output:
        "reports/dada2/quality-profile/{sample}.{orientation}-quality-profile.png"
    log:
        "logs/dada2/quality-profile/{sample}.{orientation}-quality-profile-se.log"
    wrapper:
        "v0.87.0/bio/dada2/quality-profile"

rule dada2_quality_profile_pe:
    input:
        # FASTQ files without primer sequences
        expand("trimmed/{{sample}}.{orientation}.fastq",orientation=[1,2])
    output:
        "reports/dada2/quality-profile/{sample}-quality-profile.png"
    log:
        "logs/dada2/quality-profile/{sample}-quality-profile-pe.log"
    wrapper:
        "v0.87.0/bio/dada2/quality-profile"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • a FASTQ file (potentially compressed) without primer sequences

Output:

  • A PNG file of the quality plot
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for plotting the quality profile of reads using dada2 plotQualityProfile function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Plot the quality profile for a given FASTQ file or a list of files
pquality<-plotQualityProfile(unlist(snakemake@input))

# Write the plots to files
library(ggplot2)
ggsave(snakemake@output[[1]], pquality, width = 4, height = 3, dpi = 300)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_REMOVE_CHIMERAS

DADA2 Remove chimera sequences from the sequence table data using dada2 removeBimeraDenovo function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_remove_chimeras:
    input:
        "results/dada2/seqTab.RDS" # Sequence table
    output:
        "results/dada2/seqTab.nochim.RDS" # Chimera-free sequence table
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/remove-chimeras/remove-chimeras.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/remove-chimeras"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • RDS file with the sequence table

Output:

  • RDS file with the chimera-free sequence table
Params
  • optional arguments for removeBimeraDenovo(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for removing chimeric sequences from
# the sequence table data using dada2 removeBimeraDenovo function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           unqs = readRDS(snakemake@input[[1]]),
           multithread=snakemake@threads
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::removeBimeraDenovo()")
}

# Remove chimeras
seqTab_nochimeras<-do.call(removeBimeraDenovo, args)

# Store the chimera-free sequence table as an RDS file
saveRDS(seqTab_nochimeras, snakemake@output[[1]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_SAMPLE_INFERENCE

DADA2 Inferring sample composition using dada2 dada function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule dada2_sample_inference:
    input:
    # Dereplicated (aka unique) sequences of the sample
        derep="uniques/{fastq}.RDS",
        err="results/dada2/model_1.RDS" # Error model
    output:
        "denoised/{fastq}.RDS" # Inferred sample composition
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/sample-inference/{fastq}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/sample-inference"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-dada2==1.16
Input/Output

Input:

  • derep: RDS file with the dereplicated sequences
  • err: RDS file with the error model

Output:

  • RDS file with the stored inferred sample composition
Params
  • optional arguments for dada(); please provide them as Python key=value pairs
Authors
  • Charlie Pauvert
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for inferring sample composition using dada2 dada function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           derep = readRDS(snakemake@input[["derep"]]),
           err = readRDS(snakemake@input[["err"]]),
           multithread = snakemake@threads
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::dada()")
}

# Infer the sample composition
inferred_composition<-do.call(dada, args)

# Store the inferred sample composition as RDS files
saveRDS(inferred_composition, snakemake@output[[1]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()

DEEPTOOLS

For deeptools, the following wrappers are available:

DEEPTOOLS COMPUTEMATRIX

deepTools computeMatrix calculates scores per genomic region. The matrix file can be used as input for other tools or to generate a deepTools plotHeatmap or deepTools plotProfile plot. For usage information about deepTools computeMatrix, please see the documentation. For more information about deepTools, also see the source code.

computeMatrix option     Output format                        Name of output         Recommended
                                                              variable to be used    extension
-----------------------  -----------------------------------  ---------------------  -----------
--outFileName, -out, -o  gzipped matrix file                  matrix_gz (required)   “.gz”
--outFileNameMatrix      tab-separated table of matrix file   matrix_tab             “.tab”
--outFileSortedRegions   BED matrix file with sorted regions  matrix_bed             “.bed”
                         after skipping zeros or min/max
                         threshold values

URL:

Example

This wrapper can be used in the following way:

rule compute_matrix:
    input:
         # Please note that the -R and -S options are defined via input files
         bed=expand("{sample}.bed", sample=["a", "b"]),
         bigwig=expand("{sample}.bw", sample=["a", "b"])
    output:
        # Please note that --outFileName, --outFileNameMatrix and --outFileSortedRegions are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/computematrix.html.
        matrix_gz="matrix_files/matrix.gz",   # required
        # optional output files
        matrix_tab="matrix_files/matrix.tab",
        matrix_bed="matrix_files/matrix.bed"
    log:
        "logs/deeptools/compute_matrix.log"
    params:
        # required argument, choose "scale-regions" or "reference-point"
        command="scale-regions",
        # optional parameters
        extra="--regionBodyLength 200 --verbose"
    wrapper:
        "v0.87.0/bio/deeptools/computematrix"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • deeptools==3.4.3
Input/Output

Input:

  • BED or GTF files (.bed or .gtf) AND
  • bigWig files (.bw)

Output:

  • gzipped matrix file (.gz) AND/OR
  • tab-separated table of matrix file (.tab) AND/OR
  • BED matrix file with sorted regions after skipping zeros or min/max threshold values (.bed)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_tab = snakemake.output.get("matrix_tab")
out_bed = snakemake.output.get("matrix_bed")

optional_output = ""

if out_tab:
    optional_output += " --outFileNameMatrix {out_tab} ".format(out_tab=out_tab)

if out_bed:
    optional_output += " --outFileSortedRegions {out_bed} ".format(out_bed=out_bed)

shell(
    "(computeMatrix "
    "{snakemake.params.command} "
    "{snakemake.params.extra} "
    "-R {snakemake.input.bed} "
    "-S {snakemake.input.bigwig} "
    "-o {snakemake.output.matrix_gz} "
    "{optional_output}) {log}"
)
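
The way the wrapper code above turns optional output files into extra command-line options can be sketched as a small, testable function. This is a hedged illustration: `build_compute_matrix_cmd` is a hypothetical helper, and it returns an argument list instead of the shell string the wrapper builds; only the computeMatrix options that appear in the wrapper are used.

```python
import shlex


def build_compute_matrix_cmd(command, bed, bigwig, matrix_gz,
                             matrix_tab=None, matrix_bed=None, extra=""):
    """Assemble a computeMatrix call as a list of arguments.

    Optional outputs are only added when a path was given, mirroring the
    `snakemake.output.get(...)` checks in the wrapper.
    """
    parts = ["computeMatrix", command]
    parts += shlex.split(extra)          # user-supplied extra options
    parts += ["-R", *bed]                # region files (BED or GTF)
    parts += ["-S", *bigwig]             # score files (bigWig)
    parts += ["-o", matrix_gz]           # required gzipped matrix
    if matrix_tab:
        parts += ["--outFileNameMatrix", matrix_tab]
    if matrix_bed:
        parts += ["--outFileSortedRegions", matrix_bed]
    return parts


cmd = build_compute_matrix_cmd(
    command="scale-regions",
    bed=["a.bed", "b.bed"],
    bigwig=["a.bw", "b.bw"],
    matrix_gz="matrix_files/matrix.gz",
    matrix_tab="matrix_files/matrix.tab",
    extra="--regionBodyLength 200",
)
print(" ".join(cmd))
```

Because `matrix_bed` is not passed here, `--outFileSortedRegions` does not appear in the resulting command.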
DEEPTOOLS PLOTFINGERPRINT

deepTools plotFingerprint plots a profile of cumulative read coverages from a list of indexed BAM files. For usage information about deepTools plotFingerprint, please see the documentation. For more information about deepTools, also see the source code.

In addition to required output, an optional output file of read counts can be generated by setting the output variable “counts” (see example Snakemake rule below). Also an optional output file of quality control metrics can be generated by setting the variable “qc_metrics”. If the jsd_sample is specified in the input, the results of the Jensen-Shannon distance calculation are also written to this file.

plotFingerprint option  Output                           Name of output          Recommended
                                                         variable to be used     extension(s)
----------------------  -------------------------------  ----------------------  ---------------------------------
--plotFile, -plot, -o   coverage plot                    fingerprint (required)  “.png”, “.eps”, “.pdf” or “.svg”
--outRawCounts          tab-separated table of read      counts                  “.tab”
                        counts per bin
--outQualityMetrics     tab-separated table of metrics   qc_metrics              “.txt”
                        for quality control and for
                        results of Jensen-Shannon
                        distance calculation (optional)

URL:

Example

This wrapper can be used in the following way:

rule plot_fingerprint:
    input:
        bam_files=expand("samples/{sample}.bam", sample=["a", "b"]),
        bam_idx=expand("samples/{sample}.bam.bai", sample=["a", "b"]),
        jsd_sample="samples/b.bam" # optional, requires qc_metrics output
    output:
        # Please note that --plotFile and --outRawCounts are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotfingerprint.html.
        fingerprint="plot_fingerprint/plot_fingerprint.png",  # required
        # optional output
        counts="plot_fingerprint/raw_counts.tab",
        qc_metrics="plot_fingerprint/qc_metrics.txt"
    log:
        "logs/deeptools/plot_fingerprint.log"
    params:
        # optional parameters
        "--numberOfSamples 200 "
    threads:
        8
    wrapper:
        "v0.87.0/bio/deeptools/plotfingerprint"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • deeptools==3.4.3
Input/Output

Input:

  • list of BAM files (.bam) AND
  • list of their index files (.bam.bai)

Output:

  • plot file in image format (.png, .eps, .pdf or .svg)
  • tab-separated table of read counts per bin (.tab) (optional)
  • tab-separated table of metrics and JSD calculation (.txt) (optional)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
import re

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

jsd_sample = snakemake.input.get("jsd_sample")
out_counts = snakemake.output.get("counts")
out_metrics = snakemake.output.get("qc_metrics")
optional_output = ""
jsd = ""

if jsd_sample:
    jsd += " --JSDsample {jsd} ".format(jsd=jsd_sample)

if out_counts:
    optional_output += " --outRawCounts {out_counts} ".format(out_counts=out_counts)

if out_metrics:
    optional_output += " --outQualityMetrics {metrics} ".format(metrics=out_metrics)

shell(
    "(plotFingerprint "
    "-b {snakemake.input.bam_files} "
    "-o {snakemake.output.fingerprint} "
    "{optional_output} "
    "--numberOfProcessors {snakemake.threads} "
    "{jsd} "
    "{snakemake.params}) {log}"
)
# ToDo: remove the 'NA' string replacement when fixed in deepTools, see:
# https://github.com/deeptools/deepTools/pull/999
regex_passes = 2

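# qc_metrics is an optional output, so only post-process the file if it was requested
if out_metrics:
    with open(out_metrics, "rt") as f:
        metrics = f.read()
        for i in range(regex_passes):
            metrics = re.sub("\tNA(\t|\n)", "\tnan\\1", metrics)

    with open(out_metrics, "wt") as f:
        f.write(metrics)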
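
The wrapper code above applies the same substitution regex_passes = 2 times. A minimal, self-contained sketch of why a single pass is not enough: each match consumes its trailing tab, so an NA field that directly follows another NA field is skipped in the first pass. The helper name replace_na and the sample row are made up for this illustration and are not part of the wrapper.

```python
import re


def replace_na(text, passes=2):
    """Replace tab-delimited 'NA' fields with 'nan' (hypothetical helper).

    Each match consumes its trailing delimiter, so directly adjacent 'NA'
    fields cannot all be matched in one pass; a second pass catches the rest.
    """
    for _ in range(passes):
        text = re.sub("\tNA(\t|\n)", "\tnan\\1", text)
    return text


row = "sample\tNA\tNA\tNA\n"
one_pass = replace_na(row, passes=1)    # middle field is still 'NA'
two_passes = replace_na(row, passes=2)  # all fields replaced
```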
DEEPTOOLS PLOTHEATMAP

deepTools plotHeatmap creates a heatmap for scores associated with genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotHeatmap, please see the documentation. For more information about deepTools, also see the source code.

You can select which optional output files are generated by adding the respective output variable with the recommended extension(s) for them (see example Snakemake rule below).

PlotHeatmap option       Output                                                 Name of output variable to be used  Recommended extension(s)
--outFileName, -out, -o  plot image                                             heatmap_img (required)              “.png” or “.eps” or “.pdf” or “.svg”
--outFileSortedRegions   BED file with sorted regions                           regions                             “.bed”
--outFileNameMatrix      tab-separated matrix of values underlying the heatmap  heatmap_matrix                      “.tab”

URL:

Example

This wrapper can be used in the following way:

rule plot_heatmap:
    input:
        # matrix file from deepTools computeMatrix tool
        "matrix.gz"
    output:
        # Please note that --outFileSortedRegions and --outFileNameMatrix are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotheatmap.html.
        heatmap_img="plot_heatmap/heatmap.png",  # required
        # optional output files
        regions="plot_heatmap/heatmap_regions.bed",
        heatmap_matrix="plot_heatmap/heatmap_matrix.tab"
    log:
        "logs/deeptools/heatmap.log"
    params:
        # optional parameters
        "--plotType=fill "
    wrapper:
        "v0.87.0/bio/deeptools/plotheatmap"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • deeptools==3.4.3
Input/Output

Input:

  • gzipped matrix file from deepTools computeMatrix (.gz)

Output:

  • plot file in image format (.png, .eps, .pdf or .svg) AND/OR
  • file with sorted regions after skipping zeros or min/max threshold values (.bed) AND/OR
  • tab-separated matrix of values underlying the heatmap (.tab)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_region = snakemake.output.get("regions")
out_matrix = snakemake.output.get("heatmap_matrix")

optional_output = ""

if out_region:
    optional_output += " --outFileSortedRegions {out_region} ".format(
        out_region=out_region
    )

if out_matrix:
    optional_output += " --outFileNameMatrix {out_matrix} ".format(
        out_matrix=out_matrix
    )

shell(
    "(plotHeatmap "
    "-m {snakemake.input[0]} "
    "-o {snakemake.output.heatmap_img} "
    "{optional_output} "
    "{snakemake.params}) {log}"
)
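
The way the wrapper above turns named outputs into plotHeatmap options can be sketched independently of Snakemake. In the following illustration a plain dict stands in for snakemake.output, and the helper optional_output_args is a hypothetical name used only here; the option names are the ones that appear in the wrapper code.

```python
# Output variable name -> plotHeatmap option it triggers (as used in the wrapper).
OPTION_FOR_OUTPUT = {
    "regions": "--outFileSortedRegions",
    "heatmap_matrix": "--outFileNameMatrix",
}


def optional_output_args(outputs):
    """Build the optional part of the command line (hypothetical helper).

    `outputs` is a plain dict standing in for `snakemake.output`.
    """
    args = []
    for name, option in OPTION_FOR_OUTPUT.items():
        path = outputs.get(name)
        if path:
            args.append(f"{option} {path}")
    return " ".join(args)


only_image = optional_output_args({"heatmap_img": "plot_heatmap/heatmap.png"})
with_regions = optional_output_args(
    {
        "heatmap_img": "plot_heatmap/heatmap.png",
        "regions": "plot_heatmap/heatmap_regions.bed",
    }
)
```

Only the output variables that are actually declared in the rule contribute an option, which is why omitting regions or heatmap_matrix simply drops the corresponding flag.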
DEEPTOOLS PLOTPROFILE

deepTools plotProfile plots scores over sets of genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotProfile, please see the documentation. For more information about deepTools, also see the source code.

You can select which optional output files are generated by adding the respective output variable with the recommended extension for them (see example Snakemake rule below).

PlotProfile option       Output                                   Name of output variable to be used  Recommended extension(s)
--outFileName, -out, -o  profile plot                             plot_img (required)                 “.png” or “.eps” or “.pdf” or “.svg”
--outFileSortedRegions   BED file with sorted regions             regions                             “.bed”
--outFileNameData        tab-separated table for average profile  data                                “.tab”

URL:

Example

This wrapper can be used in the following way:

rule plot_profile:
    input:
        # matrix file from deepTools computeMatrix tool
        "matrix.gz"
    output:
        # Please note that --outFileSortedRegions and --outFileNameData are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotprofile.html.
        # The image file and further plotProfile output options are selected via the output variables.
        plot_img="plot_profile/plot.png",  # required
        # optional output files
        regions="plot_profile/regions.bed",
        data="plot_profile/data.tab"
    log:
        "logs/deeptools/plot_profile.log"
    params:
        # optional parameters
        "--plotType=fill "
        "--perGroup "
        "--colors red yellow blue "
        "--dpi 150 "
    wrapper:
        "v0.87.0/bio/deeptools/plotprofile"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
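
The params entry of the example rule above consists of several adjacent string literals. Python joins adjacent literals into one string, which is why every fragment ends with a space. A small plain-Python illustration (the variable names are chosen only for this example):

```python
# Adjacent string literals are concatenated into one string by Python.
params = (
    "--plotType=fill "
    "--perGroup "
    "--colors red yellow blue "
    "--dpi 150 "
)

# Without the trailing spaces, neighbouring options would run together.
broken = (
    "--plotType=fill"
    "--perGroup"
)
```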

Software dependencies
  • deeptools==3.4.3
Input/Output

Input:

  • gzipped matrix file from deepTools computeMatrix (.gz)

Output:

  • plot file in image format (.png, .eps, .pdf or .svg) AND/OR
  • file with sorted regions after skipping zeros or min/max threshold values (.bed) AND/OR
  • tab-separated table for average profile (.tab)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_region = snakemake.output.get("regions")
out_data = snakemake.output.get("data")

optional_output = ""

if out_region:
    optional_output += " --outFileSortedRegions {out_region} ".format(
        out_region=out_region
    )

if out_data:
    optional_output += " --outFileNameData {out_data} ".format(out_data=out_data)

shell(
    "(plotProfile "
    "-m {snakemake.input[0]} "
    "-o {snakemake.output.plot_img} "
    "{optional_output} "
    "{snakemake.params}) {log}"
)

DEEPVARIANT

Call genetic variants using a deep neural network. Copyright 2017 Google LLC. Licensed under the BSD 3-Clause “New” or “Revised” License. See https://github.com/google/deepvariant

URL:

Example

This wrapper can be used in the following way:

rule deepvariant:
    input:
        bam="mapped/{sample}.bam",
        ref="genome/genome.fasta"
    output:
        vcf="calls/{sample}.vcf.gz"
    params:
        model="wgs",   # {wgs, wes, pacbio, hybrid}
        sample_name=lambda w: w.sample, # optional
        extra=""
    threads: 2
    log:
        "logs/deepvariant/{sample}/stdout.log"
    wrapper:
        "v0.87.0/bio/deepvariant"


rule deepvariant_gvcf:
    input:
        bam="mapped/{sample}.bam",
        ref="genome/genome.fasta"
    output:
        vcf="gvcf_calls/{sample}.vcf.gz",
        gvcf="gvcf_calls/{sample}.g.vcf.gz"
    params:
        model="wgs",   # {wgs, wes, pacbio, hybrid}
        extra=""
    threads: 2
    log:
        "logs/deepvariant/{sample}/stdout.log"
    wrapper:
        "v0.87.0/bio/deepvariant"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • deepvariant==1.1.0
Input/Output

Input:

  • fasta
  • bam

Output:

  • vcf
  • visual report html
Notes
  • The extra param allows for additional program arguments.
  • This Snakemake wrapper uses the bioconda deepvariant package. Copyright 2018 Brad Chapman.
Authors
  • Tetsuro Hisayoshi
  • Nikos Tsardakas Renhuldt
Code
__author__ = "Tetsuro Hisayoshi"
__copyright__ = "Copyright 2020, Tetsuro Hisayoshi"
__email__ = "hisayoshi0530@gmail.com"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

log_dir = os.path.dirname(snakemake.log[0])
output_dir = os.path.dirname(snakemake.output[0])

# sample name defaults to basename
sample_name = snakemake.params.get(
    "sample_name", os.path.splitext(os.path.basename(snakemake.input.bam))[0]
)


make_examples_gvcf = postprocess_gvcf = ""
gvcf = snakemake.output.get("gvcf", None)
if gvcf:
    make_examples_gvcf = "--gvcf {tmp_dir} "
    postprocess_gvcf = (
        "--gvcf_infile {tmp_dir}/{sample_name}.gvcf.tfrecord@{snakemake.threads}.gz "
        "--gvcf_outfile {snakemake.output.gvcf} "
    )

with tempfile.TemporaryDirectory() as tmp_dir:
    shell(
        "(dv_make_examples.py "
        "--cores {snakemake.threads} "
        "--ref {snakemake.input.ref} "
        "--reads {snakemake.input.bam} "
        "--sample {sample_name} "
        "--examples {tmp_dir} "
        "--logdir {log_dir} " + make_examples_gvcf + "{extra} \n"
        "dv_call_variants.py "
        "--cores {snakemake.threads} "
        "--outfile {tmp_dir}/{sample_name}.tmp "
        "--sample {sample_name} "
        "--examples {tmp_dir} "
        "--model {snakemake.params.model} \n"
        "dv_postprocess_variants.py "
        "--ref {snakemake.input.ref} "
        + postprocess_gvcf
        + "--infile {tmp_dir}/{sample_name}.tmp "
        "--outfile {snakemake.output.vcf} ) {log}"
    )
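
When params.sample_name is not given, the wrapper above derives the sample name from the BAM file name. A minimal sketch of that derivation, using only the standard library; the function name default_sample_name is hypothetical:

```python
import os


def default_sample_name(bam_path):
    """Derive a sample name by dropping the directory and the last extension."""
    return os.path.splitext(os.path.basename(bam_path))[0]


simple = default_sample_name("mapped/a.bam")         # 'a'
dotted = default_sample_name("mapped/a.sorted.bam")  # 'a.sorted'
```

Note that only the last extension is removed, so a file such as a.sorted.bam yields the sample name a.sorted. Setting sample_name explicitly avoids such surprises.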

DELLY

Call variants with delly.

URL:

Example

This wrapper can be used in the following way:

rule delly:
    input:
        ref="genome.fasta",
        samples=["mapped/a.bam"],
        # optional exclude template (see https://github.com/dellytools/delly)
        exclude="human.hg19.excl.tsv"
    output:
        "sv/calls.bcf"
    params:
        extra=""  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log"
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "v0.87.0/bio/delly"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • delly==0.8.7
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


exclude = (
    "-x {}".format(snakemake.input.exclude)
    if snakemake.input.get("exclude", "")
    else ""
)

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "OMP_NUM_THREADS={snakemake.threads} delly call {extra} "
    "{exclude} -g {snakemake.input.ref} "
    "-o {snakemake.output[0]} {snakemake.input.samples} {log}"
)

DIAMOND

For diamond, the following wrappers are available:

DIAMOND BLASTP

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. For documentation, see https://github.com/bbuchfink/diamond/wiki

URL:

Example

This wrapper can be used in the following way:

rule diamond_blastp:
    input:
        fname_fasta="{sample}.fasta",  # Query fasta file
        fname_db="db.dmnd",  # Diamond db
    output:
        fname="{sample}.tsv.gz",  # Output file
    log:
        "logs/diamond_blastp/{sample}.log",
    params:
        extra="--header --compress 1",  # Additional arguments
    threads: 8
    wrapper:
        "v0.87.0/bio/diamond/blastp"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • diamond==2.0
Authors
  • Nikos Tsardakas Renhuldt
  • Kim Philipp Jablonski
Code
__author__ = "Kim Philipp Jablonski, Nikos Tsardakas Renhuldt"
__copyright__ = "Copyright 2020, Kim Philipp Jablonski, Nikos Tsardakas Renhuldt"
__email__ = "kim.philipp.jablonski@gmail.com, nikos.tsardakas_renhuldt@tbiokem.lth.se"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell(
    "diamond blastp"
    " --threads {snakemake.threads}"
    " --db {snakemake.input.fname_db}"
    " --query {snakemake.input.fname_fasta}"
    " --out {snakemake.output.fname}"
    " {extra}"
    " {log}"
)
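
The diamond wrappers call snakemake.log_fmt_shell(stdout=False, stderr=True), so only stderr is sent to the log file while the results go to the path given via --out. The sketch below is a simplified stand-in, not Snakemake's actual implementation, that illustrates the kind of shell redirection fragment such a call is expected to produce; the function name log_redirect is an assumption made for this example.

```python
def log_redirect(log_path, stdout=True, stderr=True, append=False):
    """Return a shell redirection fragment (simplified, hypothetical stand-in)."""
    if not log_path or not (stdout or stderr):
        return ""
    op = ">>" if append else ">"
    if stdout and stderr:
        # Send stdout to the log and duplicate stderr onto it.
        return f"{op} {log_path} 2>&1"
    if stdout:
        return f"{op} {log_path}"
    # Only stderr is redirected; stdout stays untouched.
    return f"2{op} {log_path}"


stderr_only = log_redirect("logs/diamond_blastp/a.log", stdout=False, stderr=True)
both = log_redirect("logs/diamond_blastp/a.log")
no_log = log_redirect("")
```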
DIAMOND BLASTX

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.

URL:

Example

This wrapper can be used in the following way:

rule diamond_blastx:
    input:
        fname_fastq = "{sample}.fastq",
        fname_db = "db.dmnd"
    output:
        fname = "{sample}.tsv.gz"
    log:
        "logs/diamond_blastx/{sample}.log"
    params:
        extra="--header --compress 1"
    threads: 8
    wrapper:
        "v0.87.0/bio/diamond/blastx"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • diamond==2.0.6
Authors
  • Kim Philipp Jablonski
Code
__author__ = "Kim Philipp Jablonski"
__copyright__ = "Copyright 2020, Kim Philipp Jablonski"
__email__ = "kim.philipp.jablonski@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell(
    "diamond blastx"
    " --threads {snakemake.threads}"
    " --db {snakemake.input.fname_db}"
    " --query {snakemake.input.fname_fastq}"
    " --out {snakemake.output.fname}"
    " {extra}"
    " {log}"
)
DIAMOND MAKEDB

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.

URL:

Example

This wrapper can be used in the following way:

rule diamond_makedb:
    input:
        fname = "{reference}.fasta",
    output:
        fname = "{reference}.dmnd"
    log:
        "logs/diamond_makedb/{reference}.log"
    params:
        extra=""
    threads: 8
    wrapper:
        "v0.87.0/bio/diamond/makedb"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • diamond==2.0.6
Authors
  • Kim Philipp Jablonski
Code
__author__ = "Kim Philipp Jablonski"
__copyright__ = "Copyright 2020, Kim Philipp Jablonski"
__email__ = "kim.philipp.jablonski@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell(
    "diamond makedb"
    " --threads {snakemake.threads}"
    " --in {snakemake.input.fname}"
    " --db {snakemake.output.fname}"
    " {extra}"
    " {log}"
)

EPIC

For epic, the following wrappers are available:

EPIC

Find broad enriched domains in ChIP-Seq data with epic

URL:

Example

This wrapper can be used in the following way:

rule epic:
    input:
        treatment = "bed/test.bed",
        background = "bed/control.bed"
    output:
        enriched_regions = "epic/enriched_regions.csv", # required
        bed = "epic/enriched_regions.bed", # optional
        matrix = "epic/matrix.gz" # optional
    log:
        "logs/epic/epic.log"
    params:
        genome = "hg19", # optional, default hg19
        extra="-g 3 -w 200" # "--bigwig epic/bigwigs"
    threads: 1 # optional, defaults to 1
    wrapper:
        "v0.87.0/bio/epic/peaks"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • epic=0.2.7
  • pandas=0.22.0
Input/Output

Input:

  • treatment: chip .bed(.gz/.bz) files
  • background: input .bed(.gz/.bz) files

Output:

  • enriched_regions: main output file with enriched peaks
  • bed: (optional) contains much of the same info as enriched_regions but in a bed format, suitable for viewing in the UCSC genome browser or downstream use with bedtools
  • matrix: (optional) a gzipped matrix of read counts
Params
  • extra: additional parameters
  • log: (optional) file to write the log output to
Notes
  • All/any of the different bigwig options must be given as extra parameters
Authors
  • Endre Bakken Stovner
Code
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1

treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")

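# Output files
enriched_regions = snakemake.output.get("enriched_regions")

bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")

# The log file is optional; default to None so the check below cannot raise a NameError
log = snakemake.log[0] if len(snakemake.log) > 0 else None

# Fall back to hg19, the default documented in the example rule
genome = snakemake.params.get("genome", "hg19")

# Executed shell command
cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"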

if bed:
    cmd += " -b {bed}"
if matrix:
    cmd += " -sm {matrix}"
if log:
    cmd += " -l {log}"

cmd += " {extra}"

shell(cmd)
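
The wrapper above assembles the command as a template and leaves the placeholders to be filled in when shell() runs. The sketch below imitates this with str.format as a stand-in for the formatting done by snakemake.shell; the function build_epic_command and its arguments are hypothetical and exist only for this illustration.

```python
def build_epic_command(treatment, background, enriched_regions,
                       genome="hg19", threads=1, bed=None, matrix=None,
                       log=None, extra=""):
    """Assemble an epic command line (hypothetical helper, illustration only)."""
    cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"
    # Optional flags are appended only when the corresponding value is set.
    if bed:
        cmd += " -b {bed}"
    if matrix:
        cmd += " -sm {matrix}"
    if log:
        cmd += " -l {log}"
    if extra:
        cmd += " {extra}"
    return cmd.format(
        threads=threads, treatment=treatment, background=background,
        enriched_regions=enriched_regions, genome=genome,
        bed=bed, matrix=matrix, log=log, extra=extra,
    )


command = build_epic_command(
    "bed/test.bed", "bed/control.bed", "epic/enriched_regions.csv",
    bed="epic/enriched_regions.bed", extra="-g 3 -w 200",
)
```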

FASTP

Trim and QC FASTQ reads with fastp.

URL:

Example

This wrapper can be used in the following way:

rule fastp_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        trimmed="trimmed/se/{sample}.fastq",
        failed="trimmed/se/{sample}.failed.fastq",
        html="report/se/{sample}.html",
        json="report/se/{sample}.json"
    log:
        "logs/fastp/se/{sample}.log"
    params:
        adapters="--adapter_sequence ACGGCTAGCTA",
        extra=""
    threads: 1
    wrapper:
        "v0.87.0/bio/fastp"


rule fastp_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
        # Unpaired reads separately
        unpaired1="trimmed/pe/{sample}.u1.fastq",
        unpaired2="trimmed/pe/{sample}.u2.fastq",
        # or in a single file
#        unpaired="trimmed/pe/{sample}.singletons.fastq",
        merged="trimmed/pe/{sample}.merged.fastq",
        failed="trimmed/pe/{sample}.failed.fastq",
        html="report/pe/{sample}.html",
        json="report/pe/{sample}.json"
    log:
        "logs/fastp/pe/{sample}.log"
    params:
        adapters="--adapter_sequence ACGGCTAGCTA --adapter_sequence_r2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="--merge"
    threads: 2
    wrapper:
        "v0.87.0/bio/fastp"

rule fastp_pe_wo_trimming:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        html="report/pe_wo_trimming/{sample}.html",
        json="report/pe_wo_trimming/{sample}.json"
    log:
        "logs/fastp/pe_wo_trimming/{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "v0.87.0/bio/fastp"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fastp=0.20
Input/Output

Input:

  • fastq file(s)

Output:

  • trimmed fastq file(s)
  • unpaired reads (optional; either in a single file or separate)
  • merged reads (optional)
  • failed reads (optional)
  • json file containing trimming statistics
  • html file containing trimming statistics
Notes
  • The adapters param allows specifying adapter sequences.
  • The extra param allows for additional program arguments.
  • For more information, see https://github.com/OpenGene/fastp
Authors
  • Sebastian Kurscheid
Code
__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "sebastian.kurscheid@anu.edu.au"
__license__ = "MIT"

from snakemake.shell import shell
import re

extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Assert input
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."


# Input files
if n == 1:
    reads = "--in1 {}".format(snakemake.input.sample)
else:
    reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)


# Output files
trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths:
    if n == 1:
        trimmed = "--out1 {}".format(snakemake.output.trimmed)
    else:
        trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)

        # Output unpaired files
        unpaired = snakemake.output.get("unpaired", None)
        if unpaired:
            trimmed += f" --unpaired1 {unpaired} --unpaired2 {unpaired}"
        else:
            unpaired1 = snakemake.output.get("unpaired1", None)
            if unpaired1:
                trimmed += f" --unpaired1 {unpaired1}"
            unpaired2 = snakemake.output.get("unpaired2", None)
            if unpaired2:
                trimmed += f" --unpaired2 {unpaired2}"

        # Output merged PE reads
        merged = snakemake.output.get("merged", None)
        if merged:
            if not re.search(r"--merge\b", extra):
                raise ValueError(
                    "output.merged specified but '--merge' option missing from params.extra"
                )
            trimmed += f" --merged_out {merged}"
else:
    trimmed = ""


# Output failed reads
failed = snakemake.output.get("failed", None)
if failed:
    trimmed += f" --failed_out {failed}"


# Stats
html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)


shell(
    "(fastp --thread {snakemake.threads} "
    "{extra} "
    "{adapters} "
    "{reads} "
    "{trimmed} "
    "{json} "
    "{html} ) {log}"
)
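
Two checks in the wrapper above are easy to try out in isolation: the choice between single-end and paired-end input flags, and the test for --merge in params.extra. The sketch below reimplements both on plain Python values; read_args and merge_requested are hypothetical helper names and not part of the wrapper.

```python
import re


def read_args(samples):
    """Return fastp input flags for single-end or paired-end data (hypothetical helper)."""
    if len(samples) == 1:
        return "--in1 {}".format(samples[0])
    if len(samples) == 2:
        return "--in1 {} --in2 {}".format(*samples)
    raise ValueError("expected 1 (single-end) or 2 (paired-end) input files")


def merge_requested(extra):
    """Check whether '--merge' appears as a whole option in the extra arguments."""
    return re.search(r"--merge\b", extra) is not None


se = read_args(["reads/se/a.fastq"])
pe = read_args(["reads/pe/a.1.fastq", "reads/pe/a.2.fastq"])
```

The word boundary \b matters here: it keeps a longer option such as --merged_out from being mistaken for --merge.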

FASTQ_SCREEN

fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

This wrapper allows the configuration to be passed either as a filename or as a dictionary via params.fastq_screen_config. So the following configuration file:

DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE2
DATABASE      ecoli   /data/Escherichia_coli/BowtieIndex/genome       BOWTIE
DATABASE      hg19    /data/hg19/Bowtie2Index/genome  BOWTIE2
DATABASE      mm10    /data/mm10/Bowtie2Index/genome  BOWTIE2
BOWTIE        /path/to/bowtie
BOWTIE2       /path/to/bowtie2

becomes:

fastq_screen_config = {
 'database': {
   'ecoli': {
     'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
     'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
   'hg19': {
     'bowtie2': '/data/hg19/Bowtie2Index/genome'},
   'mm10': {
     'bowtie2': '/data/mm10/Bowtie2Index/genome'}
 },
 'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}

By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".
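
The correspondence between the dictionary form and the config file format shown above can be sketched as a small function that renders a dictionary into tab-separated config lines. This is an illustrative stand-alone version; the name render_config and the reduced example dictionary are assumptions made for this sketch.

```python
def render_config(cfg):
    """Render the dictionary form as fastq_screen config lines (illustrative sketch)."""
    lines = []
    # One DATABASE line per (label, aligner) pair.
    for label, indexes in cfg["database"].items():
        for aligner_name, index in indexes.items():
            lines.append("\t".join(["DATABASE", label, index, aligner_name.upper()]))
    # One line per aligner executable path.
    for aligner_name, path in cfg["aligner_paths"].items():
        lines.append("\t".join([aligner_name.upper(), path]))
    return "\n".join(lines) + "\n"


example = {
    "database": {
        "hg19": {"bowtie2": "/data/hg19/Bowtie2Index/genome"},
    },
    "aligner_paths": {"bowtie2": "bowtie2"},
}
text = render_config(example)
```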

URL:

Example

This wrapper can be used in the following way:

rule fastq_screen:
    input:
        "samples/{sample}.fastq"
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png"
    params:
        fastq_screen_config="fastq_screen.conf",
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "v0.87.0/bio/fastq_screen"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fastq-screen==0.5.2
  • bowtie2==2.2.6
  • bowtie==1.1.2
Input/Output

Input:

  • A FASTQ file, gzipped or not.

Output:

  • txt: a text file containing the fraction of reads mapping to each provided index
  • png: a bar plot of the contents of txt, saved as a PNG file
Notes
  • fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
  • While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed, it is better to write a config file and pass that as fastq_screen_config. This problem does not occur on Python 3.7 and later, where dictionaries are guaranteed to preserve insertion order (CPython 3.6 already behaves this way as an implementation detail).
  • When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environment variable is set (see the tempfile docs for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the Snakefile calling this wrapper.
Authors
  • Ryan Dale
Code
import os
import re
from snakemake.shell import shell
import tempfile

__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"

_config = snakemake.params["fastq_screen_config"]

subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, "w") as fout:
        # Use distinct loop variable names so that the `aligner` parameter
        # read from params above is not overwritten.
        for label, indexes in _config["database"].items():
            for db_aligner, index in indexes.items():
                fout.write(
                    "\t".join(["DATABASE", label, index, db_aligner.upper()]) + "\n"
                )
        for aligner_name, path in _config["aligner_paths"].items():
            fout.write("\t".join([aligner_name.upper(), path]) + "\n")
    config_file = tmp
else:
    config_file = _config

# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(r"\.fastq|\.fq|\.txt|\.seq", os.path.basename(snakemake.input[0]))[0]

tempdir = tempfile.mkdtemp()

shell(
    "fastq_screen --outdir {tempdir} "
    "--force "
    "--aligner {aligner} "
    "--conf {config_file} "
    "--subset {subset} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{snakemake.input[0]} "
    "{log}"
)

# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")

# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
    shell("rm {tmp}")
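
The wrapper above has to predict the file names that fastq_screen hard-codes, which it does by cutting the input file name at the first known extension. A minimal sketch of that step follows; the function name screen_prefix is hypothetical, and the dots are escaped in this sketch so that they only match literal dots.

```python
import os
import re


def screen_prefix(fastq_path):
    """Return the part of the file name before the first known extension."""
    return re.split(r"\.fastq|\.fq|\.txt|\.seq", os.path.basename(fastq_path))[0]


prefix = screen_prefix("samples/a.fastq.gz")
expected_txt = prefix + "_screen.txt"
expected_png = prefix + "_screen.png"
```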

FASTQC

Generate fastq qc statistics using fastqc.

URL:

Example

This wrapper can be used in the following way:

rule fastqc:
    input:
        "reads/{sample}.fastq"
    output:
        html="qc/fastqc/{sample}.html",
        zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
    params: "--quiet"
    log:
        "logs/fastqc/{sample}.log"
    threads: 1
    wrapper:
        "v0.87.0/bio/fastqc"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fastqc==0.11.9
Input/Output

Input:

  • fastq file

Output:

  • html file containing statistics
  • zip file containing statistics
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for fastqc."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
import re
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)

    return base


# Run fastqc in a temporary directory, since race conditions can occur
# if multiple jobs use the same fastqc output dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
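
The file names that the wrapper above looks for in the temporary directory follow from the input file name with its known extensions removed. The sketch below is a compact variant of basename_without_ext written for illustration; the name fastqc_output_base and the EXTENSIONS tuple are assumptions of this example.

```python
import re
from os import path

# Extensions stripped one after another, in this order.
EXTENSIONS = ("gz", "bz2", "txt", "fastq", "fq", "sam", "bam")


def fastqc_output_base(file_path):
    """Return the base name the fastqc outputs are expected to use (sketch)."""
    base = path.basename(file_path)
    for ext in EXTENSIONS:
        base = re.sub(r"\." + ext + "$", "", base)
    return base


html_name = fastqc_output_base("reads/a.fastq.gz") + "_fastqc.html"
zip_name = fastqc_output_base("reads/b.fq") + "_fastqc.zip"
```

Because the extensions are removed in a fixed order, a compressed file such as a.fastq.gz and its uncompressed counterpart a.fastq map to the same base name.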

FASTTREE

Build phylogenetic trees using fasttree. Documentation can be found at http://www.microbesonline.org/fasttree/

URL:

Example

This wrapper can be used in the following way:

rule fasttree:
    input:
        alignment="{sample}.fa",  # Input alignment file
    output:
        tree="{sample}.nwk",  # Output tree file
    log:
        "logs/fasttree/{sample}.log",
    params:
        extra="",  # Additional arguments
    wrapper:
        "v0.87.0/bio/fasttree"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fasttree==2.1.10
Input/Output

Input:

  • Alignment FASTA or interleaved phylip file

Output:

  • Newick formatted tree file
Notes
  • fasttree can only be run with a single thread.
Authors
  • Nikos Tsardakas Renhuldt
Code
__author__ = "Nikos Tsardakas Renhuldt"
__copyright__ = "Copyright 2021, Nikos Tsardakas Renhuldt"
__email__ = "nikos.tsardakas_renhuldt@tbiokem.lth.se"
__license__ = "MIT"


from snakemake.shell import shell
import os

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "fasttree "
    "{extra} "
    "{snakemake.input.alignment} "
    "> {snakemake.output.tree} "
    "{log}"
)

FGBIO

For fgbio, the following wrappers are available:

FGBIO ANNOTATEBAMWITHUMIS

Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file.

URL:

Example

This wrapper can be used in the following way:

rule AnnotateBam:
    input:
        bam="mapped/{sample}.bam",
        umi="umi/{sample}.fastq",
    output:
        "mapped/{sample}.annotated.bam",
    params: ""
    resources:
        mem_gb="4" # memory to be given to fgbio
    log:
        "logs/fgbio/annotate_bam/{sample}.log",
    wrapper:
        "v0.87.0/bio/fgbio/annotatebamwithumis"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fgbio==1.4.0
  • snakemake-wrapper-utils==0.2
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

bam_input = snakemake.input.bam

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not isinstance(bam_input, str):
    raise ValueError("Input bam should be a string: " + str(bam_input) + "!")

umi_input = snakemake.input.umi

if umi_input is None:
    raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
    raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")

if not len(snakemake.output) == 1:
    raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
output_file = snakemake.output[0]


if output_file is None:
    raise ValueError("Missing output file!")
elif not isinstance(output_file, str):
    raise ValueError("Output bam-file should be a string: " + str(output_file) + "!")

shell(
    "fgbio {java_opts} AnnotateBamWithUmis"
    " -i {bam_input}"
    " -f {umi_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)
FGBIO CALLMOLECULARCONSENSUSREADS

Calls consensus sequences from reads with the same unique molecular tag.

URL:

Example

This wrapper can be used in the following way:

rule ConsensusReads:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.m3.bam"
    params:
        extra="-M 3"
    log:
        "logs/fgbio/consensus_reads/{sample}.log"
    wrapper:
        "v0.87.0/bio/fgbio/callmolecularconsensusreads"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fgbio==0.6.1
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell(
    "fgbio CallMolecularConsensusReads"
    " -i {bam_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)
FGBIO COLLECTDUPLEXSEQMETRICS

Collects a suite of metrics to QC duplex sequencing data.

URL:

Example

This wrapper can be used in the following way:

rule CollectDuplexSeqMetrics:
    input:
        "mapped/{sample}.gu.bam"
    output:
        family_sizes="stats/{sample}.family_sizes.txt",
        duplex_family_sizes="stats/{sample}.duplex_family_sizes.txt",
        duplex_yield_metrics="stats/{sample}.duplex_yield_metrics.txt",
        umi_counts="stats/{sample}.umi_counts.txt",
        duplex_qc="stats/{sample}.duplex_qc.pdf",
        duplex_umi_counts="stats/{sample}.duplex_umi_counts.txt",
    params:
        extra=lambda wildcards: "-d " + wildcards.sample
    log:
        "logs/fgbio/collectduplexseqmetrics/{sample}.log"
    wrapper:
        "v0.87.0/bio/fgbio/collectduplexseqmetrics"

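Note that input and log file paths can be chosen freely. The output files, however, must all share one path prefix (the directory plus the part of the file name before its first dot) and end in the suffixes shown above, e.g. stats/{sample}.family_sizes.txt. The wrapper code raises an exception otherwise.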

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fgbio==0.6.1
  • r-ggplot2
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from os import path

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

family_sizes = snakemake.output.family_sizes
duplex_family_sizes = snakemake.output.duplex_family_sizes
duplex_yield_metrics = snakemake.output.duplex_yield_metrics
umi_counts = snakemake.output.umi_counts
duplex_qc = snakemake.output.duplex_qc
duplex_umi_counts = snakemake.output.get("duplex_umi_counts", None)

file_path = str(path.dirname(family_sizes))
name = str(path.basename(family_sizes)).split(".")[0]
path_name_prefix = str(path.join(file_path, name))

if not family_sizes == path_name_prefix + ".family_sizes.txt":
    raise Exception(
        "Unexpected family_sizes path/name format, expected {}, got {}.".format(
            path_name_prefix + ".family_sizes.txt", family_sizes
        )
    )
if not duplex_family_sizes == path_name_prefix + ".duplex_family_sizes.txt":
    raise Exception(
        "Unexpected duplex_family_sizes path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_family_sizes.txt", duplex_family_sizes
        )
    )
if not duplex_yield_metrics == path_name_prefix + ".duplex_yield_metrics.txt":
    raise Exception(
        "Unexpected duplex_yield_metrics path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_yield_metrics.txt", duplex_yield_metrics
        )
    )
if not umi_counts == path_name_prefix + ".umi_counts.txt":
    raise Exception(
        "Unexpected umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".umi_counts.txt", umi_counts
        )
    )
if not duplex_qc == path_name_prefix + ".duplex_qc.pdf":
    raise Exception(
        "Unexpected duplex_qc path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_qc.pdf", duplex_qc
        )
    )
if (
    duplex_umi_counts is not None
    and not duplex_umi_counts == path_name_prefix + ".duplex_umi_counts.txt"
):
    raise Exception(
        "Unexpected duplex_umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_umi_counts.txt", duplex_umi_counts
        )
    )

duplex_umi_counts_flag = ""
if duplex_umi_counts is not None:
    duplex_umi_counts_flag = "-u "

if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

shell(
    "fgbio CollectDuplexSeqMetrics"
    " -i {bam_input}"
    " -o {path_name_prefix}"
    " {duplex_umi_counts_flag}"
    " {extra_params}"
    " {log}"
)
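The path checks in the code above all derive from the family_sizes output. As a minimal sketch, the prefix passed to -o and the file names the wrapper accepts can be computed as follows. The expected_outputs function and the SUFFIXES table are hypothetical names introduced for illustration; the derivation itself mirrors the wrapper.

```python
from os import path

# Suffixes the wrapper checks for, keyed by output name.
SUFFIXES = {
    "family_sizes": ".family_sizes.txt",
    "duplex_family_sizes": ".duplex_family_sizes.txt",
    "duplex_yield_metrics": ".duplex_yield_metrics.txt",
    "umi_counts": ".umi_counts.txt",
    "duplex_qc": ".duplex_qc.pdf",
    "duplex_umi_counts": ".duplex_umi_counts.txt",
}


def expected_outputs(family_sizes):
    """Derive the -o prefix and the output paths the wrapper will accept."""
    # Same derivation as the wrapper: everything before the first "." of the
    # file name, joined back onto the directory.
    name = path.basename(family_sizes).split(".")[0]
    prefix = path.join(path.dirname(family_sizes), name)
    return prefix, {key: prefix + suffix for key, suffix in SUFFIXES.items()}


prefix, outputs = expected_outputs(path.join("stats", "a.family_sizes.txt"))
print(prefix)
for key, value in outputs.items():
    print(key, value)
```

One consequence of splitting at the first dot: a sample name that itself contains a dot (for example a.1) yields the prefix stats/a, so stats/a.1.family_sizes.txt would not pass the check.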
FGBIO FILTERCONSENSUSREADS

Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

URL:

Example

This wrapper can be used in the following way:

rule FilterConsensusReads:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.filtered.bam"
    params:
        extra="",
        min_base_quality=2,
        min_reads=[2, 2, 2],
        ref="genome.fasta"
    log:
        "logs/fgbio/filterconsensusreads/{sample}.log"
    threads: 1
    wrapper:
        "v0.87.0/bio/fgbio/filterconsensusreads"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fgbio==0.6.1
Input/Output

Input:

  • bam file
  • vcf files
  • reference genome

Output:

  • filtered bam file
Notes
  • min_base_quality: a single value (Int). Mask (make N) consensus bases with quality less than this threshold. (fgbio default: 5; this wrapper requires the value to be set.)
  • min_reads: an array of Ints, min length 1, max length 3. Number of reads that need to support a UMI. For BAM files processed with CallMolecularConsensusReads, one value is required. For BAM files processed with CallDuplexConsensusReads, up to three values can be provided: the first applies to the final consensus sequence and the last two to each single-strand consensus. If fewer than three are provided, the last value is repeated.
  • For more information, see http://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
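The padding rule for min_reads described in the notes can be illustrated with a short sketch. This is not part of the wrapper, which passes the list to fgbio unchanged; expand_min_reads is a hypothetical helper that only shows which three values result from a shorter list.

```python
def expand_min_reads(min_reads):
    """Pad a list of 1 to 3 Ints to three values by repeating the last one."""
    if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
        raise ValueError("min_reads must be a list of 1 to 3 Ints")
    return min_reads + [min_reads[-1]] * (3 - len(min_reads))


print(expand_min_reads([3]))        # [3, 3, 3]
print(expand_min_reads([5, 2]))     # [5, 2, 2]
print(expand_min_reads([2, 2, 2]))  # [2, 2, 2]
```

The first value applies to the final consensus, the other two to the single-strand consensus reads, so [5, 2] asks for 5 supporting reads overall and 2 on each strand.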
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

min_base_quality = snakemake.params.get("min_base_quality", None)
if not isinstance(min_base_quality, int):
    raise ValueError("min_base_quality needs to be provided as an Int!")

min_reads = snakemake.params.get("min_reads", None)
if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
    raise ValueError(
        "min_reads needs to be provided as list of Ints, min length 1, max length 3!"
    )

ref = snakemake.params.get("ref", None)
if ref is None:
    raise ValueError("A reference needs to be provided!")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

bam_output = snakemake.output[0]

if not isinstance(bam_output, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(bam_output) + "!")

shell(
    "fgbio FilterConsensusReads"
    " -i {bam_input}"
    " -o {bam_output}"
    " -r {ref}"
    " --min-reads {min_reads}"
    " --min-base-quality {min_base_quality}"
    " {extra_params}"
    " {log}"
)
FGBIO GROUPREADSBYUMI

Groups reads together that appear to have come from the same original molecule.

URL:

Example

This wrapper can be used in the following way:

rule GroupReads:
    input:
        "mapped/a.bam"
    output:
        bam="mapped/{sample}.gu.bam",
        hist="mapped/{sample}.gu.histo.tsv",
    params:
        extra="-s adjacency --edits 1"
    log:
        "logs/fgbio/group_reads/{sample}.log"
    wrapper:
        "v0.87.0/bio/fgbio/groupreadsbyumi"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fgbio==0.6.1
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_bam_file = snakemake.output.bam

if not isinstance(output_bam_file, str) and len(output_bam_file) != 1:
    raise ValueError("Bam output should be one bam file: " + str(output_bam_file) + "!")

output_histo_file = snakemake.output.hist

if not isinstance(output_histo_file, str) and len(output_histo_file) != 1:
    raise ValueError(
        "Histo output should be one histogram file path: "
        + str(output_histo_file)
        + "!"
    )

shell(
    "fgbio GroupReadsByUmi"
    " -i {bam_input}"
    " -o {output_bam_file}"
    " -f {output_histo_file}"
    " {extra_params}"
    " {log}"
)
FGBIO SETMATEINFORMATION

Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), MC (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.

URL:

Example

This wrapper can be used in the following way:

rule SetMateInfo:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.mi.bam"
    params: ""
    log:
        "logs/fgbio/set_mate_info/{sample}.log"
    wrapper:
        "v0.87.0/bio/fgbio/setmateinformation"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • fgbio==0.6.1
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell(
    "fgbio SetMateInformation"
    " -i {bam_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)

FILTLONG

Quality filtering tool for long reads.

URL:

Example

This wrapper can be used in the following way:

rule filtlong:
    input:
        reads = "{sample}.fastq"
    output:
        "{sample}.filtered.fastq"
    params:
        extra=" --mean_q_weight 5.0",
        target_bases = 10
    log:
        "logs/filtlong/test/{sample}.log"
    wrapper:
        "v0.87.0/bio/filtlong"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • filtlong=0.2.0=he941832_2
Authors
  • Michael Hall
Code
"""Snakemake wrapper for filtlong."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
target_bases = int(snakemake.params.get("target_bases", 0))
if target_bases > 0:
    extra += " --target_bases {}".format(target_bases)

# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Executed shell command
shell("filtlong {extra}" " {snakemake.input.reads} > {snakemake.output} {log}")
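The handling of target_bases in the code above can be tried outside Snakemake with a small sketch. build_filtlong_command is a hypothetical helper that assembles the same command string without the log redirection; it assumes plain string arguments instead of the snakemake object.

```python
def build_filtlong_command(reads, output, extra="", target_bases=0):
    """Assemble the filtlong command line the way the wrapper does."""
    target_bases = int(target_bases)
    # --target_bases is appended only for positive values; 0 means "not set".
    if target_bases > 0:
        extra += " --target_bases {}".format(target_bases)
    return "filtlong {} {} > {}".format(extra, reads, output)


print(build_filtlong_command("a.fastq", "a.filtered.fastq",
                             extra=" --mean_q_weight 5.0", target_bases=10))
print(build_filtlong_command("a.fastq", "a.filtered.fastq"))
```

Since filtlong writes the filtered reads to stdout, the command ends in a redirection to the output file, and only stderr is left for the log.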

FREEBAYES

Call small genomic variants with freebayes.

URL:

Example

This wrapper can be used in the following way:

rule freebayes:
    input:
        ref="genome.fasta",
        # you can have a list of samples here
        samples="mapped/{sample}.bam",
        # the matching BAI indexes have to be present for freebayes
        indexes="mapped/{sample}.bam.bai",
        # optional BED file specifying chromosomal regions on which freebayes
        # should run, e.g. all regions that show coverage
        #regions="path/to/region-file.bed"
    output:
        "calls/{sample}.vcf",  # either .vcf or .bcf
    log:
        "logs/freebayes/{sample}.log",
    params:
        extra="",  # optional parameters
        chunksize=100000,  # reference genome chunk size for parallelization (default: 100000)
        normalize=False,  # optional flag to use bcftools norm to normalize indels (Valid params are -a, -f, -m, -D or -d)
    threads: 2
    wrapper:
        "v0.87.0/bio/freebayes"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • freebayes=1.3.2
  • bcftools=1.11
  • parallel=20190522
  • bedtools>=2.29
  • sed=4.7
Input/Output

Input:

  • SAM/BAM/CRAM file(s)
  • reference genome

Output:

  • VCF/VCF.gz/BCF file
Notes
  • The extra param allows for additional arguments for freebayes.
  • The uncompressed_bcf param allows for uncompressed BCF output.
  • The optional normalize param enables indel normalization with bcftools norm. When set, one of the following options must be passed: -a, -f, -m, -D or -d.
  • The chunksize param sets the reference genome chunk size for parallelization (default: 100000).
  • For more information, see https://github.com/freebayes/freebayes
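The output format is chosen from the file extension, and uncompressed_bcf only matters for .bcf outputs. A minimal sketch of that mapping, following the logic in the wrapper code for this rule, is shown here; bcftools_output_type is a hypothetical function name, and the returned letters are the values passed to bcftools --output-type.

```python
from os import path


def bcftools_output_type(output_file, uncompressed_bcf=False):
    """Map an output file name to a bcftools --output-type letter."""
    name, ext = path.splitext(output_file)
    if ext == ".vcf":
        return "v"  # uncompressed VCF
    if ext == ".bcf":
        return "u" if uncompressed_bcf else "b"  # uncompressed or compressed BCF
    if ext == ".gz" and path.splitext(name)[1] == ".vcf":
        return "z"  # compressed VCF
    raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")


for example in ("calls/a.vcf", "calls/a.vcf.gz", "calls/a.bcf"):
    print(example, bcftools_output_type(example))
```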
Authors
  • Johannes Köster
  • Felix Mölder
  • Filipe G. Vieira
Code
__author__ = "Johannes Köster, Felix Mölder, Christopher Schröder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"


from os import path
from snakemake.shell import shell
from tempfile import TemporaryDirectory

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = snakemake.params.get("extra", "")
norm = snakemake.params.get("normalize", False)


# Infer output format
uncompressed_bcf = snakemake.params.get("uncompressed_bcf", False)
out_name, out_ext = path.splitext(snakemake.output[0])
if out_ext == ".vcf":
    out_format = "v"
elif out_ext == ".bcf":
    if uncompressed_bcf:
        out_format = "u"
    else:
        out_format = "b"
elif out_ext == ".gz":
    out_name, out_ext = path.splitext(out_name)
    if out_ext == ".vcf":
        out_format = "z"
    else:
        raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")
else:
    raise ValueError("output file with invalid extension (.vcf, .vcf.gz, .bcf).")


pipe = ""
if norm:
    pipe = f"| bcftools norm {norm} --output-type {out_format} -"
else:
    pipe = f"| bcftools view --output-type {out_format} -"


if snakemake.threads == 1:
    freebayes = "freebayes"
else:
    chunksize = snakemake.params.get("chunksize", 100000)
    regions = (
        "<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format(
            snakemake=snakemake, chunksize=chunksize
        )
    )
    if snakemake.input.get("regions", ""):
        regions = (
            "<(bedtools intersect -a "
            r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
            "{regions}) -b {snakemake.input.regions} | "
            r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
        ).format(regions=regions, snakemake=snakemake)
    freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format(
        snakemake=snakemake, regions=regions
    )

with TemporaryDirectory() as tempdir:
    shell(
        "({freebayes} {params} -f {snakemake.input.ref}"
        " {snakemake.input.samples} | bcftools sort -T {tempdir} - {pipe} > {snakemake.output[0]}) {log}"
    )
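With more than one thread, the wrapper splits the reference into chunks of chunksize bases and hands them to freebayes-parallel. The sketch below shows the idea of such chunking from a FASTA index. It is an illustration under assumptions, not the fasta_generate_regions.py script itself: generate_regions is a hypothetical name, and the 0-based start with the contig length as last end is assumed here.

```python
def generate_regions(fai_lines, chunksize=100000):
    """Yield 'chrom:start-end' strings covering each contig in fixed-size chunks."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for line in fai_lines:
        # A .fai line starts with the contig name and its length, tab-separated.
        fields = line.rstrip("\n").split("\t")
        chrom, length = fields[0], int(fields[1])
        start = 0
        while start < length:
            end = min(start + chunksize, length)
            yield f"{chrom}:{start}-{end}"
            start = end


# Two made-up index lines: one long contig and one short one.
fai = ["chr1\t250000\t6\t60\t61\n", "chrM\t16569\t254180\t60\t61\n"]
for region in generate_regions(fai):
    print(region)
```

Smaller chunks give more, shorter jobs to distribute across threads. When a regions BED file is given, the wrapper additionally intersects these chunks with it using bedtools.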

GATK

For gatk, the following wrappers are available:

GATK APPLYBQSR

Run gatk ApplyBQSR.

URL:

Example

This wrapper can be used in the following way:

rule gatk_applybqsr:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
        recal_table="recal/{sample}.grp"
    output:
        bam="recal/{sample}.bam"
    log:
        "logs/gatk/gatk_applybqsr/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/applybqsr"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • openjdk=8
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file
  • fasta reference
  • recalibration table for the bam

Output:

  • recalibrated bam file
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
  • The extra param allows for additional program arguments for ApplyBQSR.
  • For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360037055712-ApplyBQSR
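To make the java_opts note more concrete, here is a rough sketch of how user-supplied JVM options and automatically derived ones could be combined. It is an assumption-laden illustration, not the code of get_java_opts in snakemake-wrapper-utils: sketch_java_opts is a hypothetical function, and it assumes that the heap size follows resources.mem_mb and the temporary directory follows a tmpdir resource.

```python
def sketch_java_opts(params, resources):
    """Illustrative only: combine user JVM options with automatic ones."""
    java_opts = params.get("java_opts", "")
    for reserved in ("-Xmx", "-Djava.io.tmpdir"):
        if reserved in java_opts:
            raise ValueError(reserved + " is set automatically; do not pass it in java_opts")
    if "mem_mb" in resources:  # assumption: heap size follows resources.mem_mb
        java_opts += " -Xmx{}M".format(resources["mem_mb"])
    if "tmpdir" in resources:  # assumption: a tmpdir resource, if present
        java_opts += " -Djava.io.tmpdir={}".format(resources["tmpdir"])
    return java_opts.strip()


print(sketch_java_opts({"java_opts": "-XX:ParallelGCThreads=10"}, {"mem_mb": 1024}))
```

The practical point for rule authors is the division of labour: memory goes into resources, everything else JVM-related goes into java_opts.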
Authors
  • Christopher Schröder
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
    "gatk --java-options '{java_opts}' ApplyBQSR {extra} "
    "-R {snakemake.input.ref} -I {snakemake.input.bam} "
    "--bqsr-recal-file {snakemake.input.recal_table} "
    "-O {snakemake.output.bam} {log}"
)
GATK APPLYBQSRSPARK

ApplyBQSRSpark: Apply base quality score recalibration on Spark; uses output of the BaseRecalibrator tool.

URL:

Example

This wrapper can be used in the following way:

rule gatk_applybqsr_spark:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
        recal_table="recal/{sample}.grp"
    output:
        bam="recal/{sample}.bam"
    log:
        "logs/gatk/gatk_applybqsr_spark/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
        #spark_runner="",  # optional, local by default
        #spark_master="",  # optional
        #spark_extra="", # optional
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/applybqsrspark"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4=4.2
  • openjdk=8
  • snakemake-wrapper-utils=0.1.3
Input/Output

Input:

  • bam file
  • fasta reference
  • recalibration table for the bam

Output:

  • recalibrated bam file
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
  • The extra param allows for additional program arguments for ApplyBQSRSpark.
  • The spark_runner param (“LOCAL”, “SPARK” or “GCS”) sets the Spark runner. Set it to “LOCAL” or leave it unset to run on the local machine.
  • The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or leave it unset for local execution with the number of cores determined by Snakemake.
  • The spark_extra param allows for additional Spark arguments.
  • For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360057440431-ApplyBQSRSpark-BETA-
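The defaults described in the notes can be sketched in a few lines. spark_arguments is a hypothetical helper that takes a plain dict in place of snakemake.params; the defaults mirror those in the wrapper code for this rule, and the spark:// address in the second call is a made-up placeholder.

```python
def spark_arguments(params, threads):
    """Build the Spark arguments that follow '--' on the gatk command line."""
    runner = params.get("spark_runner", "LOCAL")
    # Without an explicit master, run locally on as many cores as the rule has threads.
    master = params.get("spark_master", "local[{}]".format(threads))
    extra = params.get("spark_extra", "")
    return "-- --spark-runner {} --spark-master {} {}".format(runner, master, extra).strip()


print(spark_arguments({}, threads=8))
print(spark_arguments({"spark_runner": "SPARK", "spark_master": "spark://host:7077"}, threads=8))
```

Leaving all three params unset therefore ties the degree of parallelism to the threads directive of the rule.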
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

import tempfile
import random
from pathlib import Path

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
    "spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

with tempfile.TemporaryDirectory() as tmpdir:
    # This folder must not exist; it is created by GATK
    tmpdir_shards = Path(tmpdir) / "shards_{:06d}".format(random.randrange(10 ** 6))

    shell(
        "gatk --java-options '{java_opts}' ApplyBQSRSpark {extra} "
        "--reference {snakemake.input.ref} --input {snakemake.input.bam} "
        "--bqsr-recal-file {snakemake.input.recal_table} "
        "--tmp-dir {tmpdir} --output-shard-tmp-dir {tmpdir_shards} "
        "--output {snakemake.output.bam} "
        "-- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra} "
        "{log}"
    )
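The shard directory handling above relies on a path that does not exist yet inside a directory that does. A small sketch of that pattern, using only the standard library, is given below; fresh_shard_path is a hypothetical name for the expression used in the wrapper.

```python
import random
import tempfile
from pathlib import Path


def fresh_shard_path(parent):
    """Return a randomly named path below parent without creating it."""
    return Path(parent) / "shards_{:06d}".format(random.randrange(10 ** 6))


with tempfile.TemporaryDirectory() as tmpdir:
    shards = fresh_shard_path(tmpdir)
    print(Path(tmpdir).is_dir())  # the parent exists for the duration of the block
    print(shards.exists())        # the shard directory itself is left for GATK to create
```

Because the parent is a TemporaryDirectory, whatever GATK creates below it is removed when the with block ends.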
GATK APPLYVQSR

Run gatk ApplyVQSR.

URL:

Example

This wrapper can be used in the following way:

rule apply_vqsr:
    input:
        vcf="test.vcf",
        recal="snps.recal",
        tranches="snps.tranches",
        ref="ref.fasta"
    output:
        vcf="test.snp_recal.vcf"
    log:
        "logs/gatk/applyvqsr.log"
    params:
        mode="SNP",  # set mode, must be either SNP, INDEL or BOTH
        extra="" # optional
    resources:
        mem_mb=50
    wrapper:
        "v0.87.0/bio/gatk/applyvqsr"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.2.0.0
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • VCF file
  • Recalibration file
  • Tranches file

Output:

  • Variant QualityScore-Recalibrated VCF
Notes
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' ApplyVQSR {extra} "
    "-R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "--recal-file {snakemake.input.recal} "
    "--tranches-file {snakemake.input.tranches} "
    "-mode {snakemake.params.mode} "
    "--output {snakemake.output.vcf} "
    "{log}"
)
GATK BASERECALIBRATOR

Run gatk BaseRecalibrator.

URL:

Example

This wrapper can be used in the following way:

rule gatk_baserecalibrator:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
        known="dbsnp.vcf.gz"  # optional known sites - single or a list
    output:
        recal_table="recal/{sample}.grp"
    log:
        "logs/gatk/baserecalibrator/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/baserecalibrator"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • openjdk=8
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file
  • fasta reference
  • vcf.gz of known variants

Output:

  • recalibration table for the bam
Notes
Authors
  • Christopher Schröder
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
    if isinstance(known, str):
        known = [known]
    known = list(map("--known-sites {}".format, known))

shell(
    "gatk --java-options '{java_opts}' BaseRecalibrator {extra} "
    "-R {snakemake.input.ref} -I {snakemake.input.bam} "
    "-O {snakemake.output.recal_table} {known} {log}"
)
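The known input may be a single file or a list, as the comment in the example rule says. The sketch below shows how either form becomes repeated --known-sites flags. known_sites_args is a hypothetical helper; unlike the wrapper, which keeps a list and leaves the joining to the shell() formatting, it joins the flags explicitly so that it works with plain Python strings. The second file name in the example is made up.

```python
def known_sites_args(known):
    """Turn one path or a list of paths into repeated --known-sites flags."""
    if not known:
        return ""  # known sites are optional
    if isinstance(known, str):
        known = [known]
    return " ".join("--known-sites {}".format(k) for k in known)


print(known_sites_args("dbsnp.vcf.gz"))
print(known_sites_args(["dbsnp.vcf.gz", "indels.vcf.gz"]))
```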
GATK BASERECALIBRATORSPARK

Run gatk BaseRecalibratorSpark.

URL:

Example

This wrapper can be used in the following way:

rule gatk_baserecalibratorspark:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
        known="dbsnp.vcf.gz"  # optional known sites
    output:
        recal_table="recal/{sample}.grp"
    log:
        "logs/gatk/baserecalibrator/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
        #spark_runner="",  # optional, local by default
        #spark_master="",  # optional
        #spark_extra="", # optional
    resources:
        mem_mb=1024
    threads: 8
    wrapper:
        "v0.87.0/bio/gatk/baserecalibratorspark"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4=4.2
  • openjdk=8
  • snakemake-wrapper-utils=0.1.3
Input/Output

Input:

  • bam file
  • fasta reference
  • vcf.gz of known variants

Output:

  • recalibration table for the bam
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
  • The extra param allows for additional program arguments for BaseRecalibratorSpark.
  • The spark_runner param (“LOCAL”, “SPARK” or “GCS”) sets the Spark runner. Set it to “LOCAL” or leave it unset to run on the local machine.
  • The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or leave it unset for local execution with the number of cores determined by Snakemake.
  • The spark_extra param allows for additional Spark arguments.
  • For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA-
Authors
  • Christopher Schröder
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"

import tempfile

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
    "spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = get_java_opts(snakemake)

tmpdir = tempfile.gettempdir()

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
    known = "--known-sites {}".format(known)

shell(
    "gatk --java-options '{java_opts}' BaseRecalibratorSpark {extra} "
    "-R {snakemake.input.ref} -I {snakemake.input.bam} "
    "--output {snakemake.output.recal_table} {known} "
    "--tmp-dir {tmpdir} "
    "-- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra} "
    "{log}"
)
GATK CLEANSAM

Run gatk CleanSam

URL:

Example

This wrapper can be used in the following way:

rule gatk_clean_sam:
    input:
        bam="{sample}.bam"
    output:
        clean="{sample}.clean.bam"
    log:
        "logs/{sample}.log"
    params:
        extra="",
        java_opts="", # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v0.87.0/bio/gatk/cleansam"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.2.0.0
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • SAM/BAM/CRAM file

Output:

  • cleaned and validated SAM/BAM/CRAM file
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' CleanSam --INPUT {snakemake.input.bam} "
    "{extra} --OUTPUT {snakemake.output.clean} {log}"
)
GATK COMBINEGVCFS

Run gatk CombineGVCFs.

URL:

Example

This wrapper can be used in the following way:

rule combine_gvcfs:
    input:
        gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
        ref="genome.fasta"
    output:
        gvcf="calls/all.g.vcf",
    log:
        "logs/gatk/combinegvcfs.log"
    params:
        extra="",  # optional
        java_opts="",  # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/combinegvcfs"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • GVCF files of multiple samples

Output:

  • Combined GVCF
Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

gvcfs = list(map("-V {}".format, snakemake.input.gvcfs))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' CombineGVCFs {extra} "
    "{gvcfs} "
    "-R {snakemake.input.ref} "
    "-O {snakemake.output.gvcf} {log}"
)
GATK ESTIMATELIBRARYCOMPLEXITY

Run gatk EstimateLibraryComplexity

URL:

Example

This wrapper can be used in the following way:

rule gatk_estimate_library_complexity:
    input:
        bam="{sample}.bam"
    output:
        metrics="{sample}.metrics"
    log:
        "logs/{sample}.log"
    params:
        extra="",
        java_opts="", # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v0.87.0/bio/gatk/estimatelibrarycomplexity"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.2.0.0
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • SAM/BAM/CRAM file

Output:

  • metrics file
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' EstimateLibraryComplexity --INPUT {snakemake.input} "
    "{extra} --OUTPUT {snakemake.output.metrics} {log}"
)
GATK FILTERMUTECTCALLS

Run gatk FilterMutectCalls.

URL:

Example

This wrapper can be used in the following way:

rule gatk_filtermutectcalls:
    input:
        vcf="calls/snvs.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.mutect.filtered.vcf",
    log:
        "logs/gatk/filter/snvs.log",
    params:
        extra="--max-alt-allele-count 3",  # optional arguments, see GATK docs
        java_opts="",  # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024,
    wrapper:
        "v0.87.0/bio/gatk/filtermutectcalls"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • vcf file
  • reference genome

Output:

  • filtered vcf file
Notes
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk --java-options '{java_opts}' FilterMutectCalls "
    "-R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "{extra} "
    "-O {snakemake.output.vcf} "
    "{log}"
)
GATK GENOMICSDBIMPORT

Run gatk GenomicsDBImport.

URL:

Example

This wrapper can be used in the following way:

rule genomics_db_import:
    input:
        gvcfs=["calls/a.g.vcf.gz", "calls/b.g.vcf.gz"],
    output:
        db=directory("db"),
    log:
        "logs/gatk/genomicsdbimport.log"
    params:
        intervals="ref",
        db_action="create", # optional
        extra="",  # optional
        java_opts="",  # optional
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/genomicsdbimport"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4=4.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • GVCF files of multiple samples

Output:

  • A GenomicsDB workspace
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. -XX:ParallelGCThreads=10 (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
  • The intervals param is mandatory.
  • By default, the wrapper will create a new database (output directory must be empty or non-existent). If you want to update an existing DB, set the db_action param to update.
  • The extra param allows for additional program arguments.
  • For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/4405451266331-GenomicsDBImport
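The create/update behaviour of the db_action param can be sketched as a small lookup. This is a minimal illustration based on the wrapper code shown in this section; workspace_flag is a hypothetical helper name, and the error message wording is an assumption.

```python
def workspace_flag(db_action="create"):
    """Map the db_action param to the matching GenomicsDBImport option."""
    flags = {
        "create": "--genomicsdb-workspace-path",
        "update": "--genomicsdb-update-workspace-path",
    }
    try:
        return flags[db_action]
    except KeyError:
        # Anything other than 'create' or 'update' is rejected early.
        raise ValueError(
            "invalid db_action {!r}; choose 'create' or 'update'".format(db_action)
        ) from None


print(workspace_flag())          # default: create a new workspace
print(workspace_flag("update"))  # add samples to an existing workspace
```

Failing early on an unknown value avoids handing a malformed command line to gatk.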
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

gvcfs = list(map("--variant {}".format, snakemake.input.gvcfs))

db_action = snakemake.params.get("db_action", "create")
if db_action == "create":
    db_action = "--genomicsdb-workspace-path"
elif db_action == "update":
    db_action = "--genomicsdb-update-workspace-path"
else:
    raise ValueError(
        "invalid option provided to 'params.db_action'; please choose either 'create' or 'update'."
    )

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk --java-options '{java_opts}' GenomicsDBImport {extra} "
    "{gvcfs} "
    "--intervals {snakemake.params.intervals} "
    "{db_action} {snakemake.output.db} {log}"
)
GATK GENOTYPEGVCFS

Run gatk GenotypeGVCFs.

URL:

Example

This wrapper can be used in the following way:

rule genotype_gvcfs:
    input:
        gvcf="calls/all.g.vcf",  # combined gvcf over multiple samples
        # N.B. either gvcf or genomicsdb must be specified;
        # in the latter case, the input is a GenomicsDB data store
        ref="genome.fasta"
    output:
        vcf="calls/all.vcf",
    log:
        "logs/gatk/genotypegvcfs.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/genotypegvcfs"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4=4.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • GVCF files or GenomicsDB workspace
  • reference genome

Output:

  • VCF file with genotypes
Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
interval_file = snakemake.input.get("interval_file", "")
if interval_file:
    interval_file = "-L {}".format(interval_file)
dbsnp = snakemake.input.get("known", "")
if dbsnp:
    dbsnp = "-D {}".format(dbsnp)

# Allow for either an input gvcf or GenomicsDB
gvcf = snakemake.input.get("gvcf", "")
genomicsdb = snakemake.input.get("genomicsdb", "")
if gvcf:
    if genomicsdb:
        raise Exception("Only input.gvcf or input.genomicsdb expected, got both.")
    input_string = gvcf
else:
    if genomicsdb:
        input_string = "gendb://{}".format(genomicsdb)
    else:
        raise Exception("Expected input.gvcf or input.genomicsdb.")

tmpdir = tempfile.gettempdir()

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "gatk --java-options '{java_opts}' GenotypeGVCFs {extra} "
    "-V {input_string} "
    "-R {snakemake.input.ref} "
    "{dbsnp} "
    "{interval_file} "
    "--tmp-dir {tmpdir} "
    "-O {snakemake.output.vcf} {log}"
)
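The choice between a combined GVCF and a GenomicsDB workspace can be sketched as a standalone function. This is an illustration of the branching in the wrapper code of this section, not a replacement for it: resolve_variant_input is a hypothetical name, and the sketch raises ValueError where the wrapper raises a plain Exception.

```python
def resolve_variant_input(gvcf="", genomicsdb=""):
    """Return the value passed to -V for exactly one of the two input kinds."""
    if gvcf and genomicsdb:
        raise ValueError("Only one of gvcf or genomicsdb expected, got both.")
    if gvcf:
        return gvcf
    if genomicsdb:
        # GenomicsDB workspaces are addressed with the gendb:// prefix.
        return "gendb://{}".format(genomicsdb)
    raise ValueError("Expected gvcf or genomicsdb.")


print(resolve_variant_input(gvcf="calls/all.g.vcf"))
print(resolve_variant_input(genomicsdb="db"))
```

With a workspace directory named db as input, the value handed to -V is gendb://db.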
GATK HAPLOTYPECALLER

Run gatk HaplotypeCaller.

URL:

Example

This wrapper can be used in the following way:

rule haplotype_caller:
    input:
        # single or list of bam files
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        # known="dbsnp.vcf"  # optional
    output:
        gvcf="calls/{sample}.g.vcf",
        # bam="{sample}.assemb_haplo.bam",
    log:
        "logs/gatk/haplotypecaller/{sample}.log",
    params:
        extra="",  # optional
        java_opts="",  # optional
    threads: 4
    resources:
        mem_mb=1024,
    wrapper:
        "v0.87.0/bio/gatk/haplotypecaller"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4=4.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file (single file or list of files)
  • reference genome
  • vcf of known variants (optional)

Output:

  • GVCF file
Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

known = snakemake.input.get("known", "")
if known:
    known = "--dbsnp " + str(known)

bam_output = snakemake.output.get("bam", "")
if bam_output:
    bam_output = "--bam-output " + str(bam_output)

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

bams = snakemake.input.bam
if isinstance(bams, str):
    bams = [bams]
bams = list(map("-I {}".format, bams))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' HaplotypeCaller {extra} "
    "--native-pair-hmm-threads {snakemake.threads} "
    "-R {snakemake.input.ref} {bams} "
    "-ERC GVCF {bam_output} "
    "-O {snakemake.output.gvcf} {known} {log}"
)
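The example rule accepts either a single BAM file or a list of BAM files. A minimal sketch of that normalization, with bam_flags as a hypothetical helper name and the second file name as a placeholder:

```python
def bam_flags(bams):
    """Turn one path or a list of paths into repeated -I options."""
    if isinstance(bams, str):
        # A single path is wrapped so that both cases share one code path.
        bams = [bams]
    return ["-I {}".format(bam) for bam in bams]


print(bam_flags("mapped/a.bam"))
print(bam_flags(["mapped/a.bam", "mapped/b.bam"]))
```

Without the isinstance check, iterating over a single string would produce one -I option per character.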
GATK INTERVALLISTTOBED

Run gatk IntervalListToBed.

URL:

Example

This wrapper can be used in the following way:

rule gatk_interval_list_to_bed:
    input:
        intervals="genome.intervals"
    output:
        bed="genome.bed"
    log:
        "logs/genome.log"
    params:
        extra="",
        java_opts="", # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v0.87.0/bio/gatk/intervallisttobed"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.2.0.0
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • interval list

Output:

  • bed file
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' IntervalListToBed --INPUT {snakemake.input.intervals} "
    "{extra} --OUTPUT {snakemake.output.bed} {log}"
)
GATK MARKDUPLICATESSPARK

Spark implementation of Picard MarkDuplicates that allows the tool to be run in parallel on multiple cores on a local machine or multiple machines on a Spark cluster while still matching the output of the non-Spark Picard version of the tool. Since the tool requires holding all of the read names in memory while it groups read information, machine configuration and starting sort order impact tool performance.

URL:

Example

This wrapper can be used in the following way:

rule mark_duplicates_spark:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        metrics="dedup/{sample}.metrics.txt"
    log:
        "logs/dedup/{sample}.log"
    params:
        extra="--remove-sequencing-duplicates",  # optional
        java_opts="", # optional
        #spark_runner="",  # optional, local by default
        #spark_master="",  # optional
        #spark_extra="", # optional
    resources:
        mem_mb=1024
    threads: 8
    wrapper:
        "v0.87.0/bio/gatk/markduplicatesspark"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4=4.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file
  • reference file

Output:

  • bam file with marked or removed duplicates
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows for additional program arguments for MarkDuplicatesSpark.
  • The spark_runner param = “LOCAL”|“SPARK”|“GCS” sets the Spark runner. Set it to “LOCAL”, or don’t set it at all, to run on the local machine.
  • The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or don’t set it at all for local execution with the number of cores determined by Snakemake.
  • The spark_extra param allows for additional Spark arguments.
  • For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360050814112-MarkDuplicatesSpark
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

import tempfile

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
    "spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = get_java_opts(snakemake)

tmpdir = tempfile.gettempdir()

metrics = snakemake.output.get("metrics", "")
if metrics:
    metrics = f"--metrics-file {metrics}"

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk --java-options '{java_opts}' MarkDuplicatesSpark "
    "{extra} "
    "--input {snakemake.input} "
    "--tmp-dir {tmpdir} "
    "--output {snakemake.output.bam} "
    "{metrics} "
    "-- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra} "
    "{log}"
)
GATK MUTECT2

Call somatic SNVs and indels via local assembly of haplotypes

URL:

Example

This wrapper can be used in the following way:

rule mutect2:
    input:
        fasta = "genome/genome.fasta",
        map = "mapped/{sample}.bam"
    output:
        vcf = "variant/{sample}.vcf"
    message:
        "Testing Mutect2 with {wildcards.sample}"
    threads:
        1
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    log:
        "logs/mutect_{sample}.log"
    wrapper:
        "v0.87.0/bio/gatk/mutect"

rule mutect2_bam:
    input:
        fasta = "genome/genome.fasta",
        map = "mapped/{sample}.bam"
    output:
        vcf = "variant_bam/{sample}.vcf",
        bam = "variant_bam/{sample}.bam"
    message:
        "Testing Mutect2 with {wildcards.sample}"
    threads:
        1
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    log:
        "logs/mutect_{sample}.log"
    wrapper:
        "v0.87.0/bio/gatk/mutect"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • Mapped reads (SAM/BAM/CRAM)
  • Reference Fasta file

Output:

  • Variant file
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for GATK4 Mutect2"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

bam_output = "--bam-output"
if snakemake.output.get("bam", None) is not None:
    bam_output = bam_output + " " + snakemake.output.bam
else:
    bam_output = ""

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

shell(
    "gatk --java-options '{java_opts}' Mutect2 "  # Tool and its subprocess
    "--input {snakemake.input.map} "  # Path to input mapping file
    "{bam_output} "  # Path to output bam file, optional
    "--output {snakemake.output.vcf} "  # Path to output vcf file
    "--reference {snakemake.input.fasta} "  # Path to reference fasta file
    "{extra} "  # Extra parameters
    "{log}"  # Logging behaviour
)
GATK PRINTREADSSPARK

Write reads from SAM format file (SAM/BAM/CRAM) that pass specified criteria to a new file. This is the version that can be run on Spark.

URL:

Example

This wrapper can be used in the following way:

rule gatk_printreadsspark:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        dict="genome.dict",
    output:
        bam="{sample}.bam"
    log:
        "logs/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
        #spark_runner="",  # optional, local by default
        #spark_master="",  # optional
        #spark_extra="", # optional
    resources:
        mem_mb=1024
    threads: 8
    wrapper:
        "v0.87.0/bio/gatk/printreadsspark"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4=4.2
  • openjdk=8
  • snakemake-wrapper-utils=0.2
Input/Output

Input:

  • bam file
  • reference file
  • reference dict

Output:

  • filtered bam file
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
  • The extra param allows for additional program arguments for PrintReadsSpark.
  • The spark_runner param = “LOCAL”|“SPARK”|“GCS” sets the Spark runner. Set it to “LOCAL”, or don’t set it at all, to run on the local machine.
  • The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or don’t set it at all for local execution with the number of cores determined by Snakemake.
  • The spark_extra param allows for additional Spark arguments.
  • For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360057441531-PrintReadsSpark
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

import tempfile

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
    "spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = get_java_opts(snakemake)

tmpdir = tempfile.gettempdir()

shell(
    "gatk --java-options '{java_opts}' PrintReadsSpark {extra} "
    "--reference {snakemake.input.ref} --input {snakemake.input.bam} "
    "--tmp-dir {tmpdir} "
    "--output {snakemake.output.bam} "
    "-- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra} "
    "{log}"
)
GATK SCATTERINTERVALSBYNS

Run gatk ScatterIntervalsByNs.

URL:

Example

This wrapper can be used in the following way:

rule gatk_scatter_interval_by_ns:
    input:
        ref="genome.fasta",
        fai="genome.fasta.fai",
        dict="genome.dict",
    output:
        intervals="genome.intervals"
    log:
        "logs/genome.log"
    params:
        extra="--MAX_TO_MERGE 10 --OUTPUT_TYPE ACGT",
        java_opts="", # optional
    resources:
        mem_mb=1024,
    wrapper:
        "v0.87.0/bio/gatk/scatterintervalsbyns"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.2.0.0
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • reference genome

Output:

  • interval list
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' ScatterIntervalsByNs --REFERENCE {snakemake.input.ref} "
    "{extra} --OUTPUT {snakemake.output.intervals} {log}"
)
GATK SELECTVARIANTS

Run gatk SelectVariants.

URL:

Example

This wrapper can be used in the following way:

rule gatk_select:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.vcf"
    log:
        "logs/gatk/select/snvs.log"
    params:
        extra="--select-type-to-include SNP",  # optional filter arguments, see GATK docs
        java_opts="", # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/selectvariants"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • vcf file
  • reference genome

Output:

  • filtered vcf file
Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' SelectVariants -R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "{extra} -O {snakemake.output.vcf} {log}"
)
GATK SPLITNCIGARREADS

Run gatk SplitNCigarReads.

URL:

Example

This wrapper can be used in the following way:

rule splitncigarreads:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        "split/{sample}.bam"
    log:
        "logs/gatk/splitNCIGARreads/{sample}.log"
    params:
        extra="",  # optional
        java_opts="",  # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/splitncigarreads"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file

Output:

  • split bam file
Notes
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"

import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' SplitNCigarReads {extra} "
    " -R {snakemake.input.ref} -I {snakemake.input.bam} "
    "-O {snakemake.output} {log}"
)
GATK VARIANTEVAL

Run gatk VariantEval.

URL:

Example

This wrapper can be used in the following way:

rule gatk_varianteval:
    input:
        vcf="calls/snvs.vcf",
        ref="genome.fasta",
        dict="genome.dict",
        # comp="calls/comp.vcf", # optional comparison VCF
    output:
        vcf="snvs.varianteval.grp"
    log:
        "logs/gatk/varianteval/snvs.log"
    params:
        extra="",  # optional arguments, see GATK docs
        java_opts="", # optional
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/varianteval"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.2.0.0
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • vcf files
  • BAM/CRAM files (optional)
  • reference genome (optional)
  • reference dictionary (optional)
  • vcf.gz of known variants (optional)
  • PED (pedigree) file (optional)

Output:

  • Evaluation tables detailing the results of the eval modules on VCF file
Notes
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


vcf = snakemake.input.vcf
if isinstance(vcf, str):
    vcf = "--eval  {}".format(vcf)
else:
    vcf = list(map("--eval {}".format, vcf))

bam = snakemake.input.get("bam", "")
if bam:
    if isinstance(bam, str):
        bam = "--input  {}".format(bam)
    else:
        bam = list(map("--input {}".format, bam))

ref = snakemake.input.get("ref", "")
if ref:
    ref = "--reference " + ref

ref_dict = snakemake.input.get("dict", "")
if ref_dict:
    ref_dict = "--sequence-dictionary " + ref_dict

known = snakemake.input.get("known", "")
if known:
    known = "--dbsnp " + known

comp = snakemake.input.get("comp", "")
if comp:
    if isinstance(comp, str):
        comp = "--comparison  {}".format(comp)
    else:
        comp = list(map("--comparison {}".format, comp))

ped = snakemake.input.get("ped", "")
if ped:
    ped = "--pedigree " + ped


log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "gatk --java-options '{java_opts}' VariantEval "
    "{vcf} "
    "{bam} "
    "{ref} "
    "{ref_dict} "
    "{known} "
    "{ped} "
    "{comp} "
    "{extra} --output {snakemake.output[0]} {log}"
)
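Most inputs of this wrapper are optional, and an absent input should contribute nothing to the command line. The pattern can be sketched as follows; optional_flag is a hypothetical helper, the inputs dict stands in for snakemake.input, and a.vcf and b.vcf are placeholder file names.

```python
def optional_flag(flag, value):
    """Return 'flag value' pairs, or an empty string when the input is absent."""
    if not value:
        return ""
    if isinstance(value, str):
        value = [value]
    return " ".join("{} {}".format(flag, v) for v in value)


# Plain dict standing in for snakemake.input (assumption of this sketch).
inputs = {"vcf": "calls/snvs.vcf", "ref": "genome.fasta", "comp": ["a.vcf", "b.vcf"]}
parts = [
    optional_flag("--eval", inputs.get("vcf")),
    optional_flag("--reference", inputs.get("ref")),
    optional_flag("--pedigree", inputs.get("ped")),  # absent, contributes nothing
    optional_flag("--comparison", inputs.get("comp")),
]
command = " ".join(part for part in parts if part)
print(command)
```

List-valued inputs repeat the option once per file, which is how several evaluation or comparison VCFs are passed in the wrapper code above.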
GATK VARIANTFILTRATION

Run gatk VariantFiltration.

URL:

Example

This wrapper can be used in the following way:

rule gatk_filter:
    input:
        vcf="calls/snvs.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.filtered.vcf"
    log:
        "logs/gatk/filter/snvs.log"
    params:
        filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
        extra="",  # optional arguments, see GATK docs
        java_opts="", # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/variantfiltration"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • vcf file
  • reference genome

Output:

  • filtered vcf file
Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

filters = [
    "--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
    for name, expr in snakemake.params.filters.items()
]

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' VariantFiltration -R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "{extra} {filters} -O {snakemake.output.vcf} {log}"
)
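
The filters param maps filter names to filter expressions, and each pair becomes one --filter-name/--filter-expression argument pair. The sketch below shows one way to build these arguments; it uses shlex.quote from the Python standard library for shell quoting, which is a swapped-in technique and not what the wrapper code above does. The name format_filters is hypothetical:

```python
import shlex


def format_filters(filters):
    """Build VariantFiltration filter arguments from a name -> expression dict."""
    return " ".join(
        "--filter-name {} --filter-expression {}".format(
            shlex.quote(name), shlex.quote(expr)
        )
        for name, expr in filters.items()
    )


print(format_filters({"myfilter": "AB < 0.2 || MQ0 > 50"}))
# prints: --filter-name myfilter --filter-expression 'AB < 0.2 || MQ0 > 50'
```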
GATK VARIANTRECALIBRATOR

Run gatk VariantRecalibrator.

URL:

Example

This wrapper can be used in the following way:

from snakemake.remote import GS

# GATK resource bundle files can be either directly obtained from google storage (like here), or
# from FTP. You can also use local files.
GS = GS.RemoteProvider()


def gatk_bundle(f):
    return GS.remote("genomics-public-data/resources/broad/hg38/v0/{}".format(f))


rule gatk_variantrecalibrator:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
        # resources have to be given as named input files
        hapmap=gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz"),
        omni=gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz"),
        g1k=gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
        dbsnp=gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
        # use aux to, e.g., download other necessary files
        aux=[gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"),
             gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi")]
    output:
        vcf="calls/all.recal.vcf",
        tranches="calls/all.tranches"
    log:
        "logs/gatk/variantrecalibrator.log"
    params:
        mode="SNP",  # set mode, must be either SNP, INDEL or BOTH
        # resource parameter definition. Key must match named input files from above.
        resources={"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
                   "omni":   {"known": False, "training": True, "truth": False, "prior": 12.0},
                   "g1k":   {"known": False, "training": True, "truth": False, "prior": 10.0},
                   "dbsnp":  {"known": True, "training": False, "truth": False, "prior": 2.0}},
        annotation=["QD", "FisherStrand"],  # which fields to use with -an (see VariantRecalibrator docs)
        extra="",  # optional
        java_opts="", # optional
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/gatk/variantrecalibrator"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk4==4.1.4.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • VCF file

Output:

  • .recal file
  • .tranches file
Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


def fmt_res(resname, resparams):
    fmt_bool = lambda b: str(b).lower()
    # input.get() returns None for a missing name instead of raising KeyError
    f = snakemake.input.get(resname)
    if f is None:
        raise RuntimeError(
            "There must be a named input file for every resource (missing: {})".format(
                resname
            )
        )
    return "{},known={},training={},truth={},prior={} {}".format(
        resname,
        fmt_bool(resparams["known"]),
        fmt_bool(resparams["training"]),
        fmt_bool(resparams["truth"]),
        resparams["prior"],
        f,
    )


annotation_resources = [
    "--resource:{}".format(fmt_res(resname, resparams))
    for resname, resparams in snakemake.params["resources"].items()
]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = ""
if snakemake.output.tranches:
    tranches = "--tranches-file " + snakemake.output.tranches

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' VariantRecalibrator {extra} {annotation_resources} "
    "-R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "-mode {snakemake.params.mode} "
    "--output {snakemake.output.vcf} "
    "{tranches} {annotation} {log}"
)
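
Each entry of the resources param is combined with the named input file of the same key into one --resource argument. The following sketch reproduces that formatting outside of Snakemake, under the assumption that a plain dict stands in for the named input files; format_resource and resource_args are hypothetical names:

```python
def format_resource(name, path, params):
    """Format one `--resource` argument for VariantRecalibrator."""
    flags = ",".join(
        "{}={}".format(key, str(params[key]).lower())
        for key in ("known", "training", "truth")
    )
    return "--resource:{},{},prior={} {}".format(name, flags, params["prior"], path)


def resource_args(resources, inputs):
    """Build all `--resource` arguments, failing early on missing inputs.

    `inputs` is a plain dict standing in for the named input files.
    """
    args = []
    for name, params in resources.items():
        if name not in inputs:
            raise KeyError("missing named input file for resource: " + name)
        args.append(format_resource(name, inputs[name], params))
    return args


resources = {"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0}}
inputs = {"hapmap": "hapmap_3.3.hg38.sites.vcf.gz"}
print(resource_args(resources, inputs)[0])
```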

GATK3

For gatk3, the following wrappers are available:

GATK3 BASERECALIBRATOR

Run gatk3 BaseRecalibrator.

URL:

Example

This wrapper can be used in the following way:

rule baserecalibrator:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz"
    output:
        "{sample}.recal_data_table"
    log:
        "logs/gatk3/bqsr/{sample}.log"
    params:
        extra=""  # optional
    resources:
        mem_mb = 1024
    threads: 16
    wrapper:
        "bio/gatk3/baserecalibrator"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk==3.8
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file
  • vcf files
  • reference genome

Output:

  • recalibration table
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows for additional program arguments.
  • For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

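# input.known may be a single path or a list of paths; wrap a single path
# in a list so that the loop below does not iterate over its characters
if isinstance(input_known, str):
    input_known = [input_known]

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + "  --knownSites {}".format(known)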

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T BaseRecalibrator"
    " -nct {snakemake.threads}"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    " {bed}"
    " -o {snakemake.output}"
    " {log}"
)
GATK3 INDELREALIGNER

Run gatk3 IndelRealigner

URL:

Example

This wrapper can be used in the following way:

rule indelrealigner:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bai",
        ref="genome.fasta",
        known="dbsnp.vcf.gz",
        known_idx="dbsnp.vcf.gz.tbi",
        target_intervals="{sample}.intervals"
    output:
        bam="realigned/{sample}.bam",
        bai="realigned/{sample}.bai",
        java_temp=temp(directory("/tmp/gatk3_indelrealigner/{sample}")),
    log:
        "logs/gatk3/indelrealigner/{sample}.log"
    params:
        extra=""  # optional
    threads: 16
    resources:
        mem_mb = 1024
    wrapper:
        "bio/gatk3/indelrealigner"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk==3.8
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file
  • reference genome
  • target intervals to realign
  • bed file (optional)
  • vcf files of known variation (optional)

Output:

  • indel realigned bam file
  • indel realigned bai file (optional)
  • temp dir (optional)
Notes
Authors
  • Patrik Smeds
  • Filipe G. Vieira
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


bed = snakemake.input.get("bed", "")
if bed:
    bed = "-L " + bed


known = snakemake.input.get("known", "")
if known:
    if isinstance(known, str):
        known = "-known {}".format(known)
    else:
        known = list(map("-known {}".format, known))


output_bai = snakemake.output.get("bai", None)
if output_bai is None:
    extra += " --disable_bam_indexing"


log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "gatk3 {java_opts} -T IndelRealigner"
    " {extra}"
    " -I {snakemake.input.bam}"
    " -R {snakemake.input.ref}"
    " {known}"
    " {bed}"
    " --targetIntervals {snakemake.input.target_intervals}"
    " -o {snakemake.output.bam}"
    " {log}"
)
GATK3 PRINTREADS

Run gatk3 PrintReads

URL:

Example

This wrapper can be used in the following way:

rule printreads:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        recal_data="{sample}.recal_data_table"
    output:
        "alignment/{sample}.bqsr.bam"
    log:
        "logs/gatk/bqsr/{sample}.log"
    params:
        extra=""  # optional
    resources:
        mem_mb = 1024
    threads: 16
    wrapper:
        "bio/gatk3/printreads"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk==3.8
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file
  • recalibration table
  • reference genome

Output:

  • bam file
Notes
  • The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param allows for additional program arguments.
  • For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

input_bam = snakemake.input.bam
input_recal_data = snakemake.input.recal_data
input_ref = snakemake.input.ref

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T PrintReads"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " -BQSR {input_recal_data}"
    " -o {snakemake.output}"
    " {log}"
)
GATK3 REALIGNERTARGETCREATOR

Run gatk3 RealignerTargetCreator

URL:

Example

This wrapper can be used in the following way:

rule realignertargetcreator:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz",
    output:
        intervals="{sample}.intervals",
        java_temp=temp(directory("gatk3_indelrealigner/{sample}")),
    log:
        "logs/gatk/realignertargetcreator/{sample}.log",
    params:
        extra="", # optional
    resources:
        mem_mb=1024,
    threads: 16
    wrapper:
        "bio/gatk3/realignertargetcreator"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gatk==3.8
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file
  • reference genome
  • bed file (optional)
  • vcf files of known variation (optional)

Output:

  • target intervals
  • temp dir (optional)
Notes
Authors
  • Patrik Smeds
  • Filipe G. Vieira
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

import os

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


bed = snakemake.input.get("bed", "")
if bed:
    bed = "-L " + bed


known = snakemake.input.get("known", "")
if known:
    if isinstance(known, str):
        known = "-known {}".format(known)
    else:
        known = list(map("-known {}".format, known))


log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "gatk3 {java_opts} -T RealignerTargetCreator"
    " -nt {snakemake.threads}"
    " {extra}"
    " -I {snakemake.input.bam}"
    " -R {snakemake.input.ref}"
    " {known}"
    " {bed}"
    " -o {snakemake.output.intervals}"
    " {log}"
)

GDC-API

For gdc-api, the following wrappers are available:

GDC API-BASED DATA DOWNLOAD OF BAM SLICES

Download slices of GDC BAM files using curl and the GDC API for BAM Slicing.

URL:

Example

This wrapper can be used in the following way:

rule gdc_api_bam_slice_download:
    output:
        bam="raw/{sample}.bam",
    log:
        "logs/gdc-api/bam-slicing/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="092c8a6d-aad5-41bf-b186-e68e613c0e89",
        # a gdc_token is required for controlled access and all BAM files
        # on GDC seem to be controlled access (adjust if this changes)
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # provide wanted `region=` or `gencode=` slices joined with `&`
        slices="region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2",
        # extra command line arguments passed to curl
        extra=""
    wrapper:
        "v0.87.0/bio/gdc-api/bam-slicing"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • curl==7.69.1
Notes
Authors
  • David Lähnemann
Code
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

uuid = snakemake.params.get("uuid", "")
if uuid == "":
    raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

token_file = snakemake.params.get("gdc_token", "")
if token_file == "":
    raise ValueError(
        "You need to provide a GDC data access token file via the 'gdc_token' in 'params'."
    )
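with open(token_file) as tf:
    # strip surrounding whitespace so that the header value stays on one line
    token = tf.read().strip()
# the header is passed via an environment variable and expanded inside
# double quotes below, so that curl receives it as a single argument
os.environ["CURL_HEADER_TOKEN"] = "X-Auth-Token: {}".format(token)

slices = snakemake.params.get("slices", "")
if slices == "":
    raise ValueError(
        "You need to provide 'region=chr1:1000-2000' or 'gencode=BRCA2' slice(s) via the 'slices' in 'params'."
    )

extra = snakemake.params.get("extra", "")

shell(
    "curl --silent"
    ' --header "$CURL_HEADER_TOKEN"'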
    " 'https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}'"
    " {extra}"
    " --output {snakemake.output.bam} {log}"
)

if os.path.getsize(snakemake.output.bam) < 100000:
    with open(snakemake.output.bam) as f:
        if "error" in f.read():
            shell("cat {snakemake.output.bam} {log}")
            raise RuntimeError(
                "Your GDC API request returned an error, check your log file for the error message."
            )
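
The slices param expects region= and gencode= selections joined with &. A small sketch of how such a string could be assembled from lists of regions and gene names; build_slices is a hypothetical helper and not part of the wrapper:

```python
def build_slices(regions=(), genes=()):
    """Join region and gene selections into a `slices` query string."""
    parts = ["region={}".format(r) for r in regions]
    parts += ["gencode={}".format(g) for g in genes]
    if not parts:
        raise ValueError("Provide at least one region or gene.")
    return "&".join(parts)


slices = build_slices(
    regions=["chr22", "chr5:1000-2000", "unmapped"], genes=["BRCA2"]
)
print(slices)
# prints: region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2
```

The resulting string can be passed as the slices param of the rule shown in the example.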

GDC-CLIENT

For gdc-client, the following wrappers are available:

GDC DATA TRANSFER TOOL DATA DOWNLOAD

Download GDC data files with the gdc-client.

URL:

Example

This wrapper can be used in the following way:

rule gdc_download:
    output:
        # the file extension (up to two components, here .maf.gz), has
        # to uniquely map to one of the files downloaded for that UUID
        "raw/{sample}.maf.gz"
    log:
        "logs/gdc-client/download/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
        # a gdc_token is only required for controlled access samples,
        # leave blank otherwise (`gdc_token=""`) or skip this param entirely
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # for valid extra command line arguments, check command line help or:
        # https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
        extra = ""
    threads: 4
    wrapper:
        "v0.87.0/bio/gdc-client/download"

rule gdc_download_bam:
    output:
        # specify all the downloaded files you want to keep, as all other
        # downloaded files will be removed automatically e.g. for
        # BAM data this could be
        "raw/{sample}.bam",
        "raw/{sample}.bam.bai",
        "raw/{sample}.annotations.txt",
        directory("raw/{sample}/logs")
    log:
        "logs/gdc-client/download/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
        # a gdc_token is only required for controlled access samples,
        # leave blank otherwise (`gdc_token=""`) or skip this param entirely
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # for valid extra command line arguments, check command line help or:
        # https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
        extra = ""
    threads: 4
    wrapper:
        "v0.87.0/bio/gdc-client/download"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
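
The comments in the example rules suggest making uuid a function that maps sample names to the UUIDs in the id column of a GDC manifest file. A minimal sketch of such a lookup, assuming a tab-separated manifest; the filename column name, the manifest excerpt, and the names load_uuids and get_uuid are assumptions for illustration:

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical manifest excerpt (tab-separated). Only the `id` column is
# taken from the text above; `filename` is an assumed column name.
MANIFEST = (
    "id\tfilename\n"
    "34b80c89-c41e-47be-84fb-0c0ea493b5bb\tsampleA.maf.gz\n"
)


def load_uuids(handle):
    """Map sample names (file name up to the first dot) to GDC UUIDs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["filename"].split(".", 1)[0]: row["id"] for row in reader}


UUIDS = load_uuids(io.StringIO(MANIFEST))


def get_uuid(wildcards):
    """Params function: look up the UUID for the current sample wildcard."""
    return UUIDS[wildcards.sample]


# stand-in for the wildcards object that Snakemake passes to params functions
print(get_uuid(SimpleNamespace(sample="sampleA")))
```

In a rule, the function would be referenced without calling it, e.g. uuid=get_uuid in the params section, so that Snakemake evaluates it per job.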

Software dependencies
  • gdc-client==1.5.0
Authors
  • David Lähnemann
Code
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
from tempfile import TemporaryDirectory
import glob

uuid = snakemake.params.get("uuid", "")
if uuid == "":
    raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

extra = snakemake.params.get("extra", "")
token = snakemake.params.get("gdc_token", "")
if token != "":
    token = "--token-file {}".format(token)

with TemporaryDirectory() as tempdir:
    shell(
        "gdc-client download"
        " {token}"
        " {extra}"
        " -n {snakemake.threads} "
        " --log-file {snakemake.log} "
        " --dir {tempdir}"
        " {uuid}"
    )

    for out_path in snakemake.output:
        tmp_path = path.join(tempdir, uuid, path.basename(out_path))
        if not path.exists(tmp_path):
            (root, ext1) = path.splitext(out_path)
            paths = glob.glob(path.join(tempdir, uuid, "*" + ext1))
            if len(paths) > 1:
                (root, ext2) = path.splitext(root)
                paths = glob.glob(path.join(tempdir, uuid, "*" + ext2 + ext1))
            if len(paths) == 0:
                raise ValueError(
                    "{} file extension {} does not match any downloaded file.\n"
                    "Are you sure that UUID {} provides a file of such format?\n".format(
                        out_path, ext1, uuid
                    )
                )
            if len(paths) > 1:
                raise ValueError(
                    "Found more than one downloaded file with extension '{}':\n"
                    "{}\n"
                    "Cannot match requested output file {} unambiguously.\n".format(
                        ext2 + ext1, paths, out_path
                    )
                )
            tmp_path = paths[0]
        shell("mv {tmp_path} {out_path}")
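
The loop above matches each requested output file to a downloaded file by its extension, using up to two extension components when one is ambiguous. The sketch below mirrors that logic on a plain list of file names instead of the file system; match_download is a hypothetical helper and fnmatch replaces the glob call:

```python
import fnmatch
import os.path as path


def match_download(out_path, downloaded):
    """Pick the downloaded file name that corresponds to `out_path`."""
    name = path.basename(out_path)
    if name in downloaded:
        return name
    root, ext1 = path.splitext(out_path)
    ext = ext1
    matches = fnmatch.filter(downloaded, "*" + ext)
    if len(matches) > 1:
        # ambiguous: retry with two extension components, e.g. ".maf.gz"
        ext = path.splitext(root)[1] + ext1
        matches = fnmatch.filter(downloaded, "*" + ext)
    if len(matches) != 1:
        raise ValueError(
            "Cannot match {} unambiguously via extension {}".format(out_path, ext)
        )
    return matches[0]


downloaded = ["abc.maf.gz", "annotations.txt", "xyz.vcf.gz"]
print(match_download("raw/s1.maf.gz", downloaded))  # prints: abc.maf.gz
```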

GENOMEPY

Download genomes the easy way: https://github.com/vanheeringen-lab/genomepy

URL:

Example

This wrapper can be used in the following way:

rule genomepy:
    output:
        multiext("{assembly}/{assembly}", ".fa", ".fa.fai", ".fa.sizes", ".gaps.bed",
                 ".annotation.gtf.gz", ".blacklist.bed")
    log:
        "logs/genomepy_{assembly}.log"
    params:
        provider="UCSC"  # optional, defaults to ucsc. Choose from ucsc, ensembl, and ncbi
    cache: True  # mark as eligible for between workflow caching
    wrapper:
        "v0.87.0/bio/genomepy"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconda::genomepy==0.8.3
Params
  • provider: which provider to download from, defaults to UCSC (choose from UCSC, Ensembl, NCBI).
Authors
  • Maarten van der Sande
Code
__author__ = "Maarten van der Sande"
__copyright__ = "Copyright 2020, Maarten van der Sande"
__email__ = "M.vanderSande@science.ru.nl"
__license__ = "MIT"


from snakemake.shell import shell

# Optional parameters
provider = snakemake.params.get("provider", "UCSC")

# set options for plugins
all_plugins = "blacklist,bowtie2,bwa,gmap,hisat2,minimap2,star"
req_plugins = ","
if any(["blacklist" in out for out in snakemake.output]):
    req_plugins = "blacklist,"

annotation = ""
if any(["annotation" in out for out in snakemake.output]):
    annotation = "--annotation"

# parse the genome dir
genome_dir = "./"
if snakemake.output[0].count("/") > 1:
    genome_dir = "/".join(snakemake.output[0].split("/")[:-1])

log = snakemake.log

# Finally execute genomepy
shell(
    """
    # set a trap so we can reset to original user's settings
    active_plugins=$(genomepy config show | grep -Po '(?<=- ).*' | paste -s -d, -) || echo ""
    trap "genomepy plugin disable {{{all_plugins}}} >> {log} 2>&1;\
          genomepy plugin enable {{$active_plugins,}} >> {log} 2>&1" EXIT

    # disable all, then enable the ones we need
    genomepy plugin disable {{{all_plugins}}} >  {log} 2>&1
    genomepy plugin enable  {{{req_plugins}}} >> {log} 2>&1

    # install the genome
    genomepy install {snakemake.wildcards.assembly} \
    {provider} {annotation} -g {genome_dir} >> {log} 2>&1
    """
)
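
The wrapper derives its genomepy options from the requested output file names: the blacklist plugin is enabled and --annotation is passed only if a matching output is listed. A minimal sketch of that decision, with the hypothetical helper genomepy_options operating on a plain list of paths:

```python
def genomepy_options(outputs):
    """Derive the plugin list and annotation flag from output file names."""
    plugins = "blacklist," if any("blacklist" in out for out in outputs) else ","
    annotation = "--annotation" if any("annotation" in out for out in outputs) else ""
    return plugins, annotation


outputs = ["hg38/hg38.fa", "hg38/hg38.annotation.gtf.gz"]
print(genomepy_options(outputs))  # prints: (',', '--annotation')
```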

GRIDSS

For gridss, the following wrappers are available:

GRIDSS ASSEMBLE

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The assemble step performs GRIDSS breakend assembly. Documentation at: https://github.com/PapenfussLab/gridss

URL:

Example

This wrapper can be used in the following way:

WORKING_DIR = "working_dir"
samples = ["A", "B"]

preprocess_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".idsv_metrics",
    ".insert_size_histogram.pdf",
    ".insert_size_metrics",
    ".mapq_metrics",
    ".sv.bam",
    ".sv.bam.bai",
    ".sv_metrics",
    ".tag_metrics",
    )

assembly_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".downsampled_0.bed",
    ".excluded_0.bed",
    ".idsv_metrics",
    ".mapq_metrics",
    ".quality_distribution.pdf",
    ".quality_distribution_metrics",
    ".subsetCalled_0.bed",
    ".sv.bam",
    ".sv.bam.bai",
    ".tag_metrics",
    )

reference_index_endings = (".amb",".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")

rule gridss_assemble:
    input:
        bams=expand("mapped/{sample}.bam", sample=samples),
        bais=expand("mapped/{sample}.bam.bai", sample=samples),
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        indices=multiext("reference/genome.fasta", *reference_index_endings),
        preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings)
    output:
        assembly="assembly/group.bam",
        assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/assemble/group.log"
    threads:
        100
    wrapper:
        "v0.87.0/bio/gridss/assemble"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gridss==2.9.4
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss assemble"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference")

if not snakemake.params.workingdir:
    raise ValueError("Please set params.workingdir to provide a working directory.")

if not snakemake.input.reference:
    raise ValueError("Please set input.reference to provide reference genome.")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(gridss -s assemble "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "  # Threads
    "--workingdir {snakemake.params.workingdir} "  # Working directory
    "--assembly {snakemake.output.assembly} "  # Assembly output
    "{snakemake.input.bams} "
    "{extra}) {log}"
)
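
Before running GRIDSS, the wrapper checks that the bwa index files exist next to the reference. The sketch below collects all missing files at once instead of stopping at the first one; missing_index_files is a hypothetical helper, and the temporary directory only serves to make the example self-contained:

```python
import os
import tempfile

BWA_INDEX_ENDINGS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(reference, endings=BWA_INDEX_ENDINGS):
    """Return the index files that do not exist next to `reference`."""
    return [reference + e for e in endings if not os.path.exists(reference + e)]


with tempfile.TemporaryDirectory() as tmp:
    reference = os.path.join(tmp, "genome.fasta")
    # create only some of the index files
    for ending in (".amb", ".ann", ".bwt"):
        open(reference + ending, "w").close()
    missing = missing_index_files(reference)
    print([os.path.basename(m) for m in missing])
    # prints: ['genome.fasta.pac', 'genome.fasta.sa']
```

Reporting every missing file in one message can save a rerun per missing index file.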
GRIDSS CALL

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The call step performs variant calling. Documentation at: https://github.com/PapenfussLab/gridss

URL:

Example

This wrapper can be used in the following way:

WORKING_DIR = "working_dir"
samples = ["A", "B"]

preprocess_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".idsv_metrics",
    ".insert_size_histogram.pdf",
    ".insert_size_metrics",
    ".mapq_metrics",
    ".sv.bam",
    ".sv.bam.bai",
    ".sv_metrics",
    ".tag_metrics",
    )

assembly_endings = (
    ".cigar_metrics",
    ".coverage.blacklist.bed",
    ".downsampled_0.bed",
    ".excluded_0.bed",
    ".idsv_metrics",
    ".mapq_metrics",
    ".quality_distribution.pdf",
    ".quality_distribution_metrics",
    ".subsetCalled_0.bed",
    ".sv.bam",
    ".sv.bam.bai",
    ".tag_metrics",
    )

reference_index_endings = (".amb",".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")

rule gridss_call:
    input:
        bams=expand("mapped/{sample}.bam", sample=samples),
        bais=expand("mapped/{sample}.bam.bai", sample=samples),
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        indices=multiext("reference/genome.fasta", *reference_index_endings),
        preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings),
        assembly="assembly/group.bam",
        assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
    output:
        vcf="vcf/group.vcf",
        idx="vcf/group.vcf.idx",
        tmpidx=temp(WORKING_DIR + "/group.vcf.gridss.working/group.vcf.allocated.vcf.idx") # be aware the group occurs two times here
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/call/group.log"
    threads:
        100
    wrapper:
        "v0.87.0/bio/gridss/call"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gridss==2.9.4
  • cpulimit=0.2
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss call"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
    raise ValueError("Please set params.workingdir to provide a working directory.")

if not snakemake.input.reference:
    raise ValueError("Please set input.reference to provide reference genome.")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    # "&&" (not "&") so that the export takes effect before gridss is started
    "(export JAVA_OPTS='-XX:ActiveProcessorCount={snakemake.threads}' && "
    "gridss -s call "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "  # Threads
    "--workingdir {snakemake.params.workingdir} "  # Working directory
    "--assembly {snakemake.input.assembly} "  # Assembly input from gridss assemble
    "--output {snakemake.output.vcf} "  # Assembly vcf
    "{snakemake.input.bams} "
    "{extra}) {log}"
)
GRIDSS PREPROCESS

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The preprocess step pre-processes input BAM files and can be run separately for each input file. Documentation at: https://github.com/PapenfussLab/gridss

Example

This wrapper can be used in the following way:

WORKING_DIR="working_dir"

rule gridss_preprocess:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        refindex=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
    output:
        multiext("{WORKING_DIR}/{sample}.bam.gridss.working/{sample}.bam", ".cigar_metrics", ".coverage.blacklist.bed", ".idsv_metrics", ".insert_size_histogram.pdf", ".insert_size_metrics", ".mapq_metrics", ".sv.bam", ".sv.bam.bai", ".sv_metrics", ".tag_metrics")
    params:
        extra="--jvmheap 1g",
        workingdir=WORKING_DIR
    log:
        "log/gridss/preprocess/{WORKING_DIR}/{sample}.preprocess.log"
    threads:
        8
    wrapper:
        "v0.87.0/bio/gridss/preprocess"

Note that input, output and log file paths can be chosen freely.
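
The wrapper code refuses to run unless the bwa index files and the sequence dictionary sit next to the reference. A minimal sketch of the same check, which could be run before starting a workflow, is given here; the function name missing_reference_files is hypothetical, and the file endings are copied from the wrapper code.

```python
from os import path

# Index endings that the wrapper expects next to the reference FASTA
BWA_ENDINGS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_reference_files(reference):
    """Return the companion files of `reference` that do not exist."""
    expected = [reference + ending for ending in BWA_ENDINGS]
    # The wrapper looks for the dictionary with the FASTA extension replaced by ".dict"
    expected.append(path.splitext(reference)[0] + ".dict")
    return [f for f in expected if not path.exists(f)]


print(missing_reference_files("reference/genome.fasta"))
```

An empty list means that the checks in the wrapper should pass for this reference.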

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gridss==2.9.4
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss preprocess"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
    raise ValueError("Please set params.workingdir to provide a working directory.")

if not snakemake.input.reference:
    raise ValueError("Please set input.reference to provide reference genome.")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(gridss -s preprocess "  # Tool
    "--reference {reference} "  # Reference
    "--threads {snakemake.threads} "
    "--workingdir {snakemake.params.workingdir} "
    "{snakemake.input.bam} "
    "{extra}) {log}"
)
GRIDSS SETUPREFERENCE

GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. setupreference is a one-off setup step that generates additional files in the same directory as the reference. WARNING: Multiple instances of GRIDSS attempting to perform setupreference at the same time will result in file corruption. Make sure these files are generated before running parallel GRIDSS jobs. Documentation at: https://github.com/PapenfussLab/gridss

Example

This wrapper can be used in the following way:

rule gridss_setupreference:
    input:
        reference="reference/genome.fasta",
        dictionary="reference/genome.dict",
        indices=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa")
    output:
        multiext("reference/genome.fasta", ".gridsscache", ".img")
    params:
        extra="--jvmheap 1g"
    log:
        "log/gridss/setupreference.log"
    wrapper:
        "v0.87.0/bio/gridss/setupreference"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • gridss==2.9.4
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for gridss setupreference"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
reference = snakemake.input.get("reference", None)

if not snakemake.input.reference:
    raise ValueError("A reference genome has to be provided!")

for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
    if not path.exists("{}{}".format(reference, ending)):
        raise ValueError(
            "{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
                reference=reference, ending=ending
            )
        )

dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
    raise ValueError(
        "{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
            dictionary=dictionary
        )
    )

shell(
    "(gridss -s setupreference "  # Tool
    "--reference {reference} "  # Reference
    "{extra}) {log}"
)

HAP.PY

For hap.py, the following wrappers are available:

HAP.PY

Compares VCF files and calculates performance metrics following the GA4GH best practices for benchmarking small variant call sets (Krusche, P. et al. 2019, https://doi.org/10.1038/s41587-019-0054-x). Part of the hap.py suite by Illumina (see https://github.com/Illumina/hap.py/blob/master/doc/normalisation.md).

Example

This wrapper can be used in the following way:

rule benchmark_variants:
    input:
        truth="truth.vcf",
        query="query.vcf",
        truth_regions="truth.bed",
        strats="stratifications.tsv",
        strat_dir="strats_dir",
        genome="genome.fasta",
        genome_index="genome.fasta.fai"
    output:
        multiext("results",".runinfo.json",".vcf.gz",".summary.csv",
                ".extended.csv",".metrics.json.gz",".roc.all.csv.gz",
                ".roc.Locations.INDEL.csv.gz",".roc.Locations.INDEL.PASS.csv.gz",
                ".roc.Locations.SNP.csv.gz",".roc.tsv")
    params:
        engine="vcfeval",
        prefix=lambda wc, input, output: output[0].split('.')[0],
        ## parameters such as -L to left-align variants
        extra="--verbose"
    log: "happy.log"
    threads: 2
    wrapper: "v0.87.0/bio/hap.py/hap.py"

Note that input, output and log file paths can be chosen freely.
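
In the example rule, params.prefix is derived from the first output path by cutting it at the first dot. The sketch below replays that expression in plain Python to show what hap.py would receive via -o; the helper name prefix_from_output is hypothetical. It also illustrates one caveat: a dot in a directory name cuts the path short.

```python
def prefix_from_output(first_output):
    # Same expression as the lambda in the rule: keep everything before the first "."
    return first_output.split(".")[0]


print(prefix_from_output("results.runinfo.json"))
print(prefix_from_output("happy/results.vcf.gz"))
# A dot in a directory name truncates the prefix:
print(prefix_from_output("run.1/results.summary.csv"))
```

If output paths with dots in directory names are needed, setting prefix explicitly as a string is the safer choice.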

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hap.py==0.3.14
  • rtg-tools==3.10.1
Authors
  • Nathan D. Olson
Code
__author__ = "Nathan Olson"
__copyright__ = "This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply "
__email__ = "nolson@nist.gov"
__license__ = """
This software was developed by employees of the National Institute of Standards and Technology (NIST),
an agency of the Federal Government and is being made available as a public service. Pursuant to title
17 United States Code Section 105, works of NIST employees are not subject to copyright protection in
the United States.  This software may be subject to foreign copyright.  Permission in the United States
and in foreign countries, to the extent that NIST may hold copyright, to use, copy, modify, create
derivative works, and distribute this software and its documentation without fee is hereby granted on
a non-exclusive basis, provided that this notice and disclaimer of warranty appears in all copies.

THE SOFTWARE IS PROVIDED 'AS IS' WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY,
INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND FREEDOM FROM INFRINGEMENT, AND ANY
WARRANTY THAT THE DOCUMENTATION WILL CONFORM TO THE SOFTWARE, OR ANY WARRANTY THAT THE SOFTWARE WILL BE
ERROR FREE.  IN NO EVENT SHALL NIST BE LIABLE FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO, DIRECT,
INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES, ARISING OUT OF, RESULTING FROM, OR IN ANY WAY CONNECTED WITH
THIS SOFTWARE, WHETHER OR NOT BASED UPON WARRANTY, CONTRACT, TORT, OR OTHERWISE, WHETHER OR NOT INJURY WAS
SUSTAINED BY PERSONS OR PROPERTY OR OTHERWISE, AND WHETHER OR NOT LOSS WAS SUSTAINED FROM, OR AROSE OUT OF
THE RESULTS OF, OR USE OF, THE SOFTWARE OR SERVICES PROVIDED HEREUNDER.
"""

from os import path

from snakemake.shell import shell

# Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Optional parameters
engine = snakemake.params.get("engine", "")
if engine:
    engine = "--engine {}".format(engine)

truth_regions = snakemake.input.get("truth_regions", "")
if truth_regions:
    truth_regions = "-f {}".format(truth_regions)

strats = snakemake.input.get("strats", "")
if strats:
    strats = "--stratification {}".format(strats)


shell(
    "(hap.py"
    " --threads {snakemake.threads}"
    " {engine}"
    " -r {snakemake.input.genome}"
    " {extra}"
    " {truth_regions}"
    " {strats}"
    " -o {snakemake.params.prefix}"
    " {snakemake.input.truth}"
    " {snakemake.input.query})"
    " {log}"
)
PRE.PY

Preprocessing/normalisation of vcf/bcf files. Part of the hap.py suite by Illumina (see https://github.com/Illumina/hap.py/blob/master/doc/normalisation.md).

Example

This wrapper can be used in the following way:

rule preprocess_variants:
    input:
        ##vcf/bcf
        variants="variants.vcf"
    output:
        "normalized/variants.vcf"
    params:
        ## path to reference genome
        genome="genome.fasta",
        ## parameters such as -L to left-align variants
        extra="-L"
    threads: 2
    wrapper:
        "v0.87.0/bio/hap.py/pre.py"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hap.py=0.3.14
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "(pre.py"
    " --threads {snakemake.threads}"
    " -r {snakemake.params.genome}"
    " {extra}"
    " {snakemake.input.variants}"
    " {snakemake.output})"
    " {log}"
)

HISAT2

For hisat2, the following wrappers are available:

HISAT2 ALIGN

Map reads with hisat2.

Example

This wrapper can be used in the following way:

rule hisat2_align:
    input:
        reads=["reads/{sample}_R1.fastq", "reads/{sample}_R2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/hisat2_align_{sample}.log"
    params:
        extra="",
        idx="index/",
    threads: 2
    wrapper:
        "v0.87.0/bio/hisat2/align"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hisat2==2.1.0
  • samtools==1.9
Input/Output

Input:

  • reads: either 1 or 2 FASTQ files with reads

Output:

  • bam file with mapped reads
Params
  • idx: prefix of index file path (required)
  • extra: additional parameters
Notes
  • The -S flag must not be used since output is already directly piped to samtools for compression.
  • The --threads/-p flag must not be used since threads is set separately via the snakemake threads directive.
  • The wrapper does not yet handle SRA input accessions.
  • No reference index files checking is done since the actual number of files may differ depending on the reference sequence size. This is also why the index is supplied in the params directive instead of the input directive.
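
The wrapper chooses between single-end (-U) and paired-end (-1/-2) mode from the number of entries in input.reads. A minimal sketch of that decision, mirroring the wrapper code, is shown here; the function name hisat2_input_flags is hypothetical.

```python
def hisat2_input_flags(reads):
    """Build the hisat2 read arguments for one or two FASTQ files."""
    if isinstance(reads, str):
        reads = [reads]
    if len(reads) == 1:
        return "-U {0}".format(reads[0])
    if len(reads) == 2:
        return "-1 {0} -2 {1}".format(*reads)
    raise ValueError("reads must contain 1 or 2 FASTQ files")


print(hisat2_input_flags("reads/a.fastq"))
print(hisat2_input_flags(["reads/a_R1.fastq", "reads/a_R2.fastq"]))
```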
Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()

# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
    input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
    input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
    input_flags = "-1 {0} -2 {1}".format(*reads)
else:
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2 input files."
    )

# Executed shell command
shell(
    "(hisat2 {extra} "
    "--threads {snakemake.threads} "
    " -x {snakemake.params.idx} {input_flags} "
    " | samtools view -Sbh -o {snakemake.output[0]} -) "
    " {log}"
)
HISAT2 INDEX

Create index with hisat2.

Example

This wrapper can be used in the following way:

rule hisat2_index:
    input:
        fasta = "{genome}.fasta"
    output:
        directory("index_{genome}")
    params:
        prefix = "index_{genome}/"
    log:
        "logs/hisat2_index_{genome}.log"
    threads: 2
    wrapper:
        "v0.87.0/bio/hisat2/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hisat2==2.1.0
  • samtools==1.9
Input/Output

Input:

  • fasta: list of FASTA files or list of sequences

Output:

  • Directory of the hisat2 custom index.
Params
  • prefix: prefix of index file path (required). Must point into the output directory
  • extra: additional parameters
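
The example sets prefix to "index_{genome}/", a path inside the directory declared as output. Assuming that this is what the requirement on prefix means, the following sketch checks the relation for a given pair of values; the function name prefix_in_output_dir is hypothetical and the check is not part of the wrapper.

```python
import os


def prefix_in_output_dir(prefix, output_dir):
    """Return True if the index prefix points into output_dir."""
    out = os.path.normpath(output_dir)
    # dirname("index_genome/") and dirname("index_genome/hs") are both "index_genome"
    parent = os.path.normpath(os.path.dirname(prefix) or ".")
    return os.path.commonpath([out, parent]) == out


print(prefix_in_output_dir("index_genome/", "index_genome"))
print(prefix_in_output_dir("other/genome", "index_genome"))
```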
Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for HISAT2 index"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

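# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file or a sequence is required"
fasta_list = fasta if isinstance(fasta, list) else [fasta]
input_seq = ""
# Entries without a "." are treated as sequences given on the command line (-c)
if not any("." in entry for entry in fasta_list):
    input_seq += "-c "
input_seq += ",".join(fasta_list)

hisat_dir = snakemake.params.get("prefix", "")
if hisat_dir:
    # Do not fail if the directory already exists
    os.makedirs(hisat_dir, exist_ok=True)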

shell(
    "hisat2-build {extra} "
    "-p {snakemake.threads} "
    "{input_seq} "
    "{snakemake.params.prefix} "
    "{log}"
)

HMMER

For hmmer, the following wrappers are available:

HMMBUILD

hmmbuild: construct profile HMM(s) from multiple sequence alignment(s)

Example

This wrapper can be used in the following way:

rule hmmbuild_profile:
    input:
        "test-profile.sto"
    output:
        "test-profile.hmm"
    log:
        "logs/test-profile-hmmbuild.log"
    params:
        extra="",
    threads: 4
    wrapper:
        "v0.87.0/bio/hmmer/hmmbuild"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hmmer=3.2.1
Input/Output

Input:

  • sequence alignment file

Output:

  • profile hmm
Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmbuild"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    " hmmbuild {extra} --cpu {snakemake.threads} "
    " {snakemake.output} {snakemake.input} {log} "
)
HMMPRESS

Format an HMM database into a binary format for hmmscan.

Example

This wrapper can be used in the following way:

rule hmmpress_profile:
    input:
        "test-profile.hmm"
    output:
        "test-profile.hmm.h3f",
        "test-profile.hmm.h3i",
        "test-profile.hmm.h3m",
        "test-profile.hmm.h3p"
    log:
        "logs/hmmpress.log"
    params:
        extra="",
    threads: 4
    wrapper:
        "v0.87.0/bio/hmmer/hmmpress"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hmmer=3.2.1
Input/Output

Input:

  • hmm database

Output:

  • binary format hmm database (for hmmscan)
Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmpress"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# -f Force; overwrites any previous hmmpress-ed datafiles. By default, hmmpress complains about existing files and asks you to delete them first.

shell("hmmpress -f {snakemake.input} {log}")
HMMSCAN

Search protein sequence(s) against a protein profile database.

Example

This wrapper can be used in the following way:

rule hmmscan_profile:
    input:
        fasta="test-protein.fa",
        profile="test-profile.hmm.h3f",
    output:
        # only one of these is required
        tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
        domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
        pfamtblout="test-prot-pfamtbl.txt", # save table of hits and domains to file, in Pfam format <f>
        outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
    log:
        "logs/hmmscan.log"
    params:
        evalue_threshold=0.00001,
        # if bitscore threshold provided, hmmscan will use that instead
        #score_threshold=50,
        extra="",
    threads: 4
    wrapper:
        "v0.87.0/bio/hmmer/hmmscan"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hmmer=3.2.1
Input/Output

Input:

  • protein sequence file (fasta)
  • database hmm files

Output:

  • matches to hmm files
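
The example passes the pressed file test-profile.hmm.h3f as input.profile, while the hmmscan command line is given the path of the .hmm database. The sketch below mirrors how the wrapper code recovers that path and picks the reporting threshold; the function names profile_database and threshold_flag are hypothetical.

```python
def profile_database(profile):
    """Strip a trailing .h3f/.h3i/.h3m/.h3p suffix to get the .hmm path."""
    database = profile.rsplit(".h3", 1)[0]
    if not database.endswith(".hmm"):
        raise ValueError('profile file should end with ".hmm"')
    return database


def threshold_flag(evalue_threshold=0.00001, score_threshold=None):
    """A bit score threshold (-T) takes precedence over the E-value (-E)."""
    if score_threshold:
        return "-T {}".format(float(score_threshold))
    return "-E {}".format(float(evalue_threshold))


print(profile_database("test-profile.hmm.h3f"))
print(threshold_flag())
print(threshold_flag(score_threshold=50))
```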
Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmscan"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")

profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '

# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)

# save table of hits and domains to file, in Pfam format <f>
pfamtblout = snakemake.output.get("pfamtblout", "")
if pfamtblout:
    out_cmd += " --pfamtblout {} ".format(pfamtblout)

## default params: use the E-value threshold. If a bit score threshold is provided, use that instead (the two cannot be combined)
# -E reports hits with an E-value <= the threshold; -T reports hits with a bit score >= the threshold
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))

# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "hmmscan {out_cmd} {thresh_cmd} --cpu {snakemake.threads}"
    " {extra} {profile} {snakemake.input.fasta} {log}"
)
HMMSEARCH

Search profile(s) against a sequence database.

Example

This wrapper can be used in the following way:

rule hmmsearch_profile:
    input:
        fasta="test-protein.fa",
        profile="test-profile.hmm.h3f",
    output:
        # only one of these is required
        tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
        domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
        alignment_hits="test-prot-alignment-hits.txt", # Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
        outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
    log:
        "logs/hmmsearch.log"
    params:
        evalue_threshold=0.00001,
        # if bitscore threshold provided, hmmsearch will use that instead
        #score_threshold=50,
        extra="",
    threads: 4
    wrapper:
        "v0.87.0/bio/hmmer/hmmsearch"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • hmmer=3.2.1
Input/Output

Input:

  • hmm profile(s)
  • sequence database

Output:

  • matches between sequences and hmm profiles
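
Each named output in the rule maps to one hmmsearch option, and outputs that are left out are skipped. A minimal sketch of that mapping, based on the wrapper code, is given here; the function name output_options and the plain dict standing in for snakemake.output are assumptions made for illustration.

```python
# Output name in the rule -> hmmsearch option, as used by the wrapper code
OUTPUT_FLAGS = {
    "outfile": "-o",
    "tblout": "--tblout",
    "domtblout": "--domtblout",
    "alignment_hits": "-A",
}


def output_options(outputs):
    """Build the output part of the command line from a name -> path mapping."""
    parts = []
    for name, flag in OUTPUT_FLAGS.items():
        target = outputs.get(name, "")
        if target:
            parts.append("{} {}".format(flag, target))
    return " ".join(parts)


print(output_options({"tblout": "hits.tbl", "outfile": "hits.txt"}))
```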
Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmsearch"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")

profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '

# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)

# Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
alignment_hits = snakemake.output.get("alignment_hits", "")
if alignment_hits:
    out_cmd += " -A {} ".format(alignment_hits)

## default params: use the E-value threshold. If a bit score threshold is provided, use that instead (the two cannot be combined)
# -E reports hits with an E-value <= the threshold; -T reports hits with a bit score >= the threshold
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))

# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    " hmmsearch --cpu {snakemake.threads} "
    " {out_cmd} {thresh_cmd} {extra} {profile} "
    " {snakemake.input.fasta} {log}"
)

HOMER

For homer, the following wrappers are available:

HOMER ANNOTATEPEAKS

Performing peak annotation to associate peaks with nearby genes. For more information, please see the documentation.

Example

This wrapper can be used in the following way:

rule homer_annotatepeaks:
    input:
        peaks="peaks_refs/{sample}.peaks",
        genome="peaks_refs/gene.fasta",
        # optional input files
        # gtf="", # implicitly sets the -gtf flag
        # gene="", # implicitly sets the -gene flag for gene data file to add gene expression or other data types
        motif_files="peaks_refs/motives.txt", # implicitly sets the -m flag
        # filter_motiv="", # implicitly sets the -fm flag
        # center="",  # implicitly sets the -center flag
        nearest_peak="peaks_refs/b.peaks", # implicitly sets the -p flag
        # tag="",  # implicitly sets the -d flag for tagDirectories
        # vcf="", # implicitly sets the -vcf flag
        # bed_graph="", # implicitly sets the -bedGraph flag
        # wig="", # implicitly sets the -wig flag
        # map="", # implicitly sets the -map flag
        # cmp_genome="", # implicitly sets the -cmpGenome flag
        # cmp_Liftover="", # implicitly sets the -cmpLiftover flag
        # advanced_annotation=""  # optional, implicitly sets the -ann flag, see http://homer.ucsd.edu/homer/ngs/advancedAnnotation.html
    output:
        annotations="{sample}_annot.txt",
        # optional output, implicitly sets the -matrix flag, requires motif_files as input
        matrix=multiext("{sample}",
                        ".count.matrix.txt",
                        ".ratio.matrix.txt",
                        ".logPvalue.matrix.txt",
                        ".stats.txt"
                        ),
        # optional output, implicitly sets the -mfasta flag, requires motif_files as input
        mfasta="{sample}_motif.fasta",
        # optional output, implicitly sets the -mbed flag, requires motif_files as input
        mbed="{sample}_motif.bed",
        # optional output, implicitly sets the -mlogic flag, requires motif_files as input
        mlogic="{sample}_motif.logic"
    threads:
        2
    params:
        mode="", # add tss, tts or rna mode and options here, e.g. "tss mm8"
        extra="-gid"  # optional params, see http://homer.ucsd.edu/homer/ngs/annotation.html
    log:
        "logs/annotatePeaks/{sample}.log"
    wrapper:
        "v0.87.0/bio/homer/annotatePeaks"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • homer==4.11
Input/Output

Input:

  • peak or BED file
  • various optional input files, e.g. gtf, bedGraph, wiggle

Output:

  • annotation file (.txt)
  • various optional output files
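
The wrapper passes a name prefix to -matrix rather than a file name, and derives it from the output that ends in .count.matrix.txt. The sketch below replays that step; the function name matrix_prefix is hypothetical, and a plain list stands in for snakemake.output.

```python
import os

EXT = ".count.matrix.txt"


def matrix_prefix(outputs):
    """Return the value passed to -matrix, derived from the count matrix output."""
    matrix_out = [o for o in outputs if o.endswith(EXT)][0]
    # As in the wrapper code, only the base name is kept
    return os.path.basename(matrix_out[: -len(EXT)])


outputs = ["a_annot.txt", "a.count.matrix.txt", "a.ratio.matrix.txt"]
print(matrix_prefix(outputs))
```

Because only the base name is kept, a directory part in the matrix output paths is not forwarded to the command in this sketch, which matches the example rule where the matrix files have no directory prefix.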
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
import os
import sys

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

genome = snakemake.input.get("genome", "")
extra = snakemake.params.get("extra", "")
motif_files = snakemake.input.get("motif_files", "")
matrix = snakemake.output.get("matrix", "")

if genome == "":
    genome = "none"

# optional files
opt_files = {
    "gtf": "-gtf",
    "gene": "-gene",
    "motif_files": "-m",
    "filter_motiv": "-fm",
    "center": "-center",
    "nearest_peak": "-p",
    "tag": "-d",
    "vcf": "-vcf",
    "bed_graph": "-bedGraph",
    "wig": "-wig",
    "map": "-map",
    "cmp_genome": "-cmpGenome",
    "cmp_Liftover": "-cmpLiftover",
    "advanced_annotation": "-ann",
    "mfasta": "-mfasta",
    "mbed": "-mbed",
    "mlogic": "-mlogic",
}

requires_motives = False
for i in opt_files:
    file = None
    if i == "mfasta" or i == "mbed" or i == "mlogic":
        file = snakemake.output.get(i, "")
        if file:
            requires_motives = True
    else:
        file = snakemake.input.get(i, "")
    if file:
        extra += " {flag} {file}".format(flag=opt_files[i], file=file)

if requires_motives and motif_files == "":
    sys.exit(
        "The optional output files require motif_file(s) as input. For more information please see http://homer.ucsd.edu/homer/ngs/annotation.html."
    )

# optional matrix output files:
if matrix:
    if motif_files == "":
        sys.exit(
            "The matrix output files require motif_file(s) as input. For more information please see http://homer.ucsd.edu/homer/ngs/annotation.html."
        )
    ext = ".count.matrix.txt"
    matrix_out = [i for i in snakemake.output if i.endswith(ext)][0]
    matrix_name = os.path.basename(matrix_out[: -len(ext)])
    extra += " -matrix {}".format(matrix_name)

shell(
    "(annotatePeaks.pl"
    " {snakemake.params.mode}"
    " {snakemake.input.peaks}"
    " {genome}"
    " {extra}"
    " -cpu {snakemake.threads}"
    " > {snakemake.output.annotations})"
    " {log}"
)
HOMER FINDPEAKS

Find ChIP- or ATAC-Seq peaks with the HOMER suite. For more information, please see the documentation.

Example

This wrapper can be used in the following way:

rule homer_findPeaks:
    input:
        # tagDirectory of sample
        tag="tagDir/{sample}",
        # tagDirectory of control background sample - optional
        control="tagDir/control"
    output:
        "{sample}_peaks.txt"
    params:
        # one of 7 basic modes of operation, see homer manual
        style="histone",
        extra=""  # optional params, see homer manual
    log:
        "logs/findPeaks/{sample}.log"
    wrapper:
        "v0.87.0/bio/homer/findPeaks"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • homer==4.11
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

control = snakemake.input.get("control", "")
if control == "":
    control_command = ""
else:
    control_command = "-i " + control

shell(
    "(findPeaks"
    " {snakemake.input.tag}"
    " -style {snakemake.params.style}"
    " {extra}"
    " {control_command}"
    " -o {snakemake.output})"
    " {log}"
)
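
How the optional control tag directory turns into the -i option can be sketched outside Snakemake. This is a minimal illustration, not the wrapper itself: the function name build_findpeaks_command is hypothetical, and the shlex quoting is an addition that the wrapper does not perform.

```python
import shlex


def build_findpeaks_command(tag, output, style, control="", extra=""):
    """Assemble a findPeaks command line similar to the one the wrapper runs."""
    parts = ["findPeaks", shlex.quote(tag), "-style", style]
    if extra:
        parts.append(extra)
    if control:
        # The background sample is passed with -i only when it is provided.
        parts += ["-i", shlex.quote(control)]
    parts += ["-o", shlex.quote(output)]
    return " ".join(parts)


print(build_findpeaks_command("tagDir/A", "A_peaks.txt", "histone", control="tagDir/control"))
```

Leaving out the control argument simply drops the -i option, which mirrors the empty control_command in the wrapper code.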
HOMER GETDIFFERENTIALPEAKS

Detect differentially bound ChIP peaks between samples. For more information, please see the documentation.

URL:

Example

This wrapper can be used in the following way:

rule homer_getDifferentialPeaks:
    input:
        # peak/bed file to be tested
        peaks="{sample}.peaks.bed",
        # tagDirectory of first sample
        first="tagDir/{sample}",
        # tagDirectory of sample to compare
        second="tagDir/second"
    output:
        "{sample}_diffPeaks.txt"
    params:
        extra=""  # optional params, see homer manual
    log:
        "logs/diffPeaks/{sample}.log"
    wrapper:
        "v0.87.0/bio/homer/getDifferentialPeaks"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • homer==4.11
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(getDifferentialPeaks"
    " {snakemake.input.peaks}"
    " {snakemake.input.first}"
    " {snakemake.input.second}"
    " {extra}"
    " > {snakemake.output})"
    " {log}"
)
HOMER MAKETAGDIRECTORY

Create a tag directory with the HOMER suite. For more information, please see the documentation.

URL:

Example

This wrapper can be used in the following way:

rule homer_makeTagDir:
    input:
        # input bam, can be one or a list of files
        bam="{sample}.bam",
    output:
        directory("tagDir/{sample}")
    params:
        extra=""  # optional params, see homer manual
    log:
        "logs/makeTagDir/{sample}.log"
    wrapper:
        "v0.87.0/bio/homer/makeTagDirectory"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • homer==4.11
  • samtools==1.10
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(makeTagDirectory" " {snakemake.output}" " {extra}" " {snakemake.input})" " {log}"
)
HOMER MERGEPEAKS

Merge ChIP-Seq peaks from multiple peak files. For more information, please see the documentation. Please be aware that this wrapper does not yet support use of the -prefix parameter.

URL:

Example

This wrapper can be used in the following way:

rule homer_mergePeaks:
    input:
        # input peak files
        "peaks/{sample1}.peaks",
        "peaks/{sample2}.peaks"
    output:
        "merged/{sample1}_{sample2}.peaks"
    params:
        extra="-d given"  # optional params, see homer manual
    log:
        "logs/mergePeaks/{sample1}_{sample2}.log"
    wrapper:
        "v0.87.0/bio/homer/mergePeaks"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • homer==4.11
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


class PrefixNotSupportedError(Exception):
    pass


if "-prefix" in extra:
    raise PrefixNotSupportedError(
        "The use of the -prefix parameter is not yet supported in this wrapper"
    )

shell("(mergePeaks" " {snakemake.input}" " {extra}" " > {snakemake.output})" " {log}")

IGV-REPORTS

Create self-contained igv.js HTML pages.

URL:

Example

This wrapper can be used in the following way:

rule igv_report:
    input:
        fasta="minigenome.fa",
        vcf="variants.vcf",
        # any number of additional optional tracks, see igv-reports manual
        tracks=["alignments.bam"]
    output:
        "igv-report.html"
    params:
        extra=""  # optional params, see igv-reports manual
    log:
        "logs/igv-report.log"
    wrapper:
        "v0.87.0/bio/igv-reports"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • igv-reports=1.0
Input/Output

Input:

  • BAM, VCF, …

Output:

  • HTML
Authors
  • Johannes Köster
Code
"""Snakemake wrapper for igv-reports."""

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

tracks = snakemake.input.get("tracks", [])
if tracks:
    if isinstance(tracks, str):
        tracks = [tracks]
    tracks = "--tracks {}".format(" ".join(tracks))

shell(
    "create_report {extra} --standalone --output {snakemake.output[0]} {snakemake.input.vcf} {snakemake.input.fasta} {tracks} {log}"
)
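
The handling of the optional tracks input (a single path or a list of paths) can be tried out without Snakemake. The sketch below is illustrative only; format_tracks is a hypothetical helper name, and returning an empty string for missing tracks is a choice made here.

```python
def format_tracks(tracks):
    """Return the --tracks argument for create_report, or "" when no tracks are given."""
    if not tracks:
        return ""
    if isinstance(tracks, str):
        # A single track may be given as a plain string instead of a list.
        tracks = [tracks]
    return "--tracks {}".format(" ".join(tracks))


print(format_tracks("alignments.bam"))
print(format_tracks(["alignments.bam", "regions.bed"]))
```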

INFERNAL

For infernal, the following wrappers are available:

INFERNAL CMPRESS

Starting from a CM database <cmfile> in standard Infernal-1.1 format, construct binary compressed datafiles for cmscan. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

URL:

Example

This wrapper can be used in the following way:

rule infernal_cmpress:
    input:
        "test-covariance-model.cm"
    output:
        "test-covariance-model.cm.i1i",
        "test-covariance-model.cm.i1f",
        "test-covariance-model.cm.i1m",
        "test-covariance-model.cm.i1p"
    log:
        "logs/cmpress.log"
    params:
        extra="",
    wrapper:
        "v0.87.0/bio/infernal/cmpress"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • infernal=1.1.2
Input/Output

Input:

  • RNA covariance models (CMs)

Output:

  • CMs prepared for use with cmscan
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Infernal CMpress"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# -F enables overwrite of old (otherwise cmpress will fail if old versions exist)
shell("cmpress -F {extra} {snakemake.input} {log}")
INFERNAL CMSCAN

cmscan is used to search sequences against collections of covariance models that have been prepared with cmpress. The output format is designed to be human-readable, but is often so voluminous that reading it is impractical, and parsing it is a pain. The --tblout option saves output in a simple tabular format that is concise and easier to parse. The -o option allows redirecting the main output, including throwing it away in /dev/null. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

URL:

Example

This wrapper can be used in the following way:

rule cmscan_profile:
    input:
        fasta="test-transcript.fa",
        profile="test-covariance-model.cm.i1i"
    output:
        tblout="tr-infernal-tblout.txt",
    log:
        "logs/cmscan.log"
    params:
        evalue_threshold=10, # In the per-target output, report target sequences with an E-value of <= <x>. default=10.0 (on average, ~10 false positives reported per query)
        extra= "",
        #score_threshold=50, # Instead of thresholding per-CM output on E-value, report target sequences with a bit score of >= <x>.
    threads: 4
    wrapper:
        "v0.87.0/bio/infernal/cmscan"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • infernal=1.1.2
Input/Output

Input:

  • sequence file
  • RNA covariance models (CMs)

Output:

  • rna alignments
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Infernal CMscan"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")
profile = profile.rsplit(".i", 1)[0]

assert profile.endswith(".cm"), 'your profile file should end with ".cm"'

# direct output to file <f>, not stdout
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of hits to file <s>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)

# report <= this evalue threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 10)  # use cmscan default
# report >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = f" -T {float(score_threshold)} "
else:
    thresh_cmd = f" -E {float(evalue_threshold)} "

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cmscan {out_cmd} {thresh_cmd} {extra} --cpu {snakemake.threads} {profile} {snakemake.input.fasta} {log}"
)
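
Two details of the code above are easy to check in isolation: the choice between the E-value and bit score thresholds, and the recovery of the .cm base name from one of the cmpress index files. The sketch below assumes hypothetical helper names (threshold_args, profile_basename) and raises ValueError where the wrapper uses an assert.

```python
def threshold_args(evalue_threshold=10, score_threshold=None):
    """Return the reporting threshold option; a bit score takes precedence."""
    if score_threshold:
        return f"-T {float(score_threshold)}"
    return f"-E {float(evalue_threshold)}"


def profile_basename(index_file):
    """Strip the cmpress suffix (for example .i1i) to recover the .cm file name."""
    base = index_file.rsplit(".i", 1)[0]
    if not base.endswith(".cm"):
        raise ValueError('your profile file should end with ".cm"')
    return base


print(threshold_args())
print(threshold_args(score_threshold=50))
print(profile_basename("test-covariance-model.cm.i1i"))
```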

JANNOVAR

Annotate predicted effect of nucleotide changes with Jannovar (https://doc-openbio.readthedocs.io/projects/jannovar/en/master/).

URL:

Example

This wrapper can be used in the following way:

rule jannovar:
    input:
        vcf="{sample}.vcf",
        pedigree="pedigree_ar.ped" # optional, contains familial relationships
    output:
        "jannovar/{sample}.vcf.gz"
    log:
        "logs/jannovar/{sample}.log"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    params:
        database="hg19_small.ser", # path to jannovar reference dataset
        extra="--show-all"         # optional parameters
    wrapper:
        "v0.87.0/bio/jannovar"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • jannovar-cli==0.31
  • snakemake-wrapper-utils==0.1.3
Authors
  • Bradford Powell
Code
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

pedigree = snakemake.input.get("pedigree", "")
if pedigree:
    pedigree = '--pedigree-file "%s"' % pedigree

shell(
    "jannovar annotate-vcf --database {snakemake.params.database}"
    " --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
    " {pedigree} {extra} {java_opts} {log}"
)

JELLYFISH

For jellyfish, the following wrappers are available:

JELLYFISH_COUNT

Count k-mers in a FASTA file using jellyfish.

URL: https://github.com/gmarcais/Jellyfish

Example

This wrapper can be used in the following way:

rule jellyfish_count:
    input:
        "{prefix}.fasta",
    output:
        "{prefix}.jf",
    log:
        "{prefix}.jf.log",
    params:
        kmer_length=21,
        size="1G",
        extra="--canonical",
    threads: 2
    wrapper:
        "v0.87.0/bio/jellyfish/count"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • kmer-jellyfish==2.3
Input/Output

Input:

  • sequence FASTA file

Output:

  • kmer count jf file
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    """
    (jellyfish count \
        {extra} \
        --mer-len={snakemake.params.kmer_length} \
        --size={snakemake.params.size} \
        --threads={snakemake.threads} \
        --output={snakemake.output} \
        {snakemake.input}) {log}
    """
)
JELLYFISH_DUMP

Dump kmers from jellyfish database

URL: https://github.com/gmarcais/Jellyfish

Example

This wrapper can be used in the following way:

rule jellyfish_dump:
    input:
        "{prefix}.jf",
    output:
        "{prefix}.dump",
    log:
        "{prefix}.log",
    params:
        extra="-c -t",
    wrapper:
        "v0.87.0/bio/jellyfish/dump"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • kmer-jellyfish==2.3
Input/Output

Input:

  • kmer count jf file

Output:

  • dump of kmer counts
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell("(jellyfish dump {extra} -o {snakemake.output} {snakemake.input}) {log}")
JELLYFISH_HISTO

Export histogram of kmer counts.

URL: https://github.com/gmarcais/Jellyfish

Example

This wrapper can be used in the following way:

rule jellyfish_histo:
    input:
        "{prefix}.jf",
    output:
        "{prefix}.histo",
    log:
        "{prefix}.log",
    threads: 2
    wrapper:
        "v0.87.0/bio/jellyfish/histo"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • kmer-jellyfish==2.3
Input/Output

Input:

  • kmer count jf file

Output:

  • kmer histogram file
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell(
    """
    (jellyfish histo \
        {extra} \
        --threads={snakemake.threads} \
        {snakemake.input} > {snakemake.output}) {log}
    """
)
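
This wrapper logs only stderr (stdout=False) because the histogram itself is written to stdout and redirected into the output file. The sketch below is a simplified stand-in for snakemake.log_fmt_shell, written under the assumption that the helper maps its flags to shell redirections in this way; it is not the library code.

```python
def log_fmt_shell(log, stdout=True, stderr=True, append=False):
    """Simplified stand-in for snakemake.log_fmt_shell (assumed behaviour)."""
    if not log or not (stdout or stderr):
        return ""
    op = ">>" if append else ">"
    if stdout and stderr:
        return f"{op} {log} 2>&1"
    if stdout:
        return f"{op} {log}"
    return f"2{op} {log}"


# stdout already goes to the output file, so only stderr is sent to the log.
command = "(jellyfish histo a.jf > a.histo) " + log_fmt_shell("a.log", stdout=False)
print(command)
```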
JELLYFISH_MERGE

Merge jellyfish databases.

URL: https://github.com/gmarcais/Jellyfish

Example

This wrapper can be used in the following way:

rule jellyfish_merge:
    input:
        "a.jf",
        "b.jf",
    output:
        "ab.jf",
    log:
        "ab.jf.log",
    wrapper:
        "v0.87.0/bio/jellyfish/merge"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • kmer-jellyfish==2.3
Input/Output

Input:

  • multiple jf kmer count files

Output:

  • merged jf kmer count file
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell("(jellyfish merge {extra} -o {snakemake.output} {snakemake.input}) {log}")

KALLISTO

For kallisto, the following wrappers are available:

KALLISTO INDEX

Index a transcriptome using kallisto.

URL:

Example

This wrapper can be used in the following way:

rule kallisto_index:
    input:
        fasta = "{transcriptome}.fasta"
    output:
        index = "{transcriptome}.idx"
    params:
        extra = "--kmer-size=5"
    log:
        "logs/kallisto_index_{transcriptome}.log"
    threads: 1
    wrapper:
        "v0.87.0/bio/kallisto/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • kallisto==0.45.0
Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for Kallisto index"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file is required"
fasta = " ".join(fasta) if isinstance(fasta, list) else fasta

shell(
    "kallisto index "  # Tool
    "{extra} "  # Optional parameters
    "--index={snakemake.output.index} "  # Output file
    "{fasta} "  # Input FASTA files
    "{log}"  # Logging
)
KALLISTO QUANT

Pseudoalign reads and quantify transcripts using kallisto.

URL:

Example

This wrapper can be used in the following way:

rule kallisto_quant:
    input:
        fastq = ["reads/{exp}_R1.fastq", "reads/{exp}_R2.fastq"],
        index = "index/transcriptome.idx"
    output:
        directory('quant_results_{exp}')
    params:
        extra = ""
    log:
        "logs/kallisto_quant_{exp}.log"
    threads: 1
    wrapper:
        "v0.87.0/bio/kallisto/quant"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • kallisto==0.45.0
Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for Kallisto quant"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Allowing for multiple FASTQ files
fastq = snakemake.input.get("fastq")
assert fastq is not None, "input-> a FASTQ-file is required"
fastq = " ".join(fastq) if isinstance(fastq, list) else fastq

shell(
    "kallisto quant "  # Tool
    "{extra} "  # Optional parameters
    "--threads={snakemake.threads} "  # Number of threads
    "--index={snakemake.input.index} "  # Input file
    "--output-dir={snakemake.output} "  # Output directory
    "{fastq} "  # Input FASTQ files
    "{log}"  # Logging
)

LAST

For last, the following wrappers are available:

LASTAL

LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)

URL:

Example

This wrapper can be used in the following way:

rule lastal_nucl_x_nucl:
    input:
        data="test-transcript.fa",
        lastdb="test-transcript.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-transcript.maf",
        #tab="test-transcript.tab",
        #blasttab="test-transcript.blasttab",
        #blasttabplus="test-transcript.blasttabplus",
    params:
        #Report alignments that are expected by chance at most once per LENGTH query letters. By default, LAST reports alignments that are expected by chance at most once per million query letters (for a given database). http://last.cbrc.jp/doc/last-evalues.html
        D_length=1000000,
        extra=""
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "v0.87.0/bio/last/lastal"

rule lastal_nucl_x_prot:
    input:
        data="test-transcript.fa",
        lastdb="test-protein.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-tr-x-prot.maf"
        #tab="test-tr-x-prot.tab",
        #blasttab="test-tr-x-prot.blasttab",
        #blasttabplus="test-tr-x-prot.blasttabplus",
    params:
        frameshift_cost=15, #Align DNA queries to protein reference sequences using specified frameshift cost. 15 is reasonable. Special case, -F0 means DNA-versus-protein alignment without frameshifts, which is faster.
        extra="",
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "v0.87.0/bio/last/lastal"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • last=874
Authors
  • N. Tessa Pierce
Code
""" Snakemake wrapper for lastal """

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# http://last.cbrc.jp/doc/last-evalues.html
d_len = float(snakemake.params.get("D_length", 1000000))  # last default

# set output file formats
maf_out = snakemake.output.get("maf", "")
tab_out = snakemake.output.get("tab", "")
btab_out = snakemake.output.get("blasttab", "")
btabplus_out = snakemake.output.get("blasttabplus", "")
outfiles = [maf_out, tab_out, btab_out, btabplus_out]
# TAB, MAF, BlastTab, BlastTab+ (default=MAF)
assert (
    list(map(bool, outfiles)).count(True) == 1
), "please specify ONE output file using one of: 'maf', 'tab', 'blasttab', or 'blasttabplus' keywords in the output field)"

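out_cmd = ""

# Map the single requested output keyword to the lastal -f output format.
if maf_out:
    out_cmd = "-f {}".format("MAF")
    outF = maf_out
elif tab_out:
    out_cmd = "-f {}".format("TAB")
    outF = tab_out
elif btab_out:
    out_cmd = "-f {}".format("BlastTab")
    outF = btab_out
elif btabplus_out:
    out_cmd = "-f {}".format("BlastTab+")
    outF = btabplus_out

# Optional frameshift cost for DNA-versus-protein alignment; empty when unset.
# Comparing against "" keeps the special case frameshift_cost=0 (-F 0) usable.
f_cmd = ""
frameshift_cost = snakemake.params.get("frameshift_cost", "")
if frameshift_cost != "":
    f_cmd = f"-F {frameshift_cost}"


lastdb_name = str(snakemake.input["lastdb"]).rsplit(".", 1)[0]

# Pass the output format and frameshift options on to lastal.
shell(
    "lastal -D {d_len} -P {snakemake.threads} {out_cmd} {f_cmd} {extra} {lastdb_name} {snakemake.input.data} > {outF} {log}"
)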
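
The rule that exactly one output keyword may be given, and the mapping from that keyword to a -f value, can be expressed as a small lookup table. This is a sketch under assumptions: select_output is a hypothetical helper, and a plain dict stands in for snakemake.output.

```python
# Output keyword -> lastal output format name, as listed in the wrapper code.
FORMATS = {
    "maf": "MAF",
    "tab": "TAB",
    "blasttab": "BlastTab",
    "blasttabplus": "BlastTab+",
}


def select_output(outputs):
    """Return (format option, output path) for the single requested output."""
    chosen = [(key, path) for key, path in outputs.items() if key in FORMATS and path]
    if len(chosen) != 1:
        raise ValueError(
            "please specify ONE output file using one of: " + ", ".join(FORMATS)
        )
    key, path = chosen[0]
    return "-f {}".format(FORMATS[key]), path


print(select_output({"tab": "test-transcript.tab"}))
```

A table keeps the keyword check and the format selection in one place, so adding a further format would only need one new entry.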
LASTDB

LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)

URL:

Example

This wrapper can be used in the following way:

rule lastdb_transcript:
    input:
        "test-transcript.fa"
    output:
        "test-transcript.fa.prj",
    params:
        protein_input=False,
        extra=""
    log:
        "logs/lastdb/test-transcript.log"
    wrapper:
        "v0.87.0/bio/last/lastdb"

rule lastdb_protein:
    input:
        "test-protein.fa"
    output:
        "test-protein.fa.prj",
    params:
        protein_input=True,
        extra=""
    log:
        "logs/lastdb/test-protein.log"
    wrapper:
        "v0.87.0/bio/last/lastdb"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • last=874
Authors
  • N. Tessa Pierce
Code
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

protein_cmd = ""
protein = snakemake.params.get("protein_input", False)

if protein:
    protein_cmd = " -p "

shell("lastdb {extra} {protein_cmd} -P {snakemake.threads} {snakemake.input} {log}")

LIFTOFF

Lift features from one genome assembly to another (https://github.com/agshumate/Liftoff)

URL:

Example

This wrapper can be used in the following way:

rule liftoff:
    input:
        ref="{ref}.fasta.gz",
        tgt="{tgt}.fasta.gz",
        ann="{ann}.gff.gz",
    output:
        main="{ref}_{ann}_{tgt}.gff3",
        unmapped="{ref}_{ann}_{tgt}.unmapped.txt",
    message:
        "Testing liftoff"
    threads: 1
    params:
        extra="",
    log:
        "logs/liftoff_{ref}_{ann}_{tgt}.log",
    wrapper:
        "v0.87.0/bio/liftoff"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • liftoff=1.6
Input/Output

Input:

  • A fasta formatted reference genome file
  • A fasta formatted target genome file
  • A GFF/GTF formatted annotations file

Output:

  • A GFF formatted file containing the mapped annotations
  • A GFF formatted file containing the unmapped annotations
Authors
  • Tomás Di Domenico
Code
"""Snakemake wrapper for liftoff"""

__author__ = "Tomás Di Domenico"
__copyright__ = "Copyright 2021, Tomás Di Domenico"
__email__ = "tdido@tdido.ar"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "liftoff "  # tool
    "-g {snakemake.input.ann} "  # annotation file to lift over in GFF or GTF format
    "-o {snakemake.output.main} "  # main output
    "-u {snakemake.output.unmapped} "  # unmapped output
    "{extra} "  # optional parameters
    "{snakemake.input.tgt} "  # target fasta genome to lift genes to
    "{snakemake.input.ref} "  # reference fasta genome to lift genes from
    "{log}"  # Logging
)

LOFREQ

For lofreq, the following wrappers are available:

LOFREQ CALL

Call variants with LoFreq (the wrapper runs lofreq call-parallel).

URL:

Example

This wrapper can be used in the following way:

rule lofreq:
    input:
        bam="data/{sample}.bam",
        bai="data/{sample}.bai"
    output:
        "calls/{sample}.vcf"
    log:
        "logs/lofreq_call/{sample}.log"
    params:
        ref="data/genome.fasta",
        extra=""
    threads: 8
    wrapper:
        "v0.87.0/bio/lofreq/call"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.6
  • lofreq==2.1.3.1
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
ref = snakemake.params.get("ref", None)

if ref is None:
    raise ValueError("A reference must be provided")

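# Use .get() so that a missing input yields None and reaches the checks below
bam_input = snakemake.input.get("bam", None)
bai_input = snakemake.input.get("bai", None)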

if bam_input is None:
    raise ValueError("Missing bam input file!")

if bai_input is None:
    raise ValueError("Missing bai input file!")

output_file = snakemake.output[0]

if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(output_file) + "!")

shell(
    "lofreq call-parallel "
    " --pp-threads {snakemake.threads}"
    " -f {ref}"
    " {bam_input}"
    " -o {output_file}"
    " {extra}"
    " {log}"
)

MACS2

For macs2, the following wrappers are available:

MACS2 CALLPEAK

MACS2 callpeak is a model-based analysis tool for ChIP-sequencing that calls peaks from alignment results. For usage information about MACS2 callpeak, please see the documentation and the command line help. For more information about MACS2, also see the source code and published article. Depending on the selected output extension(s), the corresponding option(s) will be set automatically (please see the table below). Please note that some extensions are incompatible with each other, because they require the --broad option to be either enabled or disabled.

Extension for the output files | Description                                                                       | Format   | Option
NAME_peaks.xls                 | a table with information about called peaks                                       | excel    |
NAME_control_lambda.bdg        | local biases estimated for each genomic location from the control sample          | bedGraph | --bdg or -B
NAME_treat_pileup.bdg          | pileup signals from treatment sample                                              | bedGraph | --bdg or -B
NAME_peaks.broadPeak           | similar to _peaks.narrowPeak file, except for missing the annotating peak summits | BED 6+3  | --broad
NAME_peaks.gappedPeak          | contains the broad region and narrow peaks                                        | BED 12+3 | --broad
NAME_peaks.narrowPeak          | contains the peak locations, peak summit, p-value and q-value                     | BED 6+4  | if --broad is not set
NAME_summits.bed               | peak summits locations for every peak                                             | BED      | if --broad is not set
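
The mapping in the table can be sketched as a small standalone function. This is an illustrative simplification rather than the wrapper's own code: the name `infer_macs2_options` and its return format are assumptions made for this example, and only the extension-to-option rules are taken from the table.

```python
# Extensions grouped by the option they imply (taken from the table above).
BROAD_EXTS = ("_peaks.broadPeak", "_peaks.gappedPeak")
NARROW_EXTS = ("_peaks.narrowPeak", "_summits.bed")
BDG_EXTS = ("_treat_pileup.bdg", "_control_lambda.bdg")


def infer_macs2_options(outputs):
    """Return the options implied by the requested output files.

    Hypothetical helper for illustration; `outputs` is a plain list of paths.
    """
    wants_broad = any(o.endswith(BROAD_EXTS) for o in outputs)
    wants_narrow = any(o.endswith(NARROW_EXTS) for o in outputs)
    wants_bdg = any(o.endswith(BDG_EXTS) for o in outputs)

    # Narrow and broad outputs need --broad off and on respectively.
    if wants_broad and wants_narrow:
        raise ValueError(
            "narrowPeak/summits outputs cannot be combined with "
            "broadPeak/gappedPeak outputs"
        )

    options = []
    if wants_broad:
        options.append("--broad")
    if wants_bdg:
        options.append("--bdg")
    return options


print(infer_macs2_options(["callpeak/basename_peaks.xls",
                           "callpeak/basename_peaks.narrowPeak"]))
print(infer_macs2_options(["out/x_peaks.xls",
                           "out/x_treat_pileup.bdg",
                           "out/x_peaks.broadPeak"]))
```

The first call yields an empty list because narrow-peak outputs need no extra option; the second yields both --broad and --bdg.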

URL:

Example

This wrapper can be used in the following way:

rule callpeak:
    input:
        treatment="samples/a.bam",   # required: treatment sample(s)
        control="samples/b.bam"      # optional: control sample(s)
    output:
        # all output files must share the same basename and differ only by their extension
        # Usable extensions (and which tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
        multiext("callpeak/basename",
                 "_peaks.xls",   ### required
                 ### optional output files
                 "_peaks.narrowPeak",
                 "_summits.bed"
                 )
    log:
        "logs/macs2/callpeak.log"
    params:
        "-f BAM -g hs --nomodel"
    wrapper:
        "v0.87.0/bio/macs2/callpeak"

rule callpeak_options:
    input:
        treatment="samples/a.bam",   # required: treatment sample(s)
        control="samples/b.bam"      # optional: control sample(s)
    output:
        # all output files must share the same basename and differ only by their extension
        # Usable extensions (and which tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
        multiext("callpeak_options/basename",
                 "_peaks.xls",   ### required
                 ### optional output files
                 # these output extensions internally set the --bdg or -B option:
                 "_treat_pileup.bdg",
                 "_control_lambda.bdg",
                 # these output extensions internally set the --broad option:
                 "_peaks.broadPeak",
                 "_peaks.gappedPeak"
                 )
    log:
        "logs/macs2/callpeak.log"
    params:
        "-f BAM -g hs --broad-cutoff 0.1 --nomodel"
    wrapper:
        "v0.87.0/bio/macs2/callpeak"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • macs2>=2.2
Input/Output

Input:

  • SAM, BAM, BED, ELAND, ELANDMULTI, ELANDEXPORT, BOWTIE, BAMPE or BEDPE files

Output:

  • tabular file in excel format (.xls) AND
  • different optional metrics in bedGraph or BED formats
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import os
import sys
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

in_contr = snakemake.input.get("control")
params = "{}".format(snakemake.params)
opt_input = ""
out_dir = ""

ext = "_peaks.xls"
out_file = [o for o in snakemake.output if o.endswith(ext)][0]
out_name = os.path.basename(out_file[: -len(ext)])
out_dir = os.path.dirname(out_file)

if in_contr:
    opt_input = "-c {contr}".format(contr=in_contr)

if out_dir:
    out_dir = "--outdir {dir}".format(dir=out_dir)

if any(out.endswith(("_peaks.narrowPeak", "_summits.bed")) for out in snakemake.output):
    if any(
        out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak"))
        for out in snakemake.output
    ):
        sys.exit(
            "Output files with _peaks.narrowPeak and/or _summits.bed extensions cannot be created together with _peaks.broadPeak and/or _peaks.gappedPeak extended output files.\n"
            "For usable extensions please see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.\n"
        )
    else:
        if " --broad" in params:
            sys.exit(
                "If --broad option in params is given, the _peaks.narrowPeak and _summits.bed files will not be created. \n"
                "Remove --broad option from params if these files are needed.\n"
            )

if any(
    out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak")) for out in snakemake.output
):
    if "--broad " not in params and not params.endswith("--broad"):
        params += " --broad "

if any(
    out.endswith(("_treat_pileup.bdg", "_control_lambda.bdg"))
    for out in snakemake.output
):
    if all(p not in params for p in ["--bdg", "-B"]):
        params += " --bdg "
else:
    if any(p in params for p in ["--bdg", "-B"]):
        sys.exit(
            "If --bdg or -B option in params is given, the _control_lambda.bdg and _treat_pileup.bdg extended files must be specified in output. \n"
        )

shell(
    "(macs2 callpeak "
    "-t {snakemake.input.treatment} "
    "{opt_input} "
    "{out_dir} "
    "-n {out_name} "
    "{params}) {log}"
)
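
The way the code above derives the `-n` name and the `--outdir` directory from the required `_peaks.xls` output can be tried outside Snakemake. The sketch below repeats those steps on a plain list of paths. The function name `split_macs2_prefix` is hypothetical, and unlike the wrapper it raises a `ValueError` with a message when no `_peaks.xls` file is listed.

```python
import os


def split_macs2_prefix(outputs, ext="_peaks.xls"):
    """Return (output directory, basename) derived from the *_peaks.xls path."""
    matches = [o for o in outputs if o.endswith(ext)]
    if not matches:
        raise ValueError("an output ending in {} is required".format(ext))
    out_file = matches[0]
    # Strip the extension, then separate directory and basename.
    out_name = os.path.basename(out_file[: -len(ext)])
    out_dir = os.path.dirname(out_file)
    return out_dir, out_name


print(split_macs2_prefix(["callpeak/basename_peaks.xls",
                          "callpeak/basename_summits.bed"]))
```

For the example rule this gives the directory `callpeak` and the name `basename`, which is why all outputs must share that prefix.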

MANTA

Call structural variants with manta.

URL:

Example

This wrapper can be used in the following way:

rule manta:
    input:
        ref="human_g1k_v37_decoy.small.fasta",
        samples=["mapped/a.bam"],
        index=["mapped/a.bam.bai"],
        bed="test.bed.gz",  # optional
    output:
        vcf="results/out.bcf",
        idx="results/out.bcf.csi",
        cand_indel_vcf="results/small_indels.vcf.gz",
        cand_indel_idx="results/small_indels.vcf.gz.tbi",
        cand_sv_vcf="results/cand_sv.vcf.gz",
        cand_sv_idx="results/cand_sv.vcf.gz.tbi",
    params:
        extra_cfg="",  # optional
        extra_run="",  # optional
    log:
        "logs/manta.log",
    threads: 2
    resources:
        mem_mb=4096,
    wrapper:
        "v0.87.0/bio/manta"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • manta=1.6
  • bcftools=1.14
Input/Output

Input:

  • BAM/CRAM file(s)
  • reference genome
  • BED file (optional)

Output:

  • SVs and indels scored and genotyped under a diploid model (diploidSV.vcf.gz).
  • Unfiltered SV and indel candidates (candidateSV.vcf.gz).
  • Subset of the previous file containing only simple insertion and deletion variants less than the minimum scored variant size (candidateSmallIndels.vcf.gz).
Notes
  • The extra_cfg param allows for additional program arguments to configManta.py.
  • The extra_run param allows for additional program arguments to runWorkflow.py.
  • The runDir is created using Python's tempfile module, meaning that all intermediate files are deleted on job completion.
  • For more information, see https://github.com/Illumina/manta
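
The cleanup behaviour mentioned in the notes comes from `tempfile.TemporaryDirectory`, which removes the directory and everything in it when the `with` block exits. A minimal standalone sketch follows; the file name and the `demo_cleanup` function are arbitrary and only for illustration.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def demo_cleanup():
    """Show that files in a TemporaryDirectory vanish after the with block."""
    with TemporaryDirectory() as tempdir:
        run_dir = Path(tempdir) / "runDir"
        run_dir.mkdir()
        marker = run_dir / "intermediate.txt"
        marker.write_text("scratch data\n")
        existed_inside = marker.exists()
    # The whole temporary tree has been removed at this point.
    existed_after = run_dir.exists()
    return existed_inside, existed_after


print(demo_cleanup())  # (True, False)
```

Anything that should survive the job therefore has to be copied out before the block ends.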
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


import math
from snakemake.shell import shell
from pathlib import Path
from tempfile import TemporaryDirectory


extra_cfg = snakemake.params.get("extra_cfg", "")
extra_run = snakemake.params.get("extra_run", "")

bed = snakemake.input.get("bed", "")
if bed:
    bed = f"--callRegions {bed}"


mem_gb = snakemake.resources.get("mem_gb", "")
if not mem_gb:
    # 20 Gb of mem by default
    mem_gb = math.ceil(snakemake.resources.get("mem_mb", 20480) / 1024)

log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)


with TemporaryDirectory() as tempdir:
    tempdir = Path(tempdir)
    run_dir = tempdir / "runDir"
    bams = []

    # Symlink BAM/CRAM files to avoid problems with filenames
    for aln, idx in zip(snakemake.input.samples, snakemake.input.index):
        aln = Path(aln)
        idx = Path(idx)
        (tempdir / aln.name).symlink_to(aln.resolve())
        bams.append(tempdir / aln.name)

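        # Index already named <alignment>.bai / <alignment>.crai: link it as-is
        if idx.name.endswith(".bam.bai") or idx.name.endswith(".cram.crai"):
            (tempdir / idx.name).symlink_to(idx.resolve())
        # Otherwise rename, e.g. a.bai -> a.bam.bai, so it matches the alignment
        elif idx.name.endswith(".bai"):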
            (tempdir / idx.name).with_suffix(".bam.bai").symlink_to(idx.resolve())
        elif idx.name.endswith(".crai"):
            (tempdir / idx.name).with_suffix(".cram.crai").symlink_to(idx.resolve())
        else:
            raise ValueError(f"invalid index file name provided: {idx}")

    bams = list(map("--normalBam {}".format, bams))

    shell(
        # Configure Manta
        "configManta.py {extra_cfg} {bams} --referenceFasta {snakemake.input.ref} {bed} --runDir {run_dir} {log}; "
        # Run Manta
        "python2 {run_dir}/runWorkflow.py {extra_run} --jobs {snakemake.threads} --memGb {mem_gb} {log}; "
    )

    # Copy outputs into proper position.
    def infer_vcf_ext(vcf):
        if vcf.endswith(".vcf.gz"):
            return "z"
        elif vcf.endswith(".bcf"):
            return "b"
        else:
            raise ValueError(
                "invalid VCF extension. Only '.vcf.gz' and '.bcf' are supported."
            )

    def copy_vcf(origin_vcf, dest_vcf, dest_idx):
        if dest_vcf and dest_vcf != origin_vcf:
            dest_vcf_format = infer_vcf_ext(dest_vcf)
            shell(
                "bcftools view --threads {snakemake.threads} --output {dest_vcf:q} --output-type {dest_vcf_format} {origin_vcf:q} {log}"
            )

            origin_idx = str(origin_vcf) + ".tbi"
            if dest_idx and dest_idx != origin_idx:
                shell(
                    "bcftools index --threads {snakemake.threads} --output {dest_idx:q} {dest_vcf:q} {log}"
                )

    results_base = run_dir / "results" / "variants"

    # Copy main VCF output
    vcf_temp = results_base / "diploidSV.vcf.gz"
    vcf_final = snakemake.output.get("vcf")
    idx_final = snakemake.output.get("idx")
    copy_vcf(vcf_temp, vcf_final, idx_final)

    # Copy candidate small indels VCF
    cand_indel_vcf_temp = results_base / "candidateSmallIndels.vcf.gz"
    cand_indel_vcf_final = snakemake.output.get("cand_indel_vcf")
    cand_indel_idx_final = snakemake.output.get("cand_indel_idx")
    copy_vcf(cand_indel_vcf_temp, cand_indel_vcf_final, cand_indel_idx_final)

    # Copy candidates structural variants VCF
    cand_sv_vcf_temp = results_base / "candidateSV.vcf.gz"
    cand_sv_vcf_final = snakemake.output.get("cand_sv_vcf")
    cand_sv_idx_final = snakemake.output.get("cand_sv_idx")
    copy_vcf(cand_sv_vcf_temp, cand_sv_vcf_final, cand_sv_idx_final)
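
The memory handling near the top of the code above can be exercised in isolation. In this sketch a plain dict stands in for `snakemake.resources`, and `resolve_mem_gb` is a hypothetical name; the logic mirrors the listing: use `mem_gb` if given, otherwise round `mem_mb` up to whole gigabytes, with 20480 MB as the fallback.

```python
import math


def resolve_mem_gb(resources):
    """Return the value passed to --memGb for a dict of rule resources."""
    mem_gb = resources.get("mem_gb", "")
    if not mem_gb:
        # Fall back to mem_mb (default 20480 MB), rounded up to whole GB.
        mem_gb = math.ceil(resources.get("mem_mb", 20480) / 1024)
    return mem_gb


print(resolve_mem_gb({"mem_mb": 4096}))  # 4
print(resolve_mem_gb({}))                # 20
print(resolve_mem_gb({"mem_mb": 5000}))  # 5
```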

MAPDAMAGE2

Track and quantify damage patterns in ancient DNA sequences. For more information about MapDamage2, see the MapDamage2 documentation.

URL:

Example

This wrapper can be used in the following way:

rule mapdamage2:
    input:
        ref="genome.fasta",
        bam="mapped/{sample}.bam",
    output:
        log="results/{sample}/Runtime_log.txt",  # output folder is inferred from this file, so it needs to be the same folder for all output files
        GtoA3p="results/{sample}/3pGtoA_freq.txt",
        CtoT5p="results/{sample}/5pCtoT_freq.txt",
        dnacomp="results/{sample}/dnacomp.txt",
        frag_misincorp="results/{sample}/Fragmisincorporation_plot.pdf",
        len="results/{sample}/Length_plot.pdf",
        lg_dist="results/{sample}/lgdistribution.txt",
        misincorp="results/{sample}/misincorporation.txt",
#        rescaled_bam="results/{sample}.rescaled.bam", # uncomment if you want the rescaled BAM file
    params:
        extra="--no-stats"  # optional parameters for mapdamage2 (except -i, -r, -d, --rescale)
    log:
        "logs/{sample}/mapdamage2.log"
    threads: 1  # MapDamage2 is not threaded
    wrapper:
        "v0.87.0/bio/mapdamage2"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • mapdamage2=2.2
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"

import os.path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

in_bam = snakemake.input.get("bam", "")
if in_bam:
    in_bam = "--input " + in_bam

output_folder = os.path.dirname(snakemake.output.get("log", ""))
if not output_folder:
    raise ValueError("mapDamage2 rule needs output 'log'.")

rescaled_bam = snakemake.output.get("rescaled_bam", "")
if rescaled_bam:
    rescaled_bam = "--rescale-out " + rescaled_bam


shell(
    "mapDamage "
    "{in_bam} "
    "--reference {snakemake.input.ref} "
    "--folder {output_folder} "
    "{rescaled_bam} "
    "{extra} "
    "{log}"
)
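
Because the output folder is taken from the directory of the `log` output, the remaining outputs need to live in that same directory. The check below is a hypothetical pre-flight helper, not part of the wrapper. A plain dict stands in for `snakemake.output`, and the optional `rescaled_bam` is skipped because it is passed separately via `--rescale-out`.

```python
import os.path


def check_same_folder(outputs, skip=("rescaled_bam",)):
    """Return the mapDamage output folder, or raise if outputs are scattered."""
    folder = os.path.dirname(outputs.get("log", ""))
    if not folder:
        raise ValueError("mapDamage2 rule needs output 'log'.")
    # Collect every named output that is not in the folder of the log file.
    stray = sorted(
        name
        for name, out_path in outputs.items()
        if name not in skip and os.path.dirname(out_path) != folder
    )
    if stray:
        raise ValueError("outputs outside {}: {}".format(folder, stray))
    return folder


print(check_same_folder({
    "log": "results/s1/Runtime_log.txt",
    "misincorp": "results/s1/misincorporation.txt",
    "rescaled_bam": "results/s1.rescaled.bam",
}))
```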

MICROPHASER

For microphaser, the following wrappers are available:

MICROPHASER BUILD_REFERENCE

Create a reference of all normal peptides in a sample

URL:

Example

This wrapper can be used in the following way:

rule microphaser_build:
    input:
        # all normal peptides from the complete proteome as nucleotide sequences
        ref_peptides="germline/peptides.fasta",
    output:
        # a binary of the normal peptides amino acid sequences
        bin="out/peptides.bin",
        # the amino acid sequences in FASTA format
        peptides="out/peptides.fasta",
    log:
        "logs/microphaser/build_reference.log"
    params:
        extra="--peptide-length 9",  # optional, desired peptide length in amino acids.
    wrapper:
        "v0.87.0/bio/microphaser/build_reference"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • microphaser=0.4
Input/Output

Input:

  • peptide reference (nucleotide sequences from microphaser germline)

Output:

  • peptide reference in amino acid FASTA format
  • binary peptide reference for filtering
Notes
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "microphaser build_reference "
    "{extra} "
    "--reference {snakemake.input.ref_peptides} "
    "--output {snakemake.output.bin} "
    "> {snakemake.output.peptides} "
    "{log}"
)
MICROPHASER FILTER

Translate and filter neopeptides from microphaser output

URL:

Example

This wrapper can be used in the following way:

rule microphaser_filter:
    input:
        # the info file of the tumor sample to filter
        tsv="somatic/info.tsv",
        # All normal peptides to filter against
        ref_peptides="germline/peptides.bin",
    output:
        # the filtered neopeptides
        tumor="out/peptides.mt.fasta",
        # the normal peptides matching the filtered neopeptides
        normal="out/peptides.wt.fasta",
        # the info data of the filtered neopeptides
        tsv="out/peptides.info.tsv",
        # the info data of the removed neopeptides
        removed_tsv="out/peptides.removed.tsv",
        # the removed neopeptides
        removed_fasta="out/peptides.removed.fasta",
    log:
        "logs/microphaser/filter.log",
    params:
        extra="--peptide-length 9",  # optional, desired peptide length in amino acids.
    wrapper:
        "v0.87.0/bio/microphaser/filter"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • microphaser=0.4
Input/Output

Input:

  • neopeptides fasta (nucleotide sequences from microphaser somatic)
  • information tsv (from microphaser somatic)
  • sample-specific normal/wildtype peptides (binary created using microphaser build)

Output:

  • filtered neopeptides (removed self-identical peptides) in amino acid FASTA format
  • corresponding normal peptides in amino acid FASTA format
  • filtered information tsv
  • self-identical peptides removed from the neopeptide set (tsv)
Notes
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "microphaser filter "
    "{extra} "
    "--tsv {snakemake.input.tsv} "
    "--reference {snakemake.input.ref_peptides} "
    "--normal-output {snakemake.output.normal} "
    "--tsv-output {snakemake.output.tsv} "
    "--similar-removed {snakemake.output.removed_tsv} "
    "--removed-peptides {snakemake.output.removed_fasta} "
    " > {snakemake.output.tumor} "
    "{log}"
)
MICROPHASER NORMAL

Predict sample-specific normal peptides with integrated germline variants from NGS (whole exome/genome) data

URL:

Example

This wrapper can be used in the following way:

rule microphaser_normal:
    input:
        bam="mapped/{sample}.sorted.bam",
        index="mapped/{sample}.sorted.bam.bai",
        ref="genome.fasta",
        annotation="genome.gtf",
        variants="calls/{sample}.bcf",
    output:
        # all peptides from the healthy proteome
        peptides="out/{sample}.fasta",
        tsv="out/{sample}.tsv",
    log:
        "logs/microphaser/somatic/{sample}.log",
    params:
        extra="--window-len 9",  # optional, desired peptide length in nucleotide bases, e.g. 27 (9 AA) for MHC-I ligands.
    wrapper:
        "v0.87.0/bio/microphaser/normal"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • microphaser=0.3
Input/Output

Input:

  • bam file
  • bcf file
  • fasta reference
  • gtf annotation file

Output:

  • sample-specific peptide fasta (nucleotide sequences)
Notes
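
The comment on `extra` in the example rule states that `--window-len` is given in nucleotide bases, so a peptide length in amino acids has to be multiplied by three (one codon per residue). A minimal sketch of that conversion, with `window_len_for_peptide` as a hypothetical helper name:

```python
def window_len_for_peptide(aa_length):
    """Convert a peptide length in amino acids to a window length in bases."""
    if aa_length < 1:
        raise ValueError("peptide length must be a positive integer")
    return 3 * aa_length  # one codon (3 bases) per amino acid


extra = "--window-len {}".format(window_len_for_peptide(9))
print(extra)  # --window-len 27
```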
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "microphaser normal {snakemake.input.bam} "
    "{extra} "
    "--ref {snakemake.input.ref} "
    "--variants {snakemake.input.variants} "
    "--tsv {snakemake.output.tsv} "
    "> {snakemake.output.peptides} "
    "< {snakemake.input.annotation} "
    "{log}"
)
MICROPHASER SOMATIC

Predict mutated neopeptides and their wildtype counterparts from NGS (whole exome/genome) data

URL:

Example

This wrapper can be used in the following way:

rule microphaser_somatic:
    input:
        bam="mapped/{sample}.sorted.bam",
        index="mapped/{sample}.sorted.bam.bai",
        ref="genome.fasta",
        annotation="genome.gtf",
        variants="calls/{sample}.bcf",
    output:
        # sequences of neopeptides arising from somatic variants
        tumor="out/{sample}.mt.fasta",
        # sequences of the normal, unmutated counterpart to every neopeptide
        normal="out/{sample}.wt.fasta",
        # info data of the somatic neopeptides
        tsv="out/{sample}.info.tsv",
    log:
        "logs/microphaser/somatic/{sample}.log",
    params:
        extra="--window-len 9",  # optional, desired peptide length in nucleotide bases, e.g. 27 (9 AA) for MHC-I ligands.
    wrapper:
        "v0.87.0/bio/microphaser/somatic"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • microphaser=0.3
Input/Output

Input:

  • bam file
  • bcf file
  • fasta reference
  • gtf annotation file

Output:

  • mutated peptide fasta (nucleotide sequences)
  • wildtype peptide fasta (nucleotide sequences)
  • information tsv
Notes
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "microphaser somatic {snakemake.input.bam} "
    "{extra} "
    "--ref {snakemake.input.ref} "
    "--variants {snakemake.input.variants} "
    "--normal-output {snakemake.output.normal} "
    "--tsv {snakemake.output.tsv} "
    "> {snakemake.output.tumor} "
    "< {snakemake.input.annotation} "
    "{log}"
)

MINIMAP2

For minimap2, the following wrappers are available:

MINIMAP2

A versatile pairwise aligner for genomic and spliced nucleotide sequences.

URL:

Example

This wrapper can be used in the following way:

rule minimap2_paf:
    input:
        target="target/{input1}.mmi", # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"]
    output:
        "aligned/{input1}_aln.paf"
    log:
        "logs/minimap2/{input1}.log"
    params:
        extra="-x map-pb",           # optional
        sorting="coordinate",           # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra=""                # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v0.87.0/bio/minimap2/aligner"

rule minimap2_sam:
    input:
        target="target/{input1}.mmi", # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"]
    output:
        "aligned/{input1}_aln.sam"
    log:
        "logs/minimap2/{input1}.log"
    params:
        extra="-x map-pb",           # optional
        sorting="none",                 # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra=""                # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v0.87.0/bio/minimap2/aligner"

rule minimap2_bam:
    input:
        target="target/{input1}.mmi", # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"]
    output:
        "aligned/{input1}_aln.bam"
    log:
        "logs/minimap2/{input1}.log"
    params:
        extra="-x map-pb",           # optional
        sorting="coordinate",           # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
        sort_extra=""                # optional: extra arguments for samtools/picard
    threads: 3
    wrapper:
        "v0.87.0/bio/minimap2/aligner"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • minimap2==2.17
  • samtools==1.12
Input/Output

Input:

  • FASTQ file(s)
  • reference genome

Output:

  • SAM/BAM/CRAM file
Notes
  • The extra param allows for additional arguments for minimap2.
  • The sorting param enables sorting (if the output is not PAF) and can be either ‘none’, ‘queryname’ or ‘coordinate’.
  • The sort_extra param allows for extra arguments for samtools/picard.
  • For more information, see https://lh3.github.io/minimap2
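
How the output extension and the sorting setting interact can be sketched as a standalone function. `build_pipe_cmd` is a hypothetical name; the branches follow the wrapper code shown in this section, except that the sketch only builds the post-processing pipe and leaves out the `-a` flag that is added to minimap2 for non-PAF output.

```python
from os import path


def build_pipe_cmd(output, sorting="none", sort_extra=""):
    """Return the samtools pipe needed to produce `output`, or '' if none."""
    out_ext = path.splitext(output)[1][1:].upper()
    if out_ext == "PAF":
        return ""  # PAF is written by minimap2 directly; sorting is ignored
    if sorting == "none":
        if out_ext == "SAM":
            return ""  # plain SAM needs no conversion
        return "| samtools view -h --output-fmt {} -".format(out_ext)
    if sorting in ("coordinate", "queryname"):
        if sorting == "queryname":
            sort_extra += " -n"
        return "| samtools sort {} --output-fmt {} -".format(sort_extra, out_ext)
    raise ValueError("Unexpected value for sorting ({})".format(sorting))


for out, mode in [("aln.paf", "coordinate"), ("aln.sam", "none"),
                  ("aln.bam", "none"), ("aln.bam", "queryname")]:
    print(out, mode, repr(build_pipe_cmd(out, mode)))
```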
Authors
  • Tom Poorten
  • Michael Hall
  • Filipe G. Vieira
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell


inputQuery = " ".join(snakemake.input.query)

# Extract output format
out_name, out_ext = path.splitext(snakemake.output[0])
out_ext = out_ext[1:].upper()

# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

pipe_cmd = ""
if out_ext != "PAF":
    # Add option for SAM output
    extra += " -a"

    # Determine which pipe command to use for converting to bam or sorting.
    if sort == "none":

        if out_ext != "SAM":
            # Simply convert to output format using samtools view.
            pipe_cmd = "| samtools view -h --output-fmt {} -".format(out_ext)

    elif sort in ["coordinate", "queryname"]:

        # Add name flag if needed.
        if sort == "queryname":
            sort_extra += " -n"

        # Sort alignments.
        pipe_cmd = "| samtools sort {} --output-fmt {} -".format(sort_extra, out_ext)

    else:
        raise ValueError("Unexpected value for params.sorting ({})".format(sort))


shell(
    "(minimap2 -t {snakemake.threads} {extra} "
    "{snakemake.input.target} {inputQuery} {pipe_cmd} > {snakemake.output[0]}) {log}"
)
MINIMAP2 INDEX

Creates a minimap2 index.

URL:

Example

This wrapper can be used in the following way:

rule minimap2_index:
    input:
        target="target/{input1}.fasta"
    output:
        "{input1}.mmi"
    log:
        "logs/minimap2_index/{input1}.log"
    params:
        extra=""  # optional additional args
    threads: 3
    wrapper:
        "v0.87.0/bio/minimap2/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • minimap2==2.17
Authors
  • Tom Poorten
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(minimap2 -t {snakemake.threads} {extra} "
    "-d {snakemake.output[0]} {snakemake.input.target}) {log}"
)

MLST

Scan contig files against traditional PubMLST typing schemes

URL:

Example

This wrapper can be used in the following way:

rule run_mlst:
    input:
        #Input assembly
        assembly="{sample}.fasta",
    output:
        #Tab delimited mlst designation
        mlst="{sample}_mlst.txt",
    params:
        # extra parameters should be space delimited
        # SYNOPSIS
        #   Automatic MLST calling from assembled contigs
        # USAGE
        #   % mlst --list                                            # list known schemes
        #   % mlst [options] <contigs.{fasta,gbk,embl}[.gz]          # auto-detect scheme
        #   % mlst --scheme <scheme> <contigs.{fasta,gbk,embl}[.gz]> # force a scheme
        # GENERAL
        #   --help            This help
        #   --version         Print version and exit(default ON)
        #   --check           Just check dependencies and exit (default OFF)
        #   --quiet           Quiet - no stderr output (default OFF)
        #   --threads [N]     Number of BLAST threads (suggest GNU Parallel instead) (default '1')
        #   --debug           Verbose debug output to stderr (default OFF)
        # SCHEME
        #   --scheme [X]      Don't autodetect, force this scheme on all inputs (default '')
        #   --list            List available MLST scheme names (default OFF)
        #   --longlist        List allelles for all MLST schemes (default OFF)
        #   --exclude [X]     Ignore these schemes (comma sep. list) (default 'ecoli_2,abaumannii')
        # OUTPUT
        #   --csv             Output CSV instead of TSV (default OFF)
        #   --json [X]        Also write results to this file in JSON format (default '')
        #   --label [X]       Replace FILE with this name instead (default '')
        #   --nopath          Strip filename paths from FILE column (default OFF)
        #   --novel [X]       Save novel alleles to this FASTA file (default '')
        #   --legacy          Use old legacy output with allele header row (requires --scheme) (default OFF)
        # SCORING
        #   --minid [n.n]     DNA %identity of full allelle to consider 'similar' [~] (default '95')
        #   --mincov [n.n]    DNA %cov to report partial allele at all [?] (default '10')
        #   --minscore [n.n]  Minumum score out of 100 to match a scheme (when auto --scheme) (default '50')
        # PATHS
        #   --blastdb [X]     BLAST database
        #   --datadir [X]     PubMLST data
        # HOMEPAGE
        #   https://github.com/tseemann/mlst - Torsten Seemann
        extra="--nopath",
    log:
        "logs/{sample}.mlst.log",
    threads: 1
    wrapper:
        "v0.87.0/bio/mlst"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • mlst=2.19
Input/Output

Input:

  • Genomic assembly (fasta format)

Output:

  • Returns a tab-separated line containing the filename, matching PubMLST scheme name, ST (sequence type) and the allele IDs. Other output formats are also available (eg. CSV, JSON)
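
A result line with the layout described above can be split into its fields with a few lines of Python. The helper name `parse_mlst_line` and the sample line are made up for illustration; real scheme names and allele columns depend on the data.

```python
def parse_mlst_line(line):
    """Split one tab-separated mlst result line into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least filename, scheme and ST")
    filename, scheme, sequence_type = fields[:3]
    return {
        "file": filename,
        "scheme": scheme,
        "st": sequence_type,
        "alleles": fields[3:],  # remaining columns are the allele IDs
    }


# Invented example line, not real mlst output.
example = "sample1.fasta\texample_scheme\t42\tgeneA(1)\tgeneB(7)\n"
print(parse_mlst_line(example))
```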
Notes
Authors
  • Max Cummins
Code
__author__ = "Max Cummins"
__copyright__ = "Copyright 2021, Max Cummins"
__email__ = "max.l.cummins@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "mlst"
    " {snakemake.params.extra}"
    " {snakemake.input.assembly}"
    " > {snakemake.output.mlst}"
    " {log}"
)

MOSDEPTH

Fast BAM/CRAM depth calculation.

URL:

Example

This wrapper can be used in the following way:

rule mosdepth:
    input:
        bam="aligned/{dataset}.bam",
        bai="aligned/{dataset}.bam.bai",
    output:
        "mosdepth/{dataset}.mosdepth.global.dist.txt",
        "mosdepth/{dataset}.per-base.bed.gz",  # produced unless --no-per-base specified
        summary="mosdepth/{dataset}.mosdepth.summary.txt",  # this named output is required for prefix parsing
    log:
        "logs/mosdepth/{dataset}.log",
    params:
        extra="--fast-mode",  # optional
    # additional decompression threads through `--threads`
    threads: 4  # This value - 1 will be sent to `--threads`
    wrapper:
        "v0.87.0/bio/mosdepth"


rule mosdepth_bed:
    input:
        bam="aligned/{dataset}.bam",
        bai="aligned/{dataset}.bam.bai",
        bed="test.bed",
    output:
        "mosdepth_bed/{dataset}.mosdepth.global.dist.txt",
        "mosdepth_bed/{dataset}.mosdepth.region.dist.txt",
        "mosdepth_bed/{dataset}.regions.bed.gz",
        summary="mosdepth_bed/{dataset}.mosdepth.summary.txt",  # this named output is required for prefix parsing
    log:
        "logs/mosdepth_bed/{dataset}.log",
    params:
        extra="--no-per-base --use-median",  # optional
    # additional decompression threads through `--threads`
    threads: 4  # This value - 1 will be sent to `--threads`
    wrapper:
        "v0.87.0/bio/mosdepth"


rule mosdepth_by_threshold:
    input:
        bam="aligned/{dataset}.bam",
        bai="aligned/{dataset}.bam.bai",
    output:
        "mosdepth_by_threshold/{dataset}.mosdepth.global.dist.txt",
        "mosdepth_by_threshold/{dataset}.mosdepth.region.dist.txt",
        "mosdepth_by_threshold/{dataset}.regions.bed.gz",
        "mosdepth_by_threshold/{dataset}.thresholds.bed.gz",  # needs to go with params.thresholds spec
        summary="mosdepth_by_threshold/{dataset}.mosdepth.summary.txt",  # this named output is required for prefix parsing
    log:
        "logs/mosdepth_by/{dataset}.log",
    params:
        by="500",  # optional, window size,  specifies --by for mosdepth.region.dist.txt and regions.bed.gz
        thresholds="1,5,10,30",  # optional, specifies --thresholds for thresholds.bed.gz
    # additional decompression threads through `--threads`
    threads: 4  # This value - 1 will be sent to `--threads`
    wrapper:
        "v0.87.0/bio/mosdepth"


rule mosdepth_quantize_precision:
    input:
        bam="aligned/{dataset}.bam",
        bai="aligned/{dataset}.bam.bai",
    output:
        "mosdepth_quantize_precision/{dataset}.mosdepth.global.dist.txt",
        "mosdepth_quantize_precision/{dataset}.quantized.bed.gz",  # optional, needs to go with params.quantize spec
        summary="mosdepth_quantize_precision/{dataset}.mosdepth.summary.txt",  # this named output is required for prefix parsing
    log:
        "logs/mosdepth_quantize_precision/{dataset}.log",
    params:
        extra="--no-per-base",  # optional
        quantize="0:1:5:150",  # optional, specifies --quantize for quantized.bed.gz
        precision="5",  # optional, set decimals of precision
    # additional decompression threads through `--threads`
    threads: 4  # This value - 1 will be sent to `--threads`
    wrapper:
        "v0.87.0/bio/mosdepth"


rule mosdepth_cram:
    input:
        bam="aligned/{dataset}.cram",
        bai="aligned/{dataset}.cram.crai",
        bed="test.bed",
        fasta="genome.fasta",
    output:
        "mosdepth_cram/{dataset}.mosdepth.global.dist.txt",
        "mosdepth_cram/{dataset}.mosdepth.region.dist.txt",
        "mosdepth_cram/{dataset}.regions.bed.gz",
        summary="mosdepth_cram/{dataset}.mosdepth.summary.txt",  # this named output is required for prefix parsing
    log:
        "logs/mosdepth_cram/{dataset}.log",
    params:
        extra="--no-per-base --use-median",  # optional
    # additional decompression threads through `--threads`
    threads: 4  # This value - 1 will be sent to `--threads`
    wrapper:
        "v0.87.0/bio/mosdepth"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • mosdepth==0.3.1
Input/Output

Input:

  • BAM/CRAM files
  • reference genome (optional)
  • BED file (optional)

Output:

  • Several coverage summary files.
Notes
  • The by param specifies an (integer) window size (incompatible with an input BED file).
  • The thresholds param writes, for each interval in --by, the number of bases covered by at least each threshold value. Specify multiple integer values separated by ','.
  • The precision param specifies the output floating point precision.
  • The extra param allows for additional program arguments.
  • For more information, see https://github.com/brentp/mosdepth
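The interplay between the by param, an input BED file, the thread count and the output prefix can be sketched as a small pure function. This is a simplified illustration of the option handling in the wrapper code shown under Code, not the wrapper itself: the function name mosdepth_options is hypothetical, and it raises ValueError where the wrapper exits.

```python
def mosdepth_options(threads, summary, by="", bed=""):
    """Return (threads_flag, by_flag, prefix) the way the wrapper derives them.

    Hypothetical helper for illustration only.
    """
    if by and bed:
        raise ValueError("Provide either a BED input or params.by, not both.")
    # Both a window size and a BED file are handed to mosdepth via --by.
    region = bed or by
    by_flag = f"--by {region}" if region else ""
    # One thread runs mosdepth itself; the rest are decompression threads.
    threads_flag = "" if threads <= 1 else f"--threads {threads - 1}"
    # The output prefix is the summary path without its fixed suffix.
    prefix = summary.replace(".mosdepth.summary.txt", "")
    return threads_flag, by_flag, prefix


print(mosdepth_options(4, "mosdepth_bed/a.mosdepth.summary.txt", bed="test.bed"))
# ('--threads 3', '--by test.bed', 'mosdepth_bed/a')
```

With threads: 4 in the rule, mosdepth therefore receives --threads 3, and every output file has to start with the prefix derived from the named summary output.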
Authors
  • William Rowell
  • David Lähnemann
  • Filipe Vieira
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

import sys
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

bed = snakemake.input.get("bed", "")
by = snakemake.params.get("by", "")
if by:
    if bed:
        sys.exit(
            "Either provide a bed input file OR a window size via params.by, not both."
        )
    else:
        by = f"--by {by}"
if bed:
    by = f"--by {bed}"

quantize_out = False
thresholds_out = False
regions_bed_out = False
region_dist_out = False
for file in snakemake.output:
    if ".per-base." in file and "--no-per-base" in extra:
        sys.exit(
            "You asked not to generate per-base output (--no-per-base), but your rule specifies a '.per-base.' output file. Remove one of the two."
        )
    if ".quantized.bed.gz" in file:
        quantize_out = True
    if ".thresholds.bed.gz" in file:
        thresholds_out = True
    if ".mosdepth.region.dist.txt" in file:
        region_dist_out = True
    if ".regions.bed.gz" in file:
        regions_bed_out = True


if by and not regions_bed_out:
    sys.exit(
        "You ask for by-region output. Please also specify *.regions.bed.gz as a rule output."
    )

if by and not region_dist_out:
    sys.exit(
        "You ask for by-region output. Please also specify *.mosdepth.region.dist.txt as a rule output."
    )

if (region_dist_out or regions_bed_out) and not by:
    sys.exit(
        "You specify *.regions.bed.gz and/or *.mosdepth.region.dist.txt as a rule output. You also need to ask for by-region output via 'input.bed' or 'params.by'."
    )

quantize = snakemake.params.get("quantize", "")
if quantize:
    if not quantize_out:
        sys.exit(
            "You ask for quantized output via params.quantize. Please also specify *.quantized.bed.gz as a rule output."
        )
    quantize = f"--quantize {quantize}"

if not quantize and quantize_out:
    sys.exit(
        "The rule has output *.quantized.bed.gz specified. Please also specify params.quantize to actually generate it."
    )


thresholds = snakemake.params.get("thresholds", "")
if thresholds:
    if not thresholds_out:
        sys.exit(
            "You ask for --thresholds output via params.thresholds. Please also specify *.thresholds.bed.gz as a rule output."
        )
    thresholds = f"--thresholds {thresholds}"

if not thresholds and thresholds_out:
    sys.exit(
        "The rule has output *.thresholds.bed.gz specified. Please also specify params.thresholds to actually generate it."
    )


precision = snakemake.params.get("precision", "")
if precision:
    precision = f"MOSDEPTH_PRECISION={precision}"


fasta = snakemake.input.get("fasta", "")
if fasta:
    fasta = f"--fasta {fasta}"


# mosdepth takes additional threads through its option --threads
# One thread for mosdepth
# Other threads are *additional* decompression threads passed to the '--threads' argument
threads = "" if snakemake.threads <= 1 else "--threads {}".format(snakemake.threads - 1)


# named output summary = "*.mosdepth.summary.txt" is required
prefix = snakemake.output.summary.replace(".mosdepth.summary.txt", "")


shell(
    "({precision} mosdepth {threads} {fasta} {by} {quantize} {thresholds} {extra} {prefix} {snakemake.input.bam}) {log}"
)

MSISENSOR

For msisensor, the following wrappers are available:

MSISENSOR MSI

Score your MSI with MSIsensor

URL:

Example

This wrapper can be used in the following way:

rule test_msisensor_msi:
    input:
        normal = "example.normal.bam",
        tumor = "example.tumor.bam",
        microsat = "example.microsate.sites"
    output:
        "example.msi",
        "example.msi_dis",
        "example.msi_germline",
        "example.msi_somatic"
    message:
        "Testing MSIsensor msi"
    threads:
        1
    log:
        "example.log"
    params:
        out_prefix = "example.msi"
    wrapper:
        "v0.87.0/bio/msisensor/msi"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • msisensor==0.5
Input/Output

Input:

  • A microsatellite and homopolymer list from MSIsensor Scan
  • A pair of normal/tumoral bams

Output:

  • A text file containing MSI scores
  • A TSV formatted file containing read count distribution
  • A TSV formatted file containing somatic sites
  • A TSV formatted file containing germline sites
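The wrapper code for this rule derives the value passed to -o from the common prefix of all output paths, so the four outputs have to start with the same string. A minimal sketch using the file names from the example rule:

```python
from os.path import commonprefix

outputs = [
    "example.msi",
    "example.msi_dis",
    "example.msi_germline",
    "example.msi_somatic",
]

# commonprefix compares character by character, not path component by
# path component, so all outputs must literally start with the prefix.
prefix = commonprefix(outputs)
print(prefix)  # example.msi
```

If the outputs were placed in different directories, the common prefix would shrink accordingly (possibly to an empty string), and msisensor would not write to the paths the rule expects.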
Authors
  • Thibault Dayris
Code
"""Snakemake script for MSISensor msi"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from os.path import commonprefix
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

# Determining common prefix in output files
# to fill the requested parameter '-o'
prefix = commonprefix(snakemake.output)

shell(
    "msisensor msi"  # Tool and its sub-command
    " -d {snakemake.input.microsat}"  # Path to homopolymer/microsat file
    " -n {snakemake.input.normal}"  # Path to normal bam
    " -t {snakemake.input.tumor}"  # Path to tumor bam
    " -o {prefix}"  # Path to output distribution file
    " -b {snakemake.threads}"  # Maximum number of threads used
    " {extra}"  # Optional extra parameters
    " {log}"  # Logging behavior
)

MSISENSOR SCAN

Scan homopolymers and microsatellites with MSIsensor

URL:

Example

This wrapper can be used in the following way:

rule test_msisensor_scan:
    input:
        "genome.fasta"
    output:
        "microsat.list"
    message:
        "Testing MSISensor scan"
    threads:
        1
    params:
        extra = ""
    log:
        "logs/msisensor_scan.log"
    wrapper:
        "v0.87.0/bio/msisensor/scan"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • msisensor==0.5
Input/Output

Input:

  • A (multi)fasta formatted file

Output:

  • A text file containing homopolymers and microsatellites
Authors
  • Thibault Dayris
Code
"""Snakemake script for MSISensor Scan"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

shell(
    "msisensor scan "  # Tool and its sub-command
    "-d {snakemake.input} "  # Path to fasta file
    "-o {snakemake.output} "  # Path to output file
    "{extra} "  # Optional extra parameters
    "{log}"  # Logging behavior
)

MULTIQC

Generate a QC report using MultiQC.

URL:

Example

This wrapper can be used in the following way:

rule multiqc:
    input:
        expand("samtools_stats/{sample}.txt", sample=["a", "b"])
    output:
        "qc/multiqc.html"
    params:
        ""  # Optional: extra parameters for multiqc.
    log:
        "logs/multiqc.log"
    wrapper:
        "v0.87.0/bio/multiqc"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • multiqc=1.11
Input/Output

Input:

  • input directory containing qc files

Output:

  • qc report (html)
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for MultiQC."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {snakemake.params}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_dirs}"
    " {log}"
)
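
The wrapper does not pass the input files themselves to multiqc but the set of directories that contain them, and it splits the output path into a directory (-o) and a file name (-n). A minimal sketch of that path handling follows; the fastqc path is a hypothetical extra input added for illustration.

```python
from os import path

inputs = [
    "samtools_stats/a.txt",
    "samtools_stats/b.txt",
    "fastqc/a_fastqc.zip",  # hypothetical additional input
]

# Each distinct parent directory is handed to multiqc exactly once.
# Sorting is only done here to get a stable order for display.
input_dirs = sorted({path.dirname(fp) for fp in inputs})

output = "qc/multiqc.html"
output_dir, output_name = path.dirname(output), path.basename(output)

print(input_dirs, output_dir, output_name)
```

Because whole directories are scanned, multiqc may also pick up other QC files that happen to live next to the declared inputs.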

MUSCLE

Build multiple sequence alignments using MUSCLE. Documentation can be found at https://www.drive5.com/muscle/manual/index.html

URL:

Example

This wrapper can be used in the following way:

rule muscle_fasta:
    input:
        fasta="{sample}.fa",  # Input fasta file
    output:
        alignment="{sample}.afa",  # Output alignment file
    log:
        "logs/muscle/{sample}.log",
    params:
        extra="",  # Additional arguments
    wrapper:
        "v0.87.0/bio/muscle"


rule muscle_clw:
    input:
        fasta="{sample}.fa",
    output:
        alignment="{sample}.clw",
    log:
        "logs/muscle/{sample}.log",
    params:
        extra="-clw",
    wrapper:
        "v0.87.0/bio/muscle"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • muscle==3.8.1551
Input/Output

Input:

  • FASTA file

Output:

  • Alignment file, with FASTA as default file format
Notes
  • MUSCLE is a single-core program. It cannot utilize more than 1 thread.
Authors
  • Nikos Tsardakas Renhuldt
Code
__author__ = "Nikos Tsardakas Renhuldt"
__copyright__ = "Copyright 2021, Nikos Tsardakas Renhuldt"
__email__ = "nikos.tsardakas_renhuldt@tbiokem.lth.se"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "muscle "
    "{extra} "
    "-in {snakemake.input.fasta} "
    "-out {snakemake.output.alignment} "
    "{log}"
)

NANOSIM-H

NanoSim-H is a simulator of Oxford Nanopore reads that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of Nanopore sequencing technology.

URL:

Example

This wrapper can be used in the following way:

rule nanosimh:
    input:
        "{sample}.fa"
    output:
        reads = "{sample}.simulated.fa",
        log = "{sample}.simulated.log",
        errors = "{sample}.simulated.errors.txt"
    params:
        extra = "",
        num_reads = 10,
        perfect_reads = True,
        min_read_len = 10,
    log:
        "logs/nanosim-h/test/{sample}.log"
    wrapper:
        "v0.87.0/bio/nanosim-h"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • nanosim-h==1.1.0.4
Authors
  • Michael Hall
Code
"""Snakemake wrapper for NanoSim-H."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


def is_header(query):
    return query.startswith(">")


def get_length_of_longest_sequence(fh):
    current_length = 0
    all_lengths = []
    for line in fh:
        if not is_header(line):
            current_length += len(line.rstrip())
        else:
            all_lengths.append(current_length)
            current_length = 0
    all_lengths.append(current_length)

    return max(all_lengths)


# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
prefix = snakemake.params.get("prefix", snakemake.output.reads.rpartition(".")[0])
num_reads = snakemake.params.get("num_reads", 10000)
profile = snakemake.params.get("profile", "ecoli_R9_2D")
perfect_reads = snakemake.params.get("perfect_reads", False)
min_read_len = snakemake.params.get("min_read_len", 50)
max_read_len = snakemake.params.get("max_read_len", 0)

# need to do this as the default read length of infinity can cause nanosim-h to
# hang if the reference is short
if max_read_len == 0:
    with open(snakemake.input[0]) as fh:
        max_read_len = get_length_of_longest_sequence(fh)

perfect_reads_flag = "--perfect " if perfect_reads else ""
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "nanosim-h {extra} "
    "{perfect_reads_flag} "
    "--max-len {max_read_len} "
    "--min-len {min_read_len} "
    "--profile {profile} "
    "--number {num_reads} "
    "--out-pref {prefix} "
    "{snakemake.input} {log}"
)
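
When max_read_len is left at 0, the wrapper falls back to the length of the longest sequence in the reference. The sketch below shows that computation on an in-memory FASTA; it is a slightly condensed variant of the helper above (it keeps a running maximum instead of a list), and the sequences are made up for illustration.

```python
import io


def longest_sequence_length(handle):
    """Return the length of the longest record in a FASTA file handle."""
    longest = current = 0
    for line in handle:
        if line.startswith(">"):
            # A header closes the previous record and starts a new one.
            longest = max(longest, current)
            current = 0
        else:
            current += len(line.rstrip())
    # Account for the last record, which is not followed by a header.
    return max(longest, current)


fasta = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACGTACGTAC\n>seq3\nA\n")
print(longest_sequence_length(fasta))  # 10
```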

NEXTFLOW

Run nextflow pipeline

URL:

Example

This wrapper can be used in the following way:

rule chipseq_pipeline:
    input:
        input="design.csv",
        fasta="data/genome.fasta",
        gtf="data/genome.gtf",
        # any --<argname> pipeline file arguments can be given here as <argname>=<path>
    output:
        "results/multiqc/broadPeak/multiqc_report.html",
    params:
        pipeline="nf-core/chipseq",
        revision="1.2.1",
        profile=["test", "docker"],
        # any --<argname> pipeline arguments can be given here as <argname>=<value>
    handover: True
    wrapper:
        "v0.87.0/utils/nextflow"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • nextflow=20.10
Notes

This wrapper can be used, for example, to run nf-core pipelines. Each nf-core pipeline description lists the available parameters and the output file structure (under “aws results”). The latter can be used to set the desired output files for this wrapper.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2021, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

revision = snakemake.params.get("revision")
profile = snakemake.params.get("profile", [])
if isinstance(profile, str):
    profile = [profile]

args = []

if revision:
    args += ["-revision", revision]
if profile:
    args += ["-profile", ",".join(profile)]
print(args)

# TODO pass threads in case of single job
# TODO limit parallelism in case of pipeline
# TODO handle other resources

add_parameter = lambda name, value: args.append("--{} {}".format(name, value))

for name, files in snakemake.input.items():
    if isinstance(files, list):
        # TODO check how multiple input files under a single arg are usually passed to nextflow
        files = ",".join(files)
    add_parameter(name, files)
for name, value in snakemake.params.items():
    # pipeline, revision and profile are handled separately
    if name not in ("pipeline", "revision", "profile"):
        add_parameter(name, value)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
args = " ".join(args)
pipeline = snakemake.params.pipeline

shell("nextflow run {pipeline} {args} {log}")
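
The way rule inputs and params are turned into nextflow arguments can be sketched with plain dictionaries. The helper build_nextflow_args is hypothetical: the dicts stand in for snakemake.input and snakemake.params, and skipping profile in the generic params loop is an assumption of this sketch (it is already passed as -profile).

```python
def build_nextflow_args(inputs, params):
    """Assemble nextflow arguments from dicts of rule inputs and params.

    Hypothetical helper for illustration only.
    """
    args = []
    profile = params.get("profile", [])
    if isinstance(profile, str):
        profile = [profile]
    if params.get("revision"):
        args += ["-revision", params["revision"]]
    if profile:
        args += ["-profile", ",".join(profile)]
    # Named inputs become --<name> <path>; lists are joined with commas.
    for name, files in inputs.items():
        if isinstance(files, list):
            files = ",".join(files)
        args.append(f"--{name} {files}")
    # Remaining params become --<name> <value> (assumption: profile is
    # skipped here because it was already handled above).
    for name, value in params.items():
        if name not in ("pipeline", "revision", "profile"):
            args.append(f"--{name} {value}")
    return " ".join(args)


inputs = {"input": "design.csv", "fasta": "data/genome.fasta", "gtf": "data/genome.gtf"}
params = {"pipeline": "nf-core/chipseq", "revision": "1.2.1", "profile": ["test", "docker"]}
print(f"nextflow run {params['pipeline']} {build_nextflow_args(inputs, params)}")
```

Single-dash options (-revision, -profile) are consumed by nextflow itself, while double-dash options are forwarded to the pipeline as parameters.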

NGS-DISAMBIGUATE

Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.

URL:

Example

This wrapper can be used in the following way:

rule disambiguate:
    input:
        a="mapped/{sample}.a.bam",
        b="mapped/{sample}.b.bam"
    output:
        a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
        b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
        a_disambiguated='disambiguate/{sample}.graft.bam',
        b_disambiguated='disambiguate/{sample}.host.bam',
        summary='qc/disambiguate/{sample}.txt'
    params:
        algorithm="bwa",
        # optional: Prefix to use for output. If omitted, a
        # suitable value is guessed from the output paths. Prefix
        # is used for the intermediate output paths, as well as
        # sample name in summary file.
        prefix="{sample}",
        extra=""
    wrapper:
        "v0.87.0/bio/ngs-disambiguate"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ngs-disambiguate==2016.11.10
  • bamtools==2.4.0
Input/Output

Input:

  • species a bam file (name sorted)
  • species b bam file (name sorted)

Output:

  • bam file with ambiguous alignments for species a
  • bam file with ambiguous alignments for species b
  • bam file with unambiguous alignments for species a
  • bam file with unambiguous alignments for species b
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")

output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain
# additional suffixes). This is better than using a random id as prefix,
# since the prefix is also used as the sample name in the summary file.
if prefix is None:
    prefix = path.splitext(path.basename(snakemake.output.summary))[0]

# Run command.
shell(
    "ngs_disambiguate"
    " {extra}"
    " -o {output_dir}"
    " -s {prefix}"
    " -a {snakemake.params.algorithm}"
    " {snakemake.input.a}"
    " {snakemake.input.b}"
    " {log}"
)

# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)

output_map = {
    output_base + ".ambiguousSpeciesA.bam": snakemake.output.a_ambiguous,
    output_base + ".ambiguousSpeciesB.bam": snakemake.output.b_ambiguous,
    output_base + ".disambiguatedSpeciesA.bam": snakemake.output.a_disambiguated,
    output_base + ".disambiguatedSpeciesB.bam": snakemake.output.b_disambiguated,
    output_base + "_summary.txt": snakemake.output.summary,
}

for src, dest in output_map.items():
    if src != dest:
        shell("mv {src} {dest}")
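
The renaming step can be sketched without running the tool: given the requested outputs, compute which of the files written by ngs_disambiguate need to be moved. The helper plan_moves is hypothetical, a plain dict stands in for snakemake.output, and the sample name s1 is made up.

```python
from os import path


def plan_moves(outputs, prefix=None):
    """Map the files ngs_disambiguate writes to the requested rule outputs.

    Hypothetical helper for illustration only.
    """
    output_dir = path.dirname(outputs["a_ambiguous"])
    if prefix is None:
        # Fall back to the summary file name without its extension.
        prefix = path.splitext(path.basename(outputs["summary"]))[0]
    base = path.join(output_dir, prefix)
    produced = {
        base + ".ambiguousSpeciesA.bam": outputs["a_ambiguous"],
        base + ".ambiguousSpeciesB.bam": outputs["b_ambiguous"],
        base + ".disambiguatedSpeciesA.bam": outputs["a_disambiguated"],
        base + ".disambiguatedSpeciesB.bam": outputs["b_disambiguated"],
        base + "_summary.txt": outputs["summary"],
    }
    # Only files whose produced name differs from the target need moving.
    return {src: dest for src, dest in produced.items() if src != dest}


outputs = {
    "a_ambiguous": "disambiguate/s1.graft.ambiguous.bam",
    "b_ambiguous": "disambiguate/s1.host.ambiguous.bam",
    "a_disambiguated": "disambiguate/s1.graft.bam",
    "b_disambiguated": "disambiguate/s1.host.bam",
    "summary": "qc/disambiguate/s1.txt",
}
for src, dest in plan_moves(outputs).items():
    print(f"mv {src} {dest}")
```

Note that all intermediate files are written next to the a_ambiguous output, which is why the summary also has to be moved when it is requested in a different directory.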

OPEN-CRAVAT

For open-cravat, the following wrappers are available:

OPENCRAVAT MODULE

Install OpenCRAVAT modules. Annotate variant calls with OpenCRAVAT. For more details, see https://github.com/KarchinLab/open-cravat/wiki.

URL:

Example

This wrapper can be used in the following way:

rule opencravat_module:
    output:
        # add any other desired modules as separate directory outputs
        directory("modules/annotators/biogrid"),
    log:
        "logs/open-cravat/module.log"
    wrapper:
        "v0.87.0/bio/open-cravat/module"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • open-cravat=2.1
Authors
  • Rick Kim
Code
__author__ = "Rick Kim"
__copyright__ = "Copyright 2020, Rick Kim"
__license__ = "GPLv3"

from snakemake.shell import shell
import cravat
import re
import pathlib
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Collect all requested module directories
onames = list(snakemake.output)
for oname in onames:
    if os.path.exists(oname):
        continue
    [o2, module_name] = os.path.split(oname)
    [modules_dir, module_type] = os.path.split(o2)
    module_type = module_type[:-1]
    modules_dir_cur = cravat.admin_util.get_modules_dir()
    if modules_dir_cur != modules_dir:
        cravat.admin_util.set_modules_dir(modules_dir)
    cmd = ["oc", "module", "install", module_name, "-y"]
    cmd = " ".join(cmd)
    shell("{cmd} {log}")

OPENCRAVAT RUN

Runs OpenCRAVAT. Annotate variant calls with OpenCRAVAT. For more details, see https://github.com/KarchinLab/open-cravat/wiki.

URL:

Example

This wrapper can be used in the following way:

rule opencravat:
    input:
        'example_input.tsv',
        'modules/commons/hg38wgs',
        'modules/converters/cravat-converter',
        'modules/mappers/hg38',
        'modules/annotators/biogrid',
        'modules/annotators/clinvar',
        'modules/postaggregators/tagsampler',
        'modules/postaggregators/varmeta',
        'modules/postaggregators/vcfinfo',
        'modules/reporters/excelreporter',
        'modules/reporters/tsvreporter',
        'modules/reporters/csvreporter',
    output:
        'example_input.tsv.xlsx',
        'example_input.tsv.variant.tsv',
        'example_input.tsv.variant.csv'
    log:
        "logs/open-cravat/run.log"
    threads: 1 # set number of threads for parallel processing
    wrapper:
        "v0.87.0/bio/open-cravat/run"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • open-cravat=2.1
Authors
  • Rick Kim
Code
__author__ = "Rick Kim"
__copyright__ = "Copyright 2020, Rick Kim"
__license__ = "GPLv3"

from snakemake.shell import shell
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
inputfiles = []
annotators = []
reporters = []
modules_dir = set()
for v in snakemake.input:
    if os.path.isfile(v):
        inputfiles.append(v)
    elif os.path.isdir(v):
        (module_group_dir, module_name) = os.path.split(v)
        (in_modules_dir, module_group) = os.path.split(module_group_dir)
        modules_dir.add(in_modules_dir)
        if module_group == "annotators":
            annotators.append(module_name)
        elif module_group == "reporters" and module_name.endswith("reporter"):
            reporters.append(module_name[:-8])
if len(modules_dir) > 1:
    # Exit with a non-zero status so that Snakemake registers the failure.
    raise SystemExit(
        f'Multiple modules directories detected: {",".join(modules_dir)}'
    )
cmd = ["oc", "run"]
cmd.extend(inputfiles)
genome = snakemake.params.get("genome", "hg38")
mp = snakemake.threads
cmd.extend(["-l", genome])
cmd.extend(["--mp", str(mp)])
if len(annotators) > 0:
    cmd.append("-a")
    cmd.extend(annotators)
if len(reporters) > 0:
    cmd.append("-t")
    cmd.extend(reporters)
extra = snakemake.params.get("extra", "")
if len(extra) > 0 and type(extra) == str:
    cmd.extend(extra.split(" "))
shell("{cmd} {log}")
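
How the module directories among the inputs are translated into the -a (annotators) and -t (report types) arguments can be sketched as follows. The helper classify_modules is hypothetical; unlike the wrapper it does not check the file system, so every path given to it is assumed to be a module directory.

```python
import os


def classify_modules(module_dirs):
    """Split module directory paths into annotator and reporter names.

    Hypothetical helper for illustration only.
    """
    annotators, reporters, roots = [], [], set()
    for module_dir in module_dirs:
        group_dir, name = os.path.split(module_dir)
        root, group = os.path.split(group_dir)
        roots.add(root)
        if group == "annotators":
            annotators.append(name)
        elif group == "reporters" and name.endswith("reporter"):
            # "excelreporter" is requested as report type "excel".
            reporters.append(name[: -len("reporter")])
    if len(roots) > 1:
        raise ValueError(f"Multiple modules directories: {sorted(roots)}")
    return annotators, reporters


print(
    classify_modules(
        [
            "modules/annotators/biogrid",
            "modules/annotators/clinvar",
            "modules/mappers/hg38",
            "modules/reporters/excelreporter",
            "modules/reporters/tsvreporter",
        ]
    )
)
# (['biogrid', 'clinvar'], ['excel', 'tsv'])
```

Modules of other groups (commons, converters, mappers, postaggregators) are listed as inputs only so that Snakemake installs them first; they do not end up on the command line.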

OPTITYPE

Precision 4-digit HLA-I-typing from NGS data based on integer linear programming. Use razers3 beforehand to generate input fastq files only mapping to HLA-regions. Please see https://github.com/FRED-2/OptiType

URL:

Example

This wrapper can be used in the following way:

rule optitype:
    input:
        # list of input reads
        reads=["reads/{sample}_1.fished.fastq", "reads/{sample}_2.fished.fastq"]
    output:
        multiext("optitype/{sample}", "_coverage_plot.pdf", "_result.tsv")
    log:
        "logs/optitype/{sample}.log"
    params:
        # Type of sequencing data. Can be 'dna' or 'rna'. Default is 'dna'.
        sequencing_type="dna",
        # OptiType config file, optional
        config="",
        # additional parameters
        extra=""
    wrapper:
        "v0.87.0/bio/optitype"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • optitype==1.3.5
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
outdir = os.path.dirname(snakemake.output[0])

# get sequencing type
seq_type = snakemake.params.get("sequencing_type", "dna")
seq_type = "--{}".format(seq_type)

# check if non-default config.ini is used
config = snakemake.params.get("config", "")
if config:
    config = "--config {}".format(config)

shell(
    "(OptiTypePipeline.py"
    " --input {snakemake.input.reads}"
    " --outdir {outdir}"
    " --prefix {snakemake.wildcards.sample}"
    " {seq_type}"
    " {config}"
    " {extra})"
    " {log}"
)
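
The command line is assembled from the rule as follows: the output directory comes from the first output file and the prefix from the sample wildcard, which is why the outputs have to be named <outdir>/{sample}_result.tsv and so on. The sketch below reproduces this with a hypothetical helper, optitype_command, that takes plain values instead of the snakemake object; the sample name a is made up.

```python
import os


def optitype_command(reads, first_output, sample,
                     sequencing_type="dna", config="", extra=""):
    """Build the OptiTypePipeline.py call as a single string.

    Hypothetical helper for illustration only.
    """
    parts = [
        "OptiTypePipeline.py",
        "--input", *reads,
        # The output directory is taken from the first output file.
        "--outdir", os.path.dirname(first_output),
        "--prefix", sample,
        f"--{sequencing_type}",
    ]
    if config:
        parts += ["--config", config]
    if extra:
        parts.append(extra)
    return " ".join(parts)


print(
    optitype_command(
        ["reads/a_1.fished.fastq", "reads/a_2.fished.fastq"],
        "optitype/a_coverage_plot.pdf",
        "a",
    )
)
```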

PALADIN

For paladin, the following wrappers are available:

PALADIN ALIGN

Align nucleotide reads to a protein fasta file (that has been indexed with paladin index). PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

URL:

Example

This wrapper can be used in the following way:

rule paladin_align:
    input:
        reads=["reads/reads.left.fq.gz"],
        index="index/prot.fasta.bwt",
    output:
        "paladin_mapped/{sample}.bam" # will output BAM format if output file ends with ".bam", otherwise SAM format
    log:
        "logs/paladin/{sample}.log"
    threads: 4
    wrapper:
        "v0.87.0/bio/paladin/align"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Input/Output

Input:

  • nucleotide reads (fastq)
  • indexed protein fasta file (output of paladin index or prepare)

Output:

  • mapped reads (SAM or BAM format)
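Whether SAM or BAM is written depends only on the extension of the output path, and the index is passed to paladin as the basename of the '.bwt' file. A minimal sketch of this path handling, mirroring the wrapper code shown under Code; the helper name paladin_output_plan is hypothetical.

```python
def paladin_output_plan(index, outfile):
    """Return (index_base, output_cmd) as derived from the rule's paths.

    Hypothetical helper for illustration only.
    """
    # The index is given as its '.bwt' file; paladin expects the basename.
    index_base = str(index).rsplit(".bwt")[0]
    if str(outfile).endswith(".bam"):
        # Convert the SAM stream to BAM with samtools.
        output_cmd = " | samtools view -Sb - > "
    else:
        # Let paladin write SAM directly.
        output_cmd = " -o "
    return index_base, output_cmd


print(paladin_output_plan("index/prot.fasta.bwt", "paladin_mapped/a.bam"))
print(paladin_output_plan("index/prot.fasta.bwt", "paladin_mapped/a.sam"))
```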
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for PALADIN alignment"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

r = snakemake.input.get("reads")
assert (
    r is not None
), "reads are required as input. If you have paired end reads, please merge them first (e.g. with PEAR)"
index = snakemake.input.get("index")
assert (
    index is not None
), "please index your assembly and provide the basename (with '.bwt' extension) via the 'index' input param"

index_base = str(index).rsplit(".bwt")[0]

outfile = snakemake.output

# if bam output, pipe to bam!
output_cmd = "  | samtools view -Sb - > " if str(outfile).endswith(".bam") else " -o "

min_orf_len = snakemake.params.get("f", "250")

shell(
    "paladin align -f {min_orf_len} -t {snakemake.threads} {extra} {index_base} {r} {output_cmd} {outfile}"
)

PALADIN INDEX

Index a protein fasta file for mapping with paladin. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

URL:

Example

This wrapper can be used in the following way:

rule paladin_index:
    input:
        "prot.fasta",
    output:
        "index/prot.fasta.bwt"
    log:
        "logs/paladin/prot_index.log"
    params:
        reference_type=3
    wrapper:
        "v0.87.0/bio/paladin/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Input/Output

Input:

  • protein fasta file

Output:

  • file indexed for paladin mapping
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Paladin Index."""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"


# this wrapper temporarily copies your assembly into the output dir
# so that all the paladin output files end up in the desired spot

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

input_assembly = snakemake.input
annotation = snakemake.input.get("gff", "")
paladin_index = str(snakemake.output)
reference_type = snakemake.params.get("reference_type", "3")
assert int(reference_type) in [1, 2, 3, 4]
ref_type_cmd = "-r" + str(reference_type)

output_base = paladin_index.rsplit(".bwt")[0]

shell("cp {input_assembly} {output_base}")
shell("paladin index {ref_type_cmd} {output_base} {annotation} {extra} {log}")
shell("rm -f {output_base}")

PALADIN PREPARE

Download and prepare uniprot refs for paladin mapping. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

URL:

Example

This wrapper can be used in the following way:

rule paladin_prepare:
    output:
        "uniprot_sprot.fasta.gz",
        "uniprot_sprot.fasta.gz.pro"
    log:
        "logs/paladin/prepare_sprot.log"
    params:
        reference_type=1, # 1=swiss-prot, 2=uniref90
    wrapper:
        "v0.87.0/bio/paladin/prepare"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Paladin Prepare"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

reference_type = snakemake.params.get(
    "reference_type", "1"
)  # download swissprot as default
assert int(reference_type) in [1, 2]
ref_type_cmd = "-r" + str(reference_type)

shell("paladin prepare {ref_type_cmd} {extra} {log}")
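
The parameter check in the wrapper above boils down to "look up reference_type, validate it, turn it into a -r flag". A minimal sketch of that step follows; the function name ref_type_flag is hypothetical, and a plain dict stands in for snakemake.params. It raises ValueError instead of using assert, which is a design choice of this sketch so the check still runs when Python is started with -O.

```python
def ref_type_flag(params: dict, allowed=(1, 2), default="1") -> str:
    """Build the '-r<N>' flag from a params mapping, validating the value."""
    value = params.get("reference_type", default)
    if int(value) not in allowed:
        raise ValueError(
            f"reference_type must be one of {allowed}, got {value!r}"
        )
    return "-r" + str(value)


print(ref_type_flag({}))                     # falls back to the default
print(ref_type_flag({"reference_type": 2}))  # explicit UniRef90 download
```
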

PANDORA

For pandora, the following wrappers are available:

PANDORA INDEX

Index population reference graph (PRG) sequences.

URL: https://github.com/rmcolq/pandora/wiki/Usage#build-index

Example

This wrapper can be used in the following way:

rule pandora_index:
    input:
        "{gene}/prg.fa",
    output:
        index="{gene}/prg.fa.k15.w14.idx",
        kmer_prgs=directory("{gene}/kmer_prgs"),
    log:
        "pandora_index/{gene}.log",
    params:
        options="-v -k 15 -w 14",
    threads: 1
    wrapper:
        "v0.87.0/bio/pandora/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • pandora=0.9
Input/Output

Input:

  • A PRG file (made by make_prg <https://github.com/iqbal-lab-org/make_prg>) to index

Output:

  • index: A pandora index file
  • kmer_prgs: A directory of the index kmer PRGs in GFA format
Params
  • options: Any options other than threads (see docs)
Authors
  • Michael Hall
Code
"""Snakemake wrapper for indexing population reference graph (PRG) sequences with
pandora
"""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2021, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=False)
options = snakemake.params.get("options", "")

shell("pandora index -t {snakemake.threads} {options} {snakemake.input} {log}")
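
In the example rule, the index output name (prg.fa.k15.w14.idx) has to agree with the -k and -w values passed through options. A minimal sketch for keeping the two in sync is shown below. The naming pattern is inferred from the example rule only, and both helper names are hypothetical.

```python
def pandora_options(k: int = 15, w: int = 14) -> str:
    """Options string as used in the example rule (-v for verbose output)."""
    return f"-v -k {k} -w {w}"


def expected_index_path(prg: str, k: int = 15, w: int = 14) -> str:
    """Index file name following the pattern seen in the example rule."""
    return f"{prg}.k{k}.w{w}.idx"


# Deriving both from the same k and w avoids a mismatch between the
# declared output and the file that the command actually writes.
print(pandora_options())
print(expected_index_path("geneA/prg.fa"))
```
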

PBMM2

For pbmm2, the following wrappers are available:

PBMM2 ALIGN

Align reads using pbmm2, a minimap2 SMRT wrapper for PacBio data.

URL: https://github.com/PacificBiosciences/pbmm2/

Example

This wrapper can be used in the following way:

rule pbmm2_align:
    input:
        reference="target/{reference}.fasta", # can be either genome index or genome fasta
        query="{query}.bam", # can be either unaligned bam, fastq, or fasta
    output:
        bam="aligned/{query}.{reference}.bam",
        index="aligned/{query}.{reference}.bam.bai",
    log:
        "logs/pbmm2_align/{query}.{reference}.log",
    params:
        preset="CCS", # SUBREAD, CCS, HIFI, ISOSEQ, UNROLLED
        sample="", # sample name for @RG header
        extra="--sort", # optional additional args
        loglevel="INFO",
    threads: 12
    wrapper:
        "v0.87.0/bio/pbmm2/align"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • pbmm2==1.4.0
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

import tempfile
from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
tmp_root = snakemake.params.get("tmp_root", None)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

with tempfile.TemporaryDirectory(dir=tmp_root) as tmp_dir:
    shell(
        """
        (export TMPDIR={tmp_dir}; \
        pbmm2 align --num-threads {snakemake.threads} \
            --preset {snakemake.params.preset} \
            --sample {snakemake.params.sample} \
            --log-level {snakemake.params.loglevel} \
            {extra} \
            {snakemake.input.reference} \
            {snakemake.input.query} \
            {snakemake.output.bam}) {log}
        """
    )
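
The wrapper creates a scratch directory and points TMPDIR at it for the duration of the command. The same pattern can be sketched with subprocess, passing the variable explicitly through the child's environment so that the child process is certain to see it. The function name run_with_tmpdir is hypothetical, and a small Python child process stands in for pbmm2.

```python
import os
import subprocess
import sys
import tempfile


def run_with_tmpdir(cmd, tmp_root=None):
    """Run cmd with TMPDIR set to a fresh scratch directory.

    Returns the child's stdout and the scratch path; the directory is
    removed when the context manager exits.
    """
    with tempfile.TemporaryDirectory(dir=tmp_root) as tmp_dir:
        env = dict(os.environ, TMPDIR=tmp_dir)
        result = subprocess.run(
            cmd, env=env, check=True, capture_output=True, text=True
        )
        return result.stdout.strip(), tmp_dir


# Stand-in child process that reports the TMPDIR it was given.
child = [sys.executable, "-c", "import os; print(os.environ['TMPDIR'])"]
seen, tmp_dir = run_with_tmpdir(child)
print(seen == tmp_dir)
```
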
PBMM2 INDEX

Index a reference using pbmm2, a minimap2 SMRT wrapper for PacBio data.

URL: https://github.com/PacificBiosciences/pbmm2/

Example

This wrapper can be used in the following way:

rule pbmm2_index:
    input:
        reference="target/{reference}.fasta",
    output:
        "target/{reference}.mmi",
    log:
        "logs/pbmm2_index/{reference}.log",
    params:
        preset="CCS", # SUBREAD, CCS, HIFI, ISOSEQ, UNROLLED
        extra="", # optional additional args
    threads: 8
    wrapper:
        "v0.87.0/bio/pbmm2/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • pbmm2==1.3.0
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    """
    (pbmm2 index \
    --num-threads {snakemake.threads} \
    --preset {snakemake.params.preset} \
    --log-level DEBUG \
    {extra} \
    {snakemake.input.reference} {snakemake.output}) {log}
    """
)

PEAR

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.

URL:

Example

This wrapper can be used in the following way:

rule pear_merge:
    input:
        read1="reads/reads.left.fq.gz",
        read2="reads/reads.right.fq.gz"
    output:
        assembled="pear/reads_pear_assembled.fq.gz",
        discarded="pear/reads_pear_discarded.fq.gz",
        unassembled_read1="pear/reads_pear_unassembled_r1.fq.gz",
        unassembled_read2="pear/reads_pear_unassembled_r2.fq.gz",
    log:
        'logs/pear.log'
    params:
        pval=".01",
        extra=""
    threads: 4
    resources:
        mem_mb=4000 # define amount of memory to be used by pear
    wrapper:
        "v0.87.0/bio/pear"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • pear=0.9.6
Input/Output

Input:

  • paired fastq files

Output:

  • merged fastq
Authors
  • N. Tessa Pierce
Code
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

r1 = snakemake.input.get("read1")
r2 = snakemake.input.get("read2")
assert r1 is not None and r2 is not None, "r1 and r2 files are required as input"

assembled = snakemake.output.get("assembled")
assert assembled is not None, "require 'assembled' outfile"
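gzip = assembled.endswith(".gz")

# split only once, at the last ".f", so that paths containing
# more than one ".f" still unpack into exactly two parts
out_base, out_end = assembled.rsplit(".f", 1)
out_end = ".f" + out_end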

df_assembled = out_base + ".assembled.fastq"
df_discarded = out_base + ".discarded.fastq"
df_unassembled_r1 = out_base + ".unassembled.forward.fastq"
df_unassembled_r2 = out_base + ".unassembled.reverse.fastq"

df_outputs = [df_assembled, df_discarded, df_unassembled_r1, df_unassembled_r2]

discarded = snakemake.output.get("discarded", out_base + ".discarded" + out_end)
unassembled_r1 = snakemake.output.get(
    "unassembled_read1", out_base + ".unassembled_r1" + out_end
)
unassembled_r2 = snakemake.output.get(
    "unassembled_read2", out_base + ".unassembled_r2" + out_end
)

final_outputs = [assembled, discarded, unassembled_r1, unassembled_r2]


def move_files(in_list, out_list, gzip):
    for f, o in zip(in_list, out_list):
        if f != o:
            if gzip:
                shell("gzip -9 -c {f} > {o}")
                shell("rm -f {f}")
            else:
                shell("cp {f} {o}")
                shell("rm -f {f}")
        elif gzip:
            shell("gzip -9 {f}")


pval = float(snakemake.params.get("pval", ".01"))
max_mem = snakemake.resources.get("mem_mb", "4000")

shell(
    "pear -f {r1} -r {r2} -p {pval} -j {snakemake.threads} -y {max_mem} {extra} -o {out_base} {log}"
)

move_files(df_outputs, final_outputs, gzip)
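
The file-name bookkeeping in the wrapper above is easier to follow on a concrete path. The sketch below derives the output prefix handed to pear -o and the uncompressed file names that the wrapper expects PEAR to write, all from the requested assembled output. The function name pear_paths is hypothetical. This sketch passes a maxsplit of 1 to rsplit so that a path containing more than one ".f" still splits into two parts.

```python
def pear_paths(assembled: str) -> dict:
    """Derive PEAR's output prefix and expected file names."""
    out_base, out_end = assembled.rsplit(".f", 1)
    out_end = ".f" + out_end
    return {
        "out_base": out_base,
        "out_end": out_end,
        # compress afterwards if the requested output ends in .gz
        "gzip": assembled.endswith(".gz"),
        # names the wrapper expects PEAR to produce for this prefix
        "pear_assembled": out_base + ".assembled.fastq",
        "pear_discarded": out_base + ".discarded.fastq",
        "pear_unassembled_r1": out_base + ".unassembled.forward.fastq",
        "pear_unassembled_r2": out_base + ".unassembled.reverse.fastq",
    }


paths = pear_paths("pear/reads_pear_assembled.fq.gz")
for key, value in paths.items():
    print(key, value)
```
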

PICARD

For picard, the following wrappers are available:

PICARD ADDORREPLACEREADGROUPS

Add or replace read groups with picard tools.

URL:

Example

This wrapper can be used in the following way:

rule replace_rg:
    input:
        "mapped/{sample}.bam"
    output:
        "fixed-rg/{sample}.bam"
    log:
        "logs/picard/replace_rg/{sample}.log"
    params:
        "RGLB=lib1 RGPL=illumina RGPU={sample} RGSM={sample}"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/addorreplacereadgroups"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file

Output:

  • bam file with added or replaced read groups
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params
java_opts = get_java_opts(snakemake)

shell(
    "picard AddOrReplaceReadGroups {java_opts} {extra} "
    "I={snakemake.input} O={snakemake.output} &> {snakemake.log}"
)
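
The picard wrappers delegate JVM memory settings to get_java_opts from snakemake-wrapper-utils. The sketch below is not that function. It is a hypothetical stand-in (sketch_java_opts) that illustrates one plausible way a mem_mb resource could be turned into a heap-size flag. The 20% headroom for non-heap JVM memory is an assumption of this sketch.

```python
def sketch_java_opts(resources: dict, overhead: float = 0.2) -> str:
    """Translate a mem_mb resource into a -Xmx flag (illustrative only)."""
    mem_mb = resources.get("mem_mb")
    if mem_mb is None:
        # no memory resource declared: leave the JVM defaults untouched
        return ""
    heap_mb = round(mem_mb * (1.0 - overhead))
    return f"-Xmx{heap_mb}M"


print(sketch_java_opts({"mem_mb": 1024}))
print(repr(sketch_java_opts({})))
```
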
PICARD BEDTOINTERVALLIST

picard BedToIntervalList converts a BED file to Picard Interval List format.

URL:

Example

This wrapper can be used in the following way:

rule bed_to_interval_list:
    input:
        bed="resources/a.bed",
        dict="resources/genome.dict"
    output:
        "a.interval_list"
    log:
        "logs/picard/bedtointervallist/a.log"
    params:
        # optional parameters
        "SORT=true " # sort output interval list before writing
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/bedtointervallist"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bed: BED file to convert
  • dict: sequence dictionary (.dict) of the reference genome

Output:

  • interval_list Picard format
Authors
  • Fabian Kilpert
Code
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell()

extra = snakemake.params
java_opts = get_java_opts(snakemake)

shell(
    "picard BedToIntervalList "
    "{java_opts} {extra} "
    "INPUT={snakemake.input.bed} "
    "SEQUENCE_DICTIONARY={snakemake.input.dict} "
    "OUTPUT={snakemake.output} "
    "{log} "
)
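
The main difference between the two formats is the coordinate convention: BED starts are 0-based with an exclusive end, while interval lists use 1-based, inclusive coordinates. The sketch below converts a single BED record to the five interval-list fields (sequence, start, end, strand, name). It is an illustration, not picard's implementation. The fallbacks for missing name and strand columns are assumptions, and the header built from the sequence dictionary is left out.

```python
def bed_to_interval_fields(bed_line: str) -> tuple:
    """Convert one tab-separated BED record to interval-list style fields."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Assumed fallbacks for BED3 input: "." as name, "+" as strand.
    name = fields[3] if len(fields) > 3 else "."
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    # Only the start shifts by one: a 0-based exclusive end and a
    # 1-based inclusive end are the same number.
    return (chrom, start + 1, end, strand, name)


print(bed_to_interval_fields("chr1\t0\t100\tregionA\t0\t-"))
print(bed_to_interval_fields("chr2\t9\t20"))
```
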
PICARD COLLECTALIGNMENTSUMMARYMETRICS

Collect metrics on aligned reads with picard tools.

URL:

Example

This wrapper can be used in the following way:

rule alignment_summary:
    input:
        ref="genome.fasta",
        bam="mapped/{sample}.bam"
    output:
        "stats/{sample}.summary.txt"
    log:
        "logs/picard/alignment-summary/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/collectalignmentsummarymetrics"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell()

extra = snakemake.params
java_opts = get_java_opts(snakemake)

shell(
    "picard CollectAlignmentSummaryMetrics {java_opts} {extra} "
    "INPUT={snakemake.input.bam} OUTPUT={snakemake.output[0]} "
    "REFERENCE_SEQUENCE={snakemake.input.ref} {log}"
)
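
The params entry in the example rule relies on Python joining adjacent string literals into a single string, which is why each piece except the last ends with a space. A minimal sketch of the resulting value is shown below. Setting METRIC_ACCUMULATION_LEVEL to null before SAMPLE appears intended to clear picard's default list first, which is how picard treats null for list-valued options as far as this sketch assumes.

```python
# Adjacent string literals are concatenated by the Python parser.
params = (
    "VALIDATION_STRINGENCY=LENIENT "
    "METRIC_ACCUMULATION_LEVEL=null "
    "METRIC_ACCUMULATION_LEVEL=SAMPLE"
)

# The wrapper interpolates this one string into the picard command line,
# where the shell splits it back into separate KEY=VALUE arguments.
for argument in params.split():
    print(argument)
```

Dropping a trailing space would fuse two arguments into one token, for example LENIENTMETRIC_ACCUMULATION_LEVEL=null.
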
PICARD COLLECTGCBIASMETRICS

Run picard CollectGcBiasMetrics to generate QC metrics pertaining to GC bias.

URL:

Example

This wrapper can be used in the following way:

rule collect_gc_bias_metrics:
    input:
        # BAM aligned to reference genome
        bam="mapped/a.bam",
        # reference genome FASTA from which GC-context is inferred
        ref="genome.fasta"
    output:
        metrics="results/a.gcmetrics.txt",
        chart="results/a.gc.pdf",
        summary="results/a.summary.txt"
    params:
        # optional additional parameters, for example,
        extra="MINIMUM_GENOME_FRACTION=1E-5"
    log:
        "logs/picard/a.gcmetrics.log"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/collectgcbiasmetrics"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.25.4
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • BAM file aligned to the reference genome
  • reference genome FASTA file from which GC context is inferred

Output:

  • GC metrics text file
  • GC metrics PDF figure
  • GC summary metrics text file
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "picard CollectGcBiasMetrics "
    "{java_opts} {extra} "
    "INPUT={snakemake.input.bam} "
    "OUTPUT={snakemake.output.metrics} "
    "CHART={snakemake.output.chart} "
    "SUMMARY_OUTPUT={snakemake.output.summary} "
    "REFERENCE_SEQUENCE={snakemake.input.ref} "
    "{log}"
)
PICARD COLLECTHSMETRICS

Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.

URL:

Example

This wrapper can be used in the following way:

rule picard_collect_hs_metrics:
    input:
        bam="mapped/{sample}.bam",
        reference="genome.fasta",
        # Baits and targets should be given as interval lists. These can
        # be generated from bed files using picard BedToIntervalList.
        bait_intervals="regions.intervals",
        target_intervals="regions.intervals"
    output:
        "stats/hs_metrics/{sample}.txt"
    params:
        # Optional extra arguments. Here we reduce sample size
        # to reduce the runtime in our unit test.
        extra="SAMPLE_SIZE=1000"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    log:
        "logs/picard_collect_hs_metrics/{sample}.log"
    wrapper:
        "v0.87.0/bio/picard/collecthsmetrics"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file

Output:

  • metrics file
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard CollectHSMetrics."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
java_opts = get_java_opts(snakemake)

shell(
    "picard CollectHsMetrics"
    " {java_opts} {extra}"
    " INPUT={snakemake.input.bam}"
    " OUTPUT={snakemake.output[0]}"
    " REFERENCE_SEQUENCE={snakemake.input.reference}"
    " BAIT_INTERVALS={snakemake.input.bait_intervals}"
    " TARGET_INTERVALS={snakemake.input.target_intervals}"
    " {log}"
)
PICARD COLLECTINSERTSIZEMETRICS

Collect metrics on insert size of paired end reads with picard tools.

URL:

Example

This wrapper can be used in the following way:

rule insert_size:
    input:
        "mapped/{sample}.bam"
    output:
        txt="stats/{sample}.isize.txt",
        pdf="stats/{sample}.isize.pdf"
    log:
        "logs/picard/insert_size/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/collectinsertsizemetrics"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • r-base==3.6.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file

Output:

  • txt: textual representation of metrics
  • pdf: insert size histogram
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell()

extra = snakemake.params
java_opts = get_java_opts(snakemake)


shell(
    "picard CollectInsertSizeMetrics {java_opts} {extra} "
    "INPUT={snakemake.input} OUTPUT={snakemake.output.txt} "
    "HISTOGRAM_FILE={snakemake.output.pdf} {log}"
)
PICARD COLLECTMULTIPLEMETRICS

A picard meta-metrics tool that collects multiple classes of metrics. For usage information about CollectMultipleMetrics, please see picard’s documentation. For more information about picard, also see the source code.

You can select which tool(s) to run by adding the respective extension(s) (see table below) to the requested output of the wrapper invocation (see example Snakemake rule below).

Tool                              Extension(s) for the output files
CollectAlignmentSummaryMetrics    “.alignment_summary_metrics”
CollectInsertSizeMetrics          “.insert_size_metrics”, “.insert_size_histogram.pdf”
QualityScoreDistribution          “.quality_distribution_metrics”, “.quality_distribution.pdf”
MeanQualityByCycle                “.quality_by_cycle_metrics”, “.quality_by_cycle.pdf”
CollectBaseDistributionByCycle    “.base_distribution_by_cycle_metrics”, “.base_distribution_by_cycle.pdf”
CollectGcBiasMetrics              “.gc_bias.detail_metrics”, “.gc_bias.summary_metrics”, “.gc_bias.pdf”
RnaSeqMetrics                     “.rna_metrics”
CollectSequencingArtifactMetrics  “.bait_bias_detail_metrics”, “.bait_bias_summary_metrics”, “.error_summary_metrics”, “.pre_adapter_detail_metrics”, “.pre_adapter_summary_metrics”
CollectQualityYieldMetrics        “.quality_yield_metrics”

URL:

Example

This wrapper can be used in the following way:

rule collect_multiple_metrics:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        # Through the output file extensions the different tools for the metrics can be selected
        # so that it is not necessary to specify them under params with the "PROGRAM" option.
        # Usable extensions (and which tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html.
        multiext("stats/{sample}",
                 ".alignment_summary_metrics",
                 ".insert_size_metrics",
                 ".insert_size_histogram.pdf",
                 ".quality_distribution_metrics",
                 ".quality_distribution.pdf",
                 ".quality_by_cycle_metrics",
                 ".quality_by_cycle.pdf",
                 ".base_distribution_by_cycle_metrics",
                 ".base_distribution_by_cycle.pdf",
                 ".gc_bias.detail_metrics",
                 ".gc_bias.summary_metrics",
                 ".gc_bias.pdf",
                 ".rna_metrics",
                 ".bait_bias_detail_metrics",
                 ".bait_bias_summary_metrics",
                 ".error_summary_metrics",
                 ".pre_adapter_detail_metrics",
                 ".pre_adapter_summary_metrics",
                 ".quality_yield_metrics"
                 )
    resources:
        # This parameter (default 3 GB) can be used to limit the total resources a pipeline is allowed to use, see:
        #     https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
        mem_gb=3
    log:
        "logs/picard/multiple_metrics/{sample}.log"
    params:
        # optional parameters
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE "
        "REF_FLAT=ref_flat.txt "   # is required if RnaSeqMetrics are used
    wrapper:
        "v0.87.0/bio/picard/collectmultiplemetrics"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.23.0
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • BAM file (.bam)
  • FASTA reference sequence file (.fasta or .fa)

Output:

  • one or more metrics text files (ending in _metrics)
  • one or more metrics PDF files (.pdf)
  • the output file extensions select the tools to run, so they must match the desired tools
Authors
  • David Laehnemann
  • Antonie Vietor
Code
__author__ = "David Laehnemann, Antonie Vietor"
__copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import sys
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params
java_opts = get_java_opts(snakemake)

exts_to_prog = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
    ".quality_distribution_metrics": "QualityScoreDistribution",
    ".quality_distribution.pdf": "QualityScoreDistribution",
    ".quality_by_cycle_metrics": "MeanQualityByCycle",
    ".quality_by_cycle.pdf": "MeanQualityByCycle",
    ".base_distribution_by_cycle_metrics": "CollectBaseDistributionByCycle",
    ".base_distribution_by_cycle.pdf": "CollectBaseDistributionByCycle",
    ".gc_bias.detail_metrics": "CollectGcBiasMetrics",
    ".gc_bias.summary_metrics": "CollectGcBiasMetrics",
    ".gc_bias.pdf": "CollectGcBiasMetrics",
    ".rna_metrics": "RnaSeqMetrics",
    ".bait_bias_detail_metrics": "CollectSequencingArtifactMetrics",
    ".bait_bias_summary_metrics": "CollectSequencingArtifactMetrics",
    ".error_summary_metrics": "CollectSequencingArtifactMetrics",
    ".pre_adapter_detail_metrics": "CollectSequencingArtifactMetrics",
    ".pre_adapter_summary_metrics": "CollectSequencingArtifactMetrics",
    ".quality_yield_metrics": "CollectQualityYieldMetrics",
}
progs = set()

for file in snakemake.output:
    matched = False
    for ext in exts_to_prog:
        if file.endswith(ext):
            progs.add(exts_to_prog[ext])
            matched = True
    if not matched:
        sys.exit(
            "Unknown type of metrics file requested, for possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
        )

programs = " PROGRAM=" + " PROGRAM=".join(progs)

out = str(snakemake.wildcards.sample)  # as default
output_file = str(snakemake.output[0])
for ext in exts_to_prog:
    if output_file.endswith(ext):
        out = output_file[: -len(ext)]
        break

shell(
    "(picard CollectMultipleMetrics "
    "I={snakemake.input.bam} "
    "O={out} "
    "R={snakemake.input.ref} "
    "{extra} {programs} {java_opts}) {log}"
)
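
The selection logic in the wrapper above can be sketched on a reduced mapping: each requested output must end in a known extension, the matching tools are collected into PROGRAM= arguments, and the common output prefix is recovered from the first output. The function name plan_metrics is hypothetical, and only four of the extensions are included here. Unlike the wrapper, this sketch sorts the program names so the result is deterministic.

```python
EXTS_TO_PROG = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
    ".quality_yield_metrics": "CollectQualityYieldMetrics",
}


def plan_metrics(outputs):
    """Return (output prefix, PROGRAM arguments) for the requested files."""
    programs = set()
    for path in outputs:
        matches = [ext for ext in EXTS_TO_PROG if path.endswith(ext)]
        if not matches:
            raise ValueError(f"unknown type of metrics file: {path}")
        programs.update(EXTS_TO_PROG[ext] for ext in matches)
    # Strip the known extension from the first output to get the prefix.
    first = outputs[0]
    ext = next(e for e in EXTS_TO_PROG if first.endswith(e))
    prefix = first[: -len(ext)]
    return prefix, " ".join(f"PROGRAM={p}" for p in sorted(programs))


prefix, program_args = plan_metrics(
    [
        "stats/a.alignment_summary_metrics",
        "stats/a.insert_size_metrics",
        "stats/a.insert_size_histogram.pdf",
    ]
)
print(prefix)
print(program_args)
```
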
PICARD COLLECTRNASEQMETRICS

Run picard CollectRnaSeqMetrics to generate QC metrics for RNA-seq data.

URL:

Example

This wrapper can be used in the following way:

rule collect_rnaseq_metrics:
    input:
        # BAM aligned, splicing-aware, to reference genome
        bam="mapped/a.bam",
        # Annotation file containing transcript, gene, and exon data
        refflat="annotation.refFlat"
    output:
        "results/a.rnaseq_metrics.txt"
    params:
        # strand is optional (defaults to NONE) and pertains to the library preparation
        # options are FIRST_READ_TRANSCRIPTION_STRAND, SECOND_READ_TRANSCRIPTION_STRAND, and NONE
        strand="NONE",
        # optional additional parameters, for example,
        extra="VALIDATION_STRINGENCY=STRICT"
    log:
        "logs/picard/rnaseq-metrics/a.log"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/collectrnaseqmetrics"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.25.4
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • BAM file of RNA-seq data aligned to genome
  • REF_FLAT formatted file of transcriptome annotations

Output:

  • RNA-Seq metrics text file
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

strand = snakemake.params.get("strand", "NONE")
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "picard CollectRnaSeqMetrics "
    "{java_opts} {extra} "
    "INPUT={snakemake.input.bam} "
    "OUTPUT={snakemake.output} "
    "REF_FLAT={snakemake.input.refflat} "
    "STRAND_SPECIFICITY={strand} "
    "{log}"
)
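
The wrapper passes the strand parameter straight through to STRAND_SPECIFICITY. A minimal sketch of an up-front check against the three values named in the example rule is shown below, so that a typo fails before picard is started. The function name strand_arg is hypothetical, and a plain dict stands in for snakemake.params.

```python
ALLOWED_STRANDS = (
    "NONE",
    "FIRST_READ_TRANSCRIPTION_STRAND",
    "SECOND_READ_TRANSCRIPTION_STRAND",
)


def strand_arg(params: dict) -> str:
    """Build the STRAND_SPECIFICITY argument, defaulting to NONE."""
    strand = params.get("strand", "NONE")
    if strand not in ALLOWED_STRANDS:
        raise ValueError(
            f"strand must be one of {ALLOWED_STRANDS}, got {strand!r}"
        )
    return f"STRAND_SPECIFICITY={strand}"


print(strand_arg({}))
print(strand_arg({"strand": "SECOND_READ_TRANSCRIPTION_STRAND"}))
```
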
PICARD COLLECTTARGETEDPCRMETRICS

Collect metrics for targeted PCR (amplicon) sequencing runs with picard tools.

URL:

Example

This wrapper can be used in the following way:

rule CollectTargetedPcrMetrics:
    input:
        bam="mapped/{sample}.bam",
        amplicon_intervals="amplicon.list",
        target_intervals="target.list"
    output:
        "stats/{sample}.pcr.txt"
    log:
        "logs/picard/collecttargetedpcrmetrics/{sample}.log"
    params:
        extra=""
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/collecttargetedpcrmetrics"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@mail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


log = snakemake.log_fmt_shell()

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

shell(
    "picard CollectTargetedPcrMetrics "
    "{java_opts} {extra} "
    "INPUT={snakemake.input.bam} "
    "OUTPUT={snakemake.output[0]} "
    "AMPLICON_INTERVALS={snakemake.input.amplicon_intervals} "
    "TARGET_INTERVALS={snakemake.input.target_intervals} "
    "{log}"
)
PICARD CREATESEQUENCEDICTIONARY

Create a .dict file for a given FASTA file

URL:

Example

This wrapper can be used in the following way:

rule create_dict:
    input:
        "genome.fasta"
    output:
        "genome.dict"
    log:
        "logs/picard/create_dict.log"
    params:
        extra=""  # optional: extra arguments for picard.
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/createsequencedictionary"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • FASTA file

Output:

  • .dict file
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard "
    "CreateSequenceDictionary "
    "{java_opts} {extra} "
    "R={snakemake.input[0]} "
    "O={snakemake.output[0]} "
    "{log}"
)
PICARD MARKDUPLICATES

Mark PCR and optical duplicates with picard tools. For more information about MarkDuplicates see picard documentation.

URL:

Example

This wrapper can be used in the following way:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    # optional to specify a list of BAMs; this has the same effect
    # of marking duplicates on separate read groups for a sample
    # and then merging
    output:
        bam="dedup/{sample}.bam",
        metrics="dedup/{sample}.metrics.txt"
    log:
        "logs/picard/dedup/{sample}.log"
    params:
        extra="REMOVE_DUPLICATES=true"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/markduplicates"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file(s)

Output:

  • bam file with marked or removed duplicates
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
bams = snakemake.input
if isinstance(bams, str):
    bams = [bams]
bams = list(map("INPUT={}".format, bams))

shell(
    "picard MarkDuplicates "  # Tool and its subcommand
    "{java_opts} "  # Automatic java option
    "{extra} "  # User defined parameters
    "{bams} "  # Input bam(s)
    "OUTPUT={snakemake.output.bam} "  # Output bam
    "METRICS_FILE={snakemake.output.metrics} "  # Output metrics
    "{log}"  # Logging
)
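The wrapper accepts either a single BAM or a list of BAMs and passes each one to picard as its own `INPUT=` argument. A minimal standalone sketch of that step follows; the function name `build_input_args` is hypothetical and plain strings stand in for the `snakemake.input` object.

```python
def build_input_args(bams):
    """Return one INPUT=<path> argument per BAM file, joined by spaces."""
    if isinstance(bams, str):
        # A single path is treated as a one-element list.
        bams = [bams]
    return " ".join("INPUT={}".format(bam) for bam in bams)


if __name__ == "__main__":
    print(build_input_args(["mapped/a.rg1.bam", "mapped/a.rg2.bam"]))
```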
PICARD MARKDUPLICATESWITHMATECIGAR

Mark PCR and optical duplicates with picard tools, taking into account the CIGAR of the mate.

URL:

Example

This wrapper can be used in the following way:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        metrics="dedup/{sample}.metrics.txt"
    log:
        "logs/picard/dedup/{sample}.log"
    params:
        extra="REMOVE_DUPLICATES=true"
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/markduplicateswithmatecigar"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • bam file

Output:

  • bam file with marked or removed duplicates
Authors
  • Johannes Köster
  • Filipe G. Vieira
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)


shell(
    "picard MarkDuplicatesWithMateCigar {java_opts} {extra} INPUT={snakemake.input} "
    "OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
    "{log}"
)
PICARD MERGESAMFILES

Merge sam/bam files using picard tools.

URL:

Example

This wrapper can be used in the following way:

rule merge_bams:
    input:
        expand("mapped/{sample}.bam", sample=["a", "b"])
    output:
        "merged.bam"
    log:
        "logs/picard_mergesamfiles.log"
    params:
        "VALIDATION_STRINGENCY=LENIENT"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/mergesamfiles"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • sam/bam files

Output:

  • merged sam/bam file
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard MergeSamFiles."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params
java_opts = get_java_opts(snakemake)

inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " MergeSamFiles"
    " {java_opts} {extra}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD MERGEVCFS

Merge vcf files using picard tools.

URL:

Example

This wrapper can be used in the following way:

rule merge_vcfs:
    input:
        vcfs=["snvs.chr1.vcf", "snvs.chr2.vcf"]
    output:
        "snvs.vcf"
    log:
        "logs/picard/mergevcfs.log"
    params:
        extra=""
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/mergevcfs"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • vcf files

Output:

  • merged vcf file
Authors
  • Johannes Köster
Code
"""Snakemake wrapper for picard MergeVcfs."""

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


inputs = " ".join("INPUT={}".format(f) for f in snakemake.input.vcfs)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

shell(
    "picard"
    " MergeVcfs"
    " {java_opts}"
    " {extra}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD REVERTSAM

Reverts SAM or BAM files to a previous state.

URL:

Example

This wrapper can be used in the following way:

rule revert_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "revert/{sample}.bam"
    log:
        "logs/picard/revert_sam/{sample}.log"
    params:
        extra="SANITIZE=true" # optional: Extra arguments for picard.
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/revertsam"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • sam/bam file

Output:

  • sam/bam file.
Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard RevertSam."""

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " RevertSam"
    " {java_opts}"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD SAMTOFASTQ

Converts a SAM or BAM file to FASTQ.

URL:

Example

This wrapper can be used in the following way:

rule bam_to_fastq:
    input:
        "mapped/{sample}.bam"
    output:
        fastq1="reads/{sample}.R1.fastq",
        fastq2="reads/{sample}.R2.fastq"
    log:
        "logs/picard/sam_to_fastq/{sample}.log"
    params:
        extra="" # optional: Extra arguments for picard.
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/samtofastq"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • sam/bam file

Output:

  • fastq files.
Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard SamToFastq."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

fastq1 = snakemake.output.fastq1
fastq2 = snakemake.output.get("fastq2", None)
fastq_unpaired = snakemake.output.get("unpaired_fastq", None)

if not isinstance(fastq1, str):
    raise ValueError("output 'fastq1' needs to be provided")

output = " FASTQ=" + fastq1

if isinstance(fastq2, str):
    output += " SECOND_END_FASTQ=" + fastq2

if isinstance(fastq_unpaired, str):
    if not isinstance(fastq2, str):
        raise ValueError("output 'fastq2' is required if 'unpaired_fastq' is set")

    output += " UNPAIRED_FASTQ=" + fastq_unpaired

shell(
    "picard"
    " SamToFastq"
    " {java_opts}"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " {output}"
    " {log}"
)
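The output handling above can be summarised as: `fastq1` is mandatory, `fastq2` is optional, and `unpaired_fastq` is only accepted together with `fastq2`. The sketch below restates that logic as a standalone function so it can be tried without Snakemake; the name `build_fastq_args` is hypothetical and plain arguments replace `snakemake.output`.

```python
def build_fastq_args(fastq1, fastq2=None, unpaired=None):
    """Assemble the picard SamToFastq output arguments from the given paths."""
    if not isinstance(fastq1, str):
        raise ValueError("fastq1 needs to be provided")
    args = ["FASTQ=" + fastq1]
    if isinstance(fastq2, str):
        args.append("SECOND_END_FASTQ=" + fastq2)
    if isinstance(unpaired, str):
        # An unpaired output is only accepted alongside a second-end file.
        if not isinstance(fastq2, str):
            raise ValueError("fastq2 is required if unpaired is set")
        args.append("UNPAIRED_FASTQ=" + unpaired)
    return " ".join(args)


if __name__ == "__main__":
    print(build_fastq_args("reads/a.R1.fastq", "reads/a.R2.fastq"))
```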
PICARD SORTSAM

Sort sam/bam files using picard tools.

URL:

Example

This wrapper can be used in the following way:

rule sort_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "sorted/{sample}.bam"
    log:
        "logs/picard/sort_sam/{sample}.log"
    params:
        sort_order="coordinate",
        extra="VALIDATION_STRINGENCY=LENIENT" # optional: Extra arguments for picard.
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/picard/sortsam"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • picard==2.22.1
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • sam/bam file

Output:

  • sorted sam/bam file.
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard SortSam."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " SortSam"
    " {java_opts}"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " OUTPUT={snakemake.output[0]}"
    " SORT_ORDER={snakemake.params.sort_order}"
    " {log}"
)

PINDEL

For pindel, the following wrappers are available:

PINDEL

Call variants with pindel.

URL:

Example

This wrapper can be used in the following way:

pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]


rule pindel:
    input:
        ref="genome.fasta",
        # samples to call
        samples=["mapped/a.bam"],
        # bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
        config="pindel_config.txt"
    output:
        expand("pindel/all_{type}", type=pindel_types)
    params:
        # prefix must be consistent with output files
        prefix="pindel/all",
        extra=""  # optional parameters (except -i, -f, -o)
    log:
        "logs/pindel.log"
    threads: 4
    wrapper:
        "v0.87.0/bio/pindel/call"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • pindel==0.2.5b8
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "pindel -T {snakemake.threads} {snakemake.params.extra} -i {snakemake.input.config} "
    "-f {snakemake.input.ref} -o {snakemake.params.prefix} {log}"
)
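The example rule requires `params.prefix` to be consistent with the declared output files, which follow the pattern `<prefix>_<type>`. A small check like the following could catch a mismatch before a job is submitted; both function names are hypothetical and are not part of the wrapper.

```python
def expected_pindel_outputs(prefix, types):
    """List the output paths implied by a prefix and a set of variant types."""
    return ["{}_{}".format(prefix, t) for t in types]


def prefix_matches_outputs(prefix, outputs, types):
    """Return True if the declared outputs are exactly the prefix-derived paths."""
    return sorted(outputs) == sorted(expected_pindel_outputs(prefix, types))


if __name__ == "__main__":
    pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]
    declared = ["pindel/all_" + t for t in pindel_types]
    print(prefix_matches_outputs("pindel/all", declared, pindel_types))
```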
PINDEL2VCF

Convert pindel output to vcf.

URL:

Example

This wrapper can be used in the following way:

rule pindel2vcf:
    input:
        ref="genome.fasta",
        pindel="pindel/all_{type}"
    output:
        "pindel/all_{type}.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.{type}.log"
    wrapper:
        "v0.87.0/bio/pindel/pindel2vcf"

rule pindel2vcf_multi_input:
    input:
        ref="genome.fasta",
        pindel=["pindel/all_D", "pindel/all_INV"]
    output:
        "pindel/all.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.log"
    wrapper:
        "v0.87.0/bio/pindel/pindel2vcf"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • pindel==0.2.5b8
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

expected_endings = [
    "INT",
    "D",
    "SI",
    "INV",
    "INV_final",
    "TD",
    "LI",
    "BP",
    "CloseEndMapped",
    "RP",
]


def split_file_name(file_parts, file_ending_index):
    return (
        "_".join(file_parts[:file_ending_index]),
        "_".join(file_parts[file_ending_index:]),
    )


def process_input_path(input_file):
    """
    :param input_file: Input file from rule, e.g. /path/to/file/all_D or /path/to/file/all_INV_final
    :return: "/path/to/file", "all"

    """
    file_path, file_name = os.path.split(input_file)
    file_parts = file_name.split("_")
    # separate name and ending, e.g. name: all, ending: D or name: all, ending: INV_final
    file_name, file_ending = split_file_name(
        file_parts, -2 if file_name.endswith("_final") else -1
    )
    if file_ending not in expected_endings:
        raise Exception("Unexpected variant type: " + file_ending)
    return file_path, file_name


with tempfile.TemporaryDirectory() as tmpdirname:
    input_flag = "-p"
    input_file = snakemake.input.get("pindel")
    if isinstance(input_file, list) and len(input_file) > 1:
        input_flag = "-P"
        input_path, input_name = process_input_path(input_file[0])
        input_file = os.path.join(input_path, input_name)
        for variant_input in snakemake.input.pindel:
            if not variant_input.startswith(input_file):
                raise Exception(
                    "Unable to extract common path from multi file input, expected path is: "
                    + input_file
                )
            if not os.path.isfile(variant_input):
                raise Exception('Input "' + variant_input + '" is not a file!')
            os.symlink(
                os.path.abspath(variant_input),
                os.path.join(tmpdirname, os.path.basename(variant_input)),
            )
        input_file = os.path.join(tmpdirname, input_name)
    shell(
        "pindel2vcf {snakemake.params.extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}"
    )
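For multi-file input, the wrapper derives a common prefix by stripping the variant-type ending from the first file name. The sketch below reproduces that splitting step on its own, under the assumption of POSIX-style paths; the name `split_pindel_path` is hypothetical, and unlike the wrapper's helper it also returns the detected ending.

```python
import posixpath

# Variant-type endings accepted by the wrapper code above.
EXPECTED_ENDINGS = {
    "INT", "D", "SI", "INV", "INV_final", "TD", "LI", "BP", "CloseEndMapped", "RP",
}


def split_pindel_path(input_file):
    """Split e.g. 'pindel/all_INV_final' into ('pindel', 'all', 'INV_final')."""
    directory, file_name = posixpath.split(input_file)
    parts = file_name.split("_")
    # 'INV_final' spans two underscore-separated parts, all other endings span one.
    cut = -2 if file_name.endswith("_final") else -1
    prefix, ending = "_".join(parts[:cut]), "_".join(parts[cut:])
    if ending not in EXPECTED_ENDINGS:
        raise ValueError("Unexpected variant type: " + ending)
    return directory, prefix, ending


if __name__ == "__main__":
    print(split_pindel_path("pindel/all_D"))
    print(split_pindel_path("pindel/all_INV_final"))
```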

PLASS

Plass (Protein-Level ASSembler) is software to assemble short read sequencing data on a protein level. The main purpose of Plass is the assembly of complex metagenomic datasets.

URL:

Example

This wrapper can be used in the following way:

rule plass_paired:
    input:
        left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
        right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
    output:
        "plass/prot.fasta"
    log:
        "logs/plass.log"
    params:
        extra=""
    threads: 4
    wrapper:
        "v0.87.0/bio/plass"

rule plass_single:
    input:
        single=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
    output:
        "plass/prot_single.fasta"
    log:
        "logs/plass_single.log"
    params:
        extra=""
    threads: 4
    wrapper:
        "v0.87.0/bio/plass"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • plass=2.c7e35
Input/Output

Input:

  • fastq files

Output:

  • fasta containing protein assembly
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for PLASS Protein-Level Assembler."""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2018, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

# allow multiple input files for single assembly
left = snakemake.input.get("left")
single = snakemake.input.get("single")
assert (
    left is not None or single is not None
), "please check read inputs: either left/right or single read file inputs are required"
if left:
    left = (
        [snakemake.input.left]
        if isinstance(snakemake.input.left, str)
        else snakemake.input.left
    )
    right = snakemake.input.get("right")
    assert (
        right is not None
    ), "please input 'right' reads or specify that the reads are 'single'"
    right = (
        [snakemake.input.right]
        if isinstance(snakemake.input.right, str)
        else snakemake.input.right
    )
    assert len(left) == len(
        right
    ), "left input needs to contain the same number of files as the right input"
    input_str_left = " " + " ".join(left)
    input_str_right = " " + " ".join(right)
    input_cmd = input_str_left + " " + input_str_right
else:
    single = (
        [snakemake.input.single]
        if isinstance(snakemake.input.single, str)
        else snakemake.input.single
    )
    input_cmd = " " + " ".join(single)


outdir = path.dirname(snakemake.output[0])
tmpdir = path.join(outdir, "tmp")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "plass assemble {input_cmd} {snakemake.output} {tmpdir} --threads {snakemake.threads} {snakemake.params.extra} {log}"
)
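The input handling above accepts either paired (`left`/`right`) or `single` reads, each given as one path or a list of paths. The following standalone sketch mirrors that logic with plain arguments; the names `as_list` and `build_plass_inputs` are hypothetical, and the result is joined with single spaces rather than reproducing the wrapper's exact whitespace.

```python
def as_list(value):
    """Wrap a single path in a list; copy any other sequence into a list."""
    return [value] if isinstance(value, str) else list(value)


def build_plass_inputs(left=None, right=None, single=None):
    """Return the read-file portion of a plass command line."""
    if left:
        if right is None:
            raise ValueError("paired input needs 'right' reads as well")
        left, right = as_list(left), as_list(right)
        if len(left) != len(right):
            raise ValueError("left and right must contain the same number of files")
        # All left files come first, followed by all right files.
        return " ".join(left + right)
    if single:
        return " ".join(as_list(single))
    raise ValueError("either left/right or single read files are required")


if __name__ == "__main__":
    print(build_plass_inputs(left=["a.left.fq.gz"], right=["a.right.fq.gz"]))
```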

PRESEQ

For preseq, the following wrappers are available:

PRESEQ LC_EXTRAP

preseq estimates the library complexity of existing sequencing data to then estimate the yield of future experiments based on their design. For usage information, please see preseq’s command line help (this seems more up to date than the available documentation from 2014). For more information about preseq, also see the source code.

URL:

Example

This wrapper can be used in the following way:

rule preseq_lc_extrap_bam:
    input:
        "samples/{sample}.sorted.bam"
    output:
        "test_bam/{sample}.lc_extrap"
    params:
        "-v"   #optional parameters
    log:
        "logs/test_bam/{sample}.log"
    wrapper:
        "v0.87.0/bio/preseq/lc_extrap"

rule preseq_lc_extrap_bed:
    input:
        "samples/{sample}.sorted.bed"
    output:
        "test_bed/{sample}.lc_extrap"
    params:
        "-v"   #optional parameters
    log:
        "logs/test_bed/{sample}.log"
    wrapper:
        "v0.87.0/bio/preseq/lc_extrap"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • preseq==2.0.3
Input/Output

Input:

  • bed files containing duplicates and sorted by chromosome, start position, end position and finally strand OR
  • bam files containing duplicates and sorted by using bamtools or samtools sort.

Output:

  • lc_extrap (.lc_extrap)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = ""
if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
    # only add -bam if the user has not already passed it via params
    if "-bam" not in str(snakemake.params):
        params = "-bam "

shell(
    "(preseq lc_extrap {params} {snakemake.params} {snakemake.input[0]} -output {snakemake.output[0]}) {log}"
)
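The wrapper decides from the input file extension whether preseq needs the `-bam` flag. A minimal sketch of that decision is shown below. The function name `needs_bam_flag` is hypothetical, and checking the user-supplied parameters for an existing `-bam` is an assumption about the intended behaviour.

```python
import os


def needs_bam_flag(input_path, user_params=""):
    """Return True if '-bam' should be added for this input file."""
    is_bam = os.path.splitext(input_path)[-1] == ".bam"
    # Avoid passing the flag twice when the user already supplied it.
    return is_bam and "-bam" not in user_params


if __name__ == "__main__":
    for path in ("samples/a.sorted.bam", "samples/a.sorted.bed"):
        print(path, needs_bam_flag(path, "-v"))
```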

PRIMERCLIP

Primer trimming on sam file, https://github.com/swiftbiosciences/primerclip

URL:

Example

This wrapper can be used in the following way:

rule primerclip:
    input:
        master_file="master_file",
        alignment_file="mapped/{sample}.bam"
    output:
        alignment_file="mapped/{sample}.trimmed.bam"
    log:
        "logs/primerclip/{sample}.log"
    params:
        extra=""
    wrapper:
        "v0.87.0/bio/primerclip"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.9
  • primerclip==0.3.8
Input/Output

Input:

  • sam file
  • master primer file

Output:

  • sam file
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

master_file = snakemake.input.master_file
in_alignment_file = snakemake.input.alignment_file
out_alignment_file = snakemake.output.alignment_file

# Check inputs/arguments.
if not isinstance(master_file, str):
    raise ValueError("master_file, path to the master file")

if not isinstance(in_alignment_file, str):
    raise ValueError("in_alignment_file, path to the input alignment file")

if not isinstance(out_alignment_file, str):
    raise ValueError("out_alignment_file, path to the output file")

samtools_input_command = "samtools view -h " + in_alignment_file

samtools_output_command = " | head -n -3 | samtools view -Sh"

if out_alignment_file.endswith(".cram"):
    samtools_output_command += "C -o " + out_alignment_file
elif out_alignment_file.endswith(".sam"):
    samtools_output_command += " -o " + out_alignment_file
else:
    samtools_output_command += "b -o " + out_alignment_file

shell(
    "{samtools_input_command} |"
    " primerclip"
    " {master_file}"
    " /dev/stdin"
    " /dev/stdout"
    " {samtools_output_command}"
    " {log}"
)
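The output format is chosen from the extension of the output path: CRAM for `.cram`, SAM for `.sam`, and BAM otherwise. The sketch below isolates that branch as a function so the resulting `samtools view` call can be inspected; the function name `samtools_output_call` is hypothetical, and the `head -n -3` step of the pipeline is left out.

```python
def samtools_output_call(out_path):
    """Return the samtools view call that writes the given output path."""
    cmd = "samtools view -Sh"
    if out_path.endswith(".cram"):
        cmd += "C"  # write CRAM
    elif not out_path.endswith(".sam"):
        cmd += "b"  # default to BAM for any other extension
    return "{} -o {}".format(cmd, out_path)


if __name__ == "__main__":
    for path in ("mapped/a.trimmed.bam", "mapped/a.trimmed.sam", "mapped/a.trimmed.cram"):
        print(samtools_output_call(path))
```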

PROSOLO

For prosolo, the following wrappers are available:

PROSOLO FDR CONTROL

ProSolo can control the false discovery rate of any combination of its defined single cell events (like the presence of an alternative allele or the dropout of an allele).

URL:

Example

This wrapper can be used in the following way:

rule prosolo_fdr_control:
    input:
        "variant_calling/{sc}.{bulk}.prosolo.bcf"
    output:
        "fdr_control/{sc}.{bulk}.prosolo.fdr.bcf"
    threads:
        1
    params:
        # comma-separated set of events for whose (joint)
        # false discovery rate you want to control
        events = "ADO_TO_REF,HET",
        # false discovery rate to control for
        fdr = 0.05
    log:
        "logs/prosolo_{sc}_{bulk}.fdr.log"
    wrapper:
        "v0.87.0/bio/prosolo/control-fdr"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • prosolo==0.6.1
Input/Output

Input:

  • Variants called with prosolo in vcf or bcf format, including the fine-grained posterior probabilities for single cell events.

Output:

  • bcf file with all variants that satisfy the chosen false discovery rate threshold with regard to the specified events.
Authors
  • David Lähnemann
Code
"""Snakemake wrapper for ProSolo FDR control"""

__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "( prosolo control-fdr"
    " {snakemake.input}"
    " --events {snakemake.params.events}"
    " --var SNV"
    " --fdr {snakemake.params.fdr}"
    " --output {snakemake.output} )"
    "{log} "
)
PROSOLO

ProSolo calls variants or other events (like allele dropout) in a single cell sample against a bulk background sample. The single cell should stem from the same population of cells as the bulk background sample. The single cell sample should be amplified using multiple displacement amplification to match ProSolo’s statistical model.

URL:

Example

This wrapper can be used in the following way:

rule prosolo_calling:
    input:
        single_cell = "data/mapped/{sc}.sorted.bam",
        single_cell_index = "data/mapped/{sc}.sorted.bam.bai",
        bulk = "data/mapped/{bulk}.sorted.bam",
        bulk_index = "data/mapped/{bulk}.sorted.bam.bai",
        ref = "data/genome.fa",
        ref_idx = "data/genome.fa.fai",
        candidates = "data/{sc}.{bulk}.prosolo_candidates.bcf",
    output:
        "variant_calling/{sc}.{bulk}.prosolo.bcf"
    params:
        extra = ""
    threads:
        1
    log:
        "logs/prosolo_{sc}_{bulk}.log"
    wrapper:
        "v0.87.0/bio/prosolo/single-cell-bulk"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • prosolo==0.6.1
Input/Output

Input:

  • A position-sorted single cell bam file, with its index.
  • A position-sorted bulk bam file, with its index.
  • A reference genome sequence in fasta format, with its index.
  • A vcf or bcf file specifying candidate sites to perform calling on.

Output:

  • Variants called in bcf format, with fine-grained posterior probabilities for single cell events.
Authors
  • David Lähnemann
Code
"""Snakemake wrapper for ProSolo single-cell-bulk calling"""

__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "( prosolo single-cell-bulk "
    "--omit-indels "
    " {snakemake.params.extra} "
    "--candidates {snakemake.input.candidates} "
    "--output {snakemake.output} "
    "{snakemake.input.single_cell} "
    "{snakemake.input.bulk} "
    "{snakemake.input.ref} ) "
    "{log} "
)

PTRIMMER

Tool to trim off the primer sequence from multiplex amplicon sequencing

URL:

Example

This wrapper can be used in the following way:

rule ptrimmer_pe:
    input:
        r1="resources/a.lane1_R1.fastq.gz",
        r2="resources/a.lane1_R2.fastq.gz",
        primers="resources/primers.txt"
    output:
        r1="results/a.lane1_R1.fq.gz",
        r2="results/a.lane1_R2.fq.gz"
    log:
        "logs/ptrimmer/a.lane.log"
    wrapper:
        "v0.87.0/bio/ptrimmer"

rule ptrimmer_se:
    input:
        r1="resources/a.lane1_R1.fastq.gz",
        primers="resources/primers.txt"
    output:
        r1="results/a.lane1_R1.fq",
    log:
        "logs/ptrimmer/a.lane1.log"
    wrapper:
        "v0.87.0/bio/ptrimmer"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ptrimmer==1.3.3
Authors
  • Felix Mölder
Code
__author__ = "Felix Mölder"
__copyright__ = "Copyright 2020, Felix Mölder"
__email__ = "felix.moelder@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path
import ntpath

input_reads = "-f {r1}".format(r1=snakemake.input.r1)

out_r1 = ntpath.basename(snakemake.output.r1)
output_reads = "-d {o1}".format(o1=out_r1)

if snakemake.input.get("r2", ""):
    seqmode = "pair"
    input_reads = "{reads} -r {r2}".format(reads=input_reads, r2=snakemake.input.r2)
    out_r2 = ntpath.basename(snakemake.output.r2)
    output_reads = "{reads} -e {o2}".format(reads=output_reads, o2=out_r2)
else:
    seqmode = "single"

primers = snakemake.input.primers

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

ptrimmer_params = "-t {mode} {in_reads} -a {primers} {out_reads}".format(
    mode=seqmode, in_reads=input_reads, primers=primers, out_reads=output_reads
)

process_r1 = "mv {out_read} {final_output_path}".format(
    out_read=out_r1, final_output_path=snakemake.output.r1
)

process_r2 = ""
if snakemake.input.get("r2", ""):
    process_r2 = "&& mv {out_read} {final_output_path}".format(
        out_read=out_r2, final_output_path=snakemake.output.r2
    )

shell("(ptrimmer {ptrimmer_params} && {process_r1} {process_r2}) {log}")
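
The single-end versus paired-end branching in the wrapper code above can be sketched as a plain function. This is a minimal illustration, not the wrapper itself: build_ptrimmer_args is a hypothetical helper, it only assembles the argument string and never runs pTrimmer, and it uses only the flags that appear in the wrapper code (-t, -f, -r, -a, -d, -e).

```python
import os


def build_ptrimmer_args(r1, primers, out_r1, r2=None, out_r2=None):
    """Return the pTrimmer argument string for single- or paired-end input.

    Hypothetical helper for illustration; mirrors the flags used in the
    wrapper code and does not execute anything.
    """
    # As in the wrapper, only the output file *names* are passed to -d/-e;
    # the wrapper moves the files to their final paths afterwards.
    args = ["-t", "pair" if r2 else "single", "-f", r1]
    if r2:
        args += ["-r", r2]
    args += ["-a", primers, "-d", os.path.basename(out_r1)]
    if r2:
        args += ["-e", os.path.basename(out_r2)]
    return " ".join(args)


single = build_ptrimmer_args("a_R1.fastq.gz", "primers.txt", "results/a_R1.fq")
paired = build_ptrimmer_args(
    "a_R1.fastq.gz",
    "primers.txt",
    "results/a_R1.fq.gz",
    r2="a_R2.fastq.gz",
    out_r2="results/a_R2.fq.gz",
)
```

Whether a second read file is present is the only switch: it selects the mode and adds the -r and -e arguments.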

PYFASTAQ

For pyfastaq, the following wrappers are available:

PYFASTAQ REPLACE_BASES

Replaces all occurrences of one letter with another.

URL:

Example

This wrapper can be used in the following way:

rule replace_bases:
    input:
        "{sample}.rna.fa"
    output:
        "{sample}.dna.fa",
    params:
        old_base = "U",
        new_base = "T",
    log:
        "logs/fastaq/replace_bases/test/{sample}.log"
    wrapper:
        "v0.87.0/bio/pyfastaq/replace_bases"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • pyfastaq==3.17.0
Authors
  • Michael Hall
Code
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "fastaq replace_bases"
    " {snakemake.input[0]}"
    " {snakemake.output[0]}"
    " {snakemake.params.old_base}"
    " {snakemake.params.new_base}"
    " {log}"
)
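
To make the effect of the example rule concrete (old_base "U", new_base "T", i.e. RNA to DNA), the following sketch applies the same substitution in plain Python. It is not how fastaq is implemented: replace_bases_in_fasta is a hypothetical function, and leaving header lines (those starting with ">") untouched is an assumption of this sketch.

```python
def replace_bases_in_fasta(fasta_text, old_base, new_base):
    """Replace every occurrence of old_base with new_base in sequence lines.

    Illustration only; header lines starting with '>' are copied unchanged
    (an assumption of this sketch, not a statement about fastaq).
    """
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out_lines.append(line)
        else:
            out_lines.append(line.replace(old_base, new_base))
    return "\n".join(out_lines)


rna = ">seq1 U-rich\nACGU\nUUGA"
dna = replace_bases_in_fasta(rna, "U", "T")
```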

QUALIMAP

For qualimap, the following wrappers are available:

QUALIMAP RNASEQ

Run qualimap rnaseq to create a QC report for RNA-seq data.

URL:

Example

This wrapper can be used in the following way:

rule qualimap:
    input:
        # BAM aligned, splicing-aware, to reference genome
        bam="mapped/a.bam",
        # GTF containing transcript, gene, and exon data
        gtf="annotation.gtf"
    output:
        directory("qc/a")
    log:
        "logs/qualimap/rna-seq/a.log"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    wrapper:
        "v0.87.0/bio/qualimap/rnaseq"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • qualimap==2.2.2d
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • BAM file of RNA-seq data aligned to genome
  • GTF file containing genome annotations

Output:

  • QC report in html/pdf format
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os

from snakemake.shell import shell

java_opts = snakemake.params.get("java_opts", "")
if java_opts:
    java_opts_str = f'JAVA_OPTS="{java_opts}"'
else:
    java_opts_str = ""
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "{java_opts_str} qualimap rnaseq {extra} "
    "-bam {snakemake.input.bam} -gtf {snakemake.input.gtf} "
    "-outdir {snakemake.output} "
    "{log}"
)
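
The wrapper code above reads two optional params that the example rule does not set: java_opts, which is exported as the JAVA_OPTS environment variable for the qualimap call, and extra. The sketch below shows how the command line would come out under that logic. qualimap_rnaseq_command is a hypothetical helper that only builds a string, and -Xmx4G is just an example JVM heap setting.

```python
def qualimap_rnaseq_command(bam, gtf, outdir, java_opts="", extra=""):
    """Build the shell command string the wrapper logic would produce.

    Hypothetical helper for illustration; nothing is executed.
    """
    # JAVA_OPTS is only prefixed when java_opts is non-empty.
    prefix = 'JAVA_OPTS="{}" '.format(java_opts) if java_opts else ""
    parts = ["qualimap", "rnaseq"]
    if extra:
        parts.append(extra)
    parts += ["-bam", bam, "-gtf", gtf, "-outdir", outdir]
    return prefix + " ".join(parts)


plain = qualimap_rnaseq_command("mapped/a.bam", "annotation.gtf", "qc/a")
with_heap = qualimap_rnaseq_command(
    "mapped/a.bam", "annotation.gtf", "qc/a", java_opts="-Xmx4G"
)
```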

RASUSA

Randomly subsample sequencing reads to a specified coverage.

URL: https://github.com/mbhall88/rasusa

Example

This wrapper can be used in the following way:

rule subsample:
    input:
        r1="{sample}.r1.fq",
        r2="{sample}.r2.fq",
    output:
        r1="{sample}.subsampled.r1.fq",
        r2="{sample}.subsampled.r2.fq",
    params:
        options="--seed 15",
        genome_size="3mb",  # required, unless `bases` is given
        coverage=20,  # required, unless `bases` is given
        #bases="2gb"
    log:
        "logs/subsample/{sample}.log",
    wrapper:
        "v0.87.0/bio/rasusa"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • rasusa=0.6
Input/Output

Input:

  • Reads to subsample in FASTA/Q format. Input files can be named or unnamed.

Output:

  • File paths to write subsampled reads to. If using paired-end data, make sure there are two output files in the same order as the input.
Params
  • bases: Explicitly set the number of bases required, e.g. 4.3kb, 7Tb, 9000, 4.1MB.
    If this option is given, coverage and genome_size are ignored.
  • coverage: The desired coverage to subsample the reads to.
    If bases is not provided, this option and genome_size are required.
  • genome_size: Genome size to calculate coverage with respect to, e.g. 4.3kb, 7Tb, 9000, 4.1MB.
    Alternatively, a FASTA/Q index file can be provided, in which case the genome size is set to the sum of all reference sequences.
    If bases is not provided, this option and coverage are required.
  • options: Any other options as listed in the docs.
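
The precedence between bases, coverage and genome_size described in the list above can be sketched as a small function. rasusa_size_options is a hypothetical name; the flags -b, -g and -c are the ones used in the wrapper code of this section, and the function only builds the option string.

```python
def rasusa_size_options(bases=None, coverage=None, genome_size=None):
    """Return the size-related rasusa options following the Params rules.

    Hypothetical helper for illustration; it does not call rasusa.
    """
    # `bases` wins: coverage and genome_size are ignored when it is set.
    if bases is not None:
        return "-b {}".format(bases)
    # Otherwise both coverage and genome_size are required.
    if coverage is None or genome_size is None:
        raise ValueError(
            "If `bases` is not given, then `coverage` and `genome_size` "
            "must both be given."
        )
    return "-g {} -c {}".format(genome_size, coverage)


by_bases = rasusa_size_options(bases="2gb", coverage=20, genome_size="3mb")
by_coverage = rasusa_size_options(coverage=20, genome_size="3mb")
```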
Authors
  • Michael Hall
Code
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell


options = snakemake.params.get("options", "")

bases = snakemake.params.get("bases")
if bases is not None:
    options += " -b {}".format(bases)
else:
    covg = snakemake.params.get("coverage")
    gsize = snakemake.params.get("genome_size")
    if covg is None or gsize is None:
        raise ValueError(
            "If `bases` is not given, then `coverage` and `genome_size` must both be given."
        )
    options += " -g {gsize} -c {covg}".format(gsize=gsize, covg=covg)

shell("rasusa {options} -i {snakemake.input} -o {snakemake.output} 2> {snakemake.log}")

RAZERS3

Maps (short) reads against a reference sequence. Several output formats are supported; see https://github.com/seqan/seqan/tree/master/apps/razers3 for details.

URL:

Example

This wrapper can be used in the following way:

rule razers3:
    input:
        # list of input reads
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        # output format is automatically inferred from file extension. Can be bam/sam or other formats.
        "mapped/{sample}.bam"
    log:
        "logs/razers3/{sample}.log"
    params:
        # the reference genome
        genome="genome.fasta",
        # additional parameters
        extra=""
    threads: 8
    wrapper:
        "v0.87.0/bio/razers3"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • razers3==3.5.8
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "(razers3"
    " -tc {snakemake.threads}"
    " {extra}"
    " -o {snakemake.output[0]}"
    " {snakemake.params.genome}"
    " {snakemake.input.reads})"
    " {log}"
)

RBT

For rbt, the following wrappers are available:

RBT CSV-REPORT

Creates an html report of qc data stored in a csv file. For more details, visit https://github.com/rust-bio/rust-bio-tools

URL:

Example

This wrapper can be used in the following way:

rule csv_report:
    input:
        # a csv formatted file containing the data for the report
        "report.csv",
    output:
        # path to the resulting report directory
        directory("qc_data"),
    params:
        extra="--sort-column 'contig length'",
    log:
        "logs/rbt-csv-report",
    wrapper:
        "v0.87.0/bio/rbt/csvreport"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • rust-bio-tools=0.22
Input/Output

Input:

  • A csv file containing the qc report

Output:

  • QC report folder including html document and .xlsx file
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("rbt csv-report {snakemake.input} {snakemake.output} {extra} {log}")

REBALER

Reference-based long read assemblies of bacterial genomes

URL:

Example

This wrapper can be used in the following way:

rule rebaler:
    input:
        reference="ref.fa",
        reads="{sample}.fq",
    output:
        assembly="{sample}.asm.fa",
    log:
        "logs/rebaler/{sample}.log",
    params:
        extra="",
    wrapper:
        "v0.87.0/bio/rebaler"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • rebaler==0.2.0
Authors
  • Michael Hall
Code
"""Snakemake wrapper for Rebaler - https://github.com/rrwick/Rebaler"""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"

from snakemake.shell import shell


def get_named_input(name):
    value = snakemake.input.get(name)
    if value is None:
        raise NameError("Missing input named '{}'".format(name))
    return value


def get_named_output(name):
    return snakemake.output.get(name, snakemake.output[0])


log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

reference = get_named_input("reference")
reads = get_named_input("reads")
output = get_named_output("assembly")

shell("rebaler {extra} -t {snakemake.threads} {reference} {reads} > {output} {log}")
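
The two lookup helpers in the wrapper code above behave differently: a missing named input is an error, while a missing named output falls back to the first positional output. The sketch below reproduces that behaviour outside of Snakemake. NamedFiles is a hypothetical, heavily simplified stand-in for Snakemake's input/output objects and exists only for illustration.

```python
class NamedFiles(list):
    """Minimal stand-in for Snakemake's input/output objects (illustration only)."""

    def __init__(self, *paths, **named):
        super().__init__(list(paths) + list(named.values()))
        self._named = dict(named)

    def get(self, name, default=None):
        return self._named.get(name, default)


def get_named_input(files, name):
    # Named inputs are mandatory: fail loudly if the name is missing.
    value = files.get(name)
    if value is None:
        raise NameError("Missing input named '{}'".format(name))
    return value


def get_named_output(files, name):
    # Named outputs are optional: fall back to the first output.
    return files.get(name, files[0])


named = get_named_output(NamedFiles(assembly="s.asm.fa"), "assembly")
positional = get_named_output(NamedFiles("s.fa"), "assembly")
```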

REFERENCE

For reference, the following wrappers are available:

ENSEMBL-ANNOTATION

Download annotation of genomic sites (e.g. transcripts) from ENSEMBL FTP servers, and store them in a single .gtf or .gff3 file.

URL:

Example

This wrapper can be used in the following way:

rule get_annotation:
    output:
        "refs/annotation.gtf"
    params:
        species="homo_sapiens",
        release="87",
        build="GRCh37",
        fmt="gtf",
        flavor="" # optional, e.g. chr_patch_hapl_scaff, see Ensembl FTP.
    log:
        "logs/get_annotation.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "v0.87.0/bio/reference/ensembl-annotation"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • curl
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess
import sys
from snakemake.shell import shell

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
fmt = snakemake.params.fmt
build = snakemake.params.build
flavor = snakemake.params.get("flavor", "")

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

if flavor:
    flavor += "."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

suffix = ""
if fmt == "gtf":
    suffix = "gtf.gz"
elif fmt == "gff3":
    suffix = "gff3.gz"

url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/{species_cap}.{build}.{release}.{flavor}{suffix}".format(
    release=release,
    build=build,
    species=species,
    fmt=fmt,
    species_cap=species.capitalize(),
    suffix=suffix,
    flavor=flavor,
    branch=branch,
)

try:
    shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
except subprocess.CalledProcessError as e:
    if snakemake.log:
        sys.stderr = open(snakemake.log[0], "a")
    print(
        "Unable to download annotation data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    exit(1)
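
The URL construction in the ensembl-annotation code above can be checked without downloading anything. The following sketch repeats the same string logic as a function; ensembl_annotation_url is a hypothetical name, and the resulting URL is only as valid as the combination of species, build and release passed in.

```python
def ensembl_annotation_url(species, release, build, fmt, flavor=""):
    """Build the Ensembl FTP URL the same way the wrapper code does.

    Hypothetical helper for illustration; no network access is performed.
    """
    species = species.lower()
    release = int(release)
    # GRCh37 lives on a dedicated branch for releases >= 81.
    branch = "grch37/" if release >= 81 and build == "GRCh37" else ""
    if flavor:
        flavor += "."
    suffix = {"gtf": "gtf.gz", "gff3": "gff3.gz"}.get(fmt, "")
    return (
        "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/"
        "{species_cap}.{build}.{release}.{flavor}{suffix}"
    ).format(
        branch=branch,
        release=release,
        fmt=fmt,
        species=species,
        species_cap=species.capitalize(),
        build=build,
        flavor=flavor,
        suffix=suffix,
    )


example_url = ensembl_annotation_url("homo_sapiens", "87", "GRCh37", "gtf")
```

With the params of the example rule, the GRCh37 branch is selected because the release is at least 81.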
ENSEMBL-SEQUENCE

Download sequences (e.g. genome) from ENSEMBL FTP servers, and store them in a single .fasta file.

URL:

Example

This wrapper can be used in the following way:

rule get_genome:
    output:
        "refs/genome.fasta"
    params:
        species="saccharomyces_cerevisiae",
        datatype="dna",
        build="R64-1-1",
        release="98"
    log:
        "logs/get_genome.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "v0.87.0/bio/reference/ensembl-sequence"

rule get_chromosome:
    output:
        "refs/chr1.fasta"
    params:
        species="saccharomyces_cerevisiae",
        datatype="dna",
        build="R64-1-1",
        release="101",
        chromosome="I"
    log:
        "logs/get_chromosome.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "v0.87.0/bio/reference/ensembl-sequence"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • curl
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess as sp
import sys
from itertools import product
from snakemake.shell import shell

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

spec = ("{build}" if int(release) > 75 else "{build}.{release}").format(
    build=build, release=release
)

suffixes = ""
datatype = snakemake.params.get("datatype", "")
chromosome = snakemake.params.get("chromosome", "")
if datatype == "dna":
    if chromosome:
        suffixes = ["dna.chromosome.{}.fa.gz".format(chromosome)]
    else:
        suffixes = ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"]
elif datatype == "cdna":
    suffixes = ["cdna.all.fa.gz"]
elif datatype == "cds":
    suffixes = ["cds.all.fa.gz"]
elif datatype == "ncrna":
    suffixes = ["ncrna.fa.gz"]
elif datatype == "pep":
    suffixes = ["pep.all.fa.gz"]
else:
    raise ValueError("invalid datatype, must be one of dna, cdna, cds, ncrna, pep")

if chromosome:
    if not datatype == "dna":
        raise ValueError(
            "invalid datatype, to select a single chromosome the datatype must be dna"
        )

success = False
for suffix in suffixes:
    url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/{species}/{datatype}/{species_cap}.{spec}.{suffix}".format(
        release=release,
        species=species,
        datatype=datatype,
        spec=spec.format(build=build, release=release),
        suffix=suffix,
        species_cap=species.capitalize(),
        branch=branch,
    )

    try:
        shell("curl -sSf {url} > /dev/null 2> /dev/null")
    except sp.CalledProcessError:
        continue

    shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
    success = True
    break

if not success:
    print(
        "Unable to download requested sequence data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    exit(1)
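
The ensembl-sequence code above tries a list of candidate URLs in order and downloads the first one that exists. The sketch below reproduces only the candidate list, so the datatype, chromosome and release rules can be inspected offline. ensembl_sequence_urls is a hypothetical name and performs no network access.

```python
def ensembl_sequence_urls(species, release, build, datatype, chromosome=""):
    """Return the candidate Ensembl FTP URLs in the order they would be tried.

    Hypothetical helper for illustration, following the wrapper's string logic.
    """
    species = species.lower()
    release = int(release)
    suffixes_by_type = {
        "dna": ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"],
        "cdna": ["cdna.all.fa.gz"],
        "cds": ["cds.all.fa.gz"],
        "ncrna": ["ncrna.fa.gz"],
        "pep": ["pep.all.fa.gz"],
    }
    if datatype not in suffixes_by_type:
        raise ValueError("invalid datatype, must be one of dna, cdna, cds, ncrna, pep")
    if chromosome and datatype != "dna":
        raise ValueError("to select a single chromosome the datatype must be dna")
    suffixes = suffixes_by_type[datatype]
    if chromosome:
        suffixes = ["dna.chromosome.{}.fa.gz".format(chromosome)]
    branch = "grch37/" if release >= 81 and build == "GRCh37" else ""
    # File names carry the release number only for releases up to 75.
    spec = build if release > 75 else "{}.{}".format(build, release)
    base = "ftp://ftp.ensembl.org/pub/{}release-{}/fasta/{}/{}/".format(
        branch, release, species, datatype
    )
    return [base + "{}.{}.{}".format(species.capitalize(), spec, s) for s in suffixes]


genome_urls = ensembl_sequence_urls("saccharomyces_cerevisiae", "98", "R64-1-1", "dna")
chrom_urls = ensembl_sequence_urls(
    "saccharomyces_cerevisiae", "101", "R64-1-1", "dna", chromosome="I"
)
```

For a whole genome, the primary assembly is tried first and the toplevel file is the fallback.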
ENSEMBL-VARIATION

Download known genomic variants from ENSEMBL FTP servers, and store them in a single .vcf.gz file.

URL:

Example

This wrapper can be used in the following way:

rule get_variation:
    # Optional: add fai as input to get VCF with annotated contig lengths (as required by GATK)
    # and properly sorted VCFs.
    # input:
    #     fai="refs/genome.fasta.fai"
    output:
        vcf="refs/variation.vcf.gz",
    params:
        species="saccharomyces_cerevisiae",
        release="98",  # releases <98 are unsupported
        build="R64-1-1",
        type="all",  # one of "all", "somatic", "structural_variations"
        # chromosome="21", # optionally constrain to chromosome, only supported for homo_sapiens
    log:
        "logs/get_variation.log",
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "v0.87.0/bio/reference/ensembl-variation"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bcftools=1.11
  • curl
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import tempfile
import subprocess
import sys
import os
from snakemake.shell import shell
from snakemake.exceptions import WorkflowError

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build
type = snakemake.params.type
chromosome = snakemake.params.get("chromosome", "")

if release < 98:
    print("Ensembl releases <98 are unsupported.", file=open(snakemake.log[0], "w"))
    exit(1)

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

if chromosome and type != "all":
    raise ValueError(
        "Parameter chromosome given but chromosome-wise download "
        "is only implemented for type='all'."
    )

if type == "all":
    if species == "homo_sapiens" and release >= 93:
        chroms = (
            list(range(1, 23)) + ["X", "Y", "MT"] if not chromosome else [chromosome]
        )
        suffixes = ["-chr{}".format(chrom) for chrom in chroms]
    else:
        if chromosome:
            raise ValueError(
                "Parameter chromosome given but chromosome-wise download "
                "is only implemented for homo_sapiens in releases >=93."
            )
        suffixes = [""]
elif type == "somatic":
    suffixes = ["_somatic"]
elif type == "structural_variations":
    suffixes = ["_structural_variations"]
else:
    raise ValueError(
        "Unsupported type {} (only all, somatic, structural_variations are allowed)".format(
            type
        )
    )

species_filename = species if release >= 91 else species.capitalize()

urls = [
    "ftp://ftp.ensembl.org/pub/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.{ext}".format(
        release=release,
        species=species,
        suffix=suffix,
        species_filename=species_filename,
        branch=branch,
        ext=ext,
    )
    for suffix in suffixes
    for ext in ["vcf.gz", "vcf.gz.csi"]
]
names = [os.path.basename(url) for url in urls if url.endswith(".gz")]

try:
    gather = "curl {urls}".format(urls=" ".join(map("-O {}".format, urls)))
    workdir = os.getcwd()
    with tempfile.TemporaryDirectory() as tmpdir:
        if snakemake.input.get("fai"):
            shell(
                "(cd {tmpdir}; {gather} && "
                "bcftools concat -Oz --naive {names} > concat.vcf.gz && "
                "bcftools reheader --fai {workdir}/{snakemake.input.fai} concat.vcf.gz "
                "> {workdir}/{snakemake.output}) {log}"
            )
        else:
            shell(
                "(cd {tmpdir}; {gather} && "
                "bcftools concat -Oz --naive {names} "
                "> {workdir}/{snakemake.output}) {log}"
            )
except subprocess.CalledProcessError as e:
    if snakemake.log:
        sys.stderr = open(snakemake.log[0], "a")
    print(
        "Unable to download variation data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided? ",
        file=sys.stderr,
    )
    exit(1)
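
Which VCF files the ensembl-variation code above requests depends on species, release, type and the optional chromosome. The sketch below isolates that decision so it can be inspected offline. The names variation_suffixes and variation_vcf_names are hypothetical; the rules are copied from the wrapper code, and nothing is downloaded.

```python
def variation_suffixes(species, release, variant_type, chromosome=""):
    """Return the file-name suffixes selected by the wrapper's rules."""
    if chromosome and variant_type != "all":
        raise ValueError("chromosome-wise download requires type='all'")
    if variant_type == "all":
        if species == "homo_sapiens" and release >= 93:
            # Human variation data is split per chromosome from release 93 on.
            chroms = [chromosome] if chromosome else list(range(1, 23)) + ["X", "Y", "MT"]
            return ["-chr{}".format(c) for c in chroms]
        if chromosome:
            raise ValueError("chromosome-wise download is only implemented for homo_sapiens")
        return [""]
    if variant_type in ("somatic", "structural_variations"):
        return ["_" + variant_type]
    raise ValueError("Unsupported type {}".format(variant_type))


def variation_vcf_names(species, release, variant_type, chromosome=""):
    """Return the VCF file names (without index files) to be concatenated."""
    # File names are lower case from release 91 on, capitalized before.
    species_filename = species if release >= 91 else species.capitalize()
    return [
        "{}{}.vcf.gz".format(species_filename, suffix)
        for suffix in variation_suffixes(species, release, variant_type, chromosome)
    ]


human_all = variation_suffixes("homo_sapiens", 98, "all")
chr21_only = variation_vcf_names("homo_sapiens", 98, "all", "21")
yeast_all = variation_vcf_names("saccharomyces_cerevisiae", 98, "all")
```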

REFGENIE

Deploy biomedical reference datasets via refgenie.

URL:

Example

This wrapper can be used in the following way:

rule obtain_asset:
    output:
        # the name refers to the refgenie seek key (see attributes on http://refgenomes.databio.org)
        fai="refs/genome.fasta"
        # Multiple outputs/seek keys are possible here.
    params:
        genome="human_alu",
        asset="fasta",
        tag="default"
    wrapper:
        "v0.87.0/bio/refgenie"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • refgenie=0.9.2
  • refgenconf=0.9.0
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import refgenconf

genome = snakemake.params.genome
asset = snakemake.params.asset
tag = snakemake.params.tag

conf_path = os.environ["REFGENIE"]

rgc = refgenconf.RefGenConf(conf_path, writable=True)

# pull asset if necessary
gat, archive_data, server_url = rgc.pull(genome, asset, tag, force=False)

for seek_key, out in snakemake.output.items():
    path = rgc.seek(genome, asset, tag_name=tag, seek_key=seek_key, strict_exists=True)
    os.symlink(path, out)

RSEM

For rsem, the following wrappers are available:

RSEM CALCULATE EXPRESSION

Run rsem-calculate-expression to estimate gene and isoform expression from RNA-Seq data.

URL:

Example

This wrapper can be used in the following way:

rule calculate_expression:
    input:
        # input.bam or input.fq_one must be specified (and if input.fq_one, optionally input.fq_two if paired-end)
        # a BAM aligned to the transcriptome
        bam="mapped/a.bam",
        # one of the index files created by rsem-prepare-reference; the file suffix is stripped and passed on to rsem
        reference="index/reference.seq",
    output:
        # genes_results must end in .genes.results; this suffix is stripped and passed to rsem as an output name prefix
        # this file contains per-gene quantification data for the sample
        genes_results="output/a.genes.results",
        # isoforms_results must end in .isoforms.results and otherwise have the same prefix as genes_results
        # this file contains per-transcript quantification data for the sample
        isoforms_results="output/a.isoforms.results",
    params:
        # optional, specify if sequencing is paired-end
        paired_end=True,
        # additional optional parameters to pass to rsem, for example,
        extra="--seed 42",
    log:
        "logs/rsem/calculate_expression/a.log",
    wrapper:
        "v0.87.0/bio/rsem/calculate-expression"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • rsem==1.3.3
Input/Output

Input:

  • BAM aligned to transcriptome

Output:

  • Per-gene and per-isoform read quantification.
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os

from snakemake.shell import shell

bam = snakemake.input.get("bam", "")
fq_one = snakemake.input.get("fq_one", "")
fq_two = snakemake.input.get("fq_two", "")
if bam:
    if fq_one:
        raise Exception("Only input.bam or input.fq_one expected, got both.")
    input_bam = " --bam"
    input_string = bam
    paired_end = snakemake.params.get("paired_end", False)
else:
    input_bam = ""
    if fq_one:
        # input.fq_one may be a single path or a list of paths
        if isinstance(fq_one, list):
            num_fq_one = len(fq_one)
            input_string = ",".join(fq_one)
        else:
            num_fq_one = 1
            input_string = fq_one
        if fq_two:
            paired_end = True
            # normalize a single R2 path to a list so both cases are counted
            if not isinstance(fq_two, list):
                fq_two = [fq_two]
            num_fq_two = len(fq_two)
            if num_fq_one != num_fq_two:
                raise Exception(
                    "Got {} R1 FASTQs, {} R2 FASTQs.".format(num_fq_one, num_fq_two)
                )
            input_string += " " + ",".join(fq_two)
        else:
            paired_end = False
    else:
        raise Exception("Expected input.bam or input.fq_one, got neither.")

if paired_end:
    paired_end_string = "--paired-end"
else:
    paired_end_string = ""

genes_results = snakemake.output.genes_results
if genes_results.endswith(".genes.results"):
    output_prefix = genes_results[: -len(".genes.results")]
else:
    raise Exception(
        "output.genes_results file name malformed "
        "(rsem will append .genes.results suffix)"
    )
if not snakemake.output.isoforms_results.endswith(".isoforms.results"):
    raise Exception(
        "output.isoforms_results file name malformed "
        "(rsem will append .isoforms.results suffix)"
    )

reference_prefix = os.path.splitext(snakemake.input.reference)[0]

extra = snakemake.params.get("extra", "")
threads = snakemake.threads
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "rsem-calculate-expression --num-threads {snakemake.threads} {extra} "
    "{paired_end_string} {input_bam} {input_string} "
    "{reference_prefix} {output_prefix} "
    "{log}"
)
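
The rsem-calculate-expression code above never passes the declared output files to RSEM directly. It derives an output prefix from output.genes_results and a reference prefix from input.reference. The sketch below shows that derivation on the paths of the example rule. rsem_prefixes is a hypothetical helper, and its check that both outputs share one prefix is stricter than the wrapper, which only checks the .isoforms.results suffix.

```python
import os


def rsem_prefixes(genes_results, isoforms_results, reference):
    """Return (output_prefix, reference_prefix) as handed to RSEM.

    Hypothetical helper for illustration; the shared-prefix check is an
    addition of this sketch.
    """
    suffix = ".genes.results"
    if not genes_results.endswith(suffix):
        raise ValueError("genes_results must end in .genes.results")
    output_prefix = genes_results[: -len(suffix)]
    if isoforms_results != output_prefix + ".isoforms.results":
        raise ValueError("isoforms_results must share the prefix of genes_results")
    # Any one index file works; only its extension is stripped.
    reference_prefix = os.path.splitext(reference)[0]
    return output_prefix, reference_prefix


prefixes = rsem_prefixes(
    "output/a.genes.results", "output/a.isoforms.results", "index/reference.seq"
)
```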
RSEM GENERATE DATA MATRIX

Run rsem-generate-data-matrix to combine a set of single-sample rsem results into a single matrix.

URL:

Example

This wrapper can be used in the following way:

rule rsem_generate_data_matrix:
    input:
        # one or more expression files created by rsem-calculate-expression
        ["a.genes.results", "b.genes.results"],
    output:
        # a tsv containing each sample in the input as a column
        "genes.results",
    params:
        # optional additional parameters
        extra="",
    log:
        "logs/rsem/generate_data_matrix.log",
    wrapper:
        "v0.87.0/bio/rsem/generate-data-matrix"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • rsem==1.3.3
Input/Output

Input:

  • a list of rsem results files

Output:

  • Quantification results summarized by allele/gene/isoform per sample
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
    "rsem-generate-data-matrix {extra} "
    "{snakemake.input} > {snakemake.output} "
    "{log}"
)
RSEM PREPARE REFERENCE

Run rsem-prepare-reference to create index files for downstream analysis with rsem.

URL:

Example

This wrapper can be used in the following way:

rule prepare_reference:
    input:
        # reference FASTA with either the entire genome or transcript sequences
        reference_genome="genome.fasta",
    output:
        # one of the index files created and used by RSEM (required)
        seq="index/reference.seq",
        # RSEM produces a number of other files which may optionally be specified as output; these may be provided so that snakemake is aware of them, but the wrapper doesn't do anything with this information other than to verify that the file path prefixes match that of output.seq.
        # for example,
        grp="index/reference.grp",
        ti="index/reference.ti",
    params:
        # optional additional parameters, for example,
        #extra="--gtf annotations.gtf",
        # if building the index against a reference transcript set
        extra="",
    log:
        "logs/rsem/prepare-reference.log",
    wrapper:
        "v0.87.0/bio/rsem/prepare-reference"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • rsem==1.3.3
Input/Output

Input:

  • reference genome
  • additional optional arguments

Output:

  • index files for downstream use with rsem
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os

from snakemake.shell import shell

# the reference_name argument is inferred by stripping the .seq suffix from
# the output.seq value
output_directory = os.path.dirname(os.path.abspath(snakemake.output.seq))
seq_file = os.path.basename(snakemake.output.seq)
if seq_file.endswith(".seq"):
    reference_name = os.path.join(output_directory, seq_file[:-4])
else:
    raise Exception("output.seq has an invalid file suffix (must be .seq)")

for output_variable, output_path in snakemake.output.items():
    if not os.path.abspath(output_path).startswith(reference_name):
        raise Exception(
            "the path for {} is inconsistent with that of output.seq".format(
                output_variable
            )
        )

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "rsem-prepare-reference --num-threads {snakemake.threads} {extra} "
    "{snakemake.input.reference_genome} {reference_name} "
    "{log}"
)
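
The rsem-prepare-reference code above infers the RSEM reference name from output.seq and rejects any additional output whose path does not start with that name. The following sketch runs the same checks on a plain dictionary. reference_name_from_outputs is a hypothetical helper, and the dictionary stands in for Snakemake's named outputs.

```python
import os


def reference_name_from_outputs(outputs):
    """Return the absolute RSEM reference name derived from outputs['seq'].

    Hypothetical helper for illustration; `outputs` maps output names to paths.
    """
    seq = outputs["seq"]
    seq_file = os.path.basename(seq)
    if not seq_file.endswith(".seq"):
        raise ValueError("output.seq has an invalid file suffix (must be .seq)")
    output_directory = os.path.dirname(os.path.abspath(seq))
    reference_name = os.path.join(output_directory, seq_file[: -len(".seq")])
    # Every declared output has to live under the same prefix.
    for name, path in outputs.items():
        if not os.path.abspath(path).startswith(reference_name):
            raise ValueError(
                "the path for {} is inconsistent with that of output.seq".format(name)
            )
    return reference_name


reference_name = reference_name_from_outputs(
    {"seq": "index/reference.seq", "grp": "index/reference.grp", "ti": "index/reference.ti"}
)
```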

RUBIC

RUBIC detects recurrent copy number alterations using copy number breaks.

URL:

Example

This wrapper can be used in the following way:

rule rubic:
    input:
        seg="{samples}/segments.txt",
        markers="{samples}/markers.txt"
    output:
        out_gains="{samples}/gains.txt",
        out_losses="{samples}/losses.txt",
        out_plots=directory("{samples}/plots") #only possible to provide output directory for plots
    params:
        fdr="",
        genefile=""
    wrapper:
        "v0.87.0/bio/rubic"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • r-base=3.4.1
  • r-rubic=1.0.3
  • r-data.table=1.10.4
  • r-pracma=2.0.4
  • r-ggplot2=2.2.1
  • r-gtable=0.2.0
  • r-codetools=0.2_15
  • r-digest=0.6.12
Input/Output

Input:

  • seg
  • markers

Output:

  • out_gains
  • out_losses
  • out_plots
Params
  • fdr: false discovery rate (optional, leave empty to use default value of 0.25)
  • genefile: file path to use custom gene file (optional, leave empty to use default file)
Authors
  • Beatrice F. Tan
Code
# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"

library(RUBIC)

all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]

rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])

SALMON

For salmon, the following wrappers are available:

SALMON_INDEX

Index a transcriptome assembly with salmon

URL:

Example

This wrapper can be used in the following way:

rule salmon_index:
    input:
        "assembly/transcriptome.fasta"
    output:
        directory("salmon/transcriptome_index")
    log:
        "logs/salmon/transcriptome_index.log"
    threads: 2
    params:
        # optional parameters
        extra=""
    wrapper:
        "v0.87.0/bio/salmon/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • salmon==0.14.1
Input/Output

Input:

  • assembly fasta

Output:

  • indexed assembly
Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Index."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "salmon index -t {snakemake.input} -i {snakemake.output} "
    " --threads {snakemake.threads} {extra} {log}"
)
SALMON_QUANT

Quantify transcripts with salmon

URL:

Example

This wrapper can be used in the following way:

rule salmon_quant_reads:
    input:
        # If you have multiple fastq files for a single sample (e.g. technical replicates)
        # use a list for r1 and r2.
        r1 = "reads/{sample}_1.fq.gz",
        r2 = "reads/{sample}_2.fq.gz",
        index = "salmon/transcriptome_index"
    output:
        quant = 'salmon/{sample}/quant.sf',
        lib = 'salmon/{sample}/lib_format_counts.json'
    log:
        'logs/salmon/{sample}.log'
    params:
        # optional parameters
        libtype ="A",
        # zip_extension="bz2",  # required for bz2 files ("bz2"); optional for gz files ("gz")
        extra=""
    threads: 2
    wrapper:
        "v0.87.0/bio/salmon/quant"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • salmon==0.14.1
Input/Output

Input:

  • assembly index, fastq files

Output:

  • quantification files
Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Quant"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell


def manual_decompression(reads, zip_ext):
    """Allow *.bz2 input into salmon. Also provide same
    decompression for *gz files, as salmon devs mention
    it may be faster in some cases."""
    if zip_ext and reads:
        # Bash process substitution must be written as "<(" without a space.
        if zip_ext == "bz2":
            reads = " <(bunzip2 -c " + reads + ")"
        elif zip_ext == "gz":
            reads = " <(gunzip -c " + reads + ")"
    return reads


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
zip_extension = snakemake.params.get("zip_extension", "")
libtype = snakemake.params.get("libtype", "A")

r1 = snakemake.input.get("r1")
r2 = snakemake.input.get("r2")
r = snakemake.input.get("r")

assert (
    r1 is not None and r2 is not None
) or r is not None, "either r1 and r2 (paired), or r (unpaired) are required as input"
if r1:
    r1 = (
        [snakemake.input.r1]
        if isinstance(snakemake.input.r1, str)
        else snakemake.input.r1
    )
    r2 = (
        [snakemake.input.r2]
        if isinstance(snakemake.input.r2, str)
        else snakemake.input.r2
    )
    assert len(r1) == len(r2), "input-> equal number of files required for r1 and r2"
    r1_cmd = " -1 " + manual_decompression(" ".join(r1), zip_extension)
    r2_cmd = " -2 " + manual_decompression(" ".join(r2), zip_extension)
    read_cmd = " ".join([r1_cmd, r2_cmd])
if r:
    assert (
        r1 is None and r2 is None
    ), "Salmon cannot quantify mixed paired/unpaired input files. Please input either r1,r2 (paired) or r (unpaired)"
    r = [snakemake.input.r] if isinstance(snakemake.input.r, str) else snakemake.input.r
    read_cmd = " -r " + manual_decompression(" ".join(r), zip_extension)

outdir = path.dirname(snakemake.output.get("quant"))

shell(
    "salmon quant -i {snakemake.input.index} "
    " -l {libtype} {read_cmd} -o {outdir} "
    " -p {snakemake.threads} {extra} {log} "
)
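
The way the read arguments are assembled can be sketched without Snakemake. This is an illustrative sketch under assumptions, not the wrapper's own code: the helper names `as_list`, `decompress` and `build_read_cmd` are hypothetical, and the decompression step assumes the command is run by bash, which supports `<(...)` process substitution.

```python
def as_list(value):
    """Wrap a single path in a list; copy any other sequence into a list."""
    return [value] if isinstance(value, str) else list(value)


def decompress(reads, zip_ext):
    """Wrap reads in a bash process substitution for bz2/gz input."""
    tools = {"bz2": "bunzip2 -c", "gz": "gunzip -c"}
    if zip_ext in tools and reads:
        return "<({} {})".format(tools[zip_ext], reads)
    return reads


def build_read_cmd(r1=None, r2=None, r=None, zip_ext=""):
    """Return the read arguments for `salmon quant` (hypothetical helper)."""
    has_pair = r1 is not None and r2 is not None
    has_single = r is not None
    if has_single and (r1 is not None or r2 is not None):
        raise ValueError("provide either r1 and r2 (paired) or r (unpaired)")
    if not has_pair and not has_single:
        raise ValueError("either r1 and r2 (paired) or r (unpaired) is required")
    if has_pair:
        r1, r2 = as_list(r1), as_list(r2)
        if len(r1) != len(r2):
            raise ValueError("r1 and r2 need the same number of files")
        return "-1 {} -2 {}".format(
            decompress(" ".join(r1), zip_ext), decompress(" ".join(r2), zip_ext)
        )
    return "-r {}".format(decompress(" ".join(as_list(r)), zip_ext))
```

Paired and unpaired input are mutually exclusive, and lists of technical replicates are joined with spaces before being passed to salmon.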

SAMBAMBA

For sambamba, the following wrappers are available:

SAMBAMBA FLAGSTAT

Outputs some statistics drawn from read flags. For details, see https://lomereiter.github.io/sambamba/docs/sambamba-flagstat.html

URL:

Example

This wrapper can be used in the following way:

rule sambamba_flagstat:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.stats.txt"
    params:
        extra=""  # optional parameters
    log:
        "logs/sambamba-flagstat/{sample}.log"
    threads: 1
    wrapper:
        "v0.87.0/bio/sambamba/flagstat"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sambamba==0.8.0
Input/Output

Input:

  • bam file

Output:

  • flag statistics
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "sambamba flagstat {snakemake.params.extra} -t {snakemake.threads} "
    "{snakemake.input[0]} > {snakemake.output[0]} "
    "{log}"
)
SAMBAMBA INDEX

Index a bam file with sambamba. For details, see https://lomereiter.github.io/sambamba/docs/sambamba-index.html

URL:

Example

This wrapper can be used in the following way:

rule sambamba_index:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.bam.bai"
    params:
        extra=""  # optional parameters
    log:
        "logs/sambamba-index/{sample}.log"
    threads: 8
    wrapper:
        "v0.87.0/bio/sambamba/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sambamba==0.8.0
Input/Output

Input:

  • bam file

Output:

  • bam index
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sambamba index {snakemake.params.extra} -t {snakemake.threads} "
    "{snakemake.input[0]} {snakemake.output[0]} "
    "{log}"
)
SAMBAMBA MARKDUP

Marks (default) or removes duplicate reads in a BAM file. For details, see https://lomereiter.github.io/sambamba/docs/sambamba-markdup.html

URL:

Example

This wrapper can be used in the following way:

rule sambamba_markdup:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.rmdup.bam"
    params:
        extra="-r"  # optional parameters
    log:
        "logs/sambamba-markdup/{sample}.log"
    threads: 8
    wrapper:
        "v0.87.0/bio/sambamba/markdup"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sambamba==0.8.0
Input/Output

Input:

  • bam file

Output:

  • deduplicated bam file
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sambamba markdup {snakemake.params.extra} -t {snakemake.threads} "
    "{snakemake.input[0]} {snakemake.output[0]} "
    "{log}"
)
SAMBAMBA MERGE

Merge multiple BAM files into one using sambamba. For details, see https://lomereiter.github.io/sambamba/docs/sambamba-merge.html

URL:

Example

This wrapper can be used in the following way:

rule sambamba_merge:
    input:
        ["mapped/{sample}_1.sorted.bam", "mapped/{sample}_2.sorted.bam"]
    output:
        "mapped/{sample}.merged.bam"
    params:
        extra=""  # optional parameters
    log:
        "logs/sambamba-merge/{sample}.log"
    threads: 1
    wrapper:
        "v0.87.0/bio/sambamba/merge"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sambamba==0.8.0
Input/Output

Input:

  • sorted bam files

Output:

  • merged bam file
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sambamba merge {snakemake.params.extra} -t {snakemake.threads} "
    "{snakemake.output[0]} {snakemake.input} "
    "{log}"
)
SAMBAMBA SLICE

Fast tool for copying a slice of a BAM file. For details, see https://lomereiter.github.io/sambamba/docs/sambamba-slice.html

URL:

Example

This wrapper can be used in the following way:

rule sambamba_slice:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai"
    output:
        "mapped/{sample}.region.bam"
    params:
        region="xx:1-10"  # region to catch (contig:start-end)
    log:
        "logs/sambamba-slice/{sample}.log"
    wrapper:
        "v0.87.0/bio/sambamba/slice"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sambamba==0.8.0
Input/Output

Input:

  • coordinate-sorted and indexed bam file

Output:

  • new bam file with specific region
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "sambamba slice "
    "{snakemake.input[0]} {snakemake.params.region} > {snakemake.output[0]} "
    "{log}"
)
SAMBAMBA SORT

Sort bam file with sambamba

URL:

Example

This wrapper can be used in the following way:

rule sambamba_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        ""  # optional parameters
    log:
        "logs/sambamba-sort/{sample}.log"
    threads: 8
    wrapper:
        "v0.87.0/bio/sambamba/sort"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sambamba==0.8.0
Input/Output

Input:

  • bam file

Output:

  • sorted bam file
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sambamba sort {snakemake.params} -t {snakemake.threads} "
    "-o {snakemake.output[0]} {snakemake.input[0]} "
    "{log}"
)
SAMBAMBA VIEW

Filter and/or view BAM files. For details, see https://lomereiter.github.io/sambamba/docs/sambamba-view.html

URL:

Example

This wrapper can be used in the following way:

rule sambamba_view:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.filtered.bam"
    params:
        extra="-f bam -F 'mapping_quality >= 50'"  # optional parameters
    log:
        "logs/sambamba-view/{sample}.log"
    threads: 8
    wrapper:
        "v0.87.0/bio/sambamba/view"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sambamba==0.8.0
Input/Output

Input:

  • bam/sam file

Output:

  • (filtered) bam/sam file
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os

from snakemake.shell import shell

in_file = snakemake.input[0]
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Only add --sam-input if neither spelling of the flag is already present.
if in_file.endswith(".sam") and "-S" not in extra and "--sam-input" not in extra:
    extra += " --sam-input"

shell(
    "sambamba view {extra} -t {snakemake.threads} "
    "{snakemake.input[0]} > {snakemake.output[0]} "
    "{log}"
)
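
The SAM-input check can be made a little more robust by comparing whole tokens instead of substrings. The following is a minimal sketch under that assumption; the helper name `add_sam_input_flag` is hypothetical and not part of the wrapper.

```python
import shlex


def add_sam_input_flag(in_file, extra=""):
    """Append --sam-input for .sam files unless the user already set it."""
    # shlex.split respects quoting, so a filter expression stays one token.
    tokens = shlex.split(extra)
    already_set = "-S" in tokens or "--sam-input" in tokens
    if in_file.endswith(".sam") and not already_set:
        return (extra + " --sam-input").strip()
    return extra
```

Note that this sketch does not detect short options that were bundled together (for example `-Sh`), so such cases would still need separate handling.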

SAMTOOLS

For samtools, the following wrappers are available:

SAMTOOLS CALMD

Calculates MD and NM tags. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_calmd:
    input:
        aln = "{sample}.bam", # Can be 'sam', 'bam', or 'cram'
        ref = "genome.fasta"
    output:
        "{sample}.calmd.bam"
    params:
        "-E" # optional params string
    threads: 2
    wrapper:
        "v0.87.0/bio/samtools/calmd"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.11
Authors
  • Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"


from os import path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

out_name, out_ext = path.splitext(snakemake.output[0])
out_ext = out_ext[1:].upper()

shell(
    "samtools calmd --threads {snakemake.threads} {snakemake.params} --output-fmt {out_ext} {snakemake.input.aln} {snakemake.input.ref} > {snakemake.output[0]} {log}"
)
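
The output format passed to --output-fmt is derived from the extension of the output file. A minimal sketch of that step, with the hypothetical helper name `output_format`:

```python
from os import path


def output_format(output_path):
    """Return the upper-cased file extension without the leading dot."""
    _, ext = path.splitext(output_path)
    return ext[1:].upper()


if __name__ == "__main__":
    print(output_format("sample.calmd.bam"))
```

The output file name should therefore end in an extension that names the desired format, such as .sam, .bam or .cram.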
SAMTOOLS DEPTH

Compute the read depth at each position or region using samtools. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_depth:
    input:
        bams=["mapped/A.bam", "mapped/B.bam"],
        bed="regionToCalcDepth.bed", # optional bed file passed to -b
    output:
        "depth.txt"
    params:
        extra="" # optional additional parameters as string
    wrapper:
        "v0.87.0/bio/samtools/depth"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Authors
  • Dayne Filer
Code
"""Snakemake wrapper for running samtools depth."""

__author__ = "Dayne L Filer"
__copyright__ = "Copyright 2020, Dayne L Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

params = snakemake.params.get("extra", "")

# check for optional bed file
bed = snakemake.input.get("bed", "")
if bed:
    bed = "-b " + bed

shell(
    "samtools depth {params} {bed} "
    "-o {snakemake.output[0]} {snakemake.input.bams} {log}"
)
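
How the optional bed file ends up on the command line can be sketched as plain string assembly. This is an illustration only: `bed_option` and `depth_command` are hypothetical helpers, and the sketch does no shell quoting of paths.

```python
def bed_option(bed=""):
    """Return '-b <bed>' when a bed file is given, else an empty string."""
    return "-b " + bed if bed else ""


def depth_command(bams, output, bed="", extra=""):
    """Assemble a `samtools depth` command line (hypothetical helper)."""
    parts = ["samtools depth", extra, bed_option(bed), "-o", output] + list(bams)
    # Drop empty parts so that no double spaces appear in the command.
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(depth_command(["mapped/A.bam", "mapped/B.bam"], "depth.txt"))
```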
SAMTOOLS FAIDX

Index a reference sequence in FASTA format. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_index:
    input:
        "{sample}.fa"
    output:
        "{sample}.fa.fai"
    params:
        "" # optional params string
    wrapper:
        "v0.87.0/bio/samtools/faidx"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Input/Output

Input:

  • reference sequence file (.fa)

Output:

  • indexed reference sequence file (.fai)
Authors
  • Michael Chambers
Code
__author__ = "Michael Chambers"
__copyright__ = "Copyright 2019, Michael Chambers"
__email__ = "greenkidneybean@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "samtools faidx {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]} {log}"
)
SAMTOOLS FASTQ INTERLEAVED

Convert a bam file back to unaligned reads in a single fastq file with samtools. For paired end reads, this results in an unsorted interleaved file.

URL:

Example

This wrapper can be used in the following way:

rule samtools_fastq_interleaved:
    input:
        "mapped/{sample}.bam",
    output:
        "reads/{sample}.fq",
    log:
        "{sample}.interleaved.log",
    params:
        " ",
    threads: 3
    wrapper:
        "v0.87.0/bio/samtools/fastq/interleaved"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools=1.14
Notes
Authors
  • David Laehnemann
  • Victoria Sack
  • Filipe G. Vieira
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

prefix = os.path.splitext(snakemake.output[0])[0]

shell(
    "samtools fastq {snakemake.params} "
    " -@ {snakemake.threads} "
    " {snakemake.input[0]}"
    " > {snakemake.output[0]} "
    "{log}"
)
SAMTOOLS FASTQ SEPARATE

Convert a bam file with paired end reads back to unaligned reads in two separate fastq files with samtools. Reads that are not properly paired are discarded (READ_OTHER and singleton reads in samtools fastq documentation), as are secondary (0x100) and supplementary reads (0x800).

URL:

Example

This wrapper can be used in the following way:

rule samtools_fastq_separate:
    input:
        "mapped/{sample}.bam",
    output:
        "reads/{sample}.1.fq",
        "reads/{sample}.2.fq",
    log:
        "{sample}.separate.log",
    params:
        sort="-m 4G",
        fastq="-n",
    # Remember, samtools takes *additional* threads through its option -@. At least 2 threads have to be requested on cluster submission; this value - 2 will be sent to the samtools sort -@ argument.
    threads: 3
    wrapper:
        "v0.87.0/bio/samtools/fastq/separate"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools=1.14
  • snakemake-wrapper-utils=0.3
Notes
Authors
  • David Laehnemann
  • Victoria Sack
  • Filipe G. Vieira
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


import os
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem

params_sort = snakemake.params.get("sort", "")
params_fastq = snakemake.params.get("fastq", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Samtools takes additional threads through its option -@
# One thread is used by Samtools sort
# One thread is used by Samtools fastq
# So snakemake.threads has to take them into account
# before allowing additional threads through samtools sort -@
threads = 0 if snakemake.threads <= 2 else snakemake.threads - 2

mem = get_mem(snakemake, "MiB")
mem = "-m {0:.0f}M".format(mem / threads) if mem and threads else ""

with tempfile.TemporaryDirectory() as tmpdir:
    tmp_prefix = Path(tmpdir) / "samtools_fastq.sort_"

    shell(
        "(samtools sort -n"
        " --threads {threads}"
        " {mem}"
        " -T {tmp_prefix}"
        " {params_sort}"
        " {snakemake.input[0]} | "
        "samtools fastq"
        " {params_fastq}"
        " -1 {snakemake.output[0]}"
        " -2 {snakemake.output[1]}"
        " -0 /dev/null"
        " -s /dev/null"
        " -F 0x900"
        " - "
        ") {log}"
    )
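
The thread and memory arithmetic above can be sketched in isolation. This is a minimal sketch: `sort_resources` is a hypothetical helper, and it takes the memory limit in MiB as a plain number instead of reading it through `get_mem`.

```python
def sort_resources(total_threads, mem_mib=None):
    """Return (additional_threads, mem_flag) for `samtools sort`.

    Two threads are reserved: one for samtools sort itself and one for
    samtools fastq. Only the remainder is passed on as additional threads.
    """
    extra_threads = 0 if total_threads <= 2 else total_threads - 2
    if mem_mib and extra_threads:
        # samtools sort -m is a per-thread limit, so split the total.
        mem_flag = "-m {0:.0f}M".format(mem_mib / extra_threads)
    else:
        mem_flag = ""
    return extra_threads, mem_flag


if __name__ == "__main__":
    print(sort_resources(6, 8192))
```

With 2 or fewer threads requested, no additional threads are passed and no per-thread memory limit is set.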
SAMTOOLS FASTX

Converts a SAM, BAM or CRAM into FASTQ or FASTA format.

URL:

Example

This wrapper can be used in the following way:

rule samtools_fastq:
    input:
        "{prefix}.sam",
    output:
        "{prefix}.fasta",
    log:
        "{prefix}.log",
    message:
        ""
    threads:  # Samtools takes additional threads through its option -@
        2     # This value - 1 will be sent to -@
    params:
        outputtype = "fasta",
        extra = ""
    wrapper:
        "v0.87.0/bio/samtools/fastx/"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.12
Input/Output

Input:

  • bam or sam file (.bam, .sam)

Output:

  • fastq file (.fastq) or fasta file (.fasta)
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")
# Samtools takes additional threads through its option -@
# One thread for samtools fastq/fasta
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    """
    (samtools {snakemake.params.outputtype} \
        {threads} {extra} \
        {snakemake.input} > {snakemake.output}) {log}
    """
)
SAMTOOLS FIXMATE

Use samtools to correct mate information after BWA mapping. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_fixmate:
    input:
        "mapped/{input}"
    output:
        "fixed/{input}"
    message:
        "Fixing mate information in {wildcards.input}"
    threads:
        1
    params:
        extra = ""
    wrapper:
        "v0.87.0/bio/samtools/fixmate/"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Input/Output

Input:

  • bam or sam file (.bam,.sam)

Output:

  • bam or sam file (.bam,.sam)
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for samtools fixmate"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")

# Samtools' threads parameter lists ADDITIONAL threads.
# that is why threads - 1 has to be given to the -@ parameter
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

makedirs(op.dirname(snakemake.output[0]))

shell(
    "samtools fixmate {extra} {threads}"
    " {snakemake.input[0]} {snakemake.output[0]} {log}"
)
SAMTOOLS FLAGSTAT

Use samtools to create a flagstat file from a bam or sam file. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_flagstat:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.bam.flagstat"
    wrapper:
        "v0.87.0/bio/samtools/flagstat"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Input/Output

Input:

  • bam or sam file (.bam,.sam)

Output:

  • flagstat file (.flagstat)
Authors
  • Christopher Preusch
Code
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("samtools flagstat {snakemake.input[0]} > {snakemake.output[0]} {log}")
SAMTOOLS IDXSTATS

Use samtools to retrieve and print stats from indexed bam, sam or cram files. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_idxstats:
    input:
        bam="mapped/{sample}.bam",
        idx="mapped/{sample}.bam.bai"
    output:
        "mapped/{sample}.bam.idxstats"
    log:
        "logs/samtools/idxstats/{sample}.log"
    wrapper:
        "v0.87.0/bio/samtools/idxstats"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Input/Output

Input:

  • indexed sam, bam or cram file (.sam, .bam, .cram)
  • corresponding index files

Output:

  • idxstat file (.idxstats)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("samtools idxstats {snakemake.input.bam} > {snakemake.output[0]} {log}")
SAMTOOLS INDEX

Index bam file with samtools. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_index:
    input:
        "mapped/{sample}.sorted.bam"
    output:
        "mapped/{sample}.sorted.bam.bai"
    log:
        "logs/samtools_index/{sample}.log"
    params:
        "" # optional params string
    threads:  # Samtools takes additional threads through its option -@
        4     # This value - 1 will be sent to -@
    wrapper:
        "v0.87.0/bio/samtools/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Input/Output

Input:

  • bam file

Output:

  • bam file index (.bai)
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Samtools takes additional threads through its option -@
# One thread for samtools index
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools index {threads} {snakemake.params} {snakemake.input[0]} {snakemake.output[0]} {log}"
)
SAMTOOLS MERGE

Merge two bam files with samtools. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_merge:
    input:
        ["mapped/A.bam", "mapped/B.bam"]
    output:
        "merged.bam"
    params:
        "" # optional additional parameters as string
    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@
    wrapper:
        "v0.87.0/bio/samtools/merge"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Input/Output

Input:

  • list of bam files to merge

Output:

  • merged bam file
Notes
  • Samtools -@/--threads takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Samtools takes additional threads through its option -@
# One thread for samtools merge
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools merge {threads} {snakemake.params} "
    "{snakemake.output[0]} {snakemake.input} "
    "{log}"
)
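
The conversion from the rule's thread count to the value given to -@ is the same in several samtools wrappers. A minimal sketch, with the hypothetical helper name `additional_threads_flag`:

```python
def additional_threads_flag(total_threads):
    """Translate a total thread count into samtools' -@ argument.

    samtools always runs one main thread, so -@ receives only the
    additional threads; with a single thread the flag is omitted.
    """
    if total_threads <= 1:
        return ""
    return "-@ {}".format(total_threads - 1)


if __name__ == "__main__":
    for n in (1, 2, 8):
        print(n, repr(additional_threads_flag(n)))
```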
SAMTOOLS MPILEUP

Generate pileup using samtools. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule mpileup:
    input:
        # single or list of bam files
        bam="mapped/{sample}.bam",
        reference_genome="genome.fasta"
    output:
        "mpileup/{sample}.mpileup.gz"
    log:
        "logs/samtools/mpileup/{sample}.log"
    params:
        extra="-d 10000",  # optional
    wrapper:
        "v0.87.0/bio/samtools/mpileup"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
  • pigz==2.3.4
Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for running mpileup."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

bam_input = snakemake.input.bam
reference_genome = snakemake.input.reference_genome

extra = snakemake.params.get("extra", "")

if not snakemake.output[0].endswith(".gz"):
    raise Exception(
        'output file will be compressed and therefore filename should end with ".gz"'
    )

# stdout carries the pileup into pigz, so only stderr is redirected to the log.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "samtools mpileup "
    "{extra} "
    "-f {reference_genome} "
    "{bam_input}  "
    " | pigz > {snakemake.output} "
    "{log}"
)
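The wrapper code above refuses to run when the output name does not end in ".gz", because the pileup is always piped through pigz. A minimal sketch of that check follows; the function name `check_compressed_output` is an assumption made for illustration, and `ValueError` is used here instead of the generic `Exception` raised by the wrapper.

```python
def check_compressed_output(path: str) -> str:
    """Return `path` unchanged if it looks gzip-compressed, else raise.

    Hypothetical helper mirroring the guard in the mpileup wrapper.
    """
    if not path.endswith(".gz"):
        raise ValueError(
            'output file will be compressed and therefore filename should end with ".gz"'
        )
    return path


print(check_compressed_output("mpileup/a.mpileup.gz"))
```

An output such as "mpileup/a.mpileup" would raise before any shell command is started.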
SAMTOOLS SORT

Sort bam file with samtools. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        extra = "-m 4G",
        tmp_dir = "/tmp/"
    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@.
    wrapper:
        "v0.87.0/bio/samtools/sort"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Notes
  • Samtools -@/--threads takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

out_name, out_ext = os.path.splitext(snakemake.output[0])

tmp_dir = snakemake.params.get("tmp_dir", "")
if tmp_dir:
    prefix = os.path.join(tmp_dir, os.path.basename(out_name))
else:
    prefix = out_name

# Samtools takes additional threads through its option -@
# One thread for samtools
# Other threads are *additional* threads passed to the argument -@
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools sort {extra} {threads} -o {snakemake.output[0]} "
    "-T {prefix} {snakemake.input[0]} "
    "{log}"
)
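The way the wrapper above derives the temporary-file prefix passed to `-T` can be sketched on its own. The helper name `sort_tmp_prefix` is hypothetical; the logic follows the wrapper code, assuming POSIX-style paths.

```python
import os


def sort_tmp_prefix(output: str, tmp_dir: str = "") -> str:
    """Build the prefix for samtools sort temporary files.

    Hypothetical helper: with `tmp_dir` set, temporary files go into that
    directory; otherwise they are written next to the output file.
    """
    out_name, _ext = os.path.splitext(output)
    if tmp_dir:
        return os.path.join(tmp_dir, os.path.basename(out_name))
    return out_name


print(sort_tmp_prefix("mapped/a.sorted.bam", "/tmp/"))
print(sort_tmp_prefix("mapped/a.sorted.bam"))
```

Setting `tmp_dir` is mainly useful when the output directory is on slow or quota-limited storage.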
SAMTOOLS STATS

Generate stats using samtools. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_stats:
    input:
        "mapped/{sample}.bam"
    output:
        "samtools_stats/{sample}.txt"
    params:
        extra="",                       # Optional: extra arguments.
        region="xx:1000000-2000000"      # Optional: region string.
    log:
        "logs/samtools_stats/{sample}.log"
    wrapper:
        "v0.87.0/bio/samtools/stats"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.10
Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for samtools stats."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell("samtools stats {extra} {snakemake.input} {region} > {snakemake.output} {log}")
SAMTOOLS VIEW

Convert or filter SAM/BAM. For more information see SAMtools documentation.

URL:

Example

This wrapper can be used in the following way:

rule samtools_view:
    input:
        "{sample}.sam"
    output:
        "{sample}.bam"
    log:
        "{sample}.log"
    params:
        extra="" # optional params string
    wrapper:
        "v0.87.0/bio/samtools/view"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • samtools==1.12
  • snakemake-wrapper-utils==0.2.0
Input/Output

Input:

  • SAM/BAM/CRAM file

Output:

  • SAM/BAM/CRAM file
Authors
  • Johannes Köster
  • Filipe G. Vieira
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts


samtools_opts = get_samtools_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)


shell(
    "samtools view {snakemake.params.extra} {samtools_opts} -o {snakemake.output[0]} {snakemake.input[0]} {log}"
)

SEQTK

For seqtk, the following wrappers are available:

SEQTK MERGEPE

Interleave two paired-end FASTA/Q files

URL: https://github.com/lh3/seqtk

Example

This wrapper can be used in the following way:

rule seqtk_mergepe:
    input:
        r1="{sample}.1.fastq.gz",
        r2="{sample}.2.fastq.gz",
    output:
        merged="{sample}.merged.fastq.gz",
    params:
        compress_lvl=9,
    log:
        "logs/seqtk_mergepe/{sample}.log",
    threads: 2
    wrapper:
        "v0.87.0/bio/seqtk/mergepe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • seqtk=1.3
  • pigz=2.3
Input/Output

Input:

  • paired fastq files - can be compressed in gzip format (*.gz).

Output:

  • a single, interleaved FASTA/Q file. The output is gzip-compressed by default; use the compress_lvl param to change the compression level (0 disables compression).
Params
  • compress_lvl: Regulate the speed of compression using the specified digit, where 1 indicates the fastest compression method (less compression) and 9 indicates the slowest compression method (best compression). 0 is no compression. 11 gives a few percent better compression at a severe cost in execution time, using the zopfli algorithm. The default is 6.
Notes

Multiple threads can be used during compression of the output file with pigz.
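How `compress_lvl` and `threads` end up in the pigz call can be sketched as follows. This is an illustration only: the helper `pigz_command` is hypothetical, and the level check is an added assumption based on the values listed under Params (the wrapper itself passes the level through without validating it).

```python
# Levels described under Params: 0 (none) to 9 (best), plus 11 (zopfli).
VALID_PIGZ_LEVELS = set(range(0, 10)) | {11}


def pigz_command(compress_lvl: int = 6, threads: int = 1) -> str:
    """Return the pigz part of the pipeline as a string.

    Hypothetical helper; the validation is not part of the wrapper.
    """
    level = int(compress_lvl)
    if level not in VALID_PIGZ_LEVELS:
        raise ValueError("unsupported compression level: {}".format(level))
    return "pigz -{} -c -p {}".format(level, threads)


print(pigz_command())       # default level 6, one thread
print(pigz_command(9, 2))   # matches the example rule above
```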

Authors
  • Michael Hall
Code
"""Snakemake wrapper for interleaving reads from paired FASTA/Q files using seqtk."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2021, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=False)
compress_lvl = int(snakemake.params.get("compress_lvl", 6))

shell(
    "(seqtk mergepe {snakemake.input} "
    "| pigz -{compress_lvl} -c -p {snakemake.threads}) > {snakemake.output} {log}"
)
SEQTK-SEQ

Common transformations of FASTA/Q using seqtk

URL: https://github.com/lh3/seqtk

Example

This wrapper can be used in the following way:

rule seqtk_seq_fastq_to_fasta:
    input:
        "{prefix}.fastq",
    output:
        "{prefix}.fasta",
    log:
        "{prefix}.log",
    params:
        extra="-A",
    wrapper:
        "v0.87.0/bio/seqtk/seq"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • seqtk==1.3
Input/Output

Input:

  • FASTA/Q file (can be gzip compressed)

Output:

  • FASTA/Q file (written as produced by seqtk; the wrapper code does not compress it)
Authors
  • William Rowell
Code
"""Snakemake wrapper seqtk seq subcommand"""

__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("(seqtk seq {extra} {snakemake.input} > {snakemake.output}) {log}")
SEQTK-SUBSAMPLE-PE

Subsample reads from paired FASTQ files

URL:

Example

This wrapper can be used in the following way:

rule seqtk_subsample_pe:
    input:
        f1="{sample}.1.fastq.gz",
        f2="{sample}.2.fastq.gz"
    output:
        f1="{sample}.1.subsampled.fastq.gz",
        f2="{sample}.2.subsampled.fastq.gz"
    params:
        n=3,
        seed=12345
    log:
        "logs/seqtk_subsample/{sample}.log"
    threads:
        1
    wrapper:
        "v0.87.0/bio/seqtk/subsample/pe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • seqtk==1.3
  • pigz=2.3
Input/Output

Input:

  • paired fastq files (can be gzip compressed)

Output:

  • subsampled paired fastq files (gzip compressed)
Params
  • n: number of reads after subsampling
  • seed: seed to initialize a pseudorandom number generator
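The wrapper passes the same `seed` to both `seqtk sample` calls, with the apparent intent that the same records are picked from each mate file so that pairs stay in sync. The sketch below illustrates that idea with Python's own `random` module; it is not seqtk's sampling algorithm, and `sample_indices` is a hypothetical helper.

```python
import random


def sample_indices(total_reads: int, n: int, seed: int) -> list:
    """Pick `n` record indices out of `total_reads` using a seeded generator."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_reads), n))


# Two mate files with the same number of records, in the same order.
r1 = ["r1_read{}".format(i) for i in range(10)]
r2 = ["r2_read{}".format(i) for i in range(10)]

keep_r1 = sample_indices(len(r1), 3, seed=12345)
keep_r2 = sample_indices(len(r2), 3, seed=12345)

# Identical seeds give identical selections, so mates remain paired.
print(keep_r1 == keep_r2)
```

Using different seeds for the two files would generally select different records and break the pairing.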
Authors
  • Fabian Kilpert
Code
"""Snakemake wrapper for subsampling reads from paired FASTQ files using seqtk."""

__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "( "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f1} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f1} "
    "&& "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input.f2} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output.f2} "
    ") {log} "
)
SEQTK-SUBSAMPLE-SE

Subsample reads from FASTQ file

URL:

Example

This wrapper can be used in the following way:

rule seqtk_subsample_se:
    input:
        "{sample}.fastq.gz"
    output:
        "{sample}.subsampled.fastq.gz"
    params:
        n=3,
        seed=12345
    log:
        "logs/seqtk_subsample/{sample}.log"
    threads:
        1
    wrapper:
        "v0.87.0/bio/seqtk/subsample/se"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • seqtk==1.3
  • pigz=2.3
Input/Output

Input:

  • fastq file (can be gzip compressed)

Output:

  • subsampled fastq file (gzip compressed)
Params
  • n: number of reads after subsampling
  • seed: seed to initialize a pseudorandom number generator
Authors
  • Fabian Kilpert
Code
"""Snakemake wrapper for subsampling reads from FASTQ file using seqtk."""

__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "( "
    "seqtk sample "
    "-s {snakemake.params.seed} "
    "{snakemake.input} "
    "{snakemake.params.n} "
    "| pigz -9 -p {snakemake.threads} "
    "> {snakemake.output} "
    ") {log} "
)

SHOVILL

Assemble bacterial isolate genomes from Illumina paired-end reads.

URL:

Example

This wrapper can be used in the following way:

rule shovill:
  input:
    r1="reads/{sample}_R1.fq.gz",
    r2="reads/{sample}_R2.fq.gz"
  output:
    raw_assembly="assembly/{sample}.{assembler}.assembly.fa",
    contigs="assembly/{sample}.{assembler}.contigs.fa"
  params:
    extra=""
  log:
    "logs/shovill/{sample}.{assembler}.log"
  threads: 1
  wrapper:
    "v0.87.0/bio/shovill"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • shovill==1.1.0
Authors
  • Sangram Keshari Sahu
Code
"""Snakemake wrapper for shovill."""

__author__ = "Sangram Keshari Sahu"
__copyright__ = "Copyright 2020, Sangram Keshari Sahu"
__email__ = "sangramsahu15@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from tempfile import TemporaryDirectory

# Placeholder for optional parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
params = snakemake.params.get("extra", "")

with TemporaryDirectory() as tempdir:
    shell(
        "(shovill"
        " --assembler {snakemake.wildcards.assembler}"
        " --outdir {tempdir} --force"
        " --R1 {snakemake.input.r1}"
        " --R2 {snakemake.input.r2}"
        " --cpus {snakemake.threads}"
        " {params}) {log}"
    )

    shell(
        "mv {tempdir}/{snakemake.wildcards.assembler}.fasta {snakemake.output.raw_assembly}"
        " && mv {tempdir}/contigs.fa {snakemake.output.contigs}"
    )
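The wrapper above runs the assembler in a temporary directory and then moves the two result files to the rule's outputs. A minimal sketch of that pattern without shovill follows; `collect_outputs` is a hypothetical helper, "spades" is just an example value for the `{assembler}` wildcard, and the files are created by hand as stand-ins for real assembler output.

```python
import shutil
import tempfile
from pathlib import Path


def collect_outputs(workdir, assembler, raw_assembly, contigs):
    """Move the files named as in the wrapper out of `workdir`."""
    workdir = Path(workdir)
    for dest in (raw_assembly, contigs):
        Path(dest).parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(workdir / "{}.fasta".format(assembler)), str(raw_assembly))
    shutil.move(str(workdir / "contigs.fa"), str(contigs))


with tempfile.TemporaryDirectory() as tempdir, tempfile.TemporaryDirectory() as outdir:
    # Stand-in for the assembler run: create the two expected files.
    Path(tempdir, "spades.fasta").write_text(">raw\nACGT\n")
    Path(tempdir, "contigs.fa").write_text(">contig\nACGT\n")

    raw = Path(outdir, "assembly", "s1.spades.assembly.fa")
    contigs = Path(outdir, "assembly", "s1.spades.contigs.fa")
    collect_outputs(tempdir, "spades", raw, contigs)
    moved = raw.exists() and contigs.exists()

print(moved)
```

Working in a temporary directory keeps intermediate assembler files out of the workflow's output tree.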

SICKLE

For sickle, the following wrappers are available:

SICKLE PE

Trim paired-end reads with sickle.

URL:

Example

This wrapper can be used in the following way:

rule sickle_pe:
  input:
    r1="input_R1.fq",
    r2="input_R2.fq"
  output:
    r1="output_R1.fq",
    r2="output_R2.fq",
    rs="output_single.fq",
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    # optional log file
    "logs/sickle/job.log"
  wrapper:
    "v0.87.0/bio/sickle/pe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sickle-trim==1.33
Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
    "-o {snakemake.output.r1} -p {snakemake.output.r2} "
    "-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
    "{extra}) {log}"
)
SICKLE SE

Trim single-end reads with sickle.

URL:

Example

This wrapper can be used in the following way:

rule sickle_se:
  input:
    "input_R1.fq"
  output:
    "output_R1.fq"
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    "logs/sickle/job.log"
  wrapper:
    "v0.87.0/bio/sickle/se"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sickle-trim==1.33
Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
    "-t {snakemake.params.qual_type} {extra}) {log}"
)

SNP-MUTATOR

Generate mutated sequence files from a reference genome.

URL:

Example

This wrapper can be used in the following way:

NUM_SIMULATIONS = 2

rule snpmutator:
    input:
        "{sample}.fa"
    output:
        vcf = "{sample}.mutated.vcf",
        sequences = expand(
            "{{sample}}_mutated_{simulation_number}.fasta",
            simulation_number=range(1, NUM_SIMULATIONS + 1)
        )
    params:
        num_simulations = NUM_SIMULATIONS,
        extra = " ".join([
            "--num-substitutions 2",
            "--num-insertions 2",
            "--num-deletions 0"
        ]),
    log:
        "logs/snp-mutator/test/{sample}.log"
    wrapper:
        "v0.87.0/bio/snp-mutator"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snp-mutator==1.2.0
Authors
  • Michael Hall
Code
"""Snakemake wrapper for SNP Mutator."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from pathlib import Path

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
num_simulations = snakemake.params.get("num_simulations", 100)
fasta_outdir = Path(snakemake.output.sequences[0]).absolute().parent
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "snpmutator {extra} "
    "--num-simulations {num_simulations} "
    "--vcf {snakemake.output.vcf} "
    "-F {fasta_outdir} "
    "{snakemake.input} {log} "
)
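The relation between the `expand(...)` call in the example rule and the `-F` output directory in the wrapper can be sketched without Snakemake. The helper `mutated_fasta_names` is hypothetical, and `PurePosixPath` is used instead of the wrapper's `Path(...).absolute()` so that the result does not depend on the working directory.

```python
from pathlib import PurePosixPath


def mutated_fasta_names(sample: str, num_simulations: int) -> list:
    """List the FASTA names the example rule declares as outputs."""
    return [
        "{}_mutated_{}.fasta".format(sample, i)
        for i in range(1, num_simulations + 1)
    ]


names = mutated_fasta_names("results/ref", 2)
# The wrapper takes the parent directory of the first sequence file
# and passes it to snpmutator via -F.
fasta_outdir = PurePosixPath(names[0]).parent

print(names)
print(fasta_outdir)
```

All declared sequence files are therefore expected to live in the same directory.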

SNPEFF

For snpeff, the following wrappers are available:

SNPEFF

Annotate predicted effect of nucleotide changes with SnpEff

URL:

Example

This wrapper can be used in the following way:

rule snpeff:
    input:
        calls="{sample}.vcf", # (vcf, bcf, or vcf.gz)
        db="resources/snpeff/ebola_zaire" # path to reference db downloaded with the snpeff download wrapper
    output:
        calls="snpeff/{sample}.vcf",   # annotated calls (vcf, bcf, or vcf.gz)
        stats="snpeff/{sample}.html",  # summary statistics (in HTML), optional
        csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
    log:
        "logs/snpeff/{sample}.log"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=4096
    wrapper:
        "v0.87.0/bio/snpeff/annotate"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snpeff==4.3.1t
  • bcftools=1.11
  • snakemake-wrapper-utils==0.1.3
Authors
  • Bradford Powell
Code
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell
from os import path
import shutil
import tempfile
from pathlib import Path
from snakemake_wrapper_utils.java import get_java_opts


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

outcalls = snakemake.output.calls
if outcalls.endswith(".vcf.gz"):
    outprefix = "| bcftools view -Oz"
elif outcalls.endswith(".bcf"):
    outprefix = "| bcftools view -Ob"
else:
    outprefix = ""

incalls = snakemake.input[0]
if incalls.endswith(".bcf"):
    incalls = "< <(bcftools view {})".format(incalls)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

data_dir = Path(snakemake.input.db).parent.resolve()

stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = "" if not csvstats else "-csvStats {}".format(csvstats)
stats_opt = "-noStats" if not stats else "-stats {}".format(stats)

reference = path.basename(snakemake.input.db)

shell(
    "snpEff {java_opts} -dataDir {data_dir} "
    "{stats_opt} {csvstats_opt} {extra} "
    "{reference} {incalls} "
    "{outprefix} > {outcalls} {log}"
)
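The wrapper above chooses the output format from the file extension of `calls`. A minimal sketch of that selection, using a hypothetical helper name `output_filter`:

```python
def output_filter(outcalls: str) -> str:
    """Return the pipeline stage appended after snpEff, based on the suffix.

    Hypothetical helper mirroring the branch in the wrapper code.
    """
    if outcalls.endswith(".vcf.gz"):
        return "| bcftools view -Oz"   # compressed VCF
    if outcalls.endswith(".bcf"):
        return "| bcftools view -Ob"   # compressed BCF
    return ""                          # plain VCF, written directly


for name in ("snpeff/a.vcf", "snpeff/a.vcf.gz", "snpeff/a.bcf"):
    print(name, "->", repr(output_filter(name)))
```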
SNPEFF DOWNLOAD

Download snpeff DB for a given species.

URL:

Example

This wrapper can be used in the following way:

rule snpeff_download:
    output:
        # wildcard {reference} may be anything listed in `snpeff databases`
        directory("resources/snpeff/{reference}")
    log:
        "logs/snpeff/download/{reference}.log"
    params:
        reference="{reference}"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/snpeff/download"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snpeff==4.3.1t
  • bcftools=1.11
  • snakemake-wrapper-utils==0.1.3
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)

reference = snakemake.params.reference
outdir = Path(snakemake.output[0]).parent.resolve()
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("snpEff download {java_opts} -dataDir {outdir} {reference} {log}")
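Both snpeff wrappers split the database path into a data directory and a reference name: the download wrapper uses the parent of its output directory as `-dataDir`, and the annotate wrapper uses the parent and base name of `input.db`. A minimal sketch, assuming a hypothetical helper `split_db_path` and omitting the `resolve()` call that the wrappers use to make the directory absolute:

```python
from pathlib import PurePosixPath


def split_db_path(db: str) -> tuple:
    """Split a database path into (data_dir, reference)."""
    path = PurePosixPath(db)
    return str(path.parent), path.name


data_dir, reference = split_db_path("resources/snpeff/ebola_zaire")
print(data_dir)
print(reference)
```

This is why the output of the download rule can be used directly as the `db` input of the annotate rule.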

SNPSIFT

For snpsift, the following wrappers are available:

SNPSIFT ANNOTATE

Annotate using fields from another VCF file with SnpSift

URL:

Example

This wrapper can be used in the following way:

rule test_snpsift_annotate:
    input:
        call="in.vcf",
        database="annotation.vcf"
    output:
        call="annotated/out.vcf"
    log:
        "annotate.log"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/snpsift/annotate"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snpsift==4.3.1t
  • bcftools==1.10.2
  • pbgzip==2016.08.04
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • A VCF-formatted file that is to be annotated
  • A VCF-formatted annotation file

Output:

  • A VCF-formatted file
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift annotate"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1

incall = snakemake.input["call"]
if snakemake.input["call"].endswith("bcf"):
    min_threads += 1
    incall = "< <(bcftools view {})".format(incall)
elif snakemake.input["call"].endswith("gz"):
    min_threads += 1
    incall = "< <(gunzip -c {})".format(incall)

outcall = snakemake.output["call"]
if snakemake.output["call"].endswith("gz"):
    min_threads += 1
    outcall = "| gzip -c > {}".format(outcall)
elif snakemake.output["call"].endswith("bcf"):
    min_threads += 1
    outcall = "| bcftools view > {}".format(outcall)
else:
    outcall = "> {}".format(outcall)

if snakemake.threads < min_threads:
    raise ValueError(
        "At least {} threads required, {} provided".format(
            min_threads, snakemake.threads
        )
    )

shell(
    "SnpSift annotate"  # Tool and its subcommand
    " {java_opts} {extra}"  # Extra parameters
    " {snakemake.input.database}"  # Path to annotation vcf file
    " {incall} "  # Path to input vcf file
    " {outcall} "  # Path to output vcf file
    " {log}"  # Logging behaviour
)
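The thread check in the wrapper above adds one thread for each (de)compression step. A minimal sketch of that rule, with a hypothetical helper name `required_threads`:

```python
def required_threads(incall: str, outcall: str) -> int:
    """Count the threads the wrapper demands for the given file names."""
    needed = 1  # SnpSift itself
    if incall.endswith("bcf") or incall.endswith("gz"):
        needed += 1  # bcftools view or gunzip on the input side
    if outcall.endswith("gz") or outcall.endswith("bcf"):
        needed += 1  # gzip or bcftools view on the output side
    return needed


print(required_threads("in.vcf", "annotated/out.vcf"))
print(required_threads("in.bcf", "annotated/out.vcf.gz"))
```

A rule reading BCF and writing gzip-compressed VCF would thus need `threads: 3` or more, otherwise the wrapper raises a `ValueError`.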
SNPSIFT DBNSFP

Annotate using integrated annotation from dbNSFP with SnpSift

URL:

Example

This wrapper can be used in the following way:

rule test_snpsift_dbnsfp:
    input:
        call = "in.vcf",
        dbNSFP = "dbNSFP.txt.gz"
    output:
        call = "out.vcf"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/snpsift/dbnsfp"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snpsift=4.3.1t
  • bcftools==1.10.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • Calls that are to be annotated
  • A dbNSFP text file

Output:

  • Annotated calls
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift dbNSFP"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Using user-defined file if requested
db = snakemake.input.get("dbNSFP", "")
if db != "":
    db = "-db {}".format(db)

min_threads = 1

# Uncompression shall be done on user request
incall = snakemake.input["call"]
if incall.endswith("bcf"):
    min_threads += 1
    incall = "< <(bcftools view {})".format(incall)
elif incall.endswith("gz"):
    min_threads += 1
    incall = "< <(gunzip -c {})".format(incall)

# Compression shall be done according to user-defined output
outcall = snakemake.output["call"]
if outcall.endswith("gz"):
    min_threads += 1
    outcall = "| gzip -c > {}".format(outcall)
elif outcall.endswith("bcf"):
    min_threads += 1
    outcall = "| bcftools view > {}".format(outcall)
else:
    outcall = "> {}".format(outcall)

# Each (un)compression raises the thread number
if snakemake.threads < min_threads:
    raise ValueError(
        "At least {} threads required, {} provided".format(
            min_threads, snakemake.threads
        )
    )


shell(
    "SnpSift dbnsfp"  # Tool and its subcommand
    " {java_opts} {extra}"  # Extra parameters
    " {db}"  # Optional "-db <file>" flag for the dbNSFP file
    " {incall}"  # Path to input vcf file
    " {outcall}"  # Path to output vcf file
    " {log}"  # Logging behaviour
)
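In the wrapper above the dbNSFP input is optional: the `-db` flag is only emitted when the rule provides a `dbNSFP` input. A minimal sketch of that behaviour, with a hypothetical helper name `dbnsfp_flag`:

```python
def dbnsfp_flag(db_path: str = "") -> str:
    """Return the '-db' argument, or an empty string if no file was given."""
    return "-db {}".format(db_path) if db_path else ""


print(repr(dbnsfp_flag("dbNSFP.txt.gz")))
print(repr(dbnsfp_flag()))
```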
SNPSIFT GENE SETS

Annotate using GMT gene sets with SnpSift

URL:

Example

This wrapper can be used in the following way:

rule test_snpsift_gmt:
    input:
        call = "in.vcf",
        gmt = "fake_set.gmt"
    output:
        call = "annotated/out.vcf"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/snpsift/genesets"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snpsift==4.3.1t
  • bcftools==1.10.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • Calls that are to be annotated
  • A GMT-formatted annotation file

Output:

  • Annotated calls
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift geneSets"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1

# Uncompression shall be done according to user-defined input
incall = snakemake.input["call"]
if snakemake.input["call"].endswith("bcf"):
    min_threads += 1
    incall = "< <(bcftools view {})".format(incall)
elif snakemake.input["call"].endswith("gz"):
    min_threads += 1
    incall = "< <(gunzip -c {})".format(incall)

# Compression shall be done according to user-defined output
outcall = snakemake.output["call"]
if snakemake.output["call"].endswith("gz"):
    min_threads += 1
    outcall = "| gzip -c > {}".format(outcall)
elif snakemake.output["call"].endswith("bcf"):
    min_threads += 1
    outcall = "| bcftools view > {}".format(outcall)
else:
    outcall = "> {}".format(outcall)

# Each (un)compression step raises the threads requirements
if snakemake.threads < min_threads:
    raise ValueError(
        "At least {} threads required, {} provided".format(
            min_threads, snakemake.threads
        )
    )


shell(
    "SnpSift geneSets"  # Tool and its subcommand
    " {java_opts} {extra}"  # Extra parameters
    " {snakemake.input.gmt}"  # Path to GMT gene sets file
    " {incall}"  # Path to input vcf file
    " {outcall}"  # Path to output vcf file
    " {log}"  # Logging behaviour
)
SNPSIFT GWAS CATALOG

Annotate using GWAS catalog with SnpSift

URL:

Example

This wrapper can be used in the following way:

rule test_snpsift_gwascat:
    input:
        call = "in.vcf",
        gwascat = "gwascatalog.txt"
    output:
        call = "annotated/out.vcf"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/snpsift/gwascat"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snpsift==4.3.1t
  • bcftools==1.10.2
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • Calls that are to be annotated (vcf, bcf, vcf.gz)
  • A GWAS Catalog TSV-formatted file

Output:

  • Annotated calls (vcf, bcf, vcf.gz)
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift gwasCat"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1

# Uncompression shall be done based on user input
incall = snakemake.input["call"]
if incall.endswith("bcf"):
    min_threads += 1
    incall = "< <(bcftools view {})".format(incall)
elif incall.endswith("gz"):
    min_threads += 1
    incall = "< <(gunzip -c {})".format(incall)


# Compression shall be done based on user-defined output
outcall = snakemake.output["call"]
if outcall.endswith("bcf"):
    min_threads += 1
    outcall = "| bcftools view > {}".format(outcall)
elif outcall.endswith("gz"):
    min_threads += 1
    outcall = "| gzip -c > {}".format(outcall)
else:
    outcall = "> {}".format(outcall)


# Each additional (un)compression step requires more threads
if snakemake.threads < min_threads:
    raise ValueError(
        "At least {} threads required, {} provided".format(
            min_threads, snakemake.threads
        )
    )

shell(
    "SnpSift gwasCat "  # Tool and its subcommand
    " {java_opts} {extra} "  # Extra parameters
    " -db {snakemake.input.gwascat} "  # Path to gwasCat file
    " {incall} "  # Path to input vcf file
    " {outcall} "  # Path to output vcf file
    " {log} "  # Logging behaviour
)
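
The thread check in the wrapper above can be sketched in isolation. The helper below is hypothetical (`required_threads` is not part of the wrapper); it only assumes what the code above does: one thread for SnpSift itself, plus one for each compression or decompression step implied by the file extensions.

```python
def required_threads(in_path: str, out_path: str) -> int:
    """Count the threads needed: one for SnpSift, one per (de)compression step."""
    threads = 1  # SnpSift itself
    # An input ending in "bcf" or "gz" is decoded by a separate process.
    if in_path.endswith(("bcf", "gz")):
        threads += 1
    # An output ending in "bcf" or "gz" is encoded by a separate process.
    if out_path.endswith(("bcf", "gz")):
        threads += 1
    return threads


print(required_threads("in.vcf", "annotated/out.vcf"))
print(required_threads("in.bcf", "annotated/out.vcf.gz"))
```

A rule that reads BCF and writes gzipped VCF therefore needs `threads: 3` or more, otherwise the wrapper raises a ValueError.
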
SNPSIFT VARTYPE

Add an INFO field denoting variant type with SnpSift

URL:

Example

This wrapper can be used in the following way:

rule test_snpsift_vartype:
    input:
        vcf="in.vcf"
    output:
        vcf="annotated/out.vcf"
    message:
        "Testing SnpSift varType"
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    log:
        "varType.log"
    wrapper:
        "v0.87.0/bio/snpsift/varType"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • snpsift=4.3.1t
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • A VCF-formatted file

Output:

  • A VCF-formatted file
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift varType"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "SnpSift varType"  # Tool and its subcommand
    " {java_opts} {extra}"  # Extra parameters
    " {snakemake.input.vcf}"  # Path to input vcf file
    " > {snakemake.output.vcf}"  # Path to output vcf file
    " {log}"  # Logging behaviour
)

SOURMASH

For sourmash, the following wrappers are available:

SOURMASH_COMPUTE

Build a MinHash signature for a transcriptome, genome, or reads

URL:

Example

This wrapper can be used in the following way:

rule sourmash_reads:
    input:
        "reads/a.fastq"
    output:
        "reads.sig"
    log:
        "logs/sourmash/sourmash_compute_reads.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "v0.87.0/bio/sourmash/compute"


rule sourmash_transcriptome:
    input:
        "assembly/transcriptome.fasta"
    output:
        "transcriptome.sig"
    log:
        "logs/sourmash/sourmash_compute_transcriptome.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "v0.87.0/bio/sourmash/compute"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sourmash==2.0.0a7
Input/Output

Input:

  • assembly fasta, or reads fastq

Output:

  • sourmash signature
Authors
  • Lisa K. Johnson
Code
"""Snakemake wrapper for sourmash compute."""

__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled", "1000")
k = snakemake.params.get("k", "31")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
    " {extra} {log}"
)

SPADES

For spades, the following wrappers are available:

METASPADES

Assemble metagenome with metaspades. For more information see the Spades documentation.

Metagenome assembly uses a lot of computational resources. Spades is told to restart from a previous checkpoint if the file params.txt exists in the output directory. This way, Snakemake can be run with --restart-times to restart the assembly automatically.
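
A minimal sketch of that restart decision, assuming only what is stated here: a params.txt file in the output directory signals a previous run. The function name `should_restart` is hypothetical and not part of the wrapper.

```python
import os
import tempfile


def should_restart(output_dir: str) -> bool:
    """Return True if a previous Spades run left params.txt in output_dir."""
    return os.path.exists(os.path.join(output_dir, "params.txt"))


with tempfile.TemporaryDirectory() as run_dir:
    print(should_restart(run_dir))  # fresh directory: start from scratch
    open(os.path.join(run_dir, "params.txt"), "w").close()
    print(should_restart(run_dir))  # params.txt present: resume
```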

The input to metaspades must be at least one paired-end library (two fastq files). Merged reads may optionally be supplied as a third fastq file, and singleton reads as a fourth. Long reads can also be supplied through the pacbio or nanopore input arguments. To distinguish short from long reads, use reads as the name of the short-read input.
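
The order of the short-read files matters. As a rough sketch, the mapping from file position to Spades flag could look like this; the flag names are those used in the wrapper code for this rule, while the helper `short_read_args` and the file names are hypothetical.

```python
def short_read_args(reads):
    """Map 2 to 4 fastq paths to the flags of a single paired-end library."""
    if len(reads) < 2:
        raise ValueError("need at least two fastq files (one paired-end library)")
    # Position decides the role: forward, reverse, then optional merged
    # and singleton reads. zip() ignores anything beyond the fourth file.
    flags = ["--pe1-1", "--pe1-2", "--pe1-m", "--pe1-s"]
    return " ".join(f"{flag} {path}" for flag, path in zip(flags, reads))


print(short_read_args(["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1_merged.fastq.gz"]))
```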

URL:

Example

This wrapper can be used in the following way:

container: "docker://continuumio/miniconda3:4.4.10"


rule run_metaspades:
    input:
        reads=["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
    output:
        contigs="assembly/contigs.fasta",
        scaffolds="assembly/scaffolds.fasta",
        dir=directory("assembly/intermediate_files"),
    benchmark:
        "logs/benchmarks/assembly/spades.txt"
    params:
        # all parameters are optional
        k="auto",
        extra="--only-assembler",
    log:
        "log/spades.log",
    threads: 8
    resources:
        mem_mb=250000,
        time=60 * 24,
    wrapper:
        "v0.87.0/bio/spades/metaspades"


rule download_test_reads:
    output:
        ["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
    log:
        "log/download.log",
    shell:
        " wget https://zenodo.org/record/3992790/files/test_reads.tar.gz >> {log} 2>&1 ; "
        " tar -xzf test_reads.tar.gz >> {log} 2>&1"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • spades>=3.15
  • python>=3.5,<3.10
Authors
  • Silas Kieser
  • Anton Korobeynikov
Code
"""Snakemake wrapper for metaspades."""

__author__ = "Silas Kieser @silask"
__copyright__ = "Copyright 2021, Silas Kieser"
__email__ = "silas.kieser@gmail.com"
__license__ = "MIT"

import os, shutil
from snakemake.shell import shell


# infer output directory

if hasattr(snakemake.output, "dir"):
    output_dir = snakemake.output.dir

else:
    # get output_dir file from output
    if hasattr(snakemake.output, "contigs"):
        output_file = snakemake.output.contigs
    elif hasattr(snakemake.output, "scaffolds"):
        output_file = snakemake.output.scaffolds
    else:
        output_file = snakemake.output[0]

    output_dir = os.path.split(output_file)[0]


# parse params
extra = snakemake.params.get("extra", "")
kmers = snakemake.params.get("k", "'auto'")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

if hasattr(snakemake.resources, "mem_mb"):
    mem_gb = snakemake.resources.mem_mb // 1000
    memory_requirements = f" --memory {mem_gb}"
else:
    memory_requirements = ""

if not os.path.exists(os.path.join(output_dir, "params.txt")):

    # parse short reads
    if hasattr(snakemake.input, "reads"):
        reads = snakemake.input.reads
    else:
        reads = snakemake.input

    assert (
        len(reads) > 1
    ), "Metaspades needs a paired end library. This means you should supply at least 2 fastq files in the rule input."

    assert isinstance(
        reads[0], str
    ), f"Metaspades allows only 1 library. Therefore reads need to be strings, got {reads}"

    input_arg = " --pe1-1 {0} --pe1-2 {1} ".format(*reads)

    if len(reads) >= 3:
        input_arg += " --pe1-m {2}".format(*reads)

        if len(reads) >= 4:
            input_arg += " --pe1-s {3}".format(*reads)

    # parse long reads
    for longread_name in ["pacbio", "nanopore"]:
        if hasattr(snakemake.input, longread_name):
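            # Look up the long-read path by name; a bare "{}" has no positional argument to fill it
            input_arg += " --{name} {path}".format(
                name=longread_name, path=getattr(snakemake.input, longread_name)
            )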

    shell(
        "spades.py --meta "
        " --threads {snakemake.threads} "
        " {memory_requirements} "
        " -o {output_dir} "
        " -k {kmers} "
        " {input_arg} "
        " {extra} "
        " > {snakemake.log[0]} 2>&1 "
    )


else:
    # params.txt already exists, so restart from the previous run

    shell(
        "echo '\n\nRestart Spades \n Remove pipeline_state copy_files stage files to force copying files if necessary.' >> {snakemake.log[0]}"
    )

    shell("rm -f {output_dir}/pipeline_state/stage_*_copy_files 2>> {snakemake.log[0]}")

    shell(
        "spades.py --meta "
        " --restart-from last "
        " --threads {snakemake.threads} "
        " {memory_requirements} "
        " -o {output_dir} "
        " >> {snakemake.log[0]} 2>&1 "
    )


# Rename/ move output files

Output_key_mapping = {
    "contigs": "contigs.fasta",
    "scaffolds": "scaffolds.fasta",
    "graph": "assembly_graph_with_scaffolds.gfa",
}

has_named_output = False
for key in Output_key_mapping:
    if hasattr(snakemake.output, key):

        has_named_output = True
        file_produced = os.path.join(output_dir, Output_key_mapping[key])
        file_renamed = getattr(snakemake.output, key)

        if file_produced != file_renamed:
            shutil.move(file_produced, file_renamed)


if not has_named_output:

    file_produced = os.path.join(output_dir, "contigs.fasta")
    file_renamed = snakemake.output[0]

    if file_produced != file_renamed:
        shutil.move(file_produced, file_renamed)

SRA-TOOLS

For sra-tools, the following wrappers are available:

SRA-TOOLS FASTERQ-DUMP

Download FASTQ files from SRA.

URL:

Example

This wrapper can be used in the following way:

rule get_fastq_pe:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/pe/{accession}_1.fastq",
        "data/pe/{accession}_2.fastq",
    log:
        "logs/pe/{accession}.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v0.87.0/bio/sra-tools/fasterq-dump"


rule get_fastq_pe_gz:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/pe/{accession}_1.fastq.gz",
        "data/pe/{accession}_2.fastq.gz",
    log:
        "logs/pe/{accession}.gz.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v0.87.0/bio/sra-tools/fasterq-dump"


rule get_fastq_pe_bz2:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/pe/{accession}_1.fastq.bz2",
        "data/pe/{accession}_2.fastq.bz2",
    log:
        "logs/pe/{accession}.bz2.log"
    params:
        extra="--skip-technical"
    threads: 6  # defaults to 6
    wrapper:
        "v0.87.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se:
    output:
        "data/se/{accession}.fastq"
    log:
        "logs/se/{accession}.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v0.87.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se_gz:
    output:
        "data/se/{accession}.fastq.gz"
    log:
        "logs/se/{accession}.gz.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v0.87.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se_bz2:
    output:
        "data/se/{accession}.fastq.bz2"
    log:
        "logs/se/{accession}.bz2.log"
    params:
        extra="--skip-technical"
    threads: 6
    wrapper:
        "v0.87.0/bio/sra-tools/fasterq-dump"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • sra-tools>2.9.1
  • pigz>=2.6
  • pbzip2>=1.1
  • snakemake-wrapper-utils=0.3
Notes
  • The output format is automatically detected and, if needed, files compressed with either gzip or bzip2.
  • Currently only supports PE samples
  • The extra param allows for additional program arguments.
  • More information at https://github.com/ncbi/sra-tools
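
The automatic format detection mentioned in the first note depends only on the output file extension. A minimal sketch, assuming the pigz and pbzip2 tools listed under the software dependencies; the helper `compression_tool` and the accession used in the paths are hypothetical.

```python
import os


def compression_tool(output_path: str):
    """Return the compressor implied by the output extension, or None."""
    _, ext = os.path.splitext(output_path)
    return {".gz": "pigz", ".bz2": "pbzip2"}.get(ext)


for path in [
    "data/pe/SRR000001_1.fastq",
    "data/pe/SRR000001_1.fastq.gz",
    "data/pe/SRR000001_1.fastq.bz2",
]:
    print(path, "->", compression_tool(path))
```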
Authors
  • Johannes Köster
  • Derek Croote
  • Filipe G. Vieira
Code
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")


# Parse memory
mem_mb = get_mem(snakemake, "MiB")


# Outdir
outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = f"--outdir {outdir}"


# Output compression
compress = ""
mem = f"-m{mem_mb}" if mem_mb else ""

for output in snakemake.output:
    out_name, out_ext = os.path.splitext(output)
    if out_ext == ".gz":
        compress += f"pigz -p {snakemake.threads} {out_name}; "
    elif out_ext == ".bz2":
        compress += f"pbzip2 -p{snakemake.threads} {mem} {out_name}; "


with tempfile.TemporaryDirectory() as tmpdir:
    mem = f"--mem {mem_mb}M" if mem_mb else ""

    shell(
        "(fasterq-dump --temp {tmpdir} --threads {snakemake.threads} {mem} "
        "{extra} {outdir} {snakemake.wildcards.accession}; "
        "{compress}"
        ") {log}"
    )

STAR

For star, the following wrappers are available:

STAR

Map reads with STAR.

URL:

Example

This wrapper can be used in the following way:

rule star_pe_multi:
    input:
        # use a list for multiple fastq files for one sample
        # usually technical replicates across lanes/flowcells
        fq1=["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
        # paired end reads needs to be ordered so each item in the two lists match
        fq2=["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"],  #optional
    output:
        # see STAR manual for additional output files
        "star/pe/{sample}/Aligned.out.sam",
    log:
        "logs/star/pe/{sample}.log",
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra="",
    threads: 8
    wrapper:
        "v0.87.0/bio/star/align"


rule star_se:
    input:
        fq1="reads/{sample}_R1.1.fastq",
    output:
        # see STAR manual for additional output files
        "star/{sample}/Aligned.out.sam",
    log:
        "logs/star/{sample}.log",
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra="",
    threads: 8
    wrapper:
        "v0.87.0/bio/star/align"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • star==2.7.9a
Notes
Authors
  • Johannes Köster
  • Tomás Di Domenico
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = (
    [snakemake.input.fq1]
    if isinstance(snakemake.input.fq1, str)
    else snakemake.input.fq1
)
fq2 = snakemake.input.get("fq2")
if fq2:
    fq2 = (
        [snakemake.input.fq2]
        if isinstance(snakemake.input.fq2, str)
        else snakemake.input.fq2
    )
    assert len(fq1) == len(
        fq2
    ), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])

if fq1[0].endswith(".gz"):
    readcmd = "--readFilesCommand zcat"
else:
    readcmd = ""

if "SortedByCoordinate" in extra:
    bamprefix = "Aligned.sortedByCoord.out."
else:
    bamprefix = "Aligned.out."

outprefix = snakemake.output[0].split(bamprefix)[0]

if outprefix == os.path.dirname(snakemake.output[0]):
    outprefix += "/"

with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "STAR "
        "{extra} "
        "--runThreadN {snakemake.threads} "
        "--genomeDir {snakemake.params.index} "
        "--readFilesIn {input_str} "
        "{readcmd} "
        "--outFileNamePrefix {outprefix} "
        "--outStd Log "
        "--outTmpDir {tmpdir}/STARtmp "
        "{log}"
    )
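
How the fq1 and fq2 inputs become the --readFilesIn argument can be sketched as follows. This is a simplified, hypothetical helper (`read_files_in` is not part of the wrapper): files of the same mate are joined with commas, and the two mates are separated by a space.

```python
def read_files_in(fq1, fq2=None):
    """Build the value passed to --readFilesIn from one or two read sets."""
    # Accept either a single path or a list of paths for each mate.
    fq1 = [fq1] if isinstance(fq1, str) else list(fq1)
    mates = [",".join(fq1)]
    if fq2:
        fq2 = [fq2] if isinstance(fq2, str) else list(fq2)
        if len(fq1) != len(fq2):
            raise ValueError("fq1 and fq2 must list the same number of files")
        mates.append(",".join(fq2))
    return " ".join(mates)


print(read_files_in(["a_R1.1.fastq", "a_R1.2.fastq"], ["a_R2.1.fastq", "a_R2.2.fastq"]))
print(read_files_in("a_R1.1.fastq"))
```

Because the lists are joined position by position, the files in fq2 need to be in the same order as their mates in fq1.
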
STAR INDEX

Index fasta sequences with STAR

URL:

Example

This wrapper can be used in the following way:

rule star_index:
    input:
        fasta = "{genome}.fasta"
    output:
        directory("{genome}")
    message:
        "Testing STAR index"
    threads:
        1
    params:
        extra = ""
    log:
        "logs/star_index_{genome}.log"
    wrapper:
        "v0.87.0/bio/star/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • star==2.7.8a
Input/Output

Input:

  • A (multi)fasta formatted file

Output:

  • A directory containing the indexed sequence for downstream STAR mapping
Authors
  • Thibault Dayris
  • Tomás Di Domenico
Code
"""Snakemake wrapper for STAR index"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")
sjdb_overhang = snakemake.params.get("sjdbOverhang", "100")

gtf = snakemake.input.get("gtf")
if gtf is not None:
    gtf = "--sjdbGTFfile " + gtf
    sjdb_overhang = "--sjdbOverhang " + sjdb_overhang
else:
    gtf = sjdb_overhang = ""

makedirs(snakemake.output)

shell(
    "STAR "  # Tool
    "--runMode genomeGenerate "  # Indexation mode
    "{extra} "  # Optional parameters
    "--runThreadN {snakemake.threads} "  # Number of threads
    "--genomeDir {snakemake.output} "  # Path to output
    "--genomeFastaFiles {snakemake.input.fasta} "  # Path to fasta files
    "{sjdb_overhang} "  # Read-len - 1
    "{gtf} "  # Highly recommended GTF
    "{log}"  # Logging
)

STRELKA

For strelka, the following wrappers are available:

STRELKA GERMLINE

Call germline variants with Strelka.

URL:

Example

This wrapper can be used in the following way:

rule strelka_germline:
    input:
        # the required bam file
        bam="mapped/{sample}.bam",
        # path to reference genome fasta and index
        fasta="genome.fasta",
        fasta_index="genome.fasta.fai"
    output:
        # Strelka results - either use directory or complete file path
        directory("strelka/{sample}")
    log:
        "logs/strelka/germline/{sample}.log"
    params:
        # optional parameters
        config_extra="",
        run_extra=""
    threads: 8
    wrapper:
        "v0.87.0/bio/strelka/germline"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • strelka==2.9.10
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"


import os
from pathlib import Path
from snakemake.shell import shell

config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

bam = snakemake.input.get("bam")  # input bam file, required
assert bam is not None, "input-> bam is a required input parameter"

if snakemake.output[0].endswith(".vcf.gz"):
    run_dir = Path(snakemake.output[0]).parents[2]
else:
    run_dir = snakemake.output

shell(
    "(configureStrelkaGermlineWorkflow.py "  # configure the strelka run
    "--bam {bam} "  # input bam
    "--referenceFasta {snakemake.input.fasta} "  # reference genome
    "--runDir {run_dir} "  # output directory
    "{config_extra} "  # additional parameters for the configuration
    "&& {run_dir}/runWorkflow.py "  # run the strelka workflow
    "-m local "  # run in local mode
    "-j {snakemake.threads} "  # number of threads
    "{run_extra}) "  # additional parameters for the run
    "{log}"
)  # logging
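
The run directory logic above is easier to follow with a concrete path. The sketch below assumes that the declared output file sits at results/variants/variants.vcf.gz inside the run directory (the layout that makes `parents[2]` point at the run directory); the helper `strelka_run_dir` and the sample name are hypothetical.

```python
from pathlib import Path


def strelka_run_dir(output: str) -> Path:
    """Derive the Strelka run directory from the declared output path."""
    out = Path(output)
    if output.endswith(".vcf.gz"):
        # <run_dir>/results/variants/variants.vcf.gz -> <run_dir>
        return out.parents[2]
    # Otherwise the output itself is taken as the run directory.
    return out


print(strelka_run_dir("strelka/A/results/variants/variants.vcf.gz"))
print(strelka_run_dir("strelka/A"))
```
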
STRELKA

Strelka calls somatic and germline small variants from mapped sequencing reads

URL:

Example

This wrapper can be used in the following way:

rule strelka:
    input:
        # The normal bam and its index
        # are optional input
        # normal = "data/b.bam",
        # normal_index = "data/b.bam.bai"
        tumor = "data/{tumor}.bam",
        tumor_index = "data/{tumor}.bam.bai",
        fasta = "data/genome.fasta",
        fasta_index = "data/genome.fasta.fai"
    output:
        # Strelka output - can be directory or full file path
        directory("{tumor}_vcf")
    threads:
        1
    params:
        run_extra = "",
        config_extra = ""
    log:
        "logs/strelka_{tumor}.log"
    wrapper:
        "v0.87.0/bio/strelka/somatic"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • strelka==2.9.10
Input/Output

Input:

  • A tumor bam file, with its index.
  • A reference genome sequence in fasta format, with its index.
  • An optional normal bam file for somatic calling, with its index.

Output:

  • Statistics about calling results
  • Variants called
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Strelka"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")

# If a normal bam is given in input,
# then it should be provided in the input
# block, so Snakemake will perform additional
# tests on file existence.
normal = (
    "--normalBam {}".format(snakemake.input["normal"])
    if "normal" in snakemake.input.keys()
    else ""
)

if snakemake.output[0].endswith("vcf.gz"):
    run_dir = Path(snakemake.output[0]).parents[2]
else:
    run_dir = snakemake.output

shell(
    "(configureStrelkaSomaticWorkflow.py "  # Configuration script
    "{normal} "  # Path to normal bam (if any)
    "--tumorBam {snakemake.input.tumor} "  # Path to tumor bam
    "--referenceFasta {snakemake.input.fasta} "  # Path to fasta file
    "--runDir {run_dir} "  # Path to output directory
    "{config_extra} "  # Extra parameters for configuration
    " && "
    "{run_dir}/runWorkflow.py "  # Run the pipeline
    "--mode local "  # Stop internal job submission
    "--jobs {snakemake.threads} "  # Number of threads
    "{run_extra}) "  # Extra parameters for runWorkflow
    "{log}"  # Logging behaviour
)

STRLING

For strling, the following wrappers are available:

STRLING CALL

STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The call subcommand calls genotypes and estimates allele sizes for all loci in each sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html

URL:

Example

This wrapper can be used in the following way:

rule strling_call:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        bin="extract/{sample}.bin",
        reference="reference/genome.fasta",
        fai="reference/genome.fasta.fai",
        bounds="merged/group-bounds.txt" # optional, produced by strling merge
    output:
        "call/{sample}-bounds.txt", # must end with -bounds.txt
        "call/{sample}-genotype.txt", # must end with -genotype.txt
        "call/{sample}-unplaced.txt" # must end with -unplaced.txt
    params:
        extra="" # optional extra command line arguments
    log:
        "log/strling/call/{sample}.log"
    wrapper:
        "v0.87.0/bio/strling/call"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • strling==0.3
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling call"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
bin = snakemake.input.get("bin", None)
reference = snakemake.input.get("reference", None)
bounds = snakemake.input.get("bounds", None)

if not bam or (isinstance(bam, list) and len(bam) != 1):
    raise ValueError("Please provide exactly one 'bam' as input.")

if not path.exists(bam + ".bai"):
    raise ValueError(
        "Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
    )

if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")

if not bounds:  # optional
    bounds_string = ""
else:
    bounds_string = "-b {}".format(bounds)

if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )

if not any(o.endswith("-bounds.txt") for o in snakemake.output):
    raise ValueError("Please provide a file that ends with -bounds.txt in the output.")

for filename in snakemake.output:
    if filename.endswith("-bounds.txt"):
        prefix = filename[: -len("-bounds.txt")]
        break

if not any(o == "{}-genotype.txt".format(prefix) for o in snakemake.output):
    raise ValueError(
        "Please provide an output file that ends with -genotype.txt and has the same prefix as -bounds.txt"
    )

if not any(o == "{}-unplaced.txt".format(prefix) for o in snakemake.output):
    raise ValueError(
        "Please provide an output file that ends with -unplaced.txt and has the same prefix as -bounds.txt"
    )

shell(
    "(strling call "
    "{bam} "
    "{bin} "
    "{bounds_string} "
    "-o {prefix} "
    "{extra}) {log}"
)
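
The output naming rules enforced above can be condensed into one check. This is a sketch, not the wrapper's own code: the helper `strling_prefix` is hypothetical and returns the prefix that would be passed to the -o option.

```python
def strling_prefix(outputs):
    """Return the shared prefix of the three expected strling call outputs."""
    bounds = [o for o in outputs if o.endswith("-bounds.txt")]
    if not bounds:
        raise ValueError("no output ends with -bounds.txt")
    prefix = bounds[0][: -len("-bounds.txt")]
    # The genotype and unplaced files must share the prefix of the bounds file.
    for suffix in ("-genotype.txt", "-unplaced.txt"):
        if prefix + suffix not in outputs:
            raise ValueError(f"missing output {prefix}{suffix}")
    return prefix


print(strling_prefix(["call/A-bounds.txt", "call/A-genotype.txt", "call/A-unplaced.txt"]))
```
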
STRLING EXTRACT

STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The extract subcommand retrieves informative read pairs into a binary format for a single sample; the same bin files can be used for both single-sample and joint calling. Documentation at: https://strling.readthedocs.io/en/latest/run.html

URL:

Example

This wrapper can be used in the following way:

rule strling_extract:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
        reference="reference/genome.fasta",
        fai="reference/genome.fasta.fai",
        index="reference/genome.fasta.str" # optional
    output:
        "extract/{sample}.bin"
    log:
        "log/strling/extract/{sample}.log"
    params:
        extra="" # optionally add further command line arguments
    wrapper:
        "v0.87.0/bio/strling/extract"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • strling==0.3
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling extract"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
reference = snakemake.input.get("reference", None)
index = snakemake.input.get("index", None)

if not bam or (isinstance(bam, list) and len(bam) != 1):
    raise ValueError("Please provide exactly one 'bam' input.")

if not path.exists(bam + ".bai"):
    raise ValueError(
        "Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
    )

if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")

if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )

if not index:  # optional
    index_string = ""
else:
    index_string = "-g {}".format(index)

if len(snakemake.output) != 1:
    raise ValueError("Please provide exactly one output file (.bin).")

shell(
    "(strling extract "
    "{bam} "
    "{snakemake.output[0]} "
    "-f {reference} "
    "{index_string} "
    "{extra}) {log}"
)
STRLING INDEX

STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. index creates a bed file of large STR regions in the reference genome. This step is performed automatically as part of strling extract. However, when running multiple samples, it is more efficient to do it once, then pass the file to strling extract using the -g option. Documentation at: https://strling.readthedocs.io/en/latest/run.html
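
To illustrate the "index once, extract many" pattern, the sketch below only builds the command strings and does not run them. The argument order follows the strling index and strling extract wrapper code in this section; the helper `strling_commands`, the sample names, and the paths are hypothetical.

```python
def strling_commands(reference, samples):
    """Build one index command and one extract command per sample (not executed)."""
    index = reference + ".str"
    # The reference is indexed a single time ...
    commands = [f"strling index {reference} -g {index}"]
    # ... and every sample reuses that index through -g.
    for name, bam in samples.items():
        commands.append(
            f"strling extract {bam} extract/{name}.bin -f {reference} -g {index}"
        )
    return commands


for cmd in strling_commands(
    "reference/genome.fasta", {"A": "mapped/A.bam", "B": "mapped/B.bam"}
):
    print(cmd)
```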

URL:

Example

This wrapper can be used in the following way:

rule strling_index:
    input:
        "reference/genome.fasta"
    output:
        index="reference/genome.fasta.str",
        fai="reference/genome.fasta.fai"
    params:
        extra="" # optionally add further command line arguments
    log:
        "log/strling/index.log"
    wrapper:
        "v0.87.0/bio/strling/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • strling==0.3
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling index"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
if len(snakemake.input) != 1:
    raise ValueError("Please provide exactly one reference genome.")

shell(
    "(strling index {snakemake.input[0]} "
    "-g {snakemake.output.index} "
    "{extra}) {log}"
)
STRLING MERGE

STRling (pronounced “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. merge prepares joint calling of STR loci across all given samples. Requires minimum read evidence from at least one sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html

Example

This wrapper can be used in the following way:

rule strling_merge:
    input:
        bins=["extract/A.bin", "extract/B.bin"],
        reference="reference/genome.fasta",
        fai="reference/genome.fasta.fai",
    output:
        "merged/group-bounds.txt" # must end with "-bounds.txt"
    params:
        extra="" # optionally add further command line arguments
    log:
        "log/strling/merge/group.log"
    wrapper:
        "v0.87.0/bio/strling/merge"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • strling==0.3
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for strling merge"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Check inputs/arguments.
bins = snakemake.input.get("bins", None)
reference = snakemake.input.get("reference", None)
fai = snakemake.input.get("fai", None)

if not bins or len(bins) < 2:
    raise ValueError("Please provide at least two 'bins' as input.")

if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")

if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )

if len(snakemake.output) != 1:
    raise ValueError("Please provide exactly one output file (-bounds.txt).")

if not snakemake.output[0].endswith("-bounds.txt"):
    raise ValueError(
        "Output file must end with '-bounds.txt'. Please change the output file name."
    )

prefix = snakemake.output[0][: -len("-bounds.txt")]

shell("(strling merge " "{bins} " "-o {prefix} " "{extra}) {log}")
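The way the wrapper turns the required `-bounds.txt` output name into the prefix passed to `-o` can be sketched in isolation. This is a minimal illustration; the function name `merge_prefix` is hypothetical and not part of the wrapper.

```python
SUFFIX = "-bounds.txt"


def merge_prefix(output_path):
    """Return the prefix for `strling merge -o`, given the declared output file."""
    if not output_path.endswith(SUFFIX):
        raise ValueError("Output file must end with '{}': {}".format(SUFFIX, output_path))
    # Drop the fixed suffix; the remaining path is used as the output prefix.
    return output_path[: -len(SUFFIX)]


print(merge_prefix("merged/group-bounds.txt"))
```

For the example rule above, the prefix is `merged/group`.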

SUBREAD

For subread, the following wrappers are available:

SUBREAD FEATURECOUNTS

FeatureCounts assigns mapped reads or fragments (paired-end data) to genomic features such as genes, exons and promoters. For more information, please see the featureCounts tutorial, the documentation of subread and the command-line help.

Example

This wrapper can be used in the following way:

rule feature_counts:
    input:
        sam="{sample}.bam", # list of sam or bam files
        annotation="annotation.gtf",
        # optional input
        # chr_names="",           # implicitly sets the -A flag
        # fasta="genome.fasta"      # implicitly sets the -G flag
    output:
        multiext("results/{sample}",
                 ".featureCounts",
                 ".featureCounts.summary",
                 ".featureCounts.jcounts")
    threads:
        2
    params:
        tmp_dir="",   # implicitly sets the --tmpDir flag
        r_path="",    # implicitly sets the --Rpath flag
        extra="-O --fracOverlap 0.2"
    log:
        "logs/{sample}.log"
    wrapper:
        "v0.87.0/bio/subread/featurecounts"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • subread=2.0
Input/Output

Input:

  • a list of .sam or .bam files
  • GTF, GFF or SAF annotation file
  • optional: a tab-separated file that determines the sorting order and contains the chromosome names in the first column
  • optional: a fasta index file

Output:

  • .featureCounts file including read counts (tab separated)
  • .featureCounts.summary file including summary statistics (tab separated)
  • .featureCounts.jcounts file including count number of reads supporting each exon-exon junction (tab separated)
Authors
  • Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

# optional input files and directories
fasta = snakemake.input.get("fasta", "")
chr_names = snakemake.input.get("chr_names", "")
tmp_dir = snakemake.params.get("tmp_dir", "")
r_path = snakemake.params.get("r_path", "")

if fasta:
    extra += " -G {}".format(fasta)
if chr_names:
    extra += " -A {}".format(chr_names)
if tmp_dir:
    extra += " --tmpDir {}".format(tmp_dir)
if r_path:
    extra += " --Rpath {}".format(r_path)

shell(
    "(featureCounts"
    " {extra}"
    " -T {snakemake.threads}"
    " -J"
    " -a {snakemake.input.annotation}"
    " -o {snakemake.output[0]}"
    " {snakemake.input.sam})"
    " {log}"
)
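The mapping from optional inputs and params to featureCounts flags can be tried out without Snakemake. The sketch below mirrors the wrapper logic shown above; the function name `featurecounts_extra` is hypothetical.

```python
def featurecounts_extra(extra="", fasta="", chr_names="", tmp_dir="", r_path=""):
    """Append the flags that the wrapper sets implicitly for optional values."""
    optional_flags = [
        ("-G", fasta),
        ("-A", chr_names),
        ("--tmpDir", tmp_dir),
        ("--Rpath", r_path),
    ]
    for flag, value in optional_flags:
        if value:  # empty strings mean "not provided"
            extra += " {} {}".format(flag, value)
    return extra


print(featurecounts_extra("-O --fracOverlap 0.2", fasta="genome.fasta"))
```

Values left empty add nothing, so the example rule with `tmp_dir=""` and `r_path=""` passes only the user-supplied `extra` string.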

TABIX

For tabix, the following wrappers are available:

TABIX

Query given file with tabix.

URL: https://github.com/samtools/htslib

Example

This wrapper can be used in the following way:

rule tabix:
    input:
        ## list the bgzip-compressed file (e.g. BED, GFF or VCF) as the first input
        ## and its tabix index as the second input
        "{prefix}.bed.gz",
        "{prefix}.bed.gz.tbi"
    output:
        "{prefix}.output.bed"
    params:
        region = "1"
    log:
        "logs/tabix/query/{prefix}.log"
    wrapper:
        "v0.87.0/bio/tabix/query"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • htslib==1.12
Input/Output

Input:

  • Bgzip compressed file (e.g. BED.gz, GFF.gz, or VCF.gz)
  • Tabix index file
  • Region of interest to retrieve (params.region)

Output:

  • Uncompressed subset of the input file from the given region
Authors
  • William Rowell
Code
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "tabix {extra} {snakemake.input[0]} {snakemake.params.region} > {snakemake.output} {log}"
)
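The example rule only uses a bare contig name (`"1"`) as `params.region`. Assuming the usual tabix region syntax of `chrom`, `chrom:start` or `chrom:start-end`, a small Python helper can split such a string before it is placed in a rule. The function name is hypothetical, and contig names that themselves contain `:` are not handled.

```python
def parse_region(region):
    """Split 'chrom', 'chrom:start' or 'chrom:start-end' into (chrom, start, end)."""
    chrom, colon, interval = region.partition(":")
    if not chrom:
        raise ValueError("region has no contig name: {!r}".format(region))
    if not colon:
        return chrom, None, None  # whole contig
    start_text, dash, end_text = interval.partition("-")
    start = int(start_text)
    end = int(end_text) if dash else None
    return chrom, start, end


print(parse_region("1"))
print(parse_region("chr2:1000-2000"))
```

A check like this can catch malformed region strings before a job is submitted.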

TRANSDECODER

For transdecoder, the following wrappers are available:

TRANSDECODER LONGORFS

TransDecoder.LongOrfs will identify coding regions within transcript sequences (ORFs) that are at least 100 amino acids long. You can lower this via the ‘-m’ parameter, but know that the rate of false positive ORF predictions increases drastically with shorter minimum length criteria.

Example

This wrapper can be used in the following way:

rule transdecoder_longorfs:
    input:
        fasta="test.fa.gz", # required
        gene_trans_map="test.gtm" # optional gene-to-transcript identifier mapping file (tab-delimited, gene_id<tab>trans_id<return> )
    output:
        "test.fa.transdecoder_dir/longest_orfs.pep"
    log:
        "logs/transdecoder/test-longorfs.log"
    params:
        extra=""
    wrapper:
        "v0.87.0/bio/transdecoder/longorfs"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • transdecoder=5.5.0
Input/Output

Input:

  • fasta transcripts

Output:

  • ORFs peptide file(s)
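The wrapper code for this rule decompresses a gzipped input next to the original file and removes the output directory before running, because TransDecoder fails if it already exists. The two path derivations involved can be sketched as follows; the helper names are hypothetical.

```python
from os import path


def uncompressed_name(fasta):
    """Name of the decompressed fasta written next to a gzipped input."""
    return fasta[: -len(".gz")] if fasta.endswith(".gz") else fasta


def longorfs_output_dir(output_file):
    """Directory that is removed before TransDecoder.LongOrfs is started."""
    return path.dirname(output_file)


print(uncompressed_name("test.fa.gz"))
print(longorfs_output_dir("test.fa.transdecoder_dir/longest_orfs.pep"))
```

With the example rule, TransDecoder.LongOrfs runs on `test.fa`, which is why the declared output lives in `test.fa.transdecoder_dir`.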
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Transdecoder LongOrfs"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

gtm_cmd = ""
gtm = snakemake.input.get("gene_trans_map", "")
if gtm:
    gtm_cmd = " --gene_trans_map " + gtm

output_dir = path.dirname(str(snakemake.output))

# transdecoder fails if output already exists. No force option available
shell("rm -rf {output_dir}")

input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith(".gz"):
    input_fa = input_fasta[: -len(".gz")]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta

shell("TransDecoder.LongOrfs -t {input_fa} {gtm_cmd} {extra} {log}")
TRANSDECODER PREDICT

Predict the likely coding regions from the ORFs identified by Transdecoder.LongOrfs. Optionally include results from homology searches (blast/hmmer results) as ORF retention criteria.

Example

This wrapper can be used in the following way:

rule transdecoder_predict:
    input:
        fasta="test.fa.gz", # required input; optionally gzipped
        pfam_hits="pfam_hits.txt", # optionally retain ORFs with hits by inputting pfam results here (run separately)
        blastp_hits="blastp_hits.txt", # optionally retain ORFs with hits by inputting blastp results here (run separately)
        # you may also want to add your transdecoder longorfs result here - predict will fail if you haven't first run longorfs
        #longorfs="test.fa.transdecoder_dir/longest_orfs.pep"
    output:
        "test.fa.transdecoder.bed",
        "test.fa.transdecoder.cds",
        "test.fa.transdecoder.pep",
        "test.fa.transdecoder.gff3"
    log:
        "logs/transdecoder/test-predict.log"
    params:
        extra=""
    wrapper:
        "v0.87.0/bio/transdecoder/predict"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • transdecoder=5.5.0
Input/Output

Input:

  • fasta assembly

Output:

  • candidate coding regions (pep, cds, gff3, bed output formats)
Authors
  • N. Tessa Pierce
Code
"""Snakemake wrapper for Transdecoder Predict"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

addl_outputs = ""
pfam = snakemake.input.get("pfam_hits", "")
if pfam:
    addl_outputs += " --retain_pfam_hits " + pfam

blast = snakemake.input.get("blastp_hits", "")
if blast:
    addl_outputs += " --retain_blastp_hits " + blast

input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith(".gz"):
    input_fa = input_fasta[: -len(".gz")]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta

shell("TransDecoder.Predict -t {input_fa} {addl_outputs} {extra} {log}")
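How the optional homology inputs become command-line arguments can be checked in isolation. The sketch below mirrors the wrapper logic above; the function name `predict_args` is hypothetical.

```python
def predict_args(pfam_hits="", blastp_hits=""):
    """Build the retention arguments for TransDecoder.Predict from optional inputs."""
    args = ""
    if pfam_hits:
        args += " --retain_pfam_hits " + pfam_hits
    if blastp_hits:
        args += " --retain_blastp_hits " + blastp_hits
    return args


print(predict_args(pfam_hits="pfam_hits.txt", blastp_hits="blastp_hits.txt"))
```

Omitting both inputs yields an empty string, so TransDecoder.Predict then runs without homology-based retention.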

TRIM_GALORE

For trim_galore, the following wrappers are available:

TRIM_GALORE-PE

Trim paired-end reads using trim_galore.

Example

This wrapper can be used in the following way:

rule trim_galore_pe:
    input:
        ["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"],
    output:
        "trimmed/{sample}.1_val_1.fq.gz",
        "trimmed/{sample}.1.fastq.gz_trimming_report.txt",
        "trimmed/{sample}.2_val_2.fq.gz",
        "trimmed/{sample}.2.fastq.gz_trimming_report.txt",
    params:
        extra="--illumina -q 20",
    log:
        "logs/trim_galore/{sample}.log",
    wrapper:
        "v0.87.0/bio/trim_galore/pe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • trim-galore==0.6.6
Input/Output

Input:

  • two (paired-end) fastq files (can be gzip compressed)

Output:

  • two trimmed (paired-end) fastq files
  • two trimming reports
Params
  • extra: additional parameters
Notes
  • It is expected that the fastqc Snakemake wrapper be used in place of the --fastqc option.
  • All output files must be placed in the same directory.
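Because trim_galore picks the output file names itself, the `output` entries of the rule have to match them. The sketch below derives the names used in the example rule from the two input files. The naming pattern is inferred from that example (and the uncompressed case is an assumption), so verify it against your trim_galore version; the function name is hypothetical.

```python
import posixpath
import re


def expected_pe_outputs(read1, read2, out_dir):
    """Return trimmed-read and report paths in the order used by the example rule."""
    outputs = []
    for mate, read in enumerate((read1, read2), start=1):
        name = posixpath.basename(read)
        # Strip a trailing .fastq/.fq, optionally followed by .gz.
        stem = re.sub(r"\.(fastq|fq)(\.gz)?$", "", name)
        gz = ".gz" if name.endswith(".gz") else ""
        outputs.append(posixpath.join(out_dir, "{}_val_{}.fq{}".format(stem, mate, gz)))
        outputs.append(posixpath.join(out_dir, name + "_trimming_report.txt"))
    return outputs


for out in expected_pe_outputs("reads/S.1.fastq.gz", "reads/S.2.fastq.gz", "trimmed"):
    print(out)
```

All four paths share one directory, which is what the wrapper requires.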
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError(
        "The trim_galore Snakemake wrapper cannot "
        "be run with the `--fastqc` flag. Please "
        "remove the flag from extra params. "
        "You can use the fastqc Snakemake wrapper on "
        "the input and output files instead."
    )

# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), (
        "trim_galore can only output files to a single directory."
        " Please indicate only one directory for the output files."
    )

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " --paired"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}"
)
TRIM_GALORE-SE

Trim unpaired reads using trim_galore.

Example

This wrapper can be used in the following way:

rule trim_galore_se:
    input:
        "reads/{sample}.fastq.gz",
    output:
        "trimmed/{sample}_trimmed.fq.gz",
        "trimmed/{sample}.fastq.gz_trimming_report.txt",
    params:
        extra="--illumina -q 20",
    log:
        "logs/trim_galore/{sample}.log",
    wrapper:
        "v0.87.0/bio/trim_galore/se"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • trim-galore==0.6.6
Input/Output

Input:

  • fastq file with untrimmed reads (can be gzip compressed)

Output:

  • trimmed fastq file
  • trimming report
Params
  • extra: additional parameters
Notes
  • It is expected that the fastqc Snakemake wrapper be used in place of the --fastqc option.
  • All output files must be placed in the same directory.
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError(
        "The trim_galore Snakemake wrapper cannot "
        "be run with the `--fastqc` flag. Please "
        "remove the flag from extra params. "
        "You can use the fastqc Snakemake wrapper on "
        "the input and output files instead."
    )

# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), (
        "trim_galore can only output files to a single directory."
        " Please indicate only one directory for the output files."
    )

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}"
)
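The two checks performed by the wrapper (no `--fastqc` in `extra`, all outputs in one directory) can be run ahead of time on a planned rule. This is a minimal sketch with a hypothetical function name; it raises `ValueError` where the wrapper uses assertions.

```python
import os.path


def check_trim_galore_request(outputs, extra=""):
    """Validate outputs and extra params; return the single output directory."""
    if "--fastqc" in extra:
        raise ValueError("remove --fastqc from extra; use the fastqc wrapper instead")
    out_dirs = {os.path.dirname(p) for p in outputs}
    if len(out_dirs) != 1:
        raise ValueError("all output files must be in the same directory")
    return out_dirs.pop()


print(
    check_trim_galore_request(
        ["trimmed/a_trimmed.fq.gz", "trimmed/a.fastq.gz_trimming_report.txt"],
        extra="--illumina -q 20",
    )
)
```

The returned directory corresponds to the value the wrapper passes to trim_galore via `-o`.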

TRIMMOMATIC

For trimmomatic, the following wrappers are available:

TRIMMOMATIC PE

Trim paired-end reads with trimmomatic. (De)compress with pigz.

Example

This wrapper can be used in the following way:

rule trimmomatic_pe:
    input:
        r1="reads/{sample}.1.fastq.gz",
        r2="reads/{sample}.2.fastq.gz"
    output:
        r1="trimmed/{sample}.1.fastq.gz",
        r2="trimmed/{sample}.2.fastq.gz",
        # reads where trimming entirely removed the mate
        r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
        r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        compression_level="-9"
    threads:
        32
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/trimmomatic/pe"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
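The wrapper code for this rule divides the `threads` value between trimmomatic and one pigz process per gzipped file. The following condensed sketch follows the same arithmetic and shows the split for a few thread counts; the file names are illustrative only.

```python
def distribute_threads(input_files, output_files, available_threads):
    """Return (trimmomatic, pigz per gzipped input, pigz per gzipped output) threads."""
    gz_in = sum(1 for f in input_files if f.endswith(".gz"))
    gz_out = sum(1 for f in output_files if f.endswith(".gz"))
    per_process = available_threads // (1 + gz_in + gz_out)
    if per_process == 0:
        # Not enough threads for pigz: trimmomatic handles (de)compression itself.
        return available_threads, 0, 0
    # Decompressing pigz creates at most 4 threads.
    pigz_in = min(4, per_process) if gz_in else 0
    pigz_out = (available_threads - pigz_in * gz_in) // (1 + gz_out) if gz_out else 0
    trimmomatic = available_threads - pigz_in * gz_in - pigz_out * gz_out
    return trimmomatic, pigz_in, pigz_out


inputs = ["reads/s.1.fastq.gz", "reads/s.2.fastq.gz"]
outputs = [
    "trimmed/s.1.fastq.gz",
    "trimmed/s.1.unpaired.fastq.gz",
    "trimmed/s.2.fastq.gz",
    "trimmed/s.2.unpaired.fastq.gz",
]
for threads in (4, 8, 32):
    print(threads, distribute_threads(inputs, outputs, threads))
```

With `threads: 32` as in the example rule, trimmomatic gets 8 threads and each of the six pigz processes gets 4. With only 4 threads, fewer than one per process, pigz is not used at all.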

Software dependencies
  • trimmomatic==0.36
  • pigz==2.3.4
  • snakemake-wrapper-utils==0.1.3
Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/pe

Snakemake wrapper to trim reads with trimmomatic in PE mode with help of pigz.
pigz is the parallel implementation of gzip. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least 1 extra thread per each gzipped
input or output file.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_files, output_files, available_threads):
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
    gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
input_files = [snakemake.input.r1, snakemake.input.r2]
output_files = [
    snakemake.output.r1,
    snakemake.output.r1_unpaired,
    snakemake.output.r2,
    snakemake.output.r2_unpaired,
]

trimmomatic_threads, input_threads, output_threads = distribute_threads(
    input_files, output_files, snakemake.threads
)

input_r1, input_r2 = [
    compose_input_gz(filename, input_threads) for filename in input_files
]

output_r1, output_r1_unp, output_r2, output_r2_unp = [
    compose_output_gz(filename, output_threads, compression_level)
    for filename in output_files
]

shell(
    "trimmomatic PE -threads {trimmomatic_threads} {java_opts} {extra} "
    "{input_r1} {input_r2} "
    "{output_r1} {output_r1_unp} "
    "{output_r2} {output_r2_unp} "
    "{trimmer} "
    "{log}"
)
TRIMMOMATIC SE

Trim single-end reads with trimmomatic. (De)compress with pigz.

Example

This wrapper can be used in the following way:

rule trimmomatic:
    input:
        "reads/{sample}.fastq.gz"  # input and output can be uncompressed or compressed
    output:
        "trimmed/{sample}.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        # optional compression levels from -0 to -9 and -11
        compression_level="-9"
    threads:
        32
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    wrapper:
        "v0.87.0/bio/trimmomatic/se"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • trimmomatic==0.36
  • pigz==2.3.4
  • snakemake-wrapper-utils==0.1.3
Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/se

Snakemake wrapper to trim reads with trimmomatic in SE mode with help of pigz.
pigz is the parallel implementation of gzip. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least 1 extra thread per each gzipped
input or output file.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_file, output_file, available_threads):
    gzipped_input_files = 1 if input_file.endswith(".gz") else 0
    gzipped_output_files = 1 if output_file.endswith(".gz") else 0
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
trimmomatic_threads, input_threads, output_threads = distribute_threads(
    snakemake.input[0], snakemake.output[0], snakemake.threads
)

# Collect files
input = compose_input_gz(snakemake.input[0], input_threads)
output = compose_output_gz(snakemake.output[0], output_threads, compression_level)

shell(
    "trimmomatic SE -threads {trimmomatic_threads} "
    "{java_opts} {extra} {input} {output} {trimmer} {log}"
)
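To see what the process substitution looks like on the command line, the two `compose_*` helpers from the wrapper can be called on their own. The thread counts below are illustrative values, not the result of the wrapper's thread distribution.

```python
def compose_input_gz(filename, threads):
    """Wrap a gzipped input in a pigz decompression process substitution."""
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {} --decompress --stdout {})".format(threads, filename)
    return filename


def compose_output_gz(filename, threads, compression_level):
    """Wrap a gzipped output in a pigz compression process substitution."""
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {} {} > {})".format(threads, compression_level, filename)
    return filename


print(compose_input_gz("reads/a.fastq.gz", 4))
print(compose_output_gz("trimmed/a.fastq.gz", 14, "-9"))
print(compose_input_gz("reads/a.fastq", 4))  # uncompressed files pass through
```

Trimmomatic then reads from and writes to these substitutions as if they were plain files, while pigz does the (de)compression in parallel.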

TRINITY

Generate transcriptome assembly with Trinity

Example

This wrapper can be used in the following way:

rule trinity:
    input:
        left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
        right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
    output:
        "trinity_out_dir/Trinity.fasta"
    log:
        'logs/trinity/trinity.log'
    params:
        extra=""
    threads: 4
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=10240
    wrapper:
        "v0.87.0/bio/trinity"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • trinity==2.8.4
Input/Output

Input:

  • fastq files

Output:

  • fasta containing assembly
Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Trinity."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
# Previous wrapper reserved 10 Gigabytes by default. This behaviour is
# preserved below:
max_memory = "10G"

# Getting memory in megabytes, if java opts is not filled with -Xmx parameter
# By doing so, backward compatibility is preserved
if "mem_mb" in snakemake.resources.keys():
    # max_memory from trinity expects a value in gigabytes.
    rounded_mb_to_gb = int(snakemake.resources["mem_mb"] / 1024)
    max_memory = "{}G".format(rounded_mb_to_gb)

# Getting memory in gigabytes, for user convenience. Please prefer the use
# of mem_mb over mem_gb as advised in documentation.
elif "mem_gb" in snakemake.resources.keys():
    max_memory = "{}G".format(snakemake.resources["mem_gb"])


# allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = (
    [snakemake.input.left]
    if isinstance(snakemake.input.left, str)
    else snakemake.input.left
)
right = snakemake.input.get("right")
if right:
    right = (
        [snakemake.input.right]
        if isinstance(snakemake.input.right, str)
        else snakemake.input.right
    )
    assert len(left) >= len(
        right
    ), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
    input_str_left = " --left " + ",".join(left)
    input_str_right = " --right " + ",".join(right)
else:
    input_str_left = " --single " + ",".join(left)
    input_str_right = ""

input_cmd = " ".join([input_str_left, input_str_right])

# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
    if "fq" in left[0] or "fastq" in left[0]:
        seqtype = "fq"
    elif "fa" in left[0] or "fasta" in left[0]:
        seqtype = "fa"
    else:
        raise ValueError(
            "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"
        )

outdir = path.dirname(snakemake.output[0])
assert "trinity" in outdir, "output directory name must contain 'trinity'"

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "Trinity {input_cmd} --CPU {snakemake.threads} "
    " --max_memory {max_memory} --seqType {seqtype} "
    " --output {outdir} {snakemake.params.extra} "
    " {log}"
)
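The translation of Snakemake resources into Trinity's `--max_memory` value can be sketched separately from the wrapper. The function name is hypothetical and a plain dict stands in for `snakemake.resources`.

```python
def trinity_max_memory(resources, default="10G"):
    """Pick the --max_memory value: mem_mb first, then mem_gb, else the default."""
    if "mem_mb" in resources:
        # Trinity expects whole gigabytes; int() truncates toward zero.
        return "{}G".format(int(resources["mem_mb"] / 1024))
    if "mem_gb" in resources:
        return "{}G".format(resources["mem_gb"])
    return default


for resources in ({"mem_mb": 10240}, {"mem_mb": 1500}, {"mem_gb": 10}, {}):
    print(resources, trinity_max_memory(resources))
```

Because the megabyte value is truncated, `mem_mb=1500` becomes `1G`, and anything below 1024 becomes `0G`, so `mem_mb` is best given as a multiple of 1024.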

TXIMPORT

Import and summarize transcript-level estimates for both transcript-level and gene-level analysis.

Example

This wrapper can be used in the following way:

rule tximport:
    input:
        quant = expand("quant/A/quant.sf"),
        # Optional transcript/gene links as described in tximport
        # tx_to_gene = "/path/to/tx2gene"
    output:
        txi = "txi.RDS"
    params:
        extra = "type='salmon', txOut=TRUE"
    wrapper:
        "v0.87.0/bio/tximport"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bioconductor-tximport==1.14.0
  • r-readr==1.3.1
  • r-jsonlite==1.6
Input/Output

Input:

  • A list of paths to count data

Output:

  • A tximport RDS object
Notes

Add any tximport options in params.extra; the R wrapper passes them on to tximport. Options that tximport does not recognize will cause an unknown-parameter error.

Authors
  • Thibault Dayris
Code
#!/bin/R

# Loading library
base::library("tximport");   # Perform actual count importation in R
base::library("readr");      # Read faster!
base::library("jsonlite");   # Importing inferential replicates

# Cast input paths as character to avoid errors
samples_paths <- sapply(               # Sequentially apply
  snakemake@input[["quant"]],          # ... to all quantification paths
  function(quant) as.character(quant)  # ... a cast as character
);

# Collapse path into a character vector
samples_paths <- base::paste0(samples_paths, collapse = '", "');

# Building function arguments
extra <- base::paste0('files = c("', samples_paths, '")');

# Check if user provided optional transcript to gene table
if ("tx_to_gene" %in% names(snakemake@input)) {
  tx2gene <- readr::read_tsv(snakemake@input[["tx_to_gene"]]);
  extra <- base::paste(
    extra,                 # Forward existing arguments
    ", tx2gene = ",        # Argument name
    "tx2gene"              # Add tx2gene to parameters
  );
}

# Add user defined arguments
if ("extra" %in% names(snakemake@params)) {
  if (snakemake@params[["extra"]] != "") {
    extra <- base::paste(
      extra,                       # Forward existing parameters
      snakemake@params[["extra"]], # Add user parameters
      sep = ", "                   # Field separator
    );
  }
}


print(extra);
# Perform tximport work
txi <- base::eval(                        # Evaluate the following
  base::parse(                            # ... parsed expression
    text = base::paste0(
      "tximport::tximport(", extra, ");"  # ... of tximport and its arguments
    )
  )
);

# Save results
base::saveRDS(                       # Save R object
  object = txi,                      # The txi object
  file = snakemake@output[["txi"]]   # Output path is provided by Snakemake
);
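
The way the R code above assembles the tximport call can be traced with a small Python sketch. It only mirrors the string handling and is not part of the wrapper; the function name build_tximport_call and its parameters are hypothetical.

```python
def build_tximport_call(quant_paths, has_tx2gene=False, extra=""):
    """Mirror the string assembly done by the R wrapper (illustrative only)."""
    # files = c("a", "b", ...)
    args = 'files = c("' + '", "'.join(quant_paths) + '")'
    if has_tx2gene:
        # R's paste() joins its arguments with a single space by default
        args = " ".join([args, ", tx2gene = ", "tx2gene"])
    if extra != "":
        # User-supplied options are appended with ", " as separator
        args = ", ".join([args, extra])
    return "tximport::tximport(" + args + ");"


call = build_tximport_call(
    ["quant/A/quant.sf", "quant/B/quant.sf"],
    has_tx2gene=True,
    extra='type = "salmon"',
)
print(call)
```

Because the resulting string is parsed and evaluated as R code, the extra parameter has to be valid R argument syntax, and any argument name that tximport does not know will make the call fail.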

UCSC

For ucsc, the following wrappers are available:

BEDGRAPHTOBIGWIG

Convert *.bedGraph file to *.bw file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

URL:

Example

This wrapper can be used in the following way:

rule bedGraphToBigWig:
    input:
        bedGraph="{sample}.bedGraph",
        chromsizes="genome.chrom.sizes"
    output:
        "{sample}.bw"
    log:
        "logs/{sample}.bed-graph_to_big-wig.log"
    params:
        "" # optional params string
    wrapper:
        "v0.87.0/bio/ucsc/bedGraphToBigWig"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ucsc-bedgraphtobigwig==377
Input/Output

Input:

  • bedGraph: Path to *.bedGraph file
  • chromsizes: Chrom sizes file, could be generated by twoBitInfo or downloaded from UCSC

Output:

  • Path to output ‘*.bw’ file
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.bedGraph to *.bw conversion using UCSC bedGraphToBigWig tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "bedGraphToBigWig {extra}"
    " {snakemake.input.bedGraph} {snakemake.input.chromsizes}"
    " {snakemake.output} {log}"
)
FATOTWOBIT

Convert *.fa file to *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

URL:

Example

This wrapper can be used in the following way:

# Example: from *.fa file
rule faToTwoBit_fa:
    input:
        "{sample}.fa"
    output:
        "{sample}.2bit"
    log:
        "logs/{sample}.fa_to_2bit.log"
    params:
        "" # optional params string
    wrapper:
        "v0.87.0/bio/ucsc/faToTwoBit"

# Example: from *.fa.gz file
rule faToTwoBit_fa_gz:
    input:
        "{sample}.fa.gz"
    output:
        "{sample}.2bit"
    log:
        "logs/{sample}.fa-gz_to_2bit.log"
    params:
        "" # optional params string
    wrapper:
        "v0.87.0/bio/ucsc/faToTwoBit"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ucsc-fatotwobit==377
Input/Output

Input:

  • Path(s) to genome *.fa or *.fa.gz files

Output:

  • Path to output ‘*.2bit’ file
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.fa to *.2bit conversion using UCSC faToTwoBit tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("faToTwoBit {extra} {snakemake.input} {snakemake.output} {log}")
GTFTOGENEPRED

Convert a GTF file to genePred format (see https://genome.ucsc.edu/FAQ/FAQformat.html#format9)

URL:

Example

This wrapper can be used in the following way:

rule gtfToGenePred:
    input:
        # annotations containing gene, transcript, exon, etc. data in GTF format
        "annotation.gtf"
    output:
        "annotation.genePred"
    log:
        "logs/gtfToGenePred.log"
    params:
        extra="-genePredExt" # optional parameters to pass to gtfToGenePred
    wrapper:
        "v0.87.0/bio/ucsc/gtfToGenePred"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ucsc-gtftogenepred==377
Input/Output

Input:

  • GTF file

Output:

  • genePred table
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("gtfToGenePred {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITINFO

Generate a *.chrom.sizes file from a *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

URL:

Example

This wrapper can be used in the following way:

rule twoBitInfo:
    input:
        "{sample}.2bit"
    output:
        "{sample}.chrom.sizes"
    log:
        "logs/{sample}.chrom.sizes.log"
    params:
        "" # optional params string
    wrapper:
        "v0.87.0/bio/ucsc/twoBitInfo"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ucsc-twobitinfo==377
Input/Output

Input:

  • Path to genome *.2bit file

Output:

  • Path to output *.chrom.sizes file
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for generating *.chrom.sizes from *.2bit using UCSC twoBitInfo tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("twoBitInfo {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITTOFA

Convert *.2bit file to *.fa file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

URL:

Example

This wrapper can be used in the following way:

rule twoBitToFa:
    input:
        "{sample}.2bit"
    output:
        "{sample}.fa"
    log:
        "logs/{sample}.2bit_to_fa.log"
    params:
        "" # optional params string
    wrapper:
        "v0.87.0/bio/ucsc/twoBitToFa"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ucsc-twobittofa==377
Input/Output

Input:

  • Path to genome *.2bit file

Output:

  • Path to output ‘*.fa’ file
Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC twoBitToFa tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("twoBitToFa {extra} {snakemake.input} {snakemake.output} {log}")

UMIS

For umis, the following wrappers are available:

UMIS BAMTAG

Convert a BAM/SAM with fastqtransformed read names to have UMI and

URL:

Example

This wrapper can be used in the following way:

rule umis_bamtag:
    input:
        "data/{sample}.bam"
    output:
        "data/{sample}.annotated.bam"
    log:
        "logs/umis/bamtag/{sample}.log"
    params:
        extra=""
    threads: 1
    wrapper:
        "v0.87.0/bio/umis/bamtag"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • umis==1.0.3
  • samtools==1.9
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not len(snakemake.input) == 1:
    raise ValueError("Only expecting one input file: " + str(snakemake.input) + "!")

output_file = snakemake.output[0]

if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(snakemake.output) + "!")

in_pipe = ""
if bam_input.endswith(".sam"):
    in_pipe = "cat "
else:
    in_pipe = "samtools view -h "

out_pipe = ""
if not output_file.endswith(".sam"):
    out_pipe = " | samtools view -S -b - "

shell(
    " {in_pipe} {bam_input} | " " umis bamtag -" " {out_pipe} > {output_file}" " {log}"
)
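
The input and output handling above can be summarized in a short sketch. It is an illustration under the assumption that only the file extensions matter, as in the wrapper code; bamtag_command is a hypothetical helper and is not part of the wrapper.

```python
def bamtag_command(bam_input, output_file):
    """Assemble the shell pipeline the way the wrapper does (illustrative only)."""
    # SAM input is streamed as-is; anything else is decoded with samtools
    in_pipe = "cat" if bam_input.endswith(".sam") else "samtools view -h"
    # Unless SAM output is requested, re-encode the tagged stream as BAM
    out_pipe = "" if output_file.endswith(".sam") else " | samtools view -S -b -"
    return f"{in_pipe} {bam_input} | umis bamtag -{out_pipe} > {output_file}"


bam_cmd = bamtag_command("data/a.bam", "data/a.annotated.bam")
sam_cmd = bamtag_command("data/a.sam", "data/a.annotated.sam")
print(bam_cmd)
print(sam_cmd)
```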

UNICYCLER

Assemble bacterial genomes with Unicycler.

You may find additional information on Unicycler’s github page.

URL:

Example

This wrapper can be used in the following way:

rule test_unicycler:
    input:
        # R1 and R2 short reads:
        paired = expand(
            "reads/{sample}.{read}.fq.gz",
            read=["R1", "R2"],
            allow_missing=True
        ),
        # Long reads:
        # long = "long_reads/{sample}.fq.gz",
        # Unpaired reads:
        # unpaired = "reads/{sample}.fq.gz",
    output:
        "result/{sample}/assembly.fasta"
    log:
        "logs/{sample}.log"
    params:
        extra=""
    wrapper:
        "v0.87.0/bio/unicycler"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • bowtie2==2.4.1
  • bcftools==1.10.2
  • spades==3.14.1
  • samtools==1.10
  • pilon==1.23
  • racon==1.4.13
  • blast==2.10.1
  • unicycler==0.4.8
Input/Output

Input:

  • Fastq-formatted reads

Output:

  • Assembled reads
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Unicycler"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from os.path import dirname
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

input_reads = ""
if "paired" in snakemake.input.keys():
    input_reads += " --short1 {} --short2 {}".format(*snakemake.input.paired)
if "unpaired" in snakemake.input.keys():
    input_reads += " --unpaired {} ".format(snakemake.input["unpaired"])
if "long" in snakemake.input.keys():
    input_reads += " --long {} ".format(snakemake.input["long"])

output_dir = " --out {} ".format(dirname(snakemake.output[0]))

shell(
    " unicycler "
    " {input_reads} "
    " --threads {snakemake.threads} "
    " {output_dir} "
    " {extra} "
    " {log} "
)
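
As a rough illustration of how the named inputs map to Unicycler options, the selection logic above can be written as a plain function. The name unicycler_args is hypothetical, and unlike the wrapper this sketch insists on exactly two paired files.

```python
from os.path import dirname


def unicycler_args(inputs, output_fasta):
    """Translate named inputs into Unicycler options (illustrative only)."""
    parts = []
    if "paired" in inputs:
        r1, r2 = inputs["paired"]  # exactly two files: R1 and R2
        parts += ["--short1", r1, "--short2", r2]
    if "unpaired" in inputs:
        parts += ["--unpaired", inputs["unpaired"]]
    if "long" in inputs:
        parts += ["--long", inputs["long"]]
    # The output directory is the parent directory of the first output file
    parts += ["--out", dirname(output_fasta)]
    return " ".join(parts)


args = unicycler_args(
    {"paired": ["reads/s.R1.fq.gz", "reads/s.R2.fq.gz"]},
    "result/s/assembly.fasta",
)
print(args)
```

Since only the directory of the output path is handed to Unicycler, the file name in the output directive should match what Unicycler writes into that directory, as in the example rule above.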

VARDICT

Run Vardict to call genomic variants

URL:

Example

This wrapper can be used in the following way:

rule vardict_single_mode:
    input:
        reference="data/genome.fasta",
        regions="regions.bed",
        bam="mapped/{sample}.bam",
    output:
        vcf="vcf/{sample}.s.vcf",
    params:
        extra="",
        bed_columns="-c 1 -S 2 -E 3 -g 4",  # Optional, default is -c 1 -S 2 -E 3 -g 4
        allele_frequency_threshold="0.01",  # Optional, default is 0.01
    threads: 1
    log:
        "logs/varscan_{sample}_s_.log",
    wrapper:
        "v0.87.0/bio/vardict"


rule vardict_paired_mode:
    input:
        reference="data/genome.fasta",
        regions="regions.bed",
        bam="mapped/{sample}.bam",
        normal="mapped/b.bam",
    output:
        vcf="vcf/{sample}.tn.vcf",
    params:
        extra="",
    threads: 1
    log:
        "logs/varscan_{sample}_tn.log",
    wrapper:
        "v0.87.0/bio/vardict"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vardict-java==1.8.2
Input/Output

Input:

  • reference file
  • bam file
  • normal file, optional (must be set for tumor/normal mode)
  • region file

Output:

  • A VCF file
Params
  • extra, optional
  • bed_columns, optional, default is -c 1 -S 2 -E 3 -g 4
  • allele_frequency_threshold, optional, default value is 0.01
  • sample_name, optional, default is derived from the BAM file name(s)
Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for VarDict (single sample and paired modes)"""

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

reference = snakemake.input.reference
regions = snakemake.input.regions
bam = snakemake.input.bam
normal = snakemake.input.get("normal", None)
vcf = snakemake.output.vcf

extra = snakemake.params.get("extra", "")
bed_columns = snakemake.params.get("bed_columns", "-c 1 -S 2 -E 3 -g 4")
af_th = snakemake.params.get("allele_frequency_threshold", "0.01")


if normal is None:
    input_bams = bam
    name = snakemake.params.get("sample_name", Path(bam).stem)
    post_scripts = (
        "teststrandbias.R | var2vcf_valid.pl -A -N '" + name + "' -E -f " + af_th
    )
else:
    input_bams = "'" + bam + "|" + normal + "'"
    name = snakemake.params.get("sample_name", Path(bam).stem + "|" + Path(normal).stem)
    post_scripts = 'testsomatic.R | var2vcf_paired.pl -N "' + name + '" -f ' + af_th


shell(
    "vardict-java -G {reference} "
    "-f {af_th} "
    " {extra} "
    "-th {snakemake.threads} "
    "{bed_columns} "
    "-N '{name}' "
    "-b {input_bams} "
    "{regions} |"
    "{post_scripts} "
    "> {vcf} "
    "{log}"
)
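
The switch between single-sample and tumor/normal mode depends only on whether a normal BAM is given. The sketch below restates that branch as a function; vardict_mode is a hypothetical name, and the returned strings follow the wrapper code above rather than any additional VarDict documentation.

```python
from pathlib import Path


def vardict_mode(bam, normal=None, af_th="0.01", sample_name=None):
    """Return (-b argument, sample name, post-processing pipeline); illustrative only."""
    if normal is None:
        # Single-sample mode
        name = sample_name if sample_name is not None else Path(bam).stem
        bams = bam
        post = f"teststrandbias.R | var2vcf_valid.pl -A -N '{name}' -E -f {af_th}"
    else:
        # Tumor/normal mode: both BAMs form one quoted "tumor|normal" argument
        default = f"{Path(bam).stem}|{Path(normal).stem}"
        name = sample_name if sample_name is not None else default
        bams = f"'{bam}|{normal}'"
        post = f'testsomatic.R | var2vcf_paired.pl -N "{name}" -f {af_th}'
    return bams, name, post


single = vardict_mode("mapped/a.bam")
paired = vardict_mode("mapped/a.bam", normal="mapped/b.bam")
print(single)
print(paired)
```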

VARSCAN

For varscan, the following wrappers are available:

VARSCAN MPILEUP2INDEL

Detect indel in NGS data from mpileup files with VarScan

URL:

Example

This wrapper can be used in the following way:

rule mpileup_to_vcf:
    input:
        "mpileup/{sample}.mpileup.gz"
    output:
        "vcf/{sample}.vcf"
    message:
        "Calling Indel with Varscan2"
    threads:  # Varscan does not take any threading information
        1     # However, mpileup might have to be unzipped.
              # Keep threading value to one for unzipped mpileup input
              # Set it to two for zipped mpileup files
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    log:
        "logs/varscan_{sample}.log"
    wrapper:
        "v0.87.0/bio/varscan/mpileup2indel"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • varscan==2.4.3
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • A mpileup file

Output:

  • A VCF file
Notes

Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.

If so, it’s recommended to use two threads:

  • 1 for varscan itself
  • 1 for zcat
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Varscan2 mpileup2indel"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts

# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
    " cat {} ".format(snakemake.input[0])
    if not snakemake.input[0].endswith("gz")
    else " zcat {} ".format(snakemake.input[0])
)

# Building output directories
makedirs(op.dirname(snakemake.output[0]))

shell(
    "varscan mpileup2indel "  # Tool and its subcommand
    "<( {pileup} ) "
    "{java_opts} {extra} "  # Extra parameters
    "> {snakemake.output[0]} "  # Path to vcf file
    "{log}"  # Logging behaviour
)
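
The threading advice in the notes follows directly from how the input is read. A minimal sketch, assuming the decision depends only on the file suffix as in the code above (pileup_reader is a hypothetical helper):

```python
def pileup_reader(path):
    """Return (reader command, suggested thread count); illustrative only."""
    if path.endswith("gz"):
        # Decompression runs as a second process next to VarScan
        return f"zcat {path}", 2
    return f"cat {path}", 1


gz_reader = pileup_reader("mpileup/a.mpileup.gz")
plain_reader = pileup_reader("mpileup/a.mpileup")
print(gz_reader)
print(plain_reader)
```

In the wrapper, the reader command is placed inside a process substitution, <( ... ), so VarScan reads the (possibly decompressed) stream as if it were a file.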
VARSCAN MPILEUP2SNP

Detect variants in NGS data from Samtools mpileup with VarScan

URL:

Example

This wrapper can be used in the following way:

rule mpileup_to_vcf:
    input:
        "mpileup/{sample}.mpileup.gz"
    output:
        "vcf/{sample}.vcf"
    message:
        "Calling SNP with Varscan2"
    threads:  # Varscan does not take any threading information
        1     # However, mpileup might have to be unzipped.
              # Keep threading value to one for unzipped mpileup input
              # Set it to two for zipped mpileup files
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    log:
        "logs/varscan_{sample}.log"
    wrapper:
        "v0.87.0/bio/varscan/mpileup2snp"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • varscan==2.4.3
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • A mpileup file

Output:

  • A VCF file
Notes

Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.

If so, it’s recommended to use two threads:

  • 1 for varscan itself
  • 1 for zcat
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Varscan2 mpileup2snp"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts

# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
    " cat {} ".format(snakemake.input[0])
    if not snakemake.input[0].endswith("gz")
    else " zcat {} ".format(snakemake.input[0])
)

# Building output directories
makedirs(op.dirname(snakemake.output[0]))

shell(
    "varscan mpileup2snp "  # Tool and its subcommand
    "<( {pileup} ) "
    "{java_opts} {extra} "  # Extra parameters
    "> {snakemake.output[0]} "  # Path to vcf file
    "{log}"  # Logging behaviour
)
VARSCAN SOMATIC

Varscan Somatic calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.

URL:

Example

This wrapper can be used in the following way:

rule varscan_somatic:
    input:
        # A pair of pileup files can be used *instead* of the mpileup
        # normal_pileup = ""
        # tumor_pileup = ""
        mpileup = "mpileup/{sample}.mpileup.gz"
    output:
        snp = "vcf/{sample}.snp.vcf",
        indel = "vcf/{sample}.indel.vcf"
    message:
        "Calling somatic variants {wildcards.sample}"
    threads:
        1
    # optional specification of memory usage of the JVM that snakemake will respect with global
    # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
    # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
    # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
    resources:
        mem_mb=1024
    params:
        extra = ""
    wrapper:
        "v0.87.0/bio/varscan/somatic"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • varscan==2.4.3
  • snakemake-wrapper-utils==0.1.3
Input/Output

Input:

  • A pair of pileup files (Normal/Tumor), or a single mpileup file containing both

Output:

  • A VCF file
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for varscan somatic"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"


import os.path as op

from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts

# Defining logging and gathering extra parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

# Building output dirs
makedirs(op.dirname(snakemake.output.snp))
makedirs(op.dirname(snakemake.output.indel))

# Output prefix
prefix = op.splitext(snakemake.output.snp)[0]

# Searching for input files
pileup_pair = ["normal_pileup", "tumor_pileup"]

in_pileup = ""
mpileup = ""
if "mpileup" in snakemake.input.keys():
    # Case there is a mpileup with both normal and tumor
    in_pileup = snakemake.input.mpileup
    mpileup = "--mpileup 1"
elif all(pileup in snakemake.input.keys() for pileup in pileup_pair):
    # Case there are two separate pileup files
    in_pileup = " {} {} ".format(
        snakemake.input.normal_pileup, snakemake.input.tumor_pileup
    )
else:
    raise KeyError("Could not find either a mpileup, or a pair of pileup files")

shell(
    "varscan somatic"  # Tool and its subcommand
    " {in_pileup}"  # Path to input file(s)
    " {prefix}"  # Path to output
    " {java_opts} {extra}"  # Extra parameters
    " {mpileup}"
    " --output-snp {snakemake.output.snp}"  # Path to snp output file
    " --output-indel {snakemake.output.indel}"  # Path to indel output file
    " {log}"  # Logging behaviour
)
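
The two accepted input layouts can be illustrated with a small stand-alone function. This is a sketch of the decision only, with a plain dict standing in for the Snakemake input object; resolve_pileup_inputs is a hypothetical name.

```python
def resolve_pileup_inputs(inputs):
    """Return (input arguments, mpileup flag) for `varscan somatic`; illustrative only."""
    if "mpileup" in inputs:
        # One mpileup holding both normal and tumor samples
        return inputs["mpileup"], "--mpileup 1"
    if all(key in inputs for key in ("normal_pileup", "tumor_pileup")):
        # Two separate pileups: normal first, then tumor
        return f"{inputs['normal_pileup']} {inputs['tumor_pileup']}", ""
    raise KeyError("Could not find either a mpileup, or a pair of pileup files")


combined = resolve_pileup_inputs({"mpileup": "mpileup/a.mpileup.gz"})
separate = resolve_pileup_inputs(
    {"normal_pileup": "pileup/normal.pileup", "tumor_pileup": "pileup/tumor.pileup"}
)
print(combined)
print(separate)
```

Supplying only one of the two separate pileups falls through to the KeyError, which is why the example rule names both keys when the mpileup input is not used.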

VCFTOOLS

For vcftools, the following wrappers are available:

VCFTOOLS FILTER

Filter vcf files using vcftools

URL:

Example

This wrapper can be used in the following way:

rule filter_vcf:
    input:
        "{sample}.vcf"
    output:
        "{sample}.filtered.vcf"
    params:
        extra="--chr 1 --recode-INFO-all"
    wrapper:
        "v0.87.0/bio/vcftools/filter"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vcftools==0.1.16
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
    input_flag = "--gzvcf"

output = " > " + snakemake.output[0]
if output.endswith(".gz"):
    output = " | gzip" + output

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "vcftools "
    "{input_flag} "
    "{snakemake.input} "
    "{extra} "
    "--recode "
    "--stdout "
    "{output} "
    "{log}"
)
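
Both the input flag and the output redirection are derived from file extensions. The following sketch restates that logic outside of Snakemake; vcftools_io is a hypothetical helper name.

```python
def vcftools_io(input_path, output_path):
    """Return (input flag, output redirection) as the wrapper derives them; illustrative only."""
    input_flag = "--gzvcf" if input_path.endswith(".gz") else "--vcf"
    redirect = f" > {output_path}"
    if output_path.endswith(".gz"):
        # vcftools writes plain VCF to stdout, so it is compressed on the way out
        redirect = " | gzip" + redirect
    return input_flag, redirect


plain = vcftools_io("a.vcf", "a.filtered.vcf")
gzipped = vcftools_io("a.vcf.gz", "a.filtered.vcf.gz")
print(plain)
print(gzipped)
```

Note that compressed output is produced with plain gzip here, so a file intended for tabix indexing would probably need to be recompressed with bgzip first.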

VEMBRANE

For vembrane, the following wrappers are available:

VEMBRANE FILTER

Vembrane filter allows simultaneous filtering of variants based on any INFO field, CHROM, POS, REF, ALT, QUAL, and the annotation field ANN. When filtering based on ANN, annotation entries are filtered first. If no annotation entry remains, the entire variant is deleted. https://github.com/vembrane/vembrane

URL:

Example

This wrapper can be used in the following way:

rule vembrane_filter:
    input:
        vcf="in.vcf",
    output:
        vcf="filtered/out.vcf"
    params:
        expression="POS > 4000",
        extra=""
    log:
        "logs/vembrane.log"
    wrapper:
        "v0.87.0/bio/vembrane/filter"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vembrane=0.5.1
Input/Output

Input:

  • A VCF-formatted file

Output:

  • A VCF-formatted file
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for vembrane"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "vembrane filter"  # Tool and its subcommand
    " {extra}"  # Extra parameters
    " {snakemake.params.expression:q}"
    " {snakemake.input}"  # Path to input file
    " > {snakemake.output}"  # Path to output file
    " {log}"  # Logging behaviour
)
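
The :q suffix on the expression matters because filter expressions usually contain characters that the shell would interpret. The sketch below assumes that :q applies POSIX shell quoting comparable to Python's shlex.quote; that equivalence is an assumption made for illustration, not a statement about Snakemake internals.

```python
import shlex

expression = "POS > 4000"

# Unquoted, the shell would read `> 4000` as a redirection to a file named 4000
unquoted = f"vembrane filter {expression} in.vcf"

# Quoted, the whole expression reaches vembrane as a single argument
quoted = f"vembrane filter {shlex.quote(expression)} in.vcf"

print(unquoted)
print(quoted)
print(shlex.split(quoted))
```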
VEMBRANE TABLE

Vembrane table generates table-like text files from VCFs based on any INFO field, CHROM, POS, REF, ALT, QUAL, and the annotation field ANN. When filtering based on ANN, annotation entries are filtered first. https://github.com/vembrane/vembrane

URL:

Example

This wrapper can be used in the following way:

rule vembrane_table:
    input:
        vcf="in.vcf",
    output:
        vcf="table/out.tsv"
    params:
        expression="CHROM, POS, ALT, REF",
        extra=""
    log:
        "logs/vembrane.log"
    wrapper:
        "v0.87.0/bio/vembrane/table"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vembrane=0.5.1
Input/Output

Input:

  • A VCF-formatted file

Output:

  • A table-like textfile
Authors
  • Christopher Schröder
Code
"""Snakemake wrapper for vembrane"""

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "vembrane table"  # Tool and its subcommand
    " {extra}"  # Extra parameters
    " {snakemake.params.expression:q}"
    " {snakemake.input}"  # Path to input file
    " > {snakemake.output}"  # Path to output file
    " {log}"  # Logging behaviour
)

VEP

For vep, the following wrappers are available:

VEP ANNOTATE

Annotate variant calls with VEP.

URL:

Example

This wrapper can be used in the following way:

rule annotate_variants:
    input:
        calls="variants.bcf",  # .vcf, .vcf.gz or .bcf
        cache="resources/vep/cache",  # can be omitted if fasta and gff are specified
        plugins="resources/vep/plugins",
        # optionally add reference genome fasta
        # fasta="genome.fasta",
        # fai="genome.fasta.fai", # fasta index
        # gff="annotation.gff",
        # csi="annotation.gff.csi", # tabix index
        # add mandatory aux-files required by some plugins if not present in the VEP plugin directory specified above.
        # aux files must be defined as following: "<plugin> = /path/to/file" where plugin must be in lowercase
        # revel = path/to/revel_scores.tsv.gz
    output:
        calls="variants.annotated.bcf",  # .vcf, .vcf.gz or .bcf
        stats="variants.html",
    params:
        # Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
        # Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
        plugins=["LoFtool"],
        extra="--everything",  # optional: extra arguments
    log:
        "logs/vep/annotate.log",
    threads: 4
    wrapper:
        "v0.87.0/bio/vep/annotate"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ensembl-vep=105
  • bcftools=1.12
Authors
  • Johannes Köster
  • Felix Mölder
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
from pathlib import Path
from snakemake.shell import shell


def get_only_child_dir(path):
    children = [child for child in path.iterdir() if child.is_dir()]
    assert (
        len(children) == 1
    ), "Invalid VEP cache directory, only a single entry is allowed, make sure that cache was created with the snakemake VEP cache wrapper"
    return children[0]


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

fork = "--fork {}".format(snakemake.threads) if snakemake.threads > 1 else ""
stats = snakemake.output.stats
cache = snakemake.input.get("cache", "")
plugins = snakemake.input.plugins
plugin_aux_files = {"LoFtool": "LoFtool_scores.txt", "ExACpLI": "ExACpLI_values.txt"}

load_plugins = []
for plugin in snakemake.params.plugins:
    if plugin in plugin_aux_files.keys():
        aux_path = os.path.join(plugins, plugin_aux_files[plugin])
        load_plugins.append(",".join([plugin, aux_path]))
    else:
        load_plugins.append(",".join([plugin, snakemake.input.get(plugin.lower(), "")]))
load_plugins = " ".join(map("--plugin {}".format, load_plugins))

if snakemake.output.calls.endswith(".vcf.gz"):
    fmt = "z"
elif snakemake.output.calls.endswith(".bcf"):
    fmt = "b"
else:
    fmt = "v"

fasta = snakemake.input.get("fasta", "")
if fasta:
    fasta = "--fasta {}".format(fasta)

gff = snakemake.input.get("gff", "")
if gff:
    gff = "--gff {}".format(gff)

if cache:
    entrypath = get_only_child_dir(get_only_child_dir(Path(cache)))
    species = (
        entrypath.parent.name[:-7]
        if entrypath.parent.name.endswith("_refseq")
        else entrypath.parent.name
    )
    release, build = entrypath.name.split("_")
    cache = (
        "--offline --cache --dir_cache {cache} --cache_version {release} --species {species} --assembly {build}"
    ).format(cache=cache, release=release, build=build, species=species)

shell(
    "(bcftools view '{snakemake.input.calls}' | "
    "vep {extra} {fork} "
    "--format vcf "
    "--vcf "
    "{cache} "
    "{gff} "
    "{fasta} "
    "--dir_plugins {plugins} "
    "{load_plugins} "
    "--output_file STDOUT "
    "--stats_file {stats} | "
    "bcftools view -O{fmt} > {snakemake.output.calls}) {log}"
)
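
The wrapper infers species, release and build from the directory layout of the cache, which it expects to be cache/species/release_build with exactly one entry per level. A minimal sketch of that parsing, run against a temporary directory that mimics the layout; the helper names are hypothetical and the directory names are taken from the parameters of the cache download example.

```python
import tempfile
from pathlib import Path


def only_child_dir(path):
    """Return the single subdirectory of path, or fail if there is not exactly one."""
    children = [child for child in path.iterdir() if child.is_dir()]
    if len(children) != 1:
        raise ValueError(f"expected exactly one subdirectory in {path}")
    return children[0]


def parse_vep_cache(cache_dir):
    """Derive (species, release, build) from <cache>/<species>/<release>_<build>."""
    entry = only_child_dir(only_child_dir(Path(cache_dir)))
    species = entry.parent.name
    if species.endswith("_refseq"):
        # RefSeq caches carry a suffix that is not part of the species name
        species = species[: -len("_refseq")]
    release, build = entry.name.split("_")
    return species, release, build


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical cache layout created only for this illustration
    (Path(tmp) / "saccharomyces_cerevisiae" / "98_R64-1-1").mkdir(parents=True)
    result = parse_vep_cache(tmp)

print(result)
```

This is why a cache directory that holds several species or releases is rejected: the layout would no longer identify a single set of parameters.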
VEP DOWNLOAD CACHE

Download VEP cache for given species, build and release.

URL:

Example

This wrapper can be used in the following way:

rule get_vep_cache:
    output:
        directory("resources/vep/cache")
    params:
        species="saccharomyces_cerevisiae",
        build="R64-1-1",
        release="98"
    log:
        "logs/vep/cache.log"
    cache: True  # save space and time with between-workflow caching (see docs)
    wrapper:
        "v0.87.0/bio/vep/cache"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • ensembl-vep=105
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "vep_install --AUTO cf "
    "--SPECIES {snakemake.params.species} "
    "--ASSEMBLY {snakemake.params.build} "
    "--VERSION {snakemake.params.release} "
    "--CACHEDIR {snakemake.output} "
    "--CONVERT "
    "--NO_UPDATE "
    "{extra} {log}"
)
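To see which command line the wrapper code above produces for the example rule, the template can be reproduced in plain Python. This is only an illustrative sketch: `build_vep_install_command` is a hypothetical helper, and the redirect string passed as `log` is an assumption about what `snakemake.log_fmt_shell(stdout=True, stderr=True)` yields for the example log path.

```python
def build_vep_install_command(species, build, release, cachedir, extra="", log=""):
    """Assemble the vep_install call following the wrapper's shell() template."""
    return (
        "vep_install --AUTO cf "
        f"--SPECIES {species} "
        f"--ASSEMBLY {build} "
        f"--VERSION {release} "
        f"--CACHEDIR {cachedir} "
        "--CONVERT "
        "--NO_UPDATE "
        f"{extra} {log}"
    )


command = build_vep_install_command(
    species="saccharomyces_cerevisiae",
    build="R64-1-1",
    release="98",
    cachedir="resources/vep/cache",
    log="> logs/vep/cache.log 2>&1",  # assumed redirect, see lead-in
)
print(command)
```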
VEP DOWNLOAD PLUGINS

Download VEP plugins.

URL:

Example

This wrapper can be used in the following way:

rule download_vep_plugins:
    output:
        directory("resources/vep/plugins")
    params:
        release=100
    wrapper:
        "v0.87.0/bio/vep/plugins"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • python=3
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import sys
from pathlib import Path
from urllib.request import urlretrieve
from zipfile import ZipFile
from tempfile import NamedTemporaryFile

if snakemake.log:
    sys.stderr = open(snakemake.log[0], "w")

outdir = Path(snakemake.output[0])
outdir.mkdir()

with NamedTemporaryFile() as tmp:
    urlretrieve(
        "https://github.com/Ensembl/VEP_plugins/archive/release/{release}.zip".format(
            release=snakemake.params.release
        ),
        tmp.name,
    )

    with ZipFile(tmp.name) as f:
        for member in f.infolist():
            memberpath = Path(member.filename)
            if len(memberpath.parts) == 1:
                # skip root dir
                continue
            targetpath = outdir / memberpath.relative_to(memberpath.parts[0])
            if member.is_dir():
                targetpath.mkdir()
            else:
                with open(targetpath, "wb") as out:
                    out.write(f.read(member.filename))
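The extraction loop above drops the archive's single top-level folder so that the plugin files end up directly in the output directory. The sketch below reproduces that idea on a small in-memory archive; the archive member names are made up, and the `parents=True, exist_ok=True` arguments are a more defensive variation, not what the wrapper itself does.

```python
import io
import tempfile
from pathlib import Path
from zipfile import ZipFile

# Build a small in-memory archive in which everything sits below one
# root folder (all names here are illustrative).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("VEP_plugins-release-100/", "")
    archive.writestr("VEP_plugins-release-100/LoFtool.pm", "package LoFtool;")
    archive.writestr("VEP_plugins-release-100/config/", "")
    archive.writestr("VEP_plugins-release-100/config/README", "docs")


def extract_without_root(archive: ZipFile, outdir: Path) -> None:
    """Extract all members, stripping the first path component."""
    for member in archive.infolist():
        memberpath = Path(member.filename)
        if len(memberpath.parts) == 1:
            continue  # the root folder itself
        target = outdir / memberpath.relative_to(memberpath.parts[0])
        if member.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(member.filename))


with tempfile.TemporaryDirectory() as tmp:
    outdir = Path(tmp) / "plugins"
    outdir.mkdir()
    buffer.seek(0)
    with ZipFile(buffer) as archive:
        extract_without_root(archive, outdir)
    extracted = sorted(p.relative_to(outdir).as_posix() for p in outdir.rglob("*"))
    print(extracted)
```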

VERIFYBAMID

For verifybamid, the following wrappers are available:

VERIFYBAMID2

Run verifybamid2.

URL:

Example

This wrapper can be used in the following way:

rule verify_bam_id:
    input:
        bam="a.bam",
        ref="genome.fasta",
        # optional - this can be used to specify custom resource files if
        # necessary (if using GRCh37 or GRCh38 instead simply specify
        # params.genome_build="38", for example)
        # N.B. if svd_mu={prefix}.mu, then {prefix}.bed, {prefix}.UD, and
        # {prefix}.V must also exist
        svd_mu="ref.vcf.mu",
    output:
        selfsm="a.selfSM",
        ancestry="a.ancestry",
    params:
        # optional - see note for input.svd_mu
        # current choices are {37,38}
        # genome_build="38",
    log:
        "logs/verifybamid2/a.log",
    wrapper:
        "v0.87.0/bio/verifybamid/verifybamid2"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • verifybamid2==2.0.1
Input/Output

Input:

  • bam file

Output:

  • estimated intraspecies contamination
Notes
Authors
  • Brett Copeland
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os
from tempfile import TemporaryDirectory
from shutil import copyfile

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
svd_mu = snakemake.input.get("svd_mu", "")
if svd_mu:
    svd_prefix = os.path.splitext(svd_mu)[0]
    for suffix in ("bed", "UD", "V"):
        fn = f"{svd_prefix}.{suffix}"
        if not os.path.isfile(fn):
            raise Exception(f"Failed to find required input {fn}.")
else:
    genome_build = snakemake.params.get("genome_build", "38")
    if genome_build not in ("37", "38"):
        raise Exception(
            f"No svd_prefix given and improper {genome_build=} "
            f"given.  Valid choices are 37,38."
        )
    verifybamid2_found = False
    for path in os.getenv("PATH").split(os.path.pathsep):
        path_to_verifybamid2 = os.path.join(path, "verifybamid2")
        if os.path.isfile(path_to_verifybamid2):
            verifybamid2_found = True
            resources_directory = os.path.join(
                os.path.dirname(os.path.realpath(path_to_verifybamid2)), "resource"
            )
            svd_prefix = os.path.join(
                resources_directory, f"1000g.phase3.100k.b{genome_build}.vcf.gz.dat"
            )
            break
    if not verifybamid2_found:
        raise Exception("Failed to find verifybamid2 location.")


def move_file(src, dst):
    """Move `src` to `dst` while respecting ACLs in the target directory."""
    copyfile(src, dst)
    os.remove(src)


# verifybamid2 outputs results to result.selfSM and result.Ancestry in the working directory,
# so to avoid collisions we have to run it from a temporary directory and fix the paths
# to inputs, outputs, and the log file
ref_path = os.path.abspath(snakemake.input.ref)
svd_prefix = os.path.abspath(svd_prefix)
bam_path = os.path.abspath(snakemake.input.bam)
selfsm_path = os.path.abspath(snakemake.output.selfsm)
ancestry_path = os.path.abspath(snakemake.output.ancestry)
if snakemake.log:
    snakemake.log[0] = os.path.abspath(snakemake.log[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)
    shell(
        "verifybamid2 --SVDPrefix {svd_prefix} "
        "--Reference {ref_path} --BamFile {bam_path} {extra} "
        "--NumThread {snakemake.threads} {log}"
    )
    move_file("result.selfSM", selfsm_path)
    move_file("result.Ancestry", ancestry_path)
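The check on the custom SVD resource files in the code above can be tried out in isolation. The sketch below derives the prefix from `svd_mu` in the same way and reports which companion files are missing; `missing_svd_companions` is a hypothetical helper name, and the files are empty placeholders created only for the demonstration.

```python
import os
import tempfile


def missing_svd_companions(svd_mu, suffixes=("bed", "UD", "V")):
    """Return the companion files expected next to `svd_mu` that do not exist."""
    prefix = os.path.splitext(svd_mu)[0]
    expected = [f"{prefix}.{suffix}" for suffix in suffixes]
    return [fn for fn in expected if not os.path.isfile(fn)]


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "ref.vcf")
    for suffix in ("mu", "bed", "UD"):  # deliberately leave out ".V"
        open(f"{prefix}.{suffix}", "w").close()
    missing = [os.path.basename(fn) for fn in missing_svd_companions(f"{prefix}.mu")]
    print(missing)
```

Collecting all missing files before failing, instead of raising on the first one as the wrapper does, is a design variation that can make error messages more helpful.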

VG

For vg, the following wrappers are available:

VG CONSTRUCT

Construct variation graphs from a reference and variant calls.

URL:

Example

This wrapper can be used in the following way:

rule construct:
    input:
        ref="c.fa",
        vcfgz="c.vcf.gz"
    output:
        vg="graph/c.vg"
    params:
        "--node-max 10"
    log:
        "logs/vg/construct/c.log"
    threads:
        4
    wrapper:
        "v0.87.0/bio/vg/construct"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False)

shell(
    "(vg construct {snakemake.params} --reference {snakemake.input.ref}"
    " --vcf {snakemake.input.vcfgz} --threads {snakemake.threads}"
    " > {snakemake.output.vg}) {log}"
)
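Wrappers for tools that write their result to stdout, like the one above, call `snakemake.log_fmt_shell(stdout=False)`: stdout is redirected to the output file inside the parentheses and only stderr goes to the log. The following sketch approximates the redirect suffix that call is expected to produce. `log_redirect` is a simplified, hypothetical stand-in and not Snakemake's actual implementation; consult the Snakemake documentation for the exact behaviour.

```python
def log_redirect(log_path, stdout=True, stderr=True, append=False):
    """Simplified stand-in for log_fmt_shell: build a shell redirect suffix."""
    if not log_path or not (stdout or stderr):
        return ""
    op = ">>" if append else ">"
    if stdout and stderr:
        return f" {op} {log_path} 2>&1"
    if stdout:
        return f" {op} {log_path}"
    return f" 2{op} {log_path}"


both = log_redirect("logs/vg/construct/c.log")
stderr_only = log_redirect("logs/vg/construct/c.log", stdout=False)
print(repr(both))
print(repr(stderr_only))
```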
VG IDS

Manipulate the id space of input graphs. Note: use bio/vg/merge to create a joint id space across graphs.

URL:

Example

This wrapper can be used in the following way:

rule ids:
    input:
        vgs="c.vg"
    output:
        mod="graph/c_mod.vg"
    log:
        "logs/vg/ids/c.log"
    wrapper:
        "v0.87.0/bio/vg/ids"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False)

shell(
    "(vg ids {snakemake.params} {snakemake.input.vgs}"
    " > {snakemake.output.mod}) {log}"
)
VG INDEX GCSA

Build GCSA index for variation graphs.

URL:

Example

This wrapper can be used in the following way:

rule gcsa:
    input:
        vgs=["x.vg", "c.vg"]
    output:
        gcsa="index/wg.gcsa"
    params:
        "-Z 3000 -X 3"
    log:
        "logs/vg/index/gcsa/wg.log"
    threads:
        4
    wrapper:
        "v0.87.0/bio/vg/index/gcsa"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell()

shell(
    "(vg index -g {snakemake.output.gcsa} --threads {snakemake.threads}"
    " {snakemake.params} {snakemake.input.vgs}) {log}"
)
VG INDEX XG

Create an xg index on variation graphs.

URL:

Example

This wrapper can be used in the following way:

rule xg:
    input:
        vgs="x.vg"
    output:
        xg="index/x.xg"
    log:
        "logs/vg/index/xg/x.log"
    threads:
        4
    wrapper:
        "v0.87.0/bio/vg/index/xg"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell()

shell(
    "(vg index --xg-name {snakemake.output.xg} --threads {snakemake.threads}"
    " {snakemake.params} {snakemake.input.vgs}) {log}"
)
VG KMERS

Generates kmers from both strands of variation graphs.

URL:

Example

This wrapper can be used in the following way:

rule kmers:
    input:
        vgs="c.vg"
    output:
        kmers="kmers/c.kmers"
    params:
        "-gBk 16 -H 1000000000 -T 1000000001"
    log:
        "logs/vg/kmers/c.log"
    threads:
        4
    wrapper:
        "v0.87.0/bio/vg/kmers"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False)

shell(
    "(vg kmers {snakemake.params} --threads {snakemake.threads}"
    " {snakemake.input.vgs} > {snakemake.output.kmers}) {log}"
)
VG MERGE

Generate a joint id space across each graph and merge them all.

URL:

Example

This wrapper can be used in the following way:

rule merge:
    input:
        vgs=["c.vg", "x.vg"]
    output:
        merged="graph/wg.vg"
    log:
        "logs/vg/merge/wg.log"
    wrapper:
        "v0.87.0/bio/vg/merge"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False)

shell(
    "(vg ids --join {snakemake.input.vgs} &&"
    " for VGFILE in {snakemake.input.vgs};"
    " do cat $VGFILE >> {snakemake.output.merged};"
    " done) {log}"
)
VG PRUNE

Prunes the complex regions of the graph for GCSA2 indexing.

URL:

Example

This wrapper can be used in the following way:

rule prune:
    input:
        vg="c.vg"
    output:
        pruned="graph/c.pruned.vg"
    params:
        "-r"
    log:
        "logs/vg/prune/c.log"
    threads:
        4
    wrapper:
        "v0.87.0/bio/vg/prune"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False)

shell(
    "(vg prune --threads {snakemake.threads} {snakemake.params}"
    " {snakemake.input.vg} > {snakemake.output.pruned}) {log}"
)
VG SIM

Samples sequences from the xg-indexed graph.

URL:

Example

This wrapper can be used in the following way:

rule sim:
    input:
        xg="x.xg"
    output:
        reads="reads/x.seq"
    params:
        "--read-length 100 --num-reads 100 -f"
    log:
        "logs/vg/sim/x.log"
    threads:
        4
    wrapper:
        "v0.87.0/bio/vg/sim"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • vg==1.27.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2018, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False)

shell(
    "(vg sim {snakemake.params} --xg-name {snakemake.input.xg}"
    " --threads {snakemake.threads} > {snakemake.output.reads}) {log}"
)

WGSIM

Short read simulator.

URL:

Example

This wrapper can be used in the following way:

rule wgsim:
    input:
        ref="genome.fa"
    output:
        read1="reads/1.fq",
        read2="reads/2.fq"
    log:
        "logs/wgsim/sim.log"
    params:
        "-X 0 -R 0 -r 0.1 -h"
    wrapper:
        "v0.87.0/bio/wgsim"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
  • wgsim==1.0.0
Authors
  • Ali Ghaffaari
Code
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2018, Ali Ghaffaari"
__email__ = "ali.ghaffaari@mpi-inf.mpg.de"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell()

shell(
    "(wgsim {snakemake.params} {snakemake.input.ref}"
    " {snakemake.output.read1} {snakemake.output.read2}) {log}"
)

Meta-Wrappers

Meta-wrappers offer curated and tested combinations of wrappers that fulfil common tasks with popular tools in a best-practice way. To use them, copy the offered snippets into your Snakemake workflow.

BWA_MAPPING

Map reads with bwa-mem and index with samtools index - this is just a test for subworkflows

Example

This meta-wrapper can be used by integrating the following into your workflow:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="samtools",             # Can be 'none', 'samtools' or 'picard'.
        sort_order="coordinate",  # Can be 'queryname' or 'coordinate'.
        sort_extra=""            # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "v0.87.0/bio/bwa/mem"

rule samtools_index:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.bam.bai"
    log:
        "logs/samtools_index/{sample}.log"
    params:
        "" # optional params string
    wrapper:
        "v0.87.0/bio/samtools/index"

Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
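Two details of the `bwa_mem` rule above can be illustrated in plain Python: `multiext` lists one index file per extension, and the read group is written as a raw string so that `\t` reaches the command line as a literal backslash followed by `t`. `multiext_sketch` is a hypothetical stand-in for Snakemake's `multiext`, written here only to show the resulting paths.

```python
def multiext_sketch(prefix, *extensions):
    """Hypothetical stand-in for multiext(): one path per extension."""
    return [prefix + ext for ext in extensions]


idx = multiext_sketch("genome", ".amb", ".ann", ".bwt", ".pac", ".sa")
print(idx)

# Raw string: the backslash is kept, no tab character is inserted by Python.
extra = r"-R '@RG\tID:{sample}\tSM:{sample}'"
print(extra)
```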

Used wrappers

The following individual wrappers are used in this meta-wrapper:

  • bio/bwa/mem
  • bio/samtools/index

Please refer to each wrapper in the list above for additional configuration parameters and information about the executed code.

Authors
  • Jan Forster

DADA2-PE

A subworkflow for processing paired-end sequences from metabarcoding projects in order to construct ASV tables using DADA2. The example is based on the data provided by the R package. For more details, see the official website and the tutorial.

Example

This meta-wrapper can be used by integrating the following into your workflow:

# Make sure that you set the `truncLen=` option in the rule `dada2_filter_trim_pe` according
# to the results of the quality profile checks (after rule `dada2_quality_profile_pe` has finished on all samples).
# If in doubt, check https://benjjneb.github.io/dada2/tutorial.html#inspect-read-quality-profiles

rule all:
    input:
        # In a first run of this meta-wrapper, comment out all other inputs and only keep this one.
        # Looking at the resulting plot, adjust the `truncLen` in rule `dada2_filter_trim_pe` and then
        # rerun with all inputs uncommented.
        expand(
            "reports/dada2/quality-profile/{sample}-quality-profile.png",
            sample=["a","b"]
        ),
        "results/dada2/taxa.RDS"

rule dada2_quality_profile_pe:
    input:
        # FASTQ file without primer sequences
        expand("trimmed/{{sample}}.{orientation}.fastq.gz",orientation=[1,2])
    output:
        "reports/dada2/quality-profile/{sample}-quality-profile.png"
    log:
        "logs/dada2/quality-profile/{sample}-quality-profile-pe.log"
    wrapper:
        "v0.87.0/bio/dada2/quality-profile"

rule dada2_filter_trim_pe:
    input:
        # Paired-end files without primer sequences
        fwd="trimmed/{sample}.1.fastq.gz",
        rev="trimmed/{sample}.2.fastq.gz"
    output:
        filt="filtered-pe/{sample}.1.fastq.gz",
        filt_rev="filtered-pe/{sample}.2.fastq.gz",
        stats="reports/dada2/filter-trim-pe/{sample}.tsv"
    params:
        # Set the maximum expected errors tolerated in filtered reads
        maxEE=1,
        # Set the number of kept bases in forward and reverse reads
        truncLen=[240,200]
    log:
        "logs/dada2/filter-trim-pe/{sample}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/filter-trim"

rule dada2_learn_errors:
    input:
        # Quality filtered and trimmed forward FASTQ files (potentially compressed)
        expand("filtered-pe/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
    output:
        err="results/dada2/model_{orientation}.RDS",  # save the error model
        plot="reports/dada2/errors_{orientation}.png",  # plot observed and estimated rates
    params:
        randomize=True
    log:
        "logs/dada2/learn-errors/learn-errors_{orientation}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/learn-errors"

rule dada2_dereplicate_fastq:
    input:
        # Quality filtered FASTQ file
        "filtered-pe/{fastq}.fastq.gz"
    output:
        # Dereplicated sequences stored as `derep-class` object in an RDS file
        "uniques/{fastq}.RDS"
    log:
        "logs/dada2/dereplicate-fastq/{fastq}.log"
    wrapper:
        "v0.87.0/bio/dada2/dereplicate-fastq"

rule dada2_sample_inference:
    input:
        # Dereplicated (aka unique) sequences of the sample
        derep="uniques/{sample}.{orientation}.RDS",
        err="results/dada2/model_{orientation}.RDS" # Error model
    output:
        "denoised/{sample}.{orientation}.RDS" # Inferred sample composition
    log:
        "logs/dada2/sample-inference/{sample}.{orientation}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/sample-inference"

rule dada2_merge_pairs:
    input:
        dadaF="denoised/{sample}.1.RDS",  # Inferred composition
        dadaR="denoised/{sample}.2.RDS",
        derepF="uniques/{sample}.1.RDS",  # Dereplicated sequences
        derepR="uniques/{sample}.2.RDS"
    output:
        "merged/{sample}.RDS"
    log:
        "logs/dada2/merge-pairs/{sample}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/merge-pairs"

rule dada2_make_table_pe:
    input:
        # Merged composition
        expand("merged/{sample}.RDS", sample=['a','b'])
    output:
        "results/dada2/seqTab-pe.RDS"
    params:
        names=['a','b'], # Sample names instead of paths
        orderBy="nsamples" # Change the ordering of samples
    log:
        "logs/dada2/make-table/make-table-pe.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/make-table"

rule dada2_remove_chimeras:
    input:
        "results/dada2/seqTab-pe.RDS" # Sequence table
    output:
        "results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
    log:
        "logs/dada2/remove-chimeras/remove-chimeras.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/remove-chimeras"

rule dada2_collapse_nomismatch:
    input:
        "results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
    output:
        "results/dada2/seqTab.collapsed.RDS"
    log:
        "logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/collapse-nomismatch"

rule dada2_assign_taxonomy:
    input:
        seqs="results/dada2/seqTab.collapsed.RDS", # Chimera-free sequence table
        refFasta="resources/example_train_set.fa.gz" # Reference FASTA for taxonomy
    output:
        "results/dada2/taxa.RDS" # Taxonomic assignments
    log:
        "logs/dada2/assign-taxonomy/assign-taxonomy.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/assign-taxonomy"

Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
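Several rules in the example above combine `expand` with doubled braces: single-brace names are filled in immediately, while `{{...}}` survives as a single-brace wildcard that Snakemake resolves per job. The sketch below mimics this with `str.format`; `expand_sketch` is a hypothetical stand-in for Snakemake's `expand`, not its real implementation.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill in every value combination."""
    keys = list(wildcards)
    value_lists = [
        value if isinstance(value, (list, tuple)) else [value]
        for value in wildcards.values()
    ]
    return [
        pattern.format(**dict(zip(keys, combo))) for combo in product(*value_lists)
    ]


# As in rule dada2_quality_profile_pe: {sample} stays a wildcard.
per_sample = expand_sketch(
    "trimmed/{{sample}}.{orientation}.fastq.gz", orientation=[1, 2]
)
# As in rule dada2_learn_errors: {orientation} stays a wildcard.
per_orientation = expand_sketch(
    "filtered-pe/{sample}.{{orientation}}.fastq.gz", sample=["a", "b"]
)
print(per_sample)
print(per_orientation)
```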

Used wrappers

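The following individual wrappers are used in this meta-wrapper:

  • bio/dada2/quality-profile
  • bio/dada2/filter-trim
  • bio/dada2/learn-errors
  • bio/dada2/dereplicate-fastq
  • bio/dada2/sample-inference
  • bio/dada2/merge-pairs
  • bio/dada2/make-table
  • bio/dada2/remove-chimeras
  • bio/dada2/collapse-nomismatch
  • bio/dada2/assign-taxonomy

Please refer to each wrapper in the list above for additional configuration parameters and information about the executed code.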

Authors
  • Charlie Pauvert

DADA2-SE

A subworkflow for processing single-end sequences from metabarcoding projects in order to construct ASV tables using DADA2. The example is based on the data provided in the R package. For more details, see the official website. While the tutorial is tailored to paired-end sequences, it also contains useful information on functions that are shared with single-end sequence processing.

Example

This meta-wrapper can be used by integrating the following into your workflow:

# Make sure that you set the `truncLen=` option in the rule `dada2_filter_trim_se` according
# to the results of the quality profile checks (after rule `dada2_quality_profile_se` has finished on all samples).
# If in doubt, check https://benjjneb.github.io/dada2/tutorial.html#inspect-read-quality-profiles

rule all:
    input:
        # In a first run of this meta-wrapper, comment out all other inputs and only keep this one.
        # Looking at the resulting plot, adjust the `truncLen` in rule `dada2_filter_trim_se` and then
        # rerun with all inputs uncommented.
        expand(
            "reports/dada2/quality-profile/{sample}.{orientation}-quality-profile.png",
            sample=["a","b"], orientation=1
        ),
        "results/dada2/taxa.RDS"

rule dada2_quality_profile_se:
    input:
        # FASTQ file without primer sequences
        "trimmed/{sample}.{orientation}.fastq.gz"
    output:
        "reports/dada2/quality-profile/{sample}.{orientation}-quality-profile.png"
    log:
        "logs/dada2/quality-profile/{sample}.{orientation}-quality-profile-se.log"
    wrapper:
        "v0.87.0/bio/dada2/quality-profile"

rule dada2_filter_trim_se:
    input:
        # Single-end files without primer sequences
        fwd="trimmed/{sample}.1.fastq.gz"
    output:
        filt="filtered-se/{sample}.1.fastq.gz",
        stats="reports/dada2/filter-trim-se/{sample}.tsv"
    params:
        # Set the maximum expected errors tolerated in filtered reads
        maxEE=1,
        # Set the number of kept bases
        truncLen=240
    log:
        "logs/dada2/filter-trim-se/{sample}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/filter-trim"

rule dada2_learn_errors:
    input:
        # Quality filtered and trimmed forward FASTQ files (potentially compressed)
        expand("filtered-se/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
    output:
        err="results/dada2/model_{orientation}.RDS",  # save the error model
        plot="reports/dada2/errors_{orientation}.png",  # plot observed and estimated rates
    params:
        randomize=True
    log:
        "logs/dada2/learn-errors/learn-errors_{orientation}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/learn-errors"

rule dada2_dereplicate_fastq:
    input:
        # Quality filtered FASTQ file
        "filtered-se/{fastq}.fastq.gz"
    output:
        # Dereplicated sequences stored as `derep-class` object in an RDS file
        "uniques/{fastq}.RDS"
    log:
        "logs/dada2/dereplicate-fastq/{fastq}.log"
    wrapper:
        "v0.87.0/bio/dada2/dereplicate-fastq"

rule dada2_sample_inference:
    input:
        # Dereplicated (aka unique) sequences of the sample
        derep="uniques/{sample}.{orientation}.RDS",
        err="results/dada2/model_{orientation}.RDS" # Error model
    output:
        "denoised/{sample}.{orientation}.RDS" # Inferred sample composition
    log:
        "logs/dada2/sample-inference/{sample}.{orientation}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/sample-inference"

rule dada2_make_table_se:
    input:
        # Inferred composition
        expand("denoised/{sample}.1.RDS", sample=['a','b'])
    output:
        "results/dada2/seqTab-se.RDS"
    params:
        names=['a','b'] # Sample names instead of paths
    log:
        "logs/dada2/make-table/make-table-se.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/make-table"

rule dada2_remove_chimeras:
    input:
        "results/dada2/seqTab-se.RDS" # Sequence table
    output:
        "results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
    log:
        "logs/dada2/remove-chimeras/remove-chimeras.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/remove-chimeras"

rule dada2_collapse_nomismatch:
    input:
        "results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
    output:
        "results/dada2/seqTab.collapsed.RDS"
    log:
        "logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/collapse-nomismatch"

rule dada2_assign_taxonomy:
    input:
        seqs="results/dada2/seqTab.collapsed.RDS", # Chimera-free sequence table
        refFasta="resources/example_train_set.fa.gz" # Reference FASTA for taxonomy
    output:
        "results/dada2/taxa.RDS" # Taxonomic assignments
    log:
        "logs/dada2/assign-taxonomy/assign-taxonomy.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v0.87.0/bio/dada2/assign-taxonomy"

Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Used wrappers

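The following individual wrappers are used in this meta-wrapper:

  • bio/dada2/quality-profile
  • bio/dada2/filter-trim
  • bio/dada2/learn-errors
  • bio/dada2/dereplicate-fastq
  • bio/dada2/sample-inference
  • bio/dada2/make-table
  • bio/dada2/remove-chimeras
  • bio/dada2/collapse-nomismatch
  • bio/dada2/assign-taxonomy

Please refer to each wrapper in the list above for additional configuration parameters and information about the executed code.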

Authors
  • Charlie Pauvert

STAR-ARRIBA

A subworkflow for fusion detection from RNA-seq data with arriba. The fusion calling is based on splice-aware, chimeric alignments done with STAR. STAR is run with specific parameters to ensure optimal functionality of the arriba fusion detection; for details, see the documentation.

Example

This meta-wrapper can be used by integrating the following into your workflow:

rule star_index:
    input:
        fasta="resources/genome.fasta",
        annotation="resources/genome.gtf"
    output:
        directory("resources/star_genome")
    threads: 4
    params:
        extra="--sjdbGTFfile resources/genome.gtf --sjdbOverhang 100"
    log:
        "logs/star_index_genome.log"
    cache: True
    wrapper:
        "v0.87.0/bio/star/index"

rule star_align:
    input:
        # use a list for multiple fastq files for one sample
        # usually technical replicates across lanes/flowcells
        fq1="reads/{sample}_R1.1.fastq",
        fq2="reads/{sample}_R2.1.fastq", #optional
        index="resources/star_genome"
    output:
        # see STAR manual for additional output files
        "star/{sample}/Aligned.out.bam",
        "star/{sample}/ReadsPerGene.out.tab"
    log:
        "logs/star/{sample}.log"
    params:
        # path to STAR reference genome index
        index="resources/star_genome",
        # specific parameters to work well with arriba
        extra="--quantMode GeneCounts --sjdbGTFfile resources/genome.gtf"
            " --outSAMtype BAM Unsorted --chimSegmentMin 10 --chimOutType WithinBAM SoftClip"
            " --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0"
            " --chimScoreSeparation 1 --alignSJstitchMismatchNmax 5 -1 5 5 --chimSegmentReadGapMax 3"
    threads: 12
    wrapper:
        "v0.87.0/bio/star/align"

rule arriba:
    input:
        bam="star/{sample}/Aligned.out.bam",
        genome="resources/genome.fasta",
        annotation="resources/genome.gtf"
    output:
        fusions="results/arriba/{sample}.fusions.tsv",
        discarded="results/arriba/{sample}.fusions.discarded.tsv"
    params:
        # A TSV file listing identified artifacts, such as read-through fusions of neighbouring genes; see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
        blacklist="arriba_blacklist.tsv",
        extra="-T -P -i 1,2"  # -i lists the contigs to analyse; remove it if you want to use all hg38 chromosomes
    log:
        "logs/arriba/{sample}.log"
    threads: 1
    wrapper:
        "v0.87.0/bio/arriba"

Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters of each individual wrapper, please refer to its documentation (see links below).
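The extra parameter of star_align above is written as several adjacent string literals, which Python joins into a single string; each continuation therefore starts with a space so that neighbouring options do not run together. When editing that string, a quick check like the following can confirm that the options still split as intended. This is a minimal sketch: flag_values is a hypothetical helper written for this illustration, not part of Snakemake or the wrappers.

```python
import shlex

# Adjacent string literals are concatenated by Python, so every
# continuation line begins with a space to keep the options separated.
extra = (
    "--quantMode GeneCounts --sjdbGTFfile resources/genome.gtf"
    " --outSAMtype BAM Unsorted --chimSegmentMin 10 --chimOutType WithinBAM SoftClip"
    " --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0"
    " --chimScoreSeparation 1 --alignSJstitchMismatchNmax 5 -1 5 5 --chimSegmentReadGapMax 3"
)

# Split the string the way a POSIX shell would.
tokens = shlex.split(extra)


def flag_values(tokens, flag):
    """Hypothetical helper: return the values that follow `flag`
    up to the next option starting with `--`."""
    values = []
    for token in tokens[tokens.index(flag) + 1:]:
        if token.startswith("--"):
            break
        values.append(token)
    return values


out_sam_type = flag_values(tokens, "--outSAMtype")
stitch_mismatch = flag_values(tokens, "--alignSJstitchMismatchNmax")
```

If a leading space were dropped, two options would merge into one token (for example Unsorted--chimSegmentMin) and flag_values would no longer find the second option.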

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Used wrappers

The following individual wrappers are used in this meta-wrapper:

  • bio/star/index
  • bio/star/align
  • bio/arriba

Please refer to each wrapper in the list above for additional configuration parameters and information about the executed code.

Authors
  • Jan Forster

Contributing

We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, we suggest the following procedure:

  1. Fork the repository: https://github.com/snakemake/snakemake-wrappers
  2. Clone your fork locally.
  3. Locally, create a new branch: git checkout -b my-new-snakemake-wrapper
  4. Commit your contributions to that branch and push them to your fork: git push -u origin my-new-snakemake-wrapper
  5. Create a pull request.

The pull request will be reviewed and included as quickly as possible. If your pull request does not get a review soon, you can @mention (https://github.blog/2011-03-23-mention-somebody-they-re-notified/) previous contributors to a particular wrapper (found via git blame) or regular contributors who you think might be able to give a review. Contributions should follow the coding style of the existing examples, i.e.:

  • provide a meta.yaml that describes the wrapper (see the meta.yaml documentation below).
  • provide an environment.yaml which lists all required software packages and follows the respective best practices. The packages should be available for installation via the default anaconda channels or via the conda channels bioconda or conda-forge. Other sustainable, community-maintained channels are possible as well.
  • add a wrapper.py or wrapper.R file that can deal with arbitrary input: and output: paths.
  • provide a minimal test case in a subfolder called test, containing an example Snakefile that shows how to use the wrapper (rule names should be descriptive and written in snake_case) and some minimal testing data (also check existing wrappers for suitable data), and add an invocation of the test in test.py.
  • ensure consistent formatting of Python files and linting of Snakefiles.
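To illustrate what "arbitrary input: and output: paths" means in practice, here is a minimal sketch of the command-building part of a wrapper.py. It is not the repository's actual samtools wrapper: build_command is a hypothetical helper, and a SimpleNamespace stands in for the snakemake object that Snakemake injects into a real wrapper script. The point is only that paths are read from the rule rather than hard-coded.

```python
import shlex
from types import SimpleNamespace


def build_command(snakemake):
    """Hypothetical helper: assemble a samtools sort call from whatever
    paths, parameters and threads the calling rule provides."""
    # Optional free-form arguments; default to nothing if the rule sets none.
    extra = snakemake.params.get("extra", "")
    parts = [
        "samtools sort",
        extra,
        f"-@ {snakemake.threads}",
        # Quote paths so unusual file names survive the shell.
        f"-o {shlex.quote(snakemake.output[0])}",
        shlex.quote(snakemake.input[0]),
    ]
    # Drop empty pieces (e.g. an unset extra) before joining.
    return " ".join(part for part in parts if part)


# Stand-in for the object Snakemake would inject (assumption for this sketch).
fake_snakemake = SimpleNamespace(
    input=["mapped/A.bam"],
    output=["any/other/dir/A.sorted.bam"],
    params={"extra": "-m 4G"},
    threads=8,
)

command = build_command(fake_snakemake)
```

Because nothing in build_command assumes a directory layout or file name pattern, the same code works regardless of where the rule places its inputs and outputs.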

meta.yaml file

The following fields are available to use in the wrapper meta.yaml file. All fields except those marked optional should be provided.

  • name: The name of the wrapper.
  • description: A description of what the wrapper does.
  • url: URL to the wrapper tool webpage.
  • authors: A sequence of names of the people who have contributed to the wrapper.
  • input: A mapping or sequence of required inputs for the wrapper.
  • output: A mapping or sequence of output(s) from the wrapper.
  • params (optional): A mapping of parameters that can be used in the wrapper’s params directive. If no parameters are used for the wrapper, this field can be omitted.
  • notes (optional): Anything of note that does not fit into the scope of the other fields.

You can add a newline to the rendered text in these fields by inserting |nl|.

Example

name: seqtk mergepe
description: Interleave two paired-end FASTA/Q files
url: https://github.com/lh3/seqtk
authors:
  - Michael Hall
input:
  - paired fastq files - can be compressed.
output:
  - >
    a single, interleaved FASTA/Q file. By default, the output will be compressed,
    use the param ``compress_lvl`` to change this.
params:
  compress_lvl: >
    Regulate the speed of compression using the specified digit,
    where 1 indicates the fastest compression method (less compression)
    and 9 indicates the slowest compression method (best compression).
    0 is no compression. 11 gives a few percent better compression at a severe cost
    in execution time, using the zopfli algorithm. The default is 6.
notes: Multiple threads can be used during compression of the output file with ``pigz``.
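
Before opening a pull request, it can help to check that a meta.yaml provides every non-optional field from the list above. The following is a minimal sketch of such a check, not part of the repository's tooling: missing_fields is a hypothetical helper, and the example is written as the dictionary a YAML parser would produce (abbreviated from the seqtk mergepe example) so that the snippet needs no third-party package.

```python
# Field names come from the list above; the helper itself is hypothetical.
REQUIRED_FIELDS = ("name", "description", "url", "authors", "input", "output")


def missing_fields(meta):
    """Return the required meta.yaml fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not meta.get(field)]


# Abbreviated seqtk mergepe example as a parsed mapping.
example = {
    "name": "seqtk mergepe",
    "description": "Interleave two paired-end FASTA/Q files",
    "url": "https://github.com/lh3/seqtk",
    "authors": ["Michael Hall"],
    "input": ["paired fastq files - can be compressed."],
    "output": ["a single, interleaved FASTA/Q file."],
}

complete = missing_fields(example)
incomplete = missing_fields({"name": "seqtk mergepe", "authors": []})
```

The optional fields params and notes are deliberately not checked, since they may be omitted.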

Formatting

Please ensure Python files such as test.py and wrapper.py are formatted with black. Additionally, please format your test Snakefile with snakefmt.

Linting

Please lint your test Snakefile with:

snakemake -s <path/to/wrapper/test/Snakefile> --lint

Testing locally

If you want to debug your contribution locally (before creating a pull request), you can install all dependencies with mamba (or conda). Install Miniconda with the channels configured as described for Bioconda, then create an environment with the necessary dependencies and activate it:

mamba create -n test-snakemake-wrappers snakemake pytest conda snakefmt black
conda activate test-snakemake-wrappers

Afterwards, from the main directory of the repo, you can run the test(s) for your contribution by specifying an expression that matches the name(s) of your test(s) via the -k option of pytest:

pytest test.py -v -k your_test

If you also want to test the docs generation locally, create another environment and activate it:

mamba create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml sphinx-copybutton
conda activate test-snakemake-wrapper-docs

Then, enter the respective directory and build the docs:

cd docs
make html

If the build succeeds, you can open the main page at docs/_build/html/index.html in a web browser. If you want to start fresh, you can clean up the build with make clean.