The Snakemake Wrappers repository

https://img.shields.io/badge/snakemake-≥5.7.0-brightgreen.svg?style=flat-square https://github.com/snakemake/snakemake-wrappers/workflows/CI/badge.svg?branch=master

The Snakemake Wrapper Repository is a collection of reusable wrappers that allow to quickly use popular tools from Snakemake rules and workflows.

Usage

The general strategy is to include a wrapper into your workflow via the wrapper directive, e.g.

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.2.0/bio/samtools/sort"

Here, Snakemake will automatically download the corresponding wrapper from https://bitbucket.org/snakemake/snakemake-wrappers/src/0.2.0/bio/samtools/sort/wrapper.py. Thereby, 0.2.0 can be replaced with the version tag you want to use, or a commit id. This ensures reproducibility since changes in the wrapper implementation won’t be propagated automatically to your workflow. Alternatively, e.g., for development, the wrapper directive can also point to full URLs, including the local file://.

Each wrapper defines required software packages and versions. In combination with the --use-conda flag of Snakemake, these will be deployed automatically.

Contribute

We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute we suggest the following procedure:

  1. Fork the repository: https://github.com/snakemake/snakemake-wrappers
  2. Clone your fork locally.
  3. Locally, create a new branch: git checkout -b my-new-snakemake-wrapper
  4. Commit your contributions to that branch and push them to your fork: git push -u origin my-new-snakemake-wrapper
  5. Create a pull request.

The pull request will be reviewed and included as fast as possible. Contributions should follow the coding style of the already present examples, i.e.:

  • provide a meta.yaml with name, description and author(s) of the wrapper
  • provide an environment.yaml which lists all required software packages (the packages should be available for installation via the default anaconda channels or via the conda channels bioconda or conda-forge. Other sustainable community maintained channels are possible as well.)
  • provide a minimal test case in a subfolder called test, with an example Snakefile that shows how to use the wrapper, some minimal testing data (also check existing wrappers for suitable data) and add an invocation of the test in test.py
  • follow the python style guide, using 4 spaces for indentation.

Testing locally

If you want to debug your contribution locally, before creating a pull request, we recommend adding your test case to the start of the list in test.py, so that it runs first. Then, install miniconda with the channels as described for bioconda and set up an environment with the necessary dependencies and activate it:

conda create -n test-snakemake-wrappers snakemake pytest conda
conda activate test-snakemake-wrappers

Afterwards, from the main directory of the repo, you can run the tests with:

pytest test.py -v

If you use a keyboard interrupt after your test has failed, you will get all the relevant stdout and stderr messages printed.

If you also want to test the docs generation locally, create another environment and activate it:

conda create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml
conda activate test-snakemake-wrapper-docs

Then, enter the respective directory and build the docs:

cd docs
make html

If it runs through, you can open the main page at docs/_build/html/index.html in a web browser. If you want to start fresh, you can clean up the build with make clean.

ARRIBA

Detect gene fusions from chimeric STAR output

Software dependencies
  • arriba ==1.1.0
Example

This wrapper can be used in the following way:

rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv" # optional
    log:
        "logs/arriba/{sample}.log"
    params:
        # arriba blacklist file
        blacklist="blacklist.tsv", # strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
        # file containing known fusions
        known_fusions="", # optional
        # file containing information from structural variant analysis
        sv_file="", # optional
        # optional parameters
        extra="-T -P -i 1,2"
    threads: 1
    wrapper:
        "0.53.0/bio/arriba"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""

blacklist = snakemake.params.get("blacklist")
if blacklist:
    blacklist_cmd = "-b " + blacklist
else:
    blacklist_cmd = ""

known_fusions = snakemake.params.get("known_fusions")
if known_fusions:
    known_cmd = "-k" + known_fusions
else:
    known_cmd = ""

sv_file = snakemake.params.get("sv_file")
if sv_file:
    sv_cmd = "-d" + sv_file
else:
    sv_cmd = ""

shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)

ART

For art, the following wrappers are available:

ART_PROFILER_ILLUMINA

Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.

Software dependencies
  • art ==2016.06.05
Example

This wrapper can be used in the following way:

rule art_profiler_illumina:
    input:
        "data/{sample}.fq",
    output:
        "profiles/{sample}.txt"
    log:
        "logs/art_profiler_illumina/{sample}.log"
    params: ""
    threads: 2
    wrapper:
        "0.53.0/bio/art/profiler_illumina"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • David Laehnemann
  • Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


from snakemake.shell import shell
import os
import tempfile
import re


# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina cli
with tempfile.TemporaryDirectory() as temp_input:
    # ensure that .fastq and .fastq.gz input files work, as well
    filename = os.path.basename(snakemake.input[0]).replace(".fastq", ".fq")

    # figure out the exact file extension after the above substitution
    ext = re.search("fq(\.gz)?$", filename)
    if ext:
        fq_extension = ext.group(0)
    else:
        raise IOError(
            "Incompatible extension: This art_profiler_illumina "
            "wrapper requires input files with one of the following "
            "extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
            "your input and the invocation of the wrapper accordingly."
        )

    os.symlink(
        # snakemake paths are relative, but the symlink needs to be absolute
        os.path.abspath(snakemake.input[0]),
        # the following awkward file name generation has reasons:
        # * the file name needs to be unique to the execution of the
        #   rule, as art will create and mv temporary files with its basename
        #   in the output directory, which causes utter confusion when
        #   executing instances of the rule in parallel
        # * temp file name cannot have any read infixes before the file
        #   extension, because otherwise art does read enumeration magic
        #   that messes up output file naming
        os.path.join(
            temp_input,
            filename.replace(
                "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
            ),
        ),
    )

    # include output folder name in the profile_name command line argument and
    # strip off the file extension, as art will add its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(snakemake.output[0]), filename.replace("." + fq_extension, "")
    )

    shell(
        "( art_profiler_illumina {snakemake.params} {profile_name}"
        " {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}"
    )

BCFTOOLS

For bcftools, the following wrappers are available:

BCFTOOLS CALL

Call variants with bcftools.

Software dependencies
  • samtools ==1.10
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_call:
    input:
        ref="genome.fasta",
        samples=expand("mapped/{sample}.sorted.bam", sample=config["samples"]),
        indexes=expand("mapped/{sample}.sorted.bam.bai", sample=config["samples"])
    output:
        # Here, we optionally use a region as wildcard and constrain it to the
        # format accepted by samtools mpileup.
        "called/{region,.+(:[0-9]+-[0-9]+)?}.bcf"
    params:
        # Optional parameters for samtools mpileup (except -g, -f).
        # In this example, we forward the region wildcard from the output file to mpileup.
        mpileup="--region {region}",
        # Optional parameters for bcftools call (except -v, -o, -m).
        call=""
    log:
        "logs/bcftools_call/{region}.log"
    wrapper:
        "0.53.0/bio/bcftools/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "(samtools mpileup {snakemake.params.mpileup} {snakemake.input.samples} "
    "--fasta-ref {snakemake.input.ref} --BCF --uncompressed | "
    "bcftools call -m {snakemake.params.call} -o {snakemake.output[0]} -v -) 2> {snakemake.log}"
)
BCFTOOLS CONCAT

Concatenate vcf/bcf files with bcftools.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_concat:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools concat (except -o)
    wrapper:
        "0.53.0/bio/bcftools/concat"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools concat {snakemake.params} -o {snakemake.output[0]} "
    "{snakemake.input.calls}"
)
BCFTOOLS INDEX

Index vcf/bcf file.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_index:
    input:
        "a.bcf"
    output:
        "a.bcf.csi"
    params:
        extra=""  # optional parameters for bcftools index
    wrapper:
        "0.53.0/bio/bcftools/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

shell("bcftools index" " {extra}" " {snakemake.input[0]}")
BCFTOOLS MERGE

Merge vcf/bcf files with bcftools.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_merge:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools concat (except -o)
    wrapper:
        "0.53.0/bio/bcftools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell(
    "bcftools merge {snakemake.params} -o {snakemake.output[0]} "
    "{snakemake.input.calls}"
)
BCFTOOLS NORM

Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule norm_vcf:
    input:
        "{prefix}.vcf"
    output:
        "{prefix}.vcf"
    params:
        ""  # optional parameters for bcftools norm (except -o)
    wrapper:
        "0.53.0/bio/bcftools/norm"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Dayne Filer
Code
__author__ = "Dayne Filer"
__copyright__ = "Copyright 2019, Dayne Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools norm {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}"
)
BCFTOOLS REHEADER

Change header or sample names of vcf/bcf file.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcftools_reheader:
    input:
        vcf="a.bcf",
        ## new header, can be omitted if "samples" is set
        header="header.txt",
        ## file containing new sample names, can be omitted if "header" is set
        samples="samples.tsv"
    output:
        "a.reheader.bcf"
    params:
        extra="",  # optional parameters for bcftools reheader
        view_extra="-O b"  # add output format for internal bcftools view call
    wrapper:
        "0.53.0/bio/bcftools/reheader"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


from snakemake.shell import shell

## Extract arguments
header = snakemake.input.get("header", "")
if header:
    header_cmd = "-h " + header
else:
    header_cmd = ""

samples = snakemake.input.get("samples", "")
if samples:
    samples_cmd = "-s " + samples
else:
    samples_cmd = ""

extra = snakemake.params.get("extra", "")
view_extra = snakemake.params.get("view_extra", "")

shell(
    "bcftools reheader "
    "{extra} "
    "{header_cmd} "
    "{samples_cmd} "
    "{snakemake.input.vcf} "
    "| bcftools view "
    "{view_extra} "
    "> {snakemake.output}"
)
BCFTOOLS VIEW

View vcf/bcf file in a different format.

Software dependencies
  • bcftools ==1.10
Example

This wrapper can be used in the following way:

rule bcf_to_vcf:
    input:
        "{prefix}.bcf"
    output:
        "{prefix}.vcf"
    params:
        ""  # optional parameters for bcftools view (except -o)
    wrapper:
        "0.53.0/bio/bcftools/view"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "bcftools view {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}"
)

BEDTOOLS

For bedtools, the following wrappers are available:

COVERAGEBED

Returns the depth and breadth of coverage of features from B on the intervals in A.

Software dependencies
  • bedtools ==2.29.0
Example

This wrapper can be used in the following way:

rule coverageBed:
    input:
        a="bed/{sample}.bed",
        b="mapped/{sample}.bam"
    output:
        "stats/{sample}.cov"
    log:
        "logs/coveragebed/{sample}.log"
    params:
        extra=""  # optional parameters
    threads: 8
    wrapper:
        "0.53.0/bio/bedtools/coveragebed"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

input_a = snakemake.input.a
input_b = snakemake.input.b

output_file = snakemake.output[0]

if not isinstance(output_file, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one file: " + str(output_file) + "!")

shell(
    "coverageBed"
    " -a {input_a}"
    " -b {input_b}"
    " {extra_params}"
    " > {output_file}"
    " {log}"
)
BEDTOOLS INTERSECT

Intersect BED/BAM/VCF files with bedtools.

Software dependencies
  • bedtools =2.29.0
Example

This wrapper can be used in the following way:

rule bedtools_merge:
    input:
        left="A.bed",
        right="B.bed"
    output:
        "A_B.intersected.bed"
    params:
        ## Add optional parameters
        extra="-wa -wb" ## In this example, we want to write original entries in A and B for each overlap.
    log:
        "logs/intersect/A_B.log"
    wrapper:
        "0.53.0/bio/bedtools/intersect"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(bedtools intersect"
    " {extra}"
    " -a {snakemake.input.left}"
    " -b {snakemake.input.right}"
    " > {snakemake.output})"
    " {log}"
)
BEDTOOLS MERGE

Merge entries in one or multiple BED/BAM/VCF/GFF files with bedtools.

Software dependencies
  • bedtools =2.29.0
Example

This wrapper can be used in the following way:

rule bedtools_merge:
    input:
        "A.bed"
    output:
        "A.merged.bed"
    params:
        ## Add optional parameters
        extra="-c 1 -o count" ## In this example, we want to count how many input lines we merged per output line
    log:
        "logs/merge/A.log"
    wrapper:
        "0.53.0/bio/bedtools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "( bedtools merge"
    " {extra}"
    " -i {snakemake.input}"
    " > {snakemake.output})"
    " {log}"
)
BEDTOOLS SLOP

Increase the size of each feature in a BED/BAM/VCF by a specified factor.

Software dependencies
  • bedtools =2.29.0
Example

This wrapper can be used in the following way:

rule bedtools_merge:
    input:
        "A.bed"
    output:
        "A.slop.bed"
    params:
        ## Genome file, tab-seperated file defining the length of every contig
        genome="genome.txt",
        ## Add optional parameters
        extra = "-b 10" ## in this example, we want to increase the feature by 10 bases to both sides
    log:
        "logs/slop/A.log"
    wrapper:
        "0.53.0/bio/bedtools/slop"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(bedtools slop"
    " {extra}"
    " -i {snakemake.input[0]}"
    " -g {snakemake.params.genome}"
    " > {snakemake.output})"
    " {log}"
)

BISMARK

For bismark, the following wrappers are available:

BAM2NUC

Calculate mono- and di-nucleotide coverage of the reads and compares them with average genomic sequence composition (see https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Nucleotide stats for genome is required for further stats for BAM file
rule bam2nuc_for_genome:
    input:
        genome_fa="indexes/{genome}/{genome}.fa.gz"
    output:
        "indexes/{genome}/genomic_nucleotide_frequencies.txt"
    log:
        "logs/indexes/{genome}/genomic_nucleotide_frequencies.txt.log"
    wrapper:
        "0.53.0/bio/bismark/bam2nuc"

# Nucleotide stats for BAM file
rule bam2nuc_for_bam:
    input:
        genome_fa="indexes/{genome}/{genome}.fa.gz",
        bam="bams/{sample}_{genome}.bam"
    output:
        report="bams/{sample}_{genome}.nucleotide_stats.txt"
    log:
        "logs/{sample}_{genome}.nucleotide_stats.txt.log"
    wrapper:
        "0.53.0/bio/bismark/bam2nuc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for bam2nuc tool that calculates mono- and di-nucleotide coverage of the reads and compares them with average genomic sequence
composition."""
# https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
cmdline_args = ["bam2nuc {extra}"]

genome_fa = snakemake.input.get("genome_fa", None)
if not genome_fa:
    raise ValueError("bismark/bam2nuc: Error 'genome_fa' input not specified.")
genome_folder = os.path.dirname(genome_fa)
cmdline_args.append("--genome_folder {genome_folder:q}")


bam = snakemake.input.get("bam", None)
if bam:
    cmdline_args.append("{bam}")
    bams = bam if isinstance(bam, list) else [bam]

    report = snakemake.output.get("report", None)
    if not report:
        raise ValueError("bismark/bam2nuc: Error 'report' output isn't specified.")

    reports = report if isinstance(report, list) else [report]
    if len(reports) != len(bams):
        raise ValueError(
            "bismark/bam2nuc: Error number of paths in output:report ({} files)"
            " should be same as in input:bam ({} files).".format(
                len(reports), len(bams)
            )
        )
    output_dir = os.path.dirname(reports[0])
    if any(output_dir != os.path.dirname(p) for p in reports):
        raise ValueError(
            "bismark/bam2nuc: Error all reports should be in same directory:"
            " {}".format(output_dir)
        )
    if output_dir:
        cmdline_args.append("--dir {output_dir:q}")
else:
    cmdline_args.append("--genomic_composition_only")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))


# Move outputs into proper position.
if bam:
    log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)

    expected_2_actual_paths = []
    for bam_path, report_path in zip(bams, reports):
        bam_name = os.path.basename(bam_path)
        bam_basename = os.path.splitext(bam_name)[0]
        expected_2_actual_paths.append(
            (
                report_path,
                os.path.join(
                    output_dir, "{}.nucleotide_stats.txt".format(bam_basename)
                ),
            )
        )

    for (exp_path, actual_path) in expected_2_actual_paths:
        if exp_path and (exp_path != actual_path):
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
BISMARK

Align BS-Seq reads using Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Example: Pair-ended reads
rule bismark_pe:
    input:
        fq_1="reads/{sample}.1.fastq",
        fq_2="reads/{sample}.2.fastq",
        genome="indexes/{genome}/{genome}.fa",
        bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
        genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
    output:
        bam="bams/{sample}_{genome}_pe.bam",
        report="bams/{sample}_{genome}_PE_report.txt",
        nucleotide_stats="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
        bam_unmapped_1="bams/{sample}_{genome}_unmapped_reads_1.fq.gz",
        bam_unmapped_2="bams/{sample}_{genome}_unmapped_reads_2.fq.gz",
        ambiguous_1="bams/{sample}_{genome}_ambiguous_reads_1.fq.gz",
        ambiguous_2="bams/{sample}_{genome}_ambiguous_reads_2.fq.gz"
    log:
        "logs/bams/{sample}_{genome}.log"
    params:
        # optional params string, e.g: -L32 -N0 -X400 --gzip
        # Useful options to tune:
        # (for bowtie2)
        # -N: The maximum number of mismatches permitted in the "seed", i.e. the first L base pairs
        # of the read (deafault: 1)
        # -L: The "seed length" (deafault: 28)
        # -I: The minimum insert size for valid paired-end alignments. ~ min fragment size filter (for
        # PE reads)
        # -X: The maximum insert size for valid paired-end alignments. ~ max fragment size filter (for
        # PE reads)
        # --gzip: Gzip intermediate fastq files
        # --ambiguous --unmapped
        # -p: bowtie2 parallel execution
        # --multicore: bismark parallel execution
        # --temp_dir: tmp dir for intermediate files instead of output directory
        extra=' --ambiguous --unmapped --nucleotide_coverage',
        basename='{sample}_{genome}'
    wrapper:
        "0.53.0/bio/bismark/bismark"

# Example: Single-ended reads
rule bismark_se:
    input:
        fq="reads/{sample}.fq.gz",
        genome="indexes/{genome}/{genome}.fa",
        bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
        genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
    output:
        bam="bams/{sample}_{genome}.bam",
        report="bams/{sample}_{genome}_SE_report.txt",
        nucleotide_stats="bams/{sample}_{genome}.nucleotide_stats.txt",
        bam_unmapped="bams/{sample}_{genome}_unmapped_reads.fq.gz",
        ambiguous="bams/{sample}_{genome}_ambiguous_reads.fq.gz"
    log:
        "logs/bams/{sample}_{genome}.log",
    params:
        # optional params string
        extra=' --ambiguous --unmapped --nucleotide_coverage',
        basename='{sample}_{genome}'
    wrapper:
        "0.53.0/bio/bismark/bismark"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for aligning methylation BS-Seq data using Bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os

from snakemake.shell import shell
from tempfile import TemporaryDirectory


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = os.path.basename(file_path)

    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark {extra} --bowtie2"]

outdir = os.path.dirname(snakemake.output.bam)
if outdir:
    cmdline_args.append("--output_dir {outdir}")

genome_indexes_dir = os.path.dirname(snakemake.input.bismark_indexes_dir)
cmdline_args.append("{genome_indexes_dir}")

if not snakemake.output.get("bam", None):
    raise ValueError("bismark/bismark: Error 'bam' output file isn't specified.")
if not snakemake.output.get("report", None):
    raise ValueError("bismark/bismark: Error 'report' output file isn't specified.")

# basename
if snakemake.params.get("basename", None):
    cmdline_args.append("--basename {snakemake.params.basename:q}")
    basename = snakemake.params.basename
else:
    basename = None

# reads input
single_end_mode = snakemake.input.get("fq", None)
if single_end_mode:
    # for SE data, you only have to specify read1 input by -i or --in1, and
    # specify read1 output by -o or --out1.
    cmdline_args.append("--se {snakemake.input.fq:q}")
    mode_prefix = "se"
    if basename is None:
        basename = basename_without_ext(snakemake.input.fq)
else:
    # for PE data, you should also specify read2 input by -I or --in2, and
    # specify read2 output by -O or --out2.
    cmdline_args.append("-1 {snakemake.input.fq_1:q} -2 {snakemake.input.fq_2:q}")
    mode_prefix = "pe"

    if basename is None:
        # default basename
        basename = basename_without_ext(snakemake.input.fq_1) + "_bismark_bt2"

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))

# Move outputs into proper position.
expected_2_actual_paths = [
    (
        snakemake.output.bam,
        os.path.join(
            outdir, "{}{}.bam".format(basename, "" if single_end_mode else "_pe")
        ),
    ),
    (
        snakemake.output.report,
        os.path.join(
            outdir,
            "{}_{}_report.txt".format(basename, "SE" if single_end_mode else "PE"),
        ),
    ),
    (
        snakemake.output.get("nucleotide_stats", None),
        os.path.join(
            outdir,
            "{}{}.nucleotide_stats.txt".format(
                basename, "" if single_end_mode else "_pe"
            ),
        ),
    ),
]
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (exp_path, actual_path) in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")
BISMARK2BEDGRAPH

Generate bedGraph and coverage files from positional methylation files created by bismark_methylation_extractor (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Example for CHG+CHH summary coverage:
rule bismark2bedGraph_noncpg:
    input:
        "meth/CHG_context_{sample}.txt.gz",
        "meth/CHH_context_{sample}.txt.gz"
    output:
        bedGraph="meth_non_cpg/{sample}_non_cpg.bedGraph.gz",
        cov="meth_non_cpg/{sample}_non_cpg.bismark.cov.gz"
    log:
        "logs/meth_non_cpg/{sample}_non_cpg.log"
    params:
        extra="--CX"
    wrapper:
        "0.53.0/bio/bismark/bismark2bedGraph"

# Example for CpG only coverage
rule bismark2bedGraph_cpg:
    input:
        "meth/CpG_context_{sample}.txt.gz"
    output:
        bedGraph="meth_cpg/{sample}_CpG.bedGraph.gz",
        cov="meth_cpg/{sample}_CpG.bismark.cov.gz"
    log:
        "logs/meth_cpg/{sample}_CpG.log"
    wrapper:
        "0.53.0/bio/bismark/bismark2bedGraph"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark bismark2bedGraph tool."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

bedGraph = snakemake.output.get("bedGraph", "")
if not bedGraph:
    raise ValueError("bismark/bismark2bedGraph: Please specify bedGraph output path")

params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark2bedGraph {params_extra}"]

dir_name = os.path.dirname(bedGraph)
if dir_name:
    cmdline_args.append("--dir {dir_name}")

fname = os.path.basename(bedGraph)
cmdline_args.append("--output {fname}")

cmdline_args.append("{snakemake.input}")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))
BISMARK2REPORT

Generate graphical HTML report from Bismark reports (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2report).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# Example: Pair-ended reads
rule bismark2report_pe:
    input:
        alignment_report="bams/{sample}_{genome}_PE_report.txt",
        nucleotide_report="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}_pe.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}_pe.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}_pe.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "0.53.0/bio/bismark/bismark2report"

# Example: Single-ended reads
rule bismark2report_se:
    input:
        alignment_report="bams/{sample}_{genome}_SE_report.txt",
        nucleotide_report="bams/{sample}_{genome}.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "0.53.0/bio/bismark/bismark2report"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper to generate graphical HTML report from Bismark reports."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2report

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell


def answer2bool(v):
    return str(v).lower() in ("yes", "true", "t", "1")


extra = snakemake.params.get("extra", "")
cmds = ["bismark2report {extra}"]

# output
html_file = snakemake.output.get("html", "")
output_dir = snakemake.output.get("html_dir", None)
if output_dir is None:
    if html_file:
        output_dir = os.path.dirname(html_file)
else:
    if html_file:
        raise ValueError(
            "bismark/bismark2report: Choose one: 'html=...' for a single dir or 'html_dir=...' for batch processing."
        )

if output_dir is None:
    raise ValueError(
        "bismark/bismark2report: Output file or directory not specified. "
        "Use 'html=...' for a single dir or 'html_dir=...' for batch "
        "processing."
    )

if output_dir:
    cmds.append("--dir {output_dir:q}")

if html_file:
    html_file_name = os.path.basename(html_file)
    cmds.append("--output {html_file_name:q}")

# reports
reports = [
    "alignment_report",
    "dedup_report",
    "splitting_report",
    "mbias_report",
    "nucleotide_report",
]
skip_optional_reports = answer2bool(
    snakemake.params.get("skip_optional_reports", False)
)
for report_name in reports:
    path = snakemake.input.get(report_name, "")
    if path:
        locals()[report_name] = path
        cmds.append("--{0} {{{1}:q}}".format(report_name, report_name))
    elif skip_optional_reports:
        cmds.append("--{0} 'none'".format(report_name))

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")

# run shell command:
shell(" ".join(cmds))
BISMARK2SUMMARY

Generate summary graphical HTML report from several Bismark text report files reports (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

import  os

rule bismark2summary:
    input:
        bam=["bams/a_genome_pe.bam", "bams/b_genome.bam"],

        # Bismark `bismark2summary` discovers reports automatically based
        # on files available in bam file containing folder
        #
        # If your per BAM file reports aren't in the same folder
        # you will need an additional task which symlinks all reports
        # (E.g. your splitting report generated by `bismark_methylation_extractor`
        # tool is in `meth` folder, and alignment related reports in `bams` folder)

        # These dependencies are here just to ensure that corresponding rules
        # has already finished at rule execution time, otherwise some reports
        # will be missing.
        dependencies=[
            "bams/a_genome_PE_report.txt",
            "bams/a_genome_pe.deduplication_report.txt",
            # for example splitting report is missing for 'a' sample

            "bams/b_genome_SE_report.txt",
            "bams/b_genome.deduplication_report.txt",
            "bams/b_genome.deduplicated_splitting_report.txt"
        ]
    output:
        html="qc/{experiment}.bismark2summary.html",
        txt="qc/{experiment}.bismark2summary.txt"
    log:
        "logs/qc/{experiment}.bismark2summary.log"
    wrapper:
        "0.53.0/bio/bismark/bismark2summary"

rule bismark2summary_prepare_symlinks:
    input:
        "meth/b_genome.deduplicated_splitting_report.txt",
    output:
        temp("bams/b_genome.deduplicated_splitting_report.txt"),
    log:
        "qc/bismark2summary_prepare_symlinks.symlinks.log"
    run:
        wd = os.getcwd()
        shell("echo 'Making symlinks' > {log}")
        for source, target in zip(input, output):
           target_dir = os.path.dirname(target)
           target_name = os.path.basename(target)
           log_path = os.path.join(wd, log[0])
           abs_src_path = os.path.abspath(source)
           shell("cd {target_dir} && ln -f -s {abs_src_path} {target_name} >> {log_path} 2>&1")

        shell("echo 'Done' >> {log}")

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper to generate summary graphical HTML report from several Bismark text report files."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
cmds = ["bismark2summary {extra}"]

# basename
bam = snakemake.input.get("bam", None)
if not bam:
    raise ValueError(
        "bismark/bismark2summary: Please specify aligned BAM file path"
        " (one or several) using 'bam=..'"
    )

html = snakemake.output.get("html", None)
txt = snakemake.output.get("txt", None)
if not html or not txt:
    raise ValueError(
        "bismark/bismark2summary: Please specify both 'html=..' and"
        " 'txt=..' paths in output section"
    )

basename, ext = os.path.splitext(html)
if ext.lower() != ".html":
    raise ValueError(
        "bismark/bismark2summary: HTML report file should end"
        " with suffix '.html' but was {} ({})".format(ext, html)
    )

suggested_txt = basename + ".txt"
if suggested_txt != txt:
    raise ValueError(
        "bismark/bismark2summary: Expected '{}' TXT report, "
        "but was: '{}'".format(suggested_txt, txt)
    )

cmds.append("--basename {basename:q}")

# title
title = snakemake.params.get("title", None)
if title:
    cmds.append("--title {title:q}")

cmds.append("{bam}")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")

# run shell command:
shell(" ".join(cmds))
BISMARK_GENOME_PREPARATION

Generate indexes for Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

# For *.fa file
rule bismark_genome_preparation_fa:
    input:
        "indexes/{genome}/{genome}.fa"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        ""  # optional params string
    wrapper:
        "0.53.0/bio/bismark/bismark_genome_preparation"

# Fo *.fa.gz file:
rule bismark_genome_preparation_fa_gz:
    input:
        "indexes/{genome}/{genome}.fa.gz"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        ""  # optional params string
    wrapper:
        "0.53.0/bio/bismark/bismark_genome_preparation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark indexes preparing using bismark_genome_preparation."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


from os import path
from snakemake.shell import shell

input_dir = path.dirname(snakemake.input[0])

params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell("bismark_genome_preparation --verbose --bowtie2 {params_extra} {input_dir} {log}")
BISMARK_METHYLATION_EXTRACTOR

Call methylation counts from Bismark alignment results (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
  • perl-gdgraph == 1.54
Example

This wrapper can be used in the following way:

rule bismark_methylation_extractor:
    input: "bams/{sample}.bam"
    output:
        mbias_r1="qc/meth/{sample}.M-bias_R1.png",
        # Only for PE BAMS:
        # mbias_r2="qc/meth/{sample}.M-bias_R2.png",

        mbias_report="meth/{sample}.M-bias.txt",
        splitting_report="meth/{sample}_splitting_report.txt",

        # 1-based start, 1-based end ('inclusive') methylation info: % and counts
        methylome_CpG_cov="meth_cpg/{sample}.bismark.cov.gz",
        # BedGraph with methylation percentage: 0-based start, end exclusive
        methylome_CpG_mlevel_bedGraph="meth_cpg/{sample}.bedGraph.gz",

        # Primary output files: methylation status at each read cytosine position: (extremely large)
        read_base_meth_state_cpg="meth/CpG_context_{sample}.txt.gz",
        # * You could merge CHG, CHH using: --merge_non_CpG
        read_base_meth_state_chg="meth/CHG_context_{sample}.txt.gz",
        read_base_meth_state_chh="meth/CHH_context_{sample}.txt.gz"
    log:
        "logs/meth/{sample}.log"
    params:
        output_dir="meth",  # optional output dir
        extra="--gzip --comprehensive --bedGraph"  # optional params string
    wrapper:
        "0.53.0/bio/bismark/bismark_methylation_extractor"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark methylation extractor tool: bismark_methylation_extractor."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark_methylation_extractor {params_extra}"]

# output dir
output_dir = snakemake.params.get("output_dir", "")
if output_dir:
    cmdline_args.append("-o {output_dir:q}")

# trimming options
trimming_options = [
    "ignore",  # meth_bias_r1_5end
    "ignore_3prime",  # meth_bias_r1_3end
    "ignore_r2",  # meth_bias_r2_5end
    "ignore_3prime_r2",  # meth_bias_r2_3end
]
for key in trimming_options:
    value = snakemake.params.get(key, None)
    if value:
        cmdline_args.append("--{} {}".format(key, value))

# Input
cmdline_args.append("{snakemake.input}")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))

key2prefix_suffix = [
    ("mbias_report", ("", ".M-bias.txt")),
    ("mbias_r1", ("", ".M-bias_R1.png")),
    ("mbias_r2", ("", ".M-bias_R2.png")),
    ("splitting_report", ("", "_splitting_report.txt")),
    ("methylome_CpG_cov", ("", ".bismark.cov.gz")),
    ("methylome_CpG_mlevel_bedGraph", ("", ".bedGraph.gz")),
    ("read_base_meth_state_cpg", ("CpG_context_", ".txt.gz")),
    ("read_base_meth_state_chg", ("CHG_context_", ".txt.gz")),
    ("read_base_meth_state_chh", ("CHH_context_", ".txt.gz")),
]

log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (key, (prefix, suffix)) in key2prefix_suffix:
    exp_path = snakemake.output.get(key, None)
    if exp_path:
        if len(snakemake.input) != 1:
            raise ValueError(
                "bismark/bismark_methylation_extractor: Error: only one BAM file is"
                " expected in input, but was <{}>".format(snakemake.input)
            )
        bam_file = snakemake.input[0]
        bam_name = os.path.basename(bam_file)
        bam_wo_ext = os.path.splitext(bam_name)[0]

        actual_path = os.path.join(output_dir, prefix + bam_wo_ext + suffix)
        if exp_path != actual_path:
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
DEDUPLICATE_BISMARK

Deduplicate Bismark Bam Files and saves as *.bam file (see https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark).

Software dependencies
  • bowtie2 == 2.3.4.3
  • bismark == 0.22.1
  • samtools == 1.9
Example

This wrapper can be used in the following way:

rule deduplicate_bismark:
    input: "bams/a_genome_pe.bam"
    output:
        bam="bams/{sample}.deduplicated.bam",
        report="bams/{sample}.deduplication_report.txt",
    log:
        "logs/bams/{sample}.deduplicated.log",
    params:
        extra=""  # optional params string
    wrapper:
        "0.53.0/bio/bismark/deduplicate_bismark"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for Bismark aligned reads deduplication using deduplicate_bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

bam_path = snakemake.output.get("bam", None)
report_path = snakemake.output.get("report", None)
if not bam_path or not report_path:
    raise ValueError(
        "bismark/deduplicate_bismark: Please specify both 'bam=..' and 'report=..' paths in output section"
    )

output_dir = os.path.dirname(bam_path)
if output_dir != os.path.dirname(report_path):
    raise ValueError(
        "bismark/deduplicate_bismark: BAM and Report files expected to have the same parent directory"
        " but was {} and {}".format(bam_path, report_path)
    )

arg_output_dir = "--output_dir '{}'".format(output_dir) if output_dir else ""
arg_multiple = "--multiple" if len(snakemake.input) > 1 else ""

params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
    "deduplicate_bismark {params_extra} --bam {arg_multiple}"
    " {arg_output_dir} {snakemake.input} {log}"
)

# Move outputs into proper position.
fst_input_filename = os.path.basename(snakemake.input[0])
fst_input_basename = os.path.splitext(fst_input_filename)[0]
prefix = os.path.join(output_dir, fst_input_basename)

deduplicated_bam_actual_name = prefix + ".deduplicated.bam"
if arg_multiple:
    # bismark does it exactly like this:
    deduplicated_bam_actual_name = deduplicated_bam_actual_name.replace(
        "deduplicated", "multiple.deduplicated", 1
    )

expected_2_actual_paths = [
    (bam_path, deduplicated_bam_actual_name),
    (
        report_path,
        prefix + (".multiple" if arg_multiple else "") + ".deduplication_report.txt",
    ),
]
for (exp_path, actual_path) in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")

BOWTIE2

For bowtie2, the following wrappers are available:

BOWTIE2

Map reads with bowtie2.

Software dependencies
  • bowtie2 ==2.4.1
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule bowtie2:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bowtie2/{sample}.log"
    params:
        index="index/genome",  # prefix of reference genome index (built with bowtie2-build)
        extra=""  # optional parameters
    threads: 8  # Use at least two threads
    wrapper:
        "0.53.0/bio/bowtie2/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "-U {}".format(*snakemake.input.sample)
else:
    reads = "-1 {} -2 {}".format(*snakemake.input.sample)

shell(
    "(bowtie2 --threads {snakemake.threads} {snakemake.params.extra} "
    "-x {snakemake.params.index} {reads} "
    "| samtools view -Sbh -o {snakemake.output[0]} -) {log}"
)

BUSCO

Assess assembly and annotation completeness with BUSCO

Software dependencies
  • python ==3.6
  • busco
Example

This wrapper can be used in the following way:

rule run_busco:
    input:
        "sample_data/target.fa"
    output:
        "txome_busco/full_table_txome_busco.tsv",
    log:
        "logs/quality/transcriptome_busco.log"
    threads: 8
    params:
        mode="transcriptome",
        lineage_path="sample_data/example",
        # optional parameters
        extra=""
    wrapper:
        "0.53.0/bio/busco"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for BUSCO assessment"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode is not None, "please input a run mode: genome, transcriptome or proteins"
lineage = snakemake.params.get("lineage_path")
assert lineage is not None, "please input the path to a lineage for busco assessment"

# busco does not allow you to direct output location: handle this by moving output
outdir = path.dirname(snakemake.output[0])
if "/" in outdir:
    out_name = path.basename(outdir)
else:
    out_name = outdir

# note: --force allows snakemake to handle rewriting files as necessary
# without needing to specify *all* busco outputs as snakemake outputs
shell(
    "run_busco --in {snakemake.input} --out {out_name} --force "
    " --cpu {snakemake.threads} --mode {mode} --lineage {lineage} "
    " {extra} {log}"
)

busco_outname = "run_" + out_name

# move to intended location
shell("cp -r {busco_outname}/* {outdir}")
shell("rm -rf {busco_outname}")

BWA

For bwa, the following wrappers are available:

BWA ALN

Map reads with bwa aln.

Software dependencies
  • bwa ==0.7.17
Example

This wrapper can be used in the following way:

rule bwa_aln:
    input:
        "reads/{sample}.{pair}.fastq"
    output:
        "sai/{sample}.{pair}.sai"
    params:
        index="genome",
        extra=""
    log:
        "logs/bwa_aln/{sample}.{pair}.log"
    threads: 8
    wrapper:
        "0.53.0/bio/bwa/aln"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa aln."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "bwa aln"
    " {extra}"
    " -t {snakemake.threads}"
    " {snakemake.params.index}"
    " {snakemake.input[0]}"
    " > {snakemake.output[0]} {log}"
)
BWA INDEX

Creates a BWA index.

Software dependencies
  • bwa ==0.7.17
Example

This wrapper can be used in the following way:

rule bwa_index:
    input:
        "{genome}.fasta"
    output:
        "{genome}.amb",
        "{genome}.ann",
        "{genome}.bwt",
        "{genome}.pac",
        "{genome}.sa"
    log:
        "logs/bwa_index/{genome}.log"
    params:
        prefix="{genome}",
        algorithm="bwtsw"
    wrapper:
        "0.53.0/bio/bwa/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
    raise ValueError("Only one reference genome can be inputed!")

# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")

if len(prefix) > 0:
    prefix = "-p " + prefix

# Contrunction algorithm that will be used to build the database, default is bwtsw
construction_algorithm = snakemake.params.get("algorithm", "")

if len(construction_algorithm) != 0:
    construction_algorithm = "-a " + construction_algorithm

shell(
    "bwa index" " {prefix}" " {construction_algorithm}" " {snakemake.input[0]}" " {log}"
)
BWA MEM

Map reads using bwa mem, with optional sorting using samtools or picard.

Software dependencies
  • bwa ==0.7.17
  • samtools ==1.9
  • picard ==2.20.1
Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="genome",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="none",             # Can be 'none', 'samtools' or 'picard'.
        sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
        sort_extra=""            # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "0.53.0/bio/bwa/mem"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Julian de Ruiter
Code
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}"
)
BWA MEM SAMBLASTER

Map reads using bwa mem, mark duplicates by samblaster and sort and index by sambamba.

Software dependencies
  • bwa ==0.7.17
  • sambamba ==0.7.1
  • samblaster ==0.1.24
Example

This wrapper can be used in the following way:

rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        bam="mapped/{sample}.bam",
        index="mapped/{sample}.bam.bai"
    log:
        "logs/bwa_mem_sambamba/{sample}.log"
    params:
        index="genome",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort_extra="" # Extra args for sambamba.
    threads: 8
    wrapper:
        "0.53.0/bio/bwa/mem-samblaster"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Schröder
Code
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
    1,
    2,
}:
    raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | samblaster"
    " {samblaster_extra}"
    " | sambamba view -S -f bam /dev/stdin"
    " -t {snakemake.threads}"
    " | sambamba sort /dev/stdin"
    " -t {snakemake.threads}"
    " -o {snakemake.output.bam}"
    " {sort_extra}"
    ") {log}"
)
BWA SAMPE

Map paired-end reads with bwa sampe.

Software dependencies
  • bwa ==0.7.17
  • samtools ==1.9
  • picard ==2.20.1
Example

This wrapper can be used in the following way:

rule bwa_sampe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"]
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
        sort="none",                                 # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'`
        sort_order="queryname",                      # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""                                # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_sampe/{sample}.log"
    wrapper:
        "0.53.0/bio/bwa/sampe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa sampe."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Check inputs.
if not len(snakemake.input.sai) == 2:
    raise ValueError("input.sai must have 2 elements")

if not len(snakemake.input.fastq) == 2:
    raise ValueError("input.fastq must have 2 elements")

# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa sampe"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}"
)
BWA SAMSE

Map single-end reads with bwa samse.

Software dependencies
  • bwa ==0.7.17
  • samtools ==1.9
  • picard ==2.20.1
Example

This wrapper can be used in the following way:

rule bwa_samse:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai"
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
        sort="none",                                 # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'`
        sort_order="queryname",                      # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""                                # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_samse/{sample}.log"
    wrapper:
        "0.53.0/bio/bwa/samse"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for bwa sampe."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
extra = snakemake.params.get("extra", "")

sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":

    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"

elif sort == "samtools":

    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"

    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"

    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"

elif sort == "picard":

    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} INPUT=/dev/stdin"
        " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
    )

else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa samse"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}"
)

CAIROSVG

Convert SVG files with cairosvg.

Software dependencies
  • cairosvg =2.4.2
Example

This wrapper can be used in the following way:

rule:
    input:
        "{prefix}.svg"
    output:
        "{prefix}.{fmt,(pdf|png)}"
    wrapper:
        "0.53.0/utils/cairosvg"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

_, ext = os.path.splitext(snakemake.output[0])

if ext not in (".png", ".pdf", ".ps", ".svg"):
    raise ValueError("invalid file extension: '{}'".format(ext))
fmt = ext[1:]

shell("cairosvg -f {fmt} {snakemake.input[0]} -o {snakemake.output[0]}")

CLUSTALO

Multiple alignment of nucleic acid and protein sequences.

Software dependencies
  • clustalo ==1.2.4
Example

This wrapper can be used in the following way:

rule clustalo:
    input:
        "{sample}.fa"
    output:
        "{sample}.msa.fa"
    params:
        extra=""
    log:
        "logs/clustalo/test/{sample}.log"
    threads: 8
    wrapper:
        "0.53.0/bio/clustalo"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
"""Snakemake wrapper for clustal omega."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Formats the log redrection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "clustalo {extra}"
    " --threads={snakemake.threads}"
    " --in {snakemake.input[0]}"
    " --out {snakemake.output[0]} "
    " {log}"
)

CUTADAPT

For cutadapt, the following wrappers are available:

CUTADAPT-PE

Trim paired-end reads using cutadapt.

Software dependencies
  • cutadapt ==2.5
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        fastq1="trimmed/{sample}.1.fastq",
        fastq2="trimmed/{sample}.2.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        # https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
        adapters = "-a AGAGCACACGTCTGAACTCCAGTCAC -g AGATCGGAAGAGCACACGT -A AGAGCACACGTCTGAACTCCAGTCAC -G AGATCGGAAGAGCACACGT",
        # https://cutadapt.readthedocs.io/en/stable/guide.html#
        others = "--minimum-length 1 -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    threads: 4 # set desired number of threads here
    wrapper:
        "0.53.0/bio/cutadapt/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
  • David Laehnemann
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cutadapt"
    " {snakemake.params.adapters}"
    " {snakemake.params.others}"
    " -o {snakemake.output.fastq1}"
    " -p {snakemake.output.fastq2}"
    " -j {snakemake.threads}"
    " {snakemake.input}"
    " > {snakemake.output.qc} {log}"
)
CUTADAPT-SE

Trim single-end reads using cutadapt.

Software dependencies
  • cutadapt ==2.5
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule cutadapt:
    input:
        "reads/{sample}.fastq"
    output:
        fastq="trimmed/{sample}.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    threads: 4 # set desired number of threads here
    wrapper:
        "0.53.0/bio/cutadapt/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cutadapt"
    " {snakemake.params}"
    " -j {snakemake.threads}"
    " -o {snakemake.output.fastq}"
    " {snakemake.input[0]}"
    " > {snakemake.output.qc} {log}"
)

DELLY

Call variants with delly.

Software dependencies
  • delly ==0.8.1
Example

This wrapper can be used in the following way:

rule delly:
    input:
        ref="genome.fasta",
        samples=["mapped/a.bam"],
        # optional exclude template (see https://github.com/dellytools/delly)
        exclude="human.hg19.excl.tsv"
    output:
        "sv/calls.bcf"
    params:
        extra=""  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log"
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "0.53.0/bio/delly"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


try:
    exclude = "-x " + snakemake.input.exclude
except AttributeError:
    exclude = ""


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "OMP_NUM_THREADS={snakemake.threads} delly call {extra} "
    "{exclude} -g {snakemake.input.ref} "
    "-o {snakemake.output[0]} {snakemake.input.samples} {log}"
)

EPIC

For epic, the following wrappers are available:

EPIC

Find broad enriched domains in ChIP-Seq data with epic

Software dependencies
  • epic =0.2.7
  • pandas =0.22.0
Example

This wrapper can be used in the following way:

rule epic:
    input:
      treatment = "bed/test.bed",
      background = "bed/control.bed"
    output:
      enriched_regions = "epic/enriched_regions.csv", # required
      bed = "epic/enriched_regions.bed", # optional
      matrix = "epic/matrix.gz" # optional
    log:
        "logs/epic/epic.log"
    params:
      genome = "hg19", # optional, default hg19
      extra="-g 3 -w 200" # "--bigwig epic/bigwigs"
    threads: 1 # optional, defaults to 1
    wrapper:
        "0.53.0/bio/epic/peaks"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • All/any of the different bigwig options must be given as extra parameters
Authors
  • Endre Bakken Stovner
Code
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1

treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")

# Executed shell command
enriched_regions = snakemake.output.get("enriched_regions")

bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")

if len(snakemake.log) > 0:
    log = snakemake.log[0]

genome = snakemake.params.get("genome")

cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"

if bed:
    cmd += " -b {bed}"
if matrix:
    cmd += " -sm {matrix}"
if log:
    cmd += " -l {log}"

cmd += " {extra}"

shell(cmd)

FASTP

trim and QC fastq reads with fastp

Software dependencies
  • fastp ==0.20.0
Example

This wrapper can be used in the following way:

rule fastp_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        trimmed="trimmed/se/{sample}.fastq",
        html="report/se/{sample}.html",
        json="report/se/{sample}.json"
    log:
        "logs/fastp/se/{sample}.log"
    params:
        extra=""
    threads: 1
    wrapper:
        "0.53.0/bio/fastp"


rule fastp_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
        html="report/pe/{sample}.html",
        json="report/pe/{sample}.json"
    log:
        "logs/fastp/pe/{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "0.53.0/bio/fastp"

rule fastp_pe_wo_trimming:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        html="report/pe_wo_trimming/{sample}.html",
        json="report/pe_wo_trimming/{sample}.json"
    log:
        "logs/fastp/pe_wo_trimming/{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "0.53.0/bio/fastp"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
Code
__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "sebastian.kurscheid@anu.edu.au"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "--in1 {}".format(snakemake.input.sample)
else:
    reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)

trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths is not None:
    if n == 1:
        trimmed = "--out1 {}".format(snakemake.output.trimmed)
    else:
        trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)
else:
    trimmed = ""

html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)

shell(
    "(fastp --thread {snakemake.threads} {snakemake.params.extra} "
    "{reads} "
    "{trimmed} "
    "{json} "
    "{html} ) {log}"
)

FASTQ_SCREEN

fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

This wrapper allows the configuration to be passed as a filename or as a dictionary in the rule’s params.fastq_screen_config of the rule. So the following configuration file:

DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE2
DATABASE      ecoli   /data/Escherichia_coli/Bowtie2Index/genome      BOWTIE
DATABASE      hg19    /data/hg19/Bowtie2Index/genome  BOWTIE2
DATABASE      mm10    /data/mm10/Bowtie2Index/genome  BOWTIE2
BOWTIE        /path/to/bowtie
BOWTIE2       /path/to/bowtie2

becomes:

fastq_screen_config = {
 'database': {
   'ecoli': {
     'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
     'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
   'hg19': {
     'bowtie2': '/data/hg19/Bowtie2Index/genome'},
   'mm10': {
     'bowtie2': '/data/mm10/Bowtie2Index/genome'}
 },
 'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}

By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".

Software dependencies
  • fastq-screen ==0.5.2
  • bowtie2 ==2.2.6
  • bowtie ==1.1.2
Example

This wrapper can be used in the following way:

rule fastq_screen:
    input:
        "samples/{sample}.fastq"
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png"
    params:
        fastq_screen_config="fastq_screen.conf",
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "0.53.0/bio/fastq_screen"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
  • While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.
  • When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environmental variable is set (see the tempfile docs) for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the snakefile calling this wrapper.
Authors
  • Ryan Dale
Code
import os
import re
from snakemake.shell import shell
import tempfile

__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"

_config = snakemake.params["fastq_screen_config"]

subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, "w") as fout:
        for label, indexes in _config["database"].items():
            for aligner, index in indexes.items():
                fout.write(
                    "\t".join(["DATABASE", label, index, aligner.upper()]) + "\n"
                )
        for aligner, path in _config["aligner_paths"].items():
            fout.write("\t".join([aligner.upper(), path]) + "\n")
    config_file = tmp
else:
    config_file = _config

# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(".fastq|.fq|.txt|.seq", os.path.basename(snakemake.input[0]))[0]

tempdir = tempfile.mkdtemp()

shell(
    "fastq_screen --outdir {tempdir} "
    "--force "
    "--aligner {aligner} "
    "--conf {config_file} "
    "--subset {subset} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{snakemake.input[0]} "
    "{log}"
)

# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")

# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
    shell("rm {tmp}")

FASTQC

Generate fastq qc statistics using fastqc.

Software dependencies
  • fastqc ==0.11.8
Example

This wrapper can be used in the following way:

rule fastqc:
    input:
        "reads/{sample}.fastq"
    output:
        html="qc/fastqc/{sample}.html",
        zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
    params: ""
    log:
        "logs/fastqc/{sample}.log"
    wrapper:
        "0.53.0/bio/fastqc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for fastqc."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)

    split_ind = 2 if base.endswith(".fastq.gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} --quiet "
        "--outdir {tempdir} {snakemake.input[0]}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path} {snakemake.output.html}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path} {snakemake.output.zip}")

FGBIO

For fgbio, the following wrappers are available:

FGBIO ANNOTATEBAMWITHUMIS

Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule AnnotateBam:
    input:
        bam="mapped/{sample}.bam",
        umi="umi/{sample}.fastq"
    output:
        "mapped/{sample}.annotated.bam"
    params: ""
    log:
        "logs/fgbio/annotate_bam/{sample}.log"
    wrapper:
        "0.53.0/bio/fgbio/annotatebamwithumis"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input.bam

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not isinstance(bam_input, str):
    raise ValueError("Input bam should be a string: " + str(bam_input) + "!")

umi_input = snakemake.input.umi

if umi_input is None:
    raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
    raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")

if not len(snakemake.output) == 1:
    raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
output_file = snakemake.output[0]


if output_file is None:
    raise ValueError("Missing output file!")
elif not isinstance(output_file, str):
    raise ValueError("Output bam-file should be a string: " + str(output_file) + "!")

shell(
    "fgbio AnnotateBamWithUmis"
    " -i {bam_input}"
    " -f {umi_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)
FGBIO CALLMOLECULARCONSENSUSREADS

Calls consensus sequences from reads with the same unique molecular tag.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule ConsensusReads:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.m3.bam"
    params:
        extra="-M 3"
    log:
        "logs/fgbio/consensus_reads/{sample}.log"
    wrapper:
        "0.53.0/bio/fgbio/callmolecularconsensusreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

if not isinstance(output_file, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell(
    "fgbio CallMolecularConsensusReads"
    " -i {bam_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)
FGBIO COLLECTDUPLEXSEQMETRICS

Collects a suite of metrics to QC duplex sequencing data.g.

Software dependencies
  • fgbio ==0.6.1
  • r-ggplot2
Example

This wrapper can be used in the following way:

rule CollectDuplexSeqMetrics:
    input:
        "mapped/{sample}.gu.bam"
    output:
        family_sizes="stats/{sample}.family_sizes.txt",
        duplex_family_sizes="stats/{sample}.duplex_family_sizes.txt",
        duplex_yield_metrics="stats/{sample}.duplex_yield_metrics.txt",
        umi_counts="stats/{sample}.umi_counts.txt",
        duplex_qc="stats/{sample}.duplex_qc.pdf",
        duplex_umi_counts="stats/{sample}.duplex_umi_counts.txt",
    params:
        extra=lambda wildcards: "-d " + wildcards.sample
    log:
        "logs/fgbio/collectduplexseqmetrics/{sample}.log"
    wrapper:
        "0.53.0/bio/fgbio/collectduplexseqmetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from os import path

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

family_sizes = snakemake.output.family_sizes
duplex_family_sizes = snakemake.output.duplex_family_sizes
duplex_yield_metrics = snakemake.output.duplex_yield_metrics
umi_counts = snakemake.output.umi_counts
duplex_qc = snakemake.output.duplex_qc
duplex_umi_counts = snakemake.output.get("duplex_umi_counts", None)

file_path = str(path.dirname(family_sizes))
name = str(path.basename(family_sizes)).split(".")[0]
path_name_prefix = str(path.join(file_path, name))

if not family_sizes == path_name_prefix + ".family_sizes.txt":
    raise Exception(
        "Unexpected family_sizes path/name format, expected {}, got {}.".format(
            path_name_prefix + ".family_sizes.txt", family_sizes
        )
    )
if not duplex_family_sizes == path_name_prefix + ".duplex_family_sizes.txt":
    raise Exception(
        "Unexpected duplex_family_sizes path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_family_sizes.txt", duplex_family_sizes
        )
    )
if not duplex_yield_metrics == path_name_prefix + ".duplex_yield_metrics.txt":
    raise Exception(
        "Unexpected duplex_yield_metrics path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_yield_metrics.txt", duplex_yield_metrics
        )
    )
if not umi_counts == path_name_prefix + ".umi_counts.txt":
    raise Exception(
        "Unexpected umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".umi_counts.txt", umi_counts
        )
    )
if not duplex_qc == path_name_prefix + ".duplex_qc.pdf":
    raise Exception(
        "Unexpected duplex_qc path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_qc.pdf", duplex_qc
        )
    )
if (
    duplex_umi_counts is not None
    and not duplex_umi_counts == path_name_prefix + ".duplex_umi_counts.txt"
):
    raise Exception(
        "Unexpected duplex_umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
            path_name_prefix + ".duplex_umi_counts.txt", duplex_umi_counts
        )
    )

duplex_umi_counts_flag = ""
if duplex_umi_counts is not None:
    duplex_umi_counts_flag = "-u "

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

shell(
    "fgbio CollectDuplexSeqMetrics"
    " -i {bam_input}"
    " -o {path_name_prefix}"
    " {duplex_umi_counts_flag}"
    " {extra_params}"
    " {log}"
)
FGBIO FILTERCONSENSUSREADS

Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule FilterConsensusReads:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.filtered.bam"
    params:
        extra="",
        min_base_quality=2,
        min_reads=[2, 2, 2],
        ref="genome.fasta"
    log:
        "logs/fgbio/filterconsensusreads/{sample}.log"
    threads: 1
    wrapper:
        "0.53.0/bio/fgbio/filterconsensusreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • min_base_quality: a single value (Int). Mask (make N) consensus bases with quality less than this threshold. (default: 5)
  • min_reads: n array of Ints, max length 3, min length 1. Number of reads that need to support a UMI. For filtering bam files processed with CallMolecularConsensusReads one value is required. 3 values can be provided for bam files processed with CallDuplexConsensusReads, if fewer than 3 are provided the last value will be repeated, the first value is for the final consensus sequence and the two last for each strands consensus.
  • For more inforamtion see, http://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

min_base_quality = snakemake.params.get("min_base_quality", None)
if not isinstance(min_base_quality, int):
    raise ValueError("min_base_quality needs to be provided as an Int!")

min_reads = snakemake.params.get("min_reads", None)
if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
    raise ValueError(
        "min_reads needs to be provided as list of Ints, min length 1, max length 3!"
    )

ref = snakemake.params.get("ref", None)
if ref is None:
    raise ValueError("A reference needs to be provided!")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

bam_output = snakemake.output[0]

if not isinstance(bam_output, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(bam_output) + "!")

shell(
    "fgbio FilterConsensusReads"
    " -i {bam_input}"
    " -o {bam_output}"
    " -r {ref}"
    " --min-reads {min_reads}"
    " --min-base-quality {min_base_quality}"
    " {extra_params}"
    " {log}"
)
FGBIO GROUPREADSBYUMI

Groups reads together that appear to have come from the same original molecule.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule GroupReads:
    input:
        "mapped/a.bam"
    output:
        bam="mapped/{sample}.gu.bam",
        hist="mapped/{sample}.gu.histo.tsv",
    params:
        extra="-s adjacency --edits 1"
    log:
        "logs/fgbio/group_reads/{sample}.log"
    wrapper:
        "0.53.0/bio/fgbio/groupreadsbyumi"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_bam_file = snakemake.output.bam

if not isinstance(output_bam_file, str) and len(output_bam_file) != 1:
    raise ValueError("Bam output should be one bam file: " + str(output_bam_file) + "!")

output_histo_file = snakemake.output.hist

if not isinstance(output_histo_file, str) and len(output_histo_file) != 1:
    raise ValueError(
        "Histo output should be one histogram file path: "
        + str(output_histo_file)
        + "!"
    )

shell(
    "fgbio GroupReadsByUmi"
    " -i {bam_input}"
    " -o {output_bam_file}"
    " -f {output_histo_file}"
    " {extra_params}"
    " {log}"
)
FGBIO SETMATEINFORMATION

Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), MC (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.

Software dependencies
  • fgbio ==0.6.1
Example

This wrapper can be used in the following way:

rule SetMateInfo:
    input:
        "mapped/a.bam"
    output:
        "mapped/{sample}.mi.bam"
    params: ""
    log:
        "logs/fgbio/set_mate_info/{sample}.log"
    wrapper:
        "0.53.0/bio/fgbio/setmateinformation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra_params = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if not isinstance(bam_input, str) and len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")

output_file = snakemake.output[0]

if not isinstance(output_file, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")

shell(
    "fgbio SetMateInformation"
    " -i {bam_input}"
    " -o {output_file}"
    " {extra_params}"
    " {log}"
)

FILTLONG

Quality filtering tool for long reads.

Software dependencies
  • filtlong=0.2.0=he941832_2
Example

This wrapper can be used in the following way:

rule filtlong:
    input:
        reads = "{sample}.fastq"
    output:
        "{sample}.filtered.fastq"
    params:
        extra=" --mean_q_weight 5.0",
        target_bases = 10
    log:
        "logs/filtlong/test/{sample}.log"
    wrapper:
        "0.53.0/bio/filtlong"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
"""Snakemake wrapper for filtlong."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
target_bases = int(snakemake.params.get("target_bases", 0))
if target_bases > 0:
    extra += " --target_bases {}".format(target_bases)

# Formats the log redrection string
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Executed shell command
shell("filtlong {extra}" " {snakemake.input.reads} > {snakemake.output} {log}")

FREEBAYES

Call small genomic variants with freebayes.

Software dependencies
  • freebayes ==1.3.1
  • bcftools ==1.10
  • parallel ==20190522
  • bedtools >=2.29
  • sed ==4.7
Example

This wrapper can be used in the following way:

rule freebayes:
    input:
        ref="genome.fasta",
        # you can have a list of samples here
        samples="mapped/{sample}.bam"
        # optional BED file specifying chromosomal regions on which freebayes
        # should run, e.g. all regions that show coverage
        #regions="/path/to/region-file.bed"
    output:
        "calls/{sample}.vcf"  # either .vcf or .bcf
    log:
        "logs/freebayes/{sample}.log"
    params:
        extra="",         # optional parameters
        chunksize=100000  # reference genome chunk size for parallelization (default: 100000)
    threads: 2
    wrapper:
        "0.53.0/bio/freebayes"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Felix Mölder
Code
__author__ = "Johannes Köster, Felix Mölder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = snakemake.params.get("extra", "")

pipe = ""
if snakemake.output[0].endswith(".bcf"):
    pipe = "| bcftools view -Ob -"

if snakemake.threads == 1:
    freebayes = "freebayes"
else:
    chunksize = snakemake.params.get("chunksize", 100000)
    regions = "<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format(
        snakemake=snakemake, chunksize=chunksize
    )
    if snakemake.input.get("regions", ""):
        regions = (
            "<(bedtools intersect -a "
            r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
            "{regions}) -b {snakemake.input.regions} | "
            r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
        ).format(regions=regions, snakemake=snakemake)
    freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format(
        snakemake=snakemake, regions=regions
    )

shell(
    "({freebayes} {params} -f {snakemake.input.ref}"
    " {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}"
)

GATK

For gatk, the following wrappers are available:

GATK BASERECALIBRATOR

Run gatk BaseRecalibrator and ApplyBQSR in one step.

Software dependencies
  • gatk4 ==4.1.4.1
  • openjdk =8
Example

This wrapper can be used in the following way:

rule gatk_bqsr:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz"  # optional known sites
    output:
        bam="recal/{sample}.bam"
    log:
        "logs/gatk/bqsr/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.53.0/bio/gatk/baserecalibrator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from tempfile import TemporaryDirectory
import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

with TemporaryDirectory() as tmpdir:
    recal_table = os.path.join(tmpdir, "recal_table.grp")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
    known = snakemake.input.get("known", "")
    if known:
        known = "--known-sites {}".format(known)

    shell(
        "gatk --java-options '{java_opts}' BaseRecalibrator {extra} "
        "-R {snakemake.input.ref} -I {snakemake.input.bam} "
        "-O {recal_table} {known} {log}"
    )

    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    shell(
        "gatk --java-options '{java_opts}' ApplyBQSR -R {snakemake.input.ref} -I {snakemake.input.bam} "
        "--bqsr-recal-file {recal_table} "
        "-O {snakemake.output.bam} {log}"
    )
GATK COMBINEGVCFS

Run gatk CombineGVCFs.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule genotype_gvcfs:
    input:
        gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
        ref="genome.fasta"
    output:
        gvcf="calls/all.g.vcf",
    log:
        "logs/gatk/combinegvcfs.log"
    params:
        extra="",  # optional
        java_opts="",  # optional
    wrapper:
        "0.53.0/bio/gatk/combinegvcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
gvcfs = list(map("-V {}".format, snakemake.input.gvcfs))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' CombineGVCFs {extra} "
    "{gvcfs} "
    "-R {snakemake.input.ref} "
    "-O {snakemake.output.gvcf} {log}"
)
GATK GENOTYPEGVCFS

Run gatk GenotypeGVCFs.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule genotype_gvcfs:
    input:
        gvcf="calls/all.g.vcf",  # combined gvcf over multiple samples
        ref="genome.fasta"
    output:
        vcf="calls/all.vcf",
    log:
        "logs/gatk/genotypegvcfs.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.53.0/bio/gatk/genotypegvcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' GenotypeGVCFs {extra} "
    "-V {snakemake.input.gvcf} "
    "-R {snakemake.input.ref} "
    "-O {snakemake.output.vcf} {log}"
)
GATK HAPLOTYPECALLER

Run gatk HaplotypeCaller.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule haplotype_caller:
    input:
        # single or list of bam files
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
        # known="dbsnp.vcf"  # optional
    output:
        gvcf="calls/{sample}.g.vcf",
    log:
        "logs/gatk/haplotypecaller/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.53.0/bio/gatk/haplotypecaller"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell

known = snakemake.input.get("known", "")
if known:
    known = "--dbsnp " + known

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
bams = snakemake.input.bam
if isinstance(bams, str):
    bams = [bams]
bams = list(map("-I {}".format, bams))

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' HaplotypeCaller {extra} "
    "-R {snakemake.input.ref} {bams} "
    "-ERC GVCF "
    "-O {snakemake.output.gvcf} {known} {log}"
)
GATK MUTECT2

Call somatic SNVs and indels via local assembly of haplotypes

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule mutect2:
    input:
        fasta = "genome/genome.fasta",
        map = "mapped/{sample}.bam"
    output:
        vcf = "variant/{sample}.vcf"
    message:
        "Testing Mutect2 with {wildcards.sample}"
    threads:
        1
    log:
        "logs/mutect_{sample}.log"
    wrapper:
         "0.53.0/bio/gatk/mutect"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for GATK4 Mutect2"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "gatk Mutect2 "  # Tool and its subprocess
    "--input {snakemake.input.map} "  # Path to input mapping file
    "--output {snakemake.output.vcf} "  # Path to output vcf file
    "--reference {snakemake.input.fasta} "  # Path to reference fasta file
    "{extra} "  # Extra parameters
    "{log}"  # Logging behaviour
)
GATK SELECTVARIANTS

Run gatk SelectVariants.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule gatk_select:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.vcf"
    log:
        "logs/gatk/select/snvs.log"
    params:
        extra="--select-type-to-include SNP",  # optional filter arguments, see GATK docs
        java_opts="", # optional
    wrapper:
        "0.53.0/bio/gatk/selectvariants"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' SelectVariants -R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "{extra} -O {snakemake.output.vcf} {log}"
)
GATK SPLITNCIGARREADS

Run gatk SplitNCigarReads.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule splitncigarreads:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        "split/{sample}.bam"
    log:
        "logs/gatk/splitNCIGARreads/{sample}.log"
    params:
        extra="",  # optional
        java_opts="",  # optional
    wrapper:
        "0.53.0/bio/gatk/splitncigarreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' SplitNCigarReads {extra} "
    " -R {snakemake.input.ref} -I {snakemake.input.bam} "
    "-O {snakemake.output} {log}"
)
GATK VARIANTFILTRATION

Run gatk VariantFiltration.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

rule gatk_filter:
    input:
        vcf="calls/snvs.vcf",
        ref="genome.fasta",
    output:
        vcf="calls/snvs.filtered.vcf"
    log:
        "logs/gatk/filter/snvs.log"
    params:
        filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
        extra="",  # optional arguments, see GATK docs
        java_opts="", # optional
    wrapper:
        "0.53.0/bio/gatk/variantfiltration"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
filters = [
    "--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
    for name, expr in snakemake.params.filters.items()
]

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' VariantFiltration -R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "{extra} {filters} -O {snakemake.output.vcf} {log}"
)
GATK VARIANTRECALIBRATOR

Run gatk VariantRecalibrator.

Software dependencies
  • gatk4 ==4.1.4.1
Example

This wrapper can be used in the following way:

from snakemake.remote import GS

# GATK resource bundle files can be either directly obtained from google storage (like here), or
# from FTP. You can also use local files.
GS = GS.RemoteProvider()


def gatk_bundle(f):
    return GS.remote("genomics-public-data/resources/broad/hg38/v0/{}".format(f))


rule haplotype_caller:
    input:
        vcf="calls/all.vcf",
        ref="genome.fasta",
        # resources have to be given as named input files
        hapmap=gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz"),
        omni=gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz"),
        g1k=gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
        dbsnp=gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
        # use aux to e.g. download other necessary file
        aux=[gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz.tbi"),
             gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"),
             gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi")]
    output:
        vcf="calls/all.recal.vcf",
        tranches="calls/all.tranches"
    log:
        "logs/gatk/variantrecalibrator.log"
    params:
        mode="SNP",  # set mode, must be either SNP, INDEL or BOTH
        # resource parameter definition. Key must match named input files from above.
        resources={"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
                   "omni":   {"known": False, "training": True, "truth": False, "prior": 12.0},
                   "g1k":   {"known": False, "training": True, "truth": False, "prior": 10.0},
                   "dbsnp":  {"known": True, "training": False, "truth": False, "prior": 2.0}},
        annotation=["QD", "FisherStrand"],  # which fields to use with -an (see VariantRecalibrator docs)
        extra="",  # optional
        java_opts="", # optional
    wrapper:
        "0.53.0/bio/gatk/haplotypecaller"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
Authors
  • Johannes Köster
  • Jake VanCampen
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


import os

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")


def fmt_res(resname, resparams):
    fmt_bool = lambda b: str(b).lower()
    try:
        f = snakemake.input.get(resname)
    except KeyError:
        raise RuntimeError(
            "There must be a named input file for every resource (missing: {})".format(
                resname
            )
        )
    return "{},known={},training={},truth={},prior={}:{}".format(
        resname,
        fmt_bool(resparams["known"]),
        fmt_bool(resparams["training"]),
        fmt_bool(resparams["truth"]),
        resparams["prior"],
        f,
    )


resources = [
    "--resource {}".format(fmt_res(resname, resparams))
    for resname, resparams in snakemake.params["resources"].items()
]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = ""
if snakemake.output.tranches:
    tranches = "--tranches-file " + snakemake.output.tranches

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "gatk --java-options '{java_opts}' VariantRecalibrator {extra} {resources} "
    "-R {snakemake.input.ref} -V {snakemake.input.vcf} "
    "-mode {snakemake.params.mode} "
    "--output {snakemake.output.vcf} "
    "{tranches} {annotation} {log}"
)

GATK3

For gatk3, the following wrappers are available:

GATK3 BASERECALIBRATOR

Run gatk3 BaseRecalibrator.

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule baserecalibrator:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz"
    output:
        "{sample}.recal_data_table"
    log:
        "logs/gatk3/bqsr/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    threads: 16
    wrapper:
        "bio/gatk/baserecalibrator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows for additional arguments to be passed to the java compiler, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param alllows for additional program arguments.
  • For more inforamtion see, https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e it need to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + "  --knownSites {}".format(known)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T BaseRecalibrator"
    " -nct {snakemake.threads}"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    " {bed}"
    " -o {snakemake.output}"
    " {log}"
)
GATK3 INDELREALIGNER

Run gatk3 IndelRealigner

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule indelrealigner:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        known="dbsnp.vcf.gz",
        target_intervals="{sample}.intervals"
    output:
        bam="realigned/{sample}.bam"
    log:
        "logs/gatk3/indelrealigner/{sample}.log"
    params:
        extra="",  # optional
        java_opts="", # optional
    threads: 16
    wrapper:
        "bio/gatk/indelrealigner"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows for additional arguments to be passed to the java compiler, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param alllows for additional program arguments.
  • For more inforamtion see, https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e it need to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
input_target_intervals = snakemake.input.target_intervals

bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + " -known {}".format(known)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T IndelRealigner"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    " {bed}"
    " --targetIntervals {input_target_intervals}"
    " -o {snakemake.output}"
    " {log}"
)
GATK3 PRINTREADS

Run gatk3 PrintReads

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule printreads:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta",
        recal_data="{sample}.recal_data_table"
    output:
        "alignment/{sample}.bqsr.bam"
    log:
        "logs/gatk/bqsr/{sample}..log"
    params:
        extra="",  # optional
        java_opts="",
    threads: 16
    wrapper:
        "bio/gatk3/printreads"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows for additional arguments to be passed to the java compiler, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param alllows for additional program arguments.
  • For more inforamtion see, https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e it need to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_recal_data = snakemake.input.recal_data
input_ref = snakemake.input.ref

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "gatk3 {java_opts} -T PrintReads"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " -BQSR {input_recal_data}"
    " -o {snakemake.output}"
    " {log}"
)
GATK3 REALIGNERTARGETCREATOR

Run gatk3 RealignerTargetCreator

Software dependencies
  • gatk ==3.8
Example

This wrapper can be used in the following way:

rule realignertargetcreator:
    input:
        bam="mapped/{sample}.bam"
        ref="genome.fasta",
        known="dbsnp.vcf.gz"
    output:
        "{sample}.intervals"
    log:
        "logs/gatk/realignertargetcreator/{sample}.log"
    params:
        extra="",  # optional
        java_opts="",
    threads: 16
    wrapper:
        "bio/gatk3/realignertargetcreator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The java_opts param allows for additional arguments to be passed to the java compiler, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
  • The extra param alllows for additional program arguments.
  • For more inforamtion see, https://software.broadinstitute.org/gatk/documentation/article?id=11050
  • Gatk3.jar is not included in the bioconda package, i.e it need to be added to the conda environment manually.
Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")

input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
    bed = "-L " + bed
else:
    bed = ""

input_known_string = ""
for known in input_known:
    input_known_string = input_known_string + " --known {}".format(known)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


shell(
    "gatk3 {java_opts} -T RealignerTargetCreator"
    " -nt {snakemake.threads}"
    " {extra}"
    " -I {input_bam}"
    " -R {input_ref}"
    " {input_known_string}"
    " {bed}"
    " -o {snakemake.output}"
    " {log}"
)

HAP.PY

For hap.py, the following wrappers are available:

PRE.PY

Preprocessing/normalisation of vcf/bcf files. Part of the hap.py suite by Illumina (see https://github.com/Illumina/hap.py/blob/master/doc/normalisation.md).

Software dependencies
  • hap.py =0.3.10
Example

This wrapper can be used in the following way:

rule preprocess_variants:
    input:
        ##vcf/bcf
        variants="variants.vcf"
    output:
        "normalized/variants.vcf"
    params:
        ## path to reference genome
        genome="genome.fasta",
        ## parameters such as -L to left-align variants
        extra="-L"
    threads: 2
    wrapper:
        "0.53.0/bio/hap.py/pre.py"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

## Extract arguments
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "(pre.py"
    " --threads {snakemake.threads}"
    " -r {snakemake.params.genome}"
    " {extra}"
    " {snakemake.input.variants}"
    " {snakemake.output})"
    " {log}"
)

HISAT2

For hisat2, the following wrappers are available:

HISAT2 ALIGN

Map reads with hisat2.

Software dependencies
  • hisat2 ==2.1.0
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule hisat2_align:
    input:
      idx = "index/",
      reads=["reads/{sample}_R1.fastq", "reads/{sample}_R2.fastq"]
    output:
      "mapped/{sample}.bam"
    log:
        "logs/hisat2_align_{sample}.log"
    params:
      extra = ""
    threads: 2
    wrapper:
      "0.53.0/bio/hisat2/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • The -S flag must not be used since output is already directly piped to samtools for compression.
  • The –threads/-p flag must not be used since threads is set separately via the snakemake threads directive.
  • The wrapper does not yet handle SRA input accessions.
  • No reference index files checking is done since the actual number of files may differ depending on the reference sequence size. This is also why the index is supplied in the params directive instead of the input directive.
Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"


from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()

# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
    input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
    input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
    input_flags = "-1 {0} -2 {1}".format(*reads)
else:
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2" " input files."
    )

# Executed shell command
shell(
    "(hisat2 {extra} "
    "--threads {snakemake.threads} "
    " -x {snakemake.input.idx} {input_flags} "
    " | samtools view -Sbh -o {snakemake.output[0]} -) "
    " {log}"
)
HISAT2 INDEX

Create index with hisat2.

Software dependencies
  • hisat2 ==2.1.0
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule hisat2_index:
    input:
        fasta = "{genome}.fasta"
    output:
        directory("index_{genome}")
    params:
        prefix = "index_{genome}/"
    log:
        "logs/hisat2_index_{genome}.log"
    threads: 2
    wrapper:
        "0.53.0/bio/hisat2/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for HISAT2 index"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

import os
from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file or a sequence is required"
input_seq = ""
if not "." in fasta:
    input_seq += "-c "
input_seq += ",".join(fasta) if isinstance(fasta, list) else fasta

hisat_dir = snakemake.params.get("prefix", "")
if hisat_dir:
    os.makedirs(hisat_dir)

shell(
    "hisat2-build {extra} "
    "-p {snakemake.threads} "
    "{input_seq} "
    "{snakemake.params.prefix} "
    "{log}"
)

HMMER

For hmmer, the following wrappers are available:

HMMBUILD

hmmbuild: construct profile HMM(s) from multiple sequence alignment(s)

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmbuild_profile:
    input:
        "test-profile.sto"
    output:
        "test-profile.hmm"
    log:
        "logs/test-profile-hmmbuild.log"
    params:
        extra="",
    threads: 4
    wrapper:
        "0.53.0/bio/hmmer/hmmbuild"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmbuild"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    " hmmbuild {extra} --cpu {snakemake.threads} "
    " {snakemake.output} {snakemake.input} {log} "
)
HMMPRESS

Format an HMM database into a binary format for hmmscan.

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmpress_profile:
    input:
        "test-profile.hmm"
    output:
        "test-profile.hmm.h3f",
        "test-profile.hmm.h3i",
        "test-profile.hmm.h3m",
        "test-profile.hmm.h3p"
    log:
        "logs/hmmpress.log"
    params:
        extra="",
    threads: 4
    wrapper:
        "0.53.0/bio/hmmer/hmmpress"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmpress"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# -f Force; overwrites any previous hmmpress-ed datafiles. The default is to bitch about any existing files and ask you to delete them first.

shell("hmmpress -f {snakemake.input} {log}")
HMMSCAN

search protein sequence(s) against a protein profile database

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmscan_profile:
    input:
        fasta="test-protein.fa",
        profile="test-profile.hmm.h3f",
    output:
        # only one of these is required
        tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
        domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
        pfamtblout="test-prot-pfamtbl.txt", # save table of hits and domains to file, in Pfam format <f>
        outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
    log:
        "logs/hmmscan.log"
    params:
        evalue_threshold=0.00001,
        # if bitscore threshold provided, hmmscan will use that instead
        #score_threshold=50,
        extra="",
    threads: 4
    wrapper:
        "0.53.0/bio/hmmer/hmmscan"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmscan"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")

profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '

# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)

# save table of hits and domains to file, in Pfam format <f>
pfamtblout = snakemake.output.get("pfamtblout", "")
if pfamtblout:
    out_cmd += " --pfamtblout {} ".format(pfamtblout)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report models >= this score threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))

# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "hmmscan {out_cmd} {thresh_cmd} --cpu {snakemake.threads}"
    " {extra} {profile} {snakemake.input.fasta} {log}"
)
HMMSEARCH

search profile(s) against a sequence database

Software dependencies
  • hmmer=3.2.1
Example

This wrapper can be used in the following way:

rule hmmsearch_profile:
    input:
        fasta="test-protein.fa",
        profile="test-profile.hmm.h3f",
    output:
        # only one of these is required
        tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
        domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
        alignment_hits="test-prot-alignment-hits.txt", # Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
        outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
    log:
        "logs/hmmsearch.log"
    params:
        evalue_threshold=0.00001,
        # if bitscore threshold provided, hmmsearch will use that instead
        #score_threshold=50,
        extra="",
    threads: 4
    wrapper:
        "0.53.0/bio/hmmer/hmmsearch"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • N Tessa Pierce
Code
"""Snakemake wrapper for hmmsearch"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")

profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '

# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)

# Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
alignment_hits = snakemake.output.get("alignment_hits", "")
if alignment_hits:
    out_cmd += " -A {} ".format(alignment_hits)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report models >= this score threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))

# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    " hmmsearch --cpu {snakemake.threads} "
    " {out_cmd} {thresh_cmd} {extra} {profile} "
    " {snakemake.input.fasta} {log}"
)

IGV-REPORTS

Create self-contained igv.js HTML pages.

Software dependencies
  • igv-reports =0.9.1
Example

This wrapper can be used in the following way:

rule igv_report:
    input:
        fasta="minigenome.fa",
        vcf="variants.vcf",
        # any number of additional optional tracks, see igv-reports manual
        tracks=["alignments.bam"]
    output:
        "igv-report.html"
    params:
        extra=""  # optional params, see igv-reports manual
    log:
        "logs/igv-report.log"
    wrapper:
        "0.53.0/bio/igv-reports"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
"""Snakemake wrapper for igv-reports."""

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

tracks = snakemake.input.get("tracks", [])
if tracks:
    if isinstance(tracks, str):
        tracks = [tracks]
    tracks = "--tracks {}".format(" ".join(tracks))

shell(
    "create_report {extra} --standalone --output {snakemake.output[0]} {snakemake.input.vcf} {snakemake.input.fasta} {tracks} {log}"
)

INFERNAL

For infernal, the following wrappers are available:

INFERNAL CMPRESS

Starting from a CM database <cmfile> in standard Infernal-1.1 format, construct binary compressed datafiles for cmscan. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

Software dependencies
  • infernal=1.1.2
Example

This wrapper can be used in the following way:

rule infernal_cmpress:
    input:
        "test-covariance-model.cm"
    output:
        "test-covariance-model.cm.i1i",
        "test-covariance-model.cm.i1f",
        "test-covariance-model.cm.i1m",
        "test-covariance-model.cm.i1p"
    log:
        "logs/cmpress.log"
    params:
        extra="",
    wrapper:
        "0.53.0/bio/infernal/cmpress"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for Infernal CMpress"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# -F enables overwrite of old (otherwise cmpress will fail if old versions exist)
shell("cmpress -F {snakemake.input} {log}")
INFERNAL CMSCAN

cmscan is used to search sequences against collections of covariance models that have been prepared with cmpress. The output format is designed to be human- readable, but is often so voluminous that reading it is impractical, and parsing it is a pain. The –tblout option saves output in a simple tabular format that is concise and easier to parse. The -o option allows redirecting the main output, including throwing it away in /dev/null. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

Software dependencies
  • infernal=1.1.2
Example

This wrapper can be used in the following way:

rule cmscan_profile:
    input:
        fasta="test-transcript.fa",
        profile="test-covariance-model.cm.i1i"
    output:
        tblout="tr-infernal-tblout.txt",
    log:
        "logs/cmscan.log"
    params:
        evalue_threshold=10, # In the per-target output, report target sequences with an E-value of <= <x>. default=10.0 (on average, ~10 false positives reported per query)
        extra= "",
        #score_threshold=50, # Instead of thresholding per-CM output on E-value, report target sequences with a bit score of >= <x>.
    threads: 4
    wrapper:
        "0.53.0/bio/infernal/cmscan"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for Infernal CMscan"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")
profile = profile.rsplit(".i", 1)[0]

assert profile.endswith(".cm"), 'your profile file should end with ".cm"'

# direct output to file <f>, not stdout
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of hits to file <s>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)

# report <= this evalue threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 10)  # use cmscan default
# report >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = f" -T {float(score_threshold)} "
else:
    thresh_cmd = f" -E {float(evalue_threshold)} "

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cmscan {out_cmd} {thresh_cmd} {extra} --cpu {snakemake.threads} {profile} {snakemake.input.fasta} {log}"
)

JANNOVAR

Annotate predicted effect of nucleotide changes with Jannovar

Software dependencies
  • jannovar-cli ==0.31
Example

This wrapper can be used in the following way:

rule jannovar:
    input:
        vcf="{sample}.vcf",
        pedigree="pedigree_ar.ped" # optional, contains familial relationships
    output:
        "jannovar/{sample}.vcf.gz"
    log:
        "logs/jannovar/{sample}.log"
    params:
        database="hg19_small.ser", # path to jannovar reference dataset
        extra="--show-all"         # optional parameters
    wrapper:
        "0.53.0/bio/jannovar"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Bradford Powell
Code
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

pedigree = snakemake.input.get("pedigree", "")
if pedigree:
    pedigree = '--pedigree-file "%s"' % pedigree

shell(
    "jannovar annotate-vcf --database {snakemake.params.database}"
    " --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
    " {pedigree} {extra} {log}"
)

KALLISTO

For kallisto, the following wrappers are available:

KALLISTO INDEX

Index a transcriptome using kallisto.

Software dependencies
  • kallisto ==0.45.0
Example

This wrapper can be used in the following way:

rule kallisto_index:
    input:
        fasta = "{transcriptome}.fasta"
    output:
        index = "{transcriptome}.idx"
    params:
        extra = "--kmer-size=5"
    log:
        "logs/kallisto_index_{transcriptome}.log"
    threads: 1
    wrapper:
        "0.53.0/bio/kallisto/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for Kallisto index"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file is required"
fasta = " ".join(fasta) if isinstance(fasta, list) else fasta

shell(
    "kallisto index "  # Tool
    "{extra} "  # Optional parameters
    "--index={snakemake.output.index} "  # Output file
    "{fasta} "  # Input FASTA files
    "{log}"  # Logging
)
KALLISTO QUANT

Pseudoalign reads and quantify transcripts using kallisto.

Software dependencies
  • kallisto ==0.45.0
Example

This wrapper can be used in the following way:

rule kallisto_quant:
    input:
        fastq = ["reads/{exp}_R1.fastq", "reads/{exp}_R2.fastq"],
        index = "index/transcriptome.idx"
    output:
        directory('quant_results_{exp}')
    params:
        extra = ""
    log:
        "logs/kallisto_quant_{exp}.log"
    threads: 1
    wrapper:
        "0.53.0/bio/kallisto/quant"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Joël Simoneau
Code
"""Snakemake wrapper for Kallisto quant"""

__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")

# Allowing for multiple FASTQ files
fastq = snakemake.input.get("fastq")
assert fastq is not None, "input-> a FASTQ-file is required"
fastq = " ".join(fastq) if isinstance(fastq, list) else fastq

shell(
    "kallisto quant "  # Tool
    "{extra} "  # Optional parameters
    "--threads={snakemake.threads} "  # Number of threads
    "--index={snakemake.input.index} "  # Input file
    "--output-dir={snakemake.output} "  # Output directory
    "{fastq} "  # Input FASTQ files
    "{log}"  # Logging
)

LAST

For last, the following wrappers are available:

LASTAL

LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)

Software dependencies
  • last=874
Example

This wrapper can be used in the following way:

rule lastal_nucl_x_nucl:
    input:
        data="test-transcript.fa",
        lastdb="test-transcript.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-transcript.maf",
        #tab="test-transcript.tab",
        #blasttab="test-transcript.blasttab",
        #blasttabplus="test-transcript.blasttabplus",
    params:
        #Report alignments that are expected by chance at most once per LENGTH query letters. By default, LAST reports alignments that are expected by chance at most once per million query letters (for a given database). http://last.cbrc.jp/doc/last-evalues.html
        D_length=1000000,
        extra=""
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "0.53.0/bio/last/lastal"

rule lastal_nucl_x_prot:
    input:
        data="test-transcript.fa",
        lastdb="test-protein.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-tr-x-prot.maf"
        #tab="test-tr-x-prot.tab",
        #blasttab="test-tr-x-prot.blasttab",
        #blasttabplus="test-tr-x-prot.blasttabplus",
    params:
        frameshift_cost=15, #Align DNA queries to protein reference sequences using specified frameshift cost. 15 is reasonable. Special case, -F0 means DNA-versus-protein alignment without frameshifts, which is faster.)
        extra="",
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "0.53.0/bio/last/lastal"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
""" Snakemake wrapper for lastal """

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# http://last.cbrc.jp/doc/last-evalues.html
d_len = float(snakemake.params.get("D_length", 1000000))  # last default

# set output file formats
maf_out = snakemake.output.get("maf", "")
tab_out = snakemake.output.get("tab", "")
btab_out = snakemake.output.get("blasttab", "")
btabplus_out = snakemake.output.get("blasttabplus", "")
outfiles = [maf_out, tab_out, btab_out, btabplus_out]
# TAB, MAF, BlastTab, BlastTab+ (default=MAF)
assert (
    list(map(bool, outfiles)).count(True) == 1
), "please specify ONE output file using one of: 'maf', 'tab', 'blasttab', or 'blasttabplus' keywords in the output field)"

out_cmd = ""

if maf_out:
    out_cmd = "-f {}".format("MAF")
    outF = maf_out
elif tab_out:
    out_cmd = "-f {}".format("TAB")
    outF = tab_out
if btab_out:
    out_cmd = "-f {}".format("BlastTab")
    outF = btab_out
if btabplus_out:
    out_cmd = "-f {}".format("BlastTab+")
    outF = btabplus_out

frameshift_cost = snakemake.params.get("frameshift_cost", "")
if frameshift_cost:
    f_cmd = f"-F {frameshift_cost}"


lastdb_name = str(snakemake.input["lastdb"]).rsplit(".", 1)[0]

shell(
    "lastal -D {d_len} -P {snakemake.threads} {extra} {lastdb_name} {snakemake.input.data} > {outF} {log}"
)
LASTDB

LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)

Software dependencies
  • last=874
Example

This wrapper can be used in the following way:

rule lastdb_transcript:
    input:
        "test-transcript.fa"
    output:
        "test-transcript.fa.prj",
    params:
        protein_input=False,
        extra=""
    log:
        "logs/lastdb/test-transcript.log"
    wrapper:
        "0.53.0/bio/last/lastdb"

rule lastdb_protein:
    input:
        "test-protein.fa"
    output:
        "test-protein.fa.prj",
    params:
        protein_input=True,
        extra=""
    log:
        "logs/lastdb/test-protein.log"
    wrapper:
        "0.53.0/bio/last/lastdb"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

protein_cmd = ""
protein = snakemake.params.get("protein_input", False)

if protein:
    protein_cmd = " -p "

shell("lastdb {extra} {protein_cmd} -P {snakemake.threads} {snakemake.input} {log}")

LOFREQ

For lofreq, the following wrappers are available:

LOFREQ CALL

simply call variants

Software dependencies
  • samtools ==1.6
  • lofreq ==2.1.3.1
Example

This wrapper can be used in the following way:

rule lofreq:
    input:
        bam="data/{sample}.bam",
        bai="data/{sample}.bai"
    output:
        "calls/{sample}.vcf"
    log:
        "logs/lofreq_call/{sample}.log"
    params:
        ref="data/genome.fasta",
        extra=""
    threads: 8
    wrapper:
        "0.53.0/bio/lofreq/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
ref = snakemake.params.get("ref", None)

if ref is None:
    raise ValueError("A reference must be provided")

bam_input = snakemake.input.bam
bai_input = snakemake.input.bai

if bam_input is None:
    raise ValueError("Missing bam input file!")

if bai_input is None:
    raise ValueError("Missing bai input file!")

output_file = snakemake.output[0]

if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(output_file) + "!")

shell(
    "lofreq call-parallel "
    " --pp-threads {snakemake.threads}"
    " -f {ref}"
    " {bam_input}"
    " -o {output_file}"
    " {extra}"
    " {log}"
)

MINIMAP2

For minimap2, the following wrappers are available:

MINIMAP2

A versatile pairwise aligner for genomic and spliced nucleotide sequences https://lh3.github.io/minimap2

Software dependencies
  • minimap2 ==2.5
Example

This wrapper can be used in the following way:

rule minimap2:
    input:
        target="target/{input1}.mmi", # can be either genome index or genome fasta
        query=["query/reads1.fasta", "query/reads2.fasta"]
    output:
        "aligned/{input1}_aln.paf"
    log:
        "logs/minimap2/{input1}.log"
    params:
        extra="-x map-pb"  # optional
    threads: 3
    wrapper:
        "0.53.0/bio/minimap2/aligner"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tom Poorten
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

inputQuery = " ".join(snakemake.input.query)

shell(
    "(minimap2 -t {snakemake.threads} {extra} "
    "{snakemake.input.target} {inputQuery} >"
    "{snakemake.output[0]}) {log}"
)
MINIMAP2 INDEX

creates a minimap2 index

Software dependencies
  • minimap2 ==2.5
Example

This wrapper can be used in the following way:

rule minimap2_index:
    input:
        target="target/{input1}.fasta"
    output:
        "{input1}.mmi"
    log:
        "logs/minimap2_index/{input1}.log"
    params:
        extra=""  # optional additional args
    threads: 3
    wrapper:
        "0.53.0/bio/minimap2/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tom Poorten
Code
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "(minimap2 -t {snakemake.threads} {extra} "
    "-d {snakemake.output[0]} {snakemake.input.target}) {log}"
)

MSISENSOR

For msisensor, the following wrappers are available:

MSISENSOR MSI

Score your MSI with MSIsensor

Software dependencies
  • msisensor ==0.5
Example

This wrapper can be used in the following way:

rule test_msisensor_msi:
    input:
        normal = "example.normal.bam",
        tumor = "example.tumor.bam",
        microsat = "example.microsate.sites"
    output:
        "example.msi",
        "example.msi_dis",
        "example.msi_germline",
        "example.msi_somatic"
    message:
        "Testing MSIsensor msi"
    threads:
        1
    log:
        "example.log"
    params:
        out_prefix = "example.msi"
    wrapper:
        "0.53.0/bio/msisensor/msi"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
Code
"""Snakemake script for MSISensor msi"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from os.path import commonprefix
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

# Detemining common prefix in output files
# to fill the requested parameter '-o'
prefix = commonprefix(snakemake.output)

shell(
    "msisensor msi"  # Tool and its sub-command
    " -d {snakemake.input.microsat}"  # Path to homopolymer/microsat file
    " -n {snakemake.input.normal}"  # Path to normal bam
    " -t {snakemake.input.tumor}"  # Path to tumor bam
    " -o {prefix}"  # Path to output distribution file
    " -b {snakemake.threads}"  # Maximum number of threads used
    " {extra}"  # Optional extra parameters
    " {log}"  # Logging behavior
)
MSISENSOR SCAN

Scan homopolymers and microsatelites with MSIsensor

Software dependencies
  • msisensor ==0.5
Example

This wrapper can be used in the following way:

rule test_msisensor_scan:
    input:
        "genome.fasta"
    output:
        "microsat.list"
    message:
        "Testing MSISensor scan"
    threads:
        1
    params:
        extra = ""
    log:
        "logs/msisensor_scan.log"
    wrapper:
        "0.53.0/bio/msisensor/scan"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
Code
"""Snakemake script for MSISensor Scan"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

shell(
    "msisensor scan "  # Tool and its sub-command
    "-d {snakemake.input} "  # Path to fasta file
    "-o {snakemake.output} "  # Path to output file
    "{extra} "  # Optional extra parameters
    "{log}"  # Logging behavior
)

MULTIQC

Generate qc report using multiqc.

Software dependencies
  • multiqc ==1.8
Example

This wrapper can be used in the following way:

rule multiqc:
    input:
        expand("samtools_stats/{sample}.txt", sample=["a", "b"])
    output:
        "qc/multiqc.html"
    params:
        ""  # Optional: extra parameters for multiqc.
    log:
        "logs/multiqc.log"
    wrapper:
        "0.53.0/bio/multiqc"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "multiqc"
    " {snakemake.params}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_dirs}"
    " {log}"
)

NANOSIM-H

NanoSim-H is a simulator of Oxford Nanopore reads that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of Nanopore sequencing technology.

Software dependencies
  • nanosim-h ==1.1.0.4
Example

This wrapper can be used in the following way:

rule nanosimh:
    input:
        "{sample}.fa"
    output:
        reads = "{sample}.simulated.fa",
        log = "{sample}.simulated.log",
        errors = "{sample}.simulated.errors.txt"
    params:
        extra = "",
        num_reads = 10,
        perfect_reads = True,
        min_read_len = 10,
    log:
        "logs/nanosim-h/test/{sample}.log"
    wrapper:
        "0.53.0/bio/nanosim-h"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
"""Snakemake wrapper for NanoSim-H."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


def is_header(query):
    return query.startswith(">")


def get_length_of_longest_sequence(fh):
    current_length = 0
    all_lengths = []
    for line in fh:
        if not is_header(line):
            current_length += len(line.rstrip())
        else:
            all_lengths.append(current_length)
            current_length = 0
    all_lengths.append(current_length)

    return max(all_lengths)


# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
prefix = snakemake.params.get("prefix", snakemake.output.reads.rpartition(".")[0])
num_reads = snakemake.params.get("num_reads", 10000)
profile = snakemake.params.get("profile", "ecoli_R9_2D")
perfect_reads = snakemake.params.get("perfect_reads", False)
min_read_len = snakemake.params.get("min_read_len", 50)
max_read_len = snakemake.params.get("max_read_len", 0)

# need to do this as the default read length of infinity can cause nanosim-h to
# hang if the reference is short
if max_read_len == 0:
    with open(snakemake.input[0]) as fh:
        max_read_len = get_length_of_longest_sequence(fh)

perfect_reads_flag = "--perfect " if perfect_reads else ""
# Formats the log redrection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "nanosim-h {extra} "
    "{perfect_reads_flag} "
    "--max-len {max_read_len} "
    "--min-len {min_read_len} "
    "--profile {profile} "
    "--number {num_reads} "
    "--out-pref {prefix} "
    "{snakemake.input} {log}"
)

NGS-DISAMBIGUATE

Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.

Software dependencies
  • ngs-disambiguate ==2016.11.10
  • bamtools ==2.4.0
Example

This wrapper can be used in the following way:

rule disambiguate:
    input:
        a="mapped/{sample}.a.bam",
        b="mapped/{sample}.b.bam"
    output:
        a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
        b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
        a_disambiguated='disambiguate/{sample}.graft.bam',
        b_disambiguated='disambiguate/{sample}.host.bam',
        summary='qc/disambiguate/{sample}.txt'
    params:
        algorithm="bwa",
        # optional: Prefix to use for output. If omitted, a
        # suitable value is guessed from the output paths. Prefix
        # is used for the intermediate output paths, as well as
        # sample name in summary file.
        prefix="{sample}",
        extra=""
    wrapper:
        "0.53.0/bio/ngs-disambiguate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell


# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")

output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain)
# additional suffixes. This is better than using a random id as prefix,
# the prefix is also used as the sample name in the summary file.
if prefix is None:
    prefix = path.splitext(path.basename(snakemake.output.summary))[0]

# Run command.
shell(
    "ngs_disambiguate"
    " {extra}"
    " -o {output_dir}"
    " -s {prefix}"
    " -a {snakemake.params.algorithm}"
    " {snakemake.input.a}"
    " {snakemake.input.b}"
)

# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)

output_map = {
    output_base + ".ambiguousSpeciesA.bam": snakemake.output.a_ambiguous,
    output_base + ".ambiguousSpeciesB.bam": snakemake.output.b_ambiguous,
    output_base + ".disambiguatedSpeciesA.bam": snakemake.output.a_disambiguated,
    output_base + ".disambiguatedSpeciesB.bam": snakemake.output.b_disambiguated,
    output_base + "_summary.txt": snakemake.output.summary,
}

for src, dest in output_map.items():
    if src != dest:
        shell("mv {src} {dest}")

PALADIN

For paladin, the following wrappers are available:

PALADIN ALIGN

Align nucleotide reads to a protein fasta file (that has been indexed with paladin index). PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Example

This wrapper can be used in the following way:

rule paladin_align:
    input:
        reads=["reads/reads.left.fq.gz"],
        index="index/prot.fasta.bwt",
    output:
        "paladin_mapped/{sample}.bam" # will output BAM format if output file ends with ".bam", otherwise SAM format
    log:
        "logs/paladin/{sample}.log"
    threads: 4
    wrapper:
        "0.53.0/bio/paladin/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for PALADIN alignment"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

r = snakemake.input.get("reads")
assert (
    r is not None
), "reads are required as input. If you have paired end reads, please merge them first (e.g. with PEAR)"
index = snakemake.input.get("index")
assert (
    index is not None
), "please index your assembly and provide the basename (with'.bwt' extension) via the 'index' input param"

index_base = str(index).rsplit(".bwt")[0]

outfile = snakemake.output

# if bam output, pipe to bam!
output_cmd = "  | samtools view -Sb - > " if str(outfile).endswith(".bam") else " -o "

min_orf_len = snakemake.params.get("f", "250")

shell(
    "paladin align -f {min_orf_len} -t {snakemake.threads} {extra} {index_base} {r} {output_cmd} {outfile}"
)
PALADIN INDEX

Index a protein fasta file for mapping with paladin. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Example

This wrapper can be used in the following way:

rule paladin_index:
    input:
        "prot.fasta",
    output:
        "index/prot.fasta.bwt"
    log:
        "logs/paladin/prot_index.log"
    params:
      reference_type=3
    wrapper:
        "0.53.0/bio/paladin/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for Paladin Index."""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"


# this wrapper temporarily copies your assembly into the output dir
# so that all the paladin output files end up in the desired spot

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

input_assembly = snakemake.input
annotation = snakemake.input.get("gff", "")
paladin_index = str(snakemake.output)
reference_type = snakemake.params.get("reference_type", "3")
assert int(reference_type) in [1, 2, 3, 4]
ref_type_cmd = "-r" + str(reference_type)

output_base = paladin_index.rsplit(".bwt")[0]

shell("cp {input_assembly} {output_base}")
shell("paladin index {ref_type_cmd} {output_base} {annotation} {extra} {log}")
shell("rm -f {output_base}")
PALADIN PREPARE

Download and prepare uniprot refs for paladin mapping. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.

Software dependencies
  • paladin=1.4.4
  • samtools=1.5
Example

This wrapper can be used in the following way:

rule paladin_prepare:
    output:
        "uniprot_sprot.fasta.gz",
        "uniprot_sprot.fasta.gz.pro"
    log:
        "logs/paladin/prepare_sprot.log"
    params:
        reference_type=1, # 1=swiss-prot, 2=uniref90
    wrapper:
        "0.53.0/bio/paladin/prepare"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for Paladin Prepare"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

reference_type = snakemake.params.get(
    "reference_type", "1"
)  # download swissprot as default
assert int(reference_type) in [1, 2]
ref_type_cmd = "-r" + str(reference_type)

shell("paladin prepare {ref_type_cmd} {extra} {log}")

PEAR

PEAR is an ultrafast, memory-efficient and highly accurate pair-end read merger

Software dependencies
  • pear=0.9.6
Example

This wrapper can be used in the following way:

rule pear_merge:
    input:
        read1="reads/reads.left.fq.gz",
        read2="reads/reads.right.fq.gz"
    output:
        assembled="pear/reads_pear_assembled.fq.gz",
        discarded="pear/reads_pear_discarded.fq.gz",
        unassembled_read1="pear/reads_pear_unassembled_r1.fq.gz",
        unassembled_read2="pear/reads_pear_unassembled_r2.fq.gz",
    log:
        'logs/pear.log'
    params:
        pval=".01",
        extra=""
    threads: 4
    resources:
        mem_mb=4000 # define amount of memory to be used by pear
    wrapper:
        "0.53.0/bio/pear"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

r1 = snakemake.input.get("read1")
r2 = snakemake.input.get("read2")
assert r1 is not None and r2 is not None, "r1 and r2 files are required as input"

assembled = snakemake.output.get("assembled")
assert assembled is not None, "require 'assembled' outfile"
gzip = True if assembled.endswith(".gz") else False

out_base, out_end = assembled.rsplit(".f")
out_end = ".f" + out_end

df_assembled = out_base + ".assembled.fastq"
df_discarded = out_base + ".discarded.fastq"
df_unassembled_r1 = out_base + ".unassembled.forward.fastq"
df_unassembled_r2 = out_base + ".unassembled.reverse.fastq"

df_outputs = [df_assembled, df_discarded, df_unassembled_r1, df_unassembled_r2]

discarded = snakemake.output.get("discarded", out_base + ".discarded" + out_end)
unassembled_r1 = snakemake.output.get(
    "unassembled_read1", out_base + ".unassembled_r1" + out_end
)
unassembled_r2 = snakemake.output.get(
    "unassembled_read2", out_base + ".unassembled_r2" + out_end
)

final_outputs = [assembled, discarded, unassembled_r1, unassembled_r2]


def move_files(in_list, out_list, gzip):
    for f, o in zip(in_list, out_list):
        if f != o:
            if gzip:
                shell("gzip -9 -c {f} > {o}")
                shell("rm -f {f}")
            else:
                shell("cp {f} {o}")
                shell("rm -f {f}")
        elif gzip:
            shell("gzip -9 {f}")


pval = float(snakemake.params.get("pval", ".01"))
max_mem = snakemake.resources.get("mem_mb", "4000")
extra = snakemake.params.get("extra", "")

shell(
    "pear -f {r1} -r {r2} -p {pval} -j {snakemake.threads} -y {max_mem} {extra} -o {out_base} {log}"
)

move_files(df_outputs, final_outputs, gzip)

PICARD

For picard, the following wrappers are available:

PICARD ADDORREPLACEREADGROUPS

Add or replace read groups with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule replace_rg:
    input:
        "mapped/{sample}.bam"
    output:
        "fixed-rg/{sample}.bam"
    log:
        "logs/picard/replace_rg/{sample}.log"
    params:
        "RGLB=lib1 RGPL=illumina RGPU={sample} RGSM={sample}"
    wrapper:
        "0.53.0/bio/picard/addorreplacereadgroups"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell(
    "picard AddOrReplaceReadGroups {snakemake.params} I={snakemake.input} "
    "O={snakemake.output} &> {snakemake.log}"
)
PICARD COLLECTALIGNMENTSUMMARYMETRICS

Collect metrics on aligned reads with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule alignment_summary:
    input:
        ref="genome.fasta",
        bam="mapped/{sample}.bam"
    output:
        "stats/{sample}.summary.txt"
    log:
        "logs/picard/alignment-summary/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    wrapper:
        "0.53.0/bio/picard/collectalignmentsummarymetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "picard CollectAlignmentSummaryMetrics {snakemake.params} "
    "INPUT={snakemake.input.bam} OUTPUT={snakemake.output[0]} "
    "REFERENCE_SEQUENCE={snakemake.input.ref} {log}"
)
PICARD COLLECTHSMETRICS

Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule picard_collect_hs_metrics:
    input:
        bam="mapped/{sample}.bam",
        reference="genome.fasta",
        # Baits and targets should be given as interval lists. These can
        # be generated from bed files using picard BedToIntervalList.
        bait_intervals="regions.intervals",
        target_intervals="regions.intervals"
    output:
        "stats/hs_metrics/{sample}.txt"
    params:
        # Optional extra arguments. Here we reduce sample size
        # to reduce the runtime in our unit test.
        "SAMPLE_SIZE=1000"
    log:
        "logs/picard_collect_hs_metrics/{sample}.log"
    wrapper:
        "0.53.0/bio/picard/collecthsmetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard CollectHSMetrics."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard CollectHsMetrics"
    " {extra}"
    " INPUT={snakemake.input.bam}"
    " OUTPUT={snakemake.output[0]}"
    " REFERENCE_SEQUENCE={snakemake.input.reference}"
    " BAIT_INTERVALS={snakemake.input.bait_intervals}"
    " TARGET_INTERVALS={snakemake.input.target_intervals}"
    " {log}"
)
PICARD COLLECTINSERTSIZEMETRICS

Collect metrics on insert size of paired end reads with picard tools.

Software dependencies
  • picard ==2.22.1
  • r-base ==3.6.2
Example

This wrapper can be used in the following way:

rule insert_size:
    input:
        "mapped/{sample}.bam"
    output:
        txt="stats/{sample}.isize.txt",
        pdf="stats/{sample}.isize.pdf"
    log:
        "logs/picard/insert_size/{sample}.log"
    params:
        # optional parameters (e.g. relax checks as below)
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE"
    wrapper:
        "0.53.0/bio/picard/collectinsertsizemetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()


shell(
    "picard CollectInsertSizeMetrics {snakemake.params} "
    "INPUT={snakemake.input} OUTPUT={snakemake.output.txt} "
    "HISTOGRAM_FILE={snakemake.output.pdf} {log}"
)
PICARD COLLECTTARGETEDPCRMETRICS

Collect metric information for target pcr metrics runs, with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule CollectTargetedPcrMetrics:
    input:
        bam="mapped/{sample}.bam",
        amplicon_intervals="amplicon.list",
        target_intervals="target.list"
    output:
        "stats/{sample}.pcr.txt"
    log:
        "logs/picard/collecttargetedpcrmetrics/{sample}.log"
    params:
        extra=""
    wrapper:
        "0.53.0/bio/picard/collecttargetedpcrmetrics"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@mail.com"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell()

extra = snakemake.params.get("extra", "")

shell(
    "picard CollectTargetedPcrMetrics "
    "{extra} "
    "INPUT={snakemake.input.bam} "
    "OUTPUT={snakemake.output[0]} "
    "AMPLICON_INTERVALS={snakemake.input.amplicon_intervals} "
    "TARGET_INTERVALS={snakemake.input.target_intervals} "
    "{log}"
)
PICARD CREATESEQUENCEDICTIONARY

Create a .dict file for a given FASTA file

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule create_dict:
    input:
        "genome.fasta"
    output:
        "genome.dict"
    log:
        "logs/picard/create_dict.log"
    params:
        extra=""  # optional: extra arguments for picard.
    wrapper:
        "0.53.0/bio/picard/createsequencedictionary"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard "
    "CreateSequenceDictionary "
    "{extra} "
    "R={snakemake.input[0]} "
    "O={snakemake.output[0]} "
    "{log}"
)
PICARD MARKDUPLICATES

Mark PCR and optical duplicates with picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        metrics="dedup/{sample}.metrics.txt"
    log:
        "logs/picard/dedup/{sample}.log"
    params:
        "REMOVE_DUPLICATES=true"
    wrapper:
        "0.53.0/bio/picard/markduplicates"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
    "OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
    "{log}"
)
PICARD MERGESAMFILES

Merge sam/bam files using picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule merge_bams:
    input:
        expand("mapped/{sample}.bam", sample=["a", "b"])
    output:
        "merged.bam"
    log:
        "logs/picard_mergesamfiles.log"
    params:
        "VALIDATION_STRINGENCY=LENIENT"
    wrapper:
        "0.53.0/bio/picard/mergesamfiles"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard MergeSamFiles."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " MergeSamFiles"
    " {snakemake.params}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD MERGEVCFS

Merge vcf files using picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule merge_vcfs:
    input:
        vcfs=["snvs.chr1.vcf", "snvs.chr2.vcf"]
    output:
        "snvs.vcf"
    log:
        "logs/picard/mergevcfs.log"
    params:
        extra=""
    wrapper:
        "0.53.0/bio/picard/mergevcfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
"""Snakemake wrapper for picard MergeSamFiles."""

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"


from snakemake.shell import shell


inputs = " ".join("INPUT={}".format(f) for f in snakemake.input.vcfs)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "picard"
    " MergeVcfs"
    " {extra}"
    " {inputs}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD REVERTSAM

Reverts SAM or BAM files to a previous state. .

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule revert_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "revert/{sample}.bam"
    log:
        "logs/picard/revert_sam/{sample}.log"
    params:
        extra="SANITIZE=true" # optional: Extra arguments for picard.
    wrapper:
        "0.53.0/bio/picard/revertsam"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard RevertSam."""

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " RevertSam"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " OUTPUT={snakemake.output[0]}"
    " {log}"
)
PICARD SOMTOFASTQ

Converts a SAM or BAM file to FASTQ.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule bam_to_fastq:
    input:
        "mapped/{sample}.bam"
    output:
        fastq1="reads/{sample}.R1.fastq",
        fastq2="reads/{sample}.R2.fastq"
    log:
        "logs/picard/sam_to_fastq/{sample}.log"
    params:
        extra="" # optional: Extra arguments for picard.
    wrapper:
        "0.53.0/bio/picard/samtofastq"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for picard SortSam."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

fastq1 = snakemake.output.fastq1
fastq2 = snakemake.output.get("fastq2", None)
fastq_unpaired = snakemake.output.get("unpaired_fastq", None)

if not isinstance(fastq1, str):
    raise ValueError("f1 needs to be provided")

output = " FASTQ=" + fastq1

if isinstance(fastq2, str):
    output += " SECOND_END_FASTQ=" + fastq2

if isinstance(fastq_unpaired, str):
    if not isinstance(fastq2, str):
        raise ValueError("f2 is required if fastq_unpaired is set")
    else:
        output += " UNPAIRED_FASTQ=" + fastq_unpaired

shell(
    "picard" " SamToFastq" " {extra}" " INPUT={snakemake.input[0]}" " {output}" " {log}"
)
PICARD SORTSAM

Sort sam/bam files using picard tools.

Software dependencies
  • picard ==2.22.1
Example

This wrapper can be used in the following way:

rule sort_bam:
    input:
        "mapped/{sample}.bam"
    output:
        "sorted/{sample}.bam"
    log:
        "logs/picard/sort_sam/{sample}.log"
    params:
        sort_order="coordinate",
        extra="VALIDATION_STRINGENCY=LENIENT" # optional: Extra arguments for picard.
    wrapper:
        "0.53.0/bio/picard/sortsam"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for picard SortSam."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "picard"
    " SortSam"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " OUTPUT={snakemake.output[0]}"
    " SORT_ORDER={snakemake.params.sort_order}"
    " {log}"
)

PINDEL

For pindel, the following wrappers are available:

PINDEL

Call variants with pindel.

Software dependencies
  • pindel ==0.2.5b8
Example

This wrapper can be used in the following way:

pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]


rule pindel:
    input:
        ref="genome.fasta",
        # samples to call
        samples=["mapped/a.bam"],
        # bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
        config="pindel_config.txt"
    output:
        expand("pindel/all_{type}", type=pindel_types)
    params:
        # prefix must be consistent with output files
        prefix="pindel/all",
        extra=""  # optional parameters (except -i, -f, -o)
    log:
        "logs/pindel.log"
    threads: 4
    wrapper:
        "0.53.0/bio/pindel/call"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "pindel -T {snakemake.threads} {snakemake.params.extra} -i {snakemake.input.config} "
    "-f {snakemake.input.ref} -o {snakemake.params.prefix} {log}"
)
PINDEL2VCF

Convert pindel output to vcf.

Software dependencies
  • pindel ==0.2.5b8
Example

This wrapper can be used in the following way:

rule pindel2vcf:
    input:
        ref="genome.fasta",
        pindel="pindel/all_{type}"
    output:
        "pindel/all_{type}.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.{type}.log"
    wrapper:
        "0.53.0/bio/pindel/pindel2vcf"

rule pindel2vcf_multi_input:
    input:
        ref="genome.fasta",
        pindel=["pindel/all_D", "pindel/all_INV"]
    output:
        "pindel/all.vcf"
    params:
        refname="hg38",  # mandatory, see pindel manual
        refdate="20170110",  # mandatory, see pindel manual
        extra=""  # extra params (except -r, -p, -R, -d, -v)
    log:
        "logs/pindel/pindel2vcf.log"
    wrapper:
        "0.53.0/bio/pindel/pindel2vcf"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

expected_endings = [
    "INT",
    "D",
    "SI",
    "INV",
    "INV_final" "TD",
    "LI",
    "BP",
    "CloseEndMapped",
    "RP",
]


def split_file_name(file_parts, file_ending_index):
    return (
        "_".join(file_parts[:file_ending_index]),
        "_".join(file_parts[file_ending_index]),
    )


def process_input_path(input_file):
    """
        :params input_file: Input file from rule, ex /path/to/file/all_D or /path/to/file/all_INV_final
        :return: ""/path/to/file", "all"

    """
    file_path, file_name = os.path.split(input_file)
    file_parts = file_name.split("_")
    # seperate ending and name, to name: all ending: D or name: all ending: INV_final
    file_name, file_ending = split_file_name(
        file_parts, -2 if file_name.endswith("_final") else -1
    )
    if not file_ending in expected_endings:
        raise Exception("Unexpected variant type: " + file_ending)
    return file_path, file_name


with tempfile.TemporaryDirectory() as tmpdirname:
    input_flag = "-p"
    input_file = snakemake.input.get("pindel")
    if isinstance(input_file, list) and len(input_file) > 1:
        input_flag = "-P"
        input_path, input_name = process_input_path(input_file[0])
        input_file = os.path.join(input_path, input_name)
        for variant_input in snakemake.input.pindel:
            if not variant_input.startswith(input_file):
                raise Exception(
                    "Unable to extract common path from multi file input, expect path is: "
                    + input_file
                )
            if not os.path.isfile(variant_input):
                raise Exception('Input "' + input_file + '" is not a file!')
            os.symlink(
                os.path.abspath(variant_input),
                os.path.join(tmpdirname, os.path.basename(variant_input)),
            )
        input_file = os.path.join(tmpdirname, input_name)
    shell(
        "pindel2vcf {snakemake.params.extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}"
    )

PLASS

Plass (Protein-Level ASSembler) is software to assemble short read sequencing data on a protein level. The main purpose of Plass is the assembly of complex metagenomic datasets.

Software dependencies
  • plass=2.c7e35
Example

This wrapper can be used in the following way:

rule plass_paired:
    input:
        left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
        right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
    output:
        "plass/prot.fasta"
    log:
        "logs/plass.log"
    params:
        extra=""
    threads: 4
    wrapper:
        "0.53.0/bio/plass"

rule plass_single:
    input:
        single=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
    output:
        "plass/prot_single.fasta"
    log:
        "logs/plass_single.log"
    params:
        extra=""
    threads: 4
    wrapper:
        "0.53.0/bio/plass"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for PLASS Protein-Level Assembler."""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2018, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

# allow multiple input files for single assembly
left = snakemake.input.get("left")
single = snakemake.input.get("single")
assert (
    left is not None or single is not None
), "please check read inputs: either left/right or single read file inputs are required"
if left:
    left = (
        [snakemake.input.left]
        if isinstance(snakemake.input.left, str)
        else snakemake.input.left
    )
    right = snakemake.input.get("right")
    assert (
        right is not None
    ), "please input 'right' reads or specify that the reads are 'single'"
    right = (
        [snakemake.input.right]
        if isinstance(snakemake.input.right, str)
        else snakemake.input.right
    )
    assert len(left) == len(
        right
    ), "left input needs to contain the same number of files as the right input"
    input_str_left = " " + " ".join(left)
    input_str_right = " " + " ".join(right)
    input_cmd = input_str_left + " " + input_str_right
else:
    single = (
        [snakemake.input.single]
        if isinstance(snakemake.input.single, str)
        else snakemake.input.single
    )
    input_cmd = " " + " ".join(single)


outdir = path.dirname(snakemake.output[0])
tmpdir = path.join(outdir, "tmp")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "plass assemble {input_cmd} {snakemake.output} {tmpdir} --threads {snakemake.threads} {snakemake.params.extra} {log}"
)

PRIMERCLIP

Primer trimming on sam file, https://github.com/swiftbiosciences/primerclip

Software dependencies
  • samtools ==1.9
  • primerclip ==0.3.8
Example

This wrapper can be used in the following way:

rule primerclip:
    input:
        0.53.0_file="0.53.0_file",
        alignment_file="mapped/{sample}.bam"
    output:
        alignment_file="mapped/{sample}.trimmed.bam"
    log:
        "logs/primerclip/{sample}.log"
    params:
        extra=""
    wrapper:
        "0.53.0/bio/primerclip"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from os import path

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

master_file = snakemake.input.master_file
in_alignment_file = snakemake.input.alignment_file
out_alignment_file = snakemake.output.alignment_file

# Check inputs/arguments.
if not isinstance(master_file, str):
    raise ValueError("master_file, path to the master file")

if not isinstance(in_alignment_file, str):
    raise ValueError("in_alignment_file, path to the input alignment file")

if not isinstance(out_alignment_file, str):
    raise ValueError("out_alignment_file, path to the output file")

samtools_input_command = "samtools view -h " + in_alignment_file

samtools_output_command = " | head -n -3 | samtools view -Sh"

if out_alignment_file.endswith(".cram"):
    samtools_output_command += "C -o " + out_alignment_file
elif out_alignment_file.endswith(".sam"):
    samtools_output_command += " -o " + out_alignment_file
else:
    samtools_output_command += "b -o " + out_alignment_file

shell(
    "{samtools_input_command} |"
    " primerclip"
    " {master_file}"
    " /dev/stdin"
    " /dev/stdout"
    " {samtools_output_command}"
    " {log}"
)

PYFASTAQ

For pyfastaq, the following wrappers are available:

PYFASTAQ REPLACE_BASES

Replaces all occurrences of one letter with another.

Software dependencies
  • pyfastaq ==3.17.0
Example

This wrapper can be used in the following way:

rule replace_bases:
    input:
        "{sample}.rna.fa"
    output:
        "{sample}.dna.fa",
    params:
        old_base = "U",
        new_base = "T",
    log:
        "logs/fastaq/replace_bases/test/{sample}.log"
    wrapper:
        "0.53.0/bio/pyfastaq/replace_bases"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"


from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "fastaq replace_bases"
    " {snakemake.input[0]}"
    " {snakemake.output[0]}"
    " {snakemake.params.old_base}"
    " {snakemake.params.new_base}"
    " {log}"
)

REFERENCE

For reference, the following wrappers are available:

ENSEMBL-ANNOTATION

Download annotation of genomic sites (e.g. transcripts) from ENSEMBL FTP servers, and store them in a single .gtf or .gff3 file.

Software dependencies
  • curl
Example

This wrapper can be used in the following way:

rule get_annotation:
    output:
        "refs/annotation.gtf"
    params:
        species="homo_sapiens",
        release="87",
        build="GRCh37",
        fmt="gtf",
        flavor="" # optional, e.g. chr_patch_hapl_scaff, see Ensembl FTP.
    log:
        "logs/get_annotation.log"
    cache: True  # save space and time with between workflow caching (see docs)
    wrapper:
        "0.53.0/bio/reference/ensembl-annotation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess
import sys
from snakemake.shell import shell

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
fmt = snakemake.params.fmt
build = snakemake.params.build
flavor = snakemake.params.get("flavor", "")

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

if flavor:
    flavor += "."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

suffix = ""
if fmt == "gtf":
    suffix = "gtf.gz"
elif fmt == "gff3":
    suffix = "gff3.gz"

url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/{species_cap}.{build}.{release}.{flavor}{suffix}".format(
    release=release,
    build=build,
    species=species,
    fmt=fmt,
    species_cap=species.capitalize(),
    suffix=suffix,
    flavor=flavor,
    branch=branch,
)

try:
    shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
except subprocess.CalledProcessError as e:
    if snakemake.log:
        sys.stderr = open(snakemake.log[0], "a")
    print(
        "Unable to download annotation data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    exit(1)
ENSEMBL-SEQUENCE

Download sequences (e.g. genome) from ENSEMBL FTP servers, and store them in a single .fasta file.

Software dependencies
  • curl
Example

This wrapper can be used in the following way:

rule get_genome:
    output:
        "refs/genome.fasta"
    params:
        species="saccharomyces_cerevisiae",
        datatype="dna",
        build="R64-1-1",
        release="98"
    log:
        "logs/get_genome.log"
    cache: True  # save space and time with between workflow caching (see docs)
    wrapper:
        "0.53.0/bio/reference/ensembl-sequence"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess as sp
import sys
from itertools import product
from snakemake.shell import shell

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

spec = ("{build}" if int(release) > 75 else "{build}.{release}").format(
    build=build, release=release
)

suffixes = ""
datatype = snakemake.params.get("datatype", "")
if datatype == "dna":
    suffixes = ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"]
elif datatype == "cdna":
    suffixes = ["cdna.all.fa.gz"]
elif datatype == "cds":
    suffixes = ["cds.all.fa.gz"]
elif datatype == "ncrna":
    suffixes = ["ncrna.fa.gz"]
elif datatype == "pep":
    suffixes = ["pep.all.fa.gz"]
else:
    raise ValueError("invalid datatype, must be one of dna, cdna, cds, ncrna, pep")

success = False
for suffix in suffixes:
    url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/{species}/{datatype}/{species_cap}.{spec}.{suffix}".format(
        release=release,
        species=species,
        datatype=datatype,
        spec=spec.format(build=build, release=release),
        suffix=suffix,
        species_cap=species.capitalize(),
        branch=branch,
    )

    try:
        shell("curl -sSf {url} > /dev/null 2> /dev/null")
    except sp.CalledProcessError:
        continue

    shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
    success = True
    break

if not success:
    print(
        "Unable to download requested sequence data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    exit(1)
ENSEMBL-VARIATION

Download known genomic variants from ENSEMBL FTP servers, and store them in a single .vcf.gz file.

Software dependencies
  • bcftools >=1.10
  • curl
Example

This wrapper can be used in the following way:

rule get_variation:
    output:
        vcf="refs/variation.vcf.gz"
        # optional: add fai to get VCF with annotated contig lengths (as required by GATK)
        # fai="refs/genome.fasta.fai"
    params:
        species="saccharomyces_cerevisiae",
        release="98",
        build="R64-1-1",
        type="all" # one of "all", "somatic", "structural_variation"
    log:
        "logs/get_variation.log"
    cache: True  # save space and time with between workflow caching (see docs)
    wrapper:
        "0.53.0/bio/reference/ensembl-variation"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import tempfile
import subprocess
import sys
from snakemake.shell import shell
from snakemake.exceptions import WorkflowError

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build
type = snakemake.params.type

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

if type == "all":
    if species == "homo_sapiens" and release >= 93:
        suffixes = [
            "-chr{}".format(chrom) for chrom in list(range(1, 23)) + ["X", "Y", "MT"]
        ]
    else:
        suffixes = [""]
elif type == "somatic":
    suffixes = ["_somatic"]
elif type == "structural_variations":
    suffixes = ["_structural_variations"]
else:
    raise ValueError(
        "Unsupported type {} (only all, somatic, structural_variations are allowed)".format(
            type
        )
    )

species_filename = species if release >= 91 else species.capitalize()

urls = [
    "ftp://ftp.ensembl.org/pub/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.vcf.gz".format(
        release=release,
        species=species,
        suffix=suffix,
        species_filename=species_filename,
        branch=branch,
    )
    for suffix in suffixes
]

download = ("bcftools concat -Oz {urls}" if len(urls) > 1 else "curl -L {urls}").format(
    urls=" ".join(urls)
)

try:
    if snakemake.input.get("fai"):
        # in case of a given .fai, reheader the VCF such that contig lengths are defined
        with tempfile.TemporaryDirectory() as tmpdir:
            shell(
                "({download} > {tmpdir}/out.vcf.gz && "
                " bcftools reheader --fai {snakemake.input.fai} {tmpdir}/out.vcf.gz | bcftools view -Oz -o {snakemake.output[0]}) {log}"
            )
    else:
        # without .fai, just concatenate
        shell("{download} > {snakemake.output[0]} {log}")
except subprocess.CalledProcessError as e:
    if snakemake.log:
        sys.stderr = open(snakemake.log[0], "a")
    print(
        "Unable to download variation data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    exit(1)

REFGENIE

Deploy biomedical reference datasets via refgenie.

Software dependencies
  • refgenie =0.8.2
  • refgenconf =0.6.1
Example

This wrapper can be used in the following way:

rule obtain_asset:
    output:
        # the name refers to the refgenie seek key (see attributes on http://refgenomes.databio.org)
        fai="refs/genome.fasta"
        # Multiple outputs/seek keys are possible here.
    params:
        genome="human_alu",
        asset="fasta",
        tag="default"
    wrapper:
        "0.53.0/bio/refgenie"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import refgenconf

genome = snakemake.params.genome
asset = snakemake.params.asset
tag = snakemake.params.tag

conf_path = os.environ["REFGENIE"]

rgc = refgenconf.RefGenConf(conf_path, writable=True)

# pull asset if necessary
gat, archive_data, server_url = rgc.pull_asset(genome, asset, tag, force=False)

for seek_key, out in snakemake.output.items():
    path = rgc.get_asset(genome, asset, tag, seek_key=seek_key)
    os.symlink(path, out)

RUBIC

RUBIC detects recurrent copy number alterations using copy number breaks.

Software dependencies
  • r-base =3.4.1
  • r-rubic =1.0.3
  • r-data.table =1.10.4
  • r-pracma =2.0.4
  • r-ggplot2 =2.2.1
  • r-gtable =0.2.0
  • r-codetools =0.2_15
  • r-digest =0.6.12
Example

This wrapper can be used in the following way:

rule rubic:
    input:
        seg="{samples}/segments.txt",
        markers="{samples}/markers.txt"
    output:
        out_gains="{samples}/gains.txt",
        out_losses="{samples}/losses.txt",
        out_plots=directory("{samples}/plots") #only possible to provide output directory for plots
    params:
        fdr="",
        genefile=""
    wrapper:
        "0.53.0/bio/rubic"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Beatrice F. Tan
Code
# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"

library(RUBIC)

all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]

rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])

SALMON

For salmon, the following wrappers are available:

SALMON_INDEX

Index a transcriptome assembly with salmon

Software dependencies
  • salmon ==0.14.1
Example

This wrapper can be used in the following way:

rule salmon_index:
    input:
        "assembly/transcriptome.fasta"
    output:
        directory("salmon/transcriptome_index")
    log:
        "logs/salmon/transcriptome_index.log"
    threads: 2
    params:
        # optional parameters
        extra=""
    wrapper:
        "0.53.0/bio/salmon/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Index."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "salmon index -t {snakemake.input} -i {snakemake.output} "
    " --threads {snakemake.threads} {extra} {log}"
)
SALMON_QUANT

Quantify transcripts with salmon

Software dependencies
  • salmon ==0.14.1
Example

This wrapper can be used in the following way:

rule salmon_quant_reads:
    input:
        # If you have multiple fastq files for a single sample (e.g. technical replicates)
        # use a list for r1 and r2.
        r1 = "reads/{sample}_1.fq.gz",
        r2 = "reads/{sample}_2.fq.gz",
        index = "salmon/transcriptome_index"
    output:
        quant = 'salmon/{sample}/quant.sf',
        lib = 'salmon/{sample}/lib_format_counts.json'
    log:
        'logs/salmon/{sample}.log'
    params:
        # optional parameters
        libtype ="A",
        #zip_ext = bz2 # req'd for bz2 files ('bz2'); optional for gz files('gz')
        extra=""
    threads: 2
    wrapper:
        "0.53.0/bio/salmon/quant"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Salmon Quant"""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell


def manual_decompression(reads, zip_ext):
    """ Allow *.bz2 input into salmon. Also provide same
    decompression for *gz files, as salmon devs mention
    it may be faster in some cases."""
    if zip_ext and reads:
        if zip_ext == "bz2":
            reads = " < (bunzip2 -c " + reads + ")"
        elif zip_ext == "gz":
            reads = " < (gunzip -c " + reads + ")"
    return reads


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
zip_extension = snakemake.params.get("zip_extension", "")
libtype = snakemake.params.get("libtype", "A")

r1 = snakemake.input.get("r1")
r2 = snakemake.input.get("r2")
r = snakemake.input.get("r")

assert (
    r1 is not None and r2 is not None
) or r is not None, "either r1 and r2 (paired), or r (unpaired) are required as input"
if r1:
    r1 = (
        [snakemake.input.r1]
        if isinstance(snakemake.input.r1, str)
        else snakemake.input.r1
    )
    r2 = (
        [snakemake.input.r2]
        if isinstance(snakemake.input.r2, str)
        else snakemake.input.r2
    )
    assert len(r1) == len(r2), "input-> equal number of files required for r1 and r2"
    r1_cmd = " -1 " + manual_decompression(" ".join(r1), zip_extension)
    r2_cmd = " -2 " + manual_decompression(" ".join(r2), zip_extension)
    read_cmd = " ".join([r1_cmd, r2_cmd])
if r:
    assert (
        r1 is None and r2 is None
    ), "Salmon cannot quantify mixed paired/unpaired input files. Please input either r1,r2 (paired) or r (unpaired)"
    r = [snakemake.input.r] if isinstance(snakemake.input.r, str) else snakemake.input.r
    read_cmd = " -r " + manual_decompression(" ".join(r), zip_extension)

outdir = path.dirname(snakemake.output.get("quant"))

shell(
    "salmon quant -i {snakemake.input.index} "
    " -l {libtype} {read_cmd} -o {outdir} "
    " -p {snakemake.threads} {extra} {log} "
)

SAMBAMBA

For sambamba, the following wrappers are available:

SAMBAMBA SORT

Sort bam file with sambamba

Software dependencies
  • sambamba ==0.6.6
Example

This wrapper can be used in the following way:

rule sambamba_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        ""  # optional parameters
    threads: 8
    wrapper:
        "0.53.0/bio/sambamba/sort"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

shell(
    "sambamba sort {snakemake.params} -t {snakemake.threads} "
    "-o {snakemake.output[0]} {snakemake.input[0]}"
)

SAMTOOLS

For samtools, the following wrappers are available:

SAMTOOLS BAM2FQ INTERLEAVED

Convert a bam file back to unaligned reads in a single fastq file with samtools. For paired end reads, this results in an unsorted interleaved file.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_bam2fq_interleaved:
    input:
        "mapped/{sample}.bam"
    output:
        "reads/{sample}.fq"
    params:
        " "
    threads: 3
    wrapper:
        "0.53.0/bio/samtools/bam2fq/interleaved"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • David Laehnemann
  • Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


import os
from snakemake.shell import shell


prefix = os.path.splitext(snakemake.output[0])[0]

shell(
    "samtools bam2fq {snakemake.params} "
    " -@ {snakemake.threads} "
    " {snakemake.input[0]}"
    " >{snakemake.output[0]} "
    )
SAMTOOLS BAM2FQ SEPARATE

Convert a bam file with paired end reads back to unaligned reads in a two separate fastq files with samtools. Reads that are not properly paired are discarded (READ_OTHER and singleton reads in samtools bam2fq documentation), as are secondary (0x100) and supplementary reads (0x800).

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_bam2fq_separate:
    input:
        "mapped/{sample}.bam"
    output:
        "reads/{sample}.1.fq",
        "reads/{sample}.2.fq"
    params:
        sort = "-m 4G",
        bam2fq = "-n"
    threads:  # Remember, this is the number of samtools' additional threads
        3     # At least 2 threads have to be requested on cluster sumbission.
              # Thus, this value - 2 will be sent to samtools sort -@ argument.
    wrapper:
        "0.53.0/bio/samtools/bam2fq/separate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Samtools -@/–threads takes one integer as input. This is the number of additional threads and not raw threads.
Authors
  • David Laehnemann
  • Victoria Sack
Code
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

prefix = os.path.splitext(snakemake.output[0])[0]

# Samtools takes additional threads through its option -@
# One thread is used bu Samtools sort
# One thread is used by Samtools bam2fq
# So snakemake.threads has to take them into account
# before allowing additional threads through samtools sort -@
threads = (
    "" if snakemake.threads <= 2
    else " -@ {} ".format(snakemake.threads - 2)
)

shell(
    "samtools sort -n "
    " {threads} "
    " -T {prefix} "
    " {snakemake.params.sort} "
    " {snakemake.input[0]} | "
    "samtools bam2fq "
    " {snakemake.params.bam2fq} "
    " -1 {snakemake.output[0]} "
    " -2 {snakemake.output[1]} "
    " -0 /dev/null "
    " -s /dev/null "
    " -F 0x900 "
    " - "
    )
SAMTOOLS DEPTH

Compute the read depth at each position or region using samtools.

Software dependencies
  • samtools ==1.10
Example

This wrapper can be used in the following way:

rule samtools_depth:
    input:
        bams=["mapped/A.bam", "mapped/B.bam"],
        bed="regionToCalcDepth.bed", # optional
    output:
        "depth.txt"
    params:
        # optional bed file passed to -b
        extra="" # optional additional parameters as string
    wrapper:
        "0.53.0/bio/samtools/depth"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Dayne Filer
Code
"""Snakemake wrapper for running samtools depth."""

__author__ = "Dayne L Filer"
__copyright__ = "Copyright 2020, Dayne L Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

params = snakemake.params.get("extra", "")

# check for optional bed file
bed = snakemake.input.get("bed", "")
if bed:
    bed = "-b " + bed

shell(
    "samtools depth {params} {bed} " "-o {snakemake.output[0]} {snakemake.input.bams}"
)
SAMTOOLS FAIDX

index reference sequence in FASTA format from reference sequence

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_index:
    input:
        "{sample}.fa"
    output:
        "{sample}.fa.fai"
    params:
        "" # optional params string
    wrapper:
        "0.53.0/bio/samtools/faidx"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Chambers
Code
__author__ = "Michael Chambers"
__copyright__ = "Copyright 2019, Michael Chambers"
__email__ = "greenkidneybean@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools faidx {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS FIXMATE

Use samtools to correct mate information after BWA mapping.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_fixmate:
    input:
        "mapped/{input}"
    output:
        "fixed/{input}"
    message:
        "Fixing mate information in {wildcards.input}"
    threads:
        1
    params:
        extra = ""
    wrapper:
        "0.53.0/bio/samtools/fixmate/"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for samtools fixmate"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")

# Samtools' threads parameter lists ADDITIONAL threads.
# that is why threads - 1 has to be given to the -@ parameter
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

makedirs(op.dirname(snakemake.output[0]))

shell(
    "samtools fixmate {extra} {threads}" " {snakemake.input[0]} {snakemake.output[0]}"
)
SAMTOOLS FLAGSTAT

Use samtools to create a flagstat file from a bam or sam file.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_flagstat:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.bam.flagstat"
    wrapper:
        "0.53.0/bio/samtools/flagstat"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Christopher Preusch
Code
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools flagstat {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS INDEX

Index bam file with samtools.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_index:
    input:
        "mapped/{sample}.sorted.bam"
    output:
        "mapped/{sample}.sorted.bam.bai"
    params:
        "" # optional params string
    wrapper:
        "0.53.0/bio/samtools/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}")
SAMTOOLS MERGE

Merge two bam files with samtools.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_merge:
    input:
        ["mapped/A.bam", "mapped/B.bam"]
    output:
        "merged.bam"
    params:
        "" # optional additional parameters as string
    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@
    wrapper:
        "0.53.0/bio/samtools/merge"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Samtools -@/–threads takes one integer as input. This is the number of additional threads and not raw threads.
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

# Samtools takes additional threads through its option -@
# One thread for samtools merge
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools merge {threads} {snakemake.params} "
    "{snakemake.output[0]} {snakemake.input}"
)
SAMTOOLS MPILEUP

Generate pileup using samtools.

Software dependencies
  • samtools ==1.9
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule mpilup:
    input:
        # single or list of bam files
        bam="mapped/{sample}.bam",
        reference_genome="genome.fasta"
    output:
        "mpileup/{sample}.mpileup.gz"
    log:
        "logs/samtools/mpileup/{sample}.log"
    params:
        extra="-d 10000",  # optional
    wrapper:
        "0.53.0/bio/samtools/mpileup"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
"""Snakemake wrapper for running mpileup."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

bam_input = snakemake.input.bam
reference_genome = snakemake.input.reference_genome

extra = snakemake.params.get("extra", "")

if not snakemake.output[0].endswith(".gz"):
    raise Exception(
        'output file will be compressed and therefore filename should end with ".gz"'
    )

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "samtools mpileup "
    "{extra} "
    "-f {reference_genome} "
    "{bam_input}  "
    " | pigz > {snakemake.output} "
    "{log}"
)
SAMTOOLS SORT

Sort bam file with samtools.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads:  # Samtools takes additional threads through its option -@
        8     # This value - 1 will be sent to -@.
    wrapper:
        "0.53.0/bio/samtools/sort"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • Samtools -@/–threads takes one integer as input. This is the number of additional threads and not raw threads.
Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell


prefix = os.path.splitext(snakemake.output[0])[0]

# Samtools takes additional threads through its option -@
# One thread for samtools
# Other threads are *additional* threads passed to the argument -@
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools sort {snakemake.params} {threads} -o {snakemake.output[0]} "
    "-T {prefix} {snakemake.input[0]}"
)
SAMTOOLS STATS

Generate stats using samtools.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_stats:
    input:
        "mapped/{sample}.bam"
    output:
        "samtools_stats/{sample}.txt"
    params:
        extra="",                       # Optional: extra arguments.
        region="1:1000000-2000000"      # Optional: region string.
    log:
        "logs/samtools_stats/{sample}.log"
    wrapper:
        "0.53.0/bio/samtools/stats"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Julian de Ruiter
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""

__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


shell("samtools stats {extra} {snakemake.input}" " {region} > {snakemake.output} {log}")
SAMTOOLS VIEW

Convert or filter SAM/BAM.

Software dependencies
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule samtools_view:
    input:
        "{sample}.sam"
    output:
        "{sample}.bam"
    params:
        "-b" # optional params string
    wrapper:
        "0.53.0/bio/samtools/view"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


shell("samtools view {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")

SICKLE

For sickle, the following wrappers are available:

SICKLE PE

Trim paired-end reads with sickle.

Software dependencies
  • sickle-trim ==1.33
Example

This wrapper can be used in the following way:

rule sickle_pe:
  input:
    r1="input_R1.fq",
    r2="input_R2.fq"
  output:
    r1="output_R1.fq",
    r2="output_R2.fq",
    rs="output_single.fq",
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    # optional log file
    "logs/sickle/job.log"
  wrapper:
    "0.53.0/bio/sickle/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
    "-o {snakemake.output.r1} -p {snakemake.output.r2} "
    "-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
    "{extra}) {log}"
)
SICKLE SE

Trim single-end reads with sickle.

Software dependencies
  • sickle-trim ==1.33
Example

This wrapper can be used in the following way:

rule sickle_pe:
  input:
    "input_R1.fq"
  output:
    "output_R1.fq"
  params:
    qual_type="sanger",
    # optional extra parameters
    extra=""
  log:
    "logs/sickle/job.log"
  wrapper:
    "0.53.0/bio/sickle/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Wibowo Arindrarto
Code
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"

from snakemake.shell import shell

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()

shell(
    "(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
    "-t {snakemake.params.qual_type} {extra}) {log}"
)

SNP-MUTATOR

Generate mutated sequence files from a reference genome.

Software dependencies
  • snp-mutator ==1.2.0
Example

This wrapper can be used in the following way:

NUM_SIMULATIONS = 2

rule snpmutator:
    input:
        "{sample}.fa"
    output:
        vcf = "{sample}.mutated.vcf",
        sequences = expand(
            "{{sample}}_mutated_{simulation_number}.fasta",
            simulation_number=range(1, NUM_SIMULATIONS + 1)
        )
    params:
        num_simulations = NUM_SIMULATIONS,
        extra = " ".join([
            "--num-substitutions 2",
            "--num-insertions 2",
            "--num-deletions 0"
        ]),
    log:
        "logs/snp-mutator/test/{sample}.log"
    wrapper:
        "0.53.0/bio/snp-mutator"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Michael Hall
Code
"""Snakemake wrapper for SNP Mutator."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
from pathlib import Path

# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
num_simulations = snakemake.params.get("num_simulations", 100)
fasta_outdir = Path(snakemake.output.sequences[0]).absolute().parent
# Formats the log redrection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Executed shell command
shell(
    "snpmutator {extra} "
    "--num-simulations {num_simulations} "
    "--vcf {snakemake.output.vcf} "
    "-F {fasta_outdir} "
    "{snakemake.input} {log} "
)

SNPEFF

For snpeff, the following wrappers are available:

SNPEFF

Annotate predicted effect of nucleotide changes with SnpEff

Software dependencies
  • snpeff ==4.3.1t
  • bcftools =1.10
Example

This wrapper can be used in the following way:

rule snpeff:
    input:
        calls="{sample}.vcf", # (vcf, bcf, or vcf.gz)
        db="resources/snpeff/ebola_zaire" # path to reference db downloaded with the snpeff download wrapper
    output:
        calls="snpeff/{sample}.vcf",   # annotated calls (vcf, bcf, or vcf.gz)
        stats="snpeff/{sample}.html",  # summary statistics (in HTML), optional
        csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
    log:
        "logs/snpeff/{sample}.log"
    params:
        extra="-Xmx4g"           # optional parameters (e.g., max memory 4g)
    wrapper:
        "0.53.0/bio/snpeff/annotate"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Bradford Powell
Code
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"


from snakemake.shell import shell
from os import path
import shutil
import tempfile
from pathlib import Path

outcalls = snakemake.output.calls
if outcalls.endswith(".vcf.gz"):
    outprefix = "| bcftools view -Oz"
elif outcalls.endswith(".bcf"):
    outprefix = "| bcftools view -Ob"
else:
    outprefix = ""

incalls = snakemake.input[0]
if incalls.endswith(".bcf"):
    incalls = "< <(bcftools view {})".format(incalls)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

data_dir = Path(snakemake.input.db).parent.resolve()

stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = "" if not csvstats else "-csvStats {}".format(csvstats)
stats_opt = "-noStats" if not stats else "-stats {}".format(stats)

reference = path.basename(snakemake.input.db)

shell(
    "snpEff -dataDir {data_dir} {stats_opt} {csvstats_opt} {extra} "
    "{reference} {incalls} "
    "{outprefix} > {outcalls} {log}"
)
SNPEFF DOWNLOAD

Download snpeff DB for a given species.

Software dependencies
  • snpeff ==4.3.1t
  • bcftools =1.10
Example

This wrapper can be used in the following way:

rule snpeff_download:
    output:
        # wildcard {reference} may be anything listed in `snpeff databases`
        directory("resources/snpeff/{reference}")
    log:
        "logs/snpeff/download/{reference}.log"
    params:
        reference="{reference}"
    wrapper:
        "0.53.0/bio/snpeff/download"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path

reference = snakemake.params.reference
outdir = Path(snakemake.output[0]).parent.resolve()
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("snpEff download -dataDir {outdir} {reference} {log}")

SNPSIFT

For snpsift, the following wrappers are available:

SNPSIFT VARTYPE

Add an INFO field denoting variant type.

Software dependencies
  • snpsift =4.3.1t
Example

This wrapper can be used in the following way:

rule test_snpsift_vartype:
    input:
        vcf="in.vcf"
    output:
        vcf="annotated/out.vcf"
    message:
        "Testing SnpSift varType"
    log:
        "varType.log"
    wrapper:
        "0.53.0/bio/snpsift/varType"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for SnpSift varType"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "SnpSift varType"  # Tool and its subcommand
    " {extra}"  # Extra parameters
    " {snakemake.input.vcf}"  # Path to input vcf file
    " > {snakemake.output.vcf}"  # Path to output vcf file
    " {log}"  # Logging behaviour
)

SOURMASH

For sourmash, the following wrappers are available:

SOURMASH_COMPUTE

Build a MinHash signature for a transcriptome, genome, or reads

Software dependencies
  • sourmash==2.0.0a7
Example

This wrapper can be used in the following way:

rule sourmash_reads:
    input:
        "reads/a.fastq"
    output:
        "reads.sig"
    log:
        "logs/sourmash/sourmash_compute_reads.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "0.53.0/bio/sourmash/compute"


rule sourmash_transcriptome:
    input:
        "assembly/transcriptome.fasta"
    output:
        "transcriptome.sig"
    log:
        "logs/sourmash/sourmash_compute_transcriptome.log"
    threads: 2
    params:
        # optional parameters
        k = "31",
        scaled = "1000",
        extra = ""
    wrapper:
        "0.53.0/bio/sourmash/compute"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Lisa K. Johnson
Code
"""Snakemake wrapper for sourmash compute."""

__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled", "1000")
k = snakemake.params.get("k", "31")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
    " {extra} {log}"
)

SRA-TOOLS

For sra-tools, the following wrappers are available:

SRA-TOOLS FASTERQ-DUMP

Download FASTQ files from SRA.

Software dependencies
  • sra-tools >2.9.1
Example

This wrapper can be used in the following way:

rule get_fastq_pe:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/{accession}_1.fastq",
        "data/{accession}_2.fastq"
    params:
        # optional extra arguments
        extra=""
    threads: 6  # defaults to 6
    wrapper:
        "0.53.0/bio/sra-tools/fasterq-dump"


rule get_fastq_se:
    output:
        "data/{accession}.fastq"
    params:
        extra=""
    threads: 6
    wrapper:
        "0.53.0/bio/sra-tools/fasterq-dump"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Derek Croote
Code
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = "--outdir {}".format(outdir)

extra = snakemake.params.get("extra", "")

with tempfile.TemporaryDirectory() as tmp:
    shell(
        "fasterq-dump --temp {tmp} --threads {snakemake.threads} "
        "{extra} {outdir} {snakemake.wildcards.accession} {log}"
    )

STAR

For star, the following wrappers are available:

STAR

Map reads with STAR.

Software dependencies
  • star ==2.7.3a
Example

This wrapper can be used in the following way:

rule star_pe_multi:
    input:
        # use a list for multiple fastq files for one sample
        # usually technical replicates across lanes/flowcells
        fq1 = ["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
        # paired end reads needs to be ordered so each item in the two lists match
        fq2 = ["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"] #optional
    output:
        # see STAR manual for additional output files
        "star/pe/{sample}/Aligned.out.sam"
    log:
        "logs/star/pe/{sample}.log"
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra=""
    threads: 8
    wrapper:
        "0.53.0/bio/star/align"

rule star_se:
    input:
        fq1 = "reads/{sample}_R1.1.fastq"
    output:
        # see STAR manual for additional output files
        "star/{sample}/Aligned.out.sam"
    log:
        "logs/star/{sample}.log"
    params:
        # path to STAR reference genome index
        index="index",
        # optional parameters
        extra=""
    threads: 8
    wrapper:
        "0.53.0/bio/star/align"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Tomás Di Domenico
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


import os
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = (
    [snakemake.input.fq1]
    if isinstance(snakemake.input.fq1, str)
    else snakemake.input.fq1
)
fq2 = snakemake.input.get("fq2")
if fq2:
    fq2 = (
        [snakemake.input.fq2]
        if isinstance(snakemake.input.fq2, str)
        else snakemake.input.fq2
    )
    assert len(fq1) == len(
        fq2
    ), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])

if fq1[0].endswith(".gz"):
    readcmd = "--readFilesCommand zcat"
else:
    readcmd = ""

outprefix = os.path.dirname(snakemake.output[0]) + "/"

shell(
    "STAR "
    "{extra} "
    "--runThreadN {snakemake.threads} "
    "--genomeDir {snakemake.params.index} "
    "--readFilesIn {input_str} "
    "{readcmd} "
    "--outFileNamePrefix {outprefix} "
    "--outStd Log "
    "{log}"
)
STAR INDEX

Index fasta sequences with STAR

Software dependencies
  • star ==2.7.3a
Example

This wrapper can be used in the following way:

rule star_index:
    input:
        fasta = "{genome}.fasta"
    output:
        directory("{genome}")
    message:
        "Testing STAR index"
    threads:
        1
    params:
        extra = ""
    log:
        "logs/star_index_{genome}.log"
    wrapper:
        "0.53.0/bio/star/index"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
  • Tomás Di Domenico
Code
"""Snakemake wrapper for STAR index"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

extra = snakemake.params.get("extra", "")
sjdb_overhang = snakemake.params.get("sjdbOverhang", "100")

gtf = snakemake.input.get("gtf")
if gtf is not None:
    gtf = "--sjdbGTFfile " + gtf
    sjdb_overhang = "--sjdbOverhang " + sjdb_overhang
else:
    gtf = sjdb_overhang = ""

makedirs(snakemake.output)

shell(
    "STAR "  # Tool
    "--runMode genomeGenerate "  # Indexation mode
    "{extra} "  # Optional parameters
    "--runThreadN {snakemake.threads} "  # Number of threads
    "--genomeDir {snakemake.output} "  # Path to output
    "--genomeFastaFiles {snakemake.input.fasta} "  # Path to fasta files
    "{sjdb_overhang} "  # Read-len - 1
    "{gtf} "  # Highly recommended GTF
    "{log}"  # Logging
)

STRELKA

For strelka, the following wrappers are available:

STRELKA GERMLINE

Call germline variants with Strelka.

Software dependencies
  • strelka ==2.9.10
Example

This wrapper can be used in the following way:

rule strelka_germline:
    input:
        # the required bam file
        bam="mapped/{sample}.bam",
        # path to reference genome fasta and index
        fasta="genome.fasta",
        fasta_index="genome.fasta.fai"
    output:
        # Strelka results - either use directory or complete file path
        directory("strelka/{sample}")
    log:
        "logs/strelka/germline/{sample}.log"
    params:
        # optional parameters
        config_extra="",
        run_extra=""
    threads: 8
    wrapper:
        "0.53.0/bio/strelka/germline"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Jan Forster
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"


import os
from pathlib import Path
from snakemake.shell import shell

config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

bam = snakemake.input.get("bam")  # input bam file, required
assert bam is not None, "input-> bam is a required input parameter"

if snakemake.output[0].endswith(".vcf.gz"):
    run_dir = Path(snakemake.output[0]).parents[2]
else:
    run_dir = snakemake.output

shell(
    "configureStrelkaGermlineWorkflow.py "  # configure the strelka run
    "--bam {bam} "  # input bam
    "--referenceFasta {snakemake.input.fasta} "  # reference genome
    "--runDir {run_dir} "  # output directory
    "{config_extra} "  # additional parameters for the configuration
    "&& {run_dir}/runWorkflow.py "  # run the strelka workflow
    "-m local "  # run in local mode
    "-j {snakemake.threads} "  # number of threads
    "{run_extra} "  # additional parameters for the run
    "{log}"
)  # logging
STRELKA

Strelka calls somatic and germline small variants from mapped sequencing reads

Software dependencies
  • strelka ==2.9.10
Example

This wrapper can be used in the following way:

rule strelka:
    input:
        # The normal bam and its index
        # are optional input
        # normal = "data/b.bam",
        # normal_index = "data/b.bam.bai"
        tumor = "data/{tumor}.bam",
        tumor_index = "data/{tumor}.bam.bai",
        fasta = "data/genome.fasta",
        fasta_index = "data/genome.fasta.fai"
    output:
        # Strelka output - can be directory or full file path
        directory("{tumor}_vcf")
    threads:
        1
    params:
        run_extra = "",
        config_extra = ""
    log:
        "logs/strelka_{tumor}.log"
    wrapper:
        "0.53.0/bio/strelka/somatic"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Strelka"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

from pathlib import Path
from snakemake.shell import shell
from snakemake.utils import makedirs

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")

# If a normal bam is given in input,
# then it should be provided in the input
# block, so Snakemake will perform additional
# tests on file existance.
normal = (
    "--normalBam {}".format(snakemake.input["normal"])
    if "normalBam" in snakemake.input.keys()
    else ""
)

if snakemake.output[0].endswith("vcf.gz"):
    run_dir = Path(snakemake.output[0]).parents[2]
else:
    run_dir = snakemake.output

shell(
    "(configureStrelkaSomaticWorkflow.py "  # Configuration script
    "{normal} "  # Path to normal bam (if any)
    "--tumorBam {snakemake.input.tumor} "  # Path to tumor bam
    "--referenceFasta {snakemake.input.fasta} "  # Path to fasta file
    "--runDir {run_dir} "  # Path to output directory
    "{config_extra} "  # Extra parametersfor configuration
    " && "
    "{run_dir}/runWorkflow.py "  # Run the pipeline
    "--mode local "  # Stop internal job submission
    "--jobs {snakemake.threads} "  # Nomber of threads
    "{run_extra}) "  # Extra parameters for runWorkflow
    "{log}"  # Logging behaviour
)

TABIX

Process given file with tabix (e.g., create index).

Software dependencies
  • htslib ==1.10
Example

This wrapper can be used in the following way:

rule tabix:
    input:
        "{prefix}.vcf.gz"
    output:
        "{prefix}.vcf.gz.tbi"
    params:
        # pass arguments to tabix (e.g. index a vcf)
        "-p vcf"
    log:
        "logs/tabix/{prefix}.log"
    wrapper:
        "0.53.0/bio/tabix"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell("tabix {snakemake.params} {snakemake.input[0]} {log}")

TRANSDECODER

For transdecoder, the following wrappers are available:

TRANSDECODER LONGORFS

TransDecoder.LongOrfs will identify coding regions within transcript sequences (ORFs) that are at least 100 amino acids long. You can lower this via the ‘-m’ parameter, but know that the rate of false positive ORF predictions increases drastically with shorter minimum length criteria.

Software dependencies
  • transdecoder=5.5.0
Example

This wrapper can be used in the following way:

rule transdecoder_longorfs:
    input:
        fasta="test.fa.gz", # required
        gene_trans_map="test.gtm" # optional gene-to-transcript identifier mapping file (tab-delimited, gene_id<tab>trans_id<return> )
    output:
        "test.fa.transdecoder_dir/longest_orfs.pep"
    log:
        "logs/transdecoder/test-longorfs.log"
    params:
        extra=""
    wrapper:
        "0.53.0/bio/transdecoder/longorfs"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for Transdecoder LongOrfs"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

gtm_cmd = ""
gtm = snakemake.input.get("gene_trans_map", "")
if gtm:
    gtm_cmd = " --gene_trans_map " + gtm

output_dir = path.dirname(str(snakemake.output))

# transdecoder fails if output already exists. No force option available
shell("rm -rf {output_dir}")

input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith("gz"):
    input_fa = input_fasta.rsplit(".gz")[0]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta

shell("TransDecoder.LongOrfs -t {input_fa} {gtm_cmd} {log}")
TRANSDECODER PREDICT

Predict the likely coding regions from the ORFs identified by Transdecoder.LongOrfs. Optionally include results from homology searches (blast/hmmer results) as ORF retention criteria.

Software dependencies
  • transdecoder=5.5.0
Example

This wrapper can be used in the following way:

rule transdecoder_predict:
    input:
        fasta="test.fa.gz", # required input; optionally gzipped
        pfam_hits="pfam_hits.txt", # optionally retain ORFs with hits by inputting pfam results here (run separately)
        blastp_hits="blastp_hits.txt", # optionally retain ORFs with hits by inputting blastp results here (run separately)
        # you may also want to add your transdecoder longorfs result here - predict will fail if you haven't first run longorfs
        #longorfs="test.fa.transdecoder_dir/longest_orfs.pep"
    output:
        "test.fa.transdecoder.bed",
        "test.fa.transdecoder.cds",
        "test.fa.transdecoder.pep",
        "test.fa.transdecoder.gff3"
    log:
        "logs/transdecoder/test-predict.log"
    params:
        extra=""
    wrapper:
        "0.53.0/bio/transdecoder/predict"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
    1. Tessa Pierce
Code
"""Snakemake wrapper for Transdecoder Predict"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

addl_outputs = ""
pfam = snakemake.input.get("pfam_hits", "")
if pfam:
    addl_outputs += " --retain_pfam_hits " + pfam

blast = snakemake.input.get("blastp_hits", "")
if blast:
    addl_outputs += " --retain_blastp_hits " + blast

input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith("gz"):
    input_fa = input_fasta.rsplit(".gz")[0]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta

shell("TransDecoder.Predict -t {input_fa} {addl_outputs} {extra} {log}")

TRIM_GALORE

For trim_galore, the following wrappers are available:

TRIM_GALORE-PE

Trim paired-end reads using trim_galore.

Software dependencies
  • trim-galore ==0.4.5
Example

This wrapper can be used in the following way:

rule trim_galore_pe:
    input:
        ["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"]
    output:
        "trimmed/{sample}.1_val_1.fq.gz",
         "trimmed/{sample}.1.fastq.gz_trimming_report.txt",
         "trimmed/{sample}.2_val_2.fq.gz",
         "trimmed/{sample}.2.fastq.gz_trimming_report.txt"
    params:
        extra="--illumina -q 20"
    log:
        "logs/trim_galore/{sample}.log"
    wrapper:
        "0.53.0/bio/trim_galore/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • It is expected that the fastqc Snakemake wrapper be used in place of the –fastqc option.
  • All output files must be placed in the same directory.
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError(
        "The trim_galore Snakemake wrapper cannot "
        "be run with the `--fastqc` flag. Please "
        "remove the flag from extra params. "
        "You can use the fastqc Snakemake wrapper on "
        "the input and output files instead."
    )

# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), (
        "trim_galore can only output files to a single directory."
        " Please indicate only one directory for the output files."
    )

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " --paired"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}"
)
TRIM_GALORE-SE

Trim unpaired reads using trim_galore.

Software dependencies
  • trim-galore ==0.4.3
Example

This wrapper can be used in the following way:

rule trim_galore_se:
    input:
        "reads/{sample}.fastq.gz"
    output:
        "trimmed/{sample}_trimmed.fq.gz",
         "trimmed/{sample}.fastq.gz_trimming_report.txt"
    params:
        extra="--illumina -q 20"
    log:
        "logs/trim_galore/{sample}.log"
    wrapper:
        "0.53.0/bio/trim_galore/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
  • It is expected that the fastqc Snakemake wrapper be used in place of the –fastqc option.
  • All output files must be placed in the same directory.
Authors
  • Kerrin Mendler
Code
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""

__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import os.path


log = snakemake.log_fmt_shell()

# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError(
        "The trim_galore Snakemake wrapper cannot "
        "be run with the `--fastqc` flag. Please "
        "remove the flag from extra params. "
        "You can use the fastqc Snakemake wrapper on "
        "the input and output files instead."
    )

# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m

# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), (
        "trim_galore can only output files to a single directory."
        " Please indicate only one directory for the output files."
    )

shell(
    "(trim_galore"
    " {snakemake.params.extra}"
    " -o {out_dir}"
    " {snakemake.input})"
    " {log}"
)

TRIMMOMATIC

For trimmomatic, the following wrappers are available:

TRIMMOMATIC PE

Trim paired-end reads with trimmomatic. (De)compress with pigz.

Software dependencies
  • trimmomatic ==0.36
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule trimmomatic_pe:
    input:
        r1="reads/{sample}.1.fastq.gz",
        r2="reads/{sample}.2.fastq.gz"
    output:
        r1="trimmed/{sample}.1.fastq.gz",
        r2="trimmed/{sample}.2.fastq.gz",
        # reads where trimming entirely removed the mate
        r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
        r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        compression_level="-9"
    threads:
        32
    wrapper:
        "0.53.0/bio/trimmomatic/pe"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/pe

Snakemake wrapper to trim reads with trimmomatic in PE mode with help of pigz.
pigz is the parallel implementation of gz. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least 1 extra thread per each gzipped
input or output file.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_files, output_files, available_threads):
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
    gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
input_files = [snakemake.input.r1, snakemake.input.r2]
output_files = [
    snakemake.output.r1,
    snakemake.output.r1_unpaired,
    snakemake.output.r2,
    snakemake.output.r2_unpaired,
]

trimmomatic_threads, input_threads, output_threads = distribute_threads(
    input_files, output_files, snakemake.threads
)

input_r1, input_r2 = [
    compose_input_gz(filename, input_threads) for filename in input_files
]

output_r1, output_r1_unp, output_r2, output_r2_unp = [
    compose_output_gz(filename, output_threads, compression_level)
    for filename in output_files
]

shell(
    "trimmomatic PE -threads {trimmomatic_threads} {extra} "
    "{input_r1} {input_r2} "
    "{output_r1} {output_r1_unp} "
    "{output_r2} {output_r2_unp} "
    "{trimmer} "
    "{log}"
)
TRIMMOMATIC SE

Trim single-end reads with trimmomatic. (De)compress with pigz.

Software dependencies
  • trimmomatic ==0.36
  • pigz ==2.3.4
Example

This wrapper can be used in the following way:

rule trimmomatic:
    input:
        "reads/{sample}.fastq.gz"  # input and output can be uncompressed or compressed
    output:
        "trimmed/{sample}.fastq.gz"
    log:
        "logs/trimmomatic/{sample}.log"
    params:
        # list of trimmers (see manual)
        trimmer=["TRAILING:3"],
        # optional parameters
        extra="",
        # optional compression levels from -0 to -9 and -11
        compression_level="-9"
    threads:
        32
    wrapper:
        "0.53.0/bio/trimmomatic/se"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Johannes Köster
  • Jorge Langa
Code
"""
bio/trimmomatic/se

Snakemake wrapper to trim reads with trimmomatic in SE mode with help of pigz.
pigz is the parallel implementation of gz. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least 1 extra thread per each gzipped
input or output file.
"""

__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell


# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_file, output_file, available_threads):
    gzipped_input_files = 1 if input_file.endswith(".gz") else 0
    gzipped_output_files = 1 if output_file.endswith(".gz") else 0
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
trimmomatic_threads, input_threads, output_threads = distribute_threads(
    snakemake.input[0], snakemake.output[0], snakemake.threads
)

# Collect files
input = compose_input_gz(snakemake.input[0], input_threads)
output = compose_output_gz(snakemake.output[0], output_threads, compression_level)

shell(
    "trimmomatic SE -threads {trimmomatic_threads} {extra} {input} {output} {trimmer} {log}"
)

TRINITY

Generate transcriptome assembly with Trinity

Software dependencies
  • trinity ==2.8.4
Example

This wrapper can be used in the following way:

rule trinity:
    input:
        left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
        right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
    output:
        "trinity_out_dir/Trinity.fasta"
    log:
        'logs/trinity/trinity.log'
    params:
        extra=""
    threads: 4
    wrapper:
        "0.53.0/bio/trinity"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Tessa Pierce
Code
"""Snakemake wrapper for Trinity."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
max_memory = snakemake.params.get("max_memory", "10G")

# allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = (
    [snakemake.input.left]
    if isinstance(snakemake.input.left, str)
    else snakemake.input.left
)
right = snakemake.input.get("right")
if right:
    right = (
        [snakemake.input.right]
        if isinstance(snakemake.input.right, str)
        else snakemake.input.right
    )
    assert len(left) >= len(
        right
    ), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
    input_str_left = " --left " + ",".join(left)
    input_str_right = " --right " + ",".join(right)
else:
    input_str_left = " --single " + ",".join(left)
    input_str_right = ""

input_cmd = " ".join([input_str_left, input_str_right])

# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
    if "fq" in left[0] or "fastq" in left[0]:
        seqtype = "fq"
    elif "fa" in left[0] or "fasta" in left[0]:
        seqtype = "fa"
    else:  # assertion is redundant - warning or error instead?
        assert (
            seqtype is not None
        ), "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"

outdir = path.dirname(snakemake.output[0])
assert "trinity" in outdir, "output directory name must contain 'trinity'"

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "Trinity {input_cmd} --CPU {snakemake.threads} "
    " --max_memory {max_memory} --seqType {seqtype} "
    " --output {outdir} {snakemake.params.extra} "
    " {log}"
)

TXIMPORT

Import and summarize transcript-level estimates for both transcript-level and gene-level analysis.

Software dependencies
  • bioconductor-tximport==1.14.0
  • r-readr==1.3.1
  • r-jsonlite==1.6
Example

This wrapper can be used in the following way:

rule tximport:
    input:
        quant = expand("quant/A/quant.sf")
        # Optional transcript/gene links as described in tximport
        # tx2gene = /path/to/tx2gene
    output:
        txi = "txi.RDS"
    params:
        extra = "type='salmon', txOut=TRUE"
    wrapper:
        "0.53.0/bio/tximport"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Add any tximport options in the params, they will be transmitted through the R wrapper. Supplementary options will cause unknown parameters error.

Authors
  • Thibault Dayris
Code
#!/bin/R

# Loading library
base::library("tximport");   # Perform actual count importation in R
base::library("readr");      # Read faster!
base::library("jsonlite");   # Importing inferential replicates

# Cast input paths as character to avoid errors
samples_paths <- sapply(               # Sequentially apply
  snakemake@input[["quant"]],          # ... to all quantification paths
  function(quant) as.character(quant)  # ... a cast as character
);

# Collapse path into a character vector
samples_paths <- base::paste0(samples_paths, collapse = '", "');

# Building function arguments
extra <- base::paste0('files = c("', samples_paths, '")');

# Check if user provided optional transcript to gene table
if ("tx_to_gene" %in% names(snakemake@input)) {
  tx2gene <- readr::read_tsv(snakemake@input[["tx_to_gene"]]);
  extra <- base::paste(
    extra,                 # Foreward existing arguments
    ", tx2gene = ",        # Argument name
    "tx2gene"              # Add tx2gene to parameters
  );
}

# Add user defined arguments
if ("extra" %in% names(snakemake@params)) {
  if (snakemake@params[["extra"]] != "") {
    extra <- base::paste(
      extra,                       # Foreward existing parameters
      snakemake@params[["extra"]], # Add user parameters
      sep = ", "                   # Field separator
    );
  }
}


print(extra);
# Perform tximport work
txi <- base::eval(                        # Evaluate the following
  base::parse(                            # ... parsed expression
    text = base::paste0(
      "tximport::tximport(", extra, ");"  # ... of tximport and its arguments
    )
  )
);

# Save results
base::saveRDS(                       # Save R object
  object = txi,                      # The txi object
  file = snakemake@output[["txi"]]   # Output path is provided by Snakemake
);

UCSC

For ucsc, the following wrappers are available:

BEDGRAPHTOBIGWIG

Convert *.bedGraph file to *.bw file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-bedgraphtobigwig == 377
Example

This wrapper can be used in the following way:

rule bedGraphToBigWig:
    input:
        bedGraph="{sample}.bedGraph",
        chromsizes="genome.chrom.sizes"
    output:
        "{sample}.bw"
    params:
        "" # optional params string
    wrapper:
        "0.53.0/bio/ucsc/bedGraphToBigWig"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.bedGraph to *.bw conversion using UCSC bedGraphToBigWig tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell(
    "bedGraphToBigWig {extra}"
    " {snakemake.input.bedGraph} {snakemake.input.chromsizes}"
    " {snakemake.output} {log}"
)
FATOTWOBIT

Convert *.fa file to *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-fatotwobit == 377
Example

This wrapper can be used in the following way:

# Example: from *.fa file
rule faToTwoBit_fa:
    input:
        "{sample}.fa"
    output:
        "{sample}.2bit"
    params:
        "" # optional params string
    wrapper:
        "0.53.0/bio/ucsc/faToTwoBit"

# Example: from *.fa.gz file
rule faToTwoBit_fa_gz:
    input:
        "{sample}.fa.gz"
    output:
        "{sample}.2bit"
    params:
        "" # optional params string
    wrapper:
        "0.53.0/bio/ucsc/faToTwoBit"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC faToTwoBit tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("faToTwoBit {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITINFO

Generate *.chorom.sizes file by *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-twobitinfo == 377
Example

This wrapper can be used in the following way:

rule twoBitInfo:
    input:
        "{sample}.2bit"
    output:
        "{sample}.chrom.sizes"
    params:
        "" # optional params string
    wrapper:
        "0.53.0/bio/ucsc/twoBitInfo"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC twoBitInfo tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("twoBitInfo {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITTOFA

Convert *.2bit file to *.fa file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)

Software dependencies
  • ucsc-twobittofa == 377
Example

This wrapper can be used in the following way:

rule twoBitToFa:
    input:
        "{sample}.2bit"
    output:
        "{sample}.fa"
    params:
        "" # optional params string
    wrapper:
        "0.53.0/bio/ucsc/twoBitToFa"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Roman Cherniatchik
Code
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC twoBitToFa tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

shell("twoBitToFa {extra} {snakemake.input} {snakemake.output} {log}")

UMIS

For umis, the following wrappers are available:

UMIS BAMTAG

Convert a BAM/SAM with fastqtransformed read names to have UMI and

Software dependencies
  • umis ==1.0.3
  • samtools ==1.9
Example

This wrapper can be used in the following way:

rule umis_bamtag:
    input:
        "data/{sample}.bam"
    output:
        "data/{sample}.annotated.bam"
    log:
        "logs/umis/bamtag/{sample}.log"
    params:
        extra=""
    threads: 1
    wrapper:
        "0.53.0/bio/umis/bamtag"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

bam_input = snakemake.input[0]

if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not len(snakemake.input) == 1:
    raise ValueError("Only expecting one input file: " + str(snakemake.input) + "!")

output_file = snakemake.output[0]

if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(output_file) + "!")

in_pipe = ""
if bam_input.endswith(".sam"):
    in_pipe = "cat "
else:
    in_pipe = "samtools view -h "

out_pipe = ""
if not output_file.endswith(".sam"):
    out_pipe = " | samtools view -S -b - "

shell(
    " {in_pipe} {bam_input} | " " umis bamtag -" " {out_pipe} > {output_file}" " {log}"
)

VARSCAN

For varscan, the following wrappers are available:

VARSCAN MPILEUP2INDEL

Detect indel in NGS data from mpileup files

Software dependencies
  • varscan ==2.4.3
Example

This wrapper can be used in the following way:

rule mpileup_to_vcf:
    input:
        "mpileup/{sample}.mpileup.gz"
    output:
        "vcf/{sample}.vcf"
    message:
        "Calling Indel with Varscan2"
    threads:  # Varscan does not take any threading information
        1     # However, mpileup might have to be unzipped.
              # Keep threading value to one for unzipped mpileup input
              # Set it to two for zipped mipileup files
    log:
        "logs/varscan_{sample}.log"
    wrapper:
        "0.53.0/bio/varscan/mpileup2indel"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Varscan does not take any threading information by itself. However, mpileup files given as input, might be gzipped.

If so, it’s recommended to use two threads:

  • 1 for varscan itself
  • 1 for zcat
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Varscan2 mpileup2indel"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs

# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
    " cat {} ".format(snakemake.input[0])
    if not snakemake.input[0].endswith("gz")
    else " zcat {} ".format(snakemake.input[0])
)

# Building output directories
makedirs(op.dirname(snakemake.output[0]))

shell(
    "varscan mpileup2indel "  # Tool and its subprocess
    "{extra} "  # Extra parameters
    "<( {pileup} ) "
    "> {snakemake.output[0]} "  # Path to vcf file
    "{log}"  # Logging behaviour
)
VARSCAN MPILEUP2SNP

Detect variants in NGS data from Samtools mpileup

Software dependencies
  • varscan ==2.4.3
Example

This wrapper can be used in the following way:

rule mpileup_to_vcf:
    input:
        "mpileup/{sample}.mpileup.gz"
    output:
        "vcf/{sample}.vcf"
    message:
        "Calling SNP with Varscan2"
    threads:  # Varscan does not take any threading information
        1     # However, mpileup might have to be unzipped.
              # Keep threading value to one for unzipped mpileup input
              # Set it to two for zipped mipileup files
    log:
        "logs/varscan_{sample}.log"
    wrapper:
        "0.53.0/bio/varscan/mpileup2snp"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Varscan does not take any threading information by itself. However, mpileup files given as input, might be gzipped.

If so, it’s recommended to use two threads:

  • 1 for varscan itself
  • 1 for zcat
Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for Varscan2 mpileup2snp"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"

import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs

# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
    " cat {} ".format(snakemake.input[0])
    if not snakemake.input[0].endswith("gz")
    else " zcat {} ".format(snakemake.input[0])
)

# Building output directories
makedirs(op.dirname(snakemake.output[0]))

shell(
    "varscan mpileup2snp "  # Tool and its subprocess
    "{extra} "  # Extra parameters
    "<( {pileup} ) "
    "> {snakemake.output[0]} "  # Path to vcf file
    "{log}"  # Logging behaviour
)
VARSCAN SOMATIC

Varscan Somatic calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.

Software dependencies
  • varscan ==2.4.3
Example

This wrapper can be used in the following way:

rule varscan_somatic:
    input:
        # A pair of pileup files can be used *instead* of the mpileup
        # normal_pileup = ""
        # tumor_pileup = ""
        mpileup = "mpileup/{sample}.mpileup.gz"
    output:
        snp = "vcf/{sample}.snp.vcf",
        indel = "vcf/{sample}.indel.vcf"
    message:
        "Calling somatic variants {wildcards.sample}"
    threads:
        1
    params:
        extra = ""
    wrapper:
        "0.53.0/bio/varscan/somatic"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Thibault Dayris
Code
"""Snakemake wrapper for varscan somatic"""

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"


import os.path as op

from snakemake.shell import shell
from snakemake.utils import makedirs

# Defining logging and gathering extra parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

# Building output dirs
makedirs(op.dirname(snakemake.output.snp))
makedirs(op.dirname(snakemake.output.indel))

# Output prefix
prefix = op.splitext(snakemake.output.snp)[0]

# Searching for input files
pileup_pair = ["normal_pileup", "tumor_pileup"]

in_pileup = ""
mpileup = ""
if "mpileup" in snakemake.input.keys():
    # Case there is a mpileup with both normal and tumor
    in_pileup = snakemake.input.mpileup
    mpileup = "--mpileup 1"
elif all(pileup in snakemake.input.keys() for pileup in pileup_pair):
    # Case there are two separate pileup files
    in_pileup = " {snakemake.input.normal_pileup}" " {snakemakeinput.tumor_pileup} "
else:
    raise KeyError("Could not find either a mpileup, or a pair of pileup files")

shell(
    "varscan somatic"  # Tool and its subcommand
    " {in_pileup}"  # Path to input file(s)
    " {prefix}"  # Path to output
    " {extra}"  # Extra parameters
    " {mpileup}"
    " --output-snp {snakemake.output.snp}"  # Path to snp output file
    " --output-indel {snakemake.output.indel}"  # Path to indel output file
)

VCFTOOLS

For vcftools, the following wrappers are available:

VCFTOOLS FILTER

Filter vcf files using vcftools

Software dependencies
  • vcftools ==0.1.16
Example

This wrapper can be used in the following way:

rule filter_vcf:
    input:
        "{sample}.vcf"
    output:
        "{sample}.filtered.vcf"
    params:
        extra="--chr 1 --recode-INFO-all"
    wrapper:
        "0.53.0/bio/vcftools/filter"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors
  • Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell

input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
    input_flag = "--gzvcf"

output = " > " + snakemake.output[0]
if output.endswith(".gz"):
    output = " | gzip" + output

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

extra = snakemake.params.get("extra", "")

shell(
    "vcftools "
    "{input_flag} "
    "{snakemake.input} "
    "{extra} "
    "--recode "
    "--stdout "
    "{output} "
    "{log}"
)