The Snakemake Wrappers repository
The Snakemake Wrapper Repository is a collection of reusable wrappers that let you quickly use popular tools from Snakemake rules and workflows.
Usage
The general strategy is to include a wrapper in your workflow via the wrapper directive, e.g.:
rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.2.0/bio/samtools/sort"
Here, Snakemake will automatically download the corresponding wrapper from https://bitbucket.org/snakemake/snakemake-wrappers/src/0.2.0/bio/samtools/sort/wrapper.py. The 0.2.0 prefix can be replaced with the version tag you want to use, or with a commit id. Pinning a version ensures reproducibility, since changes in the wrapper implementation won’t be propagated automatically to your workflow. Alternatively, e.g. for development, the wrapper directive can also point to a full URL, including a local file:// URL.
Each wrapper defines required software packages and versions. In combination with the --use-conda flag of Snakemake, these will be deployed automatically.
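The mapping from a wrapper specification to a download location can be sketched in a few lines of Python. The helper below is hypothetical and not part of Snakemake's API. It only mirrors the URL pattern quoted above, and it assumes that a specification which is already a full URL is used as given. Snakemake's actual handling of full URLs may differ.

```python
# Hypothetical helper illustrating how a wrapper specification maps to a URL.
# BASE_URL is the repository URL quoted in this section; the function name and
# the pass-through rule for full URLs are assumptions made for illustration.
BASE_URL = "https://bitbucket.org/snakemake/snakemake-wrappers/src"


def wrapper_url(spec):
    """Return the assumed location of wrapper.py for a wrapper specification."""
    # Full URLs (e.g. file:// or https://) are assumed to be used unchanged.
    if "://" in spec:
        return spec
    # Otherwise the spec is "<version tag or commit id>/<path to wrapper>".
    return "{}/{}/wrapper.py".format(BASE_URL, spec.strip("/"))


print(wrapper_url("0.2.0/bio/samtools/sort"))
print(wrapper_url("file:///home/user/wrappers/bio/samtools/sort"))
```

Because the version tag is part of the specification, switching to another release only requires changing the leading path component.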
Contribute
We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, we suggest the following procedure:
- fork the repository
- develop your contribution
- open a pull request
The pull request will be reviewed and included as fast as possible. Contributions should follow the coding style of the existing wrappers, i.e.
- provide a meta.yaml with name, description and author of the wrapper,
- provide an environment.yaml which lists all required software packages (the packages must be available via https://anaconda.org),
- provide an example Snakefile that shows how to use the wrapper,
- follow the Python style guide,
- use 4 spaces for indentation.
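Before opening a pull request, the file requirements above can be checked with a short script. This is a minimal sketch: the function name is hypothetical, and only the two files named explicitly in the list are checked, because this section does not say where the example Snakefile should live inside a wrapper directory.

```python
import os
import tempfile

# Files named in the contribution guidelines above. The location of the
# example Snakefile is not stated there, so it is left out of this check.
REQUIRED_FILES = ("meta.yaml", "environment.yaml")


def missing_files(wrapper_dir):
    """Return the required files that are absent from wrapper_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(wrapper_dir, name))]


# Usage: a throwaway directory that only contains meta.yaml.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "meta.yaml"), "w").close()
    print(missing_files(demo_dir))
```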
BCFTOOLS
Wrappers
BCFTOOLS CALL
Call variants with bcftools.
Software dependencies
- samtools ==1.5
- bcftools ==1.5
Example
This wrapper can be used in the following way:
rule bcftools_call:
    input:
        ref="genome.fasta",
        samples=expand("mapped/{sample}.sorted.bam", sample=config["samples"]),
        indexes=expand("mapped/{sample}.sorted.bam.bai", sample=config["samples"])
    output:
        # Here, we optionally use a region as wildcard and constrain it to the
        # format accepted by samtools mpileup.
        "called/{region,.+(:[0-9]+-[0-9]+)?}.bcf"
    params:
        # Optional parameters for samtools mpileup (except -g, -f).
        # In this example, we forward the region wildcard from the output file to mpileup.
        mpileup="--region {region}",
        # Optional parameters for bcftools call (except -v, -o, -m).
        call=""
    log:
        "logs/bcftools_call/{region}.log"
    wrapper:
        "0.27.1/bio/bcftools/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
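To see what the region wildcard constraint in the example rule accepts, the regular expression can be tried out on its own. This is a rough sketch: it uses re.fullmatch on the bare constraint, which only approximates how Snakemake matches the wildcard inside the complete output path, and the helper name is made up for illustration.

```python
import re

# The constraint from the example rule, copied from its output pattern.
REGION_CONSTRAINT = r".+(:[0-9]+-[0-9]+)?"


def matches_region(value):
    """Check whether value as a whole satisfies the wildcard constraint."""
    return re.fullmatch(REGION_CONSTRAINT, value) is not None


for candidate in ["chr1", "chr1:1000-2000", "chr1:abc", ""]:
    print(repr(candidate), matches_region(candidate))
```

Because the leading `.+` already accepts any non-empty string, the constraint mainly documents the expected `name` or `name:start-end` form and rejects empty values. It does not reject malformed coordinates.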
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"(samtools mpileup {snakemake.params.mpileup} {snakemake.input.samples} "
"--fasta-ref {snakemake.input.ref} --BCF --uncompressed | "
"bcftools call -m {snakemake.params.call} -o {snakemake.output[0]} -v -) 2> {snakemake.log}")
BCFTOOLS CONCAT
Concatenate vcf/bcf files with bcftools.
Software dependencies
- bcftools ==1.6
Example
This wrapper can be used in the following way:
rule bcftools_concat:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools concat (except -o)
    wrapper:
        "0.27.1/bio/bcftools/concat"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools concat {snakemake.params} -o {snakemake.output[0]} "
"{snakemake.input.calls}")
BCFTOOLS MERGE
Merge vcf/bcf files with bcftools.
Software dependencies
- bcftools ==1.6
Example
This wrapper can be used in the following way:
rule bcftools_merge:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools merge (except -o)
    wrapper:
        "0.27.1/bio/bcftools/merge"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools merge {snakemake.params} -o {snakemake.output[0]} "
"{snakemake.input.calls}")
BCFTOOLS VIEW
View vcf/bcf file in a different format.
Software dependencies
- bcftools ==1.5
Example
This wrapper can be used in the following way:
rule bcf_to_vcf:
    input:
        "{prefix}.bcf"
    output:
        "{prefix}.vcf"
    params:
        ""  # optional parameters for bcftools view (except -o)
    wrapper:
        "0.27.1/bio/bcftools/view"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools view {snakemake.params} {snakemake.input[0]} "
"-o {snakemake.output[0]}")
BOWTIE2
Wrappers
BOWTIE2
Map reads with bowtie2.
Software dependencies
- bowtie2 ==2.3.2
- samtools ==1.5
Example
This wrapper can be used in the following way:
rule bowtie2:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bowtie2/{sample}.log"
    params:
        index="index/genome",  # prefix of reference genome index (built with bowtie2-build)
        extra=""  # optional parameters
    threads: 8
    wrapper:
        "0.27.1/bio/bowtie2/align"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
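The sample input may hold one file (single-end) or two files (paired-end), and the wrapper code in this section turns that list into bowtie2's -U or -1/-2 arguments. The standalone sketch below mirrors that logic so it can be tried without Snakemake. The function name is hypothetical, and it raises ValueError where the wrapper uses an assert.

```python
def bowtie2_reads_arg(sample_files):
    """Build the read arguments for bowtie2 from one or two FASTQ paths."""
    n = len(sample_files)
    if n not in (1, 2):
        raise ValueError(
            "input->sample must have 1 (single-end) or 2 (paired-end) elements.")
    if n == 1:
        # Single-end reads are passed with -U.
        return "-U {}".format(*sample_files)
    # Paired-end reads are passed as mate 1 and mate 2.
    return "-1 {} -2 {}".format(*sample_files)


print(bowtie2_reads_arg(["reads/a.fastq"]))
print(bowtie2_reads_arg(["reads/a.1.fastq", "reads/a.2.fastq"]))
```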
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
n = len(snakemake.input.sample)
assert n == 1 or n == 2, "input->sample must have 1 (single-end) or 2 (paired-end) elements."
if n == 1:
    reads = "-U {}".format(*snakemake.input.sample)
else:
    reads = "-1 {} -2 {}".format(*snakemake.input.sample)
shell(
    "(bowtie2 --threads {snakemake.threads} {snakemake.params.extra} "
    "-x {snakemake.params.index} {reads} "
    "| samtools view -Sbh -o {snakemake.output[0]} -) {log}")
BUSCO
Assess assembly and annotation completeness with BUSCO.
Software dependencies
- busco ==3.0.2
Example
This wrapper can be used in the following way:
rule run_busco:
    input:
        "transcriptome.fasta"
    output:
        "run_txome_busco/full_table_txome_busco.tsv",
    log:
        "logs/quality/transcriptome_busco.log"
    threads: 8
    params:
        mode="transcriptome",
        lineage_path="metazoa_test",
        # optional parameters
        extra=""
    wrapper:
        "0.27.1/bio/busco"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
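The wrapper code in this section derives the BUSCO --out name from the directory of the first output file, which must be a single directory whose name starts with run_. The sketch below reproduces that derivation as a standalone function. The function name is hypothetical, it raises ValueError where the wrapper uses asserts, and it strips the prefix by slicing rather than by splitting on run_.

```python
from os import path


def busco_out_name(first_output):
    """Derive the name passed to --out from the first output path."""
    run_dir = path.dirname(first_output)
    if "/" in run_dir:
        raise ValueError("out name cannot be path")
    if not run_dir.startswith("run_"):
        raise ValueError("out name must start with run_")
    # BUSCO adds the "run_" prefix itself, so it is stripped here.
    return run_dir[len("run_"):]


print(busco_out_name("run_txome_busco/full_table_txome_busco.tsv"))
```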
Authors
- Tessa Pierce
Code
"""Snakemake wrapper for BUSCO assessment."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode is not None, "please input a run mode: genome, transcriptome or proteins"
lineage = snakemake.params.get("lineage_path")
assert lineage is not None, "please input the path to a lineage for busco assessment"
out_name = path.dirname(snakemake.output[0])
assert '/' not in out_name, "out name cannot be path"
assert out_name.startswith('run_'), "out name must start with run_"
out_name = out_name.split('run_')[1]  # busco adds "run_" automatically
# note: --force allows snakemake to handle rewriting files as necessary
# without needing to specify *all* busco outputs as snakemake outputs
shell("run_busco --in {snakemake.input} --out {out_name} --force "
      " --cpu {snakemake.threads} --mode {mode} --lineage {lineage} "
      " {extra} {log}")
BWA
Wrappers
BWA ALN
Map reads with bwa aln.
Software dependencies
- bwa ==0.7.15
Example
This wrapper can be used in the following way:
rule bwa_aln:
    input:
        "reads/{sample}.{pair}.fastq"
    output:
        "sai/{sample}.{pair}.sai"
    params:
        index="genome",
        extra=""
    log:
        "logs/bwa_aln/{sample}.{pair}.log"
    threads: 8
    wrapper:
        "0.27.1/bio/bwa/aln"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Julian de Ruiter
Code
"""Snakemake wrapper for bwa aln."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get('extra', '')
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"bwa aln"
" {extra}"
" -t {snakemake.threads}"
" {snakemake.params.index}"
" {snakemake.input[0]}"
" > {snakemake.output[0]} {log}")
BWA INDEX
Creates a BWA index.
Software dependencies
- bwa ==0.7.15
Example
This wrapper can be used in the following way:
rule bwa_index:
    input:
        "{genome}.fasta"
    output:
        "{genome}.amb",
        "{genome}.ann",
        "{genome}.bwt",
        "{genome}.pac",
        "{genome}.sa"
    log:
        "logs/bwa_index/{genome}.log"
    params:
        prefix="{genome}",
        algorithm="bwtsw"
    wrapper:
        "0.27.1/bio/bwa/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
    raise ValueError("Only one reference genome can be provided!")
# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")
if len(prefix) > 0:
    prefix = "-p " + prefix
# Construction algorithm that will be used to build the database, default is bwtsw
construction_algorithm = snakemake.params.get("algorithm", "")
if len(construction_algorithm) != 0:
    construction_algorithm = "-a " + construction_algorithm
shell(
    "bwa index"
    " {prefix}"
    " {construction_algorithm}"
    " {snakemake.input[0]}"
    " {log}")
BWA MEM
Map reads using bwa mem, with optional sorting using samtools or picard.
Software dependencies
- bwa ==0.7.15
- samtools ==1.5
- picard ==2.9.2
Example
This wrapper can be used in the following way:
rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="genome",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="none",  # Can be 'none', 'samtools' or 'picard'.
        sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
        sort_extra=""  # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "0.27.1/bio/bwa/mem"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
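The sort, sort_order and sort_extra parameters decide which command receives the aligner's output. The sketch below restates the selection logic of the wrapper code in this section as a pure function, so the resulting command lines can be inspected without running Snakemake. The function name and signature are assumptions made for illustration.

```python
from os import path


def build_pipe_cmd(output, sort="none", sort_order="coordinate", sort_extra=""):
    """Return the command that reads the aligner's SAM stream from stdin."""
    if sort_order not in {"coordinate", "queryname"}:
        raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
    if sort == "none":
        # Simply convert to BAM using samtools view.
        return "samtools view -Sbh -o {} -".format(output)
    if sort == "samtools":
        # Add the name-sort flag if needed, then a temp prefix next to the output.
        if sort_order == "queryname":
            sort_extra += " -n"
        prefix = path.splitext(output)[0]
        sort_extra += " -T " + prefix + ".tmp"
        return "samtools sort {} -o {} -".format(sort_extra, output)
    if sort == "picard":
        return ("picard SortSam {} INPUT=/dev/stdin"
                " OUTPUT={} SORT_ORDER={}").format(sort_extra, output, sort_order)
    raise ValueError("Unexpected value for params.sort ({})".format(sort))


print(build_pipe_cmd("mapped/a.bam"))
print(build_pipe_cmd("mapped/a.bam", sort="samtools", sort_order="queryname"))
```

Note that sort_order only affects the command when sort is 'samtools' or 'picard'. With sort="none" the alignments are converted to BAM in the order bwa emits them.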
Authors
- Johannes Köster
- Julian de Ruiter
Code
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {1, 2}:
    raise ValueError("input must have 1 (single-end) or "
                     "2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))
shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}")
BWA SAMPE
Map paired-end reads with bwa sampe.
Software dependencies
- bwa ==0.7.15
- samtools ==1.3
- picard ==2.9.2
Example
This wrapper can be used in the following way:
rule bwa_sampe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"]
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",  # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",  # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""  # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_sampe/{sample}.log"
    wrapper:
        "0.27.1/bio/bwa/sampe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Julian de Ruiter
Code
"""Snakemake wrapper for bwa sampe."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Check inputs.
if not len(snakemake.input.sai) == 2:
    raise ValueError('input.sai must have 2 elements')
if not len(snakemake.input.fastq) == 2:
    raise ValueError('input.fastq must have 2 elements')
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Run command.
shell(
    "(bwa sampe"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}")
BWA SAMSE
Map single-end reads with bwa samse.
Software dependencies
- bwa ==0.7.15
- samtools ==1.3
- picard ==2.9.2
Example
This wrapper can be used in the following way:
rule bwa_samse:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai"
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",  # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",  # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""  # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_samse/{sample}.log"
    wrapper:
        "0.27.1/bio/bwa/samse"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Julian de Ruiter
Code
"""Snakemake wrapper for bwa samse."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Run command.
shell(
    "(bwa samse"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}")
CAIROSVG
Convert SVG files with cairosvg.
Software dependencies
- cairosvg ==2.0.0rc6
Example
This wrapper can be used in the following way:
rule:
    input:
        "{prefix}.svg"
    output:
        "{prefix}.{fmt,(pdf|png)}"
    wrapper:
        "0.27.1/utils/cairosvg"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
_, ext = os.path.splitext(snakemake.output[0])
if ext not in (".png", ".pdf", ".ps", ".svg"):
    raise ValueError("invalid file extension: '{}'".format(ext))
fmt = ext[1:]
shell("cairosvg -f {fmt} {snakemake.input[0]} -o {snakemake.output[0]}")
CUTADAPT
Wrappers
CUTADAPT-PE
Trim paired-end reads using cutadapt.
Software dependencies
- cutadapt ==1.13
Example
This wrapper can be used in the following way:
rule cutadapt:
    input:
        ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        fastq1="trimmed/{sample}.1.fastq",
        fastq2="trimmed/{sample}.2.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.27.1/bio/cutadapt/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Julian de Ruiter
Code
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cutadapt"
" {snakemake.params}"
" -o {snakemake.output.fastq1}"
" -p {snakemake.output.fastq2}"
" {snakemake.input}"
" > {snakemake.output.qc} {log}")
CUTADAPT-SE
Trim single-end reads using cutadapt.
Software dependencies
- cutadapt ==1.13
Example
This wrapper can be used in the following way:
rule cutadapt:
    input:
        "reads/{sample}.fastq"
    output:
        fastq="trimmed/{sample}.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.27.1/bio/cutadapt/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Julian de Ruiter
Code
"""Snakemake wrapper for trimming single-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cutadapt"
" {snakemake.params}"
" -o {snakemake.output.fastq}"
" {snakemake.input[0]}"
" > {snakemake.output.qc} {log}")
DELLY
Call variants with delly.
Software dependencies
- delly ==0.7.8
Example
This wrapper can be used in the following way:
rule delly:
    input:
        ref="genome.fasta",
        samples=["mapped/a.bam"],
        # optional exclude template (see https://github.com/dellytools/delly)
        exclude="human.hg19.excl.tsv"
    output:
        "sv/calls.bcf"
    params:
        extra=""  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log"
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "0.27.1/bio/delly"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
try:
    exclude = "-x " + snakemake.input.exclude
except AttributeError:
    exclude = ""
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "OMP_NUM_THREADS={snakemake.threads} delly call {extra} "
    "{exclude} -g {snakemake.input.ref} "
    "-o {snakemake.output[0]} {snakemake.input.samples} {log}")
EPIC
Wrappers
EPIC
Find broad enriched domains in ChIP-Seq data with epic.
Software dependencies
- epic =0.2.7
- pandas =0.22.0
Example
This wrapper can be used in the following way:
rule epic:
    input:
        treatment = "bed/test.bed",
        background = "bed/control.bed"
    output:
        enriched_regions = "epic/enriched_regions.csv",  # required
        bed = "epic/enriched_regions.bed",  # optional
        matrix = "epic/matrix.gz"  # optional
    log:
        "logs/epic/epic.log"
    params:
        genome = "hg19",  # optional, default hg19
        extra="-g 3 -w 200"  # "--bigwig epic/bigwigs"
    threads: 1  # optional, defaults to 1
    wrapper:
        "0.27.1/bio/epic/peaks"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
- All/any of the different bigwig options must be given as extra parameters.
Authors
- Endre Bakken Stovner
Code
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1
treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")
# Executed shell command
enriched_regions = snakemake.output.get("enriched_regions")
bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")
# Default to no log file so that the check below works when the rule defines no log.
log = None
if len(snakemake.log) > 0:
    log = snakemake.log[0]
genome = snakemake.params.get("genome")
cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"
if bed:
    cmd += " -b {bed}"
if matrix:
    cmd += " -sm {matrix}"
if log:
    cmd += " -l {log}"
cmd += " {extra}"
shell(cmd)
FASTQ_SCREEN
fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
This wrapper allows the configuration to be passed as a filename or as a dictionary in the rule’s params.fastq_screen_config. So the following configuration file:
DATABASE ecoli /data/Escherichia_coli/Bowtie2Index/genome BOWTIE2
DATABASE ecoli /data/Escherichia_coli/BowtieIndex/genome BOWTIE
DATABASE hg19 /data/hg19/Bowtie2Index/genome BOWTIE2
DATABASE mm10 /data/mm10/Bowtie2Index/genome BOWTIE2
BOWTIE /path/to/bowtie
BOWTIE2 /path/to/bowtie2
becomes:
fastq_screen_config = {
    'database': {
        'ecoli': {
            'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
            'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
        'hg19': {
            'bowtie2': '/data/hg19/Bowtie2Index/genome'},
        'mm10': {
            'bowtie2': '/data/mm10/Bowtie2Index/genome'}
    },
    'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}
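The correspondence between the dictionary and the configuration file can be made concrete with a small standalone function. It follows the same tab-separated layout that the wrapper code further down writes to its temporary file, but the function name is hypothetical and the text is returned as a string instead of being written to disk.

```python
def render_fastq_screen_config(config):
    """Render a fastq_screen_config dictionary as configuration file text."""
    lines = []
    # One DATABASE line per (label, aligner) pair.
    for label, indexes in config["database"].items():
        for aligner_name, index in indexes.items():
            lines.append("\t".join(["DATABASE", label, index, aligner_name.upper()]))
    # One line per aligner executable.
    for aligner_name, aligner_path in config["aligner_paths"].items():
        lines.append("\t".join([aligner_name.upper(), aligner_path]))
    return "\n".join(lines) + "\n"


example = {
    "database": {"hg19": {"bowtie2": "/data/hg19/Bowtie2Index/genome"}},
    "aligner_paths": {"bowtie2": "bowtie2"},
}
print(render_fastq_screen_config(example), end="")
```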
By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".
Software dependencies
- fastq-screen ==0.5.2
- bowtie2 ==2.2.6
- bowtie ==1.1.2
Example
This wrapper can be used in the following way:
rule fastq_screen:
    input:
        "samples/{sample}.fastq.gz"
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png"
    params:
        fastq_screen_config=fastq_screen_config,
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "0.27.1/bio/fastq_screen"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
- fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
- While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem disappears from Python 3.6 onward, where dictionaries preserve insertion order.
- When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environment variable is set (see the tempfile docs for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the Snakefile calling this wrapper.
Authors
- Ryan Dale
Code
import os
from snakemake.shell import shell
import tempfile
__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"
_config = snakemake.params['fastq_screen_config']
subset = snakemake.params.get('subset', 100000)
aligner = snakemake.params.get('aligner', 'bowtie2')
extra = snakemake.params.get('extra', '')
log = snakemake.log_fmt_shell()
# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, 'w') as fout:
        # The loop variables are named so that they do not overwrite the
        # `aligner` setting used in the fastq_screen call below.
        for label, indexes in _config['database'].items():
            for db_aligner, index in indexes.items():
                fout.write('\t'.join([
                    'DATABASE', label, index, db_aligner.upper()]) + '\n')
        for aligner_name, aligner_path in _config['aligner_paths'].items():
            fout.write('\t'.join([aligner_name.upper(), aligner_path]) + '\n')
    config_file = tmp
else:
    config_file = _config
# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = os.path.basename(snakemake.input[0].split('.fastq')[0])
tempdir = tempfile.mkdtemp()
shell(
    "fastq_screen --outdir {tempdir} "
    "--force "
    "--aligner {aligner} "
    "--conf {config_file} "
    "--subset {subset} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{snakemake.input[0]} "
    "{log}"
)
# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")
# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
    shell("rm {tmp}")
FASTQC
Generate fastq qc statistics using fastqc.
Software dependencies
- fastqc ==0.11.7
Example
This wrapper can be used in the following way:
rule fastqc:
    input:
        "reads/{sample}.fastq"
    output:
        html="qc/fastqc/{sample}.html",
        zip="qc/fastqc/{sample}.zip"
    params: ""
    log:
        "logs/fastqc/{sample}.log"
    wrapper:
        "0.27.1/bio/fastqc"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Julian de Ruiter
Code
"""Snakemake wrapper for fastqc."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from tempfile import TemporaryDirectory
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = path.basename(file_path)
    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])
    return base
# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell("fastqc {snakemake.params} --quiet "
          "--outdir {tempdir} {snakemake.input[0]}"
          " {log}")
    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")
    if snakemake.output.html != html_path:
        shell("mv {html_path} {snakemake.output.html}")
    if snakemake.output.zip != zip_path:
        shell("mv {zip_path} {snakemake.output.zip}")
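The file names that the wrapper expects fastqc to produce follow from basename_without_ext. The sketch below repeats that helper and adds a hypothetical function, fastqc_output_names, that returns the two expected names for an input path, so the naming rule can be checked without running fastqc.

```python
from os import path


def basename_without_ext(file_path):
    """Return the basename of file_path without its extension (and .gz)."""
    base = path.basename(file_path)
    # Drop two suffixes for gzipped files (e.g. ".fastq.gz"), otherwise one.
    split_ind = 2 if base.endswith(".gz") else 1
    return ".".join(base.split(".")[:-split_ind])


def fastqc_output_names(input_path):
    """Return the (html, zip) file names the wrapper looks for."""
    base = basename_without_ext(input_path)
    return base + "_fastqc.html", base + "_fastqc.zip"


print(fastqc_output_names("reads/a.fastq"))
print(fastqc_output_names("reads/a.fastq.gz"))
```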
FREEBAYES
Call small genomic variants with freebayes.
Software dependencies
- freebayes ==1.1.0
- bcftools ==1.5
- parallel ==20170422
Example¶
This wrapper can be used in the following way:
rule freebayes:
input:
ref="genome.fasta",
# you can have a list of samples here
samples="mapped/{sample}.bam"
output:
"calls/{sample}.vcf" # either .vcf or .bcf
log:
"logs/freebayes/{sample}.log"
params:
extra="", # optional parameters
chunksize=100000 # reference genome chunk size for parallelization (default: 100000)
threads: 2
wrapper:
"0.27.1/bio/freebayes"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
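As a rough illustration of how threads, chunksize and the output extension shape the command, the sketch below mirrors the string assembly of the wrapper code in the Code section as a plain function. build_freebayes_command and its arguments are hypothetical names introduced here; the real wrapper reads these values from the snakemake object and additionally appends log redirection.

```python
def build_freebayes_command(ref, samples, output, threads=1, extra="", chunksize=100000):
    """Assemble a freebayes command line similar to the wrapper (hypothetical helper)."""
    # BCF output is produced by piping the VCF stream through bcftools.
    pipe = "| bcftools view -Ob -" if output.endswith(".bcf") else ""
    if threads == 1:
        freebayes = "freebayes"
    else:
        # With several threads, the reference is split into regions of
        # `chunksize` bases that freebayes-parallel processes concurrently.
        freebayes = (
            "freebayes-parallel <(fasta_generate_regions.py "
            "{ref}.fai {chunksize}) {threads}"
        ).format(ref=ref, chunksize=chunksize, threads=threads)
    return "{} {} -f {} {} {} > {}".format(
        freebayes, extra, ref, " ".join(samples), pipe, output
    )


single = build_freebayes_command("genome.fasta", ["mapped/a.bam"], "calls/a.vcf")
parallel = build_freebayes_command(
    "genome.fasta", ["mapped/a.bam"], "calls/a.bcf", threads=2
)
print(single)
print(parallel)
```

Note that the parallel branch reads genome.fasta.fai, so the reference presumably has to be indexed before the rule runs with more than one thread.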
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
params = snakemake.params.get("extra", "")
pipe = ""
if snakemake.output[0].endswith(".bcf"):
pipe = "| bcftools view -Ob -"
if snakemake.threads == 1:
freebayes = "freebayes"
else:
chunksize = snakemake.params.get("chunksize", 100000)
freebayes = ("freebayes-parallel <(fasta_generate_regions.py "
"{snakemake.input.ref}.fai {chunksize}) "
"{snakemake.threads}").format(snakemake=snakemake,
chunksize=chunksize)
shell("({freebayes} {params} -f {snakemake.input.ref}"
" {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}")
GATK¶
Wrappers¶
GATK BASERECALIBRATOR¶
Run gatk BaseRecalibrator and ApplyBQSR in one step.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule gatk_bqsr:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
known="dbsnp.vcf.gz"
output:
bam="recal/{sample}.bam"
log:
"logs/gatk/bqsr/{sample}.log"
params:
extra="", # optional
wrapper:
"0.27.1/bio/gatk/baserecalibrator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
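The "one step" design means the recalibration table is an intermediate file that never appears as a workflow output. The sketch below, which only builds the two command strings and does not run GATK, illustrates this under the assumption that the table lives in a temporary directory as in the Code section. bqsr_commands is a hypothetical helper name.

```python
import os
from tempfile import TemporaryDirectory


def bqsr_commands(bam, ref, known, out_bam, extra=""):
    """Build the BaseRecalibrator and ApplyBQSR commands (hypothetical helper)."""
    with TemporaryDirectory() as tmpdir:
        # The table is written inside the temporary directory and is
        # gone once the with-block is left.
        recal_table = os.path.join(tmpdir, "recal_table.grp")
        step1 = "gatk BaseRecalibrator {} -R {} -I {} -O {} --known-sites {}".format(
            extra, ref, bam, recal_table, known
        )
        step2 = "gatk ApplyBQSR -R {} -I {} --bqsr-recal-file {} -O {}".format(
            ref, bam, recal_table, out_bam
        )
        return recal_table, step1, step2


recal_table, step1, step2 = bqsr_commands(
    "mapped/a.bam", "genome.fasta", "dbsnp.vcf.gz", "recal/a.bam"
)
print(step1)
print(step2)
```

Both commands refer to the same table path, and the directory holding it no longer exists after the function returns.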
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from tempfile import TemporaryDirectory
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
with TemporaryDirectory() as tmpdir:
recal_table = os.path.join(tmpdir, "recal_table.grp")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk BaseRecalibrator {extra} "
"-R {snakemake.input.ref} -I {snakemake.input.bam} "
"-O {recal_table} --known-sites {snakemake.input.known} {log}")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell("gatk ApplyBQSR -R {snakemake.input.ref} -I {snakemake.input.bam} "
"--bqsr-recal-file {recal_table} "
"-O {snakemake.output.bam} {log}")
GATK COMBINEGVCFS¶
Run gatk CombineGVCFs.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule combine_gvcfs:
input:
gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
ref="genome.fasta"
output:
gvcf="calls/all.g.vcf",
log:
"logs/gatk/combinegvcfs.log"
params:
extra="", # optional
wrapper:
"0.27.1/bio/gatk/combinegvcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
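The wrapper code in the Code section turns the list of input gVCFs into one -V flag per file. A minimal sketch of that mapping, using the paths from the example rule; the explicit join stands in for what Snakemake's shell formatting is assumed to do with list values.

```python
gvcfs = ["calls/a.g.vcf", "calls/b.g.vcf"]  # same paths as in the example rule

# The bound method "-V {}".format is used as the mapping function,
# yielding one "-V <file>" string per input.
flags = list(map("-V {}".format, gvcfs))
print(" ".join(flags))
```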
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
gvcfs = list(map("-V {}".format, snakemake.input.gvcfs))
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk CombineGVCFs {extra} "
"{gvcfs} "
"-R {snakemake.input.ref} "
"-O {snakemake.output.gvcf} {log}")
GATK GENOTYPEGVCFS¶
Run gatk GenotypeGVCFs.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule genotype_gvcfs:
input:
gvcf="calls/all.g.vcf", # combined gvcf over multiple samples
ref="genome.fasta"
output:
vcf="calls/all.vcf",
log:
"logs/gatk/genotypegvcfs.log"
params:
extra="", # optional
wrapper:
"0.27.1/bio/gatk/genotypegvcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk GenotypeGVCFs {extra} "
"-V {snakemake.input.gvcf} "
"-R {snakemake.input.ref} "
"-O {snakemake.output.vcf} {log}")
GATK HAPLOTYPECALLER¶
Run gatk HaplotypeCaller.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule haplotype_caller:
input:
# single or list of bam files
bam="mapped/{sample}.bam",
ref="genome.fasta"
# known="dbsnp.vcf" # optional
output:
gvcf="calls/{sample}.g.vcf",
log:
"logs/gatk/haplotypecaller/{sample}.log"
params:
extra="", # optional
wrapper:
"0.27.1/bio/gatk/haplotypecaller"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
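The bam input may be a single path or a list, and known is optional. The following sketch shows how both cases could be normalized into command-line arguments, loosely following the Code section. haplotypecaller_inputs is a hypothetical helper, and the argument order is simplified compared to the wrapper.

```python
def haplotypecaller_inputs(bam, known=""):
    """Return input-related arguments for gatk HaplotypeCaller (hypothetical helper)."""
    # A single path is wrapped into a list so both cases share one code path.
    bams = [bam] if isinstance(bam, str) else list(bam)
    args = ["-I {}".format(b) for b in bams]
    # The dbSNP file is optional and only adds a flag when given.
    if known:
        args.append("--dbsnp " + known)
    return " ".join(args)


one = haplotypecaller_inputs("mapped/a.bam")
two = haplotypecaller_inputs(["mapped/a.bam", "mapped/b.bam"], known="dbsnp.vcf")
print(one)
print(two)
```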
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
known = snakemake.input.get("known", "")
if known:
known = "--dbsnp " + known
extra = snakemake.params.get("extra", "")
bams = snakemake.input.bam
if isinstance(bams, str):
bams = [bams]
bams = list(map("-I {}".format, bams))
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk HaplotypeCaller {extra} "
"-R {snakemake.input.ref} {bams} "
"-ERC GVCF "
"-O {snakemake.output.gvcf} {known} {log}")
GATK SELECTVARIANTS¶
Run gatk SelectVariants.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule gatk_select:
input:
vcf="calls/all.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.vcf"
log:
"logs/gatk/select/snvs.log"
params:
extra="--select-type-to-include SNP", # optional filter arguments, see GATK docs
wrapper:
"0.27.1/bio/gatk/selectvariants"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk SelectVariants -R {snakemake.input.ref} -V {snakemake.input.vcf} "
"{extra} -O {snakemake.output.vcf} {log}")
GATK VARIANTFILTRATION¶
Run gatk VariantFiltration.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule gatk_filter:
input:
vcf="calls/snvs.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.filtered.vcf"
log:
"logs/gatk/filter/snvs.log"
params:
filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
extra="", # optional arguments, see GATK docs
wrapper:
"0.27.1/bio/gatk/variantfiltration"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
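The filters parameter is a dictionary that maps filter names to filter expressions. The sketch below shows how such a dictionary expands into command-line arguments, using the same formatting expression as the Code section. The second filter ("lowqd") is a made-up entry added only to show the multi-filter case.

```python
filters = {
    "myfilter": "AB < 0.2 || MQ0 > 50",  # from the example rule
    "lowqd": "QD < 2.0",                 # hypothetical additional filter
}

# One --filter-name/--filter-expression pair per dictionary entry;
# the expression is wrapped in single quotes for the shell.
args = [
    "--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
    for name, expr in filters.items()
]
for arg in args:
    print(arg)
```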
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
filters = ["--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
for name, expr in snakemake.params.filters.items()]
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk VariantFiltration -R {snakemake.input.ref} -V {snakemake.input.vcf} "
"{extra} {filters} -O {snakemake.output.vcf} {log}")
GATK VARIANTRECALIBRATOR¶
Run gatk VariantRecalibrator.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
from snakemake.remote import GS
# GATK resource bundle files can be either directly obtained from google storage (like here), or
# from FTP. You can also use local files.
GS = GS.RemoteProvider()
def gatk_bundle(f):
return GS.remote("genomics-public-data/resources/broad/hg38/v0/{}".format(f))
rule variant_recalibrator:
input:
vcf="calls/all.vcf",
ref="genome.fasta",
# resources have to be given as named input files
hapmap=gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz"),
omni=gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz"),
g1k=gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
dbsnp=gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
# use aux to e.g. download other necessary files
aux=[gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz.tbi"),
gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz.tbi"),
gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"),
gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi")]
output:
vcf="calls/all.recal.vcf",
tranches="calls/all.tranches"
log:
"logs/gatk/variantrecalibrator.log"
params:
mode="SNP", # set mode, must be either SNP, INDEL or BOTH
# resource parameter definition. Key must match named input files from above.
resources={"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
"omni": {"known": False, "training": True, "truth": False, "prior": 12.0},
"g1k": {"known": False, "training": True, "truth": False, "prior": 10.0},
"dbsnp": {"known": True, "training": False, "truth": False, "prior": 2.0}},
annotation=["QD", "FisherStrand"], # which fields to use with -an (see VariantRecalibrator docs)
extra="", # optional
wrapper:
"0.27.1/bio/gatk/variantrecalibrator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
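Each entry of the resources parameter is combined with the named input file of the same key into a single --resource argument. The sketch below is a standalone variant of the fmt_res function from the Code section. It takes the file path as an explicit third argument (an assumption made here so that it runs without the snakemake object).

```python
def fmt_res(resname, resparams, resfile):
    """Format one resource definition for gatk VariantRecalibrator."""
    # GATK expects lowercase booleans (true/false).
    fmt_bool = lambda b: str(b).lower()
    return "{},known={},training={},truth={},prior={}:{}".format(
        resname,
        fmt_bool(resparams["known"]),
        fmt_bool(resparams["training"]),
        fmt_bool(resparams["truth"]),
        resparams["prior"],
        resfile,
    )


hapmap = {"known": False, "training": True, "truth": True, "prior": 15.0}
resource = "--resource " + fmt_res("hapmap", hapmap, "hapmap_3.3.hg38.sites.vcf.gz")
print(resource)
```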
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
def fmt_res(resname, resparams):
fmt_bool = lambda b: str(b).lower()
# .get() returns None for a missing name instead of raising KeyError.
f = snakemake.input.get(resname)
if f is None:
raise RuntimeError("There must be a named input file for every resource (missing: {})".format(resname))
return "{},known={},training={},truth={},prior={}:{}".format(
resname, fmt_bool(resparams["known"]), fmt_bool(resparams["training"]),
fmt_bool(resparams["truth"]), resparams["prior"], f)
resources = ["--resource {}".format(fmt_res(resname, resparams))
for resname, resparams in snakemake.params["resources"].items()]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = ""
if snakemake.output.tranches:
tranches = "--tranches-file " + snakemake.output.tranches
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk VariantRecalibrator {extra} {resources} "
"-R {snakemake.input.ref} -V {snakemake.input.vcf} "
"-mode {snakemake.params.mode} "
"--output {snakemake.output.vcf} "
"{tranches} {annotation} {log}")
HISAT2¶
Map reads with hisat2.
Software dependencies¶
- hisat2 ==2.1.0
- samtools ==1.5
Example¶
This wrapper can be used in the following way:
rule hisat2:
input:
reads=["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"],
output:
"mapped/{sample}.bam"
log: # optional
"logs/hisat2/{sample}.log"
params: # idx is required, extra is optional
idx="genome.fa",
extra="--min-intronlen 1000"
threads: 8 # optional, defaults to 1
wrapper:
"0.27.1/bio/hisat2"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The -S flag must not be used since output is already directly piped to samtools for compression.
- The --threads/-p flag must not be used since threads is set separately via the snakemake threads directive.
- The wrapper does not yet handle SRA input accessions.
- No reference index files checking is done since the actual number of files may differ depending on the reference sequence size. This is also why the index is supplied in the params directive instead of the input directive.
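The reads input may contain one file (single-end) or two files (paired-end). The sketch below mirrors the input handling of the Code section as a plain function. hisat2_input_flags is a hypothetical name, and the read paths are placeholders.

```python
def hisat2_input_flags(reads):
    """Map one or two read files to hisat2 input flags (hypothetical helper)."""
    if isinstance(reads, str):
        return "-U {0}".format(reads)       # single file given as plain string
    if len(reads) == 1:
        return "-U {0}".format(reads[0])    # single-end reads
    if len(reads) == 2:
        return "-1 {0} -2 {1}".format(*reads)  # paired-end reads
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2 input files."
    )


print(hisat2_input_flags("reads/a.fastq.gz"))
print(hisat2_input_flags(["reads/a.1.fastq.gz", "reads/a.2.fastq.gz"]))
```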
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()
# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
input_flags = "-1 {0} -2 {1}".format(*reads)
else:
raise RuntimeError(
"Reads parameter must contain at least 1 and at most 2"
" input files.")
# Executed shell command
shell(
"(hisat2 {extra} --threads {snakemake.threads}"
" -x {snakemake.params.idx} {input_flags}"
" | samtools view -Sbh -o {snakemake.output[0]} -)"
" {log}")
JANNOVAR¶
Annotate the predicted effect of nucleotide changes with Jannovar.
Software dependencies¶
- jannovar-cli ==0.25
Example¶
This wrapper can be used in the following way:
rule jannovar:
input:
vcf="{sample}.vcf",
pedigree="pedigree_ar.ped" # optional, contains familial relationships
output:
"jannovar/{sample}.vcf.gz"
log:
"logs/jannovar/{sample}.log"
params:
database="hg19_small.ser", # path to jannovar reference dataset
extra="--show-all" # optional parameters
wrapper:
"0.27.1/bio/jannovar"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
pedigree = snakemake.input.get("pedigree", "")
if pedigree:
pedigree = '--pedigree-file "%s"'%pedigree
shell("jannovar annotate-vcf --database {snakemake.params.database}"
" --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
" {pedigree} {extra} {log}")
MINIMAP2¶
Wrappers¶
MINIMAP2¶
A versatile pairwise aligner for genomic and spliced nucleotide sequences (https://lh3.github.io/minimap2).
Software dependencies¶
- minimap2 ==2.5
Example¶
This wrapper can be used in the following way:
rule minimap2:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query=["query/reads1.fasta", "query/reads2.fasta"]
output:
"aligned/{input1}_aln.paf"
log:
"logs/minimap2/{input1}.log"
params:
extra="-x map-pb" # optional
threads: 3
wrapper:
"0.27.1/bio/minimap2/aligner"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tom Poorten
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
inputQuery = " ".join(snakemake.input.query)
shell("(minimap2 -t {snakemake.threads} {extra} "
"{snakemake.input.target} {inputQuery} >"
"{snakemake.output[0]}) {log}")
MINIMAP2 INDEX¶
Create a minimap2 index.
Software dependencies¶
- minimap2 ==2.5
Example¶
This wrapper can be used in the following way:
rule minimap2_index:
input:
target="target/{input1}.fasta"
output:
"{input1}.mmi"
log:
"logs/minimap2_index/{input1}.log"
params:
extra="" # optional additional args
threads: 3
wrapper:
"0.27.1/bio/minimap2/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tom Poorten
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("(minimap2 -t {snakemake.threads} {extra} "
"-d {snakemake.output[0]} {snakemake.input.target}) {log}")
MULTIQC¶
Generate qc report using multiqc.
Software dependencies¶
- multiqc ==1.2
- networkx <2.0
Example¶
This wrapper can be used in the following way:
rule multiqc:
input:
expand("samtools_stats/{sample}.txt", sample=["a", "b"])
output:
"qc/multiqc.html"
params:
"" # Optional: extra parameters for multiqc.
log:
"logs/multiqc.log"
wrapper:
"0.27.1/bio/multiqc"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
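According to the Code section, the wrapper does not pass the input files themselves to multiqc but the set of directories containing them, and it splits the output path into a directory and a file name. A small sketch of that path handling; the third input path is hypothetical and only added to show deduplication.

```python
from os import path

inputs = [
    "samtools_stats/a.txt",
    "samtools_stats/b.txt",
    "qc/fastqc/a.zip",  # hypothetical extra input
]
output = "qc/multiqc.html"

# A set removes duplicate directories, so each is scanned only once.
input_dirs = set(path.dirname(fp) for fp in inputs)
output_dir = path.dirname(output)    # passed via -o
output_name = path.basename(output)  # passed via -n

print(sorted(input_dirs), output_dir, output_name)
```

A consequence worth keeping in mind: since whole directories are passed, MultiQC may also pick up other recognizable files located there.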
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for MultiQC."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"multiqc"
" {snakemake.params}"
" --force"
" -o {output_dir}"
" -n {output_name}"
" {input_dirs}"
" {log}")
NGS-DISAMBIGUATE¶
Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.
Software dependencies¶
- ngs-disambiguate ==2016.11.10
- bamtools ==2.4.0
Example¶
This wrapper can be used in the following way:
rule disambiguate:
input:
a="mapped/{sample}.a.bam",
b="mapped/{sample}.b.bam"
output:
a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
a_disambiguated='disambiguate/{sample}.graft.bam',
b_disambiguated='disambiguate/{sample}.host.bam',
summary='qc/disambiguate/{sample}.txt'
params:
algorithm="bwa",
# optional: Prefix to use for output. If omitted, a
# suitable value is guessed from the output paths. Prefix
# is used for the intermediate output paths, as well as
# sample name in summary file.
prefix="{sample}",
extra=""
wrapper:
"0.27.1/bio/ngs-disambiguate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")
output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain
# additional suffixes). This is better than using a random id as prefix,
# since the prefix is also used as the sample name in the summary file.
if prefix is None:
prefix = path.splitext(path.basename(snakemake.output.summary))[0]
# Run command.
shell(
"ngs_disambiguate"
" {extra}"
" -o {output_dir}"
" -s {prefix}"
" -a {snakemake.params.algorithm}"
" {snakemake.input.a}"
" {snakemake.input.b}")
# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)
output_map = {
output_base + ".ambiguousSpeciesA.bam":
snakemake.output.a_ambiguous,
output_base + ".ambiguousSpeciesB.bam":
snakemake.output.b_ambiguous,
output_base + ".disambiguatedSpeciesA.bam":
snakemake.output.a_disambiguated,
output_base + ".disambiguatedSpeciesB.bam":
snakemake.output.b_disambiguated,
output_base + "_summary.txt":
snakemake.output.summary
}
for src, dest in output_map.items():
if src != dest:
shell('mv {src} {dest}')
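To illustrate the prefix guessing and the renaming map above, the sketch below computes the intermediate file names that the wrapper code expects from ngs_disambiguate. expected_outputs is a hypothetical helper, and the sample name s1 is a placeholder.

```python
from os import path


def expected_outputs(output_dir, summary_path, prefix=None):
    """Return the intermediate paths keyed by output name (hypothetical helper)."""
    if prefix is None:
        # Derive the prefix from the summary file name, e.g. "s1.txt" -> "s1".
        prefix = path.splitext(path.basename(summary_path))[0]
    base = path.join(output_dir, prefix)
    return {
        "a_ambiguous": base + ".ambiguousSpeciesA.bam",
        "b_ambiguous": base + ".ambiguousSpeciesB.bam",
        "a_disambiguated": base + ".disambiguatedSpeciesA.bam",
        "b_disambiguated": base + ".disambiguatedSpeciesB.bam",
        "summary": base + "_summary.txt",
    }


guessed = expected_outputs("disambiguate", "qc/disambiguate/s1.txt")
explicit = expected_outputs("disambiguate", "qc/disambiguate/s1.txt", prefix="sampleX")
print(guessed["summary"])
print(explicit["a_disambiguated"])
```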
PICARD¶
Wrappers¶
PICARD ADDORREPLACEREADGROUPS¶
Add or replace read groups with picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule replace_rg:
input:
"mapped/{sample}.bam"
output:
"fixed-rg/{sample}.bam"
log:
"logs/picard/replace_rg/{sample}.log"
params:
"RGLB=lib1 RGPL=illumina RGPU={sample} RGSM={sample}"
wrapper:
"0.27.1/bio/picard/addorreplacereadgroups"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("picard AddOrReplaceReadGroups {snakemake.params} I={snakemake.input} "
"O={snakemake.output} &> {snakemake.log}")
PICARD COLLECTALIGNMENTSUMMARYMETRICS¶
Collect metrics on aligned reads with picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule alignment_summary:
input:
ref="genome.fasta",
bam="mapped/{sample}.bam"
output:
"stats/{sample}.summary.txt"
log:
"logs/picard/alignment-summary/{sample}.log"
params:
# optional parameters (e.g. relax checks as below)
"VALIDATION_STRINGENCY=LENIENT "
"METRIC_ACCUMULATION_LEVEL=null "
"METRIC_ACCUMULATION_LEVEL=SAMPLE"
wrapper:
"0.27.1/bio/picard/collectalignmentsummarymetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
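The params value in the example consists of three adjacent string literals, which Python concatenates into one string. The trailing spaces inside the literals are therefore significant, as this short sketch shows (the fused variant is a deliberately broken counterexample).

```python
params = (
    "VALIDATION_STRINGENCY=LENIENT "
    "METRIC_ACCUMULATION_LEVEL=null "
    "METRIC_ACCUMULATION_LEVEL=SAMPLE"
)
# Three separate picard options, as intended.
print(params.split())

# Without the trailing space, two options fuse into a single token.
fused = (
    "VALIDATION_STRINGENCY=LENIENT"
    "METRIC_ACCUMULATION_LEVEL=null"
)
print(fused.split())
```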
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell("picard CollectAlignmentSummaryMetrics {snakemake.params} "
"INPUT={snakemake.input.bam} OUTPUT={snakemake.output[0]} "
"REFERENCE_SEQUENCE={snakemake.input.ref} {log}")
PICARD COLLECTHSMETRICS¶
Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule picard_collect_hs_metrics:
input:
bam="mapped/{sample}.bam",
reference="genome.fasta",
# Baits and targets should be given as interval lists. These can
# be generated from bed files using picard BedToIntervalList.
bait_intervals="regions.intervals",
target_intervals="regions.intervals"
output:
"stats/hs_metrics/{sample}.txt"
params:
# Optional extra arguments. Here we reduce sample size
# to reduce the runtime in our unit test.
extra="SAMPLE_SIZE=1000"
log:
"logs/picard_collect_hs_metrics/{sample}.log"
wrapper:
"0.27.1/bio/picard/collecthsmetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard CollectHSMetrics."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard CollectHsMetrics"
" {extra}"
" INPUT={snakemake.input.bam}"
" OUTPUT={snakemake.output[0]}"
" REFERENCE_SEQUENCE={snakemake.input.reference}"
" BAIT_INTERVALS={snakemake.input.bait_intervals}"
" TARGET_INTERVALS={snakemake.input.target_intervals}"
" {log}")
PICARD COLLECTINSERTSIZEMETRICS¶
Collect metrics on insert size of paired end reads with picard tools.
Software dependencies¶
- picard ==2.9.2
- r-base ==3.3.2
Example¶
This wrapper can be used in the following way:
rule insert_size:
input:
"mapped/{sample}.bam"
output:
txt="stats/{sample}.isize.txt",
pdf="stats/{sample}.isize.pdf"
log:
"logs/picard/insert_size/{sample}.log"
params:
# optional parameters (e.g. relax checks as below)
"VALIDATION_STRINGENCY=LENIENT "
"METRIC_ACCUMULATION_LEVEL=null "
"METRIC_ACCUMULATION_LEVEL=SAMPLE"
wrapper:
"0.27.1/bio/picard/collectinsertsizemetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell("picard CollectInsertSizeMetrics {snakemake.params} "
"INPUT={snakemake.input} OUTPUT={snakemake.output.txt} "
"HISTOGRAM_FILE={snakemake.output.pdf} {log}")
PICARD CREATESEQUENCEDICTIONARY¶
Create a .dict file for a given FASTA file.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule create_dict:
input:
"genome.fasta"
output:
"genome.dict"
log:
"logs/picard/create_dict.log"
params:
extra="" # optional: extra arguments for picard.
wrapper:
"0.27.1/bio/picard/createsequencedictionary"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
'picard '
'CreateSequenceDictionary '
'{extra} '
'R={snakemake.input[0]} '
'O={snakemake.output[0]} '
'{log}')
PICARD MARKDUPLICATES¶
Mark PCR and optical duplicates with picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule mark_duplicates:
input:
"mapped/{sample}.bam"
output:
bam="dedup/{sample}.bam",
metrics="dedup/{sample}.metrics.txt"
log:
"logs/picard/dedup/{sample}.log"
params:
"REMOVE_DUPLICATES=true"
wrapper:
"0.27.1/bio/picard/markduplicates"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
"OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
"&> {snakemake.log}")
PICARD MERGESAMFILES¶
Merge sam/bam files using picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule merge_bams:
input:
expand("mapped/{sample}.bam", sample=["a", "b"])
output:
"merged.bam"
log:
"logs/picard_mergesamfiles.log"
params:
"VALIDATION_STRINGENCY=LENIENT"
wrapper:
"0.27.1/bio/picard/mergesamfiles"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
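Every input file becomes its own INPUT= argument, as the Code section shows. The sketch below builds the resulting command for the example rule. The list comprehension is a stand-in for Snakemake's expand, and log redirection is omitted.

```python
samples = ["a", "b"]

# Stand-in for expand("mapped/{sample}.bam", sample=samples).
bams = ["mapped/{}.bam".format(s) for s in samples]

# One INPUT=<file> argument per BAM file.
inputs = " ".join("INPUT={}".format(b) for b in bams)
command = "picard MergeSamFiles VALIDATION_STRINGENCY=LENIENT {} OUTPUT=merged.bam".format(inputs)
print(command)
```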
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard MergeSamFiles."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard"
" MergeSamFiles"
" {snakemake.params}"
" {inputs}"
" OUTPUT={snakemake.output[0]}"
" {log}")
PICARD MERGEVCFS¶
Merge vcf files using picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule merge_vcfs:
input:
["snvs.chr1.vcf", "snvs.chr2.vcf"]
output:
"snvs.vcf"
log:
"logs/picard/mergevcfs.log"
params:
extra=""
wrapper:
"0.27.1/bio/picard/mergevcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
"""Snakemake wrapper for picard MergeVcfs."""
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(f) for f in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"picard"
" MergeVcfs"
" {extra}"
" {inputs}"
" OUTPUT={snakemake.output[0]}"
" {log}")
PICARD SORTSAM¶
Sort sam/bam files using picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule sort_bam:
input:
"mapped/{sample}.bam"
output:
"sorted/{sample}.bam"
log:
"logs/picard/sort_sam/{sample}.log"
params:
sort_order="coordinate",
extra="VALIDATION_STRINGENCY=LENIENT" # optional: Extra arguments for picard.
wrapper:
"0.27.1/bio/picard/sortsam"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard SortSam."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
'picard'
' SortSam'
' {extra}'
' INPUT={snakemake.input[0]}'
' OUTPUT={snakemake.output[0]}'
' SORT_ORDER={snakemake.params.sort_order}'
' {log}')
PINDEL¶
Wrappers¶
PINDEL¶
Call variants with pindel.
Software dependencies¶
- pindel ==0.2.5b8
Example¶
This wrapper can be used in the following way:
pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]
rule pindel:
input:
ref="genome.fasta",
# samples to call
samples=["mapped/a.bam"],
# bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
config="pindel_config.txt"
output:
expand("pindel/all_{type}", type=pindel_types)
params:
# prefix must be consistent with output files
prefix="pindel/all",
extra="" # optional parameters (except -i, -f, -o)
log:
"logs/pindel.log"
threads: 4
wrapper:
"0.27.1/bio/pindel/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("pindel -T {snakemake.threads} {extra} -i {snakemake.input.config} "
"-f {snakemake.input.ref} -o {snakemake.params.prefix} {log}")
PINDEL2VCF¶
Convert pindel output to vcf.
Software dependencies¶
- pindel ==0.2.5b8
Example¶
This wrapper can be used in the following way:
rule pindel2vcf:
input:
ref="genome.fasta",
pindel="pindel/all_{type}"
output:
"pindel/all_{type}.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.{type}.log"
wrapper:
"0.27.1/bio/pindel/pindel2vcf"
rule pindel2vcf_multi_input:
input:
ref="genome.fasta",
pindel=["pindel/all_D", "pindel/all_INV"]
output:
"pindel/all.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.log"
wrapper:
"0.27.1/bio/pindel/pindel2vcf"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
expected_endings = ['INT', 'D', 'SI', 'INV', 'INV_final', 'TD', 'LI', 'BP', 'CloseEndMapped', 'RP']
def split_file_name(file_parts, file_ending_index):
    # slice (not index) the ending so that multi-part endings such as INV_final are joined correctly
    return "_".join(file_parts[:file_ending_index]), "_".join(file_parts[file_ending_index:])
def process_input_path(input_file):
    """
    :params input_file: Input file from rule, ex /path/to/file/all_D or /path/to/file/all_INV_final
    :return: "/path/to/file", "all"
    """
    file_path, file_name = os.path.split(input_file)
    file_parts = file_name.split("_")
    # separate ending and name, to name: all ending: D or name: all ending: INV_final
    file_name, file_ending = split_file_name(file_parts, -2 if file_name.endswith("_final") else -1)
    if file_ending not in expected_endings:
        raise Exception("Unexpected variant type: " + file_ending)
    return file_path, file_name
with tempfile.TemporaryDirectory() as tmpdirname:
    input_flag = "-p"
    input_file = snakemake.input.get("pindel")
    if isinstance(input_file, list) and len(input_file) > 1:
        input_flag = "-P"
        input_path, input_name = process_input_path(input_file[0])
        input_file = os.path.join(input_path, input_name)
        for variant_input in snakemake.input.pindel:
            if not variant_input.startswith(input_file):
                raise Exception("Unable to extract common path from multi file input, expect path is: " + input_file)
            if not os.path.isfile(variant_input):
                raise Exception("Input \"" + variant_input + "\" is not a file!")
            os.symlink(os.path.abspath(variant_input), os.path.join(tmpdirname, os.path.basename(variant_input)))
        input_file = os.path.join(tmpdirname, input_name)
    shell("pindel2vcf {extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}")
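For multi-file input, the wrapper passes a common file prefix to `pindel2vcf` via `-P`, so it has to split each pindel output name into a prefix and a variant-type ending. The sketch below isolates that step so it can be checked on its own. `split_pindel_path` is a hypothetical name introduced for illustration, and it raises `ValueError` rather than the generic `Exception` used in the wrapper.

```python
import os

EXPECTED_ENDINGS = ["INT", "D", "SI", "INV", "INV_final", "TD", "LI",
                    "BP", "CloseEndMapped", "RP"]


def split_pindel_path(input_file):
    """Split e.g. 'pindel/all_INV_final' into ('pindel', 'all', 'INV_final')."""
    file_path, file_name = os.path.split(input_file)
    parts = file_name.split("_")
    # An ending such as INV_final spans the last two underscore-separated parts.
    n_ending = 2 if file_name.endswith("_final") else 1
    name = "_".join(parts[:-n_ending])
    ending = "_".join(parts[-n_ending:])
    if ending not in EXPECTED_ENDINGS:
        raise ValueError("Unexpected variant type: " + ending)
    return file_path, name, ending


print(split_pindel_path("pindel/all_D"))
print(split_pindel_path("pindel/all_INV_final"))
```

Note that prefixes containing underscores (for example `my_sample_TD`) are handled as well, because only the trailing part or parts are treated as the ending.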
RUBIC¶
RUBIC detects recurrent copy number alterations using copy number breaks.
Software dependencies¶
- r-base =3.4.1
- r-rubic =1.0.3
- r-data.table =1.10.4
- r-pracma =2.0.4
- r-ggplot2 =2.2.1
- r-gtable =0.2.0
- r-codetools =0.2_15
- r-digest =0.6.12
Example¶
This wrapper can be used in the following way:
rule rubic:
input:
seg="{samples}/segments.txt",
markers="{samples}/markers.txt"
output:
out_gains="{samples}/gains.txt",
out_losses="{samples}/losses.txt",
out_plots="{samples}/plots" #only possible to provide output directory for plots
params:
fdr="",
genefile=""
wrapper:
"0.27.1/bio/rubic"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Beatrice F. Tan
Code¶
# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"
library(RUBIC)
all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]
rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])
SALMON¶
Wrappers¶
SALMON_INDEX¶
Index a transcriptome assembly with salmon.
Software dependencies¶
- salmon ==0.10.1
Example¶
This wrapper can be used in the following way:
rule salmon_index:
input:
"assembly/transcriptome.fasta"
output:
"salmon/transcriptome_index"
log:
"logs/salmon/transcriptome_index.log"
threads: 2
params:
# optional parameters
extra=""
wrapper:
"0.27.1/bio/salmon/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Salmon Index."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("salmon index -t {snakemake.input} -i {snakemake.output} "
" --threads {snakemake.threads} {extra} {log}" )
SALMON_QUANT¶
Quantify transcripts with salmon.
Software dependencies¶
- salmon ==0.10.0
Example¶
This wrapper can be used in the following way:
rule salmon_quant_reads:
input:
# If you have multiple fastq files for a single sample (e.g. technical replicates)
# use a list for r1 and r2.
r1 = "reads/{sample}_1.fq.gz",
r2 = "reads/{sample}_2.fq.gz",
index = "salmon/transcriptome_index"
output:
quant = 'salmon/{sample}/quant.sf',
lib = 'salmon/{sample}/lib_format_counts.json'
log:
'logs/salmon/{sample}.log'
params:
# optional parameters
libtype ="A",
#zip_extension="bz2", # required for bz2 files ("bz2"); optional for gz files ("gz")
extra=""
threads: 2
wrapper:
"0.27.1/bio/salmon/quant"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Salmon Quant"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
def manual_decompression(reads, zip_ext):
    """ Allow *.bz2 input into salmon. Also provide same
    decompression for *gz files, as salmon devs mention
    it may be faster in some cases."""
    if zip_ext and reads:
        # bash process substitution: no space is allowed between '<' and '('
        if zip_ext == 'bz2':
            reads = ' <(bunzip2 -c ' + reads + ')'
        elif zip_ext == 'gz':
            reads = ' <(gunzip -c ' + reads + ')'
    return reads
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
zip_extension = snakemake.params.get("zip_extension", "")
libtype = snakemake.params.get("libtype", "A")
r1 = snakemake.input.get("r1")
r2 = snakemake.input.get("r2")
r = snakemake.input.get("r")
assert (r1 is not None and r2 is not None) or r is not None, "either r1 and r2 (paired), or r (unpaired) are required as input"
if r1:
    r1 = [snakemake.input.r1] if isinstance(snakemake.input.r1, str) else snakemake.input.r1
    r2 = [snakemake.input.r2] if isinstance(snakemake.input.r2, str) else snakemake.input.r2
    assert len(r1) == len(r2), "input-> equal number of files required for r1 and r2"
    r1_cmd = ' -1 ' + manual_decompression(" ".join(r1), zip_extension)
    r2_cmd = ' -2 ' + manual_decompression(" ".join(r2), zip_extension)
    read_cmd = " ".join([r1_cmd, r2_cmd])
if r:
    assert r1 is None and r2 is None, "Salmon cannot quantify mixed paired/unpaired input files. Please input either r1,r2 (paired) or r (unpaired)"
    r = [snakemake.input.r] if isinstance(snakemake.input.r, str) else snakemake.input.r
    read_cmd = ' -r ' + manual_decompression(" ".join(r), zip_extension)
outdir = path.dirname(snakemake.output.get('quant'))
shell("salmon quant -i {snakemake.input.index} "
      " -l {libtype} {read_cmd} -o {outdir} "
      " -p {snakemake.threads} {extra} {log} ")
SAMBAMBA¶
Wrappers¶
SAMBAMBA SORT¶
Sort bam file with sambamba.
Software dependencies¶
- sambamba ==0.6.6
Example¶
This wrapper can be used in the following way:
rule sambamba_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"" # optional parameters
threads: 8
wrapper:
"0.27.1/bio/sambamba/sort"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
shell(
"sambamba sort {snakemake.params} -t {snakemake.threads} "
"-o {snakemake.output[0]} {snakemake.input[0]}")
SAMTOOLS¶
Wrappers¶
SAMTOOLS FLAGSTAT¶
Use samtools to create a flagstat file from a bam or sam file.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_flagstat:
input: "mapped/{sample}.bam"
output: "mapped/{sample}.bam.flagstat"
wrapper:
"0.27.1/bio/samtools/flagstat"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Preusch
Code¶
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools flagstat {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS INDEX¶
Index bam file with samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_index:
input: "mapped/{sample}.sorted.bam"
output: "mapped/{sample}.sorted.bam.bai"
params:
"" # optional params string
wrapper:
"0.27.1/bio/samtools/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}")
SAMTOOLS MERGE¶
Merge two bam files with samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_merge:
input:
["mapped/A.bam", "mapped/B.bam"]
output:
"merged.bam"
params:
"" # optional additional parameters as string
threads: 8
wrapper:
"0.27.1/bio/samtools/merge"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools merge --threads {snakemake.threads} {snakemake.params} "
"{snakemake.output[0]} {snakemake.input}")
SAMTOOLS SORT¶
Sort bam file with samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"-m 4G"
threads: 8
wrapper:
"0.27.1/bio/samtools/sort"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
prefix = os.path.splitext(snakemake.output[0])[0]
shell(
"samtools sort {snakemake.params} -@ {snakemake.threads} -o {snakemake.output[0]} "
"-T {prefix} {snakemake.input[0]}")
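The temporary-file prefix passed via `-T` is derived from the output path by stripping its last extension. A minimal sketch of that step follows; `temp_prefix` is a hypothetical name used only for illustration.

```python
import os


def temp_prefix(output_bam):
    """Strip the final extension, mirroring os.path.splitext in the wrapper."""
    return os.path.splitext(output_bam)[0]


print(temp_prefix("mapped/a.sorted.bam"))
```

Only the final extension is removed, so `mapped/a.sorted.bam` yields `mapped/a.sorted`, and the temporary files end up next to the output file.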
SAMTOOLS STATS¶
Generate stats using samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_stats:
input:
"mapped/{sample}.bam"
output:
"samtools_stats/{sample}.txt"
params:
extra="", # Optional: extra arguments.
region="1:1000000-2000000" # Optional: region string.
log:
"logs/samtools_stats/{sample}.log"
wrapper:
"0.27.1/bio/samtools/stats"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for samtools stats."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("samtools stats {extra} {snakemake.input}"
" {region} > {snakemake.output} {log}")
SAMTOOLS VIEW¶
Convert or filter SAM/BAM.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_view:
input:
"{sample}.sam"
output:
"{sample}.bam"
params:
"-b" # optional params string
wrapper:
"0.27.1/bio/samtools/view"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools view {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")
SICKLE¶
Wrappers¶
SICKLE PE¶
Trim paired-end reads with sickle.
Software dependencies¶
- sickle-trim ==1.33
Example¶
This wrapper can be used in the following way:
rule sickle_pe:
input:
r1="input_R1.fq",
r2="input_R2.fq"
output:
r1="output_R1.fq",
r2="output_R2.fq",
rs="output_single.fq",
params:
qual_type="sanger",
# optional extra parameters
extra=""
log:
# optional log file
"logs/sickle/job.log"
wrapper:
"0.27.1/bio/sickle/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
"-o {snakemake.output.r1} -p {snakemake.output.r2} "
"-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
"{extra}) {log}"
)
SICKLE SE¶
Trim single-end reads with sickle.
Software dependencies¶
- sickle-trim ==1.33
Example¶
This wrapper can be used in the following way:
rule sickle_se:
input:
"input_R1.fq"
output:
"output_R1.fq"
params:
qual_type="sanger",
# optional extra parameters
extra=""
log:
"logs/sickle/job.log"
wrapper:
"0.27.1/bio/sickle/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
"-t {snakemake.params.qual_type} {extra}) {log}"
)
SNPEFF¶
Annotate predicted effect of nucleotide changes with SnpEff.
Software dependencies¶
- snpeff ==4.3.1t
Example¶
This wrapper can be used in the following way:
rule snpeff:
input:
"{sample}.vcf",
output:
vcf="snpeff/{sample}.vcf", # the main output file, required
stats="snpeff/{sample}.html", # summary statistics (in HTML), optional
csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
log:
"logs/snpeff/{sample}.log"
params:
reference="ebola_zaire", # reference name (from `snpeff databases`)
extra="-Xmx4g" # optional parameters (e.g., max memory 4g)
wrapper:
"0.27.1/bio/snpeff"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
from os import path
import shutil
import tempfile
shell.executable("bash")
shell_command =("(snpEff {data_dir} {stats_opt} {csvstats_opt} {extra}"
" {snakemake.params.reference} {snakemake.input}"
" > {snakemake.output.vcf}) {log}")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
data_dir = snakemake.params.get("data_dir", "")
if data_dir:
    data_dir = '-dataDir "%s"' % data_dir
stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = '' if not csvstats else '-csvStats {}'.format(csvstats)
stats_opt = '-noStats' if not stats else '-stats {}'.format(stats)
shell(shell_command)
#if stats:
# shutil.copy(path.join(stats_tempdir, 'stats'), stats)
#if genes:
# shutil.copy(path.join(stats_tempdir, 'stats.genes.txt'), genes)
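Which statistics options reach SnpEff depends on the optional `stats` and `csvstats` outputs of the rule. The sketch below mirrors that selection in a standalone function; `snpeff_stats_options` is a hypothetical helper, and the option names are the ones that appear in the wrapper code.

```python
def snpeff_stats_options(stats="", csvstats=""):
    """Return (stats_opt, csvstats_opt) as the wrapper code derives them."""
    # Without an HTML stats output, statistics are switched off entirely.
    stats_opt = "-stats {}".format(stats) if stats else "-noStats"
    # CSV statistics are only requested when an output file is given.
    csvstats_opt = "-csvStats {}".format(csvstats) if csvstats else ""
    return stats_opt, csvstats_opt


print(snpeff_stats_options())
print(snpeff_stats_options(stats="snpeff/a.html", csvstats="snpeff/a.csv"))
```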
SOURMASH¶
Wrappers¶
SOURMASH_COMPUTE¶
Build a MinHash signature for a transcriptome, genome, or reads.
Software dependencies¶
- sourmash ==2.0.0a7
Example¶
This wrapper can be used in the following way:
rule sourmash_reads:
input:
"reads/a.fastq"
output:
"reads.sig"
log:
"logs/sourmash/sourmash_compute_reads.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"0.27.1/bio/sourmash/compute"
rule sourmash_transcriptome:
input:
"assembly/transcriptome.fasta"
output:
"transcriptome.sig"
log:
"logs/sourmash/sourmash_compute_transcriptome.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"0.27.1/bio/sourmash/compute"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Lisa K. Johnson
Code¶
"""Snakemake wrapper for sourmash compute."""
__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled","1000")
k = snakemake.params.get("k","31")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
" {extra} {log}" )
STAR¶
Wrappers¶
STAR¶
Map reads with STAR.
Software dependencies¶
- star ==2.5.3a
Example¶
This wrapper can be used in the following way:
rule star_pe_multi:
input:
# use a list for multiple fastq files for one sample
# usually technical replicates across lanes/flowcells
fq1 = ["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
# paired-end reads need to be ordered so that the items in the two lists match
fq2 = ["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"] #optional
output:
# see STAR manual for additional output files
"star/pe/{sample}/Aligned.out.bam"
log:
"logs/star/pe/{sample}.log"
params:
# path to STAR reference genome index
index="index",
# optional parameters
extra=""
threads: 8
wrapper:
"0.27.1/bio/star/align"
rule star_se:
input:
fq1 = "reads/{sample}_R1.1.fastq"
output:
# see STAR manual for additional output files
"star/{sample}/Aligned.out.bam"
log:
"logs/star/{sample}.log"
params:
# path to STAR reference genome index
index="index",
# optional parameters
extra=""
threads: 8
wrapper:
"0.27.1/bio/star/align"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = [snakemake.input.fq1] if isinstance(snakemake.input.fq1, str) else snakemake.input.fq1
fq2 = snakemake.input.get("fq2")
if fq2:
    fq2 = [snakemake.input.fq2] if isinstance(snakemake.input.fq2, str) else snakemake.input.fq2
    assert len(fq1) == len(fq2), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])
if fq1[0].endswith(".gz"):
    readcmd = "--readFilesCommand zcat"
else:
    readcmd = ""
outprefix = os.path.dirname(snakemake.output[0]) + "/"
shell(
"STAR "
"{extra} "
"--runThreadN {snakemake.threads} "
"--genomeDir {snakemake.params.index} "
"--readFilesIn {input_str} "
"{readcmd} "
"--outSAMtype BAM Unsorted "
"--outFileNamePrefix {outprefix} "
"--outStd Log "
"{log}")
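The value passed to `--readFilesIn` joins the files of each mate with commas and separates the two mates with a space. The sketch below reproduces this formatting together with the `.gz` check; `star_read_files` is a hypothetical helper, not part of the wrapper.

```python
def star_read_files(fq1, fq2=None):
    """Return (read_files_argument, read_command_option) for STAR."""
    def as_list(files):
        return [files] if isinstance(files, str) else list(files)

    fq1 = as_list(fq1)
    mates = [",".join(fq1)]
    if fq2:
        fq2 = as_list(fq2)
        if len(fq1) != len(fq2):
            raise ValueError("equal number of files required for fq1 and fq2")
        mates.append(",".join(fq2))
    # Gzipped input is detected from the first file only, as in the wrapper.
    readcmd = "--readFilesCommand zcat" if fq1[0].endswith(".gz") else ""
    return " ".join(mates), readcmd


print(star_read_files(["a_R1.1.fastq", "a_R1.2.fastq"],
                      ["a_R2.1.fastq", "a_R2.2.fastq"]))
```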
TRIM_GALORE¶
Wrappers¶
TRIM_GALORE-PE¶
Trim paired-end reads using trim_galore.
Software dependencies¶
- trim-galore ==0.4.5
Example¶
This wrapper can be used in the following way:
rule trim_galore_pe:
input:
["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"]
output:
"trimmed/{sample}.1_val_1.fq.gz",
"trimmed/{sample}.1.fastq.gz_trimming_report.txt",
"trimmed/{sample}.2_val_2.fq.gz",
"trimmed/{sample}.2.fastq.gz_trimming_report.txt"
params:
extra="--illumina -q 20"
log:
"logs/trim_galore/{sample}.log"
wrapper:
"0.27.1/bio/trim_galore/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- It is expected that the fastqc Snakemake wrapper be used in place of the --fastqc option.
- All output files must be placed in the same directory.
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import os.path
log = snakemake.log_fmt_shell()
# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError("The trim_galore Snakemake wrapper cannot "
                     "be run with the `--fastqc` flag. Please "
                     "remove the flag from extra params. "
                     "You can use the fastqc Snakemake wrapper on "
                     "the input and output files instead.")
# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m
# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), \
        "trim_galore can only output files to a single directory." \
        " Please indicate only one directory for the output files."
shell(
"(trim_galore"
" {snakemake.params.extra}"
" --paired"
" -o {out_dir}"
" {snakemake.input})"
" {log}")
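Because trim_galore writes into a single output directory, the wrapper verifies that all declared outputs share one. The same check can be sketched as a small function; `common_output_dir` is a hypothetical name, and it raises `ValueError` instead of relying on `assert`.

```python
import os.path


def common_output_dir(outputs):
    """Return the single directory shared by all output paths."""
    dirs = {os.path.dirname(p) for p in outputs}
    if len(dirs) != 1:
        raise ValueError("trim_galore can only output files to a single directory")
    return dirs.pop()


print(common_output_dir([
    "trimmed/a.1_val_1.fq.gz",
    "trimmed/a.1.fastq.gz_trimming_report.txt",
]))
```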
TRIM_GALORE-SE¶
Trim unpaired reads using trim_galore.
Software dependencies¶
- trim-galore ==0.4.3
Example¶
This wrapper can be used in the following way:
rule trim_galore_se:
input:
"reads/{sample}.fastq.gz"
output:
"trimmed/{sample}_trimmed.fq.gz",
"trimmed/{sample}.fastq.gz_trimming_report.txt"
params:
extra="--illumina -q 20"
log:
"logs/trim_galore/{sample}.log"
wrapper:
"0.27.1/bio/trim_galore/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- It is expected that the fastqc Snakemake wrapper be used in place of the --fastqc option.
- All output files must be placed in the same directory.
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import os.path
log = snakemake.log_fmt_shell()
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError("The trim_galore Snakemake wrapper cannot "
                     "be run with the `--fastqc` flag. Please "
                     "remove the flag from extra params. "
                     "You can use the fastqc Snakemake wrapper on "
                     "the input and output files instead.")
# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m
# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), \
        "trim_galore can only output files to a single directory." \
        " Please indicate only one directory for the output files."
shell(
"(trim_galore"
" {snakemake.params.extra}"
" -o {out_dir}"
" {snakemake.input})"
" {log}")
TRIMMOMATIC¶
Wrappers¶
TRIMMOMATIC PE¶
Trim paired-end reads with trimmomatic.
Software dependencies¶
- trimmomatic ==0.36
Example¶
This wrapper can be used in the following way:
rule trimmomatic_pe:
input:
r1="reads/{sample}.1.fastq",
r2="reads/{sample}.2.fastq"
output:
r1="trimmed/{sample}.1.fastq.gz",
r2="trimmed/{sample}.2.fastq.gz",
# reads where trimming entirely removed the mate
r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra=""
wrapper:
"0.27.1/bio/trimmomatic/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
trimmer = " ".join(snakemake.params.trimmer)
shell("trimmomatic PE {extra} "
"{snakemake.input.r1} {snakemake.input.r2} "
"{snakemake.output.r1} {snakemake.output.r1_unpaired} "
"{snakemake.output.r2} {snakemake.output.r2_unpaired} "
"{trimmer} "
"{log}")
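The order of the positional arguments matters for `trimmomatic PE`: both inputs come first, then the paired and unpaired output for each mate, then the trimmers. The sketch below assembles the command in the same order as the wrapper code; `trimmomatic_pe_command` is a hypothetical helper that takes plain dicts keyed like the rule's named inputs and outputs.

```python
def trimmomatic_pe_command(inputs, outputs, trimmers, extra=""):
    """Return the trimmomatic PE command line as a string (hypothetical helper)."""
    parts = [
        "trimmomatic", "PE", extra,
        inputs["r1"], inputs["r2"],
        outputs["r1"], outputs["r1_unpaired"],
        outputs["r2"], outputs["r2_unpaired"],
        " ".join(trimmers),  # trimmers are applied in the order given
    ]
    # Skip empty parts (e.g. no extra arguments) to avoid double spaces.
    return " ".join(p for p in parts if p)


cmd = trimmomatic_pe_command(
    {"r1": "a.1.fastq", "r2": "a.2.fastq"},
    {"r1": "t.1.fastq.gz", "r1_unpaired": "t.1.unpaired.fastq.gz",
     "r2": "t.2.fastq.gz", "r2_unpaired": "t.2.unpaired.fastq.gz"},
    ["TRAILING:3"],
)
print(cmd)
```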
TRIMMOMATIC SE¶
Trim single-end reads with trimmomatic.
Software dependencies¶
- trimmomatic ==0.36
Example¶
This wrapper can be used in the following way:
rule trimmomatic_se:
input:
"reads/{sample}.fastq"
output:
"trimmed/{sample}.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra=""
wrapper:
"0.27.1/bio/trimmomatic/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
trimmer = " ".join(snakemake.params.trimmer)
shell("trimmomatic SE {extra} "
"{snakemake.input} {snakemake.output} "
"{trimmer} "
"{log}")
TRINITY¶
Generate transcriptome assembly with Trinity.
Software dependencies¶
- trinity ==2.5.1
Example¶
This wrapper can be used in the following way:
rule trinity:
input:
left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
output:
"trinity_out_dir/Trinity.fasta"
log:
'logs/trinity/trinity.log'
params:
extra=""
threads: 4
wrapper:
"0.27.1/bio/trinity"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Trinity."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
max_memory = snakemake.params.get("max_memory", "10G")
#allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = [snakemake.input.left] if isinstance(snakemake.input.left, str) else snakemake.input.left
right = snakemake.input.get("right")
if right:
    right = [snakemake.input.right] if isinstance(snakemake.input.right, str) else snakemake.input.right
    assert len(left) >= len(right), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
    input_str_left = ' --left ' + ",".join(left)
    input_str_right = ' --right ' + ",".join(right)
else:
    input_str_left = ' --single ' + ",".join(left)
    input_str_right = ''
input_cmd = " ".join([input_str_left, input_str_right])
# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
    if 'fq' in left[0] or 'fastq' in left[0]:
        seqtype = 'fq'
    elif 'fa' in left[0] or 'fasta' in left[0]:
        seqtype = 'fa'
    else: # assertion is redundant - warning or error instead?
        assert seqtype is not None, "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"
outdir = path.dirname(snakemake.output[0])
assert 'trinity' in outdir, "output directory name must contain 'trinity'"
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("Trinity {input_cmd} --CPU {snakemake.threads} "
" --max_memory {max_memory} -seqType {seqtype} "
" --output {outdir} {snakemake.params.extra} "
" {log}")
VCF
Wrappers
COMPRESS VCF
Compress and index a VCF file with bgzip and tabix.
Software dependencies
- htslib ==1.5
Example
This wrapper can be used in the following way:
rule compress_vcf:
    input:
        "{prefix}.vcf"
    output:
        "{prefix}.vcf.gz"
    wrapper:
        "0.27.1/bio/vcf/compress"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("bgzip --stdout {snakemake.input} > {snakemake.output} && tabix -p vcf {snakemake.output}")
UNCOMPRESS VCF
Uncompress a VCF file with bgzip.
Software dependencies
- htslib ==1.5
Example
This wrapper can be used in the following way:
rule uncompress_vcf:
    input:
        "{prefix}.vcf.gz"
    output:
        "{prefix}.vcf"
    wrapper:
        "0.27.1/bio/vcf/uncompress"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("bgzip --decompress --stdout {snakemake.input} > {snakemake.output}")
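For readers who want to see what the decompression step amounts to without bgzip installed, the following is a rough standard-library equivalent. It relies on the fact that bgzip output (BGZF) is a series of gzip members, which Python's gzip module can read; the function name decompress_vcf is hypothetical and the sketch is not part of the wrapper.

```python
import gzip
import shutil


def decompress_vcf(src, dst):
    """Write the decompressed contents of a gzip-compatible file to dst."""
    # gzip.open handles multi-member gzip streams such as bgzip output.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

The wrapper itself should still be preferred inside workflows, since it pins the htslib version through its conda environment.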
VCFTOOLS
Wrappers
VCFTOOLS FILTER
Filter VCF files using vcftools.
Software dependencies
- vcftools ==0.1.15
Example
This wrapper can be used in the following way:
rule filter_vcf:
    input:
        "{sample}.vcf"
    output:
        "{sample}.filtered.vcf"
    params:
        extra="--chr 1 --recode-INFO-all"
    wrapper:
        "0.27.1/bio/vcftools/filter"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
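The wrapper picks the input flag and the output redirection from the file extensions. The sketch below reproduces that logic as a pure function so the resulting command line can be inspected; it is an illustration derived from the wrapper code in the Code section, the name build_vcftools_command is hypothetical, and the log redirection added by the wrapper is left out.

```python
def build_vcftools_command(input_path, output_path, extra=""):
    """Assemble the vcftools command the wrapper would run (without logging)."""
    # Gzipped input needs --gzvcf, plain VCF needs --vcf.
    input_flag = "--gzvcf" if input_path.endswith(".gz") else "--vcf"
    # vcftools writes to stdout; compress on the fly if the target ends in .gz.
    redirect = "> " + output_path
    if output_path.endswith(".gz"):
        redirect = "| gzip " + redirect
    parts = ["vcftools", input_flag, input_path, extra,
             "--recode", "--stdout", redirect]
    # Skip empty parts (e.g. no extra arguments) to avoid double spaces.
    return " ".join(p for p in parts if p)
```

For the example rule, build_vcftools_command("{sample}.vcf", "{sample}.filtered.vcf", "--chr 1 --recode-INFO-all") yields a command using --vcf and a plain redirect, while a ".vcf.gz" output would be piped through gzip.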
Authors
- Patrik Smeds
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
    input_flag = "--gzvcf"
output = " > " + snakemake.output[0]
if output.endswith(".gz"):
    output = " | gzip" + output
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell("vcftools "
"{input_flag} "
"{snakemake.input} "
"{extra} "
"--recode "
"--stdout "
"{output} "
"{log}")