The Snakemake Wrappers repository¶
The Snakemake Wrapper Repository is a collection of reusable wrappers that let you quickly use popular tools from Snakemake rules and workflows.
Usage¶
The general strategy is to include a wrapper into your workflow via the wrapper directive, e.g.
rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.2.0/bio/samtools/sort"
Here, Snakemake will automatically download the corresponding wrapper from https://bitbucket.org/snakemake/snakemake-wrappers/src/0.2.0/bio/samtools/sort/wrapper.py. The 0.2.0 part can be replaced with the version tag you want to use, or with a commit id. Pinning the version ensures reproducibility, since changes in the wrapper implementation are not propagated automatically to your workflow. Alternatively, e.g. for development, the wrapper directive can also point to a full URL, including local file:// URLs.
Each wrapper defines required software packages and versions. In combination with the --use-conda flag of Snakemake, these will be deployed automatically.
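As a rough illustration of the lookup described above, the following sketch turns the value of a wrapper directive into the location of its wrapper.py. The function name wrapper_url and the BASE constant are hypothetical, the URL layout is simply copied from the example above, and Snakemake's actual resolution logic may differ in its details.

```python
# Hypothetical helper; the URL layout is copied from the example above.
BASE = "https://bitbucket.org/snakemake/snakemake-wrappers/src"


def wrapper_url(spec):
    """Return the assumed wrapper.py location for a wrapper directive value."""
    # Assumption: full URLs (including local file:// ones) are used as given.
    if "://" in spec:
        return spec
    # Otherwise the spec reads "<version tag or commit id>/<path to wrapper>".
    return "{}/{}/wrapper.py".format(BASE, spec.strip("/"))


print(wrapper_url("0.2.0/bio/samtools/sort"))
```

Swapping 0.2.0 for another tag or a commit id changes only the version component of the resulting URL, which is what keeps a pinned workflow reproducible.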
Contribute¶
We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute we suggest the following procedure:
- Fork the repository: https://bitbucket.org/snakemake/snakemake-wrappers/fork
- Clone the repo locally:
git clone https://MY_PROFILE@bitbucket.org/MY_PROFILE/snakemake-wrappers.git
- Locally, create a new branch:
git checkout -b my-new-snakemake-wrapper
- Commit your contributions to that branch and push them to your fork:
git push -u origin my-new-snakemake-wrapper
- Create a pull request:
https://bitbucket.org/MY_PROFILE/snakemake-wrappers/pull-requests/new
The pull request will be reviewed and included as fast as possible. Contributions should follow the coding style of the already present examples, i.e.:
- provide a meta.yaml with name, description and author(s) of the wrapper
- provide an environment.yaml which lists all required software packages (the packages should be available for installation via the default anaconda channels or via the conda channels bioconda or conda-forge)
- provide a minimal test case in a subfolder called test, with an example Snakefile that shows how to use the wrapper and some minimal testing data (also check existing wrappers for suitable data), and add an invocation of the test in test.py
- follow the python style guide, using 4 spaces for indentation.
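The checklist above can be turned into a quick self-check before opening a pull request. This is only a sketch: the helper missing_wrapper_files and the REQUIRED list are hypothetical names, and the file names are the ones mentioned in this document (wrapper.py from the usage example, the rest from the checklist).

```python
import os
import tempfile

# File names taken from this document; wrapper.py is the wrapper script itself.
REQUIRED = ["wrapper.py", "meta.yaml", "environment.yaml",
            os.path.join("test", "Snakefile")]


def missing_wrapper_files(wrapper_dir):
    """Return the expected files that are absent from wrapper_dir."""
    return [name for name in REQUIRED
            if not os.path.isfile(os.path.join(wrapper_dir, name))]


# Demonstration on a throwaway directory that only contains a meta.yaml.
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "meta.yaml"), "w").close()
    print(missing_wrapper_files(demo))
```

An empty list means all files named in the checklist are present; it says nothing about their content or about the test invocation in test.py.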
Testing locally¶
If you want to debug your contribution locally before creating a pull request, we recommend adding your test case to the start of the list in test.py, so that it runs first. Then, install miniconda with the channels as described for bioconda, set up an environment with the necessary dependencies, and activate it:
conda create -n test-snakemake-wrappers snakemake pytest conda
conda activate test-snakemake-wrappers
Afterwards, from the main directory of the repo, you can run the tests with:
pytest test.py -v
If you use a keyboard interrupt after your test has failed, you will get all the relevant stdout and stderr messages printed.
If you also want to test the docs generation locally, create another environment and activate it:
conda create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml
conda activate test-snakemake-wrapper-docs
Then, enter the respective directory and build the docs:
cd docs
make html
If it runs through, you can open the main page at docs/_build/html/index.html in a web browser. If you want to start fresh, you can clean up the build with make clean.
ART¶
For art, the following wrappers are available:
ART_PROFILER_ILLUMINA¶
Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.
Software dependencies¶
- art ==2016.06.05
Example¶
This wrapper can be used in the following way:
rule art_profiler_illumina:
    input:
        "data/{sample}.fq",
    output:
        "profiles/{sample}.txt"
    log:
        "logs/art_profiler_illumina/{sample}.log"
    params: ""
    threads: 2
    wrapper:
        "0.31.0/bio/art/profiler_illumina"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
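The wrapper accepts input files ending in fastq, fastq.gz, fq or fq.gz and derives the profile name from the input file name (see the Code section of this wrapper). The following standalone sketch reproduces just that name handling so it can be tried outside of Snakemake; split_fastq_name is a hypothetical helper name, not part of the wrapper.

```python
import os
import re


def split_fastq_name(input_path):
    """Return (profile base name, extension) the way the wrapper derives them."""
    # Normalise ".fastq" to ".fq" so that both spellings are handled alike.
    filename = os.path.basename(input_path).replace(".fastq", ".fq")
    ext = re.search(r"fq(\.gz)?$", filename)
    if ext is None:
        raise IOError("expected a fastq, fastq.gz, fq or fq.gz file: " + input_path)
    fq_extension = ext.group(0)
    # Strip the extension; art appends its own ".txt" to the profile name.
    return filename.replace("." + fq_extension, ""), fq_extension


for example in ["data/a.fq", "data/b.fastq.gz"]:
    print(example, "->", split_fastq_name(example))
```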
Authors¶
- David Laehnemann
- Victoria Sack
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
from snakemake.shell import shell
import os
import tempfile
import re

# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina cli
with tempfile.TemporaryDirectory() as temp_input:
    # ensure that .fastq and .fastq.gz input files work, as well
    filename = os.path.basename(snakemake.input[0]).replace('.fastq', '.fq')
    # figure out the exact file extension after the above substitution
    # (raw string, so that "\." is passed to the regex engine unchanged)
    ext = re.search(r"fq(\.gz)?$", filename)
    if ext:
        fq_extension = ext.group(0)
    else:
        raise IOError("Incompatible extension: This art_profiler_illumina "
                      "wrapper requires input files with one of the following "
                      "extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
                      "your input and the invocation of the wrapper accordingly.")
    os.symlink(
        # snakemake paths are relative, but the symlink needs to be absolute
        os.path.abspath(snakemake.input[0]),
        # strip temp file name of any infixes, to circumvent art read
        # enumeration magic
        os.path.join(temp_input, "input." + fq_extension)
    )
    # include output folder name in the profile_name command line argument and
    # strip off the file extension, as art will add its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(snakemake.output[0]),
        filename.replace("." + fq_extension, '')
    )
    shell(
        "( art_profiler_illumina {snakemake.params} {profile_name}"
        " {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}")
BCFTOOLS¶
For bcftools, the following wrappers are available:
BCFTOOLS CALL¶
Call variants with bcftools.
Software dependencies¶
- samtools ==1.5
- bcftools ==1.5
Example¶
This wrapper can be used in the following way:
rule bcftools_call:
    input:
        ref="genome.fasta",
        samples=expand("mapped/{sample}.sorted.bam", sample=config["samples"]),
        indexes=expand("mapped/{sample}.sorted.bam.bai", sample=config["samples"])
    output:
        # Here, we optionally use a region as wildcard and constrain it to the
        # format accepted by samtools mpileup.
        "called/{region,.+(:[0-9]+-[0-9]+)?}.bcf"
    params:
        # Optional parameters for samtools mpileup (except -g, -f).
        # In this example, we forward the region wildcard from the output file to mpileup.
        mpileup="--region {region}",
        # Optional parameters for bcftools call (except -v, -o, -m).
        call=""
    log:
        "logs/bcftools_call/{region}.log"
    wrapper:
        "0.31.0/bio/bcftools/call"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
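The wildcard constraint in the output pattern above is an ordinary regular expression. The sketch below shows which region names it accepts, under the simplifying assumption that the constraint has to match the whole wildcard value; region_matches is a hypothetical helper name and only approximates how Snakemake applies the constraint.

```python
import re

# Constraint copied from the output pattern "called/{region,...}.bcf" above.
REGION_CONSTRAINT = r".+(:[0-9]+-[0-9]+)?"


def region_matches(value):
    """Check a candidate wildcard value against the constraint."""
    return re.fullmatch(REGION_CONSTRAINT, value) is not None


for candidate in ["chr1", "chr1:100-200", "chr1:abc", ""]:
    print(repr(candidate), region_matches(candidate))
```

Because the leading .+ already accepts any non-empty string, a value such as chr1:abc also passes; a stricter chromosome pattern would be needed to reject it.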
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"(samtools mpileup {snakemake.params.mpileup} {snakemake.input.samples} "
"--fasta-ref {snakemake.input.ref} --BCF --uncompressed | "
"bcftools call -m {snakemake.params.call} -o {snakemake.output[0]} -v -) 2> {snakemake.log}")
BCFTOOLS CONCAT¶
Concatenate vcf/bcf files with bcftools.
Software dependencies¶
- bcftools ==1.6
Example¶
This wrapper can be used in the following way:
rule bcftools_concat:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools concat (except -o)
    wrapper:
        "0.31.0/bio/bcftools/concat"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools concat {snakemake.params} -o {snakemake.output[0]} "
"{snakemake.input.calls}")
BCFTOOLS MERGE¶
Merge vcf/bcf files with bcftools.
Software dependencies¶
- bcftools ==1.6
Example¶
This wrapper can be used in the following way:
rule bcftools_merge:
    input:
        calls=["a.bcf", "b.bcf"]
    output:
        "all.bcf"
    params:
        ""  # optional parameters for bcftools merge (except -o)
    wrapper:
        "0.31.0/bio/bcftools/merge"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools merge {snakemake.params} -o {snakemake.output[0]} "
"{snakemake.input.calls}")
BCFTOOLS VIEW¶
View vcf/bcf file in a different format.
Software dependencies¶
- bcftools ==1.5
Example¶
This wrapper can be used in the following way:
rule bcf_to_vcf:
    input:
        "{prefix}.bcf"
    output:
        "{prefix}.vcf"
    params:
        ""  # optional parameters for bcftools view (except -o)
    wrapper:
        "0.31.0/bio/bcftools/view"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools view {snakemake.params} {snakemake.input[0]} "
"-o {snakemake.output[0]}")
BOWTIE2¶
For bowtie2, the following wrappers are available:
BOWTIE2¶
Map reads with bowtie2.
Software dependencies¶
- bowtie2 ==2.3.2
- samtools ==1.5
Example¶
This wrapper can be used in the following way:
rule bowtie2:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bowtie2/{sample}.log"
    params:
        index="index/genome",  # prefix of reference genome index (built with bowtie2-build)
        extra=""  # optional parameters
    threads: 8
    wrapper:
        "0.31.0/bio/bowtie2/align"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
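The wrapper decides between single-end and paired-end mode from the number of files given in input.sample (see the Code section of this wrapper). As a small standalone sketch of that decision, with bowtie2_reads_arg being a hypothetical helper name:

```python
def bowtie2_reads_arg(samples):
    """Build the read arguments for one (single-end) or two (paired-end) files."""
    n = len(samples)
    assert n == 1 or n == 2, "expected 1 (single-end) or 2 (paired-end) files"
    if n == 1:
        # unpaired reads
        return "-U {}".format(*samples)
    # mate 1 and mate 2 of paired reads
    return "-1 {} -2 {}".format(*samples)


print(bowtie2_reads_arg(["reads/a.fastq"]))
print(bowtie2_reads_arg(["reads/a.1.fastq", "reads/a.2.fastq"]))
```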
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

n = len(snakemake.input.sample)
assert n == 1 or n == 2, "input->sample must have 1 (single-end) or 2 (paired-end) elements."

if n == 1:
    reads = "-U {}".format(*snakemake.input.sample)
else:
    reads = "-1 {} -2 {}".format(*snakemake.input.sample)

# use the local `extra`, so that omitting params.extra in the rule is fine
shell(
    "(bowtie2 --threads {snakemake.threads} {extra} "
    "-x {snakemake.params.index} {reads} "
    "| samtools view -Sbh -o {snakemake.output[0]} -) {log}")
BUSCO¶
Assess assembly and annotation completeness with BUSCO
Software dependencies¶
- python ==3.6
- busco
Example¶
This wrapper can be used in the following way:
rule run_busco:
    input:
        "sample_data/target.fa"
    output:
        "txome_busco/full_table_txome_busco.tsv",
    log:
        "logs/quality/transcriptome_busco.log"
    threads: 8
    params:
        mode="transcriptome",
        lineage_path="sample_data/example",
        # optional parameters
        extra=""
    wrapper:
        "0.31.0/bio/busco"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for BUSCO assessment"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode is not None, "please input a run mode: genome, transcriptome or proteins"
lineage = snakemake.params.get("lineage_path")
assert lineage is not None, "please input the path to a lineage for busco assessment"

# busco does not allow you to direct output location: handle this by moving output
outdir = path.dirname(snakemake.output[0])
if '/' in outdir:
    out_name = path.basename(outdir)
else:
    out_name = outdir

# note: --force allows snakemake to handle rewriting files as necessary
# without needing to specify *all* busco outputs as snakemake outputs
shell("run_busco --in {snakemake.input} --out {out_name} --force "
      " --cpu {snakemake.threads} --mode {mode} --lineage {lineage} "
      " {extra} {log}")

busco_outname = 'run_' + out_name

# move to intended location
shell("cp -r {busco_outname}/* {outdir}")
shell("rm -rf {busco_outname}")
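To make the directory handling in the code above easier to follow, here is a standalone sketch of the name derivation only. The helper busco_dirs is a hypothetical name; it returns the output directory, the name passed to --out, and the run_ directory that the wrapper code above expects busco to create.

```python
from os import path


def busco_dirs(first_output):
    """Return (outdir, out_name, busco_outname) for a rule's first output file."""
    outdir = path.dirname(first_output)
    # For nested output directories only the last component is passed to --out.
    out_name = path.basename(outdir) if "/" in outdir else outdir
    return outdir, out_name, "run_" + out_name


print(busco_dirs("txome_busco/full_table_txome_busco.tsv"))
print(busco_dirs("results/qc/txome_busco/full_table_txome_busco.tsv"))
```

In both cases the results are first written to run_txome_busco in the working directory and then copied into the requested output directory.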
BWA¶
For bwa, the following wrappers are available:
BWA ALN¶
Map reads with bwa aln.
Software dependencies¶
- bwa ==0.7.15
Example¶
This wrapper can be used in the following way:
rule bwa_aln:
    input:
        "reads/{sample}.{pair}.fastq"
    output:
        "sai/{sample}.{pair}.sai"
    params:
        index="genome",
        extra=""
    log:
        "logs/bwa_aln/{sample}.{pair}.log"
    threads: 8
    wrapper:
        "0.31.0/bio/bwa/aln"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa aln."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get('extra', '')
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"bwa aln"
" {extra}"
" -t {snakemake.threads}"
" {snakemake.params.index}"
" {snakemake.input[0]}"
" > {snakemake.output[0]} {log}")
BWA INDEX¶
Creates a BWA index.
Software dependencies¶
- bwa ==0.7.15
Example¶
This wrapper can be used in the following way:
rule bwa_index:
    input:
        "{genome}.fasta"
    output:
        "{genome}.amb",
        "{genome}.ann",
        "{genome}.bwt",
        "{genome}.pac",
        "{genome}.sa"
    log:
        "logs/bwa_index/{genome}.log"
    params:
        prefix="{genome}",
        algorithm="bwtsw"
    wrapper:
        "0.31.0/bio/bwa/index"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
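The wrapper passes the prefix and algorithm params on as -p and -a options only when they are set (see the Code section of this wrapper). The following sketch assembles the resulting command line as a plain string; bwa_index_command is a hypothetical helper name used for illustration.

```python
def bwa_index_command(fasta, prefix="", algorithm=""):
    """Assemble a bwa index call, adding -p and -a only when they are given."""
    parts = ["bwa", "index"]
    if prefix:
        parts += ["-p", prefix]
    if algorithm:
        parts += ["-a", algorithm]
    parts.append(fasta)
    return " ".join(parts)


print(bwa_index_command("genome.fasta", prefix="genome", algorithm="bwtsw"))
print(bwa_index_command("genome.fasta"))
```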
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if len(snakemake.input) == 0:
    raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
    raise ValueError("Only one reference genome can be provided!")

# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")
if len(prefix) > 0:
    prefix = "-p " + prefix

# Construction algorithm that will be used to build the database; if it is
# not given, no -a option is passed and bwa's own default applies
construction_algorithm = snakemake.params.get("algorithm", "")
if len(construction_algorithm) != 0:
    construction_algorithm = "-a " + construction_algorithm

shell(
    "bwa index"
    " {prefix}"
    " {construction_algorithm}"
    " {snakemake.input[0]}"
    " {log}")
BWA MEM¶
Map reads using bwa mem, with optional sorting using samtools or picard.
Software dependencies¶
- bwa ==0.7.15
- samtools ==1.5
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule bwa_mem:
    input:
        reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        "mapped/{sample}.bam"
    log:
        "logs/bwa_mem/{sample}.log"
    params:
        index="genome",
        extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
        sort="none",             # Can be 'none', 'samtools' or 'picard'.
        sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
        sort_extra=""            # Extra args for samtools/picard.
    threads: 8
    wrapper:
        "0.31.0/bio/bwa/mem"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
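The sort, sort_order and sort_extra params determine which command the bwa output is piped into (see the Code section of this wrapper; the bwa sampe and bwa samse wrappers use the same scheme). The sketch below mirrors that selection as a plain function returning the command string; pipe_command is a hypothetical helper name, and the command fragments are copied from the wrapper code.

```python
from os import path


def pipe_command(output, sort="none", sort_order="coordinate", sort_extra=""):
    """Return the command that receives the aligner output on stdin."""
    if sort_order not in {"coordinate", "queryname"}:
        raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
    if sort == "none":
        # Only convert to BAM.
        return "samtools view -Sbh -o {} -".format(output)
    if sort == "samtools":
        if sort_order == "queryname":
            sort_extra += " -n"
        # Temporary files are placed next to the output file.
        sort_extra += " -T " + path.splitext(output)[0] + ".tmp"
        return "samtools sort {} -o {} -".format(sort_extra, output)
    if sort == "picard":
        return ("picard SortSam {} INPUT=/dev/stdin OUTPUT={} SORT_ORDER={}"
                .format(sort_extra, output, sort_order))
    raise ValueError("Unexpected value for params.sort ({})".format(sort))


for mode in ["none", "samtools", "picard"]:
    print(pipe_command("mapped/a.bam", sort=mode, sort_order="queryname"))
```

Note that sort_order only has an effect when sort is set to samtools or picard; with sort="none" the reads keep the order in which the aligner emits them.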
Authors¶
- Johannes Köster
- Julian de Ruiter
Code¶
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell

# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {1, 2}:
    raise ValueError("input must have 1 (single-end) or "
                     "2 (paired-end) elements")

if sort_order not in {"coordinate", "queryname"}:
    raise ValueError("Unexpected value for sort_order ({})".format(sort_order))

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

shell(
    "(bwa mem"
    " -t {snakemake.threads}"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.reads}"
    " | " + pipe_cmd + ") {log}")
BWA SAMPE¶
Map paired-end reads with bwa sampe.
Software dependencies¶
- bwa ==0.7.15
- samtools ==1.3
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule bwa_sampe:
    input:
        fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"]
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",             # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",  # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""            # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_sampe/{sample}.log"
    wrapper:
        "0.31.0/bio/bwa/sampe"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa sampe."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell

# Check inputs.
if not len(snakemake.input.sai) == 2:
    raise ValueError('input.sai must have 2 elements')

if not len(snakemake.input.fastq) == 2:
    raise ValueError('input.fastq must have 2 elements')

# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa sampe"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}")
BWA SAMSE¶
Map single-end reads with bwa samse.
Software dependencies¶
- bwa ==0.7.15
- samtools ==1.3
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule bwa_samse:
    input:
        fastq="reads/{sample}.1.fastq",
        sai="sai/{sample}.1.sai"
    output:
        "mapped/{sample}.bam"
    params:
        index="genome",
        extra=r"-r '@RG\tID:{sample}\tSM:{sample}'",  # optional: Extra parameters for bwa.
        sort="none",             # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
        sort_order="queryname",  # optional: Sort by 'queryname' or 'coordinate'
        sort_extra=""            # optional: extra arguments for samtools/picard
    log:
        "logs/bwa_samse/{sample}.log"
    wrapper:
        "0.31.0/bio/bwa/samse"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa samse."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    # Use prefix for temp.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = ("picard SortSam {sort_extra} INPUT=/dev/stdin"
                " OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}")
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))

# Run command.
shell(
    "(bwa samse"
    " {extra}"
    " {snakemake.params.index}"
    " {snakemake.input.sai}"
    " {snakemake.input.fastq}"
    " | " + pipe_cmd + ") {log}")
CUTADAPT¶
For cutadapt, the following wrappers are available:
CUTADAPT-PE¶
Trim paired-end reads using cutadapt.
Software dependencies¶
- cutadapt ==1.13
Example¶
This wrapper can be used in the following way:
rule cutadapt:
    input:
        ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
    output:
        fastq1="trimmed/{sample}.1.fastq",
        fastq2="trimmed/{sample}.2.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.31.0/bio/cutadapt/pe"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cutadapt"
" {snakemake.params}"
" -o {snakemake.output.fastq1}"
" -p {snakemake.output.fastq2}"
" {snakemake.input}"
" > {snakemake.output.qc} {log}")
CUTADAPT-SE¶
Trim single-end reads using cutadapt.
Software dependencies¶
- cutadapt ==1.13
Example¶
This wrapper can be used in the following way:
rule cutadapt:
    input:
        "reads/{sample}.fastq"
    output:
        fastq="trimmed/{sample}.fastq",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
    log:
        "logs/cutadapt/{sample}.log"
    wrapper:
        "0.31.0/bio/cutadapt/se"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for trimming single-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cutadapt"
" {snakemake.params}"
" -o {snakemake.output.fastq}"
" {snakemake.input[0]}"
" > {snakemake.output.qc} {log}")
DELLY¶
Call variants with delly.
Software dependencies¶
- delly ==0.7.8
Example¶
This wrapper can be used in the following way:
rule delly:
    input:
        ref="genome.fasta",
        samples=["mapped/a.bam"],
        # optional exclude template (see https://github.com/dellytools/delly)
        exclude="human.hg19.excl.tsv"
    output:
        "sv/calls.bcf"
    params:
        extra=""  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log"
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "0.31.0/bio/delly"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell

# The exclude template is optional; skip the -x option if it is not given.
try:
    exclude = "-x " + snakemake.input.exclude
except AttributeError:
    exclude = ""

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "OMP_NUM_THREADS={snakemake.threads} delly call {extra} "
    "{exclude} -g {snakemake.input.ref} "
    "-o {snakemake.output[0]} {snakemake.input.samples} {log}")
EPIC¶
For epic, the following wrappers are available:
EPIC¶
Find broad enriched domains in ChIP-Seq data with epic
Software dependencies¶
- epic =0.2.7
- pandas =0.22.0
Example¶
This wrapper can be used in the following way:
rule epic:
    input:
        treatment = "bed/test.bed",
        background = "bed/control.bed"
    output:
        enriched_regions = "epic/enriched_regions.csv",  # required
        bed = "epic/enriched_regions.bed",  # optional
        matrix = "epic/matrix.gz"  # optional
    log:
        "logs/epic/epic.log"
    params:
        genome = "hg19",  # optional, default hg19
        extra="-g 3 -w 200"  # "--bigwig epic/bigwigs"
    threads: 1  # optional, defaults to 1
    wrapper:
        "0.31.0/bio/epic/peaks"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- All/any of the different bigwig options must be given as extra parameters
Authors¶
- Endre Bakken Stovner
Code¶
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell

# Optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1
# fall back to hg19, as documented in the example rule
genome = snakemake.params.get("genome", "hg19")

# Input files
treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")

# Output files
enriched_regions = snakemake.output.get("enriched_regions")
bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")

# Log file (optional); default to None so that the check below cannot fail
# with a NameError when the rule defines no log
log = snakemake.log[0] if len(snakemake.log) > 0 else None

# Assemble and execute the shell command
cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"
if bed:
    cmd += " -b {bed}"
if matrix:
    cmd += " -sm {matrix}"
if log:
    cmd += " -l {log}"
cmd += " {extra}"
shell(cmd)
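The way the optional outputs extend the command line can be tried without Snakemake. The following sketch mirrors the command assembly of the code above as a plain function; epic_command is a hypothetical helper name, the flags are copied from the wrapper code, and the hg19 default is an assumption taken from the comment in the example rule.

```python
def epic_command(treatment, background, enriched_regions, genome="hg19",
                 threads=1, bed=None, matrix=None, log=None, extra=""):
    """Assemble the epic call; optional outputs only add their flag when set."""
    cmd = "epic -cpu {} -t {} -c {} -o {} -gn {}".format(
        threads, treatment, background, enriched_regions, genome)
    if bed:
        cmd += " -b " + bed
    if matrix:
        cmd += " -sm " + matrix
    if log:
        cmd += " -l " + log
    if extra:
        cmd += " " + extra
    return cmd


print(epic_command("bed/test.bed", "bed/control.bed",
                   "epic/enriched_regions.csv",
                   bed="epic/enriched_regions.bed", extra="-g 3 -w 200"))
```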
FASTQ_SCREEN¶
fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
This wrapper allows the configuration to be passed either as a filename or as a dictionary in the rule's params.fastq_screen_config. So the following configuration file:
DATABASE ecoli /data/Escherichia_coli/Bowtie2Index/genome BOWTIE2
DATABASE ecoli /data/Escherichia_coli/BowtieIndex/genome BOWTIE
DATABASE hg19 /data/hg19/Bowtie2Index/genome BOWTIE2
DATABASE mm10 /data/mm10/Bowtie2Index/genome BOWTIE2
BOWTIE /path/to/bowtie
BOWTIE2 /path/to/bowtie2
becomes:
fastq_screen_config = {
'database': {
'ecoli': {
'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
'hg19': {
'bowtie2': '/data/hg19/Bowtie2Index/genome'},
'mm10': {
'bowtie2': '/data/mm10/Bowtie2Index/genome'}
},
'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}
By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".
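To see how the dictionary form maps back onto the configuration file format, the sketch below renders such a dictionary as tab-separated config lines, following the same scheme as the wrapper code further down in this section. The function name render_fastq_screen_config is hypothetical, and the sample dictionary is a shortened version of the one above.

```python
def render_fastq_screen_config(config):
    """Render the dictionary form as fastq_screen config file content."""
    lines = []
    for label, indexes in config["database"].items():
        for aligner_name, index in indexes.items():
            # One DATABASE line per (genome label, aligner) pair.
            lines.append("\t".join(["DATABASE", label, index, aligner_name.upper()]))
    for aligner_name, aligner_path in config["aligner_paths"].items():
        lines.append("\t".join([aligner_name.upper(), aligner_path]))
    return "\n".join(lines) + "\n"


example = {
    "database": {"hg19": {"bowtie2": "/data/hg19/Bowtie2Index/genome"}},
    "aligner_paths": {"bowtie2": "bowtie2"},
}
print(render_fastq_screen_config(example), end="")
```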
Software dependencies¶
- fastq-screen ==0.5.2
- bowtie2 ==2.2.6
- bowtie ==1.1.2
Example¶
This wrapper can be used in the following way:
rule fastq_screen:
    input:
        "samples/{sample}.fastq.gz"
    output:
        txt="qc/{sample}.fastq_screen.txt",
        png="qc/{sample}.fastq_screen.png"
    params:
        fastq_screen_config=fastq_screen_config,
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "0.31.0/bio/fastq_screen"
Note that input, output and log file paths can be chosen freely. When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
- While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.
- When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python's tempfile module. To control the temp file directory, make sure the $TMPDIR environment variable is set (see the tempfile docs for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the snakefile calling this wrapper.
Authors¶
- Ryan Dale
Code¶
import os
from snakemake.shell import shell
import tempfile
__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"
_config = snakemake.params['fastq_screen_config']
subset = snakemake.params.get('subset', 100000)
aligner = snakemake.params.get('aligner', 'bowtie2')
extra = snakemake.params.get('extra', '')
log = snakemake.log_fmt_shell()
# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
    tmp = tempfile.NamedTemporaryFile(delete=False).name
    with open(tmp, 'w') as fout:
        # Loop variables are named so they do not overwrite `aligner` above.
        for label, indexes in _config['database'].items():
            for db_aligner, index in indexes.items():
                fout.write('\t'.join([
                    'DATABASE', label, index, db_aligner.upper()]) + '\n')
        for aligner_name, path in _config['aligner_paths'].items():
            fout.write('\t'.join([aligner_name.upper(), path]) + '\n')
    config_file = tmp
else:
    config_file = _config
# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = os.path.basename(snakemake.input[0].split('.fastq')[0])
tempdir = tempfile.mkdtemp()
shell(
"fastq_screen --outdir {tempdir} "
"--force "
"--aligner {aligner} "
"--conf {config_file} "
"--subset {subset} "
"--threads {snakemake.threads} "
"{extra} "
"{snakemake.input[0]} "
"{log}"
)
# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")
# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
    shell("rm {tmp}")
FASTQC¶
Generate fastq qc statistics using fastqc.
Software dependencies¶
- fastqc ==0.11.7
Example¶
This wrapper can be used in the following way:
rule fastqc:
input:
"reads/{sample}.fastq"
output:
html="qc/fastqc/{sample}.html",
zip="qc/fastqc/{sample}.zip"
params: ""
log:
"logs/fastqc/{sample}.log"
wrapper:
"0.31.0/bio/fastqc"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for fastqc."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from tempfile import TemporaryDirectory
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = path.basename(file_path)
    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])
    return base
# Run fastqc. Since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell("fastqc {snakemake.params} --quiet "
          "--outdir {tempdir} {snakemake.input[0]}"
          " {log}")
    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")
    if snakemake.output.html != html_path:
        shell("mv {html_path} {snakemake.output.html}")
    if snakemake.output.zip != zip_path:
        shell("mv {zip_path} {snakemake.output.zip}")
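The helper basename_without_ext above decides which file names the wrapper looks for in the temp dir. As a small standalone sketch (the input paths are hypothetical), this is how it maps inputs to the expected fastqc report names:

```python
from os import path


def basename_without_ext(file_path):
    """Return the basename of file_path without its (possibly .gz) extension."""
    base = path.basename(file_path)
    # Drop two suffixes for gzipped files (e.g. ".fastq.gz"), otherwise one.
    split_ind = 2 if base.endswith(".gz") else 1
    return ".".join(base.split(".")[:-split_ind])


# Hypothetical input paths, chosen only to illustrate the suffix handling.
for example in ["reads/a.fastq", "reads/a.fastq.gz", "reads/a.R1.fastq.gz"]:
    print(example, "->", basename_without_ext(example) + "_fastqc.html")
```

Only the last one or two dot-separated suffixes are removed, so a name such as a.R1.fastq.gz keeps its R1 part.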
FGBIO¶
For fgbio, the following wrappers are available:
FGBIO ANNOTATEBAMWITHUMIS¶
Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule AnnotateBam:
input:
bam="mapped/{sample}.bam",
umi="umi/{sample}.fastq"
output:
"mapped/{sample}.annotated.bam"
params: ""
log:
"logs/fgbio/annotate_bam/{sample}.log"
wrapper:
"0.31.0/bio/fgbio/annotatebamwithumis"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input.bam
if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not isinstance(bam_input, str):
    raise ValueError("Input bam should be a string: " + str(bam_input) + "!")
umi_input = snakemake.input.umi
if umi_input is None:
    raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
    raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")
if not len(snakemake.output) == 1:
    raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
output_file = snakemake.output[0]
if output_file is None:
    raise ValueError("Missing output file!")
elif not isinstance(output_file, str):
    raise ValueError("Output bam-file should be a string: " + str(output_file) + "!")
shell("fgbio AnnotateBamWithUmis"
" -i {bam_input}"
" -f {umi_input}"
" -o {output_file}"
" {extra_params}"
" {log}")
FGBIO CALLMOLECULARCONSENSUSREADS¶
Calls consensus sequences from reads with the same unique molecular tag.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule ConsensusReads:
input:
"mapped/a.bam"
output:
"mapped/{sample}.m3.bam"
params:
extra="-M 3"
log:
"logs/fgbio/consensus_reads/{sample}.log"
wrapper:
"0.31.0/bio/fgbio/callmolecularconsensusreads"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
# Require exactly one input file, given as a string.
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_file = snakemake.output[0]
# Require exactly one output file, given as a string.
if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")
shell("fgbio CallMolecularConsensusReads"
" -i {bam_input}"
" -o {output_file}"
" {extra_params}"
" {log}")
FGBIO GROUPREADSBYUMI¶
Groups reads together that appear to have come from the same original molecule.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule GroupReads:
input:
"mapped/a.bam"
output:
"mapped/{sample}.gu.bam"
params:
extra="-s adjacency --edits 1"
log:
"logs/fgbio/group_reads/{sample}.log"
wrapper:
"0.31.0/bio/fgbio/groupreadsbyumi"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
# Require exactly one input file, given as a string.
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_file = snakemake.output[0]
# Require exactly one output file, given as a string.
if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")
shell("fgbio GroupReadsByUmi"
" -i {bam_input}"
" -o {output_file}"
" {extra_params}"
" {log}")
FGBIO SETMATEINFORMATION¶
Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), MC (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule SetMateInfo:
input:
"mapped/a.bam"
output:
"mapped/{sample}.mi.bam"
params: ""
log:
"logs/fgbio/set_mate_info/{sample}.log"
wrapper:
"0.31.0/bio/fgbio/setmateinformation"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
# Require exactly one input file, given as a string.
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_file = snakemake.output[0]
# Require exactly one output file, given as a string.
if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")
shell("fgbio SetMateInformation"
" -i {bam_input}"
" -o {output_file}"
" {extra_params}"
" {log}")
FREEBAYES¶
Call small genomic variants with freebayes.
Software dependencies¶
- freebayes ==1.1.0
- bcftools ==1.5
- parallel ==20170422
Example¶
This wrapper can be used in the following way:
rule freebayes:
input:
ref="genome.fasta",
# you can have a list of samples here
samples="mapped/{sample}.bam"
output:
"calls/{sample}.vcf" # either .vcf or .bcf
log:
"logs/freebayes/{sample}.log"
params:
extra="", # optional parameters
chunksize=100000 # reference genome chunk size for parallelization (default: 100000)
threads: 2
wrapper:
"0.31.0/bio/freebayes"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
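The wrapper picks plain freebayes for a single thread and freebayes-parallel otherwise, and converts to BCF when the output ends in .bcf. A minimal sketch of that command assembly follows; the function name build_freebayes_command is hypothetical, and the sketch leaves out the log redirection and the surrounding subshell that the wrapper adds.

```python
def build_freebayes_command(ref, samples, output, threads=1, extra="", chunksize=100000):
    """Assemble a freebayes shell command following the wrapper's logic."""
    # Convert to BCF on the fly when the output file ends in .bcf.
    pipe = "| bcftools view -Ob -" if output.endswith(".bcf") else ""
    if threads == 1:
        caller = "freebayes"
    else:
        # freebayes-parallel takes a regions file and a thread count.
        caller = (
            "freebayes-parallel <(fasta_generate_regions.py "
            "{ref}.fai {chunksize}) {threads}"
        ).format(ref=ref, chunksize=chunksize, threads=threads)
    parts = [caller, extra, "-f", ref, " ".join(samples), pipe, ">", output]
    # Skip empty pieces so the command has no doubled spaces.
    return " ".join(part for part in parts if part)


print(build_freebayes_command("genome.fasta", ["mapped/a.bam"], "calls/a.vcf"))
print(build_freebayes_command("genome.fasta", ["mapped/a.bam"], "calls/a.bcf", threads=2))
```

The <( ... ) process substitution is a bash feature, which is why the wrapper sets shell.executable("bash"). The parallel branch also expects a .fai index next to the reference.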
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
params = snakemake.params.get("extra", "")
pipe = ""
if snakemake.output[0].endswith(".bcf"):
    pipe = "| bcftools view -Ob -"
if snakemake.threads == 1:
    freebayes = "freebayes"
else:
    chunksize = snakemake.params.get("chunksize", 100000)
    freebayes = ("freebayes-parallel <(fasta_generate_regions.py "
                 "{snakemake.input.ref}.fai {chunksize}) "
                 "{snakemake.threads}").format(snakemake=snakemake,
                                               chunksize=chunksize)
shell("({freebayes} {params} -f {snakemake.input.ref}"
      " {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}")
GATK¶
For gatk, the following wrappers are available:
GATK BASERECALIBRATOR¶
Run gatk BaseRecalibrator and ApplyBQSR in one step.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule gatk_bqsr:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
known="dbsnp.vcf.gz"
output:
bam="recal/{sample}.bam"
log:
"logs/gatk/bqsr/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.31.0/bio/gatk/baserecalibrator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from tempfile import TemporaryDirectory
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
with TemporaryDirectory() as tmpdir:
    recal_table = os.path.join(tmpdir, "recal_table.grp")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
    shell("gatk --java-options '{java_opts}' BaseRecalibrator {extra} "
          "-R {snakemake.input.ref} -I {snakemake.input.bam} "
          "-O {recal_table} --known-sites {snakemake.input.known} {log}")
    # Append to the same log file for the second step.
    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    shell("gatk --java-options '{java_opts}' ApplyBQSR -R {snakemake.input.ref} -I {snakemake.input.bam} "
          "--bqsr-recal-file {recal_table} "
          "-O {snakemake.output.bam} {log}")
GATK COMBINEGVCFS¶
Run gatk CombineGVCFs.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule combine_gvcfs:
input:
gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
ref="genome.fasta"
output:
gvcf="calls/all.g.vcf",
log:
"logs/gatk/combinegvcfs.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.31.0/bio/gatk/combinegvcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
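Several GATK wrappers turn a list of input files into repeated flags, for example one -V per gVCF here or one -I per BAM for HaplotypeCaller. A minimal sketch of that idiom follows, with a hypothetical helper name repeat_flag. The join is written out explicitly here, whereas the wrappers pass the list to shell() and leave the joining to Snakemake's formatting.

```python
def repeat_flag(flag, values):
    """Prefix every value with flag, accepting a single string or a list."""
    if isinstance(values, str):
        values = [values]
    # (flag + " {}").format is a bound method, so map() applies it per value.
    return list(map((flag + " {}").format, values))


gvcf_args = repeat_flag("-V", ["calls/a.g.vcf", "calls/b.g.vcf"])
print(" ".join(gvcf_args))
```

Accepting a bare string as well as a list mirrors how the HaplotypeCaller wrapper handles a single BAM file.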
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
gvcfs = list(map("-V {}".format, snakemake.input.gvcfs))
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' CombineGVCFs {extra} "
"{gvcfs} "
"-R {snakemake.input.ref} "
"-O {snakemake.output.gvcf} {log}")
GATK GENOTYPEGVCFS¶
Run gatk GenotypeGVCFs.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule genotype_gvcfs:
input:
gvcf="calls/all.g.vcf", # combined gvcf over multiple samples
ref="genome.fasta"
output:
vcf="calls/all.vcf",
log:
"logs/gatk/genotypegvcfs.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.31.0/bio/gatk/genotypegvcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' GenotypeGVCFs {extra} "
"-V {snakemake.input.gvcf} "
"-R {snakemake.input.ref} "
"-O {snakemake.output.vcf} {log}")
GATK HAPLOTYPECALLER¶
Run gatk HaplotypeCaller.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule haplotype_caller:
input:
# single or list of bam files
bam="mapped/{sample}.bam",
ref="genome.fasta"
# known="dbsnp.vcf" # optional
output:
gvcf="calls/{sample}.g.vcf",
log:
"logs/gatk/haplotypecaller/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.31.0/bio/gatk/haplotypecaller"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
known = snakemake.input.get("known", "")
if known:
    known = "--dbsnp " + known
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
bams = snakemake.input.bam
# Accept either a single bam file or a list of bam files.
if isinstance(bams, str):
    bams = [bams]
bams = list(map("-I {}".format, bams))
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' HaplotypeCaller {extra} "
"-R {snakemake.input.ref} {bams} "
"-ERC GVCF "
"-O {snakemake.output.gvcf} {known} {log}")
GATK SELECTVARIANTS¶
Run gatk SelectVariants.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule gatk_select:
input:
vcf="calls/all.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.vcf"
log:
"logs/gatk/select/snvs.log"
params:
extra="--select-type-to-include SNP", # optional filter arguments, see GATK docs
java_opts="", # optional
wrapper:
"0.31.0/bio/gatk/selectvariants"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' SelectVariants -R {snakemake.input.ref} -V {snakemake.input.vcf} "
"{extra} -O {snakemake.output.vcf} {log}")
GATK VARIANTFILTRATION¶
Run gatk VariantFiltration.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
rule gatk_filter:
input:
vcf="calls/snvs.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.filtered.vcf"
log:
"logs/gatk/filter/snvs.log"
params:
filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
extra="", # optional arguments, see GATK docs
java_opts="", # optional
wrapper:
"0.31.0/bio/gatk/variantfiltration"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
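The filters param maps filter names to filter expressions, and each entry becomes a --filter-name/--filter-expression pair on the command line. The sketch below illustrates that mapping with a hypothetical helper build_filter_args. Note that it quotes the values with Python's shlex.quote, which is a swapped-in technique for this illustration and not what the wrapper itself does.

```python
import shlex


def build_filter_args(filters):
    """Turn a {name: expression} dict into VariantFiltration arguments."""
    return [
        "--filter-name {} --filter-expression {}".format(
            shlex.quote(name), shlex.quote(expr))
        for name, expr in filters.items()
    ]


print(" ".join(build_filter_args({"myfilter": "AB < 0.2 || MQ0 > 50"})))
```

Quoting matters because filter expressions usually contain characters such as <, > and | that the shell would otherwise interpret.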
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
filters = ["--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
for name, expr in snakemake.params.filters.items()]
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' VariantFiltration -R {snakemake.input.ref} -V {snakemake.input.vcf} "
"{extra} {filters} -O {snakemake.output.vcf} {log}")
GATK VARIANTRECALIBRATOR¶
Run gatk VariantRecalibrator.
Software dependencies¶
- gatk4 ==4.0.5.1
Example¶
This wrapper can be used in the following way:
from snakemake.remote import GS
# GATK resource bundle files can be either directly obtained from google storage (like here), or
# from FTP. You can also use local files.
GS = GS.RemoteProvider()
def gatk_bundle(f):
return GS.remote("genomics-public-data/resources/broad/hg38/v0/{}".format(f))
rule variant_recalibrator:
input:
vcf="calls/all.vcf",
ref="genome.fasta",
# resources have to be given as named input files
hapmap=gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz"),
omni=gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz"),
g1k=gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
dbsnp=gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
# use aux to e.g. download other necessary file
aux=[gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz.tbi"),
gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz.tbi"),
gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"),
gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi")]
output:
vcf="calls/all.recal.vcf",
tranches="calls/all.tranches"
log:
"logs/gatk/variantrecalibrator.log"
params:
mode="SNP", # set mode, must be either SNP, INDEL or BOTH
# resource parameter definition. Key must match named input files from above.
resources={"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
"omni": {"known": False, "training": True, "truth": False, "prior": 12.0},
"g1k": {"known": False, "training": True, "truth": False, "prior": 10.0},
"dbsnp": {"known": True, "training": False, "truth": False, "prior": 2.0}},
annotation=["QD", "FisherStrand"], # which fields to use with -an (see VariantRecalibrator docs)
extra="", # optional
java_opts="", # optional
wrapper:
"0.31.0/bio/gatk/variantrecalibrator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-Xmx4G” for one option, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
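Each entry of the resources param is combined with the named input file of the same key into one --resource argument. A minimal standalone sketch of that formatting follows. The function name fmt_resource and the plain dict standing in for the rule's named inputs are assumptions of this example; the output format mirrors the string the wrapper builds.

```python
def fmt_resource(resname, resparams, input_files):
    """Format one --resource argument from its parameters and input file."""

    def fmt_bool(value):
        # GATK expects lower-case true/false.
        return str(value).lower()

    try:
        resource_file = input_files[resname]
    except KeyError:
        raise RuntimeError(
            "There must be a named input file for every resource "
            "(missing: {})".format(resname))
    return "--resource {},known={},training={},truth={},prior={}:{}".format(
        resname, fmt_bool(resparams["known"]), fmt_bool(resparams["training"]),
        fmt_bool(resparams["truth"]), resparams["prior"], resource_file)


# Stand-in for the rule's named input files (hypothetical local path).
inputs = {"hapmap": "hapmap_3.3.hg38.sites.vcf.gz"}
hapmap_params = {"known": False, "training": True, "truth": True, "prior": 15.0}
print(fmt_resource("hapmap", hapmap_params, inputs))
```

A resource key without a matching named input raises an error instead of producing an argument with a missing path.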
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
def fmt_res(resname, resparams):
    fmt_bool = lambda b: str(b).lower()
    # .get() returns None for a missing name instead of raising KeyError.
    f = snakemake.input.get(resname)
    if f is None:
        raise RuntimeError("There must be a named input file for every resource (missing: {})".format(resname))
    return "{},known={},training={},truth={},prior={}:{}".format(
        resname, fmt_bool(resparams["known"]), fmt_bool(resparams["training"]),
        fmt_bool(resparams["truth"]), resparams["prior"], f)
resources = ["--resource {}".format(fmt_res(resname, resparams))
             for resname, resparams in snakemake.params["resources"].items()]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = ""
if snakemake.output.tranches:
    tranches = "--tranches-file " + snakemake.output.tranches
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("gatk --java-options '{java_opts}' VariantRecalibrator {extra} {resources} "
"-R {snakemake.input.ref} -V {snakemake.input.vcf} "
"-mode {snakemake.params.mode} "
"--output {snakemake.output.vcf} "
"{tranches} {annotation} {log}")
HISAT2¶
Map reads with hisat2.
Software dependencies¶
- hisat2 ==2.1.0
- samtools ==1.5
Example¶
This wrapper can be used in the following way:
rule hisat2:
input:
reads=["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"],
output:
"mapped/{sample}.bam"
log: # optional
"logs/hisat2/{sample}.log"
params: # idx is required, extra is optional
idx="genome.fa",
extra="--min-intronlen 1000"
threads: 8 # optional, defaults to 1
wrapper:
"0.31.0/bio/hisat2"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The -S flag must not be used since output is already directly piped to samtools for compression.
- The --threads/-p flag must not be used since threads is set separately via the snakemake threads directive.
- The wrapper does not yet handle SRA input accessions.
- No reference index files checking is done since the actual number of files may differ depending on the reference sequence size. This is also why the index is supplied in the params directive instead of the input directive.
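The wrapper chooses between single-end (-U) and paired-end (-1/-2) flags based on how many read files are given. A minimal sketch of that selection as a standalone function, where the name hisat2_input_flags and the file names are hypothetical:

```python
def hisat2_input_flags(reads):
    """Return the hisat2 read flags for one (single-end) or two (paired-end) files."""
    if isinstance(reads, str):
        return "-U {0}".format(reads)
    if len(reads) == 1:
        return "-U {0}".format(reads[0])
    if len(reads) == 2:
        return "-1 {0} -2 {1}".format(*reads)
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2 input files.")


print(hisat2_input_flags("reads/a.fastq.gz"))
print(hisat2_input_flags(["reads/a.1.fastq.gz", "reads/a.2.fastq.gz"]))
```

Anything other than one or two read files is rejected before hisat2 is run.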
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()
# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
    input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
    input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
    input_flags = "-1 {0} -2 {1}".format(*reads)
else:
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2"
        " input files.")
# Executed shell command
shell(
"(hisat2 {extra} --threads {snakemake.threads}"
" -x {snakemake.params.idx} {input_flags}"
" | samtools view -Sbh -o {snakemake.output[0]} -)"
" {log}")
JANNOVAR¶
Annotate the predicted effect of nucleotide changes with Jannovar.
Software dependencies¶
- jannovar-cli ==0.25
Example¶
This wrapper can be used in the following way:
rule jannovar:
input:
vcf="{sample}.vcf",
pedigree="pedigree_ar.ped" # optional, contains familial relationships
output:
"jannovar/{sample}.vcf.gz"
log:
"logs/jannovar/{sample}.log"
params:
database="hg19_small.ser", # path to jannovar reference dataset
extra="--show-all" # optional parameters
wrapper:
"0.31.0/bio/jannovar"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
pedigree = snakemake.input.get("pedigree", "")
if pedigree:
    pedigree = '--pedigree-file "%s"' % pedigree
shell("jannovar annotate-vcf --database {snakemake.params.database}"
" --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
" {pedigree} {extra} {log}")
LOFREQ¶
For lofreq, the following wrappers are available:
LOFREQ CALL¶
Call variants with lofreq call-parallel.
Software dependencies¶
- samtools ==1.6
- lofreq ==2.1.3.1
Example¶
This wrapper can be used in the following way:
rule lofreq:
input:
"data/{sample}.bam"
output:
"calls/{sample}.vcf"
log:
"logs/lofreq_call/{sample}.log"
params:
ref="data/genome.fasta",
extra=""
threads: 8
wrapper:
"0.31.0/bio/lofreq/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
ref = snakemake.params.get("ref", None)
if ref is None:
    raise ValueError("A reference must be provided")
bam_input = snakemake.input[0]
if bam_input is None:
    raise ValueError("Missing bam input file!")
elif not len(snakemake.input) == 1:
    raise ValueError("Only expecting one input file: " + str(snakemake.input) + "!")
output_file = snakemake.output[0]
if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(output_file) + "!")
shell(
"lofreq call-parallel "
" --pp-threads {snakemake.threads}"
" -f {ref}"
" {bam_input}"
" -o {output_file}"
" {extra}"
" {log}")
MINIMAP2¶
For minimap2, the following wrappers are available:
MINIMAP2¶
A versatile pairwise aligner for genomic and spliced nucleotide sequences https://lh3.github.io/minimap2
Software dependencies¶
- minimap2 ==2.5
Example¶
This wrapper can be used in the following way:
rule minimap2:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query=["query/reads1.fasta", "query/reads2.fasta"]
output:
"aligned/{input1}_aln.paf"
log:
"logs/minimap2/{input1}.log"
params:
extra="-x map-pb" # optional
threads: 3
wrapper:
"0.31.0/bio/minimap2/aligner"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tom Poorten
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
inputQuery = " ".join(snakemake.input.query)
shell("(minimap2 -t {snakemake.threads} {extra} "
"{snakemake.input.target} {inputQuery} >"
"{snakemake.output[0]}) {log}")
MINIMAP2 INDEX¶
Creates a minimap2 index.
Software dependencies¶
- minimap2 ==2.5
Example¶
This wrapper can be used in the following way:
rule minimap2_index:
input:
target="target/{input1}.fasta"
output:
"{input1}.mmi"
log:
"logs/minimap2_index/{input1}.log"
params:
extra="" # optional additional args
threads: 3
wrapper:
"0.31.0/bio/minimap2/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tom Poorten
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("(minimap2 -t {snakemake.threads} {extra} "
"-d {snakemake.output[0]} {snakemake.input.target}) {log}")
MULTIQC¶
Generate qc report using multiqc.
Software dependencies¶
- multiqc ==1.2
- networkx <2.0
Example¶
This wrapper can be used in the following way:
rule multiqc:
input:
expand("samtools_stats/{sample}.txt", sample=["a", "b"])
output:
"qc/multiqc.html"
params:
"" # Optional: extra parameters for multiqc.
log:
"logs/multiqc.log"
wrapper:
"0.31.0/bio/multiqc"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
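The wrapper for this rule does not pass the input files to MultiQC one by one. It reduces them to the set of directories that contain them. A minimal sketch of that step, using a hypothetical helper name `multiqc_search_dirs` (the result is sorted here only to make the output deterministic):

```python
from os import path


def multiqc_search_dirs(input_files):
    """Collapse input file paths to the unique directories that contain them."""
    return sorted({path.dirname(fp) for fp in input_files})


files = ["samtools_stats/a.txt", "samtools_stats/b.txt", "fastqc/a.zip"]
print(multiqc_search_dirs(files))
```

Since directories rather than individual files are handed over, other reports stored in the same directories may end up in the MultiQC output as well.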
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for MultiQC."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"multiqc"
" {snakemake.params}"
" --force"
" -o {output_dir}"
" -n {output_name}"
" {input_dirs}"
" {log}")
NGS-DISAMBIGUATE¶
Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.
Software dependencies¶
- ngs-disambiguate ==2016.11.10
- bamtools ==2.4.0
Example¶
This wrapper can be used in the following way:
rule disambiguate:
input:
a="mapped/{sample}.a.bam",
b="mapped/{sample}.b.bam"
output:
a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
a_disambiguated='disambiguate/{sample}.graft.bam',
b_disambiguated='disambiguate/{sample}.host.bam',
summary='qc/disambiguate/{sample}.txt'
params:
algorithm="bwa",
# optional: Prefix to use for output. If omitted, a
# suitable value is guessed from the output paths. Prefix
# is used for the intermediate output paths, as well as
# sample name in summary file.
prefix="{sample}",
extra=""
wrapper:
"0.31.0/bio/ngs-disambiguate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")
output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain
# additional suffixes). This is better than using a random id as prefix,
# as the prefix is also used as the sample name in the summary file.
if prefix is None:
    prefix = path.splitext(path.basename(snakemake.output.summary))[0]
# Run command.
shell(
"ngs_disambiguate"
" {extra}"
" -o {output_dir}"
" -s {prefix}"
" -a {snakemake.params.algorithm}"
" {snakemake.input.a}"
" {snakemake.input.b}")
# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)
output_map = {
output_base + ".ambiguousSpeciesA.bam":
snakemake.output.a_ambiguous,
output_base + ".ambiguousSpeciesB.bam":
snakemake.output.b_ambiguous,
output_base + ".disambiguatedSpeciesA.bam":
snakemake.output.a_disambiguated,
output_base + ".disambiguatedSpeciesB.bam":
snakemake.output.b_disambiguated,
output_base + "_summary.txt":
snakemake.output.summary
}
for src, dest in output_map.items():
    if src != dest:
        shell('mv {src} {dest}')
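The prefix fallback and the renaming step can be tried without running the tool. The sketch below is a stand-alone approximation: the helper names `derive_prefix` and `plan_moves` are hypothetical, and a plain dictionary stands in for `snakemake.output`.

```python
from os import path


def derive_prefix(summary_path):
    """Use the summary file name without its extension as the sample prefix."""
    return path.splitext(path.basename(summary_path))[0]


def plan_moves(output_dir, prefix, outputs):
    """Map the files ngs_disambiguate writes to the paths the rule declares."""
    base = path.join(output_dir, prefix)
    produced = {
        base + ".ambiguousSpeciesA.bam": outputs["a_ambiguous"],
        base + ".ambiguousSpeciesB.bam": outputs["b_ambiguous"],
        base + ".disambiguatedSpeciesA.bam": outputs["a_disambiguated"],
        base + ".disambiguatedSpeciesB.bam": outputs["b_disambiguated"],
        base + "_summary.txt": outputs["summary"],
    }
    # Only files whose produced name differs from the declared name need moving.
    return [(src, dest) for src, dest in produced.items() if src != dest]


outputs = {
    "a_ambiguous": "disambiguate/s1.graft.ambiguous.bam",
    "b_ambiguous": "disambiguate/s1.host.ambiguous.bam",
    "a_disambiguated": "disambiguate/s1.graft.bam",
    "b_disambiguated": "disambiguate/s1.host.bam",
    "summary": "qc/disambiguate/s1.txt",
}
prefix = derive_prefix(outputs["summary"])
for src, dest in plan_moves(path.dirname(outputs["a_ambiguous"]), prefix, outputs):
    print(src, "->", dest)
```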
PICARD¶
For picard, the following wrappers are available:
PICARD ADDORREPLACEREADGROUPS¶
Add or replace read groups with picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule replace_rg:
input:
"mapped/{sample}.bam"
output:
"fixed-rg/{sample}.bam"
log:
"logs/picard/replace_rg/{sample}.log"
params:
"RGLB=lib1 RGPL=illumina RGPU={sample} RGSM={sample}"
wrapper:
"0.31.0/bio/picard/addorreplacereadgroups"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("picard AddOrReplaceReadGroups {snakemake.params} I={snakemake.input} "
"O={snakemake.output} &> {snakemake.log}")
PICARD COLLECTALIGNMENTSUMMARYMETRICS¶
Collect metrics on aligned reads with picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule alignment_summary:
input:
ref="genome.fasta",
bam="mapped/{sample}.bam"
output:
"stats/{sample}.summary.txt"
log:
"logs/picard/alignment-summary/{sample}.log"
params:
# optional parameters (e.g. relax checks as below)
"VALIDATION_STRINGENCY=LENIENT "
"METRIC_ACCUMULATION_LEVEL=null "
"METRIC_ACCUMULATION_LEVEL=SAMPLE"
wrapper:
"0.31.0/bio/picard/collectalignmentsummarymetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
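Many wrappers on this page build a redirect string with `snakemake.log_fmt_shell(...)` and append it to the command as `{log}`. The sketch below is a simplified stand-in written for illustration, not Snakemake's own code: the function name `log_redirect` is hypothetical, and the exact strings Snakemake produces may differ.

```python
def log_redirect(log, stdout=True, stderr=True):
    """Approximate the shell redirect appended to a wrapper command."""
    if not log:
        # Without a log file there is nothing to redirect.
        return ""
    if stdout and stderr:
        return "> {} 2>&1".format(log)
    if stdout:
        return "> {}".format(log)
    if stderr:
        return "2> {}".format(log)
    return ""


print("picard CollectAlignmentSummaryMetrics ... " + log_redirect("logs/a.log"))
print("picard SortSam ... " + log_redirect("logs/a.log", stdout=False))
```

Wrappers whose tool writes its result to stdout either capture stderr only or, as the minimap2 aligner wrapper does, wrap the command in parentheses so that the result file and the log do not collide.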
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell("picard CollectAlignmentSummaryMetrics {snakemake.params} "
"INPUT={snakemake.input.bam} OUTPUT={snakemake.output[0]} "
"REFERENCE_SEQUENCE={snakemake.input.ref} {log}")
PICARD COLLECTHSMETRICS¶
Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule picard_collect_hs_metrics:
input:
bam="mapped/{sample}.bam",
reference="genome.fasta",
# Baits and targets should be given as interval lists. These can
# be generated from bed files using picard BedToIntervalList.
bait_intervals="regions.intervals",
target_intervals="regions.intervals"
output:
"stats/hs_metrics/{sample}.txt"
params:
# Optional extra arguments. Here we reduce sample size
# to reduce the runtime in our unit test.
extra="SAMPLE_SIZE=1000"
log:
"logs/picard_collect_hs_metrics/{sample}.log"
wrapper:
"0.31.0/bio/picard/collecthsmetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard CollectHSMetrics."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard CollectHsMetrics"
" {extra}"
" INPUT={snakemake.input.bam}"
" OUTPUT={snakemake.output[0]}"
" REFERENCE_SEQUENCE={snakemake.input.reference}"
" BAIT_INTERVALS={snakemake.input.bait_intervals}"
" TARGET_INTERVALS={snakemake.input.target_intervals}"
" {log}")
PICARD COLLECTINSERTSIZEMETRICS¶
Collect metrics on insert size of paired end reads with picard tools.
Software dependencies¶
- picard ==2.9.2
- r-base ==3.3.2
Example¶
This wrapper can be used in the following way:
rule insert_size:
input:
"mapped/{sample}.bam"
output:
txt="stats/{sample}.isize.txt",
pdf="stats/{sample}.isize.pdf"
log:
"logs/picard/insert_size/{sample}.log"
params:
# optional parameters (e.g. relax checks as below)
"VALIDATION_STRINGENCY=LENIENT "
"METRIC_ACCUMULATION_LEVEL=null "
"METRIC_ACCUMULATION_LEVEL=SAMPLE"
wrapper:
"0.31.0/bio/picard/collectinsertsizemetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell("picard CollectInsertSizeMetrics {snakemake.params} "
"INPUT={snakemake.input} OUTPUT={snakemake.output.txt} "
"HISTOGRAM_FILE={snakemake.output.pdf} {log}")
PICARD CREATESEQUENCEDICTIONARY¶
Create a .dict file for a given FASTA file
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule create_dict:
input:
"genome.fasta"
output:
"genome.dict"
log:
"logs/picard/create_dict.log"
params:
extra="" # optional: extra arguments for picard.
wrapper:
"0.31.0/bio/picard/createsequencedictionary"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
'picard '
'CreateSequenceDictionary '
'{extra} '
'R={snakemake.input[0]} '
'O={snakemake.output[0]} '
'{log}')
PICARD MARKDUPLICATES¶
Mark PCR and optical duplicates with picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule mark_duplicates:
input:
"mapped/{sample}.bam"
output:
bam="dedup/{sample}.bam",
metrics="dedup/{sample}.metrics.txt"
log:
"logs/picard/dedup/{sample}.log"
params:
"REMOVE_DUPLICATES=true"
wrapper:
"0.31.0/bio/picard/markduplicates"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
"OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
"&> {snakemake.log}")
PICARD MERGESAMFILES¶
Merge sam/bam files using picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule merge_bams:
input:
expand("mapped/{sample}.bam", sample=["a", "b"])
output:
"merged.bam"
log:
"logs/picard_mergesamfiles.log"
params:
"VALIDATION_STRINGENCY=LENIENT"
wrapper:
"0.31.0/bio/picard/mergesamfiles"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
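Picard expects one `INPUT=` argument per file, so the wrapper for this rule expands the input list accordingly. A minimal sketch of that expansion, with the hypothetical helper name `picard_inputs`:

```python
def picard_inputs(paths):
    """Turn a list of files into repeated INPUT=<file> arguments."""
    return " ".join("INPUT={}".format(p) for p in paths)


print(picard_inputs(["mapped/a.bam", "mapped/b.bam"]))
```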
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard MergeSamFiles."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard"
" MergeSamFiles"
" {snakemake.params}"
" {inputs}"
" OUTPUT={snakemake.output[0]}"
" {log}")
PICARD MERGEVCFS¶
Merge vcf files using picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule merge_vcfs:
input:
["snvs.chr1.vcf", "snvs.chr2.vcf"]
output:
"snvs.vcf"
log:
"logs/picard/mergevcfs.log"
params:
extra=""
wrapper:
"0.31.0/bio/picard/mergevcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
"""Snakemake wrapper for picard MergeVcfs."""
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(f) for f in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"picard"
" MergeVcfs"
" {extra}"
" {inputs}"
" OUTPUT={snakemake.output[0]}"
" {log}")
PICARD REVERTSAM¶
Reverts SAM or BAM files to a previous state.
Software dependencies¶
- picard ==2.18.16
Example¶
This wrapper can be used in the following way:
rule revert_bam:
input:
"mapped/{sample}.bam"
output:
"revert/{sample}.bam"
log:
"logs/picard/revert_sam/{sample}.log"
params:
extra="SANITIZE=true" # optional: Extra arguments for picard.
wrapper:
"0.31.0/bio/picard/revertsam"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for picard RevertSam."""
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
'picard'
' RevertSam'
' {extra}'
' INPUT={snakemake.input[0]}'
' OUTPUT={snakemake.output[0]}'
' {log}')
PICARD SAMTOFASTQ¶
Converts a SAM or BAM file to FASTQ.
Software dependencies¶
- picard ==2.18.16
Example¶
This wrapper can be used in the following way:
rule bam_to_fastq:
input:
"mapped/{sample}.bam"
output:
fastq1="reads/{sample}.R1.fastq",
fastq2="reads/{sample}.R2.fastq"
log:
"logs/picard/sam_to_fastq/{sample}.log"
params:
extra="" # optional: Extra arguments for picard.
wrapper:
"0.31.0/bio/picard/samtofastq"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for picard SamToFastq."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
fastq1 = snakemake.output.fastq1
fastq2 = snakemake.output.get("fastq2", None)
fastq_unpaired = snakemake.output.get("unpaired_fastq", None)
if not isinstance(fastq1, str):
    raise ValueError("fastq1 needs to be provided")
output = " FASTQ=" + fastq1
if isinstance(fastq2, str):
    output += " SECOND_END_FASTQ=" + fastq2
if isinstance(fastq_unpaired, str):
    if not isinstance(fastq2, str):
        raise ValueError("fastq2 is required if unpaired_fastq is set")
    else:
        output += " UNPAIRED_FASTQ=" + fastq_unpaired
shell(
'picard'
' SamToFastq'
' {extra}'
' INPUT={snakemake.input[0]}'
' {output}'
' {log}')
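The rules for which output arguments are passed to SamToFastq can be checked in isolation. The function below is a stand-alone sketch with a hypothetical name (`samtofastq_outputs`); its keyword arguments mirror the output names the wrapper looks up.

```python
def samtofastq_outputs(fastq1, fastq2=None, unpaired_fastq=None):
    """Build the output arguments for picard SamToFastq (illustrative only)."""
    if not isinstance(fastq1, str):
        raise ValueError("fastq1 needs to be provided")
    args = ["FASTQ=" + fastq1]
    if isinstance(fastq2, str):
        args.append("SECOND_END_FASTQ=" + fastq2)
    if isinstance(unpaired_fastq, str):
        # An unpaired file only makes sense next to a second-end file.
        if not isinstance(fastq2, str):
            raise ValueError("fastq2 is required if unpaired_fastq is set")
        args.append("UNPAIRED_FASTQ=" + unpaired_fastq)
    return " ".join(args)


print(samtofastq_outputs("reads/a.R1.fastq", "reads/a.R2.fastq"))
```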
PICARD SORTSAM¶
Sort sam/bam files using picard tools.
Software dependencies¶
- picard ==2.9.2
Example¶
This wrapper can be used in the following way:
rule sort_bam:
input:
"mapped/{sample}.bam"
output:
"sorted/{sample}.bam"
log:
"logs/picard/sort_sam/{sample}.log"
params:
sort_order="coordinate",
extra="VALIDATION_STRINGENCY=LENIENT" # optional: Extra arguments for picard.
wrapper:
"0.31.0/bio/picard/sortsam"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard SortSam."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
'picard'
' SortSam'
' {extra}'
' INPUT={snakemake.input[0]}'
' OUTPUT={snakemake.output[0]}'
' SORT_ORDER={snakemake.params.sort_order}'
' {log}')
PINDEL¶
For pindel, the following wrappers are available:
PINDEL¶
Call variants with pindel.
Software dependencies¶
- pindel ==0.2.5b8
Example¶
This wrapper can be used in the following way:
pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]
rule pindel:
input:
ref="genome.fasta",
# samples to call
samples=["mapped/a.bam"],
# bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
config="pindel_config.txt"
output:
expand("pindel/all_{type}", type=pindel_types)
params:
# prefix must be consistent with output files
prefix="pindel/all",
extra="" # optional parameters (except -i, -f, -o)
log:
"logs/pindel.log"
threads: 4
wrapper:
"0.31.0/bio/pindel/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
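The requirement that `prefix` be consistent with the output files can be spelled out in plain Python. This sketch assumes that each output is named `<prefix>_<type>`, which is what the `expand` call in the example produces; the helper name `expected_pindel_outputs` is hypothetical.

```python
pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]


def expected_pindel_outputs(prefix, types=pindel_types):
    """List the output paths that belong to a given prefix."""
    return ["{}_{}".format(prefix, t) for t in types]


# Same list as expand("pindel/all_{type}", type=pindel_types) in the rule.
print(expected_pindel_outputs("pindel/all"))
```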
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("pindel -T {snakemake.threads} {snakemake.params.extra} -i {snakemake.input.config} "
"-f {snakemake.input.ref} -o {snakemake.params.prefix} {log}")
PINDEL2VCF¶
Convert pindel output to vcf.
Software dependencies¶
- pindel ==0.2.5b8
Example¶
This wrapper can be used in the following way:
rule pindel2vcf:
input:
ref="genome.fasta",
pindel="pindel/all_{type}"
output:
"pindel/all_{type}.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.{type}.log"
wrapper:
"0.31.0/bio/pindel/pindel2vcf"
rule pindel2vcf_multi_input:
input:
ref="genome.fasta",
pindel=["pindel/all_D", "pindel/all_INV"]
output:
"pindel/all.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.log"
wrapper:
"0.31.0/bio/pindel/pindel2vcf"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
expected_endings = ['INT', 'D', 'SI', 'INV', 'INV_final', 'TD', 'LI', 'BP', 'CloseEndMapped', 'RP']
def split_file_name(file_parts, file_ending_index):
    # Everything before the index forms the name, everything from the index on forms the ending.
    return "_".join(file_parts[:file_ending_index]), "_".join(file_parts[file_ending_index:])
def process_input_path(input_file):
    """
    :params input_file: Input file from rule, ex /path/to/file/all_D or /path/to/file/all_INV_final
    :return: "/path/to/file", "all"
    """
    file_path, file_name = os.path.split(input_file)
    file_parts = file_name.split("_")
    # separate ending and name, e.g. name: all ending: D or name: all ending: INV_final
    file_name, file_ending = split_file_name(file_parts, -2 if file_name.endswith("_final") else -1)
    if file_ending not in expected_endings:
        raise Exception("Unexpected variant type: " + file_ending)
    return file_path, file_name
with tempfile.TemporaryDirectory() as tmpdirname:
    input_flag = "-p"
    input_file = snakemake.input.get("pindel")
    if isinstance(input_file, list) and len(input_file) > 1:
        input_flag = "-P"
        input_path, input_name = process_input_path(input_file[0])
        input_file = os.path.join(input_path, input_name)
        for variant_input in snakemake.input.pindel:
            if not variant_input.startswith(input_file):
                raise Exception("Unable to extract common path from multi file input, expected path is: " + input_file)
            if not os.path.isfile(variant_input):
                raise Exception("Input \"" + variant_input + "\" is not a file!")
            # Link all inputs into one temporary directory so that pindel2vcf finds them under a common root name.
            os.symlink(os.path.abspath(variant_input), os.path.join(tmpdirname, os.path.basename(variant_input)))
        input_file = os.path.join(tmpdirname, input_name)
    shell("pindel2vcf {snakemake.params.extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}")
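The file-name handling used for multi-file input can be exercised on its own. The following sketch is a stand-alone approximation with a hypothetical function name (`split_pindel_path`): it splits a pindel output path into directory, root name and variant-type ending, treating `_final` as part of the ending.

```python
import os

EXPECTED_ENDINGS = {
    "INT", "D", "SI", "INV", "INV_final", "TD", "LI", "BP", "CloseEndMapped", "RP",
}


def split_pindel_path(input_file):
    """Return (directory, root name, variant-type ending) for a pindel output file."""
    directory, file_name = os.path.split(input_file)
    parts = file_name.split("_")
    # Endings such as INV_final span the last two underscore-separated parts.
    cut = -2 if file_name.endswith("_final") else -1
    root = "_".join(parts[:cut])
    ending = "_".join(parts[cut:])
    if ending not in EXPECTED_ENDINGS:
        raise ValueError("Unexpected variant type: " + ending)
    return directory, root, ending


for example in ["pindel/all_D", "pindel/all_INV_final", "pindel/my_run_TD"]:
    print(example, "->", split_pindel_path(example))
```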
RUBIC¶
RUBIC detects recurrent copy number alterations using copy number breaks.
Software dependencies¶
- r-base =3.4.1
- r-rubic =1.0.3
- r-data.table =1.10.4
- r-pracma =2.0.4
- r-ggplot2 =2.2.1
- r-gtable =0.2.0
- r-codetools =0.2_15
- r-digest =0.6.12
Example¶
This wrapper can be used in the following way:
rule rubic:
input:
seg="{samples}/segments.txt",
markers="{samples}/markers.txt"
output:
out_gains="{samples}/gains.txt",
out_losses="{samples}/losses.txt",
out_plots=directory("{samples}/plots") #only possible to provide output directory for plots
params:
fdr="",
genefile=""
wrapper:
"0.31.0/bio/rubic"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Beatrice F. Tan
Code¶
# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"
library(RUBIC)
all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]
rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])
SALMON¶
For salmon, the following wrappers are available:
SALMON_INDEX¶
Index a transcriptome assembly with salmon
Software dependencies¶
- salmon ==0.10.1
Example¶
This wrapper can be used in the following way:
rule salmon_index:
input:
"assembly/transcriptome.fasta"
output:
directory("salmon/transcriptome_index")
log:
"logs/salmon/transcriptome_index.log"
threads: 2
params:
# optional parameters
extra=""
wrapper:
"0.31.0/bio/salmon/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Salmon Index."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("salmon index -t {snakemake.input} -i {snakemake.output} "
" --threads {snakemake.threads} {extra} {log}" )
SALMON_QUANT¶
Quantify transcripts with salmon
Software dependencies¶
- salmon ==0.10.0
Example¶
This wrapper can be used in the following way:
rule salmon_quant_reads:
input:
# If you have multiple fastq files for a single sample (e.g. technical replicates)
# use a list for r1 and r2.
r1 = "reads/{sample}_1.fq.gz",
r2 = "reads/{sample}_2.fq.gz",
index = "salmon/transcriptome_index"
output:
quant = 'salmon/{sample}/quant.sf',
lib = 'salmon/{sample}/lib_format_counts.json'
log:
'logs/salmon/{sample}.log'
params:
# optional parameters
libtype ="A",
# zip_extension="bz2",  # required for bz2 files ('bz2'); optional for gz files ('gz')
extra=""
threads: 2
wrapper:
"0.31.0/bio/salmon/quant"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
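How the read arguments for paired and unpaired input are put together can be sketched without Snakemake. The helper names below (`decompress`, `as_list`, `build_read_args`) are hypothetical. The `<( ... )` form is bash process substitution, so this assumes the command is run by bash.

```python
def decompress(reads, zip_ext):
    """Wrap the reads in a bash process substitution if an extension is given."""
    if zip_ext == "bz2":
        return "<(bunzip2 -c " + reads + ")"
    if zip_ext == "gz":
        return "<(gunzip -c " + reads + ")"
    return reads


def as_list(value):
    """Accept either a single path or a list of paths."""
    return [value] if isinstance(value, str) else list(value)


def build_read_args(r1=None, r2=None, r=None, zip_ext=""):
    """Return the read arguments for `salmon quant` (illustrative only)."""
    paired = r1 is not None and r2 is not None
    if paired and r is not None:
        raise ValueError("give either r1 and r2 (paired) or r (unpaired), not both")
    if paired:
        r1, r2 = as_list(r1), as_list(r2)
        if len(r1) != len(r2):
            raise ValueError("r1 and r2 need the same number of files")
        return "-1 {} -2 {}".format(
            decompress(" ".join(r1), zip_ext), decompress(" ".join(r2), zip_ext)
        )
    if r is not None:
        return "-r " + decompress(" ".join(as_list(r)), zip_ext)
    raise ValueError("either r1 and r2 (paired) or r (unpaired) are required")


print(build_read_args(r1="reads/a_1.fq.gz", r2="reads/a_2.fq.gz"))
print(build_read_args(r=["x.fq.bz2", "y.fq.bz2"], zip_ext="bz2"))
```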
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Salmon Quant"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
def manual_decompression(reads, zip_ext):
    """ Allow *.bz2 input into salmon. Also provide same
    decompression for *gz files, as salmon devs mention
    it may be faster in some cases."""
    if zip_ext and reads:
        # Process substitution requires "<(" without a space in between.
        if zip_ext == 'bz2':
            reads = ' <(bunzip2 -c ' + reads + ')'
        elif zip_ext == 'gz':
            reads = ' <(gunzip -c ' + reads + ')'
    return reads
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
zip_extension = snakemake.params.get("zip_extension", "")
libtype = snakemake.params.get("libtype", "A")
r1 = snakemake.input.get("r1")
r2 = snakemake.input.get("r2")
r = snakemake.input.get("r")
assert (r1 is not None and r2 is not None) or r is not None, "either r1 and r2 (paired), or r (unpaired) are required as input"
if r1:
    r1 = [snakemake.input.r1] if isinstance(snakemake.input.r1, str) else snakemake.input.r1
    r2 = [snakemake.input.r2] if isinstance(snakemake.input.r2, str) else snakemake.input.r2
    assert len(r1) == len(r2), "input-> equal number of files required for r1 and r2"
    r1_cmd = ' -1 ' + manual_decompression(" ".join(r1), zip_extension)
    r2_cmd = ' -2 ' + manual_decompression(" ".join(r2), zip_extension)
    read_cmd = " ".join([r1_cmd, r2_cmd])
if r:
    assert r1 is None and r2 is None, "Salmon cannot quantify mixed paired/unpaired input files. Please input either r1,r2 (paired) or r (unpaired)"
    r = [snakemake.input.r] if isinstance(snakemake.input.r, str) else snakemake.input.r
    read_cmd = ' -r ' + manual_decompression(" ".join(r), zip_extension)
outdir = path.dirname(snakemake.output.get('quant'))
shell("salmon quant -i {snakemake.input.index} "
" -l {libtype} {read_cmd} -o {outdir} "
" -p {snakemake.threads} {extra} {log} ")
SAMBAMBA¶
For sambamba, the following wrappers are available:
SAMBAMBA SORT¶
Sort bam file with sambamba
Software dependencies¶
- sambamba ==0.6.6
Example¶
This wrapper can be used in the following way:
rule sambamba_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"" # optional parameters
threads: 8
wrapper:
"0.31.0/bio/sambamba/sort"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
shell(
"sambamba sort {snakemake.params} -t {snakemake.threads} "
"-o {snakemake.output[0]} {snakemake.input[0]}")
SAMTOOLS¶
For samtools, the following wrappers are available:
SAMTOOLS BAM2FQ INTERLEAVED¶
Convert a bam file back to unaligned reads in a single fastq file with samtools. For paired end reads, this results in an unsorted interleaved file.
Software dependencies¶
- samtools ==1.9
Example¶
This wrapper can be used in the following way:
rule samtools_bam2fq_interleaved:
input:
"mapped/{sample}.bam"
output:
"reads/{sample}.fq"
params:
" "
threads: 3
wrapper:
"0.31.0/bio/samtools/bam2fq/interleaved"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- David Laehnemann
- Victoria Sack
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
prefix = os.path.splitext(snakemake.output[0])[0]
shell(
"samtools bam2fq {snakemake.params} "
" -@ {snakemake.threads} "
" {snakemake.input[0]}"
" >{snakemake.output[0]} "
)
SAMTOOLS BAM2FQ SEPARATE¶
Convert a bam file with paired end reads back to unaligned reads in two separate fastq files with samtools. Reads that are not properly paired are discarded (READ_OTHER and singleton reads in samtools bam2fq documentation), as are secondary (0x100) and supplementary reads (0x800).
Software dependencies¶
- samtools ==1.9
Example¶
This wrapper can be used in the following way:
rule samtools_bam2fq_separate:
input:
"mapped/{sample}.bam"
output:
"reads/{sample}.1.fq",
"reads/{sample}.2.fq"
params:
sort = "-m 4G",
bam2fq = "-n"
threads: 3
wrapper:
"0.31.0/bio/samtools/bam2fq/separate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- David Laehnemann
- Victoria Sack
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
prefix = os.path.splitext(snakemake.output[0])[0]
shell(
"samtools sort -n "
" -@ {snakemake.threads} "
" -T {prefix} "
" {snakemake.params.sort} "
" {snakemake.input[0]} | "
"samtools bam2fq "
" {snakemake.params.bam2fq} "
" -1 {snakemake.output[0]} "
" -2 {snakemake.output[1]} "
" -0 /dev/null "
" -s /dev/null "
" -F 0x900 "
" - "
)
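The `-F 0x900` argument combines the two flag bits named in the description above. A short sketch of the bit arithmetic, with hypothetical constant and function names:

```python
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment
EXCLUDE_MASK = SECONDARY | SUPPLEMENTARY


def is_filtered(flag, mask=EXCLUDE_MASK):
    """True if a read with this SAM flag has any bit of the mask set."""
    return (flag & mask) != 0


print(hex(EXCLUDE_MASK))
print(is_filtered(99), is_filtered(99 | SUPPLEMENTARY))
```

A read is skipped as soon as one of the masked bits is set, so a primary alignment with flag 99 passes while its supplementary counterpart does not.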
SAMTOOLS FLAGSTAT¶
Use samtools to create a flagstat file from a bam or sam file.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_flagstat:
input: "mapped/{sample}.bam"
output: "mapped/{sample}.bam.flagstat"
wrapper:
"0.31.0/bio/samtools/flagstat"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Preusch
Code¶
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools flagstat {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS INDEX¶
Index bam file with samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_index:
input: "mapped/{sample}.sorted.bam"
output: "mapped/{sample}.sorted.bam.bai"
params:
"" # optional params string
wrapper:
"0.31.0/bio/samtools/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}")
SAMTOOLS MERGE¶
Merge two bam files with samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_merge:
input:
["mapped/A.bam", "mapped/B.bam"]
output:
"merged.bam"
params:
"" # optional additional parameters as string
threads: 8
wrapper:
"0.31.0/bio/samtools/merge"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools merge --threads {snakemake.threads} {snakemake.params} "
"{snakemake.output[0]} {snakemake.input}")
SAMTOOLS SORT¶
Sort bam file with samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"-m 4G"
threads: 8
wrapper:
"0.31.0/bio/samtools/sort"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
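The wrapper code for this rule (listed under Code) passes samtools a temporary-file prefix via `-T`, obtained by stripping the last extension from the output path. A minimal sketch of that derivation in plain Python, with an illustrative path standing in for `snakemake.output[0]`:

```python
import os

# Hypothetical output path, standing in for snakemake.output[0].
output_bam = "mapped/A.sorted.bam"

# splitext removes only the last extension, so ".sorted" is kept.
prefix = os.path.splitext(output_bam)[0]
print(prefix)
```

Because the prefix is derived from the output path, temporary files should end up next to the sorted BAM rather than in the working directory.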
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
prefix = os.path.splitext(snakemake.output[0])[0]
shell(
"samtools sort {snakemake.params} -@ {snakemake.threads} -o {snakemake.output[0]} "
"-T {prefix} {snakemake.input[0]}")
SAMTOOLS STATS¶
Generate stats using samtools.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_stats:
input:
"mapped/{sample}.bam"
output:
"samtools_stats/{sample}.txt"
params:
extra="", # Optional: extra arguments.
region="1:1000000-2000000" # Optional: region string.
log:
"logs/samtools_stats/{sample}.log"
wrapper:
"0.31.0/bio/samtools/stats"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for samtools stats."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("samtools stats {extra} {snakemake.input}"
" {region} > {snakemake.output} {log}")
SAMTOOLS VIEW¶
Convert or filter SAM/BAM.
Software dependencies¶
- samtools ==1.6
Example¶
This wrapper can be used in the following way:
rule samtools_view:
input:
"{sample}.sam"
output:
"{sample}.bam"
params:
"-b" # optional params string
wrapper:
"0.31.0/bio/samtools/view"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools view {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")
SICKLE¶
For sickle, the following wrappers are available:
SICKLE PE¶
Trim paired-end reads with sickle.
Software dependencies¶
- sickle-trim ==1.33
Example¶
This wrapper can be used in the following way:
rule sickle_pe:
input:
r1="input_R1.fq",
r2="input_R2.fq"
output:
r1="output_R1.fq",
r2="output_R2.fq",
rs="output_single.fq",
params:
qual_type="sanger",
# optional extra parameters
extra=""
log:
# optional log file
"logs/sickle/job.log"
wrapper:
"0.31.0/bio/sickle/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
"-o {snakemake.output.r1} -p {snakemake.output.r2} "
"-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
"{extra}) {log}"
)
SICKLE SE¶
Trim single-end reads with sickle.
Software dependencies¶
- sickle-trim ==1.33
Example¶
This wrapper can be used in the following way:
rule sickle_se:
input:
"input_R1.fq"
output:
"output_R1.fq"
params:
qual_type="sanger",
# optional extra parameters
extra=""
log:
"logs/sickle/job.log"
wrapper:
"0.31.0/bio/sickle/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
"-t {snakemake.params.qual_type} {extra}) {log}"
)
SNPEFF¶
Annotate the predicted effect of nucleotide changes with SnpEff.
Software dependencies¶
- snpeff ==4.3.1t
Example¶
This wrapper can be used in the following way:
rule snpeff:
input:
"{sample}.vcf",
output:
vcf="snpeff/{sample}.vcf", # the main output file, required
stats="snpeff/{sample}.html", # summary statistics (in HTML), optional
csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
log:
"logs/snpeff/{sample}.log"
params:
reference="ebola_zaire", # reference name (from `snpeff databases`)
extra="-Xmx4g" # optional parameters (e.g., max memory 4g)
wrapper:
"0.31.0/bio/snpeff"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
from os import path
import shutil
import tempfile
shell.executable("bash")
shell_command =("(snpEff {data_dir} {stats_opt} {csvstats_opt} {extra}"
" {snakemake.params.reference} {snakemake.input}"
" > {snakemake.output.vcf}) {log}")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
data_dir = snakemake.params.get("data_dir", "")
if data_dir:
    data_dir = '-dataDir "%s"'%data_dir
stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = '' if not csvstats else '-csvStats {}'.format(csvstats)
stats_opt = '-noStats' if not stats else '-stats {}'.format(stats)
shell(shell_command)
#if stats:
# shutil.copy(path.join(stats_tempdir, 'stats'), stats)
#if genes:
# shutil.copy(path.join(stats_tempdir, 'stats.genes.txt'), genes)
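The option handling above can be checked in isolation. The following is a minimal sketch in plain Python that mirrors the wrapper's logic; the helper name `build_snpeff_options` and the sample paths are hypothetical and not part of the wrapper.

```python
def build_snpeff_options(stats="", csvstats="", data_dir=""):
    """Mirror the wrapper's option logic (hypothetical helper, for illustration)."""
    # -dataDir is only passed when a data directory is configured.
    data_dir_opt = '-dataDir "%s"' % data_dir if data_dir else ""
    # Without an HTML stats output, statistics are switched off entirely.
    stats_opt = "-stats {}".format(stats) if stats else "-noStats"
    # CSV statistics are optional and simply omitted when not requested.
    csvstats_opt = "-csvStats {}".format(csvstats) if csvstats else ""
    return data_dir_opt, stats_opt, csvstats_opt


print(build_snpeff_options())
print(build_snpeff_options(stats="snpeff/A.html", csvstats="snpeff/A.csv"))
```

In other words, leaving out the `stats` output in the rule results in `-noStats` being passed, while `csvstats` and `data_dir` contribute nothing to the command line when they are absent.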
SOURMASH¶
For sourmash, the following wrappers are available:
SOURMASH_COMPUTE¶
Build a MinHash signature for a transcriptome, genome, or reads.
Software dependencies¶
- sourmash==2.0.0a7
Example¶
This wrapper can be used in the following way:
rule sourmash_reads:
input:
"reads/a.fastq"
output:
"reads.sig"
log:
"logs/sourmash/sourmash_compute_reads.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"0.31.0/bio/sourmash/compute"
rule sourmash_transcriptome:
input:
"assembly/transcriptome.fasta"
output:
"transcriptome.sig"
log:
"logs/sourmash/sourmash_compute_transcriptome.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"0.31.0/bio/sourmash/compute"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Lisa K. Johnson
Code¶
"""Snakemake wrapper for sourmash compute."""
__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled","1000")
k = snakemake.params.get("k","31")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
" {extra} {log}" )
STAR¶
For star, the following wrappers are available:
STAR¶
Map reads with STAR.
Software dependencies¶
- star ==2.5.3a
Example¶
This wrapper can be used in the following way:
rule star_pe_multi:
input:
# use a list for multiple fastq files for one sample
# usually technical replicates across lanes/flowcells
fq1 = ["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
# paired end reads needs to be ordered so each item in the two lists match
fq2 = ["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"] #optional
output:
# see STAR manual for additional output files
"star/pe/{sample}/Aligned.out.bam"
log:
"logs/star/pe/{sample}.log"
params:
# path to STAR reference genome index
index="index",
# optional parameters
extra=""
threads: 8
wrapper:
"0.31.0/bio/star/align"
rule star_se:
input:
fq1 = "reads/{sample}_R1.1.fastq"
output:
# see STAR manual for additional output files
"star/{sample}/Aligned.out.bam"
log:
"logs/star/{sample}.log"
params:
# path to STAR reference genome index
index="index",
# optional parameters
extra=""
threads: 8
wrapper:
"0.31.0/bio/star/align"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
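The wrapper accepts either a single path or a list of paths for `fq1` and `fq2`, and joins them into the argument of `--readFilesIn`: files of one mate are comma-separated, and the two mates are separated by a space. A minimal sketch of that assembly in plain Python follows; the helper name `star_read_files` and the file names are hypothetical.

```python
def star_read_files(fq1, fq2=None):
    """Build the argument for --readFilesIn (hypothetical helper, for illustration)."""
    # Accept a single path or a list of paths, as the wrapper does.
    fq1 = [fq1] if isinstance(fq1, str) else list(fq1)
    parts = [",".join(fq1)]
    if fq2:
        fq2 = [fq2] if isinstance(fq2, str) else list(fq2)
        # Mates must pair up one-to-one, in the same order.
        assert len(fq1) == len(fq2), "equal number of files required for fq1 and fq2"
        parts.append(",".join(fq2))
    return " ".join(parts)


print(star_read_files("reads/A_R1.1.fastq"))
print(star_read_files(
    ["reads/A_R1.1.fastq", "reads/A_R1.2.fastq"],
    ["reads/A_R2.1.fastq", "reads/A_R2.2.fastq"],
))
```

Unlike the wrapper code, this sketch does not append a trailing space for single-end input; that difference has no effect on the resulting command.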
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = [snakemake.input.fq1] if isinstance(snakemake.input.fq1, str) else snakemake.input.fq1
fq2 = snakemake.input.get("fq2")
if fq2:
    fq2 = [snakemake.input.fq2] if isinstance(snakemake.input.fq2, str) else snakemake.input.fq2
    assert len(fq1) == len(fq2), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])
if fq1[0].endswith(".gz"):
    readcmd = "--readFilesCommand zcat"
else:
    readcmd = ""
outprefix = os.path.dirname(snakemake.output[0]) + "/"
shell(
"STAR "
"{extra} "
"--runThreadN {snakemake.threads} "
"--genomeDir {snakemake.params.index} "
"--readFilesIn {input_str} "
"{readcmd} "
"--outSAMtype BAM Unsorted "
"--outFileNamePrefix {outprefix} "
"--outStd Log "
"{log}")
TRIM_GALORE¶
For trim_galore, the following wrappers are available:
TRIM_GALORE-PE¶
Trim paired-end reads using trim_galore.
Software dependencies¶
- trim-galore ==0.4.5
Example¶
This wrapper can be used in the following way:
rule trim_galore_pe:
input:
["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"]
output:
"trimmed/{sample}.1_val_1.fq.gz",
"trimmed/{sample}.1.fastq.gz_trimming_report.txt",
"trimmed/{sample}.2_val_2.fq.gz",
"trimmed/{sample}.2.fastq.gz_trimming_report.txt"
params:
extra="--illumina -q 20"
log:
"logs/trim_galore/{sample}.log"
wrapper:
"0.31.0/bio/trim_galore/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Use the fastqc Snakemake wrapper instead of the --fastqc option; the wrapper raises an error if --fastqc appears in the extra parameters.
- All output files must be placed in the same directory.
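The single-directory requirement can be verified before a workflow is run. Here is a minimal sketch in plain Python, assuming a list of output paths as they would be declared in the rule; the helper name `common_output_dir` and the paths are hypothetical.

```python
import os.path


def common_output_dir(outputs):
    """Return the directory shared by all outputs, or raise (hypothetical helper)."""
    dirs = {os.path.dirname(p) for p in outputs}
    if len(dirs) != 1:
        raise ValueError(
            "trim_galore can only write to a single directory, got: %s" % sorted(dirs)
        )
    return dirs.pop()


print(common_output_dir([
    "trimmed/A.1_val_1.fq.gz",
    "trimmed/A.1.fastq.gz_trimming_report.txt",
]))
```

The returned directory corresponds to the value the wrapper passes to trim_galore via `-o`.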
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import os.path
log = snakemake.log_fmt_shell()
# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError("The trim_galore Snakemake wrapper cannot "
                     "be run with the `--fastqc` flag. Please "
                     "remove the flag from extra params. "
                     "You can use the fastqc Snakemake wrapper on "
                     "the input and output files instead.")
# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m
# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), \
        "trim_galore can only output files to a single directory." \
        " Please indicate only one directory for the output files."
shell(
"(trim_galore"
" {snakemake.params.extra}"
" --paired"
" -o {out_dir}"
" {snakemake.input})"
" {log}")
TRIM_GALORE-SE¶
Trim unpaired reads using trim_galore.
Software dependencies¶
- trim-galore ==0.4.3
Example¶
This wrapper can be used in the following way:
rule trim_galore_se:
input:
"reads/{sample}.fastq.gz"
output:
"trimmed/{sample}_trimmed.fq.gz",
"trimmed/{sample}.fastq.gz_trimming_report.txt"
params:
extra="--illumina -q 20"
log:
"logs/trim_galore/{sample}.log"
wrapper:
"0.31.0/bio/trim_galore/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Use the fastqc Snakemake wrapper instead of the --fastqc option; the wrapper raises an error if --fastqc appears in the extra parameters.
- All output files must be placed in the same directory.
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import os.path
log = snakemake.log_fmt_shell()
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
    raise ValueError("The trim_galore Snakemake wrapper cannot "
                     "be run with the `--fastqc` flag. Please "
                     "remove the flag from extra params. "
                     "You can use the fastqc Snakemake wrapper on "
                     "the input and output files instead.")
# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m
# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
    assert out_dir == os.path.dirname(file_path), \
        "trim_galore can only output files to a single directory." \
        " Please indicate only one directory for the output files."
shell(
"(trim_galore"
" {snakemake.params.extra}"
" -o {out_dir}"
" {snakemake.input})"
" {log}")
TRIMMOMATIC¶
For trimmomatic, the following wrappers are available:
TRIMMOMATIC PE¶
Trim paired-end reads with trimmomatic. (De)compress with pigz.
Software dependencies¶
- trimmomatic ==0.36
- pigz ==2.3.4
Example¶
This wrapper can be used in the following way:
rule trimmomatic_pe:
input:
r1="reads/{sample}.1.fastq.gz",
r2="reads/{sample}.2.fastq.gz"
output:
r1="trimmed/{sample}.1.fastq.gz",
r2="trimmed/{sample}.2.fastq.gz",
# reads where trimming entirely removed the mate
r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra="",
compression_level="-9"
wrapper:
"0.31.0/bio/trimmomatic/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Jorge Langa
Code¶
"""
bio/trimmomatic/pe
Snakemake wrapper to trim reads with trimmomatic in PE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
"""
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
def compose_input_gz(filename):
    if filename.endswith(".gz"):
        filename = "<(pigz --decompress --stdout {filename})".format(
            filename=filename
        )
    return filename
def compose_output_gz(filename, compression_level="-5"):
    if filename.endswith(".gz"):
        return ">(pigz {compression_level} > {filename})".format(
            compression_level=compression_level,
            filename=filename
        )
    return filename
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)
# Collect files
input_r1 = compose_input_gz(snakemake.input.r1)
input_r2 = compose_input_gz(snakemake.input.r2)
output_r1 = compose_output_gz(snakemake.output.r1, compression_level)
output_r1_unp = compose_output_gz(snakemake.output.r1_unpaired, compression_level)
output_r2 = compose_output_gz(snakemake.output.r2, compression_level)
output_r2_unp = compose_output_gz(snakemake.output.r2_unpaired, compression_level)
shell(
"trimmomatic PE {extra} "
"{input_r1} {input_r2} "
"{output_r1} {output_r1_unp} "
"{output_r2} {output_r2_unp} "
"{trimmer} "
"{log}"
)
TRIMMOMATIC SE¶
Trim single-end reads with trimmomatic. (De)compress with pigz.
Software dependencies¶
- trimmomatic ==0.36
- pigz ==2.3.4
Example¶
This wrapper can be used in the following way:
rule trimmomatic:
input:
"reads/{sample}.fastq.gz" # input and output can be uncompressed or compressed
output:
"trimmed/{sample}.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra="",
# optional compression levels from -0 to -9 and -11
compression_level="-9"
threads:
32
wrapper:
"0.31.0/bio/trimmomatic/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Jorge Langa
Code¶
"""
bio/trimmomatic/se
Snakemake wrapper to trim reads with trimmomatic in SE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
"""
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
def compose_input_gz(filename):
    if filename.endswith(".gz"):
        filename = "<(pigz --decompress --stdout {filename})".format(
            filename=filename
        )
    return filename
def compose_output_gz(filename, compression_level="-5"):
    if filename.endswith(".gz"):
        return ">(pigz {compression_level} > {filename})".format(
            compression_level=compression_level,
            filename=filename
        )
    return filename
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)
# Collect files
input = compose_input_gz(snakemake.input[0])
output = compose_output_gz(snakemake.output[0], compression_level)
shell("trimmomatic SE {extra} {input} {output} {trimmer} {log}")
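To make the process substitution concrete, the following sketch reproduces the two helpers in plain Python and prints the command that would be assembled for the example rule. The helper names `wrap_input` and `wrap_output`, the sample name `A`, and the paths are hypothetical; the `extra` parameter and the log redirection are left out.

```python
def wrap_input(filename):
    """Read .gz inputs through pigz via process substitution (hypothetical helper)."""
    if filename.endswith(".gz"):
        return "<(pigz --decompress --stdout {})".format(filename)
    return filename


def wrap_output(filename, compression_level="-5"):
    """Write .gz outputs through pigz via process substitution (hypothetical helper)."""
    if filename.endswith(".gz"):
        return ">(pigz {} > {})".format(compression_level, filename)
    return filename


# Illustrative values, standing in for the snakemake object of the example rule.
trimmers = ["TRAILING:3"]
command = "trimmomatic SE {} {} {}".format(
    wrap_input("reads/A.fastq.gz"),
    wrap_output("trimmed/A.fastq.gz", "-9"),
    " ".join(trimmers),
)
print(command)
```

Uncompressed paths pass through unchanged, so the same rule also works for plain FASTQ files. Process substitution is a shell feature, so the command has to be executed by a shell that supports it, such as bash.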
TRINITY¶
Generate a transcriptome assembly with Trinity.
Software dependencies¶
- trinity ==2.6.6
Example¶
This wrapper can be used in the following way:
rule trinity:
input:
left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
output:
"trinity_out_dir/Trinity.fasta"
log:
'logs/trinity/trinity.log'
params:
extra=""
threads: 4
wrapper:
"0.31.0/bio/trinity"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
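If no `seqtype` parameter is given, the wrapper infers the sequence type from the first `left` file by substring matching, checking for FASTQ before FASTA. The sketch below replicates that logic in plain Python to show where it can surprise; the helper name `infer_seqtype` and the file names are hypothetical.

```python
def infer_seqtype(first_left_file, seqtype=None):
    """Replicate the wrapper's substring-based inference (hypothetical helper)."""
    if seqtype:
        # An explicit seqtype parameter always wins.
        return seqtype
    if "fq" in first_left_file or "fastq" in first_left_file:
        return "fq"
    if "fa" in first_left_file or "fasta" in first_left_file:
        return "fa"
    raise ValueError("cannot infer 'fq' or 'fa' seqtype; set the 'seqtype' parameter")


print(infer_seqtype("reads/reads.left.fq.gz"))
print(infer_seqtype("assembly/transcripts.fasta"))
# The match is on the whole path, so a directory name can decide the result.
print(infer_seqtype("fastq_dir/reads.fa"))
print(infer_seqtype("fastq_dir/reads.fa", seqtype="fa"))
```

Since the whole path is searched, setting `seqtype` explicitly in `params` is the safer choice whenever directory or sample names contain `fq`, `fastq` or `fa`.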
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Trinity."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
max_memory = snakemake.params.get("max_memory", "10G")
#allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = [snakemake.input.left] if isinstance(snakemake.input.left, str) else snakemake.input.left
right = snakemake.input.get("right")
if right:
    right = [snakemake.input.right] if isinstance(snakemake.input.right, str) else snakemake.input.right
    assert len(left) >= len(right), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
    input_str_left = ' --left ' + ",".join(left)
    input_str_right = ' --right ' + ",".join(right)
else:
    input_str_left = ' --single ' + ",".join(left)
    input_str_right = ''
input_cmd = " ".join([input_str_left, input_str_right])
# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
    if 'fq' in left[0] or 'fastq' in left[0]:
        seqtype = 'fq'
    elif 'fa' in left[0] or 'fasta' in left[0]:
        seqtype = 'fa'
    else: # assertion is redundant - warning or error instead?
        assert seqtype is not None, "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"
outdir = path.dirname(snakemake.output[0])
assert 'trinity' in outdir, "output directory name must contain 'trinity'"
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("Trinity {input_cmd} --CPU {snakemake.threads} "
" --max_memory {max_memory} --seqType {seqtype} "
" --output {outdir} {snakemake.params.extra} "
" {log}")
VCF¶
For vcf, the following wrappers are available:
COMPRESS VCF¶
Compress and index vcf file with bgzip and tabix.
Software dependencies¶
- htslib ==1.5
Example¶
This wrapper can be used in the following way:
rule compress_vcf:
input:
"{prefix}.vcf"
output:
"{prefix}.vcf.gz"
wrapper:
"0.31.0/bio/vcf/compress"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("bgzip --stdout {snakemake.input} > {snakemake.output} && tabix -p vcf {snakemake.output}")
UNCOMPRESS VCF¶
Uncompress vcf file with bgzip.
Software dependencies¶
- htslib ==1.5
Example¶
This wrapper can be used in the following way:
rule uncompress_vcf:
input:
"{prefix}.vcf.gz"
output:
"{prefix}.vcf"
wrapper:
"0.31.0/bio/vcf/uncompress"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("bgzip --decompress --stdout {snakemake.input} > {snakemake.output}")
VCFTOOLS¶
For vcftools, the following wrappers are available:
VCFTOOLS FILTER¶
Filter vcf files using vcftools.
Software dependencies¶
- vcftools ==0.1.15
Example¶
This wrapper can be used in the following way:
rule filter_vcf:
input:
"{sample}.vcf"
output:
"{sample}.filtered.vcf"
params:
extra="--chr 1 --recode-INFO-all"
wrapper:
"0.31.0/bio/vcftools/filter"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
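The wrapper picks the vcftools input flag and the output redirection from the file extensions, so gzipped input and output are handled without extra parameters. A minimal sketch of that selection in plain Python follows; the helper name `vcftools_io` and the file names are hypothetical.

```python
def vcftools_io(input_path, output_path):
    """Choose input flag and output redirection by extension (hypothetical helper)."""
    # Gzipped input is read with --gzvcf, plain VCF with --vcf.
    input_flag = "--gzvcf" if input_path.endswith(".gz") else "--vcf"
    # vcftools writes to stdout; gzipped output is piped through gzip first.
    redirect = " > " + output_path
    if output_path.endswith(".gz"):
        redirect = " | gzip" + redirect
    return input_flag, redirect


print(vcftools_io("A.vcf", "A.filtered.vcf"))
print(vcftools_io("A.vcf.gz", "A.filtered.vcf.gz"))
```

Note that the compressed output is produced with plain gzip here, not bgzip.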
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
    input_flag = "--gzvcf"
output = " > " + snakemake.output[0]
if output.endswith(".gz"):
    output = " | gzip" + output
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell("vcftools "
"{input_flag} "
"{snakemake.input} "
"{extra} "
"--recode "
"--stdout "
"{output} "
"{log}")