The Snakemake Wrappers repository¶
The Snakemake Wrapper Repository is a collection of reusable wrappers that make it easy to use popular tools from within Snakemake rules and workflows.
Usage¶
The general strategy is to include a wrapper into your workflow via the wrapper directive, e.g.
rule samtools_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"-m 4G"
threads: 8
wrapper:
"0.2.0/bio/samtools/sort"
Here, Snakemake will automatically download the corresponding wrapper from https://github.com/snakemake/snakemake-wrappers/tree/0.2.0/bio/samtools/sort. The prefix 0.2.0 can be replaced with any version tag or commit id you want to use. This ensures reproducibility, since changes in the wrapper implementation are not propagated to your workflow automatically. Alternatively, e.g. for development, the wrapper directive can also point to full URLs, including local paths via the file:// protocol.
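For example, during development a rule could point the wrapper directive at a local checkout of the repository (a sketch; the path below is a placeholder):

rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    wrapper:
        "file:///path/to/snakemake-wrappers/bio/samtools/sort"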
Each wrapper defines required software packages and versions. In combination with the --use-conda
flag of Snakemake, these will be deployed automatically.
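For example, the following invocation deploys the wrapper's conda environment before executing the rule above:

snakemake --cores 8 --use-conda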
Contribute¶
We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, we suggest the following procedure:
- Fork the repository: https://github.com/snakemake/snakemake-wrappers
- Clone your fork locally.
- Locally, create a new branch:
git checkout -b my-new-snakemake-wrapper
- Commit your contributions to that branch and push them to your fork:
git push -u origin my-new-snakemake-wrapper
- Create a pull request.
The pull request will be reviewed and included as fast as possible. Contributions should follow the coding style of the already present examples (a minimal sketch is shown below), i.e.:
- provide a meta.yaml with name, description and author(s) of the wrapper
- provide an environment.yaml which lists all required software packages (the packages should be available for installation via the default Anaconda channels or via the conda channels bioconda or conda-forge; other sustainable, community-maintained channels are possible as well)
- provide a minimal test case in a subfolder called test, with an example Snakefile that shows how to use the wrapper and some minimal testing data (also check existing wrappers for suitable data), and add an invocation of the test in test.py
- follow the Python style guide, using 4 spaces for indentation.
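For orientation, here is a minimal, hypothetical wrapper.py sketch that follows these conventions (the tool name mytool and its command-line options are placeholders, not an actual wrapper from the repository):

__author__ = "Jane Doe"
__copyright__ = "Copyright 2020, Jane Doe"
__email__ = "jane.doe@example.org"
__license__ = "MIT"

from snakemake.shell import shell

# optional parameters are passed through verbatim
extra = snakemake.params.get("extra", "")
# redirect stdout and stderr into the rule's log file
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "mytool {extra}"
    " --threads {snakemake.threads}"
    " {snakemake.input[0]}"
    " -o {snakemake.output[0]}"
    " {log}"
)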
Testing locally¶
If you want to debug your contribution locally before creating a pull request, we recommend adding your test case to the start of the list in test.py, so that it runs first. Then, install miniconda with the channels as described for bioconda, set up an environment with the necessary dependencies, and activate it:
conda create -n test-snakemake-wrappers snakemake pytest conda
conda activate test-snakemake-wrappers
Afterwards, from the main directory of the repo, you can run the tests with:
pytest test.py -v
If you use a keyboard interrupt after your test has failed, you will get all the relevant stdout and stderr messages printed.
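Each test case in test.py is a small function that invokes the wrapper's test Snakefile via the run helper defined at the top of that file. A sketch for a hypothetical wrapper bio/mytool might look like this (the target file and flags depend on the wrapper's test Snakefile):

def test_mytool():
    # run() is the helper defined at the top of test.py
    run(
        "bio/mytool",
        ["snakemake", "--cores", "1", "out.txt", "--use-conda", "-F"],
    )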
If you also want to test the docs generation locally, create another environment and activate it:
conda create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml
conda activate test-snakemake-wrapper-docs
Then, enter the respective directory and build the docs:
cd docs
make html
If it runs through, you can open the main page at docs/_build/html/index.html in a web browser. If you want to start fresh, you can clean up the build with make clean.
ARRIBA¶
Detect gene fusions from chimeric STAR output
Software dependencies¶
- arriba ==1.1.0
Example¶
This wrapper can be used in the following way:
rule arriba:
input:
# STAR bam containing chimeric alignments
bam="{sample}.bam",
# path to reference genome
genome="genome.fasta",
# path to annotation gtf
annotation="annotation.gtf",
output:
# approved gene fusions
fusions="fusions/{sample}.tsv",
# discarded gene fusions
discarded="fusions/{sample}.discarded.tsv" # optional
log:
"logs/arriba/{sample}.log"
params:
# arriba blacklist file
blacklist="blacklist.tsv", # strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
# file containing known fusions
known_fusions="", # optional
# file containing information from structural variant analysis
sv_file="", # optional
# optional parameters
extra="-T -P -i 1,2"
threads: 1
wrapper:
"0.65.0/bio/arriba"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
discarded_cmd = "-O " + discarded_fusions
else:
discarded_cmd = ""
blacklist = snakemake.params.get("blacklist")
if blacklist:
blacklist_cmd = "-b " + blacklist
else:
blacklist_cmd = ""
known_fusions = snakemake.params.get("known_fusions")
if known_fusions:
    known_cmd = "-k " + known_fusions
else:
    known_cmd = ""
sv_file = snakemake.params.get("sv_file")
if sv_file:
    sv_cmd = "-d " + sv_file
else:
    sv_cmd = ""
shell(
"arriba "
"-x {snakemake.input.bam} "
"-a {snakemake.input.genome} "
"-g {snakemake.input.annotation} "
"{blacklist_cmd} "
"{known_cmd} "
"{sv_cmd} "
"-o {snakemake.output.fusions} "
"{discarded_cmd} "
"{extra} "
"{log}"
)
ART¶
For art, the following wrappers are available:
ART_PROFILER_ILLUMINA¶
Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.
Software dependencies¶
- art ==2016.06.05
Example¶
This wrapper can be used in the following way:
rule art_profiler_illumina:
input:
"data/{sample}.fq",
output:
"profiles/{sample}.txt"
log:
"logs/art_profiler_illumina/{sample}.log"
params: ""
threads: 2
wrapper:
"0.65.0/bio/art/profiler_illumina"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- David Laehnemann
- Victoria Sack
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
from snakemake.shell import shell
import os
import tempfile
import re
# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina cli
with tempfile.TemporaryDirectory() as temp_input:
# ensure that .fastq and .fastq.gz input files work, as well
filename = os.path.basename(snakemake.input[0]).replace(".fastq", ".fq")
# figure out the exact file extension after the above substitution
    ext = re.search(r"fq(\.gz)?$", filename)
if ext:
fq_extension = ext.group(0)
else:
raise IOError(
"Incompatible extension: This art_profiler_illumina "
"wrapper requires input files with one of the following "
"extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
"your input and the invocation of the wrapper accordingly."
)
os.symlink(
# snakemake paths are relative, but the symlink needs to be absolute
os.path.abspath(snakemake.input[0]),
# the following awkward file name generation has reasons:
# * the file name needs to be unique to the execution of the
# rule, as art will create and mv temporary files with its basename
# in the output directory, which causes utter confusion when
# executing instances of the rule in parallel
# * temp file name cannot have any read infixes before the file
# extension, because otherwise art does read enumeration magic
# that messes up output file naming
os.path.join(
temp_input,
filename.replace(
"." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
),
),
)
# include output folder name in the profile_name command line argument and
# strip off the file extension, as art will add its own ".txt"
profile_name = os.path.join(
os.path.dirname(snakemake.output[0]), filename.replace("." + fq_extension, "")
)
shell(
"( art_profiler_illumina {snakemake.params} {profile_name}"
" {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}"
)
BAMTOOLS¶
For bamtools, the following wrappers are available:
BAMTOOLS FILTER¶
Filters BAM files. For more information about bamtools see bamtools documentation and bamtools source code.
Software dependencies¶
- bamtools ==2.5.1
Example¶
This wrapper can be used in the following way:
rule bamtools_filter:
input:
"{sample}.bam"
output:
"filtered/{sample}.bam"
params:
# optional parameters
tags = [ "NM:<4", "MQ:>=10" ], # list of key:value pair strings
min_size = "-2000",
max_size = "2000",
min_length = "10",
max_length = "20",
# to add more optional parameters (see bamtools filter --help):
additional_params = "-mapQuality \">=0\" -isMapped \"true\""
log:
"logs/bamtools/filtered/{sample}.log"
wrapper:
"0.65.0/bio/bamtools/filter"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# extract arguments
params = ""
extra_limits = ""
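# When both a lower and an upper bound are given for insert size or length, the
# lower bound is applied in the first `bamtools filter` call, while the upper bound
# is collected in extra_limits and applied by piping into a second `bamtools filter`
# call, so that the same option is not passed twice to one invocation.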
tags = snakemake.params.get("tags")
min_size = snakemake.params.get("min_size")
max_size = snakemake.params.get("max_size")
min_length = snakemake.params.get("min_length")
max_length = snakemake.params.get("max_length")
additional_params = snakemake.params.get("additional_params")
if tags and tags is not None:
params = params + " " + " ".join(map('-tag "{}"'.format, tags))
if min_size and min_size is not None:
params = params + ' -insertSize ">=' + min_size + '"'
if max_size and max_size is not None:
extra_limits = extra_limits + ' -insertSize "<=' + max_size + '"'
else:
if max_size and max_size is not None:
params = params + ' -insertSize "<=' + max_size + '"'
if min_length and min_length is not None:
params = params + ' -length ">=' + min_length + '"'
if max_length and max_length is not None:
extra_limits = extra_limits + ' -length "<=' + max_length + '"'
else:
if max_length and max_length is not None:
params = params + ' -length "<=' + max_length + '"'
if additional_params and additional_params is not None:
params = params + " " + additional_params
if extra_limits:
params = params + " | bamtools filter" + extra_limits
shell(
"(bamtools filter"
" -in {snakemake.input[0]}" + params + " -out {snakemake.output[0]}) {log}"
)
BAMTOOLS FILTER WITH JSON¶
Filters BAM files using a JSON script that defines the filtering parameters and rules. For more information about bamtools see bamtools documentation and bamtools source code.
Software dependencies¶
- bamtools ==2.5.1
Example¶
This wrapper can be used in the following way:
rule bamtools_filter_json:
input:
"{sample}.bam"
output:
"filtered/{sample}.bam"
params:
json="filtering-rules.json",
region="" # optional parameter for defining a specific region, e.g. "chr1:500..chr3:750"
log:
"logs/bamtools/filtered/{sample}.log"
wrapper:
"0.65.0/bio/bamtools/filter_json"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
region = snakemake.params.get("region")
region_param = ""
if region and region is not None:
region_param = ' -region "' + region + '"'
shell(
"(bamtools filter"
" -in {snakemake.input[0]}"
" -out {snakemake.output[0]}"
+ region_param
+ " -script {snakemake.params.json}) {log}"
)
BAMTOOLS STATS¶
Use bamtools to collect statistics from a BAM file. For more information about bamtools see bamtools documentation and bamtools source code.
Software dependencies¶
- bamtools ==2.5.1
Example¶
This wrapper can be used in the following way:
rule bamtools_stats:
input:
"{sample}.bam"
output:
"{sample}.bamstats"
params:
"-insert" # optional summarize insert size data
log:
"logs/bamtools/stats/{sample}.log"
wrapper:
"0.65.0/bio/bamtools/stats"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"(bamtools stats {snakemake.params} -in {snakemake.input[0]} > {snakemake.output[0]}) {log}"
)
BCFTOOLS¶
For bcftools, the following wrappers are available:
BCFTOOLS CALL¶
Call variants with bcftools call.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule bcftools_call:
input:
pileup="{sample}.pileup.bcf",
output:
calls="{sample}.calls.bcf",
params:
caller="-m", # valid options include -c/--consensus-caller or -m/--multiallelic-caller
options="--ploidy 1 --prior 0.001",
log:
"logs/bcftools_call/{sample}.log",
wrapper:
"0.65.0/bio/bcftools/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Michael Hall
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
class CallerOptionError(Exception):
pass
valid_caller_opts = {"-c", "--consensus-caller", "-m", "--multiallelic-caller"}
caller_opt = snakemake.params.get("caller", "")
if caller_opt.strip() not in valid_caller_opts:
raise CallerOptionError(
"bcftools call expects either -m/--multiallelic-caller or "
"-c/--consensus-caller as caller option."
)
options = snakemake.params.get("options", "")
shell(
"bcftools call {options} {caller_opt} --threads {snakemake.threads} "
"-o {snakemake.output.calls} {snakemake.input.pileup} 2> {snakemake.log}"
)
BCFTOOLS CONCAT¶
Concatenate vcf/bcf files with bcftools.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule bcftools_concat:
input:
calls=["a.bcf", "b.bcf"]
output:
"all.bcf"
params:
"" # optional parameters for bcftools concat (except -o)
wrapper:
"0.65.0/bio/bcftools/concat"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools concat {snakemake.params} -o {snakemake.output[0]} "
"{snakemake.input.calls}"
)
BCFTOOLS INDEX¶
Index vcf/bcf file.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule bcftools_index:
input:
"a.bcf"
output:
"a.bcf.csi"
params:
extra="" # optional parameters for bcftools index
wrapper:
"0.65.0/bio/bcftools/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
shell("bcftools index" " {extra}" " {snakemake.input[0]}")
BCFTOOLS MERGE¶
Merge vcf/bcf files with bcftools.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule bcftools_merge:
input:
calls=["a.bcf", "b.bcf"]
output:
"all.bcf"
params:
"" # optional parameters for bcftools concat (except -o)
wrapper:
"0.65.0/bio/bcftools/merge"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools merge {snakemake.params} -o {snakemake.output[0]} "
"{snakemake.input.calls}"
)
BCFTOOLS MPILEUP¶
Generate VCF or BCF containing genotype likelihoods for one or multiple alignment (BAM or CRAM) files with bcftools mpileup.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule bcftools_mpileup:
input:
index="genome.fasta.fai",
ref="genome.fasta", # this can be left out if --no-reference is in options
alignments="mapped/{sample}.bam",
output:
pileup="pileups/{sample}.pileup.bcf",
params:
options="--max-depth 100 --min-BQ 15",
log:
"logs/bcftools_mpileup/{sample}.log",
wrapper:
"0.65.0/bio/bcftools/mpileup"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Michael Hall
Code¶
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
class MissingReferenceError(Exception):
pass
options = snakemake.params.get("options", "")
# determine if a fasta reference is provided or not and add to options
if "--no-reference" not in options:
ref = snakemake.input.get("ref", "")
if not ref:
raise MissingReferenceError(
"The --no-reference option was not given, but no fasta reference was "
"provided."
)
options += " --fasta-ref {}".format(ref)
shell(
"bcftools mpileup {options} --threads {snakemake.threads} "
"--output {snakemake.output.pileup} "
"{snakemake.input.alignments} 2> {snakemake.log}"
)
BCFTOOLS NORM¶
Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule norm_vcf:
input:
"{prefix}.vcf"
output:
"{prefix}.vcf"
params:
"" # optional parameters for bcftools norm (except -o)
wrapper:
"0.65.0/bio/bcftools/norm"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Dayne Filer
Code¶
__author__ = "Dayne Filer"
__copyright__ = "Copyright 2019, Dayne Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools norm {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}"
)
BCFTOOLS REHEADER¶
Change header or sample names of vcf/bcf file.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule bcftools_reheader:
input:
vcf="a.bcf",
## new header, can be omitted if "samples" is set
header="header.txt",
## file containing new sample names, can be omitted if "header" is set
samples="samples.tsv"
output:
"a.reheader.bcf"
params:
extra="", # optional parameters for bcftools reheader
view_extra="-O b" # add output format for internal bcftools view call
wrapper:
"0.65.0/bio/bcftools/reheader"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
header = snakemake.input.get("header", "")
if header:
header_cmd = "-h " + header
else:
header_cmd = ""
samples = snakemake.input.get("samples", "")
if samples:
samples_cmd = "-s " + samples
else:
samples_cmd = ""
extra = snakemake.params.get("extra", "")
view_extra = snakemake.params.get("view_extra", "")
shell(
"bcftools reheader "
"{extra} "
"{header_cmd} "
"{samples_cmd} "
"{snakemake.input.vcf} "
"| bcftools view "
"{view_extra} "
"> {snakemake.output}"
)
BCFTOOLS VIEW¶
View vcf/bcf file in a different format.
Software dependencies¶
- bcftools ==1.10
Example¶
This wrapper can be used in the following way:
rule bcf_to_vcf:
input:
"{prefix}.bcf"
output:
"{prefix}.vcf"
params:
"" # optional parameters for bcftools view (except -o)
wrapper:
"0.65.0/bio/bcftools/view"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"bcftools view {snakemake.params} {snakemake.input[0]} " "-o {snakemake.output[0]}"
)
BEDTOOLS¶
For bedtools, the following wrappers are available:
COVERAGEBED¶
Returns the depth and breadth of coverage of features from B on the intervals in A.
Software dependencies¶
- bedtools ==2.29.0
Example¶
This wrapper can be used in the following way:
rule coverageBed:
input:
a="bed/{sample}.bed",
b="mapped/{sample}.bam"
output:
"stats/{sample}.cov"
log:
"logs/coveragebed/{sample}.log"
params:
extra="" # optional parameters
threads: 8
wrapper:
"0.65.0/bio/bedtools/coveragebed"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
input_a = snakemake.input.a
input_b = snakemake.input.b
output_file = snakemake.output[0]
if len(snakemake.output) != 1:
raise ValueError("Output should be one file: " + str(output_file) + "!")
shell(
"coverageBed"
" -a {input_a}"
" -b {input_b}"
" {extra_params}"
" > {output_file}"
" {log}"
)
BEDTOOLS GENOMECOVERAGEBED¶
The genomeCoverageBed tool from bedtools computes the coverage of a feature file as histograms, per-base reports or BEDGRAPH summaries across a given genome. For usage information about genomeCoverageBed, please see the bedtools documentation. For more information about bedtools, also see the source code.
Software dependencies¶
- bedtools ==2.29.2
Example¶
This wrapper can be used in the following way:
rule genomecov_bam:
input:
"bam_input/{sample}.sorted.bam"
output:
"genomecov_bam/{sample}.genomecov"
log:
"logs/genomecov_bam/{sample}.log"
params:
"-bg" # optional parameters
wrapper:
"0.65.0/bio/bedtools/genomecov"
rule genomecov_bed:
input:
# for genome file format please see:
# https://bedtools.readthedocs.io/en/latest/content/general-usage.html#genome-file-format
bed="bed_input/{sample}.sorted.bed",
ref="bed_input/genome_file"
output:
"genomecov_bed/{sample}.genomecov"
log:
"logs/genomecov_bed/{sample}.log"
params:
"-bg" # optional parameters
wrapper:
"0.65.0/bio/bedtools/genomecov"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
genome = ""
input_file = ""
if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
input_file = "-ibam " + snakemake.input[0]
if len(snakemake.input) > 1:
if (os.path.splitext(snakemake.input[0])[-1]) == ".bed":
input_file = "-i " + snakemake.input.get("bed")
genome = "-g " + snakemake.input.get("ref")
shell(
"(genomeCoverageBed"
" {snakemake.params}"
" {input_file}"
" {genome}"
" > {snakemake.output[0]}) {log}"
)
BEDTOOLS INTERSECT¶
Intersect BED/BAM/VCF files with bedtools.
Software dependencies¶
- bedtools =2.29.0
Example¶
This wrapper can be used in the following way:
rule bedtools_intersect:
input:
left="A.bed",
right="B.bed"
output:
"A_B.intersected.bed"
params:
## Add optional parameters
extra="-wa -wb" ## In this example, we want to write original entries in A and B for each overlap.
log:
"logs/intersect/A_B.log"
wrapper:
"0.65.0/bio/bedtools/intersect"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(bedtools intersect"
" {extra}"
" -a {snakemake.input.left}"
" -b {snakemake.input.right}"
" > {snakemake.output})"
" {log}"
)
BEDTOOLS MERGE¶
Merge entries in one or multiple BED/BAM/VCF/GFF files with bedtools.
Software dependencies¶
- bedtools =2.29.0
Example¶
This wrapper can be used in the following way:
rule bedtools_merge:
input:
# Multiple bed-files can be added as list
"A.bed"
output:
"A.merged.bed"
params:
## Add optional parameters
extra="-c 1 -o count" ## In this example, we want to count how many input lines we merged per output line
log:
"logs/merge/A.log"
wrapper:
"0.65.0/bio/bedtools/merge"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster, Felix Mölder"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de, felix.moelder@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if len(snakemake.input) > 1:
if all(f.endswith(".gz") for f in snakemake.input):
cat = "zcat"
elif all(not f.endswith(".gz") for f in snakemake.input):
cat = "cat"
else:
raise ValueError("Input files must be all compressed or uncompressed.")
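    # bedtools merge requires position-sorted input, so multiple input files are
    # concatenated and coordinate-sorted before merging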
shell(
"({cat} {snakemake.input} | "
"sort -k1,1 -k2,2n | "
"bedtools merge {extra} "
"-i stdin > {snakemake.output}) "
" {log}"
)
else:
shell(
"( bedtools merge"
" {extra}"
" -i {snakemake.input}"
" > {snakemake.output})"
" {log}"
)
BEDTOOLS SLOP¶
Increase the size of each feature in a BED/BAM/VCF by a specified factor.
Software dependencies¶
- bedtools =2.29.0
Example¶
This wrapper can be used in the following way:
rule bedtools_slop:
input:
"A.bed"
output:
"A.slop.bed"
params:
        ## Genome file, tab-separated file defining the length of every contig
genome="genome.txt",
## Add optional parameters
extra = "-b 10" ## in this example, we want to increase the feature by 10 bases to both sides
log:
"logs/slop/A.log"
wrapper:
"0.65.0/bio/bedtools/slop"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(bedtools slop"
" {extra}"
" -i {snakemake.input[0]}"
" -g {snakemake.params.genome}"
" > {snakemake.output})"
" {log}"
)
BENCHMARK¶
For benchmark, the following wrappers are available:
CHM-EVAL¶
Evaluate given VCF file with chm-eval (https://github.com/lh3/CHM-eval) for benchmarking variant calling.
Software dependencies¶
- perl =5.26
Example¶
This wrapper can be used in the following way:
rule chm_eval:
input:
kit="resources/chm-eval-kit",
vcf="{sample}.vcf"
output:
summary="chm-eval/{sample}.summary", # summary statistics
bed="chm-eval/{sample}.err.bed.gz" # bed file with errors
params:
extra="",
build="38"
log:
"logs/chm-eval/{sample}.log"
wrapper:
"0.65.0/bio/benchmark/chm-eval"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
kit = snakemake.input.kit
vcf = snakemake.input.vcf
build = snakemake.params.build
extra = snakemake.params.get("extra", "")
if not snakemake.output[0].endswith(".summary"):
raise ValueError("Output file must end with .summary")
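# run-eval appends its own suffixes (e.g. .summary) to the given prefix, so strip
# the trailing .summary from the requested output to obtain that prefix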
out = snakemake.output[0][:-8]
shell("({kit}/run-eval -g {build} -o {out} {extra} {vcf} | sh) {log}")
CHM-EVAL-KIT¶
Download CHM-eval kit (https://github.com/lh3/CHM-eval) for benchmarking variant calling.
Software dependencies¶
- curl
Example¶
This wrapper can be used in the following way:
rule chm_eval_kit:
output:
directory("resources/chm-eval-kit")
params:
# Tag and version must match, see https://github.com/lh3/CHM-eval/releases.
tag="v0.5",
version="20180222"
log:
"logs/chm-eval-kit.log"
cache: True
wrapper:
"0.65.0/bio/benchmark/chm-eval-kit"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
url = (
"https://github.com/lh3/CHM-eval/releases/"
"download/{tag}/CHM-evalkit-{version}.tar"
).format(version=snakemake.params.version, tag=snakemake.params.tag)
os.makedirs(snakemake.output[0])
shell("(curl -L {url} | tar --strip-components 1 -C {snakemake.output[0]} -xf -) {log}")
CHM-EVAL-SAMPLE¶
Download CHM-eval sample (https://github.com/lh3/CHM-eval) for benchmarking variant calling.
Software dependencies¶
- samtools =1.10
- curl
Example¶
This wrapper can be used in the following way:
rule chm_eval_sample:
output:
bam="resources/chm-eval-sample.bam",
bai="resources/chm-eval-sample.bam.bai"
params:
# Optionally only grab the first 100 records.
# This is for testing, remove next line to grab all records.
first_n=100
log:
"logs/chm-eval-sample.log"
wrapper:
"0.65.0/bio/benchmark/chm-eval-sample"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
url = "ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR134/ERR1341796/CHM1_CHM13_2.bam"
pipefail = ""
fmt = "-b"
prefix = snakemake.params.get("first_n", "")
if prefix:
prefix = "| head -n {} | samtools view -h -b".format(prefix)
fmt = "-h"
pipefail = "set +o pipefail"
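    # head closes the pipe as soon as it has read the requested number of records,
    # causing the upstream samtools view to exit non-zero; disabling pipefail keeps
    # this expected truncation from failing the rule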
shell(
"""
{pipefail}
{{
samtools view {fmt} {url} {prefix} > {snakemake.output.bam}
samtools index {snakemake.output.bam}
}} {log}
"""
)
else:
shell(
"""
{{
curl -L {url} > {snakemake.output.bam}
samtools index {snakemake.output.bam}
}} {log}
"""
)
BISMARK¶
For bismark, the following wrappers are available:
BAM2NUC¶
Calculate mono- and di-nucleotide coverage of the reads and compare it with the average genomic sequence composition (see https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
Example¶
This wrapper can be used in the following way:
# Nucleotide stats for the genome are required before computing stats for a BAM file
rule bam2nuc_for_genome:
input:
genome_fa="indexes/{genome}/{genome}.fa.gz"
output:
"indexes/{genome}/genomic_nucleotide_frequencies.txt"
log:
"logs/indexes/{genome}/genomic_nucleotide_frequencies.txt.log"
wrapper:
"0.65.0/bio/bismark/bam2nuc"
# Nucleotide stats for BAM file
rule bam2nuc_for_bam:
input:
genome_fa="indexes/{genome}/{genome}.fa.gz",
bam="bams/{sample}_{genome}.bam"
output:
report="bams/{sample}_{genome}.nucleotide_stats.txt"
log:
"logs/{sample}_{genome}.nucleotide_stats.txt.log"
wrapper:
"0.65.0/bio/bismark/bam2nuc"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for bam2nuc tool that calculates mono- and di-nucleotide coverage of the reads and compares them with average genomic sequence
composition."""
# https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
cmdline_args = ["bam2nuc {extra}"]
genome_fa = snakemake.input.get("genome_fa", None)
if not genome_fa:
raise ValueError("bismark/bam2nuc: Error 'genome_fa' input not specified.")
genome_folder = os.path.dirname(genome_fa)
cmdline_args.append("--genome_folder {genome_folder:q}")
bam = snakemake.input.get("bam", None)
if bam:
cmdline_args.append("{bam}")
bams = bam if isinstance(bam, list) else [bam]
report = snakemake.output.get("report", None)
if not report:
raise ValueError("bismark/bam2nuc: Error 'report' output isn't specified.")
reports = report if isinstance(report, list) else [report]
if len(reports) != len(bams):
raise ValueError(
"bismark/bam2nuc: Error number of paths in output:report ({} files)"
" should be same as in input:bam ({} files).".format(
len(reports), len(bams)
)
)
output_dir = os.path.dirname(reports[0])
if any(output_dir != os.path.dirname(p) for p in reports):
raise ValueError(
"bismark/bam2nuc: Error all reports should be in same directory:"
" {}".format(output_dir)
)
if output_dir:
cmdline_args.append("--dir {output_dir:q}")
else:
cmdline_args.append("--genomic_composition_only")
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
# Move outputs into proper position.
if bam:
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
expected_2_actual_paths = []
for bam_path, report_path in zip(bams, reports):
bam_name = os.path.basename(bam_path)
bam_basename = os.path.splitext(bam_name)[0]
expected_2_actual_paths.append(
(
report_path,
os.path.join(
output_dir, "{}.nucleotide_stats.txt".format(bam_basename)
),
)
)
for (exp_path, actual_path) in expected_2_actual_paths:
if exp_path and (exp_path != actual_path):
shell("mv {actual_path:q} {exp_path:q} {log_append}")
BISMARK¶
Align BS-Seq reads using Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
Example¶
This wrapper can be used in the following way:
# Example: Paired-end reads
rule bismark_pe:
input:
fq_1="reads/{sample}.1.fastq",
fq_2="reads/{sample}.2.fastq",
genome="indexes/{genome}/{genome}.fa",
bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
output:
bam="bams/{sample}_{genome}_pe.bam",
report="bams/{sample}_{genome}_PE_report.txt",
nucleotide_stats="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
bam_unmapped_1="bams/{sample}_{genome}_unmapped_reads_1.fq.gz",
bam_unmapped_2="bams/{sample}_{genome}_unmapped_reads_2.fq.gz",
ambiguous_1="bams/{sample}_{genome}_ambiguous_reads_1.fq.gz",
ambiguous_2="bams/{sample}_{genome}_ambiguous_reads_2.fq.gz"
log:
"logs/bams/{sample}_{genome}.log"
params:
# optional params string, e.g: -L32 -N0 -X400 --gzip
# Useful options to tune:
# (for bowtie2)
# -N: The maximum number of mismatches permitted in the "seed", i.e. the first L base pairs
        # of the read (default: 1)
        # -L: The "seed length" (default: 28)
# -I: The minimum insert size for valid paired-end alignments. ~ min fragment size filter (for
# PE reads)
# -X: The maximum insert size for valid paired-end alignments. ~ max fragment size filter (for
# PE reads)
# --gzip: Gzip intermediate fastq files
# --ambiguous --unmapped
# -p: bowtie2 parallel execution
# --multicore: bismark parallel execution
# --temp_dir: tmp dir for intermediate files instead of output directory
extra=' --ambiguous --unmapped --nucleotide_coverage',
basename='{sample}_{genome}'
wrapper:
"0.65.0/bio/bismark/bismark"
# Example: Single-end reads
rule bismark_se:
input:
fq="reads/{sample}.fq.gz",
genome="indexes/{genome}/{genome}.fa",
bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
output:
bam="bams/{sample}_{genome}.bam",
report="bams/{sample}_{genome}_SE_report.txt",
nucleotide_stats="bams/{sample}_{genome}.nucleotide_stats.txt",
bam_unmapped="bams/{sample}_{genome}_unmapped_reads.fq.gz",
ambiguous="bams/{sample}_{genome}_ambiguous_reads.fq.gz"
log:
"logs/bams/{sample}_{genome}.log",
params:
# optional params string
extra=' --ambiguous --unmapped --nucleotide_coverage',
basename='{sample}_{genome}'
wrapper:
"0.65.0/bio/bismark/bismark"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for aligning methylation BS-Seq data using Bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
from tempfile import TemporaryDirectory
def basename_without_ext(file_path):
"""Returns basename of file path, without the file extension."""
base = os.path.basename(file_path)
split_ind = 2 if base.endswith(".gz") else 1
base = ".".join(base.split(".")[:-split_ind])
return base
extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark {extra} --bowtie2"]
outdir = os.path.dirname(snakemake.output.bam)
if outdir:
cmdline_args.append("--output_dir {outdir}")
genome_indexes_dir = os.path.dirname(snakemake.input.bismark_indexes_dir)
cmdline_args.append("{genome_indexes_dir}")
if not snakemake.output.get("bam", None):
raise ValueError("bismark/bismark: Error 'bam' output file isn't specified.")
if not snakemake.output.get("report", None):
raise ValueError("bismark/bismark: Error 'report' output file isn't specified.")
# basename
if snakemake.params.get("basename", None):
cmdline_args.append("--basename {snakemake.params.basename:q}")
basename = snakemake.params.basename
else:
basename = None
# reads input
single_end_mode = snakemake.input.get("fq", None)
if single_end_mode:
    # for SE data, the single reads file is passed via --se
cmdline_args.append("--se {snakemake.input.fq:q}")
mode_prefix = "se"
if basename is None:
basename = basename_without_ext(snakemake.input.fq)
else:
    # for PE data, the two mate files are passed via -1 and -2
cmdline_args.append("-1 {snakemake.input.fq_1:q} -2 {snakemake.input.fq_2:q}")
mode_prefix = "pe"
if basename is None:
# default basename
basename = basename_without_ext(snakemake.input.fq_1) + "_bismark_bt2"
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
# Move outputs into proper position.
expected_2_actual_paths = [
(
snakemake.output.bam,
os.path.join(
outdir, "{}{}.bam".format(basename, "" if single_end_mode else "_pe")
),
),
(
snakemake.output.report,
os.path.join(
outdir,
"{}_{}_report.txt".format(basename, "SE" if single_end_mode else "PE"),
),
),
(
snakemake.output.get("nucleotide_stats", None),
os.path.join(
outdir,
"{}{}.nucleotide_stats.txt".format(
basename, "" if single_end_mode else "_pe"
),
),
),
]
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (exp_path, actual_path) in expected_2_actual_paths:
if exp_path and (exp_path != actual_path):
shell("mv {actual_path:q} {exp_path:q} {log_append}")
BISMARK2BEDGRAPH¶
Generate bedGraph and coverage files from positional methylation files created by bismark_methylation_extractor (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
Example¶
This wrapper can be used in the following way:
# Example for CHG+CHH summary coverage:
rule bismark2bedGraph_noncpg:
input:
"meth/CHG_context_{sample}.txt.gz",
"meth/CHH_context_{sample}.txt.gz"
output:
bedGraph="meth_non_cpg/{sample}_non_cpg.bedGraph.gz",
cov="meth_non_cpg/{sample}_non_cpg.bismark.cov.gz"
log:
"logs/meth_non_cpg/{sample}_non_cpg.log"
params:
extra="--CX"
wrapper:
"0.65.0/bio/bismark/bismark2bedGraph"
# Example for CpG only coverage
rule bismark2bedGraph_cpg:
input:
"meth/CpG_context_{sample}.txt.gz"
output:
bedGraph="meth_cpg/{sample}_CpG.bedGraph.gz",
cov="meth_cpg/{sample}_CpG.bismark.cov.gz"
log:
"logs/meth_cpg/{sample}_CpG.log"
wrapper:
"0.65.0/bio/bismark/bismark2bedGraph"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark bismark2bedGraph tool."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
bedGraph = snakemake.output.get("bedGraph", "")
if not bedGraph:
raise ValueError("bismark/bismark2bedGraph: Please specify bedGraph output path")
params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark2bedGraph {params_extra}"]
dir_name = os.path.dirname(bedGraph)
if dir_name:
cmdline_args.append("--dir {dir_name}")
fname = os.path.basename(bedGraph)
cmdline_args.append("--output {fname}")
cmdline_args.append("{snakemake.input}")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
BISMARK2REPORT¶
Generate graphical HTML report from Bismark reports (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2report).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
Example¶
This wrapper can be used in the following way:
# Example: Paired-end reads
rule bismark2report_pe:
input:
alignment_report="bams/{sample}_{genome}_PE_report.txt",
nucleotide_report="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
dedup_report="bams/{sample}_{genome}_pe.deduplication_report.txt",
mbias_report="meth/{sample}_{genome}_pe.deduplicated.M-bias.txt",
splitting_report="meth/{sample}_{genome}_pe.deduplicated_splitting_report.txt"
output:
html="qc/meth/{sample}_{genome}.bismark2report.html",
log:
"logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
params:
skip_optional_reports=True
wrapper:
"0.65.0/bio/bismark/bismark2report"
# Example: Single-end reads
rule bismark2report_se:
input:
alignment_report="bams/{sample}_{genome}_SE_report.txt",
nucleotide_report="bams/{sample}_{genome}.nucleotide_stats.txt",
dedup_report="bams/{sample}_{genome}.deduplication_report.txt",
mbias_report="meth/{sample}_{genome}.deduplicated.M-bias.txt",
splitting_report="meth/{sample}_{genome}.deduplicated_splitting_report.txt"
output:
html="qc/meth/{sample}_{genome}.bismark2report.html",
log:
"logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
params:
skip_optional_reports=True
wrapper:
"0.65.0/bio/bismark/bismark2report"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper to generate graphical HTML report from Bismark reports."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2report
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
def answer2bool(v):
return str(v).lower() in ("yes", "true", "t", "1")
extra = snakemake.params.get("extra", "")
cmds = ["bismark2report {extra}"]
# output
html_file = snakemake.output.get("html", "")
output_dir = snakemake.output.get("html_dir", None)
if output_dir is None:
if html_file:
output_dir = os.path.dirname(html_file)
else:
if html_file:
        raise ValueError(
            "bismark/bismark2report: Please choose only one of 'html=...' (single report file)"
            " or 'html_dir=...' (batch processing)."
        )
if output_dir is None:
    raise ValueError(
        "bismark/bismark2report: Output file or directory not specified. "
        "Use 'html=...' for a single report file or 'html_dir=...' for batch "
        "processing."
    )
if output_dir:
cmds.append("--dir {output_dir:q}")
if html_file:
html_file_name = os.path.basename(html_file)
cmds.append("--output {html_file_name:q}")
# reports
reports = [
"alignment_report",
"dedup_report",
"splitting_report",
"mbias_report",
"nucleotide_report",
]
skip_optional_reports = answer2bool(
snakemake.params.get("skip_optional_reports", False)
)
for report_name in reports:
path = snakemake.input.get(report_name, "")
if path:
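        # bind the path to a variable named after the report so that shell() can
        # resolve the "--<report> {<report>:q}" placeholder assembled below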
locals()[report_name] = path
cmds.append("--{0} {{{1}:q}}".format(report_name, report_name))
elif skip_optional_reports:
cmds.append("--{0} 'none'".format(report_name))
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")
# run shell command:
shell(" ".join(cmds))
BISMARK2SUMMARY¶
Generate a summary graphical HTML report from several Bismark text report files (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
Example¶
This wrapper can be used in the following way:
import os
rule bismark2summary:
input:
bam=["bams/a_genome_pe.bam", "bams/b_genome.bam"],
        # Bismark's `bismark2summary` discovers reports automatically based
        # on the files available in the folder containing the BAM file.
        #
        # If your per-BAM-file reports aren't in the same folder,
        # you will need an additional task which symlinks all reports
        # (e.g. your splitting report generated by the `bismark_methylation_extractor`
        # tool is in the `meth` folder, and alignment-related reports are in the `bams` folder).
        # These dependencies are here just to ensure that the corresponding rules
        # have already finished at rule execution time, otherwise some reports
        # will be missing.
dependencies=[
"bams/a_genome_PE_report.txt",
"bams/a_genome_pe.deduplication_report.txt",
# for example splitting report is missing for 'a' sample
"bams/b_genome_SE_report.txt",
"bams/b_genome.deduplication_report.txt",
"bams/b_genome.deduplicated_splitting_report.txt"
]
output:
html="qc/{experiment}.bismark2summary.html",
txt="qc/{experiment}.bismark2summary.txt"
log:
"logs/qc/{experiment}.bismark2summary.log"
wrapper:
"0.65.0/bio/bismark/bismark2summary"
rule bismark2summary_prepare_symlinks:
input:
"meth/b_genome.deduplicated_splitting_report.txt",
output:
temp("bams/b_genome.deduplicated_splitting_report.txt"),
log:
"qc/bismark2summary_prepare_symlinks.symlinks.log"
run:
wd = os.getcwd()
shell("echo 'Making symlinks' > {log}")
for source, target in zip(input, output):
target_dir = os.path.dirname(target)
target_name = os.path.basename(target)
log_path = os.path.join(wd, log[0])
abs_src_path = os.path.abspath(source)
shell("cd {target_dir} && ln -f -s {abs_src_path} {target_name} >> {log_path} 2>&1")
shell("echo 'Done' >> {log}")
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper to generate summary graphical HTML report from several Bismark text report files."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
cmds = ["bismark2summary {extra}"]
# basename
bam = snakemake.input.get("bam", None)
if not bam:
raise ValueError(
"bismark/bismark2summary: Please specify aligned BAM file path"
" (one or several) using 'bam=..'"
)
html = snakemake.output.get("html", None)
txt = snakemake.output.get("txt", None)
if not html or not txt:
raise ValueError(
"bismark/bismark2summary: Please specify both 'html=..' and"
" 'txt=..' paths in output section"
)
basename, ext = os.path.splitext(html)
if ext.lower() != ".html":
raise ValueError(
"bismark/bismark2summary: HTML report file should end"
" with suffix '.html' but was {} ({})".format(ext, html)
)
suggested_txt = basename + ".txt"
if suggested_txt != txt:
raise ValueError(
"bismark/bismark2summary: Expected '{}' TXT report, "
"but was: '{}'".format(suggested_txt, txt)
)
cmds.append("--basename {basename:q}")
# title
title = snakemake.params.get("title", None)
if title:
cmds.append("--title {title:q}")
cmds.append("{bam}")
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")
# run shell command:
shell(" ".join(cmds))
BISMARK_GENOME_PREPARATION¶
Generate indexes for Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
Example¶
This wrapper can be used in the following way:
# For *.fa file
rule bismark_genome_preparation_fa:
input:
"indexes/{genome}/{genome}.fa"
output:
directory("indexes/{genome}/Bisulfite_Genome")
log:
"logs/indexes/{genome}/Bisulfite_Genome.log"
params:
"" # optional params string
wrapper:
"0.65.0/bio/bismark/bismark_genome_preparation"
# For *.fa.gz file:
rule bismark_genome_preparation_fa_gz:
input:
"indexes/{genome}/{genome}.fa.gz"
output:
directory("indexes/{genome}/Bisulfite_Genome")
log:
"logs/indexes/{genome}/Bisulfite_Genome.log"
params:
"" # optional params string
wrapper:
"0.65.0/bio/bismark/bismark_genome_preparation"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark indexes preparing using bismark_genome_preparation."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
input_dir = path.dirname(snakemake.input[0])
params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("bismark_genome_preparation --verbose --bowtie2 {params_extra} {input_dir} {log}")
BISMARK_METHYLATION_EXTRACTOR¶
Call methylation counts from Bismark alignment results (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
- perl-gdgraph == 1.54
Example¶
This wrapper can be used in the following way:
rule bismark_methylation_extractor:
input: "bams/{sample}.bam"
output:
mbias_r1="qc/meth/{sample}.M-bias_R1.png",
# Only for PE BAMS:
# mbias_r2="qc/meth/{sample}.M-bias_R2.png",
mbias_report="meth/{sample}.M-bias.txt",
splitting_report="meth/{sample}_splitting_report.txt",
# 1-based start, 1-based end ('inclusive') methylation info: % and counts
methylome_CpG_cov="meth_cpg/{sample}.bismark.cov.gz",
# BedGraph with methylation percentage: 0-based start, end exclusive
methylome_CpG_mlevel_bedGraph="meth_cpg/{sample}.bedGraph.gz",
# Primary output files: methylation status at each read cytosine position: (extremely large)
read_base_meth_state_cpg="meth/CpG_context_{sample}.txt.gz",
# * You could merge CHG, CHH using: --merge_non_CpG
read_base_meth_state_chg="meth/CHG_context_{sample}.txt.gz",
read_base_meth_state_chh="meth/CHH_context_{sample}.txt.gz"
log:
"logs/meth/{sample}.log"
params:
output_dir="meth", # optional output dir
extra="--gzip --comprehensive --bedGraph" # optional params string
wrapper:
"0.65.0/bio/bismark/bismark_methylation_extractor"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark methylation extractor tool: bismark_methylation_extractor."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark_methylation_extractor {params_extra}"]
# output dir
output_dir = snakemake.params.get("output_dir", "")
if output_dir:
cmdline_args.append("-o {output_dir:q}")
# trimming options
trimming_options = [
"ignore", # meth_bias_r1_5end
"ignore_3prime", # meth_bias_r1_3end
"ignore_r2", # meth_bias_r2_5end
"ignore_3prime_r2", # meth_bias_r2_3end
]
for key in trimming_options:
value = snakemake.params.get(key, None)
if value:
cmdline_args.append("--{} {}".format(key, value))
# Input
cmdline_args.append("{snakemake.input}")
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
key2prefix_suffix = [
("mbias_report", ("", ".M-bias.txt")),
("mbias_r1", ("", ".M-bias_R1.png")),
("mbias_r2", ("", ".M-bias_R2.png")),
("splitting_report", ("", "_splitting_report.txt")),
("methylome_CpG_cov", ("", ".bismark.cov.gz")),
("methylome_CpG_mlevel_bedGraph", ("", ".bedGraph.gz")),
("read_base_meth_state_cpg", ("CpG_context_", ".txt.gz")),
("read_base_meth_state_chg", ("CHG_context_", ".txt.gz")),
("read_base_meth_state_chh", ("CHH_context_", ".txt.gz")),
]
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (key, (prefix, suffix)) in key2prefix_suffix:
exp_path = snakemake.output.get(key, None)
if exp_path:
if len(snakemake.input) != 1:
raise ValueError(
"bismark/bismark_methylation_extractor: Error: only one BAM file is"
" expected in input, but was <{}>".format(snakemake.input)
)
bam_file = snakemake.input[0]
bam_name = os.path.basename(bam_file)
bam_wo_ext = os.path.splitext(bam_name)[0]
actual_path = os.path.join(output_dir, prefix + bam_wo_ext + suffix)
if exp_path != actual_path:
shell("mv {actual_path:q} {exp_path:q} {log_append}")
DEDUPLICATE_BISMARK¶
Deduplicate Bismark BAM files and save the result as a *.bam file (see https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark).
Software dependencies¶
- bowtie2 == 2.3.4.3
- bismark == 0.22.1
- samtools == 1.9
Example¶
This wrapper can be used in the following way:
rule deduplicate_bismark:
input: "bams/a_genome_pe.bam"
output:
bam="bams/{sample}.deduplicated.bam",
report="bams/{sample}.deduplication_report.txt",
log:
"logs/bams/{sample}.deduplicated.log",
params:
extra="" # optional params string
wrapper:
"0.65.0/bio/bismark/deduplicate_bismark"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark aligned reads deduplication using deduplicate_bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
bam_path = snakemake.output.get("bam", None)
report_path = snakemake.output.get("report", None)
if not bam_path or not report_path:
raise ValueError(
"bismark/deduplicate_bismark: Please specify both 'bam=..' and 'report=..' paths in output section"
)
output_dir = os.path.dirname(bam_path)
if output_dir != os.path.dirname(report_path):
raise ValueError(
"bismark/deduplicate_bismark: BAM and Report files expected to have the same parent directory"
" but was {} and {}".format(bam_path, report_path)
)
arg_output_dir = "--output_dir '{}'".format(output_dir) if output_dir else ""
arg_multiple = "--multiple" if len(snakemake.input) > 1 else ""
params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
"deduplicate_bismark {params_extra} --bam {arg_multiple}"
" {arg_output_dir} {snakemake.input} {log}"
)
# Move outputs into proper position.
fst_input_filename = os.path.basename(snakemake.input[0])
fst_input_basename = os.path.splitext(fst_input_filename)[0]
prefix = os.path.join(output_dir, fst_input_basename)
deduplicated_bam_actual_name = prefix + ".deduplicated.bam"
if arg_multiple:
# bismark does it exactly like this:
deduplicated_bam_actual_name = deduplicated_bam_actual_name.replace(
"deduplicated", "multiple.deduplicated", 1
)
expected_2_actual_paths = [
(bam_path, deduplicated_bam_actual_name),
(
report_path,
prefix + (".multiple" if arg_multiple else "") + ".deduplication_report.txt",
),
]
for (exp_path, actual_path) in expected_2_actual_paths:
if exp_path and (exp_path != actual_path):
shell("mv {actual_path:q} {exp_path:q} {log_append}")
BOWTIE2¶
For bowtie2, the following wrappers are available:
BOWTIE2¶
Map reads with bowtie2.
Software dependencies¶
- bowtie2 ==2.4.1
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule bowtie2:
input:
sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
"mapped/{sample}.bam"
log:
"logs/bowtie2/{sample}.log"
params:
index="index/genome", # prefix of reference genome index (built with bowtie2-build)
extra="" # optional parameters
threads: 8 # Use at least two threads
wrapper:
"0.65.0/bio/bowtie2/align"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
n = len(snakemake.input.sample)
assert (
n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
if n == 1:
reads = "-U {}".format(*snakemake.input.sample)
else:
reads = "-1 {} -2 {}".format(*snakemake.input.sample)
shell(
"(bowtie2 --threads {snakemake.threads} {snakemake.params.extra} "
"-x {snakemake.params.index} {reads} "
"| samtools view -Sbh -o {snakemake.output[0]} -) {log}"
)
BUSCO¶
Assess assembly and annotation completeness with BUSCO
Software dependencies¶
- python ==3.6
- busco
Example¶
This wrapper can be used in the following way:
rule run_busco:
input:
"sample_data/target.fa"
output:
"txome_busco/full_table_txome_busco.tsv",
log:
"logs/quality/transcriptome_busco.log"
threads: 8
params:
mode="transcriptome",
lineage_path="sample_data/example",
# optional parameters
extra=""
wrapper:
"0.65.0/bio/busco"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for BUSCO assessment"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode is not None, "please input a run mode: genome, transcriptome or proteins"
lineage = snakemake.params.get("lineage_path")
assert lineage is not None, "please input the path to a lineage for busco assessment"
# busco does not allow you to direct output location: handle this by moving output
outdir = path.dirname(snakemake.output[0])
if "/" in outdir:
out_name = path.basename(outdir)
else:
out_name = outdir
# note: --force allows snakemake to handle rewriting files as necessary
# without needing to specify *all* busco outputs as snakemake outputs
shell(
"run_busco --in {snakemake.input} --out {out_name} --force "
" --cpu {snakemake.threads} --mode {mode} --lineage {lineage} "
" {extra} {log}"
)
busco_outname = "run_" + out_name
# move to intended location
shell("cp -r {busco_outname}/* {outdir}")
shell("rm -rf {busco_outname}")
BWA¶
For bwa, the following wrappers are available:
BWA ALN¶
Map reads with bwa aln.
Software dependencies¶
- bwa ==0.7.17
Example¶
This wrapper can be used in the following way:
rule bwa_aln:
input:
"reads/{sample}.{pair}.fastq"
output:
"sai/{sample}.{pair}.sai"
params:
index="genome",
extra=""
log:
"logs/bwa_aln/{sample}.{pair}.log"
threads: 8
wrapper:
"0.65.0/bio/bwa/aln"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa aln."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"bwa aln"
" {extra}"
" -t {snakemake.threads}"
" {snakemake.params.index}"
" {snakemake.input[0]}"
" > {snakemake.output[0]} {log}"
)
BWA INDEX¶
Creates a BWA index.
Software dependencies¶
- bwa ==0.7.17
Example¶
This wrapper can be used in the following way:
rule bwa_index:
input:
"{genome}.fasta"
output:
"{genome}.amb",
"{genome}.ann",
"{genome}.bwt",
"{genome}.pac",
"{genome}.sa"
log:
"logs/bwa_index/{genome}.log"
params:
prefix="{genome}",
algorithm="bwtsw"
wrapper:
"0.65.0/bio/bwa/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if len(snakemake.input) == 0:
raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
raise ValueError("Only one reference genome can be inputed!")
# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")
if len(prefix) > 0:
prefix = "-p " + prefix
# Construction algorithm that will be used to build the database, default is bwtsw
construction_algorithm = snakemake.params.get("algorithm", "")
if len(construction_algorithm) != 0:
construction_algorithm = "-a " + construction_algorithm
shell(
"bwa index" " {prefix}" " {construction_algorithm}" " {snakemake.input[0]}" " {log}"
)
BWA MEM¶
Map reads using bwa mem, with optional sorting using samtools or picard.
Software dependencies¶
- bwa ==0.7.17
- samtools ==1.9
- picard ==2.20.1
Example¶
This wrapper can be used in the following way:
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
"mapped/{sample}.bam"
log:
"logs/bwa_mem/{sample}.log"
params:
index="genome",
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="queryname", # Can be 'queryname' or 'coordinate'.
sort_extra="" # Extra args for samtools/picard.
threads: 8
wrapper:
"0.65.0/bio/bwa/mem"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Julian de Ruiter
Code¶
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
shell(
"(bwa mem"
" -t {snakemake.threads}"
" {extra}"
" {snakemake.params.index}"
" {snakemake.input.reads}"
" | " + pipe_cmd + ") {log}"
)
BWA MEM SAMBLASTER¶
Map reads using bwa mem, mark duplicates with samblaster, and sort and index with sambamba.
Software dependencies¶
- bwa ==0.7.17
- sambamba ==0.7.1
- samblaster ==0.1.24
Example¶
This wrapper can be used in the following way:
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
bam="mapped/{sample}.bam",
index="mapped/{sample}.bam.bai"
log:
"logs/bwa_mem_sambamba/{sample}.log"
params:
index="genome",
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort_extra="" # Extra args for sambamba.
threads: 8
wrapper:
"0.65.0/bio/bwa/mem-samblaster"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")
shell(
"(bwa mem"
" -t {snakemake.threads}"
" {extra}"
" {snakemake.params.index}"
" {snakemake.input.reads}"
" | samblaster"
" {samblaster_extra}"
" | sambamba view -S -f bam /dev/stdin"
" -t {snakemake.threads}"
" | sambamba sort /dev/stdin"
" -t {snakemake.threads}"
" -o {snakemake.output.bam}"
" {sort_extra}"
") {log}"
)
BWA SAMPE¶
Map paired-end reads with bwa sampe.
Software dependencies¶
- bwa ==0.7.17
- samtools ==1.9
- picard ==2.20.1
Example¶
This wrapper can be used in the following way:
rule bwa_sampe:
input:
fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"]
output:
"mapped/{sample}.bam"
params:
index="genome",
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none", # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'`
sort_order="queryname", # optional: Sort by 'queryname' or 'coordinate'
sort_extra="" # optional: extra arguments for samtools/picard
log:
"logs/bwa_sampe/{sample}.log"
wrapper:
"0.65.0/bio/bwa/sampe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa sampe."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Check inputs.
if not len(snakemake.input.sai) == 2:
raise ValueError("input.sai must have 2 elements")
if not len(snakemake.input.fastq) == 2:
raise ValueError("input.fastq must have 2 elements")
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Use prefix for temp.
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Run command.
shell(
"(bwa sampe"
" {extra}"
" {snakemake.params.index}"
" {snakemake.input.sai}"
" {snakemake.input.fastq}"
" | " + pipe_cmd + ") {log}"
)
BWA SAMSE¶
Map single-end reads with bwa samse.
Software dependencies¶
- bwa ==0.7.17
- samtools ==1.9
- picard ==2.20.1
Example¶
This wrapper can be used in the following way:
rule bwa_samse:
input:
fastq="reads/{sample}.1.fastq",
sai="sai/{sample}.1.sai"
output:
"mapped/{sample}.bam"
params:
index="genome",
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none", # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'`
sort_order="queryname", # optional: Sort by 'queryname' or 'coordinate'
sort_extra="" # optional: extra arguments for samtools/picard
log:
"logs/bwa_samse/{sample}.log"
wrapper:
"0.65.0/bio/bwa/samse"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa sampe."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Use prefix for temp.
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Run command.
shell(
"(bwa samse"
" {extra}"
" {snakemake.params.index}"
" {snakemake.input.sai}"
" {snakemake.input.fastq}"
" | " + pipe_cmd + ") {log}"
)
BWA-MEM2¶
For bwa-mem2, the following wrappers are available:
BWA-MEM2 INDEX¶
Creates a bwa-mem2 index.
Software dependencies¶
- bwa-mem2 ==2.0
Example¶
This wrapper can be used in the following way:
rule bwa_mem2_index:
input:
"{genome}"
output:
"{genome}.0123",
"{genome}.amb",
"{genome}.ann",
"{genome}.bwt.2bit.64",
"{genome}.bwt.8bit.32",
"{genome}.pac",
log:
"logs/bwa-mem2_index/{genome}.log"
params:
prefix="{genome}"
wrapper:
"0.65.0/bio/bwa-mem2/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
- Patrik Smeds
Code¶
__author__ = "Christopher Schröder, Patrik Smeds"
__copyright__ = "Copyright 2020, Christopher Schröder, Patrik Smeds"
__email__ = "christopher.schroeder@tu-dortmund.de, patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if len(snakemake.input) == 0:
raise ValueError("A reference genome has to be provided.")
elif len(snakemake.input) > 1:
raise ValueError("Please provide exactly one reference genome as input.")
# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", "")
if len(prefix) > 0:
prefix = "-p " + prefix
shell("bwa-mem2 index" " {prefix}" " {snakemake.input[0]}" " {log}")
BWA-MEM2¶
Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to bwa and is ~1.3-3.1x faster depending on the use case, dataset and the machine it runs on. Optional sorting using samtools or picard.
Software dependencies¶
- bwa-mem2 ==2.0
- samtools ==1.10
- picard ==2.23
Example¶
This wrapper can be used in the following way:
rule bwa_mem2_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
"mapped/{sample}.bam"
log:
"logs/bwa_mem2/{sample}.log"
params:
index="genome.fasta",
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
sort_extra="" # Extra args for samtools/picard.
threads: 8
wrapper:
"0.65.0/bio/bwa-mem2/mem"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
- Johannes Köster
- Julian de Ruiter
Code¶
__author__ = "Christopher Schröder, Johannes Köster, Julian de Ruiter"
__copyright__ = (
"Copyright 2020, Christopher Schröder, Johannes Köster and Julian de Ruiter"
)
__email__ = "christopher.schroeder@tu-dortmund.de koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
shell(
"(bwa-mem2 mem"
" -t {snakemake.threads}"
" {extra}"
" {snakemake.params.index}"
" {snakemake.input.reads}"
" | " + pipe_cmd + ") {log}"
)
BWA-MEM2 SAMBLASTER¶
Map reads using bwa-mem2, mark duplicates with samblaster, and sort and index with sambamba.
Software dependencies¶
- bwa-mem2 ==2.0
- sambamba ==0.7.1
- samblaster ==0.1.24
Example¶
This wrapper can be used in the following way:
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
bam="mapped/{sample}.bam",
index="mapped/{sample}.bam.bai"
log:
"logs/bwa_mem2_sambamba/{sample}.log"
params:
index="genome.fasta",
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort_extra="-q" # Extra args for sambamba.
threads: 8
wrapper:
"0.65.0/bio/bwa-mem2/mem-samblaster"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
shell(
"(bwa-mem2 mem"
" -t {snakemake.threads}"
" {extra}"
" {snakemake.params.index}"
" {snakemake.input.reads}"
" | samblaster"
" {samblaster_extra}"
" | sambamba view -S -f bam /dev/stdin"
" -t {snakemake.threads}"
" | sambamba sort /dev/stdin"
" -t {snakemake.threads}"
" -o {snakemake.output.bam}"
" {sort_extra}"
") {log}"
)
CAIROSVG¶
Convert SVG files with cairosvg.
Software dependencies¶
- cairosvg =2.4.2
Example¶
This wrapper can be used in the following way:
rule:
input:
"{prefix}.svg"
output:
"{prefix}.{fmt,(pdf|png)}"
wrapper:
"0.65.0/utils/cairosvg"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
_, ext = os.path.splitext(snakemake.output[0])
if ext not in (".png", ".pdf", ".ps", ".svg"):
raise ValueError("invalid file extension: '{}'".format(ext))
fmt = ext[1:]
shell("cairosvg -f {fmt} {snakemake.input[0]} -o {snakemake.output[0]}")
CLUSTALO¶
Multiple alignment of nucleic acid and protein sequences.
Software dependencies¶
- clustalo ==1.2.4
Example¶
This wrapper can be used in the following way:
rule clustalo:
input:
"{sample}.fa"
output:
"{sample}.msa.fa"
params:
extra=""
log:
"logs/clustalo/test/{sample}.log"
threads: 8
wrapper:
"0.65.0/bio/clustalo"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for clustal omega."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Format the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Executed shell command
shell(
"clustalo {extra}"
" --threads={snakemake.threads}"
" --in {snakemake.input[0]}"
" --out {snakemake.output[0]} "
" {log}"
)
CUTADAPT¶
For cutadapt, the following wrappers are available:
CUTADAPT-PE¶
Trim paired-end reads using cutadapt.
Software dependencies¶
- cutadapt ==2.10
Example¶
This wrapper can be used in the following way:
rule cutadapt:
input:
["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
fastq1="trimmed/{sample}.1.fastq",
fastq2="trimmed/{sample}.2.fastq",
qc="trimmed/{sample}.qc.txt"
params:
# https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
adapters = "-a AGAGCACACGTCTGAACTCCAGTCAC -g AGATCGGAAGAGCACACGT -A AGAGCACACGTCTGAACTCCAGTCAC -G AGATCGGAAGAGCACACGT",
# https://cutadapt.readthedocs.io/en/stable/guide.html#
others = "--minimum-length 1 -q 20"
log:
"logs/cutadapt/{sample}.log"
threads: 4 # set desired number of threads here
wrapper:
"0.65.0/bio/cutadapt/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
- David Laehnemann
Code¶
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cutadapt"
" {snakemake.params.adapters}"
" {snakemake.params.others}"
" -o {snakemake.output.fastq1}"
" -p {snakemake.output.fastq2}"
" -j {snakemake.threads}"
" {snakemake.input}"
" > {snakemake.output.qc} {log}"
)
CUTADAPT-SE¶
Trim single-end reads using cutadapt.
Software dependencies¶
- cutadapt ==2.10
Example¶
This wrapper can be used in the following way:
rule cutadapt:
input:
"reads/{sample}.fastq"
output:
fastq="trimmed/{sample}.fastq",
qc="trimmed/{sample}.qc.txt"
params:
"-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20"
log:
"logs/cutadapt/{sample}.log"
threads: 4 # set desired number of threads here
wrapper:
"0.65.0/bio/cutadapt/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cutadapt"
" {snakemake.params}"
" -j {snakemake.threads}"
" -o {snakemake.output.fastq}"
" {snakemake.input[0]}"
" > {snakemake.output.qc} {log}"
)
DEEPTOOLS¶
For deeptools, the following wrappers are available:
DEEPTOOLS COMPUTEMATRIX¶
deepTools computeMatrix calculates scores per genomic region. The matrix file can be used as input for other tools or for the generation of a deepTools plotHeatmap or deepTools plotProfile. For usage information about deepTools computeMatrix, please see the documentation. For more information about deepTools, also see the source code.
Each computeMatrix option below is set implicitly via the corresponding output variable of the rule:
- --outFileName, -out, -o (required): gzipped matrix file; output variable: matrix_gz; recommended extension: ".gz"
- --outFileNameMatrix: tab-separated table of the matrix file; output variable: matrix_tab; recommended extension: ".tab"
- --outFileSortedRegions: BED file with sorted regions after skipping zeros or min/max threshold values; output variable: matrix_bed; recommended extension: ".bed"
Software dependencies¶
- deeptools ==3.4.3
Example¶
This wrapper can be used in the following way:
rule compute_matrix:
input:
# Please note that the -R and -S options are defined via input files
bed=expand("{sample}.bed", sample=["a", "b"]),
bigwig=expand("{sample}.bw", sample=["a", "b"])
output:
# Please note that --outFileName, --outFileNameMatrix and --outFileSortedRegions are exclusively defined via output files.
# Usable output variables, their extensions and which option they implicitly call are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/computematrix.html.
matrix_gz="matrix_files/matrix.gz", # required
# optional output files
matrix_tab="matrix_files/matrix.tab",
matrix_bed="matrix_files/matrix.bed"
log:
"logs/deeptools/compute_matrix.log"
params:
# required argument, choose "scale-regions" or "reference-point"
command="scale-regions",
# optional parameters
extra="--regionBodyLength 200 --verbose"
wrapper:
"0.65.0/bio/deeptools/computematrix"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
out_tab = snakemake.output.get("matrix_tab")
out_bed = snakemake.output.get("matrix_bed")
optional_output = ""
if out_tab:
optional_output += " --outFileNameMatrix {out_tab} ".format(out_tab=out_tab)
if out_bed:
optional_output += " --outFileSortedRegions {out_bed} ".format(out_bed=out_bed)
shell(
"(computeMatrix "
"{snakemake.params.command} "
"{snakemake.params.extra} "
"-R {snakemake.input.bed} "
"-S {snakemake.input.bigwig} "
"-o {snakemake.output.matrix_gz} "
"{optional_output}) {log}"
)
DEEPTOOLS PLOTFINGERPRINT¶
deepTools plotFingerprint plots a profile of cumulative read coverages from a list of indexed BAM files. For usage information about deepTools plotFingerprint, please see the documentation. For more information about deepTools, also see the source code.
In addition to the required output, an optional output file of read counts can be generated by setting the output variable "counts" (see the example Snakemake rule below).
Each plotFingerprint option below is set implicitly via the corresponding output variable of the rule:
- --plotFile, -plot, -o (required): coverage plot; output variable: fingerprint; recommended extensions: ".png", ".eps", ".pdf" or ".svg"
- --outRawCounts: tab-separated table of read counts per bin; output variable: counts; recommended extension: ".tab"
Software dependencies¶
- deeptools ==3.4.3
Example¶
This wrapper can be used in the following way:
rule plot_fingerprint:
input:
bam_files=expand("samples/{sample}.bam", sample=["a", "b"]),
bam_idx=expand("samples/{sample}.bam.bai", sample=["a", "b"])
output:
# Please note that --plotFile and --outRawCounts are exclusively defined via output files.
# Usable output variables, their extensions and which option they implicitly call are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotfingerprint.html.
fingerprint="plot_fingerprint/plot_fingerprint.png", # required
# optional output
counts="plot_fingerprint/raw_counts.tab"
log:
"logs/deeptools/plot_fingerprint.log"
params:
# optional parameters
"--numberOfSamples 200 "
wrapper:
"0.65.0/bio/deeptools/plotfingerprint"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
out_counts = snakemake.output.get("counts")
optional_output = ""
if out_counts:
optional_output += " --outRawCounts {out_counts} ".format(out_counts=out_counts)
shell(
"(plotFingerprint "
"-b {snakemake.input.bam_files} "
"-o {snakemake.output.fingerprint} "
"{optional_output} "
"{snakemake.params}) {log}"
)
DEEPTOOLS PLOTHEATMAP¶
deepTools plotHeatmap creates a heatmap for scores associated with genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotHeatmap, please see the documentation. For more information about deepTools, also see the source code.
You can select which optional output files are generated by adding the respective output variable with the recommended extension(s) (see the example Snakemake rule below).
Each plotHeatmap option below is set implicitly via the corresponding output variable of the rule:
- --outFileName, -out, -o (required): plot image; output variable: heatmap_img; recommended extensions: ".png", ".eps", ".pdf" or ".svg"
- --outFileSortedRegions: BED file with sorted regions; output variable: regions; recommended extension: ".bed"
- --outFileNameMatrix: tab-separated matrix of values underlying the heatmap; output variable: heatmap_matrix; recommended extension: ".tab"
Software dependencies¶
- deeptools ==3.4.3
Example¶
This wrapper can be used in the following way:
rule plot_heatmap:
input:
# matrix file from deepTools computeMatrix tool
"matrix.gz"
output:
# Please note that --outFileSortedRegions and --outFileNameMatrix are exclusively defined via output files.
# Usable output variables, their extensions and which option they implicitly call are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotheatmap.html.
heatmap_img="plot_heatmap/heatmap.png", # required
# optional output files
regions="plot_heatmap/heatmap_regions.bed",
heatmap_matrix="plot_heatmap/heatmap_matrix.tab"
log:
"logs/deeptools/heatmap.log"
params:
# optional parameters
"--plotType=fill "
wrapper:
"0.65.0/bio/deeptools/plotheatmap"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
out_region = snakemake.output.get("regions")
out_matrix = snakemake.output.get("heatmap_matrix")
optional_output = ""
if out_region:
optional_output += " --outFileSortedRegions {out_region} ".format(
out_region=out_region
)
if out_matrix:
optional_output += " --outFileNameMatrix {out_matrix} ".format(
out_matrix=out_matrix
)
shell(
"(plotHeatmap "
"-m {snakemake.input[0]} "
"-o {snakemake.output.heatmap_img} "
"{optional_output} "
"{snakemake.params}) {log}"
)
DEEPTOOLS PLOTPROFILE¶
deepTools plotProfile plots scores over sets of genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotProfile, please see the documentation. For more information about deepTools, also see the source code.
You can select which optional output files are generated by adding the respective output variable with the recommended extension (see the example Snakemake rule below).
Each plotProfile option below is set implicitly via the corresponding output variable of the rule:
- --outFileName, -out, -o (required): profile plot; output variable: plot_img; recommended extensions: ".png", ".eps", ".pdf" or ".svg"
- --outFileSortedRegions: BED file with sorted regions; output variable: regions; recommended extension: ".bed"
- --outFileNameData: tab-separated table of the average profile; output variable: data; recommended extension: ".tab"
Software dependencies¶
- deeptools ==3.4.3
Example¶
This wrapper can be used in the following way:
rule plot_profile:
input:
# matrix file from deepTools computeMatrix tool
"matrix.gz"
output:
# Please note that --outFileSortedRegions and --outFileNameData are exclusively defined via output files.
# Usable output variables, their extensions and which option they implicitly call are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotprofile.html.
# The output variables select the plot image file and further optional plotProfile outputs.
plot_img="plot_profile/plot.png", # required
# optional output files
regions="plot_profile/regions.bed",
data="plot_profile/data.tab"
log:
"logs/deeptools/plot_profile.log"
params:
# optional parameters
"--plotType=fill "
"--perGroup "
"--colors red yellow blue "
"--dpi 150 "
wrapper:
"0.65.0/bio/deeptools/plotprofile"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
out_region = snakemake.output.get("regions")
out_data = snakemake.output.get("data")
optional_output = ""
if out_region:
optional_output += " --outFileSortedRegions {out_region} ".format(
out_region=out_region
)
if out_data:
optional_output += " --outFileNameData {out_data} ".format(out_data=out_data)
shell(
"(plotProfile "
"-m {snakemake.input[0]} "
"-o {snakemake.output.plot_img} "
"{optional_output} "
"{snakemake.params}) {log}"
)
DEEPVARIANT¶
Call genetic variants using a deep neural network. Copyright 2017 Google LLC, BSD 3-Clause "New" or "Revised" license. https://github.com/google/deepvariant
Software dependencies¶
- deepvariant=0.10.0
- tensorflow-estimator=2.0.0
- unzip=6.0
Example¶
This wrapper can be used in the following way:
rule deepvariant:
input:
bam="mapped/{sample}.bam",
ref="genome/genome.fasta"
output:
vcf="calls/{sample}.vcf.gz"
params:
model="wgs", # {wgs, wes}
extra=""
threads: 2
log:
"logs/deepvariant/{sample}/stdout.log"
wrapper:
"0.65.0/bio/deepvariant"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- This Snakemake wrapper uses the Bioconda deepvariant package. Copyright 2018 Brad Chapman.
Authors¶
- Tetsuro Hisayoshi
Code¶
__author__ = "Tetsuro Hisayoshi"
__copyright__ = "Copyright 2020, Tetsuro Hisayoshi"
__email__ = "hisayoshi0530@gmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
log_dir = os.path.dirname(snakemake.log[0])
output_dir = os.path.dirname(snakemake.output[0])
# sample basename
basename = os.path.splitext(os.path.basename(snakemake.input.bam))[0]
with tempfile.TemporaryDirectory() as tmp_dir:
shell(
"(dv_make_examples.py "
"--cores {snakemake.threads} "
"--ref {snakemake.input.ref} "
"--reads {snakemake.input.bam} "
"--sample {basename} "
"--examples {tmp_dir} "
"--logdir {log_dir} "
"{extra} \n"
"dv_call_variants.py "
"--cores {snakemake.threads} "
"--outfile {tmp_dir}/{basename}.tmp "
"--sample {basename} "
"--examples {tmp_dir} "
"--model {snakemake.params.model} \n"
"dv_postprocess_variants.py "
"--ref {snakemake.input.ref} "
"--infile {tmp_dir}/{basename}.tmp "
"--outfile {snakemake.output.vcf} ) {log}"
)
DELLY¶
Call variants with delly.
Software dependencies¶
- delly ==0.8.1
Example¶
This wrapper can be used in the following way:
rule delly:
input:
ref="genome.fasta",
samples=["mapped/a.bam"],
# optional exclude template (see https://github.com/dellytools/delly)
exclude="human.hg19.excl.tsv"
output:
"sv/calls.bcf"
params:
extra="" # optional parameters for delly (except -g, -x)
log:
"logs/delly.log"
threads: 2 # It is best to use as many threads as samples
wrapper:
"0.65.0/bio/delly"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
exclude = (
"-x {}".format(snakemake.input.exlude) if snakemake.input.get("exlude", "") else ""
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"OMP_NUM_THREADS={snakemake.threads} delly call {extra} "
"{exclude} -g {snakemake.input.ref} "
"-o {snakemake.output[0]} {snakemake.input.samples} {log}"
)
EPIC¶
For epic, the following wrappers are available:
EPIC¶
Find broad enriched domains in ChIP-Seq data with epic
Software dependencies¶
- epic =0.2.7
- pandas =0.22.0
Example¶
This wrapper can be used in the following way:
rule epic:
input:
treatment = "bed/test.bed",
background = "bed/control.bed"
output:
enriched_regions = "epic/enriched_regions.csv", # required
bed = "epic/enriched_regions.bed", # optional
matrix = "epic/matrix.gz" # optional
log:
"logs/epic/epic.log"
params:
genome = "hg19", # optional, default hg19
extra="-g 3 -w 200" # "--bigwig epic/bigwigs"
threads: 1 # optional, defaults to 1
wrapper:
"0.65.0/bio/epic/peaks"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- All/any of the different bigwig options must be given as extra parameters (see the sketch below).
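A sketch of the example rule with a bigwig output directory appended to extra (the epic/bigwigs path is illustrative, taken from the comment in the example above):
rule epic_with_bigwigs:
    input:
        treatment="bed/test.bed",
        background="bed/control.bed"
    output:
        enriched_regions="epic/enriched_regions.csv"
    log:
        "logs/epic/epic.log"
    params:
        genome="hg19",
        extra="-g 3 -w 200 --bigwig epic/bigwigs"  # write per-sample bigwigs to this directory
    threads: 1
    wrapper:
        "0.65.0/bio/epic/peaks"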
Authors¶
- Endre Bakken Stovner
Code¶
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1
treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")
# Executed shell command
enriched_regions = snakemake.output.get("enriched_regions")
bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")
log = snakemake.log[0] if len(snakemake.log) > 0 else ""
genome = snakemake.params.get("genome", "hg19")  # default genome, as noted in the example above
cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"
if bed:
cmd += " -b {bed}"
if matrix:
cmd += " -sm {matrix}"
if log:
cmd += " -l {log}"
cmd += " {extra}"
shell(cmd)
FASTP¶
trim and QC fastq reads with fastp
Software dependencies¶
- fastp ==0.20.0
Example¶
This wrapper can be used in the following way:
rule fastp_se:
input:
sample=["reads/se/{sample}.fastq"]
output:
trimmed="trimmed/se/{sample}.fastq",
html="report/se/{sample}.html",
json="report/se/{sample}.json"
log:
"logs/fastp/se/{sample}.log"
params:
extra=""
threads: 1
wrapper:
"0.65.0/bio/fastp"
rule fastp_pe:
input:
sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
output:
trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
html="report/pe/{sample}.html",
json="report/pe/{sample}.json"
log:
"logs/fastp/pe/{sample}.log"
params:
extra=""
threads: 2
wrapper:
"0.65.0/bio/fastp"
rule fastp_pe_wo_trimming:
input:
sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
output:
html="report/pe_wo_trimming/{sample}.html",
json="report/pe_wo_trimming/{sample}.json"
log:
"logs/fastp/pe_wo_trimming/{sample}.log"
params:
extra=""
threads: 2
wrapper:
"0.65.0/bio/fastp"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Sebastian Kurscheid (sebastian.kurscheid@anu.edu.au)
Code¶
__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "sebastian.kurscheid@anu.edu.au"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
n = len(snakemake.input.sample)
assert (
n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
if n == 1:
reads = "--in1 {}".format(snakemake.input.sample)
else:
reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)
trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths is not None:
if n == 1:
trimmed = "--out1 {}".format(snakemake.output.trimmed)
else:
trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)
else:
trimmed = ""
html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)
shell(
"(fastp --thread {snakemake.threads} {snakemake.params.extra} "
"{reads} "
"{trimmed} "
"{json} "
"{html} ) {log}"
)
FASTQ_SCREEN¶
fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches what you expect.
This wrapper allows the configuration to be passed as a filename or as a dictionary in the rule's params.fastq_screen_config. So the following configuration file:
DATABASE ecoli /data/Escherichia_coli/Bowtie2Index/genome BOWTIE2
DATABASE ecoli /data/Escherichia_coli/BowtieIndex/genome BOWTIE
DATABASE hg19 /data/hg19/Bowtie2Index/genome BOWTIE2
DATABASE mm10 /data/mm10/Bowtie2Index/genome BOWTIE2
BOWTIE /path/to/bowtie
BOWTIE2 /path/to/bowtie2
becomes:
fastq_screen_config = {
'database': {
'ecoli': {
'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
'hg19': {
'bowtie2': '/data/hg19/Bowtie2Index/genome'},
'mm10': {
'bowtie2': '/data/mm10/Bowtie2Index/genome'}
},
'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}
By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".
Software dependencies¶
- fastq-screen ==0.5.2
- bowtie2 ==2.2.6
- bowtie ==1.1.2
Example¶
This wrapper can be used in the following way:
rule fastq_screen:
input:
"samples/{sample}.fastq"
output:
txt="qc/{sample}.fastq_screen.txt",
png="qc/{sample}.fastq_screen.png"
params:
fastq_screen_config="fastq_screen.conf",
subset=100000,
aligner='bowtie2'
threads: 8
wrapper:
"0.65.0/bio/fastq_screen"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
- While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.
- When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python's tempfile module. To control the temp file directory, make sure the $TMPDIR environment variable is set (see the tempfile docs for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the Snakefile calling this wrapper.
Authors¶
- Ryan Dale
Code¶
import os
import re
from snakemake.shell import shell
import tempfile
__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"
_config = snakemake.params["fastq_screen_config"]
subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
tmp = tempfile.NamedTemporaryFile(delete=False).name
with open(tmp, "w") as fout:
# use distinct loop variable names so the aligner chosen via params is not overwritten
for label, indexes in _config["database"].items():
for db_aligner, index in indexes.items():
fout.write(
"\t".join(["DATABASE", label, index, db_aligner.upper()]) + "\n"
)
for db_aligner, aligner_path in _config["aligner_paths"].items():
fout.write("\t".join([db_aligner.upper(), aligner_path]) + "\n")
config_file = tmp
else:
config_file = _config
# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(".fastq|.fq|.txt|.seq", os.path.basename(snakemake.input[0]))[0]
tempdir = tempfile.mkdtemp()
shell(
"fastq_screen --outdir {tempdir} "
"--force "
"--aligner {aligner} "
"--conf {config_file} "
"--subset {subset} "
"--threads {snakemake.threads} "
"{extra} "
"{snakemake.input[0]} "
"{log}"
)
# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")
# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
shell("rm {tmp}")
FASTQC¶
Generate fastq qc statistics using fastqc.
Software dependencies¶
- fastqc ==0.11.9
Example¶
This wrapper can be used in the following way:
rule fastqc:
input:
"reads/{sample}.fastq"
output:
html="qc/fastqc/{sample}.html",
zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params: ""
log:
"logs/fastqc/{sample}.log"
threads: 1
wrapper:
"0.65.0/bio/fastqc"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for fastqc."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from tempfile import TemporaryDirectory
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
def basename_without_ext(file_path):
"""Returns basename of file path, without the file extension."""
base = path.basename(file_path)
split_ind = 2 if base.endswith(".fastq.gz") else 1
base = ".".join(base.split(".")[:-split_ind])
return base
# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
shell(
"fastqc {snakemake.params} --quiet -t {snakemake.threads} "
"--outdir {tempdir:q} {snakemake.input[0]:q}"
" {log:q}"
)
# Move outputs into proper position.
output_base = basename_without_ext(snakemake.input[0])
html_path = path.join(tempdir, output_base + "_fastqc.html")
zip_path = path.join(tempdir, output_base + "_fastqc.zip")
if snakemake.output.html != html_path:
shell("mv {html_path:q} {snakemake.output.html:q}")
if snakemake.output.zip != zip_path:
shell("mv {zip_path:q} {snakemake.output.zip:q}")
FGBIO¶
For fgbio, the following wrappers are available:
FGBIO ANNOTATEBAMWITHUMIS¶
Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule AnnotateBam:
input:
bam="mapped/{sample}.bam",
umi="umi/{sample}.fastq"
output:
"mapped/{sample}.annotated.bam"
params: ""
log:
"logs/fgbio/annotate_bam/{sample}.log"
wrapper:
"0.65.0/bio/fgbio/annotatebamwithumis"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input.bam
if bam_input is None:
raise ValueError("Missing bam input file!")
elif not isinstance(bam_input, str):
raise ValueError("Input bam should be a string: " + str(bam_input) + "!")
umi_input = snakemake.input.umi
if umi_input is None:
raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")
if not len(snakemake.output) == 1:
raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
output_file = snakemake.output[0]
if output_file is None:
raise ValueError("Missing output file!")
elif not isinstance(output_file, str):
raise ValueError("Output bam-file should be a string: " + str(output_file) + "!")
shell(
"fgbio AnnotateBamWithUmis"
" -i {bam_input}"
" -f {umi_input}"
" -o {output_file}"
" {extra_params}"
" {log}"
)
FGBIO CALLMOLECULARCONSENSUSREADS¶
Calls consensus sequences from reads with the same unique molecular tag.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule ConsensusReads:
input:
"mapped/a.bam"
output:
"mapped/{sample}.m3.bam"
params:
extra="-M 3"
log:
"logs/fgbio/consensus_reads/{sample}.log"
wrapper:
"0.65.0/bio/fgbio/callmolecularconsensusreads"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) and len(snakemake.input) != 1:
raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_file = snakemake.output[0]
if not isinstance(output_file, str) and len(snakemake.output) != 1:
raise ValueError("Output should be one bam file: " + str(output_file) + "!")
shell(
"fgbio CallMolecularConsensusReads"
" -i {bam_input}"
" -o {output_file}"
" {extra_params}"
" {log}"
)
FGBIO COLLECTDUPLEXSEQMETRICS¶
Collects a suite of metrics to QC duplex sequencing data.
Software dependencies¶
- fgbio ==0.6.1
- r-ggplot2
Example¶
This wrapper can be used in the following way:
rule CollectDuplexSeqMetrics:
input:
"mapped/{sample}.gu.bam"
output:
family_sizes="stats/{sample}.family_sizes.txt",
duplex_family_sizes="stats/{sample}.duplex_family_sizes.txt",
duplex_yield_metrics="stats/{sample}.duplex_yield_metrics.txt",
umi_counts="stats/{sample}.umi_counts.txt",
duplex_qc="stats/{sample}.duplex_qc.pdf",
duplex_umi_counts="stats/{sample}.duplex_umi_counts.txt",
params:
extra=lambda wildcards: "-d " + wildcards.sample
log:
"logs/fgbio/collectduplexseqmetrics/{sample}.log"
wrapper:
"0.65.0/bio/fgbio/collectduplexseqmetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
family_sizes = snakemake.output.family_sizes
duplex_family_sizes = snakemake.output.duplex_family_sizes
duplex_yield_metrics = snakemake.output.duplex_yield_metrics
umi_counts = snakemake.output.umi_counts
duplex_qc = snakemake.output.duplex_qc
duplex_umi_counts = snakemake.output.get("duplex_umi_counts", None)
file_path = str(path.dirname(family_sizes))
name = str(path.basename(family_sizes)).split(".")[0]
path_name_prefix = str(path.join(file_path, name))
if not family_sizes == path_name_prefix + ".family_sizes.txt":
raise Exception(
"Unexpected family_sizes path/name format, expected {}, got {}.".format(
path_name_prefix + ".family_sizes.txt", family_sizes
)
)
if not duplex_family_sizes == path_name_prefix + ".duplex_family_sizes.txt":
raise Exception(
"Unexpected duplex_family_sizes path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_family_sizes.txt", duplex_family_sizes
)
)
if not duplex_yield_metrics == path_name_prefix + ".duplex_yield_metrics.txt":
raise Exception(
"Unexpected duplex_yield_metrics path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_yield_metrics.txt", duplex_yield_metrics
)
)
if not umi_counts == path_name_prefix + ".umi_counts.txt":
raise Exception(
"Unexpected umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".umi_counts.txt", umi_counts
)
)
if not duplex_qc == path_name_prefix + ".duplex_qc.pdf":
raise Exception(
"Unexpected duplex_qc path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_qc.pdf", duplex_qc
)
)
if (
duplex_umi_counts is not None
and not duplex_umi_counts == path_name_prefix + ".duplex_umi_counts.txt"
):
raise Exception(
"Unexpected duplex_umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_umi_counts.txt", duplex_umi_counts
)
)
duplex_umi_counts_flag = ""
if duplex_umi_counts is not None:
duplex_umi_counts_flag = "-u "
if not isinstance(bam_input, str) and len(snakemake.input) != 1:
raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
shell(
"fgbio CollectDuplexSeqMetrics"
" -i {bam_input}"
" -o {path_name_prefix}"
" {duplex_umi_counts_flag}"
" {extra_params}"
" {log}"
)
FGBIO FILTERCONSENSUSREADS¶
Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule FilterConsensusReads:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.filtered.bam"
params:
extra="",
min_base_quality=2,
min_reads=[2, 2, 2],
ref="genome.fasta"
log:
"logs/fgbio/filterconsensusreads/{sample}.log"
threads: 1
wrapper:
"0.65.0/bio/fgbio/filterconsensusreads"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- min_base_quality: a single value (Int). Mask (make N) consensus bases with quality less than this threshold. (default: 5)
- min_reads: an array of Ints, min length 1, max length 3. Number of reads that need to support a UMI. For filtering BAM files processed with CallMolecularConsensusReads, one value is required. For BAM files processed with CallDuplexConsensusReads, up to 3 values can be provided; if fewer than 3 are given, the last value is repeated. The first value applies to the final consensus sequence and the last two to each strand's consensus (a duplex parameterization is sketched below).
- For more information, see http://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
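For example, the params block of the rule above could be adapted for duplex consensus BAMs as follows (a sketch with illustrative values, not recommendations); the wrapper should expand the list to --min-reads 3 2 2 on the fgbio command line:
params:
    extra="",
    min_base_quality=20,
    # first value: final duplex consensus; last two: each single-strand consensus
    min_reads=[3, 2, 2],
    ref="genome.fasta"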
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
min_base_quality = snakemake.params.get("min_base_quality", None)
if not isinstance(min_base_quality, int):
raise ValueError("min_base_quality needs to be provided as an Int!")
min_reads = snakemake.params.get("min_reads", None)
if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
raise ValueError(
"min_reads needs to be provided as list of Ints, min length 1, max length 3!"
)
ref = snakemake.params.get("ref", None)
if ref is None:
raise ValueError("A reference needs to be provided!")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) and len(snakemake.input) != 1:
raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
bam_output = snakemake.output[0]
if not isinstance(bam_output, str) and len(snakemake.output) != 1:
raise ValueError("Output should be one bam file: " + str(bam_output) + "!")
shell(
"fgbio FilterConsensusReads"
" -i {bam_input}"
" -o {bam_output}"
" -r {ref}"
" --min-reads {min_reads}"
" --min-base-quality {min_base_quality}"
" {extra_params}"
" {log}"
)
FGBIO GROUPREADSBYUMI¶
Groups reads together that appear to have come from the same original molecule.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule GroupReads:
input:
"mapped/a.bam"
output:
bam="mapped/{sample}.gu.bam",
hist="mapped/{sample}.gu.histo.tsv",
params:
extra="-s adjacency --edits 1"
log:
"logs/fgbio/group_reads/{sample}.log"
wrapper:
"0.65.0/bio/fgbio/groupreadsbyumi"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) and len(snakemake.input) != 1:
raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_bam_file = snakemake.output.bam
if not isinstance(output_bam_file, str) and len(output_bam_file) != 1:
raise ValueError("Bam output should be one bam file: " + str(output_bam_file) + "!")
output_histo_file = snakemake.output.hist
if not isinstance(output_histo_file, str) and len(output_histo_file) != 1:
raise ValueError(
"Histo output should be one histogram file path: "
+ str(output_histo_file)
+ "!"
)
shell(
"fgbio GroupReadsByUmi"
" -i {bam_input}"
" -o {output_bam_file}"
" -f {output_histo_file}"
" {extra_params}"
" {log}"
)
FGBIO SETMATEINFORMATION¶
Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), MC (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.
Software dependencies¶
- fgbio ==0.6.1
Example¶
This wrapper can be used in the following way:
rule SetMateInfo:
input:
"mapped/a.bam"
output:
"mapped/{sample}.mi.bam"
params: ""
log:
"logs/fgbio/set_mate_info/{sample}.log"
wrapper:
"0.65.0/bio/fgbio/setmateinformation"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) and len(snakemake.input) != 1:
raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_file = snakemake.output[0]
if not isinstance(output_file, str) and len(snakemake.output) != 1:
raise ValueError("Output should be one bam file: " + str(output_file) + "!")
shell(
"fgbio SetMateInformation"
" -i {bam_input}"
" -o {output_file}"
" {extra_params}"
" {log}"
)
FILTLONG¶
Quality filtering tool for long reads.
Software dependencies¶
- filtlong=0.2.0=he941832_2
Example¶
This wrapper can be used in the following way:
rule filtlong:
input:
reads = "{sample}.fastq"
output:
"{sample}.filtered.fastq"
params:
extra=" --mean_q_weight 5.0",
target_bases = 10
log:
"logs/filtlong/test/{sample}.log"
wrapper:
"0.65.0/bio/filtlong"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for filtlong."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
target_bases = int(snakemake.params.get("target_bases", 0))
if target_bases > 0:
extra += " --target_bases {}".format(target_bases)
# Format the log redirection string
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Executed shell command
shell("filtlong {extra}" " {snakemake.input.reads} > {snakemake.output} {log}")
FREEBAYES¶
Call small genomic variants with freebayes.
Software dependencies¶
- freebayes ==1.3.1
- bcftools ==1.10
- parallel ==20190522
- bedtools >=2.29
- sed ==4.7
Example¶
This wrapper can be used in the following way:
rule freebayes:
input:
ref="genome.fasta",
# you can have a list of samples here
samples="mapped/{sample}.bam",
# the matching BAI indexes have to be present for freebayes
indexes="mapped/{sample}.bam.bai"
# optional BED file specifying chromosomal regions on which freebayes
# should run, e.g. all regions that show coverage
#regions="/path/to/region-file.bed"
output:
"calls/{sample}.vcf" # either .vcf or .bcf
log:
"logs/freebayes/{sample}.log"
params:
extra="", # optional parameters
chunksize=100000 # reference genome chunk size for parallelization (default: 100000)
threads: 2
wrapper:
"0.65.0/bio/freebayes"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Felix Mölder
Code¶
__author__ = "Johannes Köster, Felix Mölder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
params = snakemake.params.get("extra", "")
pipe = ""
if snakemake.output[0].endswith(".bcf"):
pipe = "| bcftools view -Ob -"
if snakemake.threads == 1:
freebayes = "freebayes"
else:
chunksize = snakemake.params.get("chunksize", 100000)
regions = "<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format(
snakemake=snakemake, chunksize=chunksize
)
if snakemake.input.get("regions", ""):
regions = (
"<(bedtools intersect -a "
r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
"{regions}) -b {snakemake.input.regions} | "
r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
).format(regions=regions, snakemake=snakemake)
freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format(
snakemake=snakemake, regions=regions
)
shell(
"({freebayes} {params} -f {snakemake.input.ref}"
" {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}"
)
GATK¶
For gatk, the following wrappers are available:
GATK APPLYBQSR¶
Run gatk ApplyBQSR.
Software dependencies¶
- gatk4 ==4.1.4.1
- openjdk =8
Example¶
This wrapper can be used in the following way:
rule gatk_applybqsr:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
recal_table="recal/{sample}.grp"
output:
bam="recal/{sample}.bam"
log:
"logs/gatk/gatk_applybqsr/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/applybqsr"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options (see the sketch below).
- The extra param allows for additional program arguments for ApplyBQSR.
- For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360037055712-ApplyBQSR
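For instance, to raise the JVM heap and cap garbage-collection threads (values taken from the note above), the params of the rule could be set like this:
params:
    extra="",
    java_opts="-Xmx4G -XX:ParallelGCThreads=10",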
Authors¶
- Christopher Schröder
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
"gatk --java-options '{java_opts}' ApplyBQSR {extra} -R {snakemake.input.ref} -I {snakemake.input.bam} "
"--bqsr-recal-file {snakemake.input.recal_table} "
"-O {snakemake.output.bam} {log}"
)
GATK BASERECALIBRATOR¶
Run gatk BaseRecalibrator.
Software dependencies¶
- gatk4 ==4.1.4.1
- openjdk =8
Example¶
This wrapper can be used in the following way:
rule gatk_baserecalibrator:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
known="dbsnp.vcf.gz" # optional known sites
output:
recal_table="recal/{sample}.grp"
log:
"logs/gatk/baserecalibrator/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/baserecalibrator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Christopher Schröder
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
known = "--known-sites {}".format(known)
shell(
"gatk --java-options '{java_opts}' BaseRecalibrator {extra} "
"-R {snakemake.input.ref} -I {snakemake.input.bam} "
"-O {snakemake.output.recal_table} {known} {log}"
)
GATK BASERECALIBRATORSPARK¶
Run gatk BaseRecalibratorSpark.
Software dependencies¶
- gatk4 ==4.1.4.1
- openjdk =8
Example¶
This wrapper can be used in the following way:
rule gatk_baserecalibratorspark:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
known="dbsnp.vcf.gz" # optional known sites
output:
recal_table="recal/{sample}.grp"
log:
"logs/gatk/baserecalibrator/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
#spark_runner="", # optional, local by default
#spark_master="", # optional
#spark_extra="", # optional
threads: 8
wrapper:
"0.65.0/bio/gatk/baserecalibratorspark"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments for BaseRecalibratorSpark.
- The spark_runner param (“LOCAL”, “SPARK” or “GCS”) selects the Spark runner. Set it to “LOCAL” or leave it unset to run on the local machine.
- The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or leave it unset for local execution with the number of cores determined by Snakemake (see the sketch below).
- The spark_extra param allows for additional Spark arguments.
- For more information, see https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA-
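For example, the commented-out params of the rule above could be filled in to run the Spark job locally on a fixed number of cores (a sketch with illustrative values):
params:
    extra="",
    java_opts="",
    spark_runner="LOCAL",
    spark_master="local[8]",
    spark_extra=""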
Authors¶
- Christopher Schröder
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
"spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
known = "--known-sites {}".format(known)
shell(
"gatk --java-options '{java_opts}' BaseRecalibratorSpark {extra} "
"-R {snakemake.input.ref} -I {snakemake.input.bam} "
"-O {snakemake.output.recal_table} {known} "
"-- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra} "
"{log}"
)
GATK COMBINEGVCFS¶
Run gatk CombineGVCFs.
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
rule combine_gvcfs:
input:
gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
ref="genome.fasta"
output:
gvcf="calls/all.g.vcf",
log:
"logs/gatk/combinegvcfs.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/combinegvcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
gvcfs = list(map("-V {}".format, snakemake.input.gvcfs))
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk --java-options '{java_opts}' CombineGVCFs {extra} "
"{gvcfs} "
"-R {snakemake.input.ref} "
"-O {snakemake.output.gvcf} {log}"
)
GATK GENOTYPEGVCFS¶
Run gatk GenotypeGVCFs.
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
rule genotype_gvcfs:
input:
gvcf="calls/all.g.vcf", # combined gvcf over multiple samples
ref="genome.fasta"
output:
vcf="calls/all.vcf",
log:
"logs/gatk/genotypegvcfs.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/genotypegvcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk --java-options '{java_opts}' GenotypeGVCFs {extra} "
"-V {snakemake.input.gvcf} "
"-R {snakemake.input.ref} "
"-O {snakemake.output.vcf} {log}"
)
GATK HAPLOTYPECALLER¶
Run gatk HaplotypeCaller.
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
rule haplotype_caller:
input:
# single or list of bam files
bam="mapped/{sample}.bam",
ref="genome.fasta"
# known="dbsnp.vcf" # optional
output:
gvcf="calls/{sample}.g.vcf",
log:
"logs/gatk/haplotypecaller/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/haplotypecaller"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
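In addition, the bam input accepts a list of BAM files; the wrapper passes each one to HaplotypeCaller with its own -I flag. A sketch with hypothetical per-lane paths:
input:
    bam=["mapped/{sample}.lane1.bam", "mapped/{sample}.lane2.bam"],
    ref="genome.fasta"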
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
known = snakemake.input.get("known", "")
if known:
known = "--dbsnp " + known
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
bams = snakemake.input.bam
if isinstance(bams, str):
bams = [bams]
bams = list(map("-I {}".format, bams))
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk --java-options '{java_opts}' HaplotypeCaller {extra} "
"-R {snakemake.input.ref} {bams} "
"-ERC GVCF "
"-O {snakemake.output.gvcf} {known} {log}"
)
GATK MUTECT2¶
Call somatic SNVs and indels via local assembly of haplotypes
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
rule mutect2:
input:
fasta = "genome/genome.fasta",
map = "mapped/{sample}.bam"
output:
vcf = "variant/{sample}.vcf"
message:
"Testing Mutect2 with {wildcards.sample}"
threads:
1
log:
"logs/mutect_{sample}.log"
wrapper:
"0.65.0/bio/gatk/mutect"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for GATK4 Mutect2"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake.utils import makedirs
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"gatk Mutect2 " # Tool and its subprocess
"--input {snakemake.input.map} " # Path to input mapping file
"--output {snakemake.output.vcf} " # Path to output vcf file
"--reference {snakemake.input.fasta} " # Path to reference fasta file
"{extra} " # Extra parameters
"{log}" # Logging behaviour
)
GATK SELECTVARIANTS¶
Run gatk SelectVariants.
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
rule gatk_select:
input:
vcf="calls/all.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.vcf"
log:
"logs/gatk/select/snvs.log"
params:
extra="--select-type-to-include SNP", # optional filter arguments, see GATK docs
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/selectvariants"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk --java-options '{java_opts}' SelectVariants -R {snakemake.input.ref} -V {snakemake.input.vcf} "
"{extra} -O {snakemake.output.vcf} {log}"
)
GATK SPLITNCIGARREADS¶
Run gatk SplitNCigarReads.
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
rule splitncigarreads:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta"
output:
"split/{sample}.bam"
log:
"logs/gatk/splitNCIGARreads/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/splitncigarreads"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk --java-options '{java_opts}' SplitNCigarReads {extra} "
" -R {snakemake.input.ref} -I {snakemake.input.bam} "
"-O {snakemake.output} {log}"
)
GATK VARIANTFILTRATION¶
Run gatk VariantFiltration.
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
rule gatk_filter:
input:
vcf="calls/snvs.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.filtered.vcf"
log:
"logs/gatk/filter/snvs.log"
params:
filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
extra="", # optional arguments, see GATK docs
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/variantfiltration"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
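For reference, the wrapper code below turns each entry of the filters dict into a --filter-name/--filter-expression pair, so the filters entry of the example above should expand to roughly:
--filter-name myfilter --filter-expression 'AB < 0.2 || MQ0 > 50'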
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
filters = [
"--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
for name, expr in snakemake.params.filters.items()
]
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk --java-options '{java_opts}' VariantFiltration -R {snakemake.input.ref} -V {snakemake.input.vcf} "
"{extra} {filters} -O {snakemake.output.vcf} {log}"
)
GATK VARIANTRECALIBRATOR¶
Run gatk VariantRecalibrator.
Software dependencies¶
- gatk4 ==4.1.4.1
Example¶
This wrapper can be used in the following way:
from snakemake.remote import GS
# GATK resource bundle files can be either directly obtained from google storage (like here), or
# from FTP. You can also use local files.
GS = GS.RemoteProvider()
def gatk_bundle(f):
return GS.remote("genomics-public-data/resources/broad/hg38/v0/{}".format(f))
rule variantrecalibrator:
input:
vcf="calls/all.vcf",
ref="genome.fasta",
# resources have to be given as named input files
hapmap=gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz"),
omni=gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz"),
g1k=gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz"),
dbsnp=gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz"),
# use aux to e.g. download other necessary file
aux=[gatk_bundle("hapmap_3.3.hg38.sites.vcf.gz.tbi"),
gatk_bundle("1000G_omni2.5.hg38.sites.vcf.gz.tbi"),
gatk_bundle("1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi"),
gatk_bundle("Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi")]
output:
vcf="calls/all.recal.vcf",
tranches="calls/all.tranches"
log:
"logs/gatk/variantrecalibrator.log"
params:
mode="SNP", # set mode, must be either SNP, INDEL or BOTH
# resource parameter definition. Key must match named input files from above.
resources={"hapmap": {"known": False, "training": True, "truth": True, "prior": 15.0},
"omni": {"known": False, "training": True, "truth": False, "prior": 12.0},
"g1k": {"known": False, "training": True, "truth": False, "prior": 10.0},
"dbsnp": {"known": True, "training": False, "truth": False, "prior": 2.0}},
annotation=["QD", "FisherStrand"], # which fields to use with -an (see VariantRecalibrator docs)
extra="", # optional
java_opts="", # optional
wrapper:
"0.65.0/bio/gatk/haplotypecaller"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
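For reference, the wrapper code below combines each entry of the resources dict with the identically named input file into a single --resource argument; the hapmap entry of the example above should expand to roughly:
--resource hapmap,known=false,training=true,truth=true,prior=15.0:<path of the hapmap input file>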
Authors¶
- Johannes Köster
- Jake VanCampen
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
def fmt_res(resname, resparams):
fmt_bool = lambda b: str(b).lower()
try:
f = snakemake.input.get(resname)
except KeyError:
raise RuntimeError(
"There must be a named input file for every resource (missing: {})".format(
resname
)
)
return "{},known={},training={},truth={},prior={}:{}".format(
resname,
fmt_bool(resparams["known"]),
fmt_bool(resparams["training"]),
fmt_bool(resparams["truth"]),
resparams["prior"],
f,
)
resources = [
"--resource {}".format(fmt_res(resname, resparams))
for resname, resparams in snakemake.params["resources"].items()
]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = ""
if snakemake.output.tranches:
tranches = "--tranches-file " + snakemake.output.tranches
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk --java-options '{java_opts}' VariantRecalibrator {extra} {resources} "
"-R {snakemake.input.ref} -V {snakemake.input.vcf} "
"-mode {snakemake.params.mode} "
"--output {snakemake.output.vcf} "
"{tranches} {annotation} {log}"
)
GATK3¶
For gatk3, the following wrappers are available:
GATK3 BASERECALIBRATOR¶
Run gatk3 BaseRecalibrator.
Software dependencies¶
- gatk ==3.8
Example¶
This wrapper can be used in the following way:
rule baserecalibrator:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
known="dbsnp.vcf.gz"
output:
"{sample}.recal_data_table"
log:
"logs/gatk3/bqsr/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
threads: 16
wrapper:
"bio/gatk/baserecalibrator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
bed = "-L " + bed
else:
bed = ""
input_known_string = ""
for known in input_known:
input_known_string = input_known_string + " --knownSites {}".format(known)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk3 {java_opts} -T BaseRecalibrator"
" -nct {snakemake.threads}"
" {extra}"
" -I {input_bam}"
" -R {input_ref}"
" {input_known_string}"
" {bed}"
" -o {snakemake.output}"
" {log}"
)
GATK3 INDELREALIGNER¶
Run gatk3 IndelRealigner
Software dependencies¶
- gatk ==3.8
Example¶
This wrapper can be used in the following way:
rule indelrealigner:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
known="dbsnp.vcf.gz",
target_intervals="{sample}.intervals"
output:
bam="realigned/{sample}.bam"
log:
"logs/gatk3/indelrealigner/{sample}.log"
params:
extra="", # optional
java_opts="", # optional
threads: 16
wrapper:
"bio/gatk/indelrealigner"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
input_target_intervals = snakemake.input.target_intervals
bed = snakemake.params.get("bed", None)
if bed is not None:
bed = "-L " + bed
else:
bed = ""
input_known_string = ""
for known in input_known:
input_known_string = input_known_string + " -known {}".format(known)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk3 {java_opts} -T IndelRealigner"
" {extra}"
" -I {input_bam}"
" -R {input_ref}"
" {input_known_string}"
" {bed}"
" --targetIntervals {input_target_intervals}"
" -o {snakemake.output}"
" {log}"
)
GATK3 PRINTREADS¶
Run gatk3 PrintReads
Software dependencies¶
- gatk ==3.8
Example¶
This wrapper can be used in the following way:
rule printreads:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
recal_data="{sample}.recal_data_table"
output:
"alignment/{sample}.bqsr.bam"
log:
"logs/gatk/bqsr/{sample}..log"
params:
extra="", # optional
java_opts="",
threads: 16
wrapper:
"bio/gatk3/printreads"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
input_bam = snakemake.input.bam
input_recal_data = snakemake.input.recal_data
input_ref = snakemake.input.ref
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk3 {java_opts} -T PrintReads"
" {extra}"
" -I {input_bam}"
" -R {input_ref}"
" -BQSR {input_recal_data}"
" -o {snakemake.output}"
" {log}"
)
GATK3 REALIGNERTARGETCREATOR¶
Run gatk3 RealignerTargetCreator
Software dependencies¶
- gatk ==3.8
Example¶
This wrapper can be used in the following way:
rule realignertargetcreator:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
known="dbsnp.vcf.gz"
output:
"{sample}.intervals"
log:
"logs/gatk/realignertargetcreator/{sample}.log"
params:
extra="", # optional
java_opts="",
threads: 16
wrapper:
"bio/gatk3/realignertargetcreator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one option, or “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
java_opts = snakemake.params.get("java_opts", "")
input_bam = snakemake.input.bam
input_known = snakemake.input.known
input_ref = snakemake.input.ref
bed = snakemake.params.get("bed", None)
if bed is not None:
bed = "-L " + bed
else:
bed = ""
input_known_string = ""
for known in input_known:
input_known_string = input_known_string + " --known {}".format(known)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"gatk3 {java_opts} -T RealignerTargetCreator"
" -nt {snakemake.threads}"
" {extra}"
" -I {input_bam}"
" -R {input_ref}"
" {input_known_string}"
" {bed}"
" -o {snakemake.output}"
" {log}"
)
GDC-API¶
For gdc-api, the following wrappers are available:
GDC API-BASED DATA DOWNLOAD OF BAM SLICES¶
Download slices of GDC BAM files using curl and the GDC API for BAM Slicing.
Software dependencies¶
- curl ==7.69.1
Example¶
This wrapper can be used in the following way:
rule gdc_api_bam_slice_download:
output:
bam="raw/{sample}.bam",
log:
"logs/gdc-api/bam-slicing/{sample}.log"
params:
# to use this rule flexibly, make uuid a function that maps your
# sample names of choice to the UUIDs they correspond to (they are
# the column `id` in the GDC manifest files, which can be used to
# systematically construct sample sheets)
uuid="092c8a6d-aad5-41bf-b186-e68e613c0e89",
# a gdc_token is required for controlled access and all BAM files
# on GDC seem to be controlled access (adjust if this changes)
gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
# provide wanted `region=` or `gencode=` slices joined with `&`
slices="region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2",
# extra command line arguments passed to curl
extra=""
wrapper:
"0.65.0/bio/gdc-api/bam-slicing"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- BAM file UUIDs can be found via the GDC repository query, either by clicking on individual files or systematically by creating a cart and downloading a manifest file.
- Slicing can be performed using region syntax like ‘region=chr20:3000-4000’, gene name syntax like ‘gencode=BRCA2’ (this uses gene symbols of GENCODE v22) or ‘region=unmapped’ to get unmapped reads. Multiple such entries can be joined with ampersands, e.g. region=chr5:200-300&region=unmapped&gencode=BRCA1.
- All BAM data files in GDC are controlled access according to this GDC repository query, thus a GDC access token file is always required and must be provided via params, e.g. gdc_token="path/to/access_token.txt". Should this change in the future, feel free to adjust this wrapper or contact the original author.
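To reuse the rule across many samples, uuid can be given as a function of the wildcards, e.g. a lookup into a dict built from a GDC manifest. A minimal sketch (the sample name and mapping are hypothetical; the UUID is the one from the example above):
# hypothetical mapping from sample names to GDC UUIDs, e.g. parsed from a manifest file
uuids = {"sampleA": "092c8a6d-aad5-41bf-b186-e68e613c0e89"}
rule gdc_api_bam_slice_download:
    output:
        bam="raw/{sample}.bam",
    log:
        "logs/gdc-api/bam-slicing/{sample}.log"
    params:
        uuid=lambda wildcards: uuids[wildcards.sample],
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        slices="region=chr22&gencode=BRCA2",
        extra=""
    wrapper:
        "0.65.0/bio/gdc-api/bam-slicing"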
Authors¶
- David Lähnemann
Code¶
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
import os
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
uuid = snakemake.params.get("uuid", "")
if uuid == "":
raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")
token_file = snakemake.params.get("gdc_token", "")
if token_file == "":
raise ValueError(
"You need to provide a GDC data access token file via the 'token' in 'params'."
)
token = ""
with open(token_file) as tf:
token = tf.read()
os.environ["CURL_HEADER_TOKEN"] = "'X-Auth-Token: {}'".format(token)
slices = snakemake.params.get("slices", "")
if slices == "":
raise ValueError(
"You need to provide 'region=chr1:1000-2000' or 'gencode=BRCA2' slice(s) via the 'slices' in 'params'."
)
extra = snakemake.params.get("extra", "")
shell(
"curl --silent"
" --header $CURL_HEADER_TOKEN"
" 'https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}'"
" {extra}"
" --output {snakemake.output.bam} {log}"
)
if os.path.getsize(snakemake.output.bam) < 100000:
with open(snakemake.output.bam) as f:
if "error" in f.read():
shell("cat {snakemake.output.bam} {log}")
raise RuntimeError(
"Your GDC API request returned an error, check your log file for the error message."
)
GDC-CLIENT¶
For gdc-client, the following wrappers are available:
GDC DATA TRANSFER TOOL DATA DOWNLOAD¶
Download GDC data files with the gdc-client.
Software dependencies¶
- gdc-client ==1.5.0
Example¶
This wrapper can be used in the following way:
rule gdc_download:
output:
# the file extension (up to two components, here .maf.gz), has
# to uniquely map to one of the files downloaded for that UUID
"raw/{sample}.maf.gz"
log:
"logs/gdc-client/download/{sample}.log"
params:
# to use this rule flexibly, make uuid a function that maps your
# sample names of choice to the UUIDs they correspond to (they are
# the column `id` in the GDC manifest files, which can be used to
# systematically construct sample sheets)
uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
# a gdc_token is only required for controlled access samples,
# leave blank otherwise (`gdc_token=""`) or skip this param entirely
gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
# for valid extra command line arguments, check command line help or:
# https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
extra = ""
threads: 4
wrapper:
"0.65.0/bio/gdc-client/download"
rule gdc_download_bam:
output:
# specify all the downloaded files you want to keep, as all other
# downloaded files will be removed automatically e.g. for
# BAM data this could be
"raw/{sample}.bam",
"raw/{sample}.bam.bai",
"raw/{sample}.annotations.txt",
directory("raw/{sample}/logs")
log:
"logs/gdc-client/download/{sample}.log"
params:
# to use this rule flexibly, make uuid a function that maps your
# sample names of choice to the UUIDs they correspond to (they are
# the column `id` in the GDC manifest files, which can be used to
# systematically construct sample sheets)
uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
# a gdc_token is only required for controlled access samples,
# leave blank otherwise (`gdc_token=""`) or skip this param entirely
gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
# for valid extra command line arguments, check command line help or:
# https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
extra = ""
threads: 4
wrapper:
"0.65.0/bio/gdc-client/download"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- David Lähnemann
Code¶
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
from tempfile import TemporaryDirectory
import glob
uuid = snakemake.params.get("uuid", "")
if uuid == "":
raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")
extra = snakemake.params.get("extra", "")
token = snakemake.params.get("gdc_token", "")
if token != "":
token = "--token-file {}".format(token)
with TemporaryDirectory() as tempdir:
shell(
"gdc-client download"
" {token}"
" {extra}"
" -n {snakemake.threads} "
" --log-file {snakemake.log} "
" --dir {tempdir}"
" {uuid}"
)
for out_path in snakemake.output:
tmp_path = path.join(tempdir, uuid, path.basename(out_path))
if not path.exists(tmp_path):
(root, ext1) = path.splitext(out_path)
paths = glob.glob(path.join(tempdir, uuid, "*" + ext1))
if len(paths) > 1:
(root, ext2) = path.splitext(root)
paths = glob.glob(path.join(tempdir, uuid, "*" + ext2 + ext1))
if len(paths) == 0:
raise ValueError(
"{} file extension {} does not match any downloaded file.\n"
"Are you sure that UUID {} provides a file of such format?\n".format(
out_path, ext1, uuid
)
)
if len(paths) > 1:
raise ValueError(
"Found more than one downloaded file with extension '{}':\n"
"{}\n"
"Cannot match requested output file {} unambiguously.\n".format(
ext2 + ext1, paths, out_path
)
)
tmp_path = paths[0]
shell("mv {tmp_path} {out_path}")
GENOMEPY¶
Download genomes the easy way: https://github.com/vanheeringen-lab/genomepy
Software dependencies¶
- bioconda::genomepy==0.8.3
Example¶
This wrapper can be used in the following way:
rule genomepy:
output:
multiext("{assembly}/{assembly}", ".fa", ".fa.fai", ".fa.sizes", ".gaps.bed",
".annotation.gtf.gz", ".blacklist.bed")
log:
"logs/genomepy_{assembly}.log"
params:
provider="UCSC" # optional, defaults to ucsc. Choose from ucsc, ensembl, and ncbi
cache: True # mark as eligible for between workflow caching
wrapper:
"0.65.0/bio/genomepy"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Maarten van der Sande
Code¶
__author__ = "Maarten van der Sande"
__copyright__ = "Copyright 2020, Maarten van der Sande"
__email__ = "M.vanderSande@science.ru.nl"
__license__ = "MIT"
from snakemake.shell import shell
# Optional parameters
provider = snakemake.params.get("provider", "UCSC")
# set options for plugins
all_plugins = "blacklist,bowtie2,bwa,gmap,hisat2,minimap2,star"
req_plugins = ","
if any(["blacklist" in out for out in snakemake.output]):
req_plugins = "blacklist,"
annotation = ""
if any(["annotation" in out for out in snakemake.output]):
annotation = "--annotation"
# parse the genome dir
genome_dir = "./"
if snakemake.output[0].count("/") > 1:
genome_dir = "/".join(snakemake.output[0].split("/")[:-1])
log = snakemake.log
# Finally execute genomepy
shell(
"""
# set a trap so we can reset to original user's settings
active_plugins=$(genomepy config show | grep -Po '(?<=- ).*' | paste -s -d, -) || echo ""
trap "genomepy plugin disable {{{all_plugins}}} >> {log} 2>&1;\
genomepy plugin enable {{$active_plugins,}} >> {log} 2>&1" EXIT
# disable all, then enable the ones we need
genomepy plugin disable {{{all_plugins}}} > {log} 2>&1
genomepy plugin enable {{{req_plugins}}} >> {log} 2>&1
# install the genome
genomepy install {snakemake.wildcards.assembly} \
{provider} {annotation} -g {genome_dir} >> {log} 2>&1
"""
)
GRIDSS¶
For gridss, the following wrappers are available:
GRIDSS ASSEMBLE¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. assemble
performs GRIDSS breakend assembly. Documentation at: https://github.com/PapenfussLab/gridss
Software dependencies¶
- gridss ==2.9.4
Example¶
This wrapper can be used in the following way:
WORKING_DIR = "working_dir"
samples = ["A", "B"]
preprocess_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".idsv_metrics",
".insert_size_histogram.pdf",
".insert_size_metrics",
".mapq_metrics",
".sv.bam",
".sv.bam.bai",
".sv_metrics",
".tag_metrics",
)
assembly_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".downsampled_0.bed",
".excluded_0.bed",
".idsv_metrics",
".mapq_metrics",
".quality_distribution.pdf",
".quality_distribution_metrics",
".subsetCalled_0.bed",
".sv.bam",
".sv.bam.bai",
".tag_metrics",
)
reference_index_endings = (".amb",".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
rule gridss_assemble:
input:
bams=expand("mapped/{sample}.bam", sample=samples),
bais=expand("mapped/{sample}.bam.bai", sample=samples),
reference="reference/genome.fasta",
dictionary="reference/genome.dict",
indices=multiext("reference/genome.fasta", *reference_index_endings),
preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings)
output:
assembly="assembly/group.bam",
assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
params:
extra="--jvmheap 1g",
workingdir=WORKING_DIR
log:
"log/gridss/assemble/group.log"
threads:
100
wrapper:
"0.65.0/bio/gridss/assemble"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss assemble"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference")
if not snakemake.params.workingdir:
raise ValueError("Please set params.workingdir to provide a working directory.")
if not snakemake.input.reference:
raise ValueError("Please set input.reference to provide reference genome.")
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
if not path.exists("{}{}".format(reference, ending)):
raise ValueError(
"{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
reference=reference, ending=ending
)
)
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
raise ValueError(
"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
dictionary=dictionary
)
)
shell(
"(gridss -s assemble " # Tool
"--reference {reference} " # Reference
"--threads {snakemake.threads} " # Threads
"--workingdir {snakemake.params.workingdir} " # Working directory
"--assembly {snakemake.output.assembly} " # Assembly output
"{snakemake.input.bams} "
"{extra}) {log}"
)
GRIDSS CALL¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. call
performs variant calling. Documentation at: https://github.com/PapenfussLab/gridss
Software dependencies¶
- gridss ==2.9.4
- cpulimit =0.2
Example¶
This wrapper can be used in the following way:
WORKING_DIR = "working_dir"
samples = ["A", "B"]
preprocess_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".idsv_metrics",
".insert_size_histogram.pdf",
".insert_size_metrics",
".mapq_metrics",
".sv.bam",
".sv.bam.bai",
".sv_metrics",
".tag_metrics",
)
assembly_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".downsampled_0.bed",
".excluded_0.bed",
".idsv_metrics",
".mapq_metrics",
".quality_distribution.pdf",
".quality_distribution_metrics",
".subsetCalled_0.bed",
".sv.bam",
".sv.bam.bai",
".tag_metrics",
)
reference_index_endings = (".amb",".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
rule gridss_call:
input:
bams=expand("mapped/{sample}.bam", sample=samples),
bais=expand("mapped/{sample}.bam.bai", sample=samples),
reference="reference/genome.fasta",
dictionary="reference/genome.dict",
indices=multiext("reference/genome.fasta", *reference_index_endings),
preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings),
assembly="assembly/group.bam",
assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
output:
vcf="vcf/group.vcf",
idx="vcf/group.vcf.idx",
tmpidx=temp(WORKING_DIR + "/group.vcf.gridss.working/group.vcf.allocated.vcf.idx") # be aware the group occurs two times here
params:
extra="--jvmheap 1g",
workingdir=WORKING_DIR
log:
"log/gridss/call/group.log"
threads:
100
wrapper:
"0.65.0/bio/gridss/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss call"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
raise ValueError("Please set params.workingdir to provide a working directory.")
if not snakemake.input.reference:
raise ValueError("Please set input.reference to provide reference genome.")
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
if not path.exists("{}{}".format(reference, ending)):
raise ValueError(
"{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
reference=reference, ending=ending
)
)
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
raise ValueError(
"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
dictionary=dictionary
)
)
shell(
"(export JAVA_OPTS='-XX:ActiveProcessorCount={snakemake.threads}' & "
"gridss -s call " # Tool
"--reference {reference} " # Reference
"--threads {snakemake.threads} " # Threads
"--workingdir {snakemake.params.workingdir} " # Working directory
"--assembly {snakemake.input.assembly} " # Assembly input from gridss assemble
"--output {snakemake.output.vcf} " # Assembly vcf
"{snakemake.input.bams} "
"{extra}) {log}"
)
GRIDSS PREPROCESS¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The preprocess subcommand pre-processes input BAM files and can be run per input file. Documentation at: https://github.com/PapenfussLab/gridss
Software dependencies¶
- gridss ==2.9.4
Example¶
This wrapper can be used in the following way:
WORKING_DIR="working_dir"
rule gridss_preprocess:
input:
bam="mapped/{sample}.bam",
bai="mapped/{sample}.bam.bai",
reference="reference/genome.fasta",
dictionary="reference/genome.dict",
refindex=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
output:
multiext("{WORKING_DIR}/{sample}.bam.gridss.working/{sample}.bam", ".cigar_metrics", ".coverage.blacklist.bed", ".idsv_metrics", ".insert_size_histogram.pdf", ".insert_size_metrics", ".mapq_metrics", ".sv.bam", ".sv.bam.bai", ".sv_metrics", ".tag_metrics")
params:
extra="--jvmheap 1g",
workingdir=WORKING_DIR
log:
"log/gridss/preprocess/{WORKING_DIR}/{sample}.preprocess.log"
threads:
8
wrapper:
"0.65.0/bio/gridss/preprocess"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss preprocess"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
raise ValueError("Please set params.workingdir to provide a working directory.")
if not snakemake.input.reference:
raise ValueError("Please set input.reference to provide reference genome.")
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
if not path.exists("{}{}".format(reference, ending)):
raise ValueError(
"{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
reference=reference, ending=ending
)
)
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
raise ValueError(
"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
dictionary=dictionary
)
)
shell(
"(gridss -s preprocess " # Tool
"--reference {reference} " # Reference
"--threads {snakemake.threads} "
"--workingdir {snakemake.params.workingdir} "
"{snakemake.input.bam} "
"{extra}) {log}"
)
GRIDSS SETUPREFERENCE¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. The setupreference subcommand is a once-off setup step that generates additional files in the same directory as the reference. WARNING: multiple instances of GRIDSS attempting to perform setupreference at the same time will result in file corruption. Make sure these files are generated before running parallel GRIDSS jobs. Documentation at: https://github.com/PapenfussLab/gridss
Software dependencies¶
- gridss ==2.9.4
Example¶
This wrapper can be used in the following way:
rule gridss_setupreference:
input:
reference="reference/genome.fasta",
dictionary="reference/genome.dict",
indices=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa")
output:
multiext("reference/genome.fasta", ".gridsscache", ".img")
params:
extra="--jvmheap 1g"
log:
"log/gridss/setupreference.log"
wrapper:
"0.65.0/bio/gridss/setupreference"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss setupreference"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference", None)
if not snakemake.input.reference:
raise ValueError("A reference genome has to be provided!")
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
if not path.exists("{}{}".format(reference, ending)):
raise ValueError(
"{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
reference=reference, ending=ending
)
)
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
raise ValueError(
"{dictionary}.dict missing. Please make sure the reference dictionary was properly created. This can be accomplished for example by CreateSequenceDictionary.jar from Picard".format(
dictionary=dictionary
)
)
shell(
"(gridss -s setupreference " # Tool
"--reference {reference} " # Reference
"{extra}) {log}"
)
HAP.PY¶
For hap.py, the following wrappers are available:
PRE.PY¶
Preprocessing/normalisation of vcf/bcf files. Part of the hap.py suite by Illumina (see https://github.com/Illumina/hap.py/blob/master/doc/normalisation.md).
Software dependencies¶
- hap.py =0.3.10
Example¶
This wrapper can be used in the following way:
rule preprocess_variants:
input:
##vcf/bcf
variants="variants.vcf"
output:
"normalized/variants.vcf"
params:
## path to reference genome
genome="genome.fasta",
## parameters such as -L to left-align variants
extra="-L"
threads: 2
wrapper:
"0.65.0/bio/hap.py/pre.py"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"(pre.py"
" --threads {snakemake.threads}"
" -r {snakemake.params.genome}"
" {extra}"
" {snakemake.input.variants}"
" {snakemake.output})"
" {log}"
)
HISAT2¶
For hisat2, the following wrappers are available:
HISAT2 ALIGN¶
Map reads with hisat2.
Software dependencies¶
- hisat2 ==2.1.0
- samtools ==1.9
Example¶
This wrapper can be used in the following way:
rule hisat2_align:
input:
idx = "index/",
reads=["reads/{sample}_R1.fastq", "reads/{sample}_R2.fastq"]
output:
"mapped/{sample}.bam"
log:
"logs/hisat2_align_{sample}.log"
params:
extra = ""
threads: 2
wrapper:
"0.65.0/bio/hisat2/align"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The -S flag must not be used, since the output is piped directly to samtools for compression.
- The --threads/-p flag must not be used, since the number of threads is set separately via the snakemake threads directive.
- The wrapper does not yet handle SRA input accessions.
- No checking of the reference index files is done, since the actual number of files may differ depending on the reference sequence size. This is also why the index is supplied as a single prefix path rather than as a list of the individual index files.
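Since the wrapper builds the -U or -1/-2 input flags from the number of entries in reads (see the code below), single-end data can be passed as a single file or a one-element list. A minimal sketch, with illustrative file names:

rule hisat2_align_single_end:
    input:
        idx="index/",
        reads=["reads/{sample}.fastq"]  # one entry, so the wrapper uses -U
    output:
        "mapped_se/{sample}.bam"
    log:
        "logs/hisat2_align_se_{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "0.65.0/bio/hisat2/align"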
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()
# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
input_flags = "-1 {0} -2 {1}".format(*reads)
else:
raise RuntimeError(
"Reads parameter must contain at least 1 and at most 2" " input files."
)
# Executed shell command
shell(
"(hisat2 {extra} "
"--threads {snakemake.threads} "
" -x {snakemake.input.idx} {input_flags} "
" | samtools view -Sbh -o {snakemake.output[0]} -) "
" {log}"
)
HISAT2 INDEX¶
Create index with hisat2.
Software dependencies¶
- hisat2 ==2.1.0
- samtools ==1.9
Example¶
This wrapper can be used in the following way:
rule hisat2_index:
input:
fasta = "{genome}.fasta"
output:
directory("index_{genome}")
params:
prefix = "index_{genome}/"
log:
"logs/hisat2_index_{genome}.log"
threads: 2
wrapper:
"0.65.0/bio/hisat2/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Joël Simoneau
Code¶
"""Snakemake wrapper for HISAT2 index"""
__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file or a sequence is required"
input_seq = ""
if not "." in fasta:
input_seq += "-c "
input_seq += ",".join(fasta) if isinstance(fasta, list) else fasta
hisat_dir = snakemake.params.get("prefix", "")
if hisat_dir:
os.makedirs(hisat_dir, exist_ok=True)
shell(
"hisat2-build {extra} "
"-p {snakemake.threads} "
"{input_seq} "
"{snakemake.params.prefix} "
"{log}"
)
HMMER¶
For hmmer, the following wrappers are available:
HMMBUILD¶
hmmbuild: construct profile HMM(s) from multiple sequence alignment(s)
Software dependencies¶
- hmmer=3.2.1
Example¶
This wrapper can be used in the following way:
rule hmmbuild_profile:
input:
"test-profile.sto"
output:
"test-profile.hmm"
log:
"logs/test-profile-hmmbuild.log"
params:
extra="",
threads: 4
wrapper:
"0.65.0/bio/hmmer/hmmbuild"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmbuild"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
" hmmbuild {extra} --cpu {snakemake.threads} "
" {snakemake.output} {snakemake.input} {log} "
)
HMMPRESS¶
Format an HMM database into a binary format for hmmscan.
Software dependencies¶
- hmmer=3.2.1
Example¶
This wrapper can be used in the following way:
rule hmmpress_profile:
input:
"test-profile.hmm"
output:
"test-profile.hmm.h3f",
"test-profile.hmm.h3i",
"test-profile.hmm.h3m",
"test-profile.hmm.h3p"
log:
"logs/hmmpress.log"
params:
extra="",
threads: 4
wrapper:
"0.65.0/bio/hmmer/hmmpress"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmpress"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# -f Force; overwrites any previous hmmpress-ed datafiles. The default is to complain about any existing files and ask you to delete them first.
shell("hmmpress -f {snakemake.input} {log}")
HMMSCAN¶
search protein sequence(s) against a protein profile database
Software dependencies¶
- hmmer=3.2.1
Example¶
This wrapper can be used in the following way:
rule hmmscan_profile:
input:
fasta="test-protein.fa",
profile="test-profile.hmm.h3f",
output:
# only one of these is required
tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
pfamtblout="test-prot-pfamtbl.txt", # save table of hits and domains to file, in Pfam format <f>
outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
log:
"logs/hmmscan.log"
params:
evalue_threshold=0.00001,
# if bitscore threshold provided, hmmscan will use that instead
#score_threshold=50,
extra="",
threads: 4
wrapper:
"0.65.0/bio/hmmer/hmmscan"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmscan"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
profile = snakemake.input.get("profile")
profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '
# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
out_cmd += " -o {} ".format(outfile)
# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
out_cmd += " --tblout {} ".format(tblout)
# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
out_cmd += " --domtblout {} ".format(domtblout)
# save table of hits and domains to file, in Pfam format <f>
pfamtblout = snakemake.output.get("pfamtblout", "")
if pfamtblout:
out_cmd += " --pfamtblout {} ".format(pfamtblout)
## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report models >= this score threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
score_threshold = snakemake.params.get("score_threshold", "")
if score_threshold:
thresh_cmd = " -T {} ".format(float(score_threshold))
else:
thresh_cmd = " -E {} ".format(float(evalue_threshold))
# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"hmmscan {out_cmd} {thresh_cmd} --cpu {snakemake.threads}"
" {extra} {profile} {snakemake.input.fasta} {log}"
)
HMMSEARCH¶
search profile(s) against a sequence database
Software dependencies¶
- hmmer=3.2.1
Example¶
This wrapper can be used in the following way:
rule hmmsearch_profile:
input:
fasta="test-protein.fa",
profile="test-profile.hmm.h3f",
output:
# only one of these is required
tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
alignment_hits="test-prot-alignment-hits.txt", # Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
log:
"logs/hmmsearch.log"
params:
evalue_threshold=0.00001,
# if bitscore threshold provided, hmmsearch will use that instead
#score_threshold=50,
extra="",
threads: 4
wrapper:
"0.65.0/bio/hmmer/hmmsearch"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmsearch"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
profile = snakemake.input.get("profile")
profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '
# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
out_cmd += " -o {} ".format(outfile)
# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
out_cmd += " --tblout {} ".format(tblout)
# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
out_cmd += " --domtblout {} ".format(domtblout)
# Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
alignment_hits = snakemake.output.get("alignment_hits", "")
if alignment_hits:
out_cmd += " -A {} ".format(alignment_hits)
## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report models >= this score threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
score_threshold = snakemake.params.get("score_threshold", "")
if score_threshold:
thresh_cmd = " -T {} ".format(float(score_threshold))
else:
thresh_cmd = " -E {} ".format(float(evalue_threshold))
# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
" hmmsearch --cpu {snakemake.threads} "
" {out_cmd} {thresh_cmd} {extra} {profile} "
" {snakemake.input.fasta} {log}"
)
HOMER¶
For homer, the following wrappers are available:
HOMER FINDPEAKS¶
Find ChIP- or ATAC-Seq peaks with the HOMER suite. For more information, please see the documentation.
Software dependencies¶
- homer ==4.11
Example¶
This wrapper can be used in the following way:
rule homer_findPeaks:
input:
# tagDirectory of sample
tag="tagDir/{sample}",
# tagDirectory of control background sample - optional
control="tagDir/control"
output:
"{sample}_peaks.txt"
params:
# one of 7 basic modes of operation, see homer manual
style="histone",
extra="" # optional params, see homer manual
log:
"logs/findPeaks/{sample}.log"
wrapper:
"0.65.0/bio/homer/findPeaks"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
control = snakemake.input.get("control", "")
if control == "":
control_command = ""
else:
control_command = "-i " + control
shell(
"(findPeaks"
" {snakemake.input.tag}"
" -style {snakemake.params.style}"
" {extra}"
" {control_command}"
" -o {snakemake.output})"
" {log}"
)
HOMER GETDIFFERENTIALPEAKS¶
Detect differentially bound ChIP peaks between samples. For more information, please see the documentation.
Software dependencies¶
- homer ==4.11
Example¶
This wrapper can be used in the following way:
rule homer_getDifferentialPeaks:
input:
# peak/bed file to be tested
peaks="{sample}.peaks.bed",
# tagDirectory of first sample
first="tagDir/{sample}",
# tagDirectory of sample to compare
second="tagDir/second"
output:
"{sample}_diffPeaks.txt"
params:
extra="" # optional params, see homer manual
log:
"logs/diffPeaks/{sample}.log"
wrapper:
"0.65.0/bio/homer/getDifferentialPeaks"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(getDifferentialPeaks"
" {snakemake.input.peaks}"
" {snakemake.input.first}"
" {snakemake.input.second}"
" {extra}"
" > {snakemake.output})"
" {log}"
)
HOMER MAKETAGDIRECTORY¶
Create a tag directory with the HOMER suite. For more information, please see the documentation.
Software dependencies¶
- homer ==4.11
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule homer_makeTagDir:
input:
# input bam, can be one or a list of files
bam="{sample}.bam",
output:
directory("tagDir/{sample}")
params:
extra="" # optional params, see homer manual
log:
"logs/makeTagDir/{sample}.log"
wrapper:
"0.65.0/bio/homer/makeTagDirectory"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(makeTagDirectory" " {snakemake.output}" " {extra}" " {snakemake.input})" " {log}"
)
HOMER MERGEPEAKS¶
Merge ChIP-Seq peaks from multiple peak files. For more information, please see the documentation. Please be aware that this wrapper does not yet support use of the -prefix
parameter.
Software dependencies¶
- homer ==4.11
Example¶
This wrapper can be used in the following way:
rule homer_mergePeaks:
input:
# input peak files
"peaks/{sample1}.peaks",
"peaks/{sample2}.peaks"
output:
"merged/{sample1}_{sample2}.peaks"
params:
extra="-d given" # optional params, see homer manual
log:
"logs/mergePeaks/{sample1}_{sample2}.log"
wrapper:
"0.65.0/bio/homer/mergePeaks"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
class PrefixNotSupportedError(Exception):
pass
if "-prefix" in extra:
raise PrefixNotSupportedError(
"The use of the -prefix parameter is not yet supported in this wrapper"
)
shell("(mergePeaks" " {snakemake.input}" " {extra}" " > {snakemake.output})" " {log}")
IGV-REPORTS¶
Create self-contained igv.js HTML pages.
Software dependencies¶
- igv-reports =0.9.1
Example¶
This wrapper can be used in the following way:
rule igv_report:
input:
fasta="minigenome.fa",
vcf="variants.vcf",
# any number of additional optional tracks, see igv-reports manual
tracks=["alignments.bam"]
output:
"igv-report.html"
params:
extra="" # optional params, see igv-reports manual
log:
"logs/igv-report.log"
wrapper:
"0.65.0/bio/igv-reports"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
"""Snakemake wrapper for igv-reports."""
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
tracks = snakemake.input.get("tracks", [])
if tracks:
if isinstance(tracks, str):
tracks = [tracks]
tracks = "--tracks {}".format(" ".join(tracks))
shell(
"create_report {extra} --standalone --output {snakemake.output[0]} {snakemake.input.vcf} {snakemake.input.fasta} {tracks} {log}"
)
INFERNAL¶
For infernal, the following wrappers are available:
INFERNAL CMPRESS¶
Starting from a CM database <cmfile> in standard Infernal-1.1 format, construct binary compressed datafiles for cmscan. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
Software dependencies¶
- infernal=1.1.2
Example¶
This wrapper can be used in the following way:
rule infernal_cmpress:
input:
"test-covariance-model.cm"
output:
"test-covariance-model.cm.i1i",
"test-covariance-model.cm.i1f",
"test-covariance-model.cm.i1m",
"test-covariance-model.cm.i1p"
log:
"logs/cmpress.log"
params:
extra="",
wrapper:
"0.65.0/bio/infernal/cmpress"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Infernal CMpress"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# -F enables overwrite of old (otherwise cmpress will fail if old versions exist)
shell("cmpress -F {snakemake.input} {log}")
INFERNAL CMSCAN¶
cmscan is used to search sequences against collections of covariance models that have been prepared with cmpress. The output format is designed to be human-readable, but is often so voluminous that reading it is impractical, and parsing it is a pain. The --tblout option saves output in a simple tabular format that is concise and easier to parse. The -o option allows redirecting the main output, including throwing it away in /dev/null. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
Software dependencies¶
- infernal=1.1.2
Example¶
This wrapper can be used in the following way:
rule cmscan_profile:
input:
fasta="test-transcript.fa",
profile="test-covariance-model.cm.i1i"
output:
tblout="tr-infernal-tblout.txt",
log:
"logs/cmscan.log"
params:
evalue_threshold=10, # In the per-target output, report target sequences with an E-value of <= <x>. default=10.0 (on average, ~10 false positives reported per query)
extra= "",
#score_threshold=50, # Instead of thresholding per-CM output on E-value, report target sequences with a bit score of >= <x>.
threads: 4
wrapper:
"0.65.0/bio/infernal/cmscan"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Infernal CMscan"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
profile = snakemake.input.get("profile")
profile = profile.rsplit(".i", 1)[0]
assert profile.endswith(".cm"), 'your profile file should end with ".cm"'
# direct output to file <f>, not stdout
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
out_cmd += " -o {} ".format(outfile)
# save parseable table of hits to file <s>
tblout = snakemake.output.get("tblout", "")
if tblout:
out_cmd += " --tblout {} ".format(tblout)
## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report <= this evalue threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 10) # use cmscan default
# report >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")
if score_threshold:
thresh_cmd = f" -T {float(score_threshold)} "
else:
thresh_cmd = f" -E {float(evalue_threshold)} "
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cmscan {out_cmd} {thresh_cmd} {extra} --cpu {snakemake.threads} {profile} {snakemake.input.fasta} {log}"
)
JANNOVAR¶
Annotate predicted effect of nucleotide changes with Jannovar
Software dependencies¶
- jannovar-cli ==0.31
Example¶
This wrapper can be used in the following way:
rule jannovar:
input:
vcf="{sample}.vcf",
pedigree="pedigree_ar.ped" # optional, contains familial relationships
output:
"jannovar/{sample}.vcf.gz"
log:
"logs/jannovar/{sample}.log"
params:
database="hg19_small.ser", # path to jannovar reference dataset
extra="--show-all" # optional parameters
wrapper:
"0.65.0/bio/jannovar"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
pedigree = snakemake.input.get("pedigree", "")
if pedigree:
pedigree = '--pedigree-file "%s"' % pedigree
shell(
"jannovar annotate-vcf --database {snakemake.params.database}"
" --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
" {pedigree} {extra} {log}"
)
KALLISTO¶
For kallisto, the following wrappers are available:
KALLISTO INDEX¶
Index a transcriptome using kallisto.
Software dependencies¶
- kallisto ==0.45.0
Example¶
This wrapper can be used in the following way:
rule kallisto_index:
input:
fasta = "{transcriptome}.fasta"
output:
index = "{transcriptome}.idx"
params:
extra = "--kmer-size=5"
log:
"logs/kallisto_index_{transcriptome}.log"
threads: 1
wrapper:
"0.65.0/bio/kallisto/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Joël Simoneau
Code¶
"""Snakemake wrapper for Kallisto index"""
__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file is required"
fasta = " ".join(fasta) if isinstance(fasta, list) else fasta
shell(
"kallisto index " # Tool
"{extra} " # Optional parameters
"--index={snakemake.output.index} " # Output file
"{fasta} " # Input FASTA files
"{log}" # Logging
)
KALLISTO QUANT¶
Pseudoalign reads and quantify transcripts using kallisto.
Software dependencies¶
- kallisto ==0.45.0
Example¶
This wrapper can be used in the following way:
rule kallisto_quant:
input:
fastq = ["reads/{exp}_R1.fastq", "reads/{exp}_R2.fastq"],
index = "index/transcriptome.idx"
output:
directory('quant_results_{exp}')
params:
extra = ""
log:
"logs/kallisto_quant_{exp}.log"
threads: 1
wrapper:
"0.65.0/bio/kallisto/quant"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Joël Simoneau
Code¶
"""Snakemake wrapper for Kallisto quant"""
__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Allowing for multiple FASTQ files
fastq = snakemake.input.get("fastq")
assert fastq is not None, "input-> a FASTQ-file is required"
fastq = " ".join(fastq) if isinstance(fastq, list) else fastq
shell(
"kallisto quant " # Tool
"{extra} " # Optional parameters
"--threads={snakemake.threads} " # Number of threads
"--index={snakemake.input.index} " # Input file
"--output-dir={snakemake.output} " # Output directory
"{fastq} " # Input FASTQ files
"{log}" # Logging
)
LAST¶
For last, the following wrappers are available:
LASTAL¶
LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)
Software dependencies¶
- last=874
Example¶
This wrapper can be used in the following way:
rule lastal_nucl_x_nucl:
input:
data="test-transcript.fa",
lastdb="test-transcript.fa.prj"
output:
# only one of these outputs is allowed
maf="test-transcript.maf",
#tab="test-transcript.tab",
#blasttab="test-transcript.blasttab",
#blasttabplus="test-transcript.blasttabplus",
params:
#Report alignments that are expected by chance at most once per LENGTH query letters. By default, LAST reports alignments that are expected by chance at most once per million query letters (for a given database). http://last.cbrc.jp/doc/last-evalues.html
D_length=1000000,
extra=""
log:
"logs/lastal/test.log"
threads: 8
wrapper:
"0.65.0/bio/last/lastal"
rule lastal_nucl_x_prot:
input:
data="test-transcript.fa",
lastdb="test-protein.fa.prj"
output:
# only one of these outputs is allowed
maf="test-tr-x-prot.maf"
#tab="test-tr-x-prot.tab",
#blasttab="test-tr-x-prot.blasttab",
#blasttabplus="test-tr-x-prot.blasttabplus",
params:
frameshift_cost=15, # Align DNA queries to protein reference sequences using the specified frameshift cost. 15 is reasonable. Special case: -F0 means DNA-versus-protein alignment without frameshifts, which is faster.
extra="",
log:
"logs/lastal/test.log"
threads: 8
wrapper:
"0.65.0/bio/last/lastal"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
""" Snakemake wrapper for lastal """
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# http://last.cbrc.jp/doc/last-evalues.html
d_len = float(snakemake.params.get("D_length", 1000000)) # last default
# set output file formats
maf_out = snakemake.output.get("maf", "")
tab_out = snakemake.output.get("tab", "")
btab_out = snakemake.output.get("blasttab", "")
btabplus_out = snakemake.output.get("blasttabplus", "")
outfiles = [maf_out, tab_out, btab_out, btabplus_out]
# TAB, MAF, BlastTab, BlastTab+ (default=MAF)
assert (
list(map(bool, outfiles)).count(True) == 1
), "please specify ONE output file using one of: 'maf', 'tab', 'blasttab', or 'blasttabplus' keywords in the output field)"
out_cmd = ""
if maf_out:
out_cmd = "-f {}".format("MAF")
outF = maf_out
elif tab_out:
out_cmd = "-f {}".format("TAB")
outF = tab_out
elif btab_out:
out_cmd = "-f {}".format("BlastTab")
outF = btab_out
elif btabplus_out:
out_cmd = "-f {}".format("BlastTab+")
outF = btabplus_out
# only add the frameshift option if a cost was provided
frameshift_cost = snakemake.params.get("frameshift_cost", "")
f_cmd = "-F {}".format(frameshift_cost) if frameshift_cost else ""
lastdb_name = str(snakemake.input["lastdb"]).rsplit(".", 1)[0]
shell(
"lastal -D {d_len} {out_cmd} {f_cmd} -P {snakemake.threads} {extra} {lastdb_name} {snakemake.input.data} > {outF} {log}"
)
LASTDB¶
LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)
Software dependencies¶
- last=874
Example¶
This wrapper can be used in the following way:
rule lastdb_transcript:
input:
"test-transcript.fa"
output:
"test-transcript.fa.prj",
params:
protein_input=False,
extra=""
log:
"logs/lastdb/test-transcript.log"
wrapper:
"0.65.0/bio/last/lastdb"
rule lastdb_protein:
input:
"test-protein.fa"
output:
"test-protein.fa.prj",
params:
protein_input=True,
extra=""
log:
"logs/lastdb/test-protein.log"
wrapper:
"0.65.0/bio/last/lastdb"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
protein_cmd = ""
protein = snakemake.params.get("protein_input", False)
if protein:
protein_cmd = " -p "
shell("lastdb {extra} {protein_cmd} -P {snakemake.threads} {snakemake.input} {log}")
LOFREQ¶
For lofreq, the following wrappers are available:
LOFREQ CALL¶
simply call variants
Software dependencies¶
- samtools ==1.6
- lofreq ==2.1.3.1
Example¶
This wrapper can be used in the following way:
rule lofreq:
input:
bam="data/{sample}.bam",
bai="data/{sample}.bai"
output:
"calls/{sample}.vcf"
log:
"logs/lofreq_call/{sample}.log"
params:
ref="data/genome.fasta",
extra=""
threads: 8
wrapper:
"0.65.0/bio/lofreq/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
ref = snakemake.params.get("ref", None)
if ref is None:
raise ValueError("A reference must be provided")
bam_input = snakemake.input.bam
bai_input = snakemake.input.bai
if bam_input is None:
raise ValueError("Missing bam input file!")
if bai_input is None:
raise ValueError("Missing bai input file!")
output_file = snakemake.output[0]
if output_file is None:
raise ValueError("Missing output file")
elif len(snakemake.output) != 1:
raise ValueError("Only expecting one output file: " + str(output_file) + "!")
shell(
"lofreq call-parallel "
" --pp-threads {snakemake.threads}"
" -f {ref}"
" {bam_input}"
" -o {output_file}"
" {extra}"
" {log}"
)
MACS2¶
For macs2, the following wrappers are available:
MACS2 CALLPEAK¶
MACS2 callpeak is a model-based analysis tool for ChIP-sequencing that calls peaks from alignment results. For usage information about MACS2 callpeak, please see the documentation and the command line help. For more information about MACS2, also see the source code and the published article. Depending on the selected output extension(s), the corresponding option(s) will be set automatically (please see the overview below). Please note that some extensions are incompatible with each other, because they require the --broad option to be either enabled or disabled.
Extension for the output files, with description, format and the option it implies:
- NAME_peaks.xls: a table with information about called peaks (format: excel)
- NAME_control_lambda.bdg: local biases estimated for each genomic location from the control sample (format: bedGraph; option: --bdg or -B)
- NAME_treat_pileup.bdg: pileup signals from the treatment sample (format: bedGraph; option: --bdg or -B)
- NAME_peaks.broadPeak: similar to the _peaks.narrowPeak file, except for missing the annotated peak summits (format: BED 6+3; option: --broad)
- NAME_peaks.gappedPeak: contains the broad regions and narrow peaks (format: BED 12+3; option: --broad)
- NAME_peaks.narrowPeak: contains the peak locations, peak summit, p-value and q-value (format: BED 6+4; written if --broad is not set)
- NAME_summits.bed: peak summit locations for every peak (format: BED; written if --broad is not set)
Software dependencies¶
- macs2>=2.2
Example¶
This wrapper can be used in the following way:
rule callpeak:
input:
treatment="samples/a.bam", # required: treatment sample(s)
control="samples/b.bam" # optional: control sample(s)
output:
# all output files must share the same basename and only differ by their extension
# Usable extensions (and which tools they implicitly call) are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
multiext("callpeak/basename",
"_peaks.xls", ### required
### optional output files
"_peaks.narrowPeak",
"_summits.bed"
)
log:
"logs/macs2/callpeak.log"
params:
"-f BAM -g hs --nomodel"
wrapper:
"0.65.0/bio/macs2/callpeak"
rule callpeak_options:
input:
treatment="samples/a.bam", # required: treatment sample(s)
control="samples/b.bam" # optional: control sample(s)
output:
# all output files must share the same basename and only differ by their extension
# Usable extensions (and which tools they implicitly call) are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
multiext("callpeak_options/basename",
"_peaks.xls", ### required
### optional output files
# these output extensions internally set the --bdg or -B option:
"_treat_pileup.bdg",
"_control_lambda.bdg",
# these output extensions internally set the --broad option:
"_peaks.broadPeak",
"_peaks.gappedPeak"
)
log:
"logs/macs2/callpeak.log"
params:
"-f BAM -g hs --nomodel"
wrapper:
"0.65.0/bio/macs2/callpeak"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import os
import sys
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
in_contr = snakemake.input.get("control")
params = "{}".format(snakemake.params)
opt_input = ""
out_dir = ""
ext = "_peaks.xls"
out_file = [o for o in snakemake.output if o.endswith(ext)][0]
out_name = os.path.basename(out_file[: -len(ext)])
out_dir = os.path.dirname(out_file)
if in_contr:
opt_input = "-c {contr}".format(contr=in_contr)
if out_dir:
out_dir = "--outdir {dir}".format(dir=out_dir)
if any(out.endswith(("_peaks.narrowPeak", "_summits.bed")) for out in snakemake.output):
if any(
out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak"))
for out in snakemake.output
):
sys.exit(
"Output files with _peaks.narrowPeak and/or _summits.bed extensions cannot be created together with _peaks.broadPeak and/or _peaks.gappedPeak extended output files.\n"
"For usable extensions please see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.\n"
)
else:
if " --broad" in params:
sys.exit(
"If --broad option in params is given, the _peaks.narrowPeak and _summits.bed files will not be created. \n"
"Remove --broad option from params if these files are needed.\n"
)
if any(
out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak")) for out in snakemake.output
):
if "--broad" not in params:
params += " --broad "
if any(
out.endswith(("_treat_pileup.bdg", "_control_lambda.bdg"))
for out in snakemake.output
):
if all(p not in params for p in ["--bdg", "-B"]):
params += " --bdg "
else:
if any(p in params for p in ["--bdg", "-B"]):
sys.exit(
"If --bdg or -B option in params is given, the _control_lambda.bdg and _treat_pileup.bdg extended files must be specified in output. \n"
)
shell(
"(macs2 callpeak "
"-t {snakemake.input.treatment} "
"{opt_input} "
"{out_dir} "
"-n {out_name} "
"{params}) {log}"
)
MINIMAP2¶
For minimap2, the following wrappers are available:
MINIMAP2¶
A versatile pairwise aligner for genomic and spliced nucleotide sequences. See https://lh3.github.io/minimap2
Software dependencies¶
- minimap2 ==2.17
Example¶
This wrapper can be used in the following way:
rule minimap2:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query=["query/reads1.fasta", "query/reads2.fasta"]
output:
"aligned/{input1}_aln.paf"
log:
"logs/minimap2/{input1}.log"
params:
extra="-x map-pb" # optional
threads: 3
wrapper:
"0.65.0/bio/minimap2/aligner"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tom Poorten
- Michael Hall
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
inputQuery = " ".join(snakemake.input.query)
shell(
"(minimap2 -t {snakemake.threads} {extra} -o {snakemake.output[0]} "
"{snakemake.input.target} {inputQuery}) {log}"
)
MINIMAP2 INDEX¶
creates a minimap2 index
Software dependencies¶
- minimap2 ==2.17
Example¶
This wrapper can be used in the following way:
rule minimap2_index:
input:
target="target/{input1}.fasta"
output:
"{input1}.mmi"
log:
"logs/minimap2_index/{input1}.log"
params:
extra="" # optional additional args
threads: 3
wrapper:
"0.65.0/bio/minimap2/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tom Poorten
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(minimap2 -t {snakemake.threads} {extra} "
"-d {snakemake.output[0]} {snakemake.input.target}) {log}"
)
MSISENSOR¶
For msisensor, the following wrappers are available:
MSISENSOR MSI¶
Score your MSI with MSIsensor
Software dependencies¶
- msisensor ==0.5
Example¶
This wrapper can be used in the following way:
rule test_msisensor_msi:
input:
normal = "example.normal.bam",
tumor = "example.tumor.bam",
microsat = "example.microsate.sites"
output:
"example.msi",
"example.msi_dis",
"example.msi_germline",
"example.msi_somatic"
message:
"Testing MSIsensor msi"
threads:
1
log:
"example.log"
params:
out_prefix = "example.msi"
wrapper:
"0.65.0/bio/msisensor/msi"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
Code¶
"""Snakemake script for MSISensor msi"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from os.path import commonprefix
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")
# Determining the common prefix of the output files
# to fill the requested parameter '-o'
prefix = commonprefix(snakemake.output)
shell(
"msisensor msi" # Tool and its sub-command
" -d {snakemake.input.microsat}" # Path to homopolymer/microsat file
" -n {snakemake.input.normal}" # Path to normal bam
" -t {snakemake.input.tumor}" # Path to tumor bam
" -o {prefix}" # Path to output distribution file
" -b {snakemake.threads}" # Maximum number of threads used
" {extra}" # Optional extra parameters
" {log}" # Logging behavior
)
MSISENSOR SCAN¶
Scan homopolymers and microsatellites with MSIsensor
Software dependencies¶
- msisensor ==0.5
Example¶
This wrapper can be used in the following way:
rule test_msisensor_scan:
input:
"genome.fasta"
output:
"microsat.list"
message:
"Testing MSISensor scan"
threads:
1
params:
extra = ""
log:
"logs/msisensor_scan.log"
wrapper:
"0.65.0/bio/msisensor/scan"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
Code¶
"""Snakemake script for MSISensor Scan"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")
shell(
"msisensor scan " # Tool and its sub-command
"-d {snakemake.input} " # Path to fasta file
"-o {snakemake.output} " # Path to output file
"{extra} " # Optional extra parameters
"{log}" # Logging behavior
)
MULTIQC¶
Generate qc report using multiqc.
Software dependencies¶
- multiqc ==1.9
Example¶
This wrapper can be used in the following way:
rule multiqc:
input:
expand("samtools_stats/{sample}.txt", sample=["a", "b"])
output:
"qc/multiqc.html"
params:
"" # Optional: extra parameters for multiqc.
log:
"logs/multiqc.log"
wrapper:
"0.65.0/bio/multiqc"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
input_dirs = set(path.dirname(fp) for fp in snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"multiqc"
" {snakemake.params}"
" --force"
" -o {output_dir}"
" -n {output_name}"
" {input_dirs}"
" {log}"
)
NANOSIM-H¶
NanoSim-H is a simulator of Oxford Nanopore reads that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of Nanopore sequencing technology.
Software dependencies¶
- nanosim-h ==1.1.0.4
Example¶
This wrapper can be used in the following way:
rule nanosimh:
input:
"{sample}.fa"
output:
reads = "{sample}.simulated.fa",
log = "{sample}.simulated.log",
errors = "{sample}.simulated.errors.txt"
params:
extra = "",
num_reads = 10,
perfect_reads = True,
min_read_len = 10,
log:
"logs/nanosim-h/test/{sample}.log"
wrapper:
"0.65.0/bio/nanosim-h"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for NanoSim-H."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
def is_header(query):
return query.startswith(">")
def get_length_of_longest_sequence(fh):
current_length = 0
all_lengths = []
for line in fh:
if not is_header(line):
current_length += len(line.rstrip())
else:
all_lengths.append(current_length)
current_length = 0
all_lengths.append(current_length)
return max(all_lengths)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
prefix = snakemake.params.get("prefix", snakemake.output.reads.rpartition(".")[0])
num_reads = snakemake.params.get("num_reads", 10000)
profile = snakemake.params.get("profile", "ecoli_R9_2D")
perfect_reads = snakemake.params.get("perfect_reads", False)
min_read_len = snakemake.params.get("min_read_len", 50)
max_read_len = snakemake.params.get("max_read_len", 0)
# need to do this as the default read length of infinity can cause nanosim-h to
# hang if the reference is short
if max_read_len == 0:
with open(snakemake.input[0]) as fh:
max_read_len = get_length_of_longest_sequence(fh)
perfect_reads_flag = "--perfect " if perfect_reads else ""
# Format the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Executed shell command
shell(
"nanosim-h {extra} "
"{perfect_reads_flag} "
"--max-len {max_read_len} "
"--min-len {min_read_len} "
"--profile {profile} "
"--number {num_reads} "
"--out-pref {prefix} "
"{snakemake.input} {log}"
)
NGS-DISAMBIGUATE¶
Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.
Software dependencies¶
- ngs-disambiguate ==2016.11.10
- bamtools ==2.4.0
Example¶
This wrapper can be used in the following way:
rule disambiguate:
input:
a="mapped/{sample}.a.bam",
b="mapped/{sample}.b.bam"
output:
a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
a_disambiguated='disambiguate/{sample}.graft.bam',
b_disambiguated='disambiguate/{sample}.host.bam',
summary='qc/disambiguate/{sample}.txt'
params:
algorithm="bwa",
# optional: Prefix to use for output. If omitted, a
# suitable value is guessed from the output paths. Prefix
# is used for the intermediate output paths, as well as
# sample name in summary file.
prefix="{sample}",
extra=""
wrapper:
"0.65.0/bio/ngs-disambiguate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")
output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain
# additional suffixes). This is better than using a random id as prefix,
# since the prefix is also used as the sample name in the summary file.
if prefix is None:
prefix = path.splitext(path.basename(snakemake.output.summary))[0]
# Run command.
shell(
"ngs_disambiguate"
" {extra}"
" -o {output_dir}"
" -s {prefix}"
" -a {snakemake.params.algorithm}"
" {snakemake.input.a}"
" {snakemake.input.b}"
)
# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)
output_map = {
output_base + ".ambiguousSpeciesA.bam": snakemake.output.a_ambiguous,
output_base + ".ambiguousSpeciesB.bam": snakemake.output.b_ambiguous,
output_base + ".disambiguatedSpeciesA.bam": snakemake.output.a_disambiguated,
output_base + ".disambiguatedSpeciesB.bam": snakemake.output.b_disambiguated,
output_base + "_summary.txt": snakemake.output.summary,
}
for src, dest in output_map.items():
if src != dest:
shell("mv {src} {dest}")
OPTITYPE¶
Precision 4-digit HLA-I typing from NGS data based on integer linear programming. Use razers3 beforehand to generate input fastq files that contain only reads mapping to HLA regions. Please see https://github.com/FRED-2/OptiType
Software dependencies¶
- optitype ==1.3.4
Example¶
This wrapper can be used in the following way:
rule optitype:
input:
# list of input reads
reads=["reads/{sample}_1.fished.fastq", "reads/{sample}_2.fished.fastq"]
output:
multiext("optitype/{sample}", "_coverage_plot.pdf", "_result.tsv")
log:
"logs/optitype/{sample}.log"
params:
# Type of sequencing data. Can be 'dna' or 'rna'. Default is 'dna'.
sequencing_type="dna",
# optitype config file, optional
config="",
# additional parameters
extra=""
wrapper:
"0.65.0/bio/optitype"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
outdir = os.path.dirname(snakemake.output[0])
# get sequencing type
seq_type = snakemake.params.get("sequencing_type", "dna")
seq_type = "--{}".format(seq_type)
# check if non-default config.ini is used
config = snakemake.params.get("config", "")
if config:
config = "--config {}".format(config)
shell(
"(OptiTypePipeline.py"
" --input {snakemake.input.reads}"
" --outdir {outdir}"
" --prefix {snakemake.wildcards.sample}"
" {seq_type}"
" {config}"
" {extra})"
" {log}"
)
PALADIN¶
For paladin, the following wrappers are available:
PALADIN ALIGN¶
Align nucleotide reads to a protein fasta file (that has been indexed with paladin index). PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.
Software dependencies¶
- paladin=1.4.4
- samtools=1.5
Example¶
This wrapper can be used in the following way:
rule paladin_align:
input:
reads=["reads/reads.left.fq.gz"],
index="index/prot.fasta.bwt",
output:
"paladin_mapped/{sample}.bam" # will output BAM format if output file ends with ".bam", otherwise SAM format
log:
"logs/paladin/{sample}.log"
threads: 4
wrapper:
"0.65.0/bio/paladin/align"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for PALADIN alignment"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
r = snakemake.input.get("reads")
assert (
r is not None
), "reads are required as input. If you have paired end reads, please merge them first (e.g. with PEAR)"
index = snakemake.input.get("index")
assert (
index is not None
), "please index your assembly and provide the basename (with '.bwt' extension) via the 'index' input param"
index_base = str(index).rsplit(".bwt")[0]
outfile = snakemake.output
# if bam output, pipe to bam!
output_cmd = " | samtools view -Sb - > " if str(outfile).endswith(".bam") else " -o "
min_orf_len = snakemake.params.get("f", "250")
shell(
"paladin align -f {min_orf_len} -t {snakemake.threads} {extra} {index_base} {r} {output_cmd} {outfile}"
)
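As the assertion above notes, the wrapper expects a single read file, so paired-end data must be merged beforehand (e.g. with PEAR). A hedged sketch that chains the pear wrapper documented further down this page with paladin align (file names are illustrative, and unassembled read pairs are simply left unused here):
rule pear_merge_for_paladin:
    input:
        read1="reads/{sample}.left.fq.gz",
        read2="reads/{sample}.right.fq.gz"
    output:
        assembled="pear/{sample}_pear_assembled.fq.gz",
        discarded="pear/{sample}_pear_discarded.fq.gz",
        unassembled_read1="pear/{sample}_pear_unassembled_r1.fq.gz",
        unassembled_read2="pear/{sample}_pear_unassembled_r2.fq.gz"
    log:
        "logs/pear/{sample}.log"
    params:
        pval=".01",
        extra=""
    threads: 4
    resources:
        mem_mb=4000
    wrapper:
        "0.65.0/bio/pear"

rule paladin_align_merged:
    input:
        reads=["pear/{sample}_pear_assembled.fq.gz"],
        index="index/prot.fasta.bwt"
    output:
        "paladin_mapped/{sample}.bam"
    log:
        "logs/paladin/{sample}.log"
    threads: 4
    wrapper:
        "0.65.0/bio/paladin/align"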
PALADIN INDEX¶
Index a protein fasta file for mapping with paladin. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.
Software dependencies¶
- paladin=1.4.4
- samtools=1.5
Example¶
This wrapper can be used in the following way:
rule paladin_index:
input:
"prot.fasta",
output:
"index/prot.fasta.bwt"
log:
"logs/paladin/prot_index.log"
params:
reference_type=3
wrapper:
"0.65.0/bio/paladin/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Paladin Index."""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
# this wrapper temporarily copies your assembly into the output dir
# so that all the paladin output files end up in the desired spot
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
input_assembly = snakemake.input
annotation = snakemake.input.get("gff", "")
paladin_index = str(snakemake.output)
reference_type = snakemake.params.get("reference_type", "3")
assert int(reference_type) in [1, 2, 3, 4]
ref_type_cmd = "-r" + str(reference_type)
output_base = paladin_index.rsplit(".bwt")[0]
shell("cp {input_assembly} {output_base}")
shell("paladin index {ref_type_cmd} {output_base} {annotation} {extra} {log}")
shell("rm -f {output_base}")
PALADIN PREPARE¶
Download and prepare uniprot refs for paladin mapping. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.
Software dependencies¶
- paladin=1.4.4
- samtools=1.5
Example¶
This wrapper can be used in the following way:
rule paladin_prepare:
output:
"uniprot_sprot.fasta.gz",
"uniprot_sprot.fasta.gz.pro"
log:
"logs/paladin/prepare_sprot.log"
params:
reference_type=1, # 1=swiss-prot, 2=uniref90
wrapper:
"0.65.0/bio/paladin/prepare"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Paladin Prepare"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
reference_type = snakemake.params.get(
"reference_type", "1"
) # download swissprot as default
assert int(reference_type) in [1, 2]
ref_type_cmd = "-r" + str(reference_type)
shell("paladin prepare {ref_type_cmd} {extra} {log}")
PEAR¶
PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger
Software dependencies¶
- pear=0.9.6
Example¶
This wrapper can be used in the following way:
rule pear_merge:
input:
read1="reads/reads.left.fq.gz",
read2="reads/reads.right.fq.gz"
output:
assembled="pear/reads_pear_assembled.fq.gz",
discarded="pear/reads_pear_discarded.fq.gz",
unassembled_read1="pear/reads_pear_unassembled_r1.fq.gz",
unassembled_read2="pear/reads_pear_unassembled_r2.fq.gz",
log:
'logs/pear.log'
params:
pval=".01",
extra=""
threads: 4
resources:
mem_mb=4000 # define amount of memory to be used by pear
wrapper:
"0.65.0/bio/pear"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
r1 = snakemake.input.get("read1")
r2 = snakemake.input.get("read2")
assert r1 is not None and r2 is not None, "r1 and r2 files are required as input"
assembled = snakemake.output.get("assembled")
assert assembled is not None, "require 'assembled' outfile"
gzip = assembled.endswith(".gz")
out_base, out_end = assembled.rsplit(".f")
out_end = ".f" + out_end
df_assembled = out_base + ".assembled.fastq"
df_discarded = out_base + ".discarded.fastq"
df_unassembled_r1 = out_base + ".unassembled.forward.fastq"
df_unassembled_r2 = out_base + ".unassembled.reverse.fastq"
df_outputs = [df_assembled, df_discarded, df_unassembled_r1, df_unassembled_r2]
discarded = snakemake.output.get("discarded", out_base + ".discarded" + out_end)
unassembled_r1 = snakemake.output.get(
"unassembled_read1", out_base + ".unassembled_r1" + out_end
)
unassembled_r2 = snakemake.output.get(
"unassembled_read2", out_base + ".unassembled_r2" + out_end
)
final_outputs = [assembled, discarded, unassembled_r1, unassembled_r2]
def move_files(in_list, out_list, gzip):
for f, o in zip(in_list, out_list):
if f != o:
if gzip:
shell("gzip -9 -c {f} > {o}")
shell("rm -f {f}")
else:
shell("cp {f} {o}")
shell("rm -f {f}")
elif gzip:
shell("gzip -9 {f}")
pval = float(snakemake.params.get("pval", ".01"))
max_mem = snakemake.resources.get("mem_mb", "4000")
extra = snakemake.params.get("extra", "")
shell(
"pear -f {r1} -r {r2} -p {pval} -j {snakemake.threads} -y {max_mem} {extra} -o {out_base} {log}"
)
move_files(df_outputs, final_outputs, gzip)
PICARD¶
For picard, the following wrappers are available:
PICARD ADDORREPLACEREADGROUPS¶
Add or replace read groups with picard tools.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule replace_rg:
input:
"mapped/{sample}.bam"
output:
"fixed-rg/{sample}.bam"
log:
"logs/picard/replace_rg/{sample}.log"
params:
"RGLB=lib1 RGPL=illumina RGPU={sample} RGSM={sample}"
wrapper:
"0.65.0/bio/picard/addorreplacereadgroups"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell(
"picard AddOrReplaceReadGroups {snakemake.params} I={snakemake.input} "
"O={snakemake.output} &> {snakemake.log}"
)
PICARD BEDTOINTERVALLIST¶
picard BedToIntervalList converts a BED file to Picard Interval List format.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule bed_to_interval_list:
input:
bed="resources/a.bed",
dict="resources/genome.dict"
output:
"a.interval_list"
log:
"logs/picard/bedtointervallist/a.log"
params:
# optional parameters
"SORT=true " # sort output interval list before writing
wrapper:
"0.65.0/bio/picard/bedtointervallist"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Fabian Kilpert
Code¶
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"picard BedToIntervalList "
"{snakemake.params} "
"INPUT={snakemake.input.bed} "
"SEQUENCE_DICTIONARY={snakemake.input.dict} "
"OUTPUT={snakemake.output} "
"{log} "
)
PICARD COLLECTALIGNMENTSUMMARYMETRICS¶
Collect metrics on aligned reads with picard tools.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule alignment_summary:
input:
ref="genome.fasta",
bam="mapped/{sample}.bam"
output:
"stats/{sample}.summary.txt"
log:
"logs/picard/alignment-summary/{sample}.log"
params:
# optional parameters (e.g. relax checks as below)
"VALIDATION_STRINGENCY=LENIENT "
"METRIC_ACCUMULATION_LEVEL=null "
"METRIC_ACCUMULATION_LEVEL=SAMPLE"
wrapper:
"0.65.0/bio/picard/collectalignmentsummarymetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"picard CollectAlignmentSummaryMetrics {snakemake.params} "
"INPUT={snakemake.input.bam} OUTPUT={snakemake.output[0]} "
"REFERENCE_SEQUENCE={snakemake.input.ref} {log}"
)
PICARD COLLECTHSMETRICS¶
Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule picard_collect_hs_metrics:
input:
bam="mapped/{sample}.bam",
reference="genome.fasta",
# Baits and targets should be given as interval lists. These can
# be generated from bed files using picard BedToIntervalList.
bait_intervals="regions.intervals",
target_intervals="regions.intervals"
output:
"stats/hs_metrics/{sample}.txt"
params:
# Optional extra arguments. Here we reduce sample size
# to reduce the runtime in our unit test.
extra="SAMPLE_SIZE=1000"
log:
"logs/picard_collect_hs_metrics/{sample}.log"
wrapper:
"0.65.0/bio/picard/collecthsmetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
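As noted in the example, bait and target intervals must be Picard interval lists; they can be generated from BED files with the bedtointervallist wrapper documented above. A minimal sketch (the BED file and sequence dictionary names are illustrative; the dictionary can be created with the createsequencedictionary wrapper further below):
rule regions_to_interval_list:
    input:
        bed="regions.bed",
        dict="genome.dict"
    output:
        "regions.intervals"
    log:
        "logs/picard/bedtointervallist/regions.log"
    params:
        "SORT=true"
    wrapper:
        "0.65.0/bio/picard/bedtointervallist"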
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard CollectHSMetrics."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard CollectHsMetrics"
" {extra}"
" INPUT={snakemake.input.bam}"
" OUTPUT={snakemake.output[0]}"
" REFERENCE_SEQUENCE={snakemake.input.reference}"
" BAIT_INTERVALS={snakemake.input.bait_intervals}"
" TARGET_INTERVALS={snakemake.input.target_intervals}"
" {log}"
)
PICARD COLLECTINSERTSIZEMETRICS¶
Collect metrics on insert size of paired end reads with picard tools.
Software dependencies¶
- picard ==2.22.1
- r-base ==3.6.2
Example¶
This wrapper can be used in the following way:
rule insert_size:
input:
"mapped/{sample}.bam"
output:
txt="stats/{sample}.isize.txt",
pdf="stats/{sample}.isize.pdf"
log:
"logs/picard/insert_size/{sample}.log"
params:
# optional parameters (e.g. relax checks as below)
"VALIDATION_STRINGENCY=LENIENT "
"METRIC_ACCUMULATION_LEVEL=null "
"METRIC_ACCUMULATION_LEVEL=SAMPLE"
wrapper:
"0.65.0/bio/picard/collectinsertsizemetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"picard CollectInsertSizeMetrics {snakemake.params} "
"INPUT={snakemake.input} OUTPUT={snakemake.output.txt} "
"HISTOGRAM_FILE={snakemake.output.pdf} {log}"
)
PICARD COLLECTMULTIPLEMETRICS¶
A picard meta-metrics tool that collects multiple classes of metrics. For usage information about CollectMultipleMetrics, please see picard's documentation. For more information about picard, also see the source code.
You can select which tool(s) to run by adding the respective extension(s) (see the list below) to the requested output of the wrapper invocation (see the example Snakemake rule below).
The available tools and the output file extension(s) that select them are:
- CollectAlignmentSummaryMetrics: ".alignment_summary_metrics"
- CollectInsertSizeMetrics: ".insert_size_metrics", ".insert_size_histogram.pdf"
- QualityScoreDistribution: ".quality_distribution_metrics", ".quality_distribution.pdf"
- MeanQualityByCycle: ".quality_by_cycle_metrics", ".quality_by_cycle.pdf"
- CollectBaseDistributionByCycle: ".base_distribution_by_cycle_metrics", ".base_distribution_by_cycle.pdf"
- CollectGcBiasMetrics: ".gc_bias.detail_metrics", ".gc_bias.summary_metrics", ".gc_bias.pdf"
- RnaSeqMetrics: ".rna_metrics"
- CollectSequencingArtifactMetrics: ".bait_bias_detail_metrics", ".bait_bias_summary_metrics", ".error_summary_metrics", ".pre_adapter_detail_metrics", ".pre_adapter_summary_metrics"
- CollectQualityYieldMetrics: ".quality_yield_metrics"
Software dependencies¶
- picard ==2.23.0
Example¶
This wrapper can be used in the following way:
rule collect_multiple_metrics:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta"
output:
# Through the output file extensions the different tools for the metrics can be selected
# so that it is not necessary to specify them under params with the "PROGRAM" option.
# Usable extensions (and which tools they implicitly call) are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html.
multiext("stats/{sample}",
".alignment_summary_metrics",
".insert_size_metrics",
".insert_size_histogram.pdf",
".quality_distribution_metrics",
".quality_distribution.pdf",
".quality_by_cycle_metrics",
".quality_by_cycle.pdf",
".base_distribution_by_cycle_metrics",
".base_distribution_by_cycle.pdf",
".gc_bias.detail_metrics",
".gc_bias.summary_metrics",
".gc_bias.pdf",
".rna_metrics",
".bait_bias_detail_metrics",
".bait_bias_summary_metrics",
".error_summary_metrics",
".pre_adapter_detail_metrics",
".pre_adapter_summary_metrics",
".quality_yield_metrics"
)
resources:
# This parameter (default 3 GB) can be used to limit the total resources a pipeline is allowed to use, see:
# https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
mem_gb=3
log:
"logs/picard/multiple_metrics/{sample}.log"
params:
# optional parameters
"VALIDATION_STRINGENCY=LENIENT "
"METRIC_ACCUMULATION_LEVEL=null "
"METRIC_ACCUMULATION_LEVEL=SAMPLE "
"REF_FLAT=ref_flat.txt " # is required if RnaSeqMetrics are used
wrapper:
"0.65.0/bio/picard/collectmultiplemetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
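Since the programs to run follow directly from the requested output extensions, a shorter output list runs only the corresponding tools. A minimal sketch that collects only the alignment summary and insert size metrics:
rule collect_two_metrics:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        multiext("stats/{sample}",
            ".alignment_summary_metrics",
            ".insert_size_metrics",
            ".insert_size_histogram.pdf"
        )
    log:
        "logs/picard/multiple_metrics/{sample}.log"
    params:
        "VALIDATION_STRINGENCY=LENIENT"
    wrapper:
        "0.65.0/bio/picard/collectmultiplemetrics"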
Authors¶
- David Laehnemann
- Antonie Vietor
Code¶
__author__ = "David Laehnemann, Antonie Vietor"
__copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import sys
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
res = snakemake.resources.get("mem_gb", "3")
if not res or res is None:
res = 3
exts_to_prog = {
".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
".insert_size_metrics": "CollectInsertSizeMetrics",
".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
".quality_distribution_metrics": "QualityScoreDistribution",
".quality_distribution.pdf": "QualityScoreDistribution",
".quality_by_cycle_metrics": "MeanQualityByCycle",
".quality_by_cycle.pdf": "MeanQualityByCycle",
".base_distribution_by_cycle_metrics": "CollectBaseDistributionByCycle",
".base_distribution_by_cycle.pdf": "CollectBaseDistributionByCycle",
".gc_bias.detail_metrics": "CollectGcBiasMetrics",
".gc_bias.summary_metrics": "CollectGcBiasMetrics",
".gc_bias.pdf": "CollectGcBiasMetrics",
".rna_metrics": "RnaSeqMetrics",
".bait_bias_detail_metrics": "CollectSequencingArtifactMetrics",
".bait_bias_summary_metrics": "CollectSequencingArtifactMetrics",
".error_summary_metrics": "CollectSequencingArtifactMetrics",
".pre_adapter_detail_metrics": "CollectSequencingArtifactMetrics",
".pre_adapter_summary_metrics": "CollectSequencingArtifactMetrics",
".quality_yield_metrics": "CollectQualityYieldMetrics",
}
progs = set()
for file in snakemake.output:
matched = False
for ext in exts_to_prog:
if file.endswith(ext):
progs.add(exts_to_prog[ext])
matched = True
if not matched:
sys.exit(
"Unknown type of metrics file requested, for possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
)
programs = " PROGRAM=" + " PROGRAM=".join(progs)
out = str(snakemake.wildcards.sample) # as default
output_file = str(snakemake.output[0])
for ext in exts_to_prog:
if output_file.endswith(ext):
out = output_file[: -len(ext)]
break
shell(
"(picard -Xmx{res}g CollectMultipleMetrics "
"I={snakemake.input.bam} "
"O={out} "
"R={snakemake.input.ref} "
"{snakemake.params}{programs}) {log}"
)
PICARD COLLECTTARGETEDPCRMETRICS¶
Collect targeted PCR metrics with picard tools.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule CollectTargetedPcrMetrics:
input:
bam="mapped/{sample}.bam",
amplicon_intervals="amplicon.list",
target_intervals="target.list"
output:
"stats/{sample}.pcr.txt"
log:
"logs/picard/collecttargetedpcrmetrics/{sample}.log"
params:
extra=""
wrapper:
"0.65.0/bio/picard/collecttargetedpcrmetrics"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@mail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
shell(
"picard CollectTargetedPcrMetrics "
"{extra} "
"INPUT={snakemake.input.bam} "
"OUTPUT={snakemake.output[0]} "
"AMPLICON_INTERVALS={snakemake.input.amplicon_intervals} "
"TARGET_INTERVALS={snakemake.input.target_intervals} "
"{log}"
)
PICARD CREATESEQUENCEDICTIONARY¶
Create a .dict file for a given FASTA file
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule create_dict:
input:
"genome.fasta"
output:
"genome.dict"
log:
"logs/picard/create_dict.log"
params:
extra="" # optional: extra arguments for picard.
wrapper:
"0.65.0/bio/picard/createsequencedictionary"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard "
"CreateSequenceDictionary "
"{extra} "
"R={snakemake.input[0]} "
"O={snakemake.output[0]} "
"{log}"
)
PICARD MARKDUPLICATES¶
Mark PCR and optical duplicates with picard tools.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule mark_duplicates:
input:
"mapped/{sample}.bam"
output:
bam="dedup/{sample}.bam",
metrics="dedup/{sample}.metrics.txt"
log:
"logs/picard/dedup/{sample}.log"
params:
"REMOVE_DUPLICATES=true"
wrapper:
"0.65.0/bio/picard/markduplicates"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
"OUTPUT={snakemake.output.bam} METRICS_FILE={snakemake.output.metrics} "
"{log}"
)
PICARD MERGESAMFILES¶
Merge sam/bam files using picard tools.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule merge_bams:
input:
expand("mapped/{sample}.bam", sample=["a", "b"])
output:
"merged.bam"
log:
"logs/picard_mergesamfiles.log"
params:
"VALIDATION_STRINGENCY=LENIENT"
wrapper:
"0.65.0/bio/picard/mergesamfiles"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard MergeSamFiles."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard"
" MergeSamFiles"
" {snakemake.params}"
" {inputs}"
" OUTPUT={snakemake.output[0]}"
" {log}"
)
PICARD MERGEVCFS¶
Merge vcf files using picard tools.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule merge_vcfs:
input:
vcfs=["snvs.chr1.vcf", "snvs.chr2.vcf"]
output:
"snvs.vcf"
log:
"logs/picard/mergevcfs.log"
params:
extra=""
wrapper:
"0.65.0/bio/picard/mergevcfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
"""Snakemake wrapper for picard MergeVcfs."""
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
inputs = " ".join("INPUT={}".format(f) for f in snakemake.input.vcfs)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"picard"
" MergeVcfs"
" {extra}"
" {inputs}"
" OUTPUT={snakemake.output[0]}"
" {log}"
)
PICARD REVERTSAM¶
Reverts SAM or BAM files to a previous state.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule revert_bam:
input:
"mapped/{sample}.bam"
output:
"revert/{sample}.bam"
log:
"logs/picard/revert_sam/{sample}.log"
params:
extra="SANITIZE=true" # optional: Extra arguments for picard.
wrapper:
"0.65.0/bio/picard/revertsam"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for picard RevertSam."""
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard"
" RevertSam"
" {extra}"
" INPUT={snakemake.input[0]}"
" OUTPUT={snakemake.output[0]}"
" {log}"
)
PICARD SAMTOFASTQ¶
Converts a SAM or BAM file to FASTQ.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule bam_to_fastq:
input:
"mapped/{sample}.bam"
output:
fastq1="reads/{sample}.R1.fastq",
fastq2="reads/{sample}.R2.fastq"
log:
"logs/picard/sam_to_fastq/{sample}.log"
params:
extra="" # optional: Extra arguments for picard.
wrapper:
"0.65.0/bio/picard/samtofastq"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
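Besides fastq1 and fastq2, the wrapper also accepts an optional unpaired_fastq output for reads without a mate (see the UNPAIRED_FASTQ handling in the code below). A minimal sketch:
rule bam_to_fastq_with_unpaired:
    input:
        "mapped/{sample}.bam"
    output:
        fastq1="reads/{sample}.R1.fastq",
        fastq2="reads/{sample}.R2.fastq",
        unpaired_fastq="reads/{sample}.unpaired.fastq"
    log:
        "logs/picard/sam_to_fastq/{sample}.log"
    params:
        extra=""
    wrapper:
        "0.65.0/bio/picard/samtofastq"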
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for picard SamToFastq."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
fastq1 = snakemake.output.fastq1
fastq2 = snakemake.output.get("fastq2", None)
fastq_unpaired = snakemake.output.get("unpaired_fastq", None)
if not isinstance(fastq1, str):
raise ValueError("f1 needs to be provided")
output = " FASTQ=" + fastq1
if isinstance(fastq2, str):
output += " SECOND_END_FASTQ=" + fastq2
if isinstance(fastq_unpaired, str):
if not isinstance(fastq2, str):
raise ValueError("f2 is required if fastq_unpaired is set")
else:
output += " UNPAIRED_FASTQ=" + fastq_unpaired
shell(
    "picard"
    " SamToFastq"
    " {extra}"
    " INPUT={snakemake.input[0]}"
    " {output}"
    " {log}"
)
PICARD SORTSAM¶
Sort sam/bam files using picard tools.
Software dependencies¶
- picard ==2.22.1
Example¶
This wrapper can be used in the following way:
rule sort_bam:
input:
"mapped/{sample}.bam"
output:
"sorted/{sample}.bam"
log:
"logs/picard/sort_sam/{sample}.log"
params:
sort_order="coordinate",
extra="VALIDATION_STRINGENCY=LENIENT" # optional: Extra arguments for picard.
wrapper:
"0.65.0/bio/picard/sortsam"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard SortSam."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"picard"
" SortSam"
" {extra}"
" INPUT={snakemake.input[0]}"
" OUTPUT={snakemake.output[0]}"
" SORT_ORDER={snakemake.params.sort_order}"
" {log}"
)
PINDEL¶
For pindel, the following wrappers are available:
PINDEL¶
Call variants with pindel.
Software dependencies¶
- pindel ==0.2.5b8
Example¶
This wrapper can be used in the following way:
pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]
rule pindel:
input:
ref="genome.fasta",
# samples to call
samples=["mapped/a.bam"],
# bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
config="pindel_config.txt"
output:
expand("pindel/all_{type}", type=pindel_types)
params:
# prefix must be consistent with output files
prefix="pindel/all",
extra="" # optional parameters (except -i, -f, -o)
log:
"logs/pindel.log"
threads: 4
wrapper:
"0.65.0/bio/pindel/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"pindel -T {snakemake.threads} {snakemake.params.extra} -i {snakemake.input.config} "
"-f {snakemake.input.ref} -o {snakemake.params.prefix} {log}"
)
PINDEL2VCF¶
Convert pindel output to vcf.
Software dependencies¶
- pindel ==0.2.5b8
Example¶
This wrapper can be used in the following way:
rule pindel2vcf:
input:
ref="genome.fasta",
pindel="pindel/all_{type}"
output:
"pindel/all_{type}.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.{type}.log"
wrapper:
"0.65.0/bio/pindel/pindel2vcf"
rule pindel2vcf_multi_input:
input:
ref="genome.fasta",
pindel=["pindel/all_D", "pindel/all_INV"]
output:
"pindel/all.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.log"
wrapper:
"0.65.0/bio/pindel/pindel2vcf"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
expected_endings = [
    "INT",
    "D",
    "SI",
    "INV",
    "INV_final",
    "TD",
    "LI",
    "BP",
    "CloseEndMapped",
    "RP",
]
def split_file_name(file_parts, file_ending_index):
    return (
        "_".join(file_parts[:file_ending_index]),
        "_".join(file_parts[file_ending_index:]),
    )
def process_input_path(input_file):
"""
    :param input_file: Input file from rule, e.g. /path/to/file/all_D or /path/to/file/all_INV_final
    :return: "/path/to/file", "all"
"""
file_path, file_name = os.path.split(input_file)
file_parts = file_name.split("_")
    # separate name and ending, e.g. name: all, ending: D or name: all, ending: INV_final
file_name, file_ending = split_file_name(
file_parts, -2 if file_name.endswith("_final") else -1
)
    if file_ending not in expected_endings:
raise Exception("Unexpected variant type: " + file_ending)
return file_path, file_name
with tempfile.TemporaryDirectory() as tmpdirname:
input_flag = "-p"
input_file = snakemake.input.get("pindel")
if isinstance(input_file, list) and len(input_file) > 1:
input_flag = "-P"
input_path, input_name = process_input_path(input_file[0])
input_file = os.path.join(input_path, input_name)
for variant_input in snakemake.input.pindel:
if not variant_input.startswith(input_file):
raise Exception(
"Unable to extract common path from multi file input, expect path is: "
+ input_file
)
if not os.path.isfile(variant_input):
                raise Exception('Input "' + variant_input + '" is not a file!')
os.symlink(
os.path.abspath(variant_input),
os.path.join(tmpdirname, os.path.basename(variant_input)),
)
input_file = os.path.join(tmpdirname, input_name)
shell(
"pindel2vcf {snakemake.params.extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}"
)
PLASS¶
Plass (Protein-Level ASSembler) is software to assemble short read sequencing data on a protein level. The main purpose of Plass is the assembly of complex metagenomic datasets.
Software dependencies¶
- plass=2.c7e35
Example¶
This wrapper can be used in the following way:
rule plass_paired:
input:
left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
output:
"plass/prot.fasta"
log:
"logs/plass.log"
params:
extra=""
threads: 4
wrapper:
"0.65.0/bio/plass"
rule plass_single:
input:
single=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
output:
"plass/prot_single.fasta"
log:
"logs/plass_single.log"
params:
extra=""
threads: 4
wrapper:
"0.65.0/bio/plass"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for PLASS Protein-Level Assembler."""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2018, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
# allow multiple input files for single assembly
left = snakemake.input.get("left")
single = snakemake.input.get("single")
assert (
left is not None or single is not None
), "please check read inputs: either left/right or single read file inputs are required"
if left:
left = (
[snakemake.input.left]
if isinstance(snakemake.input.left, str)
else snakemake.input.left
)
right = snakemake.input.get("right")
assert (
right is not None
), "please input 'right' reads or specify that the reads are 'single'"
right = (
[snakemake.input.right]
if isinstance(snakemake.input.right, str)
else snakemake.input.right
)
assert len(left) == len(
right
), "left input needs to contain the same number of files as the right input"
input_str_left = " " + " ".join(left)
input_str_right = " " + " ".join(right)
input_cmd = input_str_left + " " + input_str_right
else:
single = (
[snakemake.input.single]
if isinstance(snakemake.input.single, str)
else snakemake.input.single
)
input_cmd = " " + " ".join(single)
outdir = path.dirname(snakemake.output[0])
tmpdir = path.join(outdir, "tmp")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"plass assemble {input_cmd} {snakemake.output} {tmpdir} --threads {snakemake.threads} {snakemake.params.extra} {log}"
)
PRESEQ¶
For preseq, the following wrappers are available:
PRESEQ LC_EXTRAP¶
preseq estimates the library complexity of existing sequencing data to then estimate the yield of future experiments based on their design. For usage information, please see preseq's command line help (this seems more up to date than the available documentation from 2014). For more information about preseq, also see the source code.
Software dependencies¶
- preseq ==2.0.3
Example¶
This wrapper can be used in the following way:
rule preseq_lc_extrap_bam:
input:
"samples/{sample}.sorted.bam"
output:
"test_bam/{sample}.lc_extrap"
params:
"-v" #optional parameters
log:
"logs/test_bam/{sample}.log"
wrapper:
"0.65.0/bio/preseq/lc_extrap"
rule preseq_lc_extrap_bed:
input:
"samples/{sample}.sorted.bed"
output:
"test_bed/{sample}.lc_extrap"
params:
"-v" #optional parameters
log:
"logs/test_bed/{sample}.log"
wrapper:
"0.65.0/bio/preseq/lc_extrap"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
params = ""
if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
if "-bam" not in (snakemake.input[0]):
params = "-bam "
shell(
"(preseq lc_extrap {params} {snakemake.params} {snakemake.input[0]} -output {snakemake.output[0]}) {log}"
)
PRIMERCLIP¶
Primer trimming of aligned reads; see https://github.com/swiftbiosciences/primerclip
Software dependencies¶
- samtools ==1.9
- primerclip ==0.3.8
Example¶
This wrapper can be used in the following way:
rule primerclip:
input:
        master_file="master_file",
alignment_file="mapped/{sample}.bam"
output:
alignment_file="mapped/{sample}.trimmed.bam"
log:
"logs/primerclip/{sample}.log"
params:
extra=""
wrapper:
"0.65.0/bio/primerclip"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
master_file = snakemake.input.master_file
in_alignment_file = snakemake.input.alignment_file
out_alignment_file = snakemake.output.alignment_file
# Check inputs/arguments.
if not isinstance(master_file, str):
    raise ValueError("master_file must be a single path to the primerclip master file")
if not isinstance(in_alignment_file, str):
    raise ValueError("alignment_file input must be a single path to the input alignment file")
if not isinstance(out_alignment_file, str):
    raise ValueError("alignment_file output must be a single path to the output alignment file")
samtools_input_command = "samtools view -h " + in_alignment_file
samtools_output_command = " | head -n -3 | samtools view -Sh"
if out_alignment_file.endswith(".cram"):
samtools_output_command += "C -o " + out_alignment_file
elif out_alignment_file.endswith(".sam"):
samtools_output_command += " -o " + out_alignment_file
else:
samtools_output_command += "b -o " + out_alignment_file
shell(
"{samtools_input_command} |"
" primerclip"
" {master_file}"
" /dev/stdin"
" /dev/stdout"
" {samtools_output_command}"
" {log}"
)
PROSOLO¶
For prosolo, the following wrappers are available:
PROSOLO FDR CONTROL¶
ProSolo can control the false discovery rate of any combination of its defined single cell events (like the presence of an alternative allele or the dropout of an allele).
Software dependencies¶
- prosolo ==0.6.1
Example¶
This wrapper can be used in the following way:
rule prosolo_fdr_control:
input:
"variant_calling/{sc}.{bulk}.prosolo.bcf"
output:
"fdr_control/{sc}.{bulk}.prosolo.fdr.bcf"
threads:
1
params:
# comma-separated set of events for whose (joint)
# false discovery rate you want to control
events = "ADO_TO_REF,HET",
# false discovery rate to control for
fdr = 0.05
log:
"logs/prosolo_{sc}_{bulk}.fdr.log"
wrapper:
"0.65.0/bio/prosolo/control-fdr"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- David Lähnemann
Code¶
"""Snakemake wrapper for ProSolo FDR control"""
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"( prosolo control-fdr"
" {snakemake.input}"
" --events {snakemake.params.events}"
" --var SNV"
" --fdr {snakemake.params.fdr}"
" --output {snakemake.output} )"
"{log} "
)
PROSOLO¶
ProSolo calls variants or other events (like allele dropout) in a single cell sample against a bulk background sample. The single cell should stem from the same population of cells as the bulk background sample. The single cell sample should be amplified using multiple displacement amplification to match ProSolo’s statistical model.
Software dependencies¶
- prosolo ==0.6.1
Example¶
This wrapper can be used in the following way:
rule prosolo_calling:
input:
single_cell = "data/mapped/{sc}.sorted.bam",
single_cell_index = "data/mapped/{sc}.sorted.bam.bai",
bulk = "data/mapped/{bulk}.sorted.bam",
bulk_index = "data/mapped/{bulk}.sorted.bam.bai",
ref = "data/genome.fa",
ref_idx = "data/genome.fa.fai",
candidates = "data/{sc}.{bulk}.prosolo_candidates.bcf",
output:
"variant_calling/{sc}.{bulk}.prosolo.bcf"
params:
extra = ""
threads:
1
log:
"logs/prosolo_{sc}_{bulk}.log"
wrapper:
"0.65.0/bio/prosolo/single-cell-bulk"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- David Lähnemann
Code¶
"""Snakemake wrapper for ProSolo single-cell-bulk calling"""
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"( prosolo single-cell-bulk "
"--omit-indels "
" {snakemake.params.extra} "
"--candidates {snakemake.input.candidates} "
"--output {snakemake.output} "
"{snakemake.input.single_cell} "
"{snakemake.input.bulk} "
"{snakemake.input.ref} ) "
"{log} "
)
PTRIMMER¶
Tool to trim off primer sequences from multiplex amplicon sequencing data
Software dependencies¶
- ptrimmer ==1.3.3
Example¶
This wrapper can be used in the following way:
rule ptrimmer_pe:
input:
r1="resources/a.lane1_R1.fastq.gz",
r2="resources/a.lane1_R2.fastq.gz",
primers="resources/primers.txt"
output:
r1="results/a.lane1_R1.fq.gz",
r2="results/a.lane1_R2.fq.gz"
log:
"logs/ptrimmer/a.lane.log"
wrapper:
"0.65.0/bio/ptrimmer"
rule ptrimmer_se:
input:
r1="resources/a.lane1_R1.fastq.gz",
primers="resources/primers.txt"
output:
r1="results/a.lane1_R1.fq",
log:
"logs/ptrimmer/a.lane1.log"
wrapper:
"0.65.0/bio/ptrimmer"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Felix Mölder
Code¶
__author__ = "Felix Mölder"
__copyright__ = "Copyright 2020, Felix Mölder"
__email__ = "felix.moelder@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
import ntpath
input_reads = "-f {r1}".format(r1=snakemake.input.r1)
out_r1 = ntpath.basename(snakemake.output.r1)
output_reads = "-d {o1}".format(o1=out_r1)
if snakemake.input.get("r2", ""):
seqmode = "pair"
input_reads = "{reads} -r {r2}".format(reads=input_reads, r2=snakemake.input.r2)
out_r2 = ntpath.basename(snakemake.output.r2)
output_reads = "{reads} -e {o2}".format(reads=output_reads, o2=out_r2)
else:
seqmode = "single"
primers = snakemake.input.primers
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
ptrimmer_params = "-t {mode} {in_reads} -a {primers} {out_reads}".format(
mode=seqmode, in_reads=input_reads, primers=primers, out_reads=output_reads
)
process_r1 = "mv {out_read} {final_output_path}".format(
out_read=out_r1, final_output_path=snakemake.output.r1
)
process_r2 = ""
if snakemake.input.get("r2", ""):
process_r2 = "&& mv {out_read} {final_output_path}".format(
out_read=out_r2, final_output_path=snakemake.output.r2
)
shell("(ptrimmer {ptrimmer_params} && {process_r1} {process_r2}) {log}")
PYFASTAQ¶
For pyfastaq, the following wrappers are available:
PYFASTAQ REPLACE_BASES¶
Replaces all occurrences of one letter with another.
Software dependencies¶
- pyfastaq ==3.17.0
Example¶
This wrapper can be used in the following way:
rule replace_bases:
input:
"{sample}.rna.fa"
output:
"{sample}.dna.fa",
params:
old_base = "U",
new_base = "T",
log:
"logs/fastaq/replace_bases/test/{sample}.log"
wrapper:
"0.65.0/bio/pyfastaq/replace_bases"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Michael Hall
Code¶
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"fastaq replace_bases"
" {snakemake.input[0]}"
" {snakemake.output[0]}"
" {snakemake.params.old_base}"
" {snakemake.params.new_base}"
" {log}"
)
RAZERS3¶
Mapping of (short) reads against a reference sequence. Multiple output formats are supported; please see https://github.com/seqan/seqan/tree/master/apps/razers3
Software dependencies¶
- razers3 ==3.5.8
Example¶
This wrapper can be used in the following way:
rule razers3:
input:
# list of input reads
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
# output format is automatically inferred from file extension. Can be bam/sam or other formats.
"mapped/{sample}.bam"
log:
"logs/razers3/{sample}.log"
params:
# the reference genome
genome="genome.fasta",
# additional parameters
extra=""
threads: 8
wrapper:
"0.65.0/bio/razers3"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(razers3"
" -tc {snakemake.threads}"
" {extra}"
" -o {snakemake.output[0]}"
" {snakemake.params.genome}"
" {snakemake.input.reads})"
" {log}"
)
REFERENCE¶
For reference, the following wrappers are available:
ENSEMBL-ANNOTATION¶
Download annotation of genomic sites (e.g. transcripts) from ENSEMBL FTP servers, and store them in a single .gtf or .gff3 file.
Software dependencies¶
- curl
Example¶
This wrapper can be used in the following way:
rule get_annotation:
output:
"refs/annotation.gtf"
params:
species="homo_sapiens",
release="87",
build="GRCh37",
fmt="gtf",
flavor="" # optional, e.g. chr_patch_hapl_scaff, see Ensembl FTP.
log:
"logs/get_annotation.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.65.0/bio/reference/ensembl-annotation"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import subprocess
import sys
from snakemake.shell import shell
species = snakemake.params.species.lower()
release = int(snakemake.params.release)
fmt = snakemake.params.fmt
build = snakemake.params.build
flavor = snakemake.params.get("flavor", "")
branch = ""
if release >= 81 and build == "GRCh37":
# use the special grch37 branch for new releases
branch = "grch37/"
if flavor:
flavor += "."
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
suffix = ""
if fmt == "gtf":
suffix = "gtf.gz"
elif fmt == "gff3":
suffix = "gff3.gz"
url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/{species_cap}.{build}.{release}.{flavor}{suffix}".format(
release=release,
build=build,
species=species,
fmt=fmt,
species_cap=species.capitalize(),
suffix=suffix,
flavor=flavor,
branch=branch,
)
try:
shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
except subprocess.CalledProcessError as e:
if snakemake.log:
sys.stderr = open(snakemake.log[0], "a")
print(
"Unable to download annotation data from Ensembl. "
"Did you check that this combination of species, build, and release is actually provided?",
file=sys.stderr,
)
exit(1)
ENSEMBL-SEQUENCE¶
Download sequences (e.g. genome) from ENSEMBL FTP servers, and store them in a single .fasta file.
Software dependencies¶
- curl
Example¶
This wrapper can be used in the following way:
rule get_genome:
output:
"refs/genome.fasta"
params:
species="saccharomyces_cerevisiae",
datatype="dna",
build="R64-1-1",
release="98"
log:
"logs/get_genome.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.65.0/bio/reference/ensembl-sequence"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
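Besides dna, the datatype parameter also accepts cdna, cds, ncrna and pep (see the wrapper code below). A minimal sketch for downloading cDNA sequences instead of the genome:
rule get_cdna:
    output:
        "refs/cdna.fasta"
    params:
        species="saccharomyces_cerevisiae",
        datatype="cdna",
        build="R64-1-1",
        release="98"
    log:
        "logs/get_cdna.log"
    cache: True
    wrapper:
        "0.65.0/bio/reference/ensembl-sequence"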
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import subprocess as sp
import sys
from itertools import product
from snakemake.shell import shell
species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build
branch = ""
if release >= 81 and build == "GRCh37":
# use the special grch37 branch for new releases
branch = "grch37/"
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
spec = ("{build}" if int(release) > 75 else "{build}.{release}").format(
build=build, release=release
)
suffixes = ""
datatype = snakemake.params.get("datatype", "")
if datatype == "dna":
suffixes = ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"]
elif datatype == "cdna":
suffixes = ["cdna.all.fa.gz"]
elif datatype == "cds":
suffixes = ["cds.all.fa.gz"]
elif datatype == "ncrna":
suffixes = ["ncrna.fa.gz"]
elif datatype == "pep":
suffixes = ["pep.all.fa.gz"]
else:
raise ValueError("invalid datatype, must be one of dna, cdna, cds, ncrna, pep")
success = False
for suffix in suffixes:
url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/{species}/{datatype}/{species_cap}.{spec}.{suffix}".format(
release=release,
species=species,
datatype=datatype,
spec=spec.format(build=build, release=release),
suffix=suffix,
species_cap=species.capitalize(),
branch=branch,
)
try:
shell("curl -sSf {url} > /dev/null 2> /dev/null")
except sp.CalledProcessError:
continue
shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
success = True
break
if not success:
print(
"Unable to download requested sequence data from Ensembl. "
"Did you check that this combination of species, build, and release is actually provided?",
file=sys.stderr,
)
exit(1)
ENSEMBL-VARIATION¶
Download known genomic variants from ENSEMBL FTP servers, and store them in a single .vcf.gz file.
Software dependencies¶
- bcftools =1.10
- curl
Example¶
This wrapper can be used in the following way:
rule get_variation:
    # Optional: add a genome FASTA index as input to get a VCF with annotated contig
    # lengths (as required by GATK) and properly sorted records:
    # input:
    #     fai="refs/genome.fasta.fai"
    output:
        vcf="refs/variation.vcf.gz"
params:
species="saccharomyces_cerevisiae",
release="98", # releases <98 are unsupported
build="R64-1-1",
        type="all" # one of "all", "somatic", "structural_variations"
log:
"logs/get_variation.log"
cache: True # save space and time with between workflow caching (see docs)
wrapper:
"0.65.0/bio/reference/ensembl-variation"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
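If a genome FASTA index is available, it can be passed via the optional fai input; the wrapper then reheaders the concatenated VCF with the contig lengths (see the bcftools reheader call in the code below). A hedged sketch of that variant:
rule get_variation_with_contig_lengths:
    input:
        fai="refs/genome.fasta.fai"
    output:
        vcf="refs/variation.vcf.gz"
    params:
        species="saccharomyces_cerevisiae",
        release="98",
        build="R64-1-1",
        type="all"
    log:
        "logs/get_variation.log"
    cache: True
    wrapper:
        "0.65.0/bio/reference/ensembl-variation"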
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import tempfile
import subprocess
import sys
import os
from snakemake.shell import shell
from snakemake.exceptions import WorkflowError
species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build
type = snakemake.params.type
if release < 98:
print("Ensembl releases <98 are unsupported.", file=open(snakemake.log[0], "w"))
exit(1)
branch = ""
if release >= 81 and build == "GRCh37":
# use the special grch37 branch for new releases
branch = "grch37/"
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
if type == "all":
if species == "homo_sapiens" and release >= 93:
suffixes = [
"-chr{}".format(chrom) for chrom in list(range(1, 23)) + ["X", "Y", "MT"]
]
else:
suffixes = [""]
elif type == "somatic":
suffixes = ["_somatic"]
elif type == "structural_variations":
suffixes = ["_structural_variations"]
else:
raise ValueError(
"Unsupported type {} (only all, somatic, structural_variations are allowed)".format(
type
)
)
species_filename = species if release >= 91 else species.capitalize()
urls = [
"ftp://ftp.ensembl.org/pub/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.{ext}".format(
release=release,
species=species,
suffix=suffix,
species_filename=species_filename,
branch=branch,
ext=ext,
)
for suffix in suffixes
for ext in ["vcf.gz", "vcf.gz.csi"]
]
names = [os.path.basename(url) for url in urls if url.endswith(".gz")]
try:
gather = "curl {urls}".format(urls=" ".join(map("-O {}".format, urls)))
workdir = os.getcwd()
with tempfile.TemporaryDirectory() as tmpdir:
if snakemake.input.get("fai"):
shell(
"(cd {tmpdir}; {gather} && "
"bcftools concat -Oz --naive {names} > concat.vcf.gz && "
"bcftools reheader --fai {workdir}/{snakemake.input.fai} concat.vcf.gz "
"> {workdir}/{snakemake.output}) {log}"
)
else:
shell(
"(cd {tmpdir}; {gather} && "
"bcftools concat -Oz --naive {names} "
"> {workdir}/{snakemake.output}) {log}"
)
except subprocess.CalledProcessError as e:
if snakemake.log:
sys.stderr = open(snakemake.log[0], "a")
print(
"Unable to download variation data from Ensembl. "
"Did you check that this combination of species, build, and release is actually provided? ",
file=sys.stderr,
)
exit(1)
REFGENIE¶
Deploy biomedical reference datasets via refgenie.
Software dependencies¶
- refgenie =0.9.2
- refgenconf =0.9.0
Example¶
This wrapper can be used in the following way:
rule obtain_asset:
output:
# the name refers to the refgenie seek key (see attributes on http://refgenomes.databio.org)
fai="refs/genome.fasta"
# Multiple outputs/seek keys are possible here.
params:
genome="human_alu",
asset="fasta",
tag="default"
wrapper:
"0.65.0/bio/refgenie"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
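Notes¶
- The wrapper reads the path to the refgenie genome configuration from the REFGENIE environment variable and opens it in writable mode (see the code below), so this variable has to point to a valid, writable refgenie configuration file.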
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
import refgenconf
genome = snakemake.params.genome
asset = snakemake.params.asset
tag = snakemake.params.tag
conf_path = os.environ["REFGENIE"]
rgc = refgenconf.RefGenConf(conf_path, writable=True)
# pull asset if necessary
gat, archive_data, server_url = rgc.pull(genome, asset, tag, force=False)
for seek_key, out in snakemake.output.items():
path = rgc.seek(genome, asset, tag_name=tag, seek_key=seek_key, strict_exists=True)
os.symlink(path, out)
RUBIC¶
RUBIC detects recurrent copy number alterations using copy number breaks.
Software dependencies¶
- r-base =3.4.1
- r-rubic =1.0.3
- r-data.table =1.10.4
- r-pracma =2.0.4
- r-ggplot2 =2.2.1
- r-gtable =0.2.0
- r-codetools =0.2_15
- r-digest =0.6.12
Example¶
This wrapper can be used in the following way:
rule rubic:
input:
seg="{samples}/segments.txt",
markers="{samples}/markers.txt"
output:
out_gains="{samples}/gains.txt",
out_losses="{samples}/losses.txt",
out_plots=directory("{samples}/plots") # only an output directory can be provided for the plots
params:
fdr="",
genefile=""
wrapper:
"0.65.0/bio/rubic"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Beatrice F. Tan
Code¶
# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"
library(RUBIC)
all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]
rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])
SALMON¶
For salmon, the following wrappers are available:
SALMON_INDEX¶
Index a transcriptome assembly with salmon
Software dependencies¶
- salmon ==0.14.1
Example¶
This wrapper can be used in the following way:
rule salmon_index:
input:
"assembly/transcriptome.fasta"
output:
directory("salmon/transcriptome_index")
log:
"logs/salmon/transcriptome_index.log"
threads: 2
params:
# optional parameters
extra=""
wrapper:
"0.65.0/bio/salmon/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Salmon Index."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"salmon index -t {snakemake.input} -i {snakemake.output} "
" --threads {snakemake.threads} {extra} {log}"
)
SALMON_QUANT¶
Quantify transcripts with salmon
Software dependencies¶
- salmon ==0.14.1
Example¶
This wrapper can be used in the following way:
rule salmon_quant_reads:
input:
# If you have multiple fastq files for a single sample (e.g. technical replicates)
# use a list for r1 and r2.
r1 = "reads/{sample}_1.fq.gz",
r2 = "reads/{sample}_2.fq.gz",
index = "salmon/transcriptome_index"
output:
quant = 'salmon/{sample}/quant.sf',
lib = 'salmon/{sample}/lib_format_counts.json'
log:
'logs/salmon/{sample}.log'
params:
# optional parameters
libtype ="A",
# zip_extension = "bz2" # required for bz2 files; optional for gz files
extra=""
threads: 2
wrapper:
"0.65.0/bio/salmon/quant"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Salmon Quant"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
def manual_decompression(reads, zip_ext):
    """Allow *.bz2 input into salmon. Also provide the same
    decompression for *.gz files, as the salmon developers mention
    it may be faster in some cases."""
    if zip_ext and reads:
        if zip_ext == "bz2":
            reads = " <(bunzip2 -c " + reads + ")"
        elif zip_ext == "gz":
            reads = " <(gunzip -c " + reads + ")"
    return reads
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
zip_extension = snakemake.params.get("zip_extension", "")
libtype = snakemake.params.get("libtype", "A")
r1 = snakemake.input.get("r1")
r2 = snakemake.input.get("r2")
r = snakemake.input.get("r")
assert (
r1 is not None and r2 is not None
) or r is not None, "either r1 and r2 (paired), or r (unpaired) are required as input"
if r1:
r1 = (
[snakemake.input.r1]
if isinstance(snakemake.input.r1, str)
else snakemake.input.r1
)
r2 = (
[snakemake.input.r2]
if isinstance(snakemake.input.r2, str)
else snakemake.input.r2
)
assert len(r1) == len(r2), "input-> equal number of files required for r1 and r2"
r1_cmd = " -1 " + manual_decompression(" ".join(r1), zip_extension)
r2_cmd = " -2 " + manual_decompression(" ".join(r2), zip_extension)
read_cmd = " ".join([r1_cmd, r2_cmd])
if r:
assert (
r1 is None and r2 is None
), "Salmon cannot quantify mixed paired/unpaired input files. Please input either r1,r2 (paired) or r (unpaired)"
r = [snakemake.input.r] if isinstance(snakemake.input.r, str) else snakemake.input.r
read_cmd = " -r " + manual_decompression(" ".join(r), zip_extension)
outdir = path.dirname(snakemake.output.get("quant"))
shell(
"salmon quant -i {snakemake.input.index} "
" -l {libtype} {read_cmd} -o {outdir} "
" -p {snakemake.threads} {extra} {log} "
)
SAMBAMBA¶
For sambamba, the following wrappers are available:
SAMBAMBA SORT¶
Sort bam file with sambamba
Software dependencies¶
- sambamba ==0.6.6
Example¶
This wrapper can be used in the following way:
rule sambamba_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"" # optional parameters
threads: 8
wrapper:
"0.65.0/bio/sambamba/sort"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
shell(
"sambamba sort {snakemake.params} -t {snakemake.threads} "
"-o {snakemake.output[0]} {snakemake.input[0]}"
)
SAMTOOLS¶
For samtools, the following wrappers are available:
SAMTOOLS BAM2FQ INTERLEAVED¶
Convert a bam file back to unaligned reads in a single fastq file with samtools. For paired end reads, this results in an unsorted interleaved file.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_bam2fq_interleaved:
input:
"mapped/{sample}.bam"
output:
"reads/{sample}.fq"
params:
" "
threads: 3
wrapper:
"0.65.0/bio/samtools/bam2fq/interleaved"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- David Laehnemann
- Victoria Sack
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
prefix = os.path.splitext(snakemake.output[0])[0]
shell(
"samtools bam2fq {snakemake.params} "
" -@ {snakemake.threads} "
" {snakemake.input[0]}"
" >{snakemake.output[0]} "
)
SAMTOOLS BAM2FQ SEPARATE¶
Convert a bam file with paired-end reads back to unaligned reads in two separate fastq files with samtools. Reads that are not properly paired are discarded (READ_OTHER and singleton reads in the samtools bam2fq documentation), as are secondary (0x100) and supplementary (0x800) reads.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_bam2fq_separate:
input:
"mapped/{sample}.bam"
output:
"reads/{sample}.1.fq",
"reads/{sample}.2.fq"
params:
sort = "-m 4G",
bam2fq = "-n"
threads: # Remember, this is the number of samtools' additional threads
3 # At least 2 threads have to be requested on cluster submission.
# Thus, this value - 2 will be passed to samtools sort's -@ argument.
wrapper:
"0.65.0/bio/samtools/bam2fq/separate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Samtools' -@/--threads option takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors¶
- David Laehnemann
- Victoria Sack
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
prefix = os.path.splitext(snakemake.output[0])[0]
# Samtools takes additional threads through its option -@
# One thread is used by samtools sort
# One thread is used by Samtools bam2fq
# So snakemake.threads has to take them into account
# before allowing additional threads through samtools sort -@
threads = "" if snakemake.threads <= 2 else " -@ {} ".format(snakemake.threads - 2)
shell(
"samtools sort -n "
" {threads} "
" -T {prefix} "
" {snakemake.params.sort} "
" {snakemake.input[0]} | "
"samtools bam2fq "
" {snakemake.params.bam2fq} "
" -1 {snakemake.output[0]} "
" -2 {snakemake.output[1]} "
" -0 /dev/null "
" -s /dev/null "
" -F 0x900 "
" - "
)
SAMTOOLS DEPTH¶
Compute the read depth at each position or region using samtools.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_depth:
input:
bams=["mapped/A.bam", "mapped/B.bam"],
bed="regionToCalcDepth.bed", # optional
output:
"depth.txt"
params:
# note: the optional bed file is taken from the input section above and passed to -b by the wrapper
extra="" # optional additional parameters as string
wrapper:
"0.65.0/bio/samtools/depth"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Dayne Filer
Code¶
"""Snakemake wrapper for running samtools depth."""
__author__ = "Dayne L Filer"
__copyright__ = "Copyright 2020, Dayne L Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
params = snakemake.params.get("extra", "")
# check for optional bed file
bed = snakemake.input.get("bed", "")
if bed:
bed = "-b " + bed
shell(
"samtools depth {params} {bed} " "-o {snakemake.output[0]} {snakemake.input.bams}"
)
SAMTOOLS FAIDX¶
Index reference sequences in FASTA format with samtools faidx.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_faidx:
input:
"{sample}.fa"
output:
"{sample}.fa.fai"
params:
"" # optional params string
wrapper:
"0.65.0/bio/samtools/faidx"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Michael Chambers
Code¶
__author__ = "Michael Chambers"
__copyright__ = "Copyright 2019, Michael Chambers"
__email__ = "greenkidneybean@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools faidx {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS FIXMATE¶
Use samtools to correct mate information after BWA mapping.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_fixmate:
input:
"mapped/{input}"
output:
"fixed/{input}"
message:
"Fixing mate information in {wildcards.input}"
threads:
1
params:
extra = ""
wrapper:
"0.65.0/bio/samtools/fixmate/"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for samtools fixmate"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Samtools' threads parameter lists ADDITIONAL threads.
# that is why threads - 1 has to be given to the -@ parameter
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)
makedirs(op.dirname(snakemake.output[0]))
shell(
"samtools fixmate {extra} {threads}" " {snakemake.input[0]} {snakemake.output[0]}"
)
SAMTOOLS FLAGSTAT¶
Use samtools to create a flagstat file from a bam or sam file.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_flagstat:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.bam.flagstat"
wrapper:
"0.65.0/bio/samtools/flagstat"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Preusch
Code¶
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools flagstat {snakemake.input[0]} > {snakemake.output[0]}")
SAMTOOLS IDXSTATS¶
Use samtools to retrieve and print stats from indexed bam, sam, or cram files.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_idxstats:
input:
bam="mapped/{sample}.bam",
idx="mapped/{sample}.bam.bai"
output:
"mapped/{sample}.bam.idxstats"
log:
"logs/samtools/idxstats/{sample}.log"
wrapper:
"0.65.0/bio/samtools/idxstats"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("samtools idxstats {snakemake.input.bam} > {snakemake.output[0]} {log}")
SAMTOOLS INDEX¶
Index bam file with samtools.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_index:
input:
"mapped/{sample}.sorted.bam"
output:
"mapped/{sample}.sorted.bam.bai"
params:
"" # optional params string
wrapper:
"0.65.0/bio/samtools/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}")
SAMTOOLS MERGE¶
Merge two bam files with samtools.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_merge:
input:
["mapped/A.bam", "mapped/B.bam"]
output:
"merged.bam"
params:
"" # optional additional parameters as string
threads: # Samtools takes additional threads through its option -@
8 # This value - 1 will be sent to -@
wrapper:
"0.65.0/bio/samtools/merge"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Samtools' -@/--threads option takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
# Samtools takes additional threads through its option -@
# One thread for samtools merge
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)
shell(
"samtools merge {threads} {snakemake.params} "
"{snakemake.output[0]} {snakemake.input}"
)
SAMTOOLS MPILEUP¶
Generate pileup using samtools.
Software dependencies¶
- samtools ==1.10
- pigz ==2.3.4
Example¶
This wrapper can be used in the following way:
rule mpileup:
input:
# single or list of bam files
bam="mapped/{sample}.bam",
reference_genome="genome.fasta"
output:
"mpileup/{sample}.mpileup.gz"
log:
"logs/samtools/mpileup/{sample}.log"
params:
extra="-d 10000", # optional
wrapper:
"0.65.0/bio/samtools/mpileup"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
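Notes¶
- The output file name has to end with .gz, because the pileup is piped through pigz and written as a gzip-compressed file (see the code below).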
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for running mpileup."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
bam_input = snakemake.input.bam
reference_genome = snakemake.input.reference_genome
extra = snakemake.params.get("extra", "")
if not snakemake.output[0].endswith(".gz"):
raise Exception(
'output file will be compressed and therefore filename should end with ".gz"'
)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"samtools mpileup "
"{extra} "
"-f {reference_genome} "
"{bam_input} "
" | pigz > {snakemake.output} "
"{log}"
)
SAMTOOLS SORT¶
Sort bam file with samtools.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"-m 4G"
threads: # Samtools takes additional threads through its option -@
8 # This value - 1 will be sent to -@.
wrapper:
"0.65.0/bio/samtools/sort"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Samtools' -@/--threads option takes one integer as input. This is the number of additional threads, not the total number of threads.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
prefix = os.path.splitext(snakemake.output[0])[0]
# Samtools takes additional threads through its option -@
# One thread for samtools
# Other threads are *additional* threads passed to the argument -@
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)
shell(
"samtools sort {snakemake.params} {threads} -o {snakemake.output[0]} "
"-T {prefix} {snakemake.input[0]}"
)
SAMTOOLS STATS¶
Generate stats using samtools.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_stats:
input:
"mapped/{sample}.bam"
output:
"samtools_stats/{sample}.txt"
params:
extra="", # Optional: extra arguments.
region="xx:1000000-2000000" # Optional: region string.
log:
"logs/samtools_stats/{sample}.log"
wrapper:
"0.65.0/bio/samtools/stats"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("samtools stats {extra} {snakemake.input} {region} > {snakemake.output} {log}")
SAMTOOLS VIEW¶
Convert or filter SAM/BAM.
Software dependencies¶
- samtools ==1.10
Example¶
This wrapper can be used in the following way:
rule samtools_view:
input:
"{sample}.sam"
output:
"{sample}.bam"
params:
"-b" # optional params string
wrapper:
"0.65.0/bio/samtools/view"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
shell("samtools view {snakemake.params} {snakemake.input[0]} > {snakemake.output[0]}")
SEQTK¶
For seqtk, the following wrappers are available:
SEQTK-SUBSAMPLE-PE¶
Subsample reads from paired FASTQ files
Software dependencies¶
- seqtk ==1.3
Example¶
This wrapper can be used in the following way:
rule seqtk_subsample_pe:
input:
f1="{sample}.1.fastq.gz",
f2="{sample}.2.fastq.gz"
output:
f1="{sample}.1.subsampled.fastq.gz",
f2="{sample}.2.subsampled.fastq.gz"
params:
n=3,
seed=12345
log:
"logs/seqtk_subsample/{sample}.log"
threads:
1
wrapper:
"0.65.0/bio/seqtk/subsample/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
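Notes¶
- The same seed (params.seed) is passed to both seqtk sample calls (see the code below); using identical seeds is what keeps the subsampled mates of the two files in sync. params.n is handed to seqtk sample directly and can be an absolute number of reads or a fraction.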
Authors¶
- Fabian Kilpert
Code¶
"""Snakemake wrapper for subsampling reads from paired FASTQ files using seqtk."""
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"( "
"seqtk sample "
"-s {snakemake.params.seed} "
"{snakemake.input.f1} "
"{snakemake.params.n} "
"| pigz -9 -p {snakemake.threads} "
"> {snakemake.output.f1} "
"&& "
"seqtk sample "
"-s {snakemake.params.seed} "
"{snakemake.input.f2} "
"{snakemake.params.n} "
"| pigz -9 -p {snakemake.threads} "
"> {snakemake.output.f2} "
") {log} "
)
SEQTK-SUBSAMPLE-SE¶
Subsample reads from FASTQ file
Software dependencies¶
- seqtk ==1.3
Example¶
This wrapper can be used in the following way:
rule seqtk_subsample_se:
input:
"{sample}.fastq.gz"
output:
"{sample}.subsampled.fastq.gz"
params:
n=3,
seed=12345
log:
"logs/seqtk_subsample/{sample}.log"
threads:
1
wrapper:
"0.65.0/bio/seqtk/subsample/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Fabian Kilpert
Code¶
"""Snakemake wrapper for subsampling reads from FASTQ file using seqtk."""
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"( "
"seqtk sample "
"-s {snakemake.params.seed} "
"{snakemake.input} "
"{snakemake.params.n} "
"| pigz -9 -p {snakemake.threads} "
"> {snakemake.output} "
") {log} "
)
SHOVILL¶
Assemble bacterial isolate genomes from Illumina paired-end reads.
Software dependencies¶
- shovill ==1.1.0
Example¶
This wrapper can be used in the following way:
rule shovill:
input:
r1="reads/{sample}_R1.fq.gz",
r2="reads/{sample}_R2.fq.gz"
output:
raw_assembly="assembly/{sample}.{assembler}.assembly.fa",
contigs="assembly/{sample}.{assembler}.contigs.fa"
params:
extra=""
log:
"logs/shovill/{sample}.{assembler}.log"
threads: 1
wrapper:
"0.65.0/bio/shovill"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
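Notes¶
- The assembler is not set via params but taken from the assembler wildcard (see the code below), so output and log paths must contain an {assembler} wildcard naming an assembler supported by shovill (e.g. spades).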
Authors¶
- Sangram Keshari Sahu
Code¶
"""Snakemake wrapper for shovill."""
__author__ = "Sangram Keshari Sahu"
__copyright__ = "Copyright 2020, Sangram Keshari Sahu"
__email__ = "sangramsahu15@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from tempfile import TemporaryDirectory
# Placeholder for optional parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
params = snakemake.params.get("extra", "")
with TemporaryDirectory() as tempdir:
shell(
"(shovill"
" --assembler {snakemake.wildcards.assembler}"
" --outdir {tempdir} --force"
" --R1 {snakemake.input.r1}"
" --R2 {snakemake.input.r2}"
" --cpus {snakemake.threads}"
" {params}) {log}"
)
shell(
"mv {tempdir}/{snakemake.wildcards.assembler}.fasta {snakemake.output.raw_assembly}"
" && mv {tempdir}/contigs.fa {snakemake.output.contigs}"
)
SICKLE¶
For sickle, the following wrappers are available:
SICKLE PE¶
Trim paired-end reads with sickle.
Software dependencies¶
- sickle-trim ==1.33
Example¶
This wrapper can be used in the following way:
rule sickle_pe:
input:
r1="input_R1.fq",
r2="input_R2.fq"
output:
r1="output_R1.fq",
r2="output_R2.fq",
rs="output_single.fq",
params:
qual_type="sanger",
# optional extra parameters
extra=""
log:
# optional log file
"logs/sickle/job.log"
wrapper:
"0.65.0/bio/sickle/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
"-o {snakemake.output.r1} -p {snakemake.output.r2} "
"-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
"{extra}) {log}"
)
SICKLE SE¶
Trim single-end reads with sickle.
Software dependencies¶
- sickle-trim ==1.33
Example¶
This wrapper can be used in the following way:
rule sickle_se:
input:
"input_R1.fq"
output:
"output_R1.fq"
params:
qual_type="sanger",
# optional extra parameters
extra=""
log:
"logs/sickle/job.log"
wrapper:
"0.65.0/bio/sickle/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
"-t {snakemake.params.qual_type} {extra}) {log}"
)
SNP-MUTATOR¶
Generate mutated sequence files from a reference genome.
Software dependencies¶
- snp-mutator ==1.2.0
Example¶
This wrapper can be used in the following way:
NUM_SIMULATIONS = 2
rule snpmutator:
input:
"{sample}.fa"
output:
vcf = "{sample}.mutated.vcf",
sequences = expand(
"{{sample}}_mutated_{simulation_number}.fasta",
simulation_number=range(1, NUM_SIMULATIONS + 1)
)
params:
num_simulations = NUM_SIMULATIONS,
extra = " ".join([
"--num-substitutions 2",
"--num-insertions 2",
"--num-deletions 0"
]),
log:
"logs/snp-mutator/test/{sample}.log"
wrapper:
"0.65.0/bio/snp-mutator"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for SNP Mutator."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
num_simulations = snakemake.params.get("num_simulations", 100)
fasta_outdir = Path(snakemake.output.sequences[0]).absolute().parent
# Formats the log redrection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Executed shell command
shell(
"snpmutator {extra} "
"--num-simulations {num_simulations} "
"--vcf {snakemake.output.vcf} "
"-F {fasta_outdir} "
"{snakemake.input} {log} "
)
SNPEFF¶
For snpeff, the following wrappers are available:
SNPEFF¶
Annotate predicted effect of nucleotide changes with SnpEff
Software dependencies¶
- snpeff ==4.3.1t
- bcftools =1.10
Example¶
This wrapper can be used in the following way:
rule snpeff:
input:
calls="{sample}.vcf", # (vcf, bcf, or vcf.gz)
db="resources/snpeff/ebola_zaire" # path to reference db downloaded with the snpeff download wrapper
output:
calls="snpeff/{sample}.vcf", # annotated calls (vcf, bcf, or vcf.gz)
stats="snpeff/{sample}.html", # summary statistics (in HTML), optional
csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
log:
"logs/snpeff/{sample}.log"
params:
extra="-Xmx4g" # optional parameters (e.g., max memory 4g)
wrapper:
"0.65.0/bio/snpeff/annotate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
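Notes¶
- The wrapper uses the parent directory of the db input as the snpEff data directory and the basename of db as the database name (see the code below), so db should point to the database directory obtained with the snpeff download wrapper.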
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
from os import path
import shutil
import tempfile
from pathlib import Path
outcalls = snakemake.output.calls
if outcalls.endswith(".vcf.gz"):
outprefix = "| bcftools view -Oz"
elif outcalls.endswith(".bcf"):
outprefix = "| bcftools view -Ob"
else:
outprefix = ""
incalls = snakemake.input[0]
if incalls.endswith(".bcf"):
incalls = "< <(bcftools view {})".format(incalls)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
data_dir = Path(snakemake.input.db).parent.resolve()
stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = "" if not csvstats else "-csvStats {}".format(csvstats)
stats_opt = "-noStats" if not stats else "-stats {}".format(stats)
reference = path.basename(snakemake.input.db)
shell(
"snpEff -dataDir {data_dir} {stats_opt} {csvstats_opt} {extra} "
"{reference} {incalls} "
"{outprefix} > {outcalls} {log}"
)
SNPEFF DOWNLOAD¶
Download snpeff DB for a given species.
Software dependencies¶
- snpeff ==4.3.1t
- bcftools =1.10
Example¶
This wrapper can be used in the following way:
rule snpeff_download:
output:
# wildcard {reference} may be anything listed in `snpeff databases`
directory("resources/snpeff/{reference}")
log:
"logs/snpeff/download/{reference}.log"
params:
reference="{reference}"
wrapper:
"0.65.0/bio/snpeff/download"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
reference = snakemake.params.reference
outdir = Path(snakemake.output[0]).parent.resolve()
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("snpEff download -dataDir {outdir} {reference} {log}")
SNPSIFT¶
For snpsift, the following wrappers are available:
SNPSIFT ANNOTATE¶
Annotate using fields from another VCF file.
Software dependencies¶
- snpsift ==4.3.1t
- bcftools ==1.10.2
- pbgzip ==2016.08.04
Example¶
This wrapper can be used in the following way:
rule test_snpsift_annotate:
input:
call="in.vcf",
database="annotation.vcf"
output:
call="annotated/out.vcf"
log:
"annotate.log"
wrapper:
"0.65.0/bio/snpsift/annotate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for SnpSift annotate"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1
incall = snakemake.input["call"]
if snakemake.input["call"].endswith("bcf"):
min_threads += 1
incall = "< <(bcftools view {})".format(incall)
elif snakemake.input["call"].endswith("gz"):
min_threads += 1
incall = "< <(gunzip -c {})".format(incall)
outcall = snakemake.output["call"]
if snakemake.output["call"].endswith("gz"):
min_threads += 1
outcall = "| gzip -c > {}".format(outcall)
elif snakemake.output["call"].endswith("bcf"):
min_threads += 1
outcall = "| bcftools view > {}".format(outcall)
else:
outcall = "> {}".format(outcall)
if snakemake.threads < min_threads:
raise ValueError(
"At least {} threads required, {} provided".format(
min_threads, snakemake.threads
)
)
shell(
"SnpSift annotate" # Tool and its subcommand
" {extra}" # Extra parameters
" {snakemake.input.database}" # Path to annotation vcf file
" {incall} " # Path to input vcf file
" {outcall} " # Path to output vcf file
" {log}" # Logging behaviour
)
SNPSIFT VARTYPE¶
Add an INFO field denoting variant type.
Software dependencies¶
- snpsift =4.3.1t
Example¶
This wrapper can be used in the following way:
rule test_snpsift_vartype:
input:
vcf="in.vcf"
output:
vcf="annotated/out.vcf"
message:
"Testing SnpSift varType"
log:
"varType.log"
wrapper:
"0.65.0/bio/snpsift/varType"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for SnpSift varType"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"SnpSift varType" # Tool and its subcommand
" {extra}" # Extra parameters
" {snakemake.input.vcf}" # Path to input vcf file
" > {snakemake.output.vcf}" # Path to output vcf file
" {log}" # Logging behaviour
)
SOURMASH¶
For sourmash, the following wrappers are available:
SOURMASH_COMPUTE¶
Build a MinHash signature for a transcriptome, genome, or reads
Software dependencies¶
- sourmash ==2.0.0a7
Example¶
This wrapper can be used in the following way:
rule sourmash_reads:
input:
"reads/a.fastq"
output:
"reads.sig"
log:
"logs/sourmash/sourmash_compute_reads.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"0.65.0/bio/sourmash/compute"
rule sourmash_transcriptome:
input:
"assembly/transcriptome.fasta"
output:
"transcriptome.sig"
log:
"logs/sourmash/sourmash_compute_transcriptome.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"0.65.0/bio/sourmash/compute"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Lisa K. Johnson
Code¶
"""Snakemake wrapper for sourmash compute."""
__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled", "1000")
k = snakemake.params.get("k", "31")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
" {extra} {log}"
)
SRA-TOOLS¶
For sra-tools, the following wrappers are available:
SRA-TOOLS FASTERQ-DUMP¶
Download FASTQ files from SRA.
Software dependencies¶
- sra-tools >2.9.1
Example¶
This wrapper can be used in the following way:
rule get_fastq_pe:
output:
# the wildcard name must be accession, pointing to an SRA number
"data/{accession}_1.fastq",
"data/{accession}_2.fastq"
params:
# optional extra arguments
extra=""
threads: 6 # defaults to 6
wrapper:
"0.65.0/bio/sra-tools/fasterq-dump"
rule get_fastq_se:
output:
"data/{accession}.fastq"
params:
extra=""
threads: 6
wrapper:
"0.65.0/bio/sra-tools/fasterq-dump"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Derek Croote
Code¶
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
outdir = os.path.dirname(snakemake.output[0])
if outdir:
outdir = "--outdir {}".format(outdir)
extra = snakemake.params.get("extra", "")
with tempfile.TemporaryDirectory() as tmp:
shell(
"fasterq-dump --temp {tmp} --threads {snakemake.threads} "
"{extra} {outdir} {snakemake.wildcards.accession} {log}"
)
STAR¶
For star, the following wrappers are available:
STAR¶
Map reads with STAR.
Software dependencies¶
- star ==2.7.3a
Example¶
This wrapper can be used in the following way:
rule star_pe_multi:
input:
# use a list for multiple fastq files for one sample
# usually technical replicates across lanes/flowcells
fq1 = ["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
# paired-end reads need to be ordered so that the items in the two lists match
fq2 = ["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"] #optional
output:
# see STAR manual for additional output files
"star/pe/{sample}/Aligned.out.sam"
log:
"logs/star/pe/{sample}.log"
params:
# path to STAR reference genome index
index="index",
# optional parameters
extra=""
threads: 8
wrapper:
"0.65.0/bio/star/align"
rule star_se:
input:
fq1 = "reads/{sample}_R1.1.fastq"
output:
# see STAR manual for additional output files
"star/{sample}/Aligned.out.sam"
log:
"logs/star/{sample}.log"
params:
# path to STAR reference genome index
index="index",
# optional parameters
extra=""
threads: 8
wrapper:
"0.65.0/bio/star/align"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Tomás Di Domenico
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = (
[snakemake.input.fq1]
if isinstance(snakemake.input.fq1, str)
else snakemake.input.fq1
)
fq2 = snakemake.input.get("fq2")
if fq2:
fq2 = (
[snakemake.input.fq2]
if isinstance(snakemake.input.fq2, str)
else snakemake.input.fq2
)
assert len(fq1) == len(
fq2
), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])
if fq1[0].endswith(".gz"):
readcmd = "--readFilesCommand zcat"
else:
readcmd = ""
outprefix = os.path.dirname(snakemake.output[0]) + "/"
shell(
"STAR "
"{extra} "
"--runThreadN {snakemake.threads} "
"--genomeDir {snakemake.params.index} "
"--readFilesIn {input_str} "
"{readcmd} "
"--outFileNamePrefix {outprefix} "
"--outStd Log "
"{log}"
)
STAR INDEX¶
Index fasta sequences with STAR
Software dependencies¶
- star ==2.7.3a
Example¶
This wrapper can be used in the following way:
rule star_index:
input:
fasta = "{genome}.fasta"
output:
directory("{genome}")
message:
"Testing STAR index"
threads:
1
params:
extra = ""
log:
"logs/star_index_{genome}.log"
wrapper:
"0.65.0/bio/star/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
- Tomás Di Domenico
Code¶
"""Snakemake wrapper for STAR index"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake.utils import makedirs
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
sjdb_overhang = snakemake.params.get("sjdbOverhang", "100")
gtf = snakemake.input.get("gtf")
if gtf is not None:
gtf = "--sjdbGTFfile " + gtf
sjdb_overhang = "--sjdbOverhang " + sjdb_overhang
else:
gtf = sjdb_overhang = ""
makedirs(snakemake.output)
shell(
"STAR " # Tool
"--runMode genomeGenerate " # Indexation mode
"{extra} " # Optional parameters
"--runThreadN {snakemake.threads} " # Number of threads
"--genomeDir {snakemake.output} " # Path to output
"--genomeFastaFiles {snakemake.input.fasta} " # Path to fasta files
"{sjdb_overhang} " # Read-len - 1
"{gtf} " # Highly recommended GTF
"{log}" # Logging
)
STRELKA¶
For strelka, the following wrappers are available:
STRELKA GERMLINE¶
Call germline variants with Strelka.
Software dependencies¶
- strelka ==2.9.10
Example¶
This wrapper can be used in the following way:
rule strelka_germline:
input:
# the required bam file
bam="mapped/{sample}.bam",
# path to reference genome fasta and index
fasta="genome.fasta",
fasta_index="genome.fasta.fai"
output:
# Strelka results - either use directory or complete file path
directory("strelka/{sample}")
log:
"logs/strelka/germline/{sample}.log"
params:
# optional parameters
config_extra="",
run_extra=""
threads: 8
wrapper:
"0.65.0/bio/strelka/germline"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
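Notes¶
- If the first output ends with .vcf.gz, the wrapper assumes that file to sit three directory levels below the Strelka run directory (e.g. <rundir>/results/variants/variants.vcf.gz) and derives the run directory from it; otherwise the output itself is used as the run directory (see the code below).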
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"
import os
from pathlib import Path
from snakemake.shell import shell
config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
bam = snakemake.input.get("bam") # input bam file, required
assert bam is not None, "input-> bam is a required input parameter"
if snakemake.output[0].endswith(".vcf.gz"):
run_dir = Path(snakemake.output[0]).parents[2]
else:
run_dir = snakemake.output
shell(
"configureStrelkaGermlineWorkflow.py " # configure the strelka run
"--bam {bam} " # input bam
"--referenceFasta {snakemake.input.fasta} " # reference genome
"--runDir {run_dir} " # output directory
"{config_extra} " # additional parameters for the configuration
"&& {run_dir}/runWorkflow.py " # run the strelka workflow
"-m local " # run in local mode
"-j {snakemake.threads} " # number of threads
"{run_extra} " # additional parameters for the run
"{log}"
) # logging
STRELKA¶
Strelka calls somatic and germline small variants from mapped sequencing reads
Software dependencies¶
- strelka ==2.9.10
Example¶
This wrapper can be used in the following way:
rule strelka:
input:
# The normal bam and its index
# are optional input
# normal = "data/b.bam",
# normal_index = "data/b.bam.bai"
tumor = "data/{tumor}.bam",
tumor_index = "data/{tumor}.bam.bai",
fasta = "data/genome.fasta",
fasta_index = "data/genome.fasta.fai"
output:
# Strelka output - can be directory or full file path
directory("{tumor}_vcf")
threads:
1
params:
run_extra = "",
config_extra = ""
log:
"logs/strelka_{tumor}.log"
wrapper:
"0.65.0/bio/strelka/somatic"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for Strelka"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from pathlib import Path
from snakemake.shell import shell
from snakemake.utils import makedirs
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")
# If a normal bam is used, it should be given in the
# input block rather than in params, so that Snakemake
# performs its additional checks on file existence.
normal = (
"--normalBam {}".format(snakemake.input["normal"])
if "normal" in snakemake.input.keys()
else ""
)
if snakemake.output[0].endswith("vcf.gz"):
run_dir = Path(snakemake.output[0]).parents[2]
else:
run_dir = snakemake.output
shell(
"(configureStrelkaSomaticWorkflow.py " # Configuration script
"{normal} " # Path to normal bam (if any)
"--tumorBam {snakemake.input.tumor} " # Path to tumor bam
"--referenceFasta {snakemake.input.fasta} " # Path to fasta file
"--runDir {run_dir} " # Path to output directory
"{config_extra} " # Extra parametersfor configuration
" && "
"{run_dir}/runWorkflow.py " # Run the pipeline
"--mode local " # Stop internal job submission
"--jobs {snakemake.threads} " # Nomber of threads
"{run_extra}) " # Extra parameters for runWorkflow
"{log}" # Logging behaviour
)
STRLING¶
For strling, the following wrappers are available:
STRLING CALL¶
STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The call step calls genotypes and estimates allele sizes for all loci in each sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html
Software dependencies¶
- strling ==0.3
Example¶
This wrapper can be used in the following way:
rule strling_call:
input:
bam="mapped/{sample}.bam",
bai="mapped/{sample}.bam.bai",
bin="extract/{sample}.bin",
reference="reference/genome.fasta",
fai="reference/genome.fasta.fai",
bounds="merged/group-bounds.txt" # optional, produced by strling merge
output:
"call/{sample}-bounds.txt", # must end with -bounds.txt
"call/{sample}-genotype.txt", # must end with -genotype.txt
"call/{sample}-unplaced.txt" # must end with -unplaced.txt
params:
extra="" # optional extra command line arguments
log:
"log/strling/call/{sample}.log"
wrapper:
"0.65.0/bio/strling/call"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling call"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
bin = snakemake.input.get("bin", None)
reference = snakemake.input.get("reference", None)
bounds = snakemake.input.get("bounds", None)
if not bam or (isinstance(bam, list) and len(bam) != 1):
raise ValueError("Please provide exactly one 'bam' as input.")
if not path.exists(bam + ".bai"):
raise ValueError(
"Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
)
if not reference:
raise ValueError("Please provide a fasta 'reference' input.")
if not bounds: # optional
bounds_string = ""
else:
bounds_string = "-b {}".format(bounds)
if not path.exists(reference + ".fai"):
raise ValueError(
"Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
)
if not any(o.endswith("-bounds.txt") for o in snakemake.output):
raise ValueError("Please provide a file that ends with -bounds.txt in the output.")
for filename in snakemake.output:
if filename.endswith("-bounds.txt"):
prefix = filename[: -len("-bounds.txt")]
break
if not any(o == "{}-genotype.txt".format(prefix) for o in snakemake.output):
raise ValueError(
"Please provide an output file that ends with -genotype.txt and has the same prefix as -bounds.txt"
)
if not any(o == "{}-unplaced.txt".format(prefix) for o in snakemake.output):
raise ValueError(
"Please provide an output file that ends with -unplaced.txt and has the same prefix as -bounds.txt"
)
shell(
"(strling call "
"{bam} "
"{bin} "
"{bounds_string} "
"-o {prefix} "
"{extra}) {log}"
)
STRLING EXTRACT¶
STRling (pronounced “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. extract
retrieves informative read pairs to a binary format for a single sample (same as above, you can use the same bin files). Documentation at: https://strling.readthedocs.io/en/latest/run.html
Software dependencies¶
- strling ==0.3
Example¶
This wrapper can be used in the following way:
rule strling_extract:
input:
bam="mapped/{sample}.bam",
bai="mapped/{sample}.bam.bai",
reference="reference/genome.fasta",
fai="reference/genome.fasta.fai",
index="reference/genome.fasta.str" # optional
output:
"extract/{sample}.bin"
log:
"log/strling/extract/{sample}.log"
params:
extra="" # optionally add further command line arguments
wrapper:
"0.65.0/bio/strling/extract"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling extract"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
reference = snakemake.input.get("reference", None)
index = snakemake.input.get("index", None)
if not bam or (isinstance(bam, list) and len(bam) != 1):
raise ValueError("Please provide exactly one 'bam' input.")
if not path.exists(bam + ".bai"):
raise ValueError(
"Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
)
if not reference:
raise ValueError("Please provide a fasta 'reference' input.")
if not path.exists(reference + ".fai"):
raise ValueError(
"Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
)
if not index: # optional
index_string = ""
else:
index_string = "-g {}".format(index)
if len(snakemake.output) != 1:
raise ValueError("Please provide exactly one output file (.bin).")
shell(
"(strling extract "
"{bam} "
"{snakemake.output[0]} "
"-f {reference} "
"{index_string} "
"{extra}) {log}"
)
STRLING INDEX¶
STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. index
creates a bed file of large STR regions in the reference genome. This step is performed automatically as part of strling extract
. However, when running multiple samples, it is more efficient to do it once, then pass the file to strling extract using the -g
option. Documentation at: https://strling.readthedocs.io/en/latest/run.html
Software dependencies¶
- strling ==0.3
Example¶
This wrapper can be used in the following way:
rule strling_index:
input:
"reference/genome.fasta"
output:
index="reference/genome.fasta.str",
fai="reference/genome.fasta.fai"
params:
extra="" # optionally add further command line arguments
log:
"log/strling/index.log"
wrapper:
"0.65.0/bio/strling/index"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling index"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
if len(snakemake.input) != 1:
raise ValueError("Please provide exactly one reference genome.")
shell(
"(strling index {snakemake.input[0]} "
"-g {snakemake.output.index} "
"{extra}) {log}"
)
STRLING MERGE¶
STRling (pronounced “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. merge
prepares joint calling of STR loci across all given samples. Requires minimum read evidence from at least one sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html
Software dependencies¶
- strling ==0.3
Example¶
This wrapper can be used in the following way:
rule strling_merge:
input:
bins=["extract/A.bin", "extract/B.bin"],
reference="reference/genome.fasta",
fai="reference/genome.fasta.fai",
output:
"merged/group-bounds.txt" # must end with "-bounds.txt"
params:
extra="" # optionally add further command line arguments
log:
"log/strling/merge/group.log"
wrapper:
"0.65.0/bio/strling/merge"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling merge"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
bins = snakemake.input.get("bins", None)
reference = snakemake.input.get("reference", None)
fai = snakemake.input.get("fai", None)
if not bins or len(bins) < 2:
raise ValueError("Please provide at least two 'bins' as input.")
if not reference:
raise ValueError("Please provide a fasta 'reference' input.")
if not path.exists(reference + ".fai"):
raise ValueError(
"Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
)
if len(snakemake.output) != 1:
raise ValueError("Please provide exactly one output file (.bin).")
if not snakemake.output[0].endswith("-bounds.txt"):
raise ValueError(
"Output file must end with '-bounds.txt'. Please change the output file name."
)
prefix = snakemake.output[0][: -len("-bounds.txt")]
shell("(strling merge " "{bins} " "-o {prefix} " "{extra}) {log}")
TABIX¶
Process given file with tabix (e.g., create index).
Software dependencies¶
- htslib ==1.10
Example¶
This wrapper can be used in the following way:
rule tabix:
input:
"{prefix}.vcf.gz"
output:
"{prefix}.vcf.gz.tbi"
params:
# pass arguments to tabix (e.g. index a vcf)
"-p vcf"
log:
"logs/tabix/{prefix}.log"
wrapper:
"0.65.0/bio/tabix"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("tabix {snakemake.params} {snakemake.input[0]} {log}")
TRANSDECODER¶
For transdecoder, the following wrappers are available:
TRANSDECODER LONGORFS¶
TransDecoder.LongOrfs will identify coding regions within transcript sequences (ORFs) that are at least 100 amino acids long. You can lower this via the ‘-m’ parameter, but know that the rate of false positive ORF predictions increases drastically with shorter minimum length criteria.
Software dependencies¶
- transdecoder=5.5.0
Example¶
This wrapper can be used in the following way:
rule transdecoder_longorfs:
input:
fasta="test.fa.gz", # required
gene_trans_map="test.gtm" # optional gene-to-transcript identifier mapping file (tab-delimited, gene_id<tab>trans_id<return> )
output:
"test.fa.transdecoder_dir/longest_orfs.pep"
log:
"logs/transdecoder/test-longorfs.log"
params:
extra=""
wrapper:
"0.65.0/bio/transdecoder/longorfs"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
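The minimum ORF length mentioned in the description can be lowered by passing -m through the extra parameter. A minimal, illustrative sketch (file names are placeholders):
rule transdecoder_longorfs_short:
    input:
        fasta="short.fa"
    output:
        "short.fa.transdecoder_dir/longest_orfs.pep"
    log:
        "logs/transdecoder/short-longorfs.log"
    params:
        extra="-m 50"  # lowers the 100 amino acid default; expect more false positive ORFs
    wrapper:
        "0.65.0/bio/transdecoder/longorfs"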
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Transdecoder LongOrfs"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
gtm_cmd = ""
gtm = snakemake.input.get("gene_trans_map", "")
if gtm:
gtm_cmd = " --gene_trans_map " + gtm
output_dir = path.dirname(str(snakemake.output))
# transdecoder fails if output already exists. No force option available
shell("rm -rf {output_dir}")
input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith("gz"):
input_fa = input_fasta.rsplit(".gz")[0]
shell("gunzip -c {input_fasta} > {input_fa}")
else:
input_fa = input_fasta
shell("TransDecoder.LongOrfs -t {input_fa} {gtm_cmd} {log}")
TRANSDECODER PREDICT¶
Predict the likely coding regions from the ORFs identified by Transdecoder.LongOrfs. Optionally include results from homology searches (blast/hmmer results) as ORF retention criteria.
Software dependencies¶
- transdecoder=5.5.0
Example¶
This wrapper can be used in the following way:
rule transdecoder_predict:
input:
fasta="test.fa.gz", # required input; optionally gzipped
pfam_hits="pfam_hits.txt", # optionally retain ORFs with hits by inputting pfam results here (run separately)
blastp_hits="blastp_hits.txt", # optionally retain ORFs with hits by inputting blastp results here (run separately)
# you may also want to add your transdecoder longorfs result here - predict will fail if you haven't first run longorfs
#longorfs="test.fa.transdecoder_dir/longest_orfs.pep"
output:
"test.fa.transdecoder.bed",
"test.fa.transdecoder.cds",
"test.fa.transdecoder.pep",
"test.fa.transdecoder.gff3"
log:
"logs/transdecoder/test-predict.log"
params:
extra=""
wrapper:
"0.65.0/bio/transdecoder/predict"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Transdecoder Predict"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
addl_outputs = ""
pfam = snakemake.input.get("pfam_hits", "")
if pfam:
addl_outputs += " --retain_pfam_hits " + pfam
blast = snakemake.input.get("blastp_hits", "")
if blast:
addl_outputs += " --retain_blastp_hits " + blast
input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith("gz"):
input_fa = input_fasta.rsplit(".gz")[0]
shell("gunzip -c {input_fasta} > {input_fa}")
else:
input_fa = input_fasta
shell("TransDecoder.Predict -t {input_fa} {addl_outputs} {extra} {log}")
TRIM_GALORE¶
For trim_galore, the following wrappers are available:
TRIM_GALORE-PE¶
Trim paired-end reads using trim_galore.
Software dependencies¶
- trim-galore ==0.4.5
Example¶
This wrapper can be used in the following way:
rule trim_galore_pe:
input:
["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"]
output:
"trimmed/{sample}.1_val_1.fq.gz",
"trimmed/{sample}.1.fastq.gz_trimming_report.txt",
"trimmed/{sample}.2_val_2.fq.gz",
"trimmed/{sample}.2.fastq.gz_trimming_report.txt"
params:
extra="--illumina -q 20"
log:
"logs/trim_galore/{sample}.log"
wrapper:
"0.65.0/bio/trim_galore/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- It is expected that the fastqc Snakemake wrapper be used in place of the --fastqc option (see the sketch after this list).
- All output files must be placed in the same directory.
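As a sketch of the first note, FastQC can be run on the trimmed reads via its own wrapper instead of --fastqc (output keys and paths are illustrative; check the fastqc wrapper entry for its exact interface):
rule fastqc_trimmed:
    input:
        "trimmed/{sample}.1_val_1.fq.gz"
    output:
        html="qc/fastqc/{sample}.1_val_1.html",
        zip="qc/fastqc/{sample}.1_val_1_fastqc.zip"
    log:
        "logs/fastqc/{sample}.1.log"
    wrapper:
        "0.65.0/bio/fastqc"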
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import os.path
log = snakemake.log_fmt_shell()
# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
raise ValueError(
"The trim_galore Snakemake wrapper cannot "
"be run with the `--fastqc` flag. Please "
"remove the flag from extra params. "
"You can use the fastqc Snakemake wrapper on "
"the input and output files instead."
)
# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m
# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
assert out_dir == os.path.dirname(file_path), (
"trim_galore can only output files to a single directory."
" Please indicate only one directory for the output files."
)
shell(
"(trim_galore"
" {snakemake.params.extra}"
" --paired"
" -o {out_dir}"
" {snakemake.input})"
" {log}"
)
TRIM_GALORE-SE¶
Trim unpaired reads using trim_galore.
Software dependencies¶
- trim-galore ==0.4.3
Example¶
This wrapper can be used in the following way:
rule trim_galore_se:
input:
"reads/{sample}.fastq.gz"
output:
"trimmed/{sample}_trimmed.fq.gz",
"trimmed/{sample}.fastq.gz_trimming_report.txt"
params:
extra="--illumina -q 20"
log:
"logs/trim_galore/{sample}.log"
wrapper:
"0.65.0/bio/trim_galore/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- It is expected that the fastqc Snakemake wrapper be used in place of the --fastqc option.
- All output files must be placed in the same directory.
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import os.path
log = snakemake.log_fmt_shell()
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
raise ValueError(
"The trim_galore Snakemake wrapper cannot "
"be run with the `--fastqc` flag. Please "
"remove the flag from extra params. "
"You can use the fastqc Snakemake wrapper on "
"the input and output files instead."
)
# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m
# Check that all output files are in the same directory
out_dir = os.path.dirname(snakemake.output[0])
for file_path in snakemake.output[1:]:
assert out_dir == os.path.dirname(file_path), (
"trim_galore can only output files to a single directory."
" Please indicate only one directory for the output files."
)
shell(
"(trim_galore"
" {snakemake.params.extra}"
" -o {out_dir}"
" {snakemake.input})"
" {log}"
)
TRIMMOMATIC¶
For trimmomatic, the following wrappers are available:
TRIMMOMATIC PE¶
Trim paired-end reads with trimmomatic. (De)compress with pigz.
Software dependencies¶
- trimmomatic ==0.36
- pigz ==2.3.4
Example¶
This wrapper can be used in the following way:
rule trimmomatic_pe:
input:
r1="reads/{sample}.1.fastq.gz",
r2="reads/{sample}.2.fastq.gz"
output:
r1="trimmed/{sample}.1.fastq.gz",
r2="trimmed/{sample}.2.fastq.gz",
# reads where trimming entirely removed the mate
r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra="",
compression_level="-9"
threads:
32
wrapper:
"0.65.0/bio/trimmomatic/pe"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Jorge Langa
Code¶
"""
bio/trimmomatic/pe
Snakemake wrapper to trim reads with trimmomatic in PE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of its time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can speed trimmomatic up considerably.
Consider providing this wrapper with at least one extra thread for each gzipped
input or output file.
"""
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_files, output_files, available_threads):
gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
potential_threads_per_process = available_threads // (
1 + gzipped_input_files + gzipped_output_files
)
if potential_threads_per_process > 0:
# decompressing pigz creates at most 4 threads
pigz_input_threads = (
min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
)
pigz_output_threads = (
(available_threads - pigz_input_threads * gzipped_input_files)
// (1 + gzipped_output_files)
if gzipped_output_files != 0
else 0
)
trimmomatic_threads = (
available_threads
- pigz_input_threads * gzipped_input_files
- pigz_output_threads * gzipped_output_files
)
else:
# not enough threads for pigz
pigz_input_threads = 0
pigz_output_threads = 0
trimmomatic_threads = available_threads
return trimmomatic_threads, pigz_input_threads, pigz_output_threads
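# Worked example: with the example rule above (2 gzipped inputs, 4 gzipped outputs,
# threads: 32), distribute_threads returns (8, 4, 4), i.e. 8 threads for trimmomatic,
# 4 for each pigz decompressor and 4 for each pigz compressor.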
def compose_input_gz(filename, threads):
if filename.endswith(".gz") and threads > 0:
return "<(pigz -p {threads} --decompress --stdout {filename})".format(
threads=threads, filename=filename
)
return filename
def compose_output_gz(filename, threads, compression_level):
if filename.endswith(".gz") and threads > 0:
return ">(pigz -p {threads} {compression_level} > {filename})".format(
threads=threads, compression_level=compression_level, filename=filename
)
return filename
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)
# Distribute threads
input_files = [snakemake.input.r1, snakemake.input.r2]
output_files = [
snakemake.output.r1,
snakemake.output.r1_unpaired,
snakemake.output.r2,
snakemake.output.r2_unpaired,
]
trimmomatic_threads, input_threads, output_threads = distribute_threads(
input_files, output_files, snakemake.threads
)
input_r1, input_r2 = [
compose_input_gz(filename, input_threads) for filename in input_files
]
output_r1, output_r1_unp, output_r2, output_r2_unp = [
compose_output_gz(filename, output_threads, compression_level)
for filename in output_files
]
shell(
"trimmomatic PE -threads {trimmomatic_threads} {extra} "
"{input_r1} {input_r2} "
"{output_r1} {output_r1_unp} "
"{output_r2} {output_r2_unp} "
"{trimmer} "
"{log}"
)
TRIMMOMATIC SE¶
Trim single-end reads with trimmomatic. (De)compress with pigz.
Software dependencies¶
- trimmomatic ==0.36
- pigz ==2.3.4
Example¶
This wrapper can be used in the following way:
rule trimmomatic:
input:
"reads/{sample}.fastq.gz" # input and output can be uncompressed or compressed
output:
"trimmed/{sample}.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra="",
# optional compression levels from -0 to -9 and -11
compression_level="-9"
threads:
32
wrapper:
"0.65.0/bio/trimmomatic/se"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
- Jorge Langa
Code¶
"""
bio/trimmomatic/se
Snakemake wrapper to trim reads with trimmomatic in SE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of its time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can speed trimmomatic up considerably.
Consider providing this wrapper with at least one extra thread for each gzipped
input or output file.
"""
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_file, output_file, available_threads):
gzipped_input_files = 1 if input_file.endswith(".gz") else 0
gzipped_output_files = 1 if output_file.endswith(".gz") else 0
potential_threads_per_process = available_threads // (
1 + gzipped_input_files + gzipped_output_files
)
if potential_threads_per_process > 0:
# decompressing pigz creates at most 4 threads
pigz_input_threads = (
min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
)
pigz_output_threads = (
(available_threads - pigz_input_threads * gzipped_input_files)
// (1 + gzipped_output_files)
if gzipped_output_files != 0
else 0
)
trimmomatic_threads = (
available_threads
- pigz_input_threads * gzipped_input_files
- pigz_output_threads * gzipped_output_files
)
else:
# not enough threads for pigz
pigz_input_threads = 0
pigz_output_threads = 0
trimmomatic_threads = available_threads
return trimmomatic_threads, pigz_input_threads, pigz_output_threads
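# Worked example: with the example rule above (one gzipped input, one gzipped output,
# threads: 32), distribute_threads returns (14, 4, 14), i.e. 14 threads for trimmomatic,
# 4 for the pigz decompressor and 14 for the pigz compressor.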
def compose_input_gz(filename, threads):
if filename.endswith(".gz") and threads > 0:
return "<(pigz -p {threads} --decompress --stdout {filename})".format(
threads=threads, filename=filename
)
return filename
def compose_output_gz(filename, threads, compression_level):
if filename.endswith(".gz") and threads > 0:
return ">(pigz -p {threads} {compression_level} > {filename})".format(
threads=threads, compression_level=compression_level, filename=filename
)
return filename
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)
# Distribute threads
trimmomatic_threads, input_threads, output_threads = distribute_threads(
snakemake.input[0], snakemake.output[0], snakemake.threads
)
# Collect files
input = compose_input_gz(snakemake.input[0], input_threads)
output = compose_output_gz(snakemake.output[0], output_threads, compression_level)
shell(
"trimmomatic SE -threads {trimmomatic_threads} {extra} {input} {output} {trimmer} {log}"
)
TRINITY¶
Generate transcriptome assembly with Trinity
Software dependencies¶
- trinity ==2.8.4
Example¶
This wrapper can be used in the following way:
rule trinity:
input:
left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
output:
"trinity_out_dir/Trinity.fasta"
log:
'logs/trinity/trinity.log'
params:
extra=""
threads: 4
wrapper:
"0.65.0/bio/trinity"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
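Besides extra, the wrapper also reads optional max_memory and seqtype params and accepts single-end reads via left alone (see the code below). An illustrative sketch with placeholder paths:
rule trinity_se:
    input:
        left=["reads/reads.single.fq.gz"]
    output:
        "trinity_single_out_dir/Trinity.fasta"
    log:
        "logs/trinity/trinity_se.log"
    params:
        extra="",
        max_memory="20G",  # forwarded as --max_memory (defaults to "10G")
        seqtype="fq"       # normally inferred from the file name
    threads: 4
    wrapper:
        "0.65.0/bio/trinity"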
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Trinity."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
max_memory = snakemake.params.get("max_memory", "10G")
# allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = (
[snakemake.input.left]
if isinstance(snakemake.input.left, str)
else snakemake.input.left
)
right = snakemake.input.get("right")
if right:
right = (
[snakemake.input.right]
if isinstance(snakemake.input.right, str)
else snakemake.input.right
)
assert len(left) >= len(
right
), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
input_str_left = " --left " + ",".join(left)
input_str_right = " --right " + ",".join(right)
else:
input_str_left = " --single " + ",".join(left)
input_str_right = ""
input_cmd = " ".join([input_str_left, input_str_right])
# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
if "fq" in left[0] or "fastq" in left[0]:
seqtype = "fq"
elif "fa" in left[0] or "fasta" in left[0]:
seqtype = "fa"
else: # seqtype could not be inferred from the file name; the assertion below fails with an informative message
assert (
seqtype is not None
), "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"
outdir = path.dirname(snakemake.output[0])
assert "trinity" in outdir, "output directory name must contain 'trinity'"
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"Trinity {input_cmd} --CPU {snakemake.threads} "
" --max_memory {max_memory} --seqType {seqtype} "
" --output {outdir} {snakemake.params.extra} "
" {log}"
)
TXIMPORT¶
Import and summarize transcript-level estimates for both transcript-level and gene-level analysis.
Software dependencies¶
- bioconductor-tximport==1.14.0
- r-readr==1.3.1
- r-jsonlite==1.6
Example¶
This wrapper can be used in the following way:
rule tximport:
input:
quant = expand("quant/A/quant.sf")
# Optional transcript/gene links as described in tximport
# tx_to_gene = "/path/to/tx2gene"
output:
txi = "txi.RDS"
params:
extra = "type='salmon', txOut=TRUE"
wrapper:
"0.65.0/bio/tximport"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Add any tximport options via the params; they are passed through to the R wrapper. Options that tximport does not recognize will cause an unknown-parameter error.
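For gene-level summarization, an optional tx_to_gene input can be supplied; the wrapper reads it as a TSV and forwards it as the tx2gene argument (see the code below). An illustrative sketch with placeholder paths and options:
rule tximport_gene_level:
    input:
        quant=expand("quant/{sample}/quant.sf", sample=["A", "B"]),
        tx_to_gene="resources/tx2gene.tsv"
    output:
        txi="txi.gene.RDS"
    params:
        extra="type='salmon', countsFromAbundance='lengthScaledTPM'"
    wrapper:
        "0.65.0/bio/tximport"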
Authors¶
- Thibault Dayris
Code¶
#!/bin/R
# Loading library
base::library("tximport"); # Perform actual count importation in R
base::library("readr"); # Read faster!
base::library("jsonlite"); # Importing inferential replicates
# Cast input paths as character to avoid errors
samples_paths <- sapply( # Sequentially apply
snakemake@input[["quant"]], # ... to all quantification paths
function(quant) as.character(quant) # ... a cast as character
);
# Collapse path into a character vector
samples_paths <- base::paste0(samples_paths, collapse = '", "');
# Building function arguments
extra <- base::paste0('files = c("', samples_paths, '")');
# Check if user provided optional transcript to gene table
if ("tx_to_gene" %in% names(snakemake@input)) {
tx2gene <- readr::read_tsv(snakemake@input[["tx_to_gene"]]);
extra <- base::paste(
extra, # Forward existing arguments
", tx2gene = ", # Argument name
"tx2gene" # Add tx2gene to parameters
);
}
# Add user defined arguments
if ("extra" %in% names(snakemake@params)) {
if (snakemake@params[["extra"]] != "") {
extra <- base::paste(
extra, # Forward existing parameters
snakemake@params[["extra"]], # Add user parameters
sep = ", " # Field separator
);
}
}
print(extra);
# Perform tximport work
txi <- base::eval( # Evaluate the following
base::parse( # ... parsed expression
text = base::paste0(
"tximport::tximport(", extra, ");" # ... of tximport and its arguments
)
)
);
# Save results
base::saveRDS( # Save R object
object = txi, # The txi object
file = snakemake@output[["txi"]] # Output path is provided by Snakemake
);
UCSC¶
For ucsc, the following wrappers are available:
BEDGRAPHTOBIGWIG¶
Convert *.bedGraph file to *.bw file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Software dependencies¶
- ucsc-bedgraphtobigwig == 377
Example¶
This wrapper can be used in the following way:
rule bedGraphToBigWig:
input:
bedGraph="{sample}.bedGraph",
chromsizes="genome.chrom.sizes"
output:
"{sample}.bw"
log:
"logs/{sample}.bed-graph_to_big-wig.log"
params:
"" # optional params string
wrapper:
"0.65.0/bio/ucsc/bedGraphToBigWig"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
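The chromsizes input can be generated from the reference with the faToTwoBit and twoBitInfo wrappers documented below. An illustrative sketch with placeholder paths:
rule genome_to_2bit:
    input:
        "genome.fa"
    output:
        "genome.2bit"
    log:
        "logs/genome.fa_to_2bit.log"
    params:
        ""
    wrapper:
        "0.65.0/bio/ucsc/faToTwoBit"

rule genome_chrom_sizes:
    input:
        "genome.2bit"
    output:
        "genome.chrom.sizes"
    log:
        "logs/genome.chrom.sizes.log"
    params:
        ""
    wrapper:
        "0.65.0/bio/ucsc/twoBitInfo"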
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for *.bedGraph to *.bw conversion using UCSC bedGraphToBigWig tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"bedGraphToBigWig {extra}"
" {snakemake.input.bedGraph} {snakemake.input.chromsizes}"
" {snakemake.output} {log}"
)
FATOTWOBIT¶
Convert *.fa file to *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Software dependencies¶
- ucsc-fatotwobit == 377
Example¶
This wrapper can be used in the following way:
# Example: from *.fa file
rule faToTwoBit_fa:
input:
"{sample}.fa"
output:
"{sample}.2bit"
log:
"logs/{sample}.fa_to_2bit.log"
params:
"" # optional params string
wrapper:
"0.65.0/bio/ucsc/faToTwoBit"
# Example: from *.fa.gz file
rule faToTwoBit_fa_gz:
input:
"{sample}.fa.gz"
output:
"{sample}.2bit"
log:
"logs/{sample}.fa-gz_to_2bit.log"
params:
"" # optional params string
wrapper:
"0.65.0/bio/ucsc/faToTwoBit"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC faToTwoBit tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("faToTwoBit {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITINFO¶
Generate a *.chrom.sizes file from a *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Software dependencies¶
- ucsc-twobitinfo == 377
Example¶
This wrapper can be used in the following way:
rule twoBitInfo:
input:
"{sample}.2bit"
output:
"{sample}.chrom.sizes"
log:
"logs/{sample}.chrom.sizes.log"
params:
"" # optional params string
wrapper:
"0.65.0/bio/ucsc/twoBitInfo"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC twoBitInfo tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("twoBitInfo {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITTOFA¶
Convert *.2bit file to *.fa file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Software dependencies¶
- ucsc-twobittofa == 377
Example¶
This wrapper can be used in the following way:
rule twoBitToFa:
input:
"{sample}.2bit"
output:
"{sample}.fa"
log:
"logs/{sample}.2bit_to_fa.log"
params:
"" # optional params string
wrapper:
"0.65.0/bio/ucsc/twoBitToFa"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC twoBitToFa tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("twoBitToFa {extra} {snakemake.input} {snakemake.output} {log}")
UMIS¶
For umis, the following wrappers are available:
UMIS BAMTAG¶
Convert a BAM/SAM with fastqtransformed read names to have UMI and cellular barcode tags.
Software dependencies¶
- umis ==1.0.3
- samtools ==1.9
Example¶
This wrapper can be used in the following way:
rule umis_bamtag:
input:
"data/{sample}.bam"
output:
"data/{sample}.annotated.bam"
log:
"logs/umis/bamtag/{sample}.log"
params:
extra=""
threads: 1
wrapper:
"0.65.0/bio/umis/bamtag"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if bam_input is None:
raise ValueError("Missing bam input file!")
elif not len(snakemake.input) == 1:
raise ValueError("Only expecting one input file: " + str(snakemake.input) + "!")
output_file = snakemake.output[0]
if output_file is None:
raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
raise ValueError("Only expecting one output file: " + str(output_file) + "!")
in_pipe = ""
if bam_input.endswith(".sam"):
in_pipe = "cat "
else:
in_pipe = "samtools view -h "
out_pipe = ""
if not output_file.endswith(".sam"):
out_pipe = " | samtools view -S -b - "
shell(
" {in_pipe} {bam_input} | " " umis bamtag -" " {out_pipe} > {output_file}" " {log}"
)
VARSCAN¶
For varscan, the following wrappers are available:
VARSCAN MPILEUP2INDEL¶
Detect indels in NGS data from mpileup files
Software dependencies¶
- varscan ==2.4.3
Example¶
This wrapper can be used in the following way:
rule mpileup_to_vcf:
input:
"mpileup/{sample}.mpileup.gz"
output:
"vcf/{sample}.vcf"
message:
"Calling Indel with Varscan2"
threads: # Varscan does not take any threading information
1 # However, mpileup might have to be unzipped.
# Keep threading value to one for unzipped mpileup input
# Set it to two for zipped mpileup files
log:
"logs/varscan_{sample}.log"
wrapper:
"0.65.0/bio/varscan/mpileup2indel"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.
If so, it’s recommended to use two threads (see the sketch after this list):
- 1 for varscan itself
- 1 for zcat
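An illustrative sketch of the gzipped case with the recommended two threads (paths are placeholders):
rule mpileup_to_vcf_gz:
    input:
        "mpileup/{sample}.mpileup.gz"
    output:
        "vcf/{sample}.indel.vcf"
    threads: 2  # one thread for zcat, one for varscan
    log:
        "logs/varscan_indel_{sample}.log"
    wrapper:
        "0.65.0/bio/varscan/mpileup2indel"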
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for Varscan2 mpileup2indel"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
" cat {} ".format(snakemake.input[0])
if not snakemake.input[0].endswith("gz")
else " zcat {} ".format(snakemake.input[0])
)
# Building output directories
makedirs(op.dirname(snakemake.output[0]))
shell(
"varscan mpileup2indel " # Tool and its subprocess
"{extra} " # Extra parameters
"<( {pileup} ) "
"> {snakemake.output[0]} " # Path to vcf file
"{log}" # Logging behaviour
)
VARSCAN MPILEUP2SNP¶
Detect variants in NGS data from Samtools mpileup
Software dependencies¶
- varscan ==2.4.3
Example¶
This wrapper can be used in the following way:
rule mpileup_to_vcf:
input:
"mpileup/{sample}.mpileup.gz"
output:
"vcf/{sample}.vcf"
message:
"Calling SNP with Varscan2"
threads: # Varscan does not take any threading information
1 # However, mpileup might have to be unzipped.
# Keep threading value to one for unzipped mpileup input
# Set it to two for zipped mpileup files
log:
"logs/varscan_{sample}.log"
wrapper:
"0.65.0/bio/varscan/mpileup2snp"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.
If so, it’s recommended to use two threads:
- 1 for varscan itself
- 1 for zcat
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for Varscan2 mpileup2snp"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
" cat {} ".format(snakemake.input[0])
if not snakemake.input[0].endswith("gz")
else " zcat {} ".format(snakemake.input[0])
)
# Building output directories
makedirs(op.dirname(snakemake.output[0]))
shell(
"varscan mpileup2snp " # Tool and its subprocess
"{extra} " # Extra parameters
"<( {pileup} ) "
"> {snakemake.output[0]} " # Path to vcf file
"{log}" # Logging behaviour
)
VARSCAN SOMATIC¶
Varscan Somatic calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.
Software dependencies¶
- varscan ==2.4.3
Example¶
This wrapper can be used in the following way:
rule varscan_somatic:
input:
# A pair of pileup files can be used *instead* of the mpileup
# normal_pileup = ""
# tumor_pileup = ""
mpileup = "mpileup/{sample}.mpileup.gz"
output:
snp = "vcf/{sample}.snp.vcf",
indel = "vcf/{sample}.indel.vcf"
message:
"Calling somatic variants {wildcards.sample}"
threads:
1
params:
extra = ""
wrapper:
"0.65.0/bio/varscan/somatic"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
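Alternatively, the wrapper accepts a matched pair of per-sample pileups through the normal_pileup and tumor_pileup input keys instead of the combined mpileup (see the code below). An illustrative sketch with placeholder paths:
rule varscan_somatic_pair:
    input:
        normal_pileup="pileup/{sample}.normal.pileup",
        tumor_pileup="pileup/{sample}.tumor.pileup"
    output:
        snp="vcf/{sample}.snp.vcf",
        indel="vcf/{sample}.indel.vcf"
    log:
        "logs/varscan_somatic/{sample}.log"
    params:
        extra=""
    wrapper:
        "0.65.0/bio/varscan/somatic"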
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for varscan somatic"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
# Defining logging and gathering extra parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Building output dirs
makedirs(op.dirname(snakemake.output.snp))
makedirs(op.dirname(snakemake.output.indel))
# Output prefix
prefix = op.splitext(snakemake.output.snp)[0]
# Searching for input files
pileup_pair = ["normal_pileup", "tumor_pileup"]
in_pileup = ""
mpileup = ""
if "mpileup" in snakemake.input.keys():
# Case there is a mpileup with both normal and tumor
in_pileup = snakemake.input.mpileup
mpileup = "--mpileup 1"
elif all(pileup in snakemake.input.keys() for pileup in pileup_pair):
# Case there are two separate pileup files
in_pileup = " {snakemake.input.normal_pileup}" " {snakemakeinput.tumor_pileup} "
else:
raise KeyError("Could not find either a mpileup, or a pair of pileup files")
shell(
"varscan somatic" # Tool and its subcommand
" {in_pileup}" # Path to input file(s)
" {prefix}" # Path to output
" {extra}" # Extra parameters
" {mpileup}"
" --output-snp {snakemake.output.snp}" # Path to snp output file
" --output-indel {snakemake.output.indel}" # Path to indel output file
)
VCFTOOLS¶
For vcftools, the following wrappers are available:
VCFTOOLS FILTER¶
Filter vcf files using vcftools
Software dependencies¶
- vcftools ==0.1.16
Example¶
This wrapper can be used in the following way:
rule filter_vcf:
input:
"{sample}.vcf"
output:
"{sample}.filtered.vcf"
params:
extra="--chr 1 --recode-INFO-all"
wrapper:
"0.65.0/bio/vcftools/filter"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
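The wrapper switches to --gzvcf for gzipped input and pipes the result through gzip when the output ends in .gz (see the code below). An illustrative sketch for compressed files:
rule filter_vcf_gz:
    input:
        "{sample}.vcf.gz"
    output:
        "{sample}.filtered.vcf.gz"
    params:
        extra="--chr 1 --recode-INFO-all"
    wrapper:
        "0.65.0/bio/vcftools/filter"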
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
input_flag = "--gzvcf"
output = " > " + snakemake.output[0]
if output.endswith(".gz"):
output = " | gzip" + output
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"vcftools "
"{input_flag} "
"{snakemake.input} "
"{extra} "
"--recode "
"--stdout "
"{output} "
"{log}"
)
VEMBRANE¶
Vembrane allows you to simultaneously filter variants based on any INFO field, CHROM, POS, REF, ALT, QUAL, and the annotation field ANN. When filtering based on ANN, annotation entries are filtered first; if no annotation entry remains, the entire variant is removed. See https://github.com/vembrane/vembrane
Software dependencies¶
- vembrane =0.4.1
Example¶
This wrapper can be used in the following way:
rule vembrane:
input:
vcf="in.vcf",
output:
vcf="filtered/out.vcf"
params:
expression="POS > 4000",
extra=""
log:
"logs/vembrane.log"
wrapper:
"0.65.0/bio/vembrane"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
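Expressions use Python syntax over the fields named above; an INFO-based filter could look like the following illustrative sketch (DP is an assumed field that must be defined in your VCF header):
rule vembrane_info_filter:
    input:
        vcf="in.vcf"
    output:
        vcf="filtered/out.info.vcf"
    params:
        expression="QUAL >= 30 and INFO['DP'] > 10",
        extra=""
    log:
        "logs/vembrane_info.log"
    wrapper:
        "0.65.0/bio/vembrane"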
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for vembrane"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"vembrane" # Tool and its subcommand
" {extra}" # Extra parameters
' "{snakemake.params.expression}"'
" {snakemake.input.vcf}" # Path to input vcf file
" > {snakemake.output.vcf}" # Path to output vcf file
" {log}" # Logging behaviour
)
VEP¶
For vep, the following wrappers are available:
VEP ANNOTATE¶
Annotate variant calls with VEP.
Software dependencies¶
- ensembl-vep =100
- bcftools =1.9
Example¶
This wrapper can be used in the following way:
rule annotate_variants:
input:
calls="variants.bcf", # .vcf, .vcf.gz or .bcf
cache="resources/vep/cache",
plugins="resources/vep/plugins",
output:
calls="variants.annotated.bcf", # .vcf, .vcf.gz or .bcf
stats="variants.html"
params:
# Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
# Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
plugins=["LoFtool"],
extra="--everything" # optional: extra arguments
log:
"logs/vep/annotate.log"
threads: 4
wrapper:
"0.65.0/bio/vep/annotate"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from pathlib import Path
from snakemake.shell import shell
def get_only_child_dir(path):
children = [child for child in path.iterdir() if child.is_dir()]
assert (
len(children) == 1
), "Invalid VEP cache directory, only a single entry is allowed, make sure that cache was created with the snakemake VEP cache wrapper"
return children[0]
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
fork = "--fork {}".format(snakemake.threads) if snakemake.threads > 1 else ""
stats = snakemake.output.stats
cache = snakemake.input.cache
plugins = snakemake.input.plugins
entrypath = get_only_child_dir(get_only_child_dir(Path(cache)))
species = entrypath.parent.name
release, build = entrypath.name.split("_")
load_plugins = " ".join(map("--plugin {}".format, snakemake.params.plugins))
if snakemake.output.calls.endswith(".vcf.gz"):
fmt = "z"
elif snakemake.output.calls.endswith(".bcf"):
fmt = "b"
else:
fmt = "v"
shell(
"(bcftools view {snakemake.input.calls} | "
"vep {extra} {fork} "
"--format vcf "
"--vcf "
"--cache "
"--cache_version {release} "
"--species {species} "
"--assembly {build} "
"--dir_cache {cache} "
"--dir_plugins {plugins} "
"--offline "
"{load_plugins} "
"--output_file STDOUT "
"--stats_file {stats} | "
"bcftools view -O{fmt} > {snakemake.output.calls}) {log}"
)
VEP DOWNLOAD CACHE¶
Download VEP cache for given species, build and release.
Software dependencies¶
- ensembl-vep
Example¶
This wrapper can be used in the following way:
rule get_vep_cache:
output:
directory("resources/vep/cache")
params:
species="saccharomyces_cerevisiae",
build="R64-1-1",
release="98"
log:
"logs/vep/cache.log"
cache: True # save space and time with between-workflow caching (see docs)
wrapper:
"0.65.0/bio/vep/cache"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from pathlib import Path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"vep_install --AUTO cf "
"--SPECIES {snakemake.params.species} "
"--ASSEMBLY {snakemake.params.build} "
"--CACHE_VERSION {snakemake.params.release} "
"--CACHEDIR {snakemake.output} "
"--CONVERT "
"--NO_UPDATE {log}"
)
VEP DOWNLOAD PLUGINS¶
Download VEP plugins.
Software dependencies¶
- python =3
Example¶
This wrapper can be used in the following way:
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
wrapper:
"0.65.0/bio/vep/plugins"
Note that input, output and log file paths can be chosen freely. When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import sys
from pathlib import Path
from urllib.request import urlretrieve
from zipfile import ZipFile
from tempfile import NamedTemporaryFile
if snakemake.log:
sys.stderr = open(snakemake.log[0], "w")
outdir = Path(snakemake.output[0])
outdir.mkdir()
with NamedTemporaryFile() as tmp:
urlretrieve(
"https://github.com/Ensembl/VEP_plugins/archive/release/{release}.zip".format(
release=snakemake.params.release
),
tmp.name,
)
with ZipFile(tmp.name) as f:
for member in f.infolist():
memberpath = Path(member.filename)
if len(memberpath.parts) == 1:
# skip root dir
continue
targetpath = outdir / memberpath.relative_to(memberpath.parts[0])
if member.is_dir():
targetpath.mkdir()
else:
with open(targetpath, "wb") as out:
out.write(f.read(member.filename))