The Snakemake Wrappers repository¶
The Snakemake Wrapper Repository is a collection of reusable wrappers that let you quickly use popular tools from Snakemake rules and workflows.
Usage¶
The general strategy is to include a wrapper into your workflow via the wrapper directive, e.g.
rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "0.2.0/bio/samtools/sort"
Here, Snakemake will automatically download and use the corresponding wrapper files from https://github.com/snakemake/snakemake-wrappers/tree/0.2.0/bio/samtools/sort.
In this specification, 0.2.0 can be replaced with the version tag you want to use, or with a commit id.
This ensures reproducibility since changes in the wrapper implementation will only be propagated to your workflow if you update that version tag.
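The relation between the short wrapper specification and the repository URL can be sketched in Python. This is only an illustration of the URL layout described in this section, not Snakemake's own resolution code; the helper name `wrapper_repo_url` and its `view` argument are hypothetical. The `view="raw"` form corresponds to the /raw/ variant that this section requires for explicit URLs.

```python
REPO = "https://github.com/snakemake/snakemake-wrappers"


def wrapper_repo_url(spec, view="tree"):
    """Build the repository URL for a spec like "0.2.0/bio/samtools/sort".

    Hypothetical helper for illustration only. The first path component is
    the version tag (or commit id), the rest is the path to the wrapper.
    """
    if view not in ("tree", "raw"):
        raise ValueError("view must be 'tree' or 'raw'")
    version, _, path = spec.partition("/")
    if not version or not path:
        raise ValueError(f"expected '<version>/<wrapper path>', got {spec!r}")
    return f"{REPO}/{view}/{version}/{path}"


print(wrapper_repo_url("0.2.0/bio/samtools/sort"))
print(wrapper_repo_url("0.2.0/bio/samtools/sort", view="raw"))
```

Changing only the first component of the specification selects another version of the same wrapper, which is why pinning it keeps a workflow reproducible.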
Each wrapper defines required software packages and versions in an environment.yaml file.
In combination with the --use-conda flag of Snakemake, this will be deployed automatically.
Alternatively, for example during development, the wrapper directive can also point to full URLs, including local file:// URLs.
For this to work, you need to provide the (remote) path to the directory containing the wrapper.* and environment.yaml files.
For the above example, the explicit GitHub URL to specify would need to be the /raw/ version of the directory:
rule samtools_sort:
    input:
        "mapped/{sample}.bam"
    output:
        "mapped/{sample}.sorted.bam"
    params:
        "-m 4G"
    threads: 8
    wrapper:
        "https://github.com/snakemake/snakemake-wrappers/raw/0.2.0/bio/samtools/sort"
Contributing¶
We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, refer to the contributing guide.
Wrappers¶
Wrappers let you quickly use popular tools and libraries in Snakemake workflows.
The menu on the left (expand by clicking (+) if necessary) lists all available wrappers.
ADAPTERREMOVAL¶
Rapid adapter trimming, identification, and read merging.
URL: https://adapterremoval.readthedocs.io/en/latest/
Example¶
This wrapper can be used in the following way:
rule adapterremoval_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        fq="trimmed/se/{sample}.fastq.gz",  # trimmed reads
        discarded="trimmed/se/{sample}.discarded.fastq.gz",  # reads that did not pass filters
        settings="stats/se/{sample}.settings"  # parameters as well as overall statistics
    log:
        "logs/adapterremoval/se/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA",
        extra="",
    threads: 1
    wrapper:
        "v2.2.1/bio/adapterremoval"

rule adapterremoval_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        fq1="trimmed/pe/{sample}_R1.fastq.gz",  # trimmed mate1 reads
        fq2="trimmed/pe/{sample}_R2.fastq.gz",  # trimmed mate2 reads
        collapsed="trimmed/pe/{sample}.collapsed.fastq.gz",  # overlapping mate-pairs which have been merged into a single read
        collapsed_trunc="trimmed/pe/{sample}.collapsed_trunc.fastq.gz",  # collapsed reads that were quality trimmed
        singleton="trimmed/pe/{sample}.singleton.fastq.gz",  # mate-pairs for which the mate has been discarded
        discarded="trimmed/pe/{sample}.discarded.fastq.gz",  # reads that did not pass filters
        settings="stats/pe/{sample}.settings"  # parameters as well as overall statistics
    log:
        "logs/adapterremoval/pe/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA --adapter2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="--collapse --collapse-deterministic",
    threads: 2
    wrapper:
        "v2.2.1/bio/adapterremoval"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- All output files, except for ‘settings’, must be compressed the same way (gz, or bz2).
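The compression rule in the note above can be sketched as a small stand-alone function. It mirrors the check in the wrapper code further down, but takes a plain dict instead of the `snakemake.output` object; the function name `compression_flag` is hypothetical.

```python
from pathlib import Path


def compression_flag(outputs):
    """Choose the AdapterRemoval compression flag from a dict of output paths.

    Every output except 'settings' must end in .gz (gives --gzip) or every
    one must end in .bz2 (gives --bzip2); anything else is rejected.
    """
    suffixes = {
        Path(path).suffix for key, path in outputs.items() if key != "settings"
    }
    if suffixes <= {".gz"}:
        return "--gzip"
    if suffixes <= {".bz2"}:
        return "--bzip2"
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )


print(compression_flag({"fq": "a.fastq.gz", "settings": "a.settings"}))
```

Mixing `.gz` and `.bz2` outputs, or leaving one output uncompressed, raises the same `ValueError` as the wrapper.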
Software dependencies¶
adapterremoval=2.3.3
Input/Output¶
Input:
sample
: [‘raw fastq file with R1 reads’, ‘raw fastq file with R2 reads (PE only)’]
Output:
fq
: path to single fastq file (SE only)
fq1
: path to fastq R1 (PE only)
fq2
: path to fastq R2 (PE only)
singleton
: fastq file with singleton reads (PE only; PE reads for which the mate has been discarded)
collapsed
: fastq file with collapsed reads (PE only; overlapping mate-pairs which have been merged into a single read)
collapsed_trunc
: fastq file with collapsed truncated reads (PE only; collapsed reads that were quality trimmed)
discarded
: fastq file with discarded reads (reads that did not pass filters)
settings
: settings and stats file
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path
import re

extra = snakemake.params.get("extra", "") + " "
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Check input files
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."

# Input files
if n == 1 or "--interleaved " in extra or "--interleaved-input " in extra:
    reads = "--file1 {}".format(snakemake.input.sample)
else:
    reads = "--file1 {} --file2 {}".format(*snakemake.input.sample)

# Gzip or Bzip compressed output?
compress_out = ""
if all(
    [
        Path(value).suffix == ".gz"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--gzip"
elif all(
    [
        Path(value).suffix == ".bz2"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--bzip2"
else:
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )

# Output files
if n == 1 or "--interleaved " in extra or "--interleaved-output " in extra:
    trimmed = f"--output1 {snakemake.output.fq}"
else:
    trimmed = f"--output1 {snakemake.output.fq1} --output2 {snakemake.output.fq2}"

    # Output singleton files
    singleton = snakemake.output.get("singleton", None)
    if singleton:
        trimmed += f" --singleton {singleton}"

    # Output collapsed PE reads
    collapsed = snakemake.output.get("collapsed", None)
    if collapsed:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsed {collapsed}"

    # Output collapsed and truncated PE reads
    collapsed_trunc = snakemake.output.get("collapsed_trunc", None)
    if collapsed_trunc:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed_trunc specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsedtruncated {collapsed_trunc}"

shell(
    "(AdapterRemoval --threads {snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{compress_out} "
    "{trimmed} "
    "--discarded {snakemake.output.discarded} "
    "--settings {snakemake.output.settings}"
    ") {log}"
)
ARRIBA¶
Detect gene fusions from chimeric STAR output
URL: https://github.com/suhrig/arriba
Example¶
This wrapper can be used in the following way:
rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
        # optional arriba blacklist file
        custom_blacklist=[],
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv",  # optional
    log:
        "logs/arriba/{sample}.log",
    params:
        # required when blacklist or known_fusions is set
        genome_build="GRCh38",
        # strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
        # only set blacklist input-file or blacklist-param
        default_blacklist=False,  # optional
        default_known_fusions=True,  # optional
        # file containing information from structural variant analysis
        sv_file="",  # optional
        # optional parameters
        extra="-i 1,2",
    threads: 1
    wrapper:
        "v2.2.1/bio/arriba"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
This tool/wrapper does not handle multi threading.
Software dependencies¶
arriba=2.4.0
Input/Output¶
Input:
bam
: Path to bam formatted alignment file from STAR
genome
: Path to fasta formatted genome sequence
annotation
: Path to GTF formatted genome annotation
Output:
fusions
: Path to output fusion file
Params¶
known_fusions
: Path to known fusions file, see official documentation on known fusions for more information.
blacklist
: Path to blacklist file, see official documentation on blacklist for more information.
sv_file
: Path to structural variations calls from WGS, see official documentation on SV for more information.
extra
: Other optional parameters
Authors¶
- Jan Forster
- Felix Mölder
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"

import os
import json
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""

database_dir = os.path.join(os.environ["CONDA_PREFIX"], "var/lib/arriba")
build = snakemake.params.get("genome_build", None)
blacklist_input = snakemake.input.get("custom_blacklist")
default_blacklist = snakemake.params.get("default_blacklist", False)
default_known_fusions = snakemake.params.get("default_known_fusions", False)

if default_blacklist or default_known_fusions:
    if not build:
        raise ValueError(
            "Please provide a genome build when using blacklist- or known_fusion-filtering"
        )
    arriba_vers = [
        entry["version"]
        for entry in json.load(os.popen("conda list --json"))
        if entry["name"] == "arriba"
    ][0]

if blacklist_input and not default_blacklist:
    blacklist_cmd = "-b " + blacklist_input
elif not blacklist_input and default_blacklist:
    blacklist_dict = {
        "GRCh37": f"blacklist_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"blacklist_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"blacklist_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"blacklist_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    blacklist_path = os.path.join(database_dir, blacklist_dict[build])
    blacklist_cmd = "-b " + blacklist_path
elif not blacklist_input and not default_blacklist:
    blacklist_cmd = "-f blacklist"
else:
    raise ValueError(
        "custom_blacklist input file and default_blacklist parameter option defined. Please set only one of both."
    )

if default_known_fusions:
    fusions_dict = {
        "GRCh37": f"known_fusions_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"known_fusions_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"known_fusions_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"known_fusions_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    known_fusions_path = os.path.join(database_dir, fusions_dict[build])
    known_cmd = "-k " + known_fusions_path
else:
    known_cmd = ""

sv_file = snakemake.params.get("sv_file")
if sv_file:
    sv_cmd = "-d " + sv_file
else:
    sv_cmd = ""

shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)
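The blacklist handling in the arriba wrapper code above is a four-way decision between a custom blacklist file and the default one. A minimal sketch of that decision as a stand-alone function follows; the name `blacklist_option` and the `default_path` argument are hypothetical, and the lookup of the bundled blacklist file is left out.

```python
def blacklist_option(custom_blacklist, default_blacklist, default_path=None):
    """Return the arriba blacklist argument, following the wrapper's branches.

    custom_blacklist:  path to a user-supplied blacklist (empty or None if unset)
    default_blacklist: True to use the blacklist shipped with arriba
    default_path:      resolved path of the shipped blacklist (assumed known)
    """
    if custom_blacklist and not default_blacklist:
        return f"-b {custom_blacklist}"
    if default_blacklist and not custom_blacklist:
        return f"-b {default_path}"
    if not custom_blacklist and not default_blacklist:
        # neither given: the wrapper passes "-f blacklist" instead of a file
        return "-f blacklist"
    raise ValueError("set either custom_blacklist or default_blacklist, not both")


print(blacklist_option([], False))
```

With the example rule (`custom_blacklist=[]` and `default_blacklist=False`), the third branch applies.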
ART¶
For art, the following wrappers are available:
ART_PROFILER_ILLUMINA¶
Use the art profiler to create a base quality score profile for Illumina read data from a fastq file.
URL: https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm
Example¶
This wrapper can be used in the following way:
rule art_profiler_illumina:
    input:
        "data/{sample}.fq",
    output:
        "profiles/{sample}.txt"
    log:
        "logs/art_profiler_illumina/{sample}.log"
    params: ""
    threads: 2
    wrapper:
        "v2.2.1/bio/art/profiler_illumina"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Your input file must have one of the following extensions: fastq, fastq.gz, fq or fq.gz
Software dependencies¶
art=2016.06.05
Input/Output¶
Input:
- Path to fastq-formatted input file (first place in the input list of files)
Output:
- Path to txt formatted profile (first place in the output list of files)
Params¶
- Extra parameters (no keyword mapped parameter)
Authors¶
- David Laehnemann
- Victoria Sack
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"

from snakemake.shell import shell
import os
import tempfile
import re

# Create temporary directory that will only contain the symbolic link to the
# input file, in order to sanely work with the art_profiler_illumina cli
with tempfile.TemporaryDirectory() as temp_input:
    # ensure that .fastq and .fastq.gz input files work, as well
    filename = os.path.basename(snakemake.input[0]).replace(".fastq", ".fq")
    # figure out the exact file extension after the above substitution
    # (raw string, so that "\." is passed to the regex engine unchanged)
    ext = re.search(r"fq(\.gz)?$", filename)
    if ext:
        fq_extension = ext.group(0)
    else:
        raise IOError(
            "Incompatible extension: This art_profiler_illumina "
            "wrapper requires input files with one of the following "
            "extensions: fastq, fastq.gz, fq or fq.gz. Please adjust "
            "your input and the invocation of the wrapper accordingly."
        )
    os.symlink(
        # snakemake paths are relative, but the symlink needs to be absolute
        os.path.abspath(snakemake.input[0]),
        # the following awkward file name generation has reasons:
        # * the file name needs to be unique to the execution of the
        #   rule, as art will create and mv temporary files with its basename
        #   in the output directory, which causes utter confusion when
        #   executing instances of the rule in parallel
        # * temp file name cannot have any read infixes before the file
        #   extension, because otherwise art does read enumeration magic
        #   that messes up output file naming
        os.path.join(
            temp_input,
            filename.replace(
                "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
            ),
        ),
    )
    # include output folder name in the profile_name command line argument and
    # strip off the file extension, as art will add its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(snakemake.output[0]), filename.replace("." + fq_extension, "")
    )
    shell(
        "( art_profiler_illumina {snakemake.params} {profile_name}"
        " {temp_input} {fq_extension} {snakemake.threads} ) 2> {snakemake.log}"
    )
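The file name handling in the wrapper code above can be tried out without Snakemake or art installed. The sketch below repeats the same string operations in a plain function; the name `art_names` is hypothetical, and no symlink or temporary directory is created.

```python
import os
import re


def art_names(input_path, output_path):
    """Return (fq_extension, link_name, profile_name) as the wrapper derives them."""
    # normalise .fastq / .fastq.gz to .fq / .fq.gz
    filename = os.path.basename(input_path).replace(".fastq", ".fq")
    match = re.search(r"fq(\.gz)?$", filename)
    if not match:
        raise IOError("input must end in fastq, fastq.gz, fq or fq.gz")
    fq_extension = match.group(0)
    # name of the symlink placed in the temporary input directory
    link_name = filename.replace(
        "." + fq_extension, "_preventing_art_magic_spacer." + fq_extension
    )
    # profile name without extension, since art appends its own ".txt"
    profile_name = os.path.join(
        os.path.dirname(output_path), filename.replace("." + fq_extension, "")
    )
    return fq_extension, link_name, profile_name


print(art_names("data/a.fastq.gz", "profiles/a.txt"))
```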
ASSEMBLY-STATS¶
Generates report of summary statistics for a genome assembly
URL: https://github.com/sanger-pathogens/assembly-stats
Example¶
This wrapper can be used in the following way:
rule run_assembly_stats:
    input:
        #Input assembly
        assembly="{sample}.fasta",
    output:
        #Assembly statistics
        assembly_stats="{sample}_stats.txt",
    params:
        # Tab delimited output, with a header, is set as the default. Other options are available:
        # -l <int>
        # Minimum length cutoff for each sequence.
        # Sequences shorter than the cutoff will be ignored [1]
        # -s
        # Print 'grep friendly' output
        # -t
        # Print tab-delimited output
        # -u
        # Print tab-delimited output with no header line
        # If you want to add multiple options just delimit them with a space.
        # Note that you can only pick one output format
        # Check https://github.com/sanger-pathogens/assembly-stats for more details
        extra="-t",
    log:
        "logs/{sample}.assembly-stats.log",
    threads: 1
    wrapper:
        "v2.2.1/bio/assembly-stats"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
This tool/wrapper does not handle multi threading
Software dependencies¶
assembly-stats=1.0.1
Input/Output¶
Input:
assembly
: Genomic assembly (fasta format)
Output:
assembly_stats
: Assembly statistics (format of your choosing, default = tab-delimited)
Params¶
extra
: Optional parameters, see assembly-stats official documentation
Authors¶
- Pathogen Informatics, Wellcome Sanger Institute (assembly-stats tool) - https://github.com/sanger-pathogens
- Max Cummins (Snakemake wrapper [unaffiliated with Wellcome Sanger Institute])
Code¶
__author__ = "Max Cummins"
__copyright__ = "Copyright 2021, Max Cummins"
__email__ = "max.l.cummins@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "assembly-stats"
    " {snakemake.params.extra}"
    " {snakemake.input.assembly}"
    " > {snakemake.output.assembly_stats}"
    " {log}"
)
BAMTOOLS¶
For bamtools, the following wrappers are available:
BAMTOOLS FILTER¶
Filters BAM files. For more information about bamtools see bamtools documentation
URL: https://github.com/pezmaster31/bamtools
Example¶
This wrapper can be used in the following way:
rule bamtools_filter:
    input:
        "{sample}.bam"
    output:
        "filtered/{sample}.bam"
    params:
        # optional parameters
        tags = [ "NM:<4", "MQ:>=10" ],  # list of key:value pair strings
        min_size = "-2000",
        max_size = "2000",
        min_length = "10",
        max_length = "20",
        # to add more optional parameters (see bamtools filter --help):
        additional_params = "-mapQuality \">=0\" -isMapped \"true\""
    log:
        "logs/bamtools/filtered/{sample}.log"
    wrapper:
        "v2.2.1/bio/bamtools/filter"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
A complete usage documentation is available here: https://raw.githubusercontent.com/wiki/pezmaster31/bamtools/Tutorial_Toolkit_BamTools-1.0.pdf
This tool/wrapper does not handle multi threading
Software dependencies¶
bamtools=2.5.2
Input/Output¶
Input:
- bam files (.bam), must be in first position
Output:
- bam file (.bam), must be in first position
Params¶
tags
: filtering tags
min_size
: minimum insert size
max_size
: maximum insert size
min_length
: minimum read length
max_length
: maximum read length
additional_params
: Other filtering and optional parameters
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# extract arguments
params = ""
extra_limits = ""
tags = snakemake.params.get("tags")
min_size = snakemake.params.get("min_size")
max_size = snakemake.params.get("max_size")
min_length = snakemake.params.get("min_length")
max_length = snakemake.params.get("max_length")
additional_params = snakemake.params.get("additional_params")

if tags and tags is not None:
    params = params + " " + " ".join(map('-tag "{}"'.format, tags))

if min_size and min_size is not None:
    params = params + ' -insertSize ">=' + min_size + '"'
    if max_size and max_size is not None:
        extra_limits = extra_limits + ' -insertSize "<=' + max_size + '"'
else:
    if max_size and max_size is not None:
        params = params + ' -insertSize "<=' + max_size + '"'

if min_length and min_length is not None:
    params = params + ' -length ">=' + min_length + '"'
    if max_length and max_length is not None:
        extra_limits = extra_limits + ' -length "<=' + max_length + '"'
else:
    if max_length and max_length is not None:
        params = params + ' -length "<=' + max_length + '"'

if additional_params and additional_params is not None:
    params = params + " " + additional_params

if extra_limits:
    params = params + " | bamtools filter" + extra_limits

shell(
    "(bamtools filter"
    " -in {snakemake.input[0]}" + params + " -out {snakemake.output[0]}) {log}"
)
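The wrapper code above puts lower bounds in the first `bamtools filter` call and, when a lower and an upper bound are both given for the same property, moves the upper bound into a second call chained with a pipe. The following sketch reproduces only that string building so the resulting arguments can be inspected; the function name `build_filter_params` is hypothetical and nothing is executed.

```python
def build_filter_params(tags=None, min_size=None, max_size=None,
                        min_length=None, max_length=None,
                        additional_params=None):
    """Assemble the argument string the same way the wrapper code does."""
    params = ""
    extra_limits = ""
    if tags:
        params += " " + " ".join('-tag "{}"'.format(tag) for tag in tags)
    # for each property: the lower bound stays in the first call and the
    # upper bound is deferred to a second, piped call if both are set
    for flag, low, high in (
        ("-insertSize", min_size, max_size),
        ("-length", min_length, max_length),
    ):
        if low:
            params += f' {flag} ">={low}"'
            if high:
                extra_limits += f' {flag} "<={high}"'
        elif high:
            params += f' {flag} "<={high}"'
    if additional_params:
        params += " " + additional_params
    if extra_limits:
        params += " | bamtools filter" + extra_limits
    return params


print(build_filter_params(tags=["NM:<4"], min_size="-2000", max_size="2000"))
```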
BAMTOOLS FILTER WITH JSON¶
Filters BAM files with JSON-script for filtering parameters and rules. For more information about bamtools see bamtools documentation
URL: https://github.com/pezmaster31/bamtools
Example¶
This wrapper can be used in the following way:
rule bamtools_filter_json:
    input:
        "{sample}.bam"
    output:
        "filtered/{sample}.bam"
    params:
        json="filtering-rules.json",
        region=""  # optional parameter for defining a specific region, e.g. "chr1:500..chr3:750"
    log:
        "logs/bamtools/filtered/{sample}.log"
    wrapper:
        "v2.2.1/bio/bamtools/filter_json"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
A complete usage documentation is available here: https://raw.githubusercontent.com/wiki/pezmaster31/bamtools/Tutorial_Toolkit_BamTools-1.0.pdf
This tool/wrapper does not handle multi threading
Software dependencies¶
bamtools=2.5.2
Input/Output¶
Input:
- bam files (.bam), must be in first position
Output:
- bam file (.bam), must be in first position
Params¶
json
: Path to filter file, json formatted.
region
: see documentation for more information about multiple formats.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

region = snakemake.params.get("region")
region_param = ""
if region and region is not None:
    region_param = ' -region "' + region + '"'

shell(
    "(bamtools filter"
    " -in {snakemake.input[0]}"
    " -out {snakemake.output[0]}"
    + region_param
    + " -script {snakemake.params.json}) {log}"
)
BAMTOOLS SPLIT¶
Split bam file into sub files, default by reference
URL: https://github.com/pezmaster31/bamtools
Example¶
This wrapper can be used in the following way:
rule bamtools_split:
    input:
        "mapped/{sample}.bam",
    output:
        "mapped/{sample}.REF_xx.bam",
    params:
        extra="-reference",
    log:
        "logs/bamtools_split/{sample}.log",
    wrapper:
        "v2.2.1/bio/bamtools/split"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
A complete usage documentation is available here: https://raw.githubusercontent.com/wiki/pezmaster31/bamtools/Tutorial_Toolkit_BamTools-1.0.pdf
This tool/wrapper does not handle multi threading
Software dependencies¶
bamtools=2.5.2
Input/Output¶
Input:
- bam file, this must be the only file in input.
Output:
- multiple bam files.
Params¶
extra
: Optional parameters
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

if len(snakemake.input) != 1:
    raise ValueError("One bam input file expected, got: " + str(len(snakemake.input)))

shell("bamtools split -in {snakemake.input} {extra} {log}")
BAMTOOLS STATS¶
Use bamtools to collect statistics from a BAM file. For more information about bamtools see bamtools documentation and bamtools source code.
URL: https://github.com/pezmaster31/bamtools
Example¶
This wrapper can be used in the following way:
rule bamtools_stats:
    input:
        "{sample}.bam"
    output:
        "{sample}.bamstats"
    params:
        "-insert"  # optional summarize insert size data
    log:
        "logs/bamtools/stats/{sample}.log"
    wrapper:
        "v2.2.1/bio/bamtools/stats"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
A complete usage documentation is available here: https://raw.githubusercontent.com/wiki/pezmaster31/bamtools/Tutorial_Toolkit_BamTools-1.0.pdf
This tool/wrapper does not handle multi threading
Software dependencies¶
bamtools=2.5.2
Input/Output¶
Input:
- bam files (.bam), must be in first position
Output:
- bamstats file (.bamstats), must be in first position
Params¶
- Optional parameters as first and only value.
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "(bamtools stats {snakemake.params} -in {snakemake.input[0]} > {snakemake.output[0]}) {log}"
)
BARRNAP¶
BAsic Rapid Ribosomal RNA Predictor
URL: https://github.com/tseemann/barrnap
Example¶
This wrapper can be used in the following way:
rule barrnap:
    input:
        fasta="{sample}.fasta",
    output:
        gff="{sample}.gff",
        fasta="{sample}_hits.fasta",
    params:
        kingdom="bac",
        extra="",
    threads: 1
    log:
        "logs/barrnap/{sample}.log",
    wrapper:
        "v2.2.1/bio/barrnap"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Multiple threads can be used during nhmmer search.
Software dependencies¶
barrnap=0.9
Input/Output¶
Input:
fasta
: query fasta file
Output:
gff
: The rRNA locations in GFF3 format.
fasta
: Optional. Fasta file with the hit sequences.
Params¶
extra
: additional parameters
kingdom
: database to use, either Bacteria:bac, Archaea:arc, Eukaryota:euk or Metazoan Mitochondria:mito.
Authors¶
- Curro Campuzano Jiménez
Code¶
"""Snakemake wrapper for barrnap."""
__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
kingdom = snakemake.params.get("kingdom", "bac")

fasta_out = snakemake.output.get("fasta")
if fasta_out:
    extra += f" -o {fasta_out}"

shell(
    "barrnap"
    " --threads {snakemake.threads}"
    " -k {kingdom}"
    " {extra}"
    " < {snakemake.input.fasta}"
    " > {snakemake.output.gff}"
    " {log}"
)
BAZAM¶
Bazam is a smarter way to realign reads from one genome to another. If you’ve tried to use Picard SAMtoFASTQ or samtools bam2fq before and ended up unsatisfied with complicated, long running inefficient pipelines, bazam might be what you wanted. Bazam will output FASTQ in a form that can stream directly into common aligners such as BWA or Bowtie2, so that you can quickly and easily realign reads without extraction to any intermediate format. Bazam can target a specific region of the genome, specified as a region or a gene name if you prefer.
URL: https://github.com/ssadedin/bazam
Example¶
This wrapper can be used in the following way:
rule bazam_interleaved:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
    output:
        reads="results/reads/{sample}.fastq.gz",
    resources:
        # suggestion according to:
        # https://github.com/ssadedin/bazam/blob/c5988daf4cda4492e3d519c94f2f1e2022af5efe/README.md?plain=1#L46-L55
        mem_mb=lambda wildcards, input: max([0.2 * input.size_mb, 200]),
    log:
        "logs/bazam/{sample}.log",
    wrapper:
        "v2.2.1/bio/bazam"

rule bazam_separated:
    input:
        bam="mapped/{sample}.cram",
        bai="mapped/{sample}.cram.crai",
        reference="genome.fasta",
    output:
        r1="results/reads/{sample}.r1.fastq.gz",
        r2="results/reads/{sample}.r2.fastq.gz",
    resources:
        # suggestion according to:
        # https://github.com/ssadedin/bazam/blob/c5988daf4cda4492e3d519c94f2f1e2022af5efe/README.md?plain=1#L46-L55
        mem_mb=lambda wildcards, input: max([0.4 * input.size_mb, 200]),
    log:
        "logs/bazam/{sample}.log",
    wrapper:
        "v2.2.1/bio/bazam"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
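The `mem_mb` lambdas in the two example rules above request memory in proportion to the input size, with a lower limit of 200 MB. As a rough sketch of what those expressions evaluate to, here is the same arithmetic as a named function; `bazam_mem_mb` and its argument names are hypothetical, and the factors are the ones used in the example rules.

```python
def bazam_mem_mb(input_size_mb, fraction=0.2, floor_mb=200):
    """Memory request in MB: a fraction of the input size, but at least floor_mb."""
    return max(fraction * input_size_mb, floor_mb)


# a small input falls back to the floor, a large one scales with its size
for size_mb in (500, 5000):
    print(size_mb, bazam_mem_mb(size_mb), bazam_mem_mb(size_mb, fraction=0.4))
```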
Software dependencies¶
bazam=1.0.1
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
bam
: Path to mapping file (BAM/CRAM formatted)
reference
: Optional path to reference genome sequence (FASTA formatted). Required for CRAM input.
Output:
reads
: Path to realigned reads (single-ended or interleaved) (FASTQ formatted) OR
r1
: Path to upstream reads (FASTQ formatted) AND
r2
: Path to downstream reads (FASTQ formatted)
Params¶
extra
: Optional parameters passed to bazam
Authors¶
- Christopher Schröder
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2022, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
bam = snakemake.input.bam

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

if bam.endswith(".cram"):
    if not (reference := snakemake.input.get("reference", "")):
        raise ValueError(
            "input 'reference' is required when working with CRAM input files"
        )
    reference_cmd = f"-Dsamjdk.reference_fasta={reference}"
else:
    reference_cmd = ""

# Extract arguments.
if reads := snakemake.output.get("reads", ""):
    out_cmd = f"-o {reads}"
elif (r1 := snakemake.output.get("r1", "")) and (r2 := snakemake.output.get("r2", "")):
    out_cmd = f"-r1 {r1} -r2 {r2}"
else:
    raise ValueError("either 'reads' or 'r1' and 'r2' must be specified in output")

shell("(bazam {java_opts} {reference_cmd} {extra} -bam {bam} {out_cmd}) {log}")
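The choice between a single `reads` output and the `r1`/`r2` pair in the wrapper code above can be checked in isolation. This sketch takes a plain dict in place of `snakemake.output`; the function name `bazam_out_cmd` is hypothetical.

```python
def bazam_out_cmd(output):
    """Return the bazam output arguments for a dict of named outputs."""
    if reads := output.get("reads", ""):
        # single-ended or interleaved FASTQ
        return f"-o {reads}"
    if (r1 := output.get("r1", "")) and (r2 := output.get("r2", "")):
        # separate upstream and downstream FASTQ files
        return f"-r1 {r1} -r2 {r2}"
    raise ValueError("either 'reads' or 'r1' and 'r2' must be specified in output")


print(bazam_out_cmd({"reads": "results/reads/a.fastq.gz"}))
print(bazam_out_cmd({"r1": "a.r1.fastq.gz", "r2": "a.r2.fastq.gz"}))
```

As in the wrapper, `reads` takes precedence if all three names are present, and giving only `r1` is an error.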
BBTOOLS¶
For bbtools, the following wrappers are available:
BBDUK¶
Run BBDuk.
URL: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/
Example¶
This wrapper can be used in the following way:
rule bbduk_se:
    input:
        sample=["reads/se/{sample}.fastq"],
        adapters="reads/adapt.fas",
    output:
        trimmed="trimmed/se/{sample}.fastq.gz",
        singleton="trimmed/se/{sample}.single.fastq.gz",
        discarded="trimmed/se/{sample}.discarded.fastq.gz",
        stats="trimmed/se/{sample}.stats.txt",
    log:
        "logs/bbduk/se/{sample}.log"
    params:
        extra = lambda w, input: "ref={},adapters,artifacts ktrim=r k=23 mink=11 hdist=1 tpe tbo trimpolygright=10 minlen=25 maxns=30 entropy=0.5 entropywindow=50 entropyk=5".format(input.adapters),
    resources:
        mem_mb=4000,
    threads: 7
    wrapper:
        "v2.2.1/bio/bbtools/bbduk"

rule bbduk_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"],
        adapters="reads/adapt.fas",
    output:
        trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
        singleton="trimmed/pe/{sample}.single.fastq",
        discarded="trimmed/pe/{sample}.discarded.fastq",
        stats="trimmed/pe/{sample}.stats.txt",
    log:
        "logs/bbduk/pe/{sample}.log"
    params:
        extra = lambda w, input: "ref={},adapters,artifacts ktrim=r k=23 mink=11 hdist=1 tpe tbo trimpolygright=10 minlen=25 maxns=30 entropy=0.5 entropywindow=50 entropyk=5".format(input.adapters),
    resources:
        mem_mb=4000,
    threads: 7
    wrapper:
        "v2.2.1/bio/bbtools/bbduk"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
Software dependencies¶
bbmap=39.01
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
sample
: list of raw R1 and (if PE) R2 fastq file(s)
Output:
trimmed
: list of trimmed R1 and (if PE) R2 fastq file(s)
singleton
: fastq file with singleton reads (optional)
discarded
: fastq file with discarded reads (optional)
stats
: stats file (optional)
Params¶
extra
: additional program arguments
adapters
: Literal adapters sequences
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
if n == 1:
    reads = "in={}".format(snakemake.input.sample)
    trimmed = "out={}".format(snakemake.output.trimmed)
else:
    reads = "in={} in2={}".format(*snakemake.input.sample)
    trimmed = "out={} out2={}".format(*snakemake.output.trimmed)
singleton = snakemake.output.get("singleton", "")
if singleton:
    singleton = f"outs={singleton}"
discarded = snakemake.output.get("discarded", "")
if discarded:
    discarded = f"outm={discarded}"
stats = snakemake.output.get("stats", "")
if stats:
    stats = f"stats={stats}"
shell(
    "bbduk.sh {java_opts} t={snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{trimmed} {singleton} {discarded} "
    "{stats} "
    "{log}"
)
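The handling of the optional outputs in the code above can be tried out without Snakemake. The sketch below is a stand-alone approximation: a plain dict stands in for snakemake.output, and the helper name optional_output_args is hypothetical. The mapping of output names to BBDuk arguments (outs, outm, stats) is taken from the wrapper code.

```python
# Mapping from wrapper output names to BBDuk argument names,
# as used in the wrapper code above.
OUTPUT_FLAGS = {"singleton": "outs", "discarded": "outm", "stats": "stats"}


def optional_output_args(outputs):
    """Return BBDuk arguments for the optional outputs that are present.

    `outputs` is a plain dict standing in for snakemake.output (assumption).
    """
    args = []
    for name, flag in OUTPUT_FLAGS.items():
        path = outputs.get(name, "")
        if path:  # outputs that were not declared are simply skipped
            args.append(f"{flag}={path}")
    return " ".join(args)


print(optional_output_args({"singleton": "s.fastq", "stats": "stats.txt"}))
```

Outputs that are not declared in the rule contribute nothing to the command line, which is why singleton, discarded and stats are all optional.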
LOGLOG¶
Run LogLog to estimate memory requirements.
URL: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbnorm-guide/
Example¶
This wrapper can be used in the following way:
rule loglog_se:
input:
sample=["reads/se/{sample}.fastq"],
log:
"logs/se/{sample}.log",
params:
extra="buckets=2048 seed=1234",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/bbtools/loglog"
rule loglog_pe:
input:
sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"],
log:
"logs/pe/{sample}.log",
params:
extra="buckets=2048 seed=1234",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/bbtools/loglog"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. -XX:ParallelGCThreads=10 (not -Xmx or -Djava.io.tmpdir, since they are handled automatically).
Software dependencies¶
bbmap=39.01
snakemake-wrapper-utils=0.5.3
Params¶
extra
: additional program arguments
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
if n == 1:
    reads = "in={}".format(snakemake.input.sample)
else:
    reads = "in={} in2={}".format(*snakemake.input.sample)
shell("loglog.sh {java_opts} {reads} {extra} {log}")
TADPOLE¶
Run Tadpole.
URL: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/tadpole-guide/
Example¶
This wrapper can be used in the following way:
rule tadpole_correct_se:
input:
sample=["reads/se/{sample}.fastq"],
output:
out="out/correct_se/{sample}.fastq.gz",
discarded="out/correct_se/{sample}.discarded.fastq.gz",
log:
"logs/correct_se/{sample}.log",
params:
mode="correct",
extra="",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/bbtools/tadpole"
rule tadpole_correct_pe:
input:
sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"],
output:
out=["out/correct_pe/{sample}.1.fastq", "out/correct_pe/{sample}.2.fastq"],
discarded="out/correct_pe/{sample}.discarded.fastq",
log:
"logs/correct_pe/{sample}.log",
params:
mode="correct",
extra="",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/bbtools/tadpole"
rule tadpole_extend_se:
input:
sample=["reads/se/{sample}.fastq"],
output:
out="out/extend_se/{sample}.fastq.gz",
discarded="out/extend_se/{sample}.discarded.fastq.gz",
log:
"logs/extend_se/{sample}.log",
params:
mode="extend",
extra="",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/bbtools/tadpole"
rule tadpole_extend_pe:
input:
sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"],
output:
out=["out/extend_pe/{sample}.1.fastq", "out/extend_pe/{sample}.2.fastq"],
discarded="out/extend_pe/{sample}.discarded.fastq",
log:
"logs/extend_pe/{sample}.log",
params:
mode="extend",
extra="",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/bbtools/tadpole"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. -XX:ParallelGCThreads=10 (not -Xmx or -Djava.io.tmpdir, since they are handled automatically).
Software dependencies¶
bbmap=39.01
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
sample
: list of R1 and (if PE) R2 fastq file(s)
extra
: kmer data, but not for error-correction or extension (optional)
Output:
out
: processed fastq file with R1 reads, processed fastq file with R2 reads (PE only, optional)
discarded
: fastq file with discarded reads (optional)
Params¶
mode
: Run mode (one of contig, extend, correct, insert, or discard; mandatory)
extra
: additional program arguments
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
assert snakemake.params.mode in ["contig", "extend", "correct", "insert", "discard"]
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
if n == 1:
    reads = "in={}".format(snakemake.input.sample)
    out = "out={}".format(snakemake.output.out)
else:
    reads = "in={} in2={}".format(*snakemake.input.sample)
    out = "out={} out2={}".format(*snakemake.output.out)
in_extra = snakemake.input.get("extra", "")
if in_extra:
    reads += f" extra={in_extra}"
discarded = snakemake.output.get("discarded", "")
if discarded:
    out += f" outd={discarded}"
shell(
    "tadpole.sh {java_opts}"
    " threads={snakemake.threads}"
    " mode={snakemake.params.mode}"
    " {reads}"
    " {extra}"
    " {out}"
    " {log}"
)
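To see which command line the code above produces for a given rule, the logic can be reproduced as a plain function. This is a sketch only: tadpole_command and its parameters are hypothetical names, plain lists replace the snakemake object, and java_opts, extra and log redirection are left out for brevity.

```python
VALID_MODES = ("contig", "extend", "correct", "insert", "discard")


def tadpole_command(mode, sample, out, threads=1, in_extra=None, discarded=None):
    """Build a tadpole.sh command string (illustrative stand-in for the wrapper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {', '.join(VALID_MODES)}; got {mode!r}")
    if len(sample) == 1:
        # single-end: one input, one output
        reads, outs = f"in={sample[0]}", f"out={out[0]}"
    elif len(sample) == 2:
        # paired-end: R1/R2 inputs and outputs
        reads = f"in={sample[0]} in2={sample[1]}"
        outs = f"out={out[0]} out2={out[1]}"
    else:
        raise ValueError("sample must have 1 (single-end) or 2 (paired-end) elements.")
    if in_extra:
        reads += f" extra={in_extra}"  # optional additional kmer data
    if discarded:
        outs += f" outd={discarded}"  # optional file with discarded reads
    return f"tadpole.sh threads={threads} mode={mode} {reads} {outs}"


print(tadpole_command("correct", ["a.fq"], ["a.out.fq"], threads=2, discarded="d.fq"))
```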
BCFTOOLS¶
For bcftools, the following wrappers are available:
BCFTOOLS CALL¶
Call variants with bcftools call.
URL: http://www.htslib.org/doc/bcftools.html#call
Example¶
This wrapper can be used in the following way:
rule bcftools_call:
input:
pileup="{sample}.pileup.bcf",
output:
calls="{sample}.calls.bcf",
params:
uncompressed_bcf=False,
caller="-m", # valid options include -c/--consensus-caller or -m/--multiallelic-caller
extra="--ploidy 1 --prior 0.001",
log:
"logs/bcftools_call/{sample}.log",
wrapper:
"v2.2.1/bio/bcftools/call"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -o/--output, or -O/--output-type).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.6.1
Authors¶
- Johannes Köster
- Michael Hall
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_ref=False, parse_memory=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
class CallerOptionError(Exception):
    pass
valid_caller_opts = {"-c", "--consensus-caller", "-m", "--multiallelic-caller"}
caller_opt = snakemake.params.get("caller", "")
if caller_opt.strip() not in valid_caller_opts:
    raise CallerOptionError(
        "bcftools call expects either -m/--multiallelic-caller or "
        "-c/--consensus-caller as caller option."
    )
shell(
    "bcftools call"
    " {bcftools_opts}"
    " {caller_opt}"
    " {extra}"
    " {snakemake.input[0]}"
    " {log}"
)
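The caller check in the code above can be exercised in isolation. The following sketch mirrors it as a stand-alone function; the name check_caller is hypothetical and a plain string replaces snakemake.params.

```python
VALID_CALLER_OPTS = {"-c", "--consensus-caller", "-m", "--multiallelic-caller"}


class CallerOptionError(Exception):
    """Raised when the caller option is missing or not recognised."""


def check_caller(caller_opt):
    """Return the stripped caller option, or raise if it is not valid."""
    opt = caller_opt.strip()
    if opt not in VALID_CALLER_OPTS:
        raise CallerOptionError(
            "bcftools call expects either -m/--multiallelic-caller or "
            "-c/--consensus-caller as caller option."
        )
    return opt


print(check_caller(" -m "))
```

Note that an empty caller param fails this check, so the caller param is effectively mandatory.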
BCFTOOLS CONCAT¶
Concatenate vcf/bcf files with bcftools.
URL: http://www.htslib.org/doc/bcftools.html#concat
Example¶
This wrapper can be used in the following way:
rule bcftools_concat:
input:
calls=["a.bcf", "b.bcf"],
output:
"all.bcf",
log:
"logs/all.log",
params:
uncompressed_bcf=False,
extra="", # optional parameters for bcftools concat (except -o)
threads: 4
resources:
mem_mb=10,
wrapper:
"v2.2.1/bio/bcftools/concat"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -o/--output, or -O/--output-type).
Software dependencies¶
bcftools=1.16
snakemake-wrapper-utils=0.5.3
Authors¶
- Johannes Köster
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_ref=False, parse_memory=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("bcftools concat {bcftools_opts} {extra} {snakemake.input.calls} {log}")
BCFTOOLS FILTER¶
Filter vcf/bcf file.
URL: http://www.htslib.org/doc/bcftools.html#filter
Example¶
This wrapper can be used in the following way:
rule bcf_filter_sample:
input:
"{prefix}.bcf", # input bcf/vcf needs to be first input
samples="samples.txt", # other inputs, e.g. sample files, are optional
output:
"{prefix}.filter_sample.vcf",
log:
"log/{prefix}.filter_sample.vcf.log",
params:
filter=lambda w, input: f"--exclude 'GT[@{input.samples}]=\"0/1\"'",
extra="",
wrapper:
"v2.2.1/bio/bcftools/filter"
rule bcf_filter_o_vcf:
input:
"{prefix}.bcf",
output:
"{prefix}.filter.vcf",
log:
"log/{prefix}.filter.vcf.log",
params:
filter="-i 'QUAL > 5'",
extra="",
wrapper:
"v2.2.1/bio/bcftools/filter"
rule bcf_filter_o_vcf_gz:
input:
"{prefix}.bcf",
output:
"{prefix}.filter.vcf.gz",
log:
"log/{prefix}.filter.vcf.gz.log",
params:
filter="-i 'QUAL > 5'",
extra="",
wrapper:
"v2.2.1/bio/bcftools/filter"
rule bcf_filter_o_bcf:
input:
"{prefix}.bcf",
output:
"{prefix}.filter.bcf",
log:
"log/{prefix}.filter.bcf.log",
params:
filter="-i 'QUAL > 5'",
extra="",
wrapper:
"v2.2.1/bio/bcftools/filter"
rule bcf_filter_o_uncompressed_bcf:
input:
"{prefix}.bcf",
output:
"{prefix}.filter.uncompressed.bcf",
log:
"log/{prefix}.filter.uncompressed.bcf.log",
params:
uncompressed_bcf=True,
filter="-i 'QUAL > 5'",
extra="",
wrapper:
"v2.2.1/bio/bcftools/filter"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -o/--output, or -O/--output-type).
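The example rules differ only in the output file extension, and the output type is derived from it. The sketch below shows one way such an inference could look. It is not the implementation of get_bcftools_opts: the helper name infer_output_type and the extension mapping are assumptions, while the type codes v, z, b and u are the values accepted by the bcftools -O/--output-type option.

```python
def infer_output_type(path, uncompressed_bcf=False):
    """Guess the bcftools --output-type code from an output file name.

    Hypothetical helper; the mapping from extensions is an assumption.
    """
    if path.endswith(".vcf.gz"):
        return "z"  # compressed VCF
    if path.endswith(".vcf"):
        return "v"  # uncompressed VCF
    if path.endswith(".bcf"):
        # uncompressed_bcf only matters for BCF output
        return "u" if uncompressed_bcf else "b"
    raise ValueError(f"cannot infer output type from {path!r}")


for name in ("a.filter.vcf", "a.filter.vcf.gz", "a.filter.bcf"):
    print(name, infer_output_type(name))
print("a.filter.uncompressed.bcf", infer_output_type("a.filter.uncompressed.bcf", True))
```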
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
- Nikos Tsardakas Renhuldt
- Filipe G. Vieira
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(
    snakemake, parse_ref=False, parse_samples=False, parse_memory=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
filter = snakemake.params.get("filter", "")
if len(snakemake.output) > 1:
    raise Exception("Only one output file expected, got: " + str(len(snakemake.output)))
shell(
    "bcftools filter"
    " {bcftools_opts}"
    " {filter}"
    " {extra}"
    " {snakemake.input[0]}"
    " {log}"
)
BCFTOOLS INDEX¶
Index vcf/bcf file.
URL: http://www.htslib.org/doc/bcftools.html#index
Example¶
This wrapper can be used in the following way:
rule bcftools_index:
input:
"a.bcf",
output:
"a.bcf.csi",
log:
"index/a.log",
params:
extra="", # optional parameters for bcftools index
wrapper:
"v2.2.1/bio/bcftools/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not --threads, -o/--output).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Jan Forster
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(
    snakemake, parse_ref=False, parse_output_format=False, parse_memory=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if "--tbi" in extra or "--csi" in extra:
    raise ValueError(
        "You have specified index format (`--tbi/--csi`) in `params.extra`; this is automatically inferred from the first output file."
    )
if snakemake.output[0].endswith(".tbi"):
    extra += " --tbi"
elif snakemake.output[0].endswith(".csi"):
    extra += " --csi"
else:
    raise ValueError("invalid index file format ('.tbi', '.csi').")
shell("bcftools index {bcftools_opts} {extra} {snakemake.input[0]} {log}")
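The index format selection in the code above depends only on the output file name and the extra param, so it can be checked on its own. The sketch below restates it as a function; the name index_format_flag is hypothetical.

```python
def index_format_flag(index_path, extra=""):
    """Return the bcftools index format flag implied by the output file name."""
    if "--tbi" in extra or "--csi" in extra:
        # the format must not be given twice; it follows from the output name
        raise ValueError("index format is inferred from the output file; remove it from extra.")
    if index_path.endswith(".tbi"):
        return "--tbi"
    if index_path.endswith(".csi"):
        return "--csi"
    raise ValueError("invalid index file format ('.tbi', '.csi').")


print(index_format_flag("a.bcf.csi"))
print(index_format_flag("a.vcf.gz.tbi"))
```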
BCFTOOLS MERGE¶
Merge vcf/bcf files with bcftools.
URL: http://www.htslib.org/doc/bcftools.html#merge
Example¶
This wrapper can be used in the following way:
rule bcftools_merge:
input:
calls=["a.bcf", "b.bcf"],
output:
"all.bcf",
log:
"all.log"
params:
uncompressed_bcf=False,
extra="", # optional parameters for bcftools merge (except -o)
wrapper:
"v2.2.1/bio/bcftools/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -o/--output, or -O/--output-type).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
- Filipe G. Vieira
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_ref=False, parse_memory=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("bcftools merge {bcftools_opts} {extra} {snakemake.input} {log}")
BCFTOOLS MPILEUP¶
Generate VCF or BCF containing genotype likelihoods for one or multiple alignment (BAM or CRAM) files.
URL: http://www.htslib.org/doc/bcftools.html#mpileup
Example¶
This wrapper can be used in the following way:
rule bcftools_mpileup:
input:
alignments="mapped/{sample}.bam",
ref="genome.fasta", # this can be left out if --no-reference is in options
index="genome.fasta.fai",
output:
pileup="pileups/{sample}.pileup.bcf",
params:
uncompressed_bcf=False,
extra="--max-depth 100 --min-BQ 15",
log:
"logs/bcftools_mpileup/{sample}.log",
wrapper:
"v2.2.1/bio/bcftools/mpileup"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -f/--fasta-ref, -o/--output, or -O/--output-type).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.6.1
Authors¶
- Michael Hall
- Filipe G. Vieira
Code¶
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
extra = snakemake.params.get("extra", "")
bcftools_opts = get_bcftools_opts(
    snakemake, parse_ref=("--no-reference" not in extra), parse_memory=False
)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
class MissingReferenceError(Exception):
    pass
shell("bcftools mpileup {bcftools_opts} {extra} {snakemake.input[0]} {log}")
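The code above only asks get_bcftools_opts to parse the reference when --no-reference is absent from extra. The sketch below illustrates that decision in isolation. The helper name reference_args and the error raised for a missing reference are assumptions for illustration; --fasta-ref is the bcftools mpileup option named in the notes.

```python
def reference_args(extra, ref=None):
    """Return the reference argument for bcftools mpileup (illustrative sketch)."""
    if "--no-reference" in extra:
        # the user explicitly opted out, so no reference is passed
        return ""
    if not ref:
        # assumed behaviour: without --no-reference a reference is required
        raise ValueError("input.ref is required unless --no-reference is given in extra.")
    return f"--fasta-ref {ref}"


print(reference_args("--max-depth 100 --min-BQ 15", ref="genome.fasta"))
print(repr(reference_args("--no-reference")))
```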
BCFTOOLS NORM¶
Left-align and normalize indels, check if REF alleles match the reference, split multiallelic sites into multiple rows; recover multiallelics from multiple rows.
URL: http://www.htslib.org/doc/bcftools.html#norm
Example¶
This wrapper can be used in the following way:
rule norm_vcf:
input:
"{prefix}.bcf",
#ref="genome.fasta" # optional reference (will be translated into the -f option)
output:
"{prefix}.norm.vcf", # can also be .bcf, corresponding --output-type parameter is inferred automatically
log:
"{prefix}.norm.log",
params:
extra="--rm-dup none", # optional
#uncompressed_bcf=False,
wrapper:
"v2.2.1/bio/bcftools/norm"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -f/--fasta-ref, -o/--output, or -O/--output-type).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Dayne Filer
- Filipe G. Vieira
Code¶
__author__ = "Dayne Filer"
__copyright__ = "Copyright 2019, Dayne Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_memory=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("bcftools norm {bcftools_opts} {extra} {snakemake.input[0]} {log}")
BCFTOOLS REHEADER¶
Change header or sample names of vcf/bcf file.
URL: http://www.htslib.org/doc/bcftools.html#reheader
Example¶
This wrapper can be used in the following way:
rule bcftools_reheader:
input:
vcf="a.bcf",
## new header, can be omitted if "samples" is set
header="header.txt",
## file containing new sample names, can be omitted if "header" is set
samples="samples.tsv",
output:
"a.reheader.bcf",
log:
"reheader.log",
params:
uncompressed_bcf=False,
extra="", # optional parameters for bcftools reheader
view_extra="", # optional parameters for bcftools view
threads: 2
wrapper:
"v2.2.1/bio/bcftools/reheader"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -o/--output, -O/--output-type, or -T/--temp-prefix).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Jan Forster
- Filipe G. Vieira
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_ref=False, parse_memory=False)
extra = snakemake.params.get("extra", "")
view_extra = snakemake.params.get("view_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
## Extract arguments
header = snakemake.input.get("header", "")
if header:
    header = f"-h {header}"
samples = snakemake.input.get("samples", "")
if samples:
    samples = f"-s {samples}"
with tempfile.TemporaryDirectory() as tmpdir:
    tmp_prefix = Path(tmpdir) / "bcftools_reheader."
    shell(
        "(bcftools reheader"
        " --threads {snakemake.threads}"
        " {header}"
        " {samples}"
        " {extra}"
        " --temp-prefix {tmp_prefix}"
        " {snakemake.input[0]}"
        "| bcftools view"
        " {bcftools_opts}"
        " {view_extra}"
        ") {log}"
    )
BCFTOOLS SORT¶
Sort vcf/bcf file.
URL: http://www.htslib.org/doc/bcftools.html#sort
Example¶
This wrapper can be used in the following way:
rule bcftools_sort:
input:
"{sample}.bcf",
output:
"{sample}.sorted.bcf",
log:
"logs/bcftools/sort/{sample}.log",
params:
# Set to True, in case you want uncompressed BCF output
uncompressed_bcf=False,
# Extra arguments
extra="",
resources:
mem_mb=8000,
wrapper:
"v2.2.1/bio/bcftools/sort"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -o/--output, -O/--output-type, -m/--max-mem, or -T/--temp-dir).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_ref=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "bcftools sort"
        " {bcftools_opts}"
        " {extra}"
        " --temp-dir {tmpdir}"
        " {snakemake.input[0]}"
        " {log}"
    )
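The wrapper runs the sort inside a tempfile.TemporaryDirectory context, so the scratch directory handed to --temp-dir exists only for the duration of the command. The sketch below demonstrates that lifetime with the standard library alone; run_with_tempdir is a hypothetical helper and os.path.isdir stands in for the actual sort command.

```python
import os
import tempfile


def run_with_tempdir(action):
    """Call action(tmpdir) inside a temporary directory and report its lifetime."""
    with tempfile.TemporaryDirectory() as tmpdir:
        result = action(tmpdir)  # the directory exists while the action runs
    # after the with block the directory and its contents have been removed
    return result, os.path.exists(tmpdir)


existed_inside, exists_after = run_with_tempdir(os.path.isdir)
print(existed_inside, exists_after)
```

This is why -T/--temp-dir must not be set through the extra param: the wrapper already supplies a directory that is cleaned up automatically.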
BCFTOOLS STATS¶
Generate VCF stats using bcftools stats.
URL: http://www.htslib.org/doc/bcftools.html#stats
Example¶
This wrapper can be used in the following way:
rule bcf_stats:
input:
"{prefix}",
output:
"{prefix}.stats.txt",
log:
"{prefix}.bcftools.stats.log",
params:
extra="",
wrapper:
"v2.2.1/bio/bcftools/stats"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not --threads, -f/--fasta-ref, or -o/--output).
- For more information, see http://www.htslib.org/doc/bcftools.html#stats
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- William Rowell
- Filipe G. Vieira
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(
snakemake, parse_output=False, parse_output_format=False, parse_memory=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"bcftools stats"
" {bcftools_opts}"
" {extra}"
" {snakemake.input[0]}"
" > {snakemake.output[0]}"
" {log}"
)
BCFTOOLS VIEW¶
View vcf/bcf file in a different format.
URL: http://www.htslib.org/doc/bcftools.html#view
Example¶
This wrapper can be used in the following way:
rule bcf_view_sample_file:
input:
"{prefix}.bcf", # input bcf/vcf needs to be first input
index="{prefix}.bcf.csi", # other inputs are optional
samples="samples.txt",
output:
"{prefix}.view_sample.vcf",
log:
"log/{prefix}.view_sample.vcf.log",
params:
# optional extra parameters
extra=lambda w, input: f"-S {input.samples}",
wrapper:
"v2.2.1/bio/bcftools/view"
rule bcf_view_o_vcf:
input:
"{prefix}.bcf",
output:
"{prefix}.view.vcf",
log:
"log/{prefix}.view.vcf.log",
params:
extra="",
wrapper:
"v2.2.1/bio/bcftools/view"
rule bcf_view_o_vcf_gz:
input:
"{prefix}.bcf",
output:
"{prefix}.view.vcf.gz",
log:
"log/{prefix}.view.vcf.gz.log",
params:
extra="",
wrapper:
"v2.2.1/bio/bcftools/view"
rule bcf_view_o_bcf:
input:
"{prefix}.bcf",
output:
"{prefix}.view.bcf",
log:
"log/{prefix}.view.bcf.log",
params:
extra="",
wrapper:
"v2.2.1/bio/bcftools/view"
rule bcf_view_o_uncompressed_bcf:
input:
"{prefix}.bcf",
output:
"{prefix}.view.uncompressed.bcf",
log:
"log/{prefix}.view.uncompressed.bcf.log",
params:
uncompressed_bcf=True,
extra="",
wrapper:
"v2.2.1/bio/bcftools/view"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param specifies whether BCF output should be written uncompressed (ignored for other output formats).
- The extra param allows for additional program arguments (not --threads, -o/--output, or -O/--output-type).
Software dependencies¶
bcftools=1.17
snakemake-wrapper-utils=0.6.1
Authors¶
- Johannes Köster
- Nikos Tsardakas Renhuldt
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_ref=False, parse_memory=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("bcftools view {bcftools_opts} {extra} {snakemake.input[0]} {log}")
BEDTOOLS¶
For bedtools, the following wrappers are available:
BAMTOBED¶
Conversion utility that converts sequence alignments in BAM format into BED, BED12, and/or BEDPE records.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/bamtobed.html
Example¶
This wrapper can be used in the following way:
rule bamtobed:
input:
"{sample}.bam",
output:
"{sample}.bed",
log:
"logs/bamtobed/{sample}.log",
params:
extra="-bedpe", # optional parameters
wrapper:
"v2.2.1/bio/bedtools/bamtobed"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- This program/wrapper does not handle multi-threading.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
- BAM file, this must be the first file in the input file list
Output:
- BED file, this must be the first file in the output file list
Params¶
extra
: additional program arguments (except -i)
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"(bamToBed"
" {extra}"
" -i {snakemake.input[0]}"
" > {snakemake.output[0]}"
") {log}"
)
COMPLEMENTBED¶
Maps all regions of the genome which are not covered by the input.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/complement.html
Example¶
This wrapper can be used in the following way:
rule bedtools_complement_bed:
input:
in_file="a.bed",
genome="dummy.genome"
output:
"results/bed-complement/a.complement.bed"
params:
## Add optional parameters
extra="-L"
log:
"logs/a.complement.bed.log"
wrapper:
"v2.2.1/bio/bedtools/complement"
rule bedtools_complement_vcf:
input:
in_file="a.vcf",
genome="dummy.genome"
output:
"results/vcf-complement/a.complement.vcf"
params:
## Add optional parameters
extra="-L"
log:
"logs/a.complement.vcf.log"
wrapper:
"v2.2.1/bio/bedtools/complement"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- This program/wrapper does not handle multi-threading.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
in_file
: interval files (BED/GFF/VCF)
genome
: genome file
Output:
- complemented BED/GFF/VCF file
Params¶
extra
: additional program arguments (except -i and -g)
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(bedtools complement"
" {extra}"
" -i {snakemake.input.in_file}"
" -g {snakemake.input.genome}"
" > {snakemake.output[0]})"
" {log}"
)
COVERAGEBED¶
Returns the depth and breadth of coverage of features from B on the intervals in A.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/coverage.html
Example¶
This wrapper can be used in the following way:
rule coverageBed:
input:
a="bed/{sample}.bed",
b="mapped/{sample}.bam"
output:
"stats/{sample}.cov"
log:
"logs/coveragebed/{sample}.log"
params:
extra="" # optional parameters
threads: 8
wrapper:
"v2.2.1/bio/bedtools/coveragebed"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- This program/wrapper does not handle multi-threading.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
a
: Path to the feature file (BAM/BED/GFF/VCF). This file is compared to b (see below)
b
: Path or list of paths to file(s) (BAM/BED/GFF/VCF).
Output:
- Path to the coverage file.
Params¶
extra
: additional program arguments (except -a and -b)
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
input_a = snakemake.input.a
input_b = snakemake.input.b
output_file = snakemake.output[0]
if not isinstance(output_file, str) and len(snakemake.output) != 1:
    raise ValueError("Output should be one file: " + str(output_file) + "!")
shell(
    "coverageBed"
    " -a {input_a}"
    " -b {input_b}"
    " {extra_params}"
    " > {output_file}"
    " {log}"
)
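Since the b input may be a single path or a list of paths, it helps to see how either form could end up after the -b flag. The sketch below is an approximation with a hypothetical helper format_paths; it assumes that a list is rendered as space-separated paths, which is how file lists are usually passed on a shell command line.

```python
def format_paths(value):
    """Render a single path or a list of paths as one command-line fragment."""
    if isinstance(value, str):
        return value
    # assumption: multiple files are passed as space-separated arguments
    return " ".join(str(p) for p in value)


print("-b " + format_paths("mapped/a.bam"))
print("-b " + format_paths(["mapped/a.bam", "mapped/b.bam"]))
```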
GENOMECOVERAGEBED¶
Computes the coverage of a feature file as histograms, per-base reports or BEDGRAPH summaries across a given genome.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html
Example¶
This wrapper can be used in the following way:
rule genomecov_bam:
input:
"bam_input/{sample}.sorted.bam"
output:
"genomecov_bam/{sample}.genomecov"
log:
"logs/genomecov_bam/{sample}.log"
params:
"-bg" # optional parameters
wrapper:
"v2.2.1/bio/bedtools/genomecov"
rule genomecov_bed:
input:
# for genome file format please see:
# https://bedtools.readthedocs.io/en/latest/content/general-usage.html#genome-file-format
bed="bed_input/{sample}.sorted.bed",
ref="bed_input/genome_file"
output:
"genomecov_bed/{sample}.genomecov"
log:
"logs/genomecov_bed/{sample}.log"
params:
"-bg" # optional parameters
wrapper:
"v2.2.1/bio/bedtools/genomecov"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
- BED/GFF/VCF files grouped by chromosome and genome file (genome file format) OR
- BAM files sorted by position.
ref
: Path to genome file, this must come after the other files
Output:
- genomecov (.genomecov)
Params¶
extra
: additional program arguments
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
genome = ""
input_file = ""
if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
    input_file = "-ibam " + snakemake.input[0]
if len(snakemake.input) > 1:
    if (os.path.splitext(snakemake.input[0])[-1]) == ".bed":
        input_file = "-i " + snakemake.input.get("bed")
        genome = "-g " + snakemake.input.get("ref")
shell(
"(genomeCoverageBed"
" {snakemake.params}"
" {input_file}"
" {genome}"
" > {snakemake.output[0]}) {log}"
)
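The input handling in the code above can be hard to follow at a glance. The following is a minimal sketch of the same decision logic as a plain function, so it can be tried without Snakemake. The function name `genomecov_input_args` and its two arguments are hypothetical stand-ins for `snakemake.input`; they are not part of the wrapper.

```python
import os


def genomecov_input_args(inputs, named=None):
    """Return (input_flag, genome_flag) following the wrapper's branching.

    `inputs` is the positional list of input paths; `named` maps the
    optional keys "bed" and "ref" to paths. Both are hypothetical
    stand-ins for snakemake.input.
    """
    named = named or {}
    input_flag, genome_flag = "", ""
    first_ext = os.path.splitext(inputs[0])[-1]
    # A BAM as first input is passed with -ibam and needs no genome file.
    if first_ext == ".bam":
        input_flag = "-ibam " + inputs[0]
    # A BED input is only picked up when a second input (the genome file) exists.
    if len(inputs) > 1 and first_ext == ".bed":
        input_flag = "-i " + named["bed"]
        genome_flag = "-g " + named["ref"]
    return input_flag, genome_flag


print(genomecov_input_args(["bam_input/a.sorted.bam"]))
print(
    genomecov_input_args(
        ["bed_input/a.sorted.bed", "bed_input/genome_file"],
        {"bed": "bed_input/a.sorted.bed", "ref": "bed_input/genome_file"},
    )
)
```

One consequence visible in this sketch: a single BED input without a genome file yields no input flag at all, which is why the BED example rule lists both `bed` and `ref`.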
INTERSECTBED¶
Intersect BED/BAM/VCF files with bedtools.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
Example¶
This wrapper can be used in the following way:
rule bedtools_intersect:
input:
left="A.bed",
right="B.bed"
output:
"A_B.intersected.bed"
params:
## Add optional parameters
extra="-wa -wb" ## In this example, we want to write original entries in A and B for each overlap.
log:
"logs/intersect/A_B.log"
wrapper:
"v2.2.1/bio/bedtools/intersect"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
left
: Path to the left region file. Each feature in the left region file is compared to the right region file(s) in search of overlaps. (BAM/BED/GFF/VCF formatted)
right
: Path or list of paths to region file(s) (BAM/BED/GFF/VCF formatted)
Output:
- Path to the intersection.
Params¶
extra
: additional program arguments (except -a (left) and -b (right))
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(bedtools intersect"
" {extra}"
" -a {snakemake.input.left}"
" -b {snakemake.input.right}"
" > {snakemake.output})"
" {log}"
)
MERGEBED¶
Merge entries in one or multiple BED/BAM/VCF/GFF files with bedtools.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/merge.html
Example¶
This wrapper can be used in the following way:
rule bedtools_merge:
input:
# Multiple bed-files can be added as list
"A.bed"
output:
"A.merged.bed"
params:
## Add optional parameters
extra="-c 1 -o count" ## In this example, we want to count how many input lines we merged per output line
log:
"logs/merge/A.log"
wrapper:
"v2.2.1/bio/bedtools/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Warning: If multiple files are provided in input, then this wrapper requires exactly 3 threads. Else, it requires exactly one thread.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
- Path or list of paths to interval(s) file(s) (BED/GFF/VCF/BAM)
Output:
- Path to merged interval(s) file.
Params¶
extra
: additional program arguments (except for -i)
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster, Felix Mölder"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de, felix.moelder@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if len(snakemake.input) > 1:
    if all(f.endswith(".gz") for f in snakemake.input):
        cat = "zcat"
    elif all(not f.endswith(".gz") for f in snakemake.input):
        cat = "cat"
    else:
        raise ValueError("Input files must be all compressed or uncompressed.")
    shell(
        "({cat} {snakemake.input} | "
        "sort -k1,1 -k2,2n | "
        "bedtools merge {extra} "
        "-i stdin > {snakemake.output}) "
        " {log}"
    )
else:
    shell(
        "( bedtools merge"
        " {extra}"
        " -i {snakemake.input}"
        " > {snakemake.output})"
        " {log}"
    )
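As a rough illustration of the multi-file branch above, the sketch below isolates the choice between `cat` and `zcat` and pairs it with the thread count from the Notes section. Both function names are hypothetical. The explanation that three threads correspond to the three piped processes (concatenate, sort, merge) is an assumption, not something the wrapper documentation states.

```python
def choose_cat(paths):
    """Pick the concatenation command the same way the wrapper code does."""
    if all(p.endswith(".gz") for p in paths):
        return "zcat"
    if all(not p.endswith(".gz") for p in paths):
        return "cat"
    raise ValueError("Input files must be all compressed or uncompressed.")


def merge_threads(paths):
    """Thread count from the Notes: 3 for several inputs, otherwise 1.

    Assumption: the value 3 matches the three processes in the pipe.
    """
    return 3 if len(paths) > 1 else 1


print(choose_cat(["A.bed", "B.bed"]), merge_threads(["A.bed", "B.bed"]))
print(merge_threads(["A.bed"]))
```

A function like `merge_threads` could be adapted into a callable for the `threads:` directive of a rule, though that usage is only a suggestion.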
SLOPBED¶
Increase the size of each feature in a BED/BAM/VCF by a specified factor.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/slop.html
Example¶
This wrapper can be used in the following way:
rule bedtools_slop:
input:
"A.bed"
output:
"A.slop.bed"
params:
## Genome file, tab-separated file defining the length of every contig
genome="genome.txt",
## Add optional parameters
extra = "-b 10" ## in this example, we want to increase the feature by 10 bases to both sides
log:
"logs/slop/A.log"
wrapper:
"v2.2.1/bio/bedtools/slop"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Extra parameters require either -b or both -l and -r.
- This program/wrapper does not handle multi-threading.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
- Path to an interval file (BED/GFF/VCF)
Output:
- Path to the expanded intervals file
Params¶
genome
: Path to a genome file
extra
: additional program arguments (except for -i or -g)
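The genome file passed via `genome` is a two-column, tab-separated text file (contig name, contig length). If only a FASTA index (`.fai`) is at hand, its first two columns carry the same information. The following is a minimal sketch of that conversion; the function name `fai_to_genome_lines` and the sample index content are made up for illustration.

```python
def fai_to_genome_lines(fai_text):
    """Keep the first two tab-separated columns (name, length) of each .fai row."""
    lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        name, length = row.split("\t")[:2]
        lines.append(f"{name}\t{length}")
    return lines


# Hypothetical FASTA index content with two contigs.
example_fai = "chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n"
for line in fai_to_genome_lines(example_fai):
    print(line)
```

Writing the returned lines to a file such as `genome.txt`, one per line, gives a file in the layout the example rule expects.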
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(bedtools slop"
" {extra}"
" -i {snakemake.input[0]}"
" -g {snakemake.params.genome}"
" > {snakemake.output})"
" {log}"
)
SORTBED¶
Sorts bed, vcf or gff files by chromosome and other criteria.
URL: https://bedtools.readthedocs.io/en/latest/content/tools/sort.html
Example¶
This wrapper can be used in the following way:
rule bedtools_sort:
input:
in_file="a.bed"
output:
"results/bed-sorted/a.sorted.bed"
params:
## Add optional parameters for sorting order
extra="-sizeA"
log:
"logs/a.sorted.bed.log"
wrapper:
"v2.2.1/bio/bedtools/sort"
rule bedtools_sort_bed:
input:
in_file="a.bed",
# an optional sort file can be set as genomefile by the variable genome or
# as fasta index file by the variable faidx
genome="dummy.genome"
output:
"results/bed-sorted/a.sorted_by_file.bed"
params:
## Add optional parameters
extra=""
log:
"logs/a.sorted.bed.log"
wrapper:
"v2.2.1/bio/bedtools/sort"
rule bedtools_sort_vcf:
input:
in_file="a.vcf",
# an optional sort file can be set either as genomefile by the variable genome or
# as fasta index file by the variable faidx
faidx="genome.fasta.fai"
output:
"results/vcf-sorted/a.sorted_by_file.vcf"
params:
## Add optional parameters
extra=""
log:
"logs/a.sorted.vcf.log"
wrapper:
"v2.2.1/bio/bedtools/sort"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- This program/wrapper does not handle multi-threading.
Software dependencies¶
bedtools=2.31.0
Input/Output¶
Input:
in_file
: Path to interval file (BED/GFF/VCF formatted)
genome
: optional tab-separated file that determines the sorting order and contains the chromosome names in the first column
faidx
: optional FASTA index file
Output:
- Path to the sorted interval file (BED/GFF/VCF formatted)
Params¶
extra
: additional program arguments (except for -i, -g, or -faidx)
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
genome = snakemake.input.get("genome", "")
faidx = snakemake.input.get("faidx", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if genome:
    extra += " -g {}".format(genome)
elif faidx:
    extra += " -faidx {}".format(faidx)
shell(
"(bedtools sort"
" {extra}"
" -i {snakemake.input.in_file}"
" > {snakemake.output[0]})"
" {log}"
)
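The way the optional sort file is folded into the argument string can be sketched as a small function. The name `sort_extra_args` is hypothetical; the branching mirrors the wrapper code above, where `genome` takes precedence if both inputs are given.

```python
def sort_extra_args(extra="", genome="", faidx=""):
    """Append -g or -faidx to the extra arguments; genome wins over faidx."""
    if genome:
        extra += " -g {}".format(genome)
    elif faidx:
        extra += " -faidx {}".format(faidx)
    return extra


print(sort_extra_args("-sizeA"))
print(sort_extra_args(genome="dummy.genome"))
print(sort_extra_args(faidx="genome.fasta.fai"))
```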
BELLEROPHON¶
Filter mapped reads where the mapping spans a junction, retaining the 5-prime read.
URL: https://github.com/davebx/bellerophon/
Example¶
This wrapper can be used in the following way:
rule bellerophon_sam:
input:
fwd="test_1500_forward.bam",
rev="test_1500_reverse.bam",
output:
bam="out.sam",
log:
"logs/bellerophon.log",
params:
extra="--quality 20",
sorting="none", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="--no-PG", # optional: extra arguments for samtools/picard
threads: 2
wrapper:
"v2.2.1/bio/bellerophon"
rule bellerophon_bam:
input:
fwd="test_1500_forward.bam",
rev="test_1500_reverse.bam",
output:
bam="out.bam",
log:
"logs/bellerophon.log",
params:
extra="--quality 20",
sorting="coordinate", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="--no-PG", # optional: extra arguments for samtools/picard
threads: 2
wrapper:
"v2.2.1/bio/bellerophon"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The sorting param enables sorting ('none', 'queryname' or 'coordinate').
- The sort_extra param passes extra arguments to samtools.
- The extra param passes additional program arguments.
Software dependencies¶
bellerophon=1.0
samtools=1.17
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- Forward reads (BAM format)
- Reverse reads (BAM format)
Output:
- SAM/BAM/CRAM file
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(snakemake, parse_output=False)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")
pipe_cmd = ""
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to output format using samtools view.
    pipe_cmd = f"| samtools view -h {sort_extra} {samtools_opts}"
elif sort in ["coordinate", "queryname"]:
    # Add name flag if needed.
    if sort == "queryname":
        sort_extra += " -n"
    # Sort alignments.
    pipe_cmd = f"| samtools sort {sort_extra} {samtools_opts}"
else:
    raise ValueError(f"Unexpected value for params.sorting: {sort}")
shell(
"(bellerophon"
" --threads {snakemake.threads}"
" --forward {snakemake.input.fwd}"
" --reverse {snakemake.input.rev}"
" {extra}"
" --output /dev/stdout"
" {pipe_cmd}"
" > {snakemake.output[0]}"
") {log}"
)
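To see how the `sorting` and `sort_extra` params shape the command that bellerophon's output is piped into, the branching above can be sketched as a standalone function. The name `build_pipe_cmd` is hypothetical, and the `samtools_opts` value used in the calls (`-O bam`) is only an assumed example of what `get_samtools_opts` might return.

```python
def build_pipe_cmd(sorting="none", sort_extra="", samtools_opts=""):
    """Return the samtools command appended after bellerophon's stdout."""
    if sorting == "none":
        # No sorting: only convert to the requested output format.
        return f"| samtools view -h {sort_extra} {samtools_opts}"
    if sorting in ("coordinate", "queryname"):
        if sorting == "queryname":
            sort_extra += " -n"  # sort by read name instead of coordinate
        return f"| samtools sort {sort_extra} {samtools_opts}"
    raise ValueError(f"Unexpected value for params.sorting: {sorting}")


print(build_pipe_cmd("none", "--no-PG", "-O bam"))
print(build_pipe_cmd("queryname", "--no-PG", "-O bam"))
```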
BENCHMARK¶
For benchmark, the following wrappers are available:
CHM-EVAL¶
Evaluate given VCF file with chm-eval for benchmarking variant calling.
URL: https://github.com/lh3/CHM-eval
Example¶
This wrapper can be used in the following way:
rule chm_eval:
input:
kit="resources/chm-eval-kit",
vcf="{sample}.vcf",
output:
summary="chm-eval/{sample}.summary", # summary statistics
bed="chm-eval/{sample}.err.bed.gz", # bed file with errors
params:
extra="",
build="38",
log:
"logs/chm-eval/{sample}.log",
wrapper:
"v2.2.1/bio/benchmark/chm-eval"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
perl=5.32.1
Input/Output¶
Input:
kit
: Path to annotation directory
vcf
: Path to VCF to evaluate (can be gzipped)
Output:
summary
: Path to statistics and evaluations
bed
: Path to list of errors (BED formatted)
Params¶
build
: Genome build. Either 37 or 38.
extra
: Optional parameters besides -g
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
kit = snakemake.input.kit
vcf = snakemake.input.vcf
build = snakemake.params.build
extra = snakemake.params.get("extra", "")
if not snakemake.output[0].endswith(".summary"):
    raise ValueError("Output file must end with .summary")
out = snakemake.output[0][:-8]
shell("({kit}/run-eval -g {build} -o {out} {extra} {vcf} | sh) {log}")
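The slice `snakemake.output[0][:-8]` in the code above removes the eight characters of the `.summary` suffix to obtain the output prefix handed to `run-eval -o`. A small sketch of the same step, using the hypothetical function name `summary_prefix` and computing the suffix length rather than hard-coding it:

```python
def summary_prefix(summary_path):
    """Strip the '.summary' suffix, failing early if it is missing."""
    suffix = ".summary"
    if not summary_path.endswith(suffix):
        raise ValueError("Output file must end with .summary")
    return summary_path[: -len(suffix)]


print(summary_prefix("chm-eval/sampleA.summary"))
```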
CHM-EVAL-KIT¶
Download CHM-eval kit for benchmarking variant calling.
URL: https://github.com/lh3/CHM-eval
Example¶
This wrapper can be used in the following way:
rule chm_eval_kit:
output:
directory("resources/chm-eval-kit"),
params:
# Tag and version must match, see https://github.com/lh3/CHM-eval/releases.
tag="v0.5",
version="20180222",
log:
"logs/chm-eval-kit.log",
cache: "omit-software"
wrapper:
"v2.2.1/bio/benchmark/chm-eval-kit"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
curl
Params¶
tag
: Release tag, see the official Git repository
version
: Release version, see the official Git repository
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
url = (
"https://github.com/lh3/CHM-eval/releases/"
"download/{tag}/CHM-evalkit-{version}.tar"
).format(version=snakemake.params.version, tag=snakemake.params.tag)
os.makedirs(snakemake.output[0])
shell(
"(curl -L {url} | tar --strip-components 1 -C {snakemake.output[0]} -xf - &&"
"(cd {snakemake.output[0]}; chmod +x htsbox run-eval k8)) {log}"
)
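The download URL is assembled from the `tag` and `version` params. The sketch below reproduces only that string formatting, so it is easy to check which URL a given pair of values resolves to; the function name `chm_eval_kit_url` is hypothetical.

```python
def chm_eval_kit_url(tag, version):
    """Build the release URL from tag and version, as the wrapper code does."""
    return (
        "https://github.com/lh3/CHM-eval/releases/"
        "download/{tag}/CHM-evalkit-{version}.tar"
    ).format(version=version, tag=tag)


# Values taken from the example rule above.
print(chm_eval_kit_url("v0.5", "20180222"))
```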
CHM-EVAL-SAMPLE¶
Download CHM-eval sample for benchmarking variant calling.
URL: https://github.com/lh3/CHM-eval
Example¶
This wrapper can be used in the following way:
rule chm_eval_sample:
output:
bam="resources/chm-eval-sample.bam",
bai="resources/chm-eval-sample.bam.bai"
params:
# Optionally only grab the first 100 records.
# This is for testing, remove next line to grab all records.
first_n=100
log:
"logs/chm-eval-sample.log"
wrapper:
"v2.2.1/bio/benchmark/chm-eval-sample"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
samtools=1.17
curl
Params¶
first_n
: Optional parameter to grab only the first n records.
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
url = "ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR134/ERR1341796/CHM1_CHM13_2.bam"
pipefail = ""
fmt = "-b"
prefix = snakemake.params.get("first_n", "")
if prefix:
    prefix = "| head -n {} | samtools view -h -b".format(prefix)
    fmt = "-h"
    pipefail = "set +o pipefail"
    shell(
        """
        {pipefail}
        {{
            samtools view {fmt} {url} {prefix} > {snakemake.output.bam}
            samtools index {snakemake.output.bam}
        }} {log}
        """
    )
else:
    shell(
        """
        {{
            curl -L {url} > {snakemake.output.bam}
            samtools index {snakemake.output.bam}
        }} {log}
        """
    )
BGZIP¶
Block compression/decompression utility
URL: https://github.com/samtools/htslib
Example¶
This wrapper can be used in the following way:
rule bgzip:
input:
"{prefix}.vcf",
output:
"{prefix}.vcf.gz",
params:
extra="", # optional
threads: 1
log:
"logs/bgzip/{prefix}.log",
wrapper:
"v2.2.1/bio/bgzip"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
htslib=1.17
Input/Output¶
Input:
- file to be compressed or decompressed
Output:
- compressed or decompressed output
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"""
(bgzip -c {extra} --threads {snakemake.threads} \
{snakemake.input} > {snakemake.output}) {log}
"""
)
BIOBAMBAM2¶
For biobambam2, the following wrappers are available:
BIOBAMBAM2 BAMSORMADUP¶
Mark PCR and optical duplicates, followed with sorting, with BioBamBam2 tools.
URL: https://gitlab.com/german.tischler/biobambam2
Example¶
This wrapper can be used in the following way:
rule mark_duplicates_bamsormadup:
input:
"mapped/{sample}.bam",
output:
bam="dedup/{sample}.bam",
index="dedup/{sample}.bai",
metrics="dedup/{sample}.metrics.txt",
log:
"logs/{sample}.log",
params:
extra="SO=coordinate",
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/biobambam2/bamsormadup"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
biobambam=2.0.183
Input/Output¶
Input:
- Path to SAM/BAM/CRAM file, this must be the first file in the input file list.
- Path to reference (for CRAM output)
Output:
- Path to SAM/BAM/CRAM file with marked duplicates. This must be the first output file in the output file list.
index
: Path to BAM index file (optional)
metrics
: Path to metrics file (optional)
Params¶
extra
: additional program arguments (not inputformat, outputformat or tmpfile).
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import os
import random
import tempfile
from pathlib import Path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=True)
extra = snakemake.params.get("extra", "")
# File formats
in_name, in_format = os.path.splitext(snakemake.input[0])
in_format = in_format.lstrip(".")
out_name, out_format = os.path.splitext(snakemake.output[0])
out_format = out_format.lstrip(".")
index = snakemake.output.get("index", "")
if index:
    index = f"indexfilename={index}"
metrics = snakemake.output.get("metrics", "")
if metrics:
    metrics = f"M={metrics}"
with tempfile.TemporaryDirectory() as tmpdir:
    # This folder must not exist; it is created by BamSorMaDup
    tmpdir_bamsormadup = Path(tmpdir) / "bamsormadup_{:06d}".format(
        random.randrange(10**6)
    )
    shell(
        "bamsormadup threads={snakemake.threads}"
        " inputformat={in_format}"
        " tmpfile={tmpdir_bamsormadup}"
        " outputformat={out_format}"
        " {index} {metrics} {extra}"
        " < {snakemake.input[0]} > {snakemake.output[0]}"
        " {log}"
    )
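The wrapper infers the input and output formats from the file extensions and only adds the index and metrics options when those outputs are declared. As a minimal sketch of that derivation (the function name `bamsormadup_options` is hypothetical and the temporary-directory handling is left out):

```python
import os


def bamsormadup_options(input_path, output_path, index="", metrics=""):
    """Derive key=value options from file names, following the wrapper logic."""
    in_format = os.path.splitext(input_path)[1].lstrip(".")
    out_format = os.path.splitext(output_path)[1].lstrip(".")
    opts = [f"inputformat={in_format}", f"outputformat={out_format}"]
    if index:
        opts.append(f"indexfilename={index}")
    if metrics:
        opts.append(f"M={metrics}")
    return opts


print(
    " ".join(
        bamsormadup_options(
            "mapped/a.bam", "dedup/a.bam", "dedup/a.bai", "dedup/a.metrics.txt"
        )
    )
)
```

Because the format comes straight from the extension, the first input and first output need to end in the format name itself (for example `.sam`, `.bam` or `.cram`).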
BISMARK¶
For bismark, the following wrappers are available:
BAM2NUC¶
Calculate mono- and di-nucleotide coverage of the reads and compare it with the average genomic sequence composition (see https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc).
Example¶
This wrapper can be used in the following way:
# Nucleotide stats for genome is required for further stats for BAM file
rule bam2nuc_for_genome:
input:
genome_fa="indexes/{genome}/{genome}.fa.gz"
output:
"indexes/{genome}/genomic_nucleotide_frequencies.txt"
log:
"logs/indexes/{genome}/genomic_nucleotide_frequencies.txt.log"
wrapper:
"v2.2.1/bio/bismark/bam2nuc"
# Nucleotide stats for BAM file
rule bam2nuc_for_bam:
input:
genome_fa="indexes/{genome}/{genome}.fa.gz",
bam="bams/{sample}_{genome}.bam"
output:
report="bams/{sample}_{genome}.nucleotide_stats.txt"
log:
"logs/{sample}_{genome}.nucleotide_stats.txt.log"
wrapper:
"v2.2.1/bio/bismark/bam2nuc"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bismark=0.24.1
samtools=1.17
Input/Output¶
Input:
genome_fa
: Path to genome in FastA format (e.g. *.fa, *.fasta, *.fa.gz, *.fasta.gz). All FastA genome files in its parent folder will be used
bam
: Optional BAM or CRAM file (or multiple space separated files). If the bam input isn't provided, the option --genomic_composition_only will be used to generate the genomic composition table genomic_nucleotide_frequencies.txt.
Output:
- Genome nucleotide frequencies genomic_nucleotide_frequencies.txt will be generated in ‘genome_fa’ directory, optional output.
report
: Report file (or space separated files), pattern ‘{bam_file_name}.nucleotide_stats.txt’.
Params¶
extra
: Any additional args
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for bam2nuc tool that calculates mono- and di-nucleotide coverage of the reads and compares them with average genomic sequence
composition."""
# https://github.com/FelixKrueger/Bismark/blob/master/bam2nuc
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
cmdline_args = ["bam2nuc {extra}"]
genome_fa = snakemake.input.get("genome_fa", None)
if not genome_fa:
    raise ValueError("bismark/bam2nuc: Error 'genome_fa' input not specified.")
genome_folder = os.path.dirname(genome_fa)
cmdline_args.append("--genome_folder {genome_folder:q}")
bam = snakemake.input.get("bam", None)
if bam:
    cmdline_args.append("{bam}")
    bams = bam if isinstance(bam, list) else [bam]
    report = snakemake.output.get("report", None)
    if not report:
        raise ValueError("bismark/bam2nuc: Error 'report' output isn't specified.")
    reports = report if isinstance(report, list) else [report]
    if len(reports) != len(bams):
        raise ValueError(
            "bismark/bam2nuc: Error number of paths in output:report ({} files)"
            " should be same as in input:bam ({} files).".format(
                len(reports), len(bams)
            )
        )
    output_dir = os.path.dirname(reports[0])
    if any(output_dir != os.path.dirname(p) for p in reports):
        raise ValueError(
            "bismark/bam2nuc: Error all reports should be in same directory:"
            " {}".format(output_dir)
        )
    if output_dir:
        cmdline_args.append("--dir {output_dir:q}")
else:
    cmdline_args.append("--genomic_composition_only")
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
# Move outputs into proper position.
if bam:
    log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    expected_2_actual_paths = []
    for bam_path, report_path in zip(bams, reports):
        bam_name = os.path.basename(bam_path)
        bam_basename = os.path.splitext(bam_name)[0]
        expected_2_actual_paths.append(
            (
                report_path,
                os.path.join(
                    output_dir, "{}.nucleotide_stats.txt".format(bam_basename)
                ),
            )
        )
    for exp_path, actual_path in expected_2_actual_paths:
        if exp_path and (exp_path != actual_path):
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
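The final step of the code above moves each report from the name bam2nuc produces to the name declared in `output.report`. The sketch below shows how the produced name is derived from the BAM path, following the pattern `{bam_file_name}.nucleotide_stats.txt` stated in the Output section; the function name `bam2nuc_default_report` is hypothetical.

```python
import os


def bam2nuc_default_report(bam_path, output_dir):
    """Path of the report written for a BAM, as computed by the wrapper code."""
    bam_basename = os.path.splitext(os.path.basename(bam_path))[0]
    return os.path.join(output_dir, "{}.nucleotide_stats.txt".format(bam_basename))


print(bam2nuc_default_report("bams/sampleA_hg19.bam", "bams"))
```

When the declared `report` path already equals this default, no move is performed.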
BISMARK¶
Align BS-Seq reads using Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark).
Example¶
This wrapper can be used in the following way:
# Example: Paired-end reads
rule bismark_pe:
input:
fq_1="reads/{sample}.1.fastq",
fq_2="reads/{sample}.2.fastq",
genome="indexes/{genome}/{genome}.fa",
bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
output:
bam="bams/{sample}_{genome}_pe.bam",
report="bams/{sample}_{genome}_PE_report.txt",
nucleotide_stats="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
bam_unmapped_1="bams/{sample}_{genome}_unmapped_reads_1.fq.gz",
bam_unmapped_2="bams/{sample}_{genome}_unmapped_reads_2.fq.gz",
ambiguous_1="bams/{sample}_{genome}_ambiguous_reads_1.fq.gz",
ambiguous_2="bams/{sample}_{genome}_ambiguous_reads_2.fq.gz"
log:
"logs/bams/{sample}_{genome}.log"
params:
# optional params string, e.g: -L32 -N0 -X400 --gzip
# Useful options to tune:
# (for bowtie2)
# -N: The maximum number of mismatches permitted in the "seed", i.e. the first L base pairs
# of the read (default: 1)
# -L: The "seed length" (default: 28)
# -I: The minimum insert size for valid paired-end alignments. ~ min fragment size filter (for
# PE reads)
# -X: The maximum insert size for valid paired-end alignments. ~ max fragment size filter (for
# PE reads)
# --gzip: Gzip intermediate fastq files
# --ambiguous --unmapped
# -p: bowtie2 parallel execution
# --multicore: bismark parallel execution
# --temp_dir: tmp dir for intermediate files instead of output directory
extra=' --ambiguous --unmapped --nucleotide_coverage',
basename='{sample}_{genome}'
wrapper:
"v2.2.1/bio/bismark/bismark"
# Example: Single-end reads
rule bismark_se:
input:
fq="reads/{sample}.fq.gz",
genome="indexes/{genome}/{genome}.fa",
bismark_indexes_dir="indexes/{genome}/Bisulfite_Genome",
genomic_freq="indexes/{genome}/genomic_nucleotide_frequencies.txt"
output:
bam="bams/{sample}_{genome}.bam",
report="bams/{sample}_{genome}_SE_report.txt",
nucleotide_stats="bams/{sample}_{genome}.nucleotide_stats.txt",
bam_unmapped="bams/{sample}_{genome}_unmapped_reads.fq.gz",
ambiguous="bams/{sample}_{genome}_ambiguous_reads.fq.gz"
log:
"logs/bams/{sample}_{genome}.log",
params:
# optional params string
extra=' --ambiguous --unmapped --nucleotide_coverage',
basename='{sample}_{genome}'
wrapper:
"v2.2.1/bio/bismark/bismark"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bismark=0.24.1
samtools=1.17
Input/Output¶
Input:
- In SE mode one reads file with key ‘fq=…’
- In PE mode two reads files with keys ‘fq_1=…’, ‘fq_2=…’
bismark_indexes_dir
: The path to the folder Bisulfite_Genome created by the Bismark_Genome_Preparation script, e.g. ‘indexes/hg19/Bisulfite_Genome’
Output:
bam
: Bam file. The output file will be renamed if it differs from the default NAME_pe.bam or NAME_se.bam
report
: Alignment report file. The output file will be renamed if it differs from the default NAME_PE_report.txt or NAME_SE_report.txt
nucleotide_stats
: Optional nucleotide report file. The output file will be renamed if it differs from the default NAME_pe.nucleotide_stats.txt or NAME_se.nucleotide_stats.txt
Params¶
basename
: File base name
extra
: Any additional args
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for aligning methylation BS-Seq data using Bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
from tempfile import TemporaryDirectory
def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = os.path.basename(file_path)
    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])
    return base
extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark {extra} --bowtie2"]
outdir = os.path.dirname(snakemake.output.bam)
if outdir:
    cmdline_args.append("--output_dir {outdir}")
genome_indexes_dir = os.path.dirname(snakemake.input.bismark_indexes_dir)
cmdline_args.append("{genome_indexes_dir}")
if not snakemake.output.get("bam", None):
    raise ValueError("bismark/bismark: Error 'bam' output file isn't specified.")
if not snakemake.output.get("report", None):
    raise ValueError("bismark/bismark: Error 'report' output file isn't specified.")
# basename
if snakemake.params.get("basename", None):
    cmdline_args.append("--basename {snakemake.params.basename:q}")
    basename = snakemake.params.basename
else:
    basename = None
# reads input
single_end_mode = snakemake.input.get("fq", None)
if single_end_mode:
    # for SE data, the single reads file is passed to bismark via --se
    cmdline_args.append("--se {snakemake.input.fq:q}")
    mode_prefix = "se"
    if basename is None:
        basename = basename_without_ext(snakemake.input.fq)
else:
    # for PE data, the two mate files are passed to bismark via -1 and -2
    cmdline_args.append("-1 {snakemake.input.fq_1:q} -2 {snakemake.input.fq_2:q}")
    mode_prefix = "pe"
    if basename is None:
        # default basename
        basename = basename_without_ext(snakemake.input.fq_1) + "_bismark_bt2"
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
# Move outputs into proper position.
expected_2_actual_paths = [
    (
        snakemake.output.bam,
        os.path.join(
            outdir, "{}{}.bam".format(basename, "" if single_end_mode else "_pe")
        ),
    ),
    (
        snakemake.output.report,
        os.path.join(
            outdir,
            "{}_{}_report.txt".format(basename, "SE" if single_end_mode else "PE"),
        ),
    ),
    (
        snakemake.output.get("nucleotide_stats", None),
        os.path.join(
            outdir,
            "{}{}.nucleotide_stats.txt".format(
                basename, "" if single_end_mode else "_pe"
            ),
        ),
    ),
]
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for exp_path, actual_path in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")
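The renaming logic above depends on the file names the wrapper expects Bismark to write for a given basename and mode. The following sketch collects those expected names in one place and reuses the wrapper's `basename_without_ext` helper; the function name `bismark_default_outputs` is hypothetical.

```python
import os


def basename_without_ext(file_path):
    """Basename without extension; a trailing .gz counts as part of the extension."""
    base = os.path.basename(file_path)
    split_ind = 2 if base.endswith(".gz") else 1
    return ".".join(base.split(".")[:-split_ind])


def bismark_default_outputs(outdir, basename, single_end):
    """File names the wrapper code expects before moving them to the declared outputs."""
    suffix = "" if single_end else "_pe"
    mode = "SE" if single_end else "PE"
    return {
        "bam": os.path.join(outdir, f"{basename}{suffix}.bam"),
        "report": os.path.join(outdir, f"{basename}_{mode}_report.txt"),
        "nucleotide_stats": os.path.join(
            outdir, f"{basename}{suffix}.nucleotide_stats.txt"
        ),
    }


print(basename_without_ext("reads/sampleA.fq.gz"))
for key, path in bismark_default_outputs("bams", "sampleA_hg19", False).items():
    print(key, path)
```

Declaring outputs that already match these names, as the example rules do, means the final move step has nothing to rename.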
BISMARK2BEDGRAPH¶
Generate bedGraph and coverage files from positional methylation files created by bismark_methylation_extractor (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph).
Example¶
This wrapper can be used in the following way:
# Example for CHG+CHH summary coverage:
rule bismark2bedGraph_noncpg:
input:
"meth/CHG_context_{sample}.txt.gz",
"meth/CHH_context_{sample}.txt.gz"
output:
bedGraph="meth_non_cpg/{sample}_non_cpg.bedGraph.gz",
cov="meth_non_cpg/{sample}_non_cpg.bismark.cov.gz"
log:
"logs/meth_non_cpg/{sample}_non_cpg.log"
params:
extra="--CX"
wrapper:
"v2.2.1/bio/bismark/bismark2bedGraph"
# Example for CpG only coverage
rule bismark2bedGraph_cpg:
input:
"meth/CpG_context_{sample}.txt.gz"
output:
bedGraph="meth_cpg/{sample}_CpG.bedGraph.gz",
cov="meth_cpg/{sample}_CpG.bismark.cov.gz"
log:
"logs/meth_cpg/{sample}_CpG.log"
wrapper:
"v2.2.1/bio/bismark/bismark2bedGraph"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bismark=0.24.0
samtools=1.17
Input/Output¶
Input:
- Files generated by bismark_methylation_extractor, e.g. CpG_context*.txt.gz, CHG_context*.txt.gz, CHH_context*.txt.gz. By default only the CpG file is required; if the ‘--CX’ option is given, the output is built from the merged input files.
Output:
bedGraph
: Bismark methylation level track, *.bedGraph.gz (0-based start, 1-based end coordinates, i.e. end offset exclusive)
cov
: Optional bismark coverage file *.bismark.cov.gz, file name is derived from the bedGraph name (1-based start and end, i.e. end offset inclusive)
Params¶
extra
: Any additional args, e.g. ‘--CX’, ‘--ample_memory’, ‘--buffer_size 10G’, etc.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark bismark2bedGraph tool."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2bedGraph
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
bedGraph = snakemake.output.get("bedGraph", "")
if not bedGraph:
    raise ValueError("bismark/bismark2bedGraph: Please specify bedGraph output path")
params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark2bedGraph {params_extra}"]
dir_name = os.path.dirname(bedGraph)
if dir_name:
    cmdline_args.append("--dir {dir_name}")
fname = os.path.basename(bedGraph)
cmdline_args.append("--output {fname}")
cmdline_args.append("{snakemake.input}")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
BISMARK2REPORT¶
Generate graphical HTML report from Bismark reports (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2report).
Example¶
This wrapper can be used in the following way:
# Example: Paired-end reads
rule bismark2report_pe:
    input:
        alignment_report="bams/{sample}_{genome}_PE_report.txt",
        nucleotide_report="bams/{sample}_{genome}_pe.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}_pe.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}_pe.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}_pe.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "v2.2.1/bio/bismark/bismark2report"
# Example: Single-end reads
rule bismark2report_se:
    input:
        alignment_report="bams/{sample}_{genome}_SE_report.txt",
        nucleotide_report="bams/{sample}_{genome}.nucleotide_stats.txt",
        dedup_report="bams/{sample}_{genome}.deduplication_report.txt",
        mbias_report="meth/{sample}_{genome}.deduplicated.M-bias.txt",
        splitting_report="meth/{sample}_{genome}.deduplicated_splitting_report.txt"
    output:
        html="qc/meth/{sample}_{genome}.bismark2report.html",
    log:
        "logs/qc/meth/{sample}_{genome}.bismark2report.html.log",
    params:
        skip_optional_reports=True
    wrapper:
        "v2.2.1/bio/bismark/bismark2report"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bismark=0.24.0
samtools=1.17
Input/Output¶
Input:
alignment_report
: Alignment report (if not specified, bismark will try to find it in the current directory)
nucleotide_report
: Optional Bismark nucleotide coverage report (if not specified, bismark will try to find it in the current directory)
dedup_report
: Optional deduplication report (if not specified, bismark will try to find it in the current directory)
splitting_report
: Optional Bismark methylation extractor splitting report (if not specified, bismark will try to find it in the current directory)
mbias_report
: Optional Bismark methylation extractor M-bias report (if not specified, bismark will try to find it in the current directory)
Output:
html
: Output HTML file path, if batch mode isn’t used.
html_dir
: Output dir path for HTML reports if batch mode is used
Params¶
skip_optional_reports
: Use ‘true’ or ‘false’. If true, optional reports not listed in the input section are not looked up (the wrapper passes ‘none’ to bismark2report for them)
extra
: Any additional args
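The wrapper code below interprets `skip_optional_reports` through a small string test, so Python booleans and a few string spellings are accepted. The sketch reproduces that helper from the wrapper source to show which values enable the option; the sample values in the loop are illustrative only.

```python
def answer2bool(v):
    # Same test as in the wrapper: the value is stringified, lower-cased,
    # and compared against a fixed set of "truthy" spellings.
    return str(v).lower() in ("yes", "true", "t", "1")


for value in (True, "True", "yes", 1, False, "no", "", None):
    print(repr(value), "->", answer2bool(value))
```

Anything outside that set, including an empty string or `None`, leaves the option disabled.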
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper to generate graphical HTML report from Bismark reports."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2report
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
def answer2bool(v):
    return str(v).lower() in ("yes", "true", "t", "1")
extra = snakemake.params.get("extra", "")
cmds = ["bismark2report {extra}"]
# output
html_file = snakemake.output.get("html", "")
output_dir = snakemake.output.get("html_dir", None)
if output_dir is None:
    if html_file:
        output_dir = os.path.dirname(html_file)
else:
    if html_file:
        raise ValueError(
            "bismark/bismark2report: Choose one: 'html=...' for a single dir or 'html_dir=...' for batch processing."
        )
if output_dir is None:
    raise ValueError(
        "bismark/bismark2report: Output file or directory not specified. "
        "Use 'html=...' for a single dir or 'html_dir=...' for batch "
        "processing."
    )
if output_dir:
    cmds.append("--dir {output_dir:q}")
if html_file:
    html_file_name = os.path.basename(html_file)
    cmds.append("--output {html_file_name:q}")
# reports
reports = [
    "alignment_report",
    "dedup_report",
    "splitting_report",
    "mbias_report",
    "nucleotide_report",
]
skip_optional_reports = answer2bool(
    snakemake.params.get("skip_optional_reports", False)
)
for report_name in reports:
    path = snakemake.input.get(report_name, "")
    if path:
        locals()[report_name] = path
        cmds.append("--{0} {{{1}:q}}".format(report_name, report_name))
    elif skip_optional_reports:
        cmds.append("--{0} 'none'".format(report_name))
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")
# run shell command:
shell(" ".join(cmds))
BISMARK2SUMMARY¶
Generate a summary graphical HTML report from several Bismark text report files (see https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary).
Example¶
This wrapper can be used in the following way:
import os
rule bismark2summary:
    input:
        bam=["bams/a_genome_pe.bam", "bams/b_genome.bam"],
        # Bismark `bismark2summary` discovers reports automatically based
        # on the files available in the folder containing the bam file
        #
        # If your per-BAM-file reports aren't in the same folder,
        # you will need an additional task which symlinks all reports
        # (e.g. your splitting report generated by the `bismark_methylation_extractor`
        # tool is in the `meth` folder, and alignment related reports are in the `bams` folder)
        # These dependencies are here just to ensure that the corresponding rules
        # have already finished at rule execution time, otherwise some reports
        # will be missing.
        dependencies=[
            "bams/a_genome_PE_report.txt",
            "bams/a_genome_pe.deduplication_report.txt",
            # for example splitting report is missing for 'a' sample
            "bams/b_genome_SE_report.txt",
            "bams/b_genome.deduplication_report.txt",
            "bams/b_genome.deduplicated_splitting_report.txt"
        ]
    output:
        html="qc/{experiment}.bismark2summary.html",
        txt="qc/{experiment}.bismark2summary.txt"
    log:
        "logs/qc/{experiment}.bismark2summary.log"
    wrapper:
        "v2.2.1/bio/bismark/bismark2summary"
rule bismark2summary_prepare_symlinks:
    input:
        "meth/b_genome.deduplicated_splitting_report.txt",
    output:
        temp("bams/b_genome.deduplicated_splitting_report.txt"),
    log:
        "qc/bismark2summary_prepare_symlinks.symlinks.log"
    run:
        wd = os.getcwd()
        shell("echo 'Making symlinks' > {log}")
        for source, target in zip(input, output):
            target_dir = os.path.dirname(target)
            target_name = os.path.basename(target)
            log_path = os.path.join(wd, log[0])
            abs_src_path = os.path.abspath(source)
            shell("cd {target_dir} && ln -f -s {abs_src_path} {target_name} >> {log_path} 2>&1")
        shell("echo 'Done' >> {log}")
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bismark=0.24.0
samtools=1.17
Input/Output¶
Input:
bam
: One or several (space separated) BAM file paths (aligned BAM files with Bismark reports in the same folder). It is also recommended to add dependencies for all required reports, either through rule order or by specifying them in the input section under any other keys. For example, the deduplication report could be missing if the rule only depends on the aligned BAM file. If you instead add a dependency on the deduplicated BAM file, bismark2summary will fail, because it expects the input files to be the initial aligned files with the alignment report in the same directory.
Output:
html
: Output HTML report path (e.g. ‘bismark_summary_report.html’).
txt
: Output txt table path (e.g. ‘bismark_summary_report.txt’). Should have the same name as the ‘html’ report but with suffix ‘.txt’.
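The naming constraint between `html` and `txt` is enforced by the wrapper code below. A minimal sketch of the same rule, useful for computing a matching `txt` path up front (the helper name `expected_txt_path` and the sample path are hypothetical):

```python
import os


def expected_txt_path(html_path: str) -> str:
    """Return the TXT table path that matches a given HTML report path."""
    basename, ext = os.path.splitext(html_path)
    if ext.lower() != ".html":
        raise ValueError(f"HTML report should end with '.html': {html_path}")
    # Same basename, with '.html' swapped for '.txt'.
    return basename + ".txt"


print(expected_txt_path("qc/exp1.bismark2summary.html"))
```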
Params¶
extra
: Any additional args
title
: Optional report custom title.
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper to generate summary graphical HTML report from several Bismark text report files."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark2summary
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
cmds = ["bismark2summary {extra}"]
# basename
bam = snakemake.input.get("bam", None)
if not bam:
    raise ValueError(
        "bismark/bismark2summary: Please specify aligned BAM file path"
        " (one or several) using 'bam=..'"
    )
html = snakemake.output.get("html", None)
txt = snakemake.output.get("txt", None)
if not html or not txt:
    raise ValueError(
        "bismark/bismark2summary: Please specify both 'html=..' and"
        " 'txt=..' paths in output section"
    )
basename, ext = os.path.splitext(html)
if ext.lower() != ".html":
    raise ValueError(
        "bismark/bismark2summary: HTML report file should end"
        " with suffix '.html' but was {} ({})".format(ext, html)
    )
suggested_txt = basename + ".txt"
if suggested_txt != txt:
    raise ValueError(
        "bismark/bismark2summary: Expected '{}' TXT report, "
        "but was: '{}'".format(suggested_txt, txt)
    )
cmds.append("--basename {basename:q}")
# title
title = snakemake.params.get("title", None)
if title:
    cmds.append("--title {title:q}")
cmds.append("{bam}")
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmds.append("{log}")
# run shell command:
shell(" ".join(cmds))
BISMARK_GENOME_PREPARATION¶
Generate indexes for Bismark (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation).
Example¶
This wrapper can be used in the following way:
# For *.fa file
rule bismark_genome_preparation_fa:
    input:
        "indexes/{genome}/{genome}.fa"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        extra="" # optional params string (the wrapper reads the 'extra' key)
    wrapper:
        "v2.2.1/bio/bismark/bismark_genome_preparation"
# For *.fa.gz file:
rule bismark_genome_preparation_fa_gz:
    input:
        "indexes/{genome}/{genome}.fa.gz"
    output:
        directory("indexes/{genome}/Bisulfite_Genome")
    log:
        "logs/indexes/{genome}/Bisulfite_Genome.log"
    params:
        extra="" # optional params string
    wrapper:
        "v2.2.1/bio/bismark/bismark_genome_preparation"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bismark=0.24.1
samtools=1.17
Input/Output¶
Input:
- path to genome *.fa (or *.fasta, *.fa.gz, *.fasta.gz) file
Output:
- No output is passed to the tool; it generates the Bismark indexes in the parent directory of the input file
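Because the indexes are written next to the genome file, the output declared in a rule has to live in the input's parent directory. A minimal sketch of that relationship, assuming the `Bisulfite_Genome` folder name used in the example rules above (the helper `bisulfite_genome_dir` and the sample path are hypothetical):

```python
import os


def bisulfite_genome_dir(genome_fasta: str) -> str:
    """Directory expected to hold the Bismark indexes for a genome file."""
    # The wrapper passes the parent directory of the input to
    # bismark_genome_preparation; the folder name is taken from the example rule.
    return os.path.join(os.path.dirname(genome_fasta), "Bisulfite_Genome")


print(bisulfite_genome_dir("indexes/hg38/hg38.fa.gz"))
```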
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark indexes preparing using bismark_genome_preparation."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_genome_preparation
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
input_dir = path.dirname(snakemake.input[0])
params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("bismark_genome_preparation --verbose --bowtie2 {params_extra} {input_dir} {log}")
BISMARK_METHYLATION_EXTRACTOR¶
Call methylation counts from Bismark alignment results (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor).
Example¶
This wrapper can be used in the following way:
rule bismark_methylation_extractor:
    input: "bams/{sample}.bam"
    output:
        mbias_r1="qc/meth/{sample}.M-bias_R1.png",
        # Only for PE BAMS:
        # mbias_r2="qc/meth/{sample}.M-bias_R2.png",
        mbias_report="meth/{sample}.M-bias.txt",
        splitting_report="meth/{sample}_splitting_report.txt",
        # 1-based start, 1-based end ('inclusive') methylation info: % and counts
        methylome_CpG_cov="meth_cpg/{sample}.bismark.cov.gz",
        # BedGraph with methylation percentage: 0-based start, end exclusive
        methylome_CpG_mlevel_bedGraph="meth_cpg/{sample}.bedGraph.gz",
        # Primary output files: methylation status at each read cytosine position: (extremely large)
        read_base_meth_state_cpg="meth/CpG_context_{sample}.txt.gz",
        # * You could merge CHG, CHH using: --merge_non_CpG
        read_base_meth_state_chg="meth/CHG_context_{sample}.txt.gz",
        read_base_meth_state_chh="meth/CHH_context_{sample}.txt.gz"
    log:
        "logs/meth/{sample}.log"
    params:
        output_dir="meth", # optional output dir
        extra="--gzip --comprehensive --bedGraph" # optional params string
    wrapper:
        "v2.2.1/bio/bismark/bismark_methylation_extractor"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bismark=0.24.1
samtools=1.17
perl-gdgraph=1.54
Input/Output¶
Input:
- Input BAM file aligned by Bismark
Output:
- Depends on bismark options passed to params.extra, optional for this wrapper
mbias_report
: M-bias report, *.M-bias.txt (if key is provided, the out file will be renamed to this name)
mbias_r1
: M-Bias plot for R1, *.M-bias_R1.png (if key is provided, the out file will be renamed to this name)
mbias_r2
: M-Bias plot for R2, *.M-bias_R2.png (if key is provided, the out file will be renamed to this name)
splitting_report
: Splitting report, *_splitting_report.txt (if key is provided, the out file will be renamed to this name)
methylome_CpG_cov
: Bismark coverage file for CpG context, *.bismark.cov.gz (if key is provided, the out file will be renamed to this name)
methylome_CpG_mlevel_bedGraph
: Bismark methylation level track, *.bedGraph.gz
read_base_meth_state_cpg
: Per read CpG base methylation info, CpG_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
read_base_meth_state_chg
: Per read CHG base methylation info, CHG_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
read_base_meth_state_chh
: Per read CHH base methylation info, CHH_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
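The renaming described above relies on the default names that the wrapper expects bismark_methylation_extractor to produce: an optional prefix, the BAM name without extension, and a suffix, all inside `output_dir`. A sketch using a subset of the prefix/suffix table from the wrapper code (the function `default_output_path` and the sample paths are hypothetical):

```python
import os

# Subset of the (prefix, suffix) table used by the wrapper code.
KEY_TO_PREFIX_SUFFIX = {
    "mbias_report": ("", ".M-bias.txt"),
    "splitting_report": ("", "_splitting_report.txt"),
    "methylome_CpG_cov": ("", ".bismark.cov.gz"),
    "read_base_meth_state_cpg": ("CpG_context_", ".txt.gz"),
}


def default_output_path(key: str, bam_file: str, output_dir: str = "") -> str:
    """Path where the wrapper looks for an output before renaming it."""
    prefix, suffix = KEY_TO_PREFIX_SUFFIX[key]
    bam_wo_ext = os.path.splitext(os.path.basename(bam_file))[0]
    return os.path.join(output_dir, prefix + bam_wo_ext + suffix)


print(default_output_path("read_base_meth_state_cpg", "bams/s1.bam", "meth"))
print(default_output_path("mbias_report", "bams/s1.bam", "meth"))
```

If the path declared in the rule's output differs from this default, the wrapper moves the file with `mv`.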
Params¶
output_dir
: Output directory (current dir is used if not specified)
ignore
: Number of bases to trim at 5’ end in R1 (see bismark_methylation_extractor documentation), optional argument
ignore_3prime
: Number of bases to trim at 3’ end in R1 (see bismark_methylation_extractor documentation), optional argument
ignore_r2
: Number of bases to trim at 5’ end in R2 (see bismark_methylation_extractor documentation), optional argument
ignore_3prime_r2
: Number of bases to trim at 3’ end in R2 (see bismark_methylation_extractor documentation), optional argument
extra
: Any additional args
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark methylation extractor tool: bismark_methylation_extractor."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark_methylation_extractor {params_extra}"]
# output dir
output_dir = snakemake.params.get("output_dir", "")
if output_dir:
    cmdline_args.append("-o {output_dir:q}")
# trimming options
trimming_options = [
    "ignore", # meth_bias_r1_5end
    "ignore_3prime", # meth_bias_r1_3end
    "ignore_r2", # meth_bias_r2_5end
    "ignore_3prime_r2", # meth_bias_r2_3end
]
for key in trimming_options:
    value = snakemake.params.get(key, None)
    if value:
        cmdline_args.append("--{} {}".format(key, value))
# Input
cmdline_args.append("{snakemake.input}")
# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")
# run
shell(" ".join(cmdline_args))
key2prefix_suffix = [
    ("mbias_report", ("", ".M-bias.txt")),
    ("mbias_r1", ("", ".M-bias_R1.png")),
    ("mbias_r2", ("", ".M-bias_R2.png")),
    ("splitting_report", ("", "_splitting_report.txt")),
    ("methylome_CpG_cov", ("", ".bismark.cov.gz")),
    ("methylome_CpG_mlevel_bedGraph", ("", ".bedGraph.gz")),
    ("read_base_meth_state_cpg", ("CpG_context_", ".txt.gz")),
    ("read_base_meth_state_chg", ("CHG_context_", ".txt.gz")),
    ("read_base_meth_state_chh", ("CHH_context_", ".txt.gz")),
]
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for key, (prefix, suffix) in key2prefix_suffix:
    exp_path = snakemake.output.get(key, None)
    if exp_path:
        if len(snakemake.input) != 1:
            raise ValueError(
                "bismark/bismark_methylation_extractor: Error: only one BAM file is"
                " expected in input, but was <{}>".format(snakemake.input)
            )
        bam_file = snakemake.input[0]
        bam_name = os.path.basename(bam_file)
        bam_wo_ext = os.path.splitext(bam_name)[0]
        actual_path = os.path.join(output_dir, prefix + bam_wo_ext + suffix)
        if exp_path != actual_path:
            shell("mv {actual_path:q} {exp_path:q} {log_append}")
DEDUPLICATE_BISMARK¶
Deduplicate Bismark Bam Files
URL: https://github.com/FelixKrueger/Bismark/
Example¶
This wrapper can be used in the following way:
rule deduplicate_bismark:
    input: "bams/{sample}.bam"
    output:
        bam="bams/{sample}.deduplicated.bam",
        report="bams/{sample}.deduplication_report.txt",
    log:
        "logs/bams/{sample}.deduplicated.log",
    params:
        extra="" # optional params string
    wrapper:
        "v2.2.1/bio/bismark/deduplicate_bismark"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bismark=0.24.1
bowtie2=2.5.1
samtools=1.17
Input/Output¶
Input:
- Path to one or multiple *.bam files aligned by Bismark; if multiple files are passed, the ‘--multiple’ argument will be added automatically.
Output:
bam
: Result bam file path. The file will be renamed if it differs from NAME.deduplicated.bam for given ‘NAME.bam’ input.
report
: Result report path. The file will be renamed if it differs from NAME.deduplication_report.txt for given ‘NAME.bam’ input.
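The default names mentioned above change slightly when several BAM files are passed. The sketch below mirrors the naming logic in the wrapper code (the function `default_dedup_outputs` and the sample paths are hypothetical). It assumes the input path itself does not contain the word "deduplicated", since the wrapper inserts the infix with a first-occurrence string replacement:

```python
import os


def default_dedup_outputs(bam_inputs, output_dir=""):
    """Names the wrapper expects from deduplicate_bismark before renaming."""
    first = os.path.splitext(os.path.basename(bam_inputs[0]))[0]
    prefix = os.path.join(output_dir, first)
    # With several inputs the wrapper adds '--multiple', and the expected
    # file names gain a '.multiple' infix.
    infix = ".multiple" if len(bam_inputs) > 1 else ""
    return (
        prefix + infix + ".deduplicated.bam",
        prefix + infix + ".deduplication_report.txt",
    )


print(default_dedup_outputs(["bams/a.bam"], "bams"))
print(default_dedup_outputs(["bams/a.bam", "bams/b.bam"], "bams"))
```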
Params¶
extra
: Additional deduplicate_bismark args
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for Bismark aligned reads deduplication using deduplicate_bismark."""
# https://github.com/FelixKrueger/Bismark/blob/master/deduplicate_bismark
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
bam_path = snakemake.output.get("bam", None)
report_path = snakemake.output.get("report", None)
if not bam_path or not report_path:
    raise ValueError(
        "bismark/deduplicate_bismark: Please specify both 'bam=..' and 'report=..' paths in output section"
    )
output_dir = os.path.dirname(bam_path)
if output_dir != os.path.dirname(report_path):
    raise ValueError(
        "bismark/deduplicate_bismark: BAM and Report files expected to have the same parent directory"
        " but was {} and {}".format(bam_path, report_path)
    )
arg_output_dir = "--output_dir '{}'".format(output_dir) if output_dir else ""
arg_multiple = "--multiple" if len(snakemake.input) > 1 else ""
params_extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
    "deduplicate_bismark {params_extra} --bam {arg_multiple}"
    " {arg_output_dir} {snakemake.input} {log}"
)
# Move outputs into proper position.
fst_input_filename = os.path.basename(snakemake.input[0])
fst_input_basename = os.path.splitext(fst_input_filename)[0]
prefix = os.path.join(output_dir, fst_input_basename)
deduplicated_bam_actual_name = prefix + ".deduplicated.bam"
if arg_multiple:
    # bismark does it exactly like this:
    deduplicated_bam_actual_name = deduplicated_bam_actual_name.replace(
        "deduplicated", "multiple.deduplicated", 1
    )
expected_2_actual_paths = [
    (bam_path, deduplicated_bam_actual_name),
    (
        report_path,
        prefix + (".multiple" if arg_multiple else "") + ".deduplication_report.txt",
    ),
]
for exp_path, actual_path in expected_2_actual_paths:
    if exp_path and (exp_path != actual_path):
        shell("mv {actual_path:q} {exp_path:q} {log_append}")
BLAST¶
For blast, the following wrappers are available:
BLAST BLASTN¶
Blastn
performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information please see BLAST documentation.
Different formatting output options and formatting specifiers (see tables below) can be selected via the ‘format’ parameter as shown in example Snakemake rule below.
Alignment view options                   Formatting output option   Format specifiers
Pairwise                                 0
Query-anchored showing identities        1
Query-anchored no identities             2
Flat query-anchored showing identities   3
Flat query-anchored no identities        4
BLAST XML                                5
Tabular                                  6                          available
Tabular with comment lines               7                          available
Seqalign (Text ASN.1)                    8
Seqalign (Binary ASN.1)                  9
Comma-separated values                   10                         available
BLAST archive (ASN.1)                    11
Seqalign (JSON)                          12
Multiple-file BLAST JSON                 13
Multiple-file BLAST XML2                 14
Single-file BLAST JSON                   15
Single-file BLAST XML2                   16
Sequence Alignment/Map (SAM)             17
Organism Report                          18
Specifiers for formatting options 6, 7 and 10:
Format specifier   Description
qseqid             Query Seq-id
qgi                Query GI
qacc               Query accession
qaccver            Query accession.version
qlen               Query sequence length
sseqid             Subject Seq-id
sallseqid          All subject Seq-id(s), separated by a ‘;’
sgi                Subject GI
sallgi             All subject GIs
sacc               Subject accession
saccver            Subject accession.version
sallacc            All subject accessions
slen               Subject sequence length
qstart             Start of alignment in query
qend               End of alignment in query
sstart             Start of alignment in subject
send               End of alignment in subject
qseq               Aligned part of query sequence
sseq               Aligned part of subject sequence
evalue             Expect value
bitscore           Bit score
score              Raw score
length             Alignment length
pident             Percentage of identical matches
nident             Number of identical matches
mismatch           Number of mismatches
positive           Number of positive-scoring matches
gapopen            Number of gap openings
gaps               Total number of gaps
ppos               Percentage of positive-scoring matches
frames             Query and subject frames separated by a ‘/’
qframe             Query frame
sframe             Subject frame
btop               Blast traceback operations (BTOP)
staxid             Subject Taxonomy ID
ssciname           Subject Scientific Name
scomname           Subject Common Name
sblastname         Subject Blast Name
sskingdom          Subject Super Kingdom
staxids            unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames          unique Subject Scientific Name(s), separated by a ‘;’
scomnames          unique Subject Common Name(s), separated by a ‘;’
sblastnames        unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms         unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle             Subject Title
salltitles         All Subject Title(s), separated by a ‘<>’
sstrand            Subject Strand
qcovs              Query Coverage Per Subject
qcovhsp            Query Coverage Per HSP
qcovus             Query Coverage Per Unique Subject (blastn only)
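With a tabular format such as `6 qseqid sseqid evalue` (the value used in the example rule below), each output line holds the requested specifiers as tab-separated fields in the requested order. A minimal sketch for reading such a result back into dictionaries; the function name `parse_blast_tabular` and the sample line are made up for illustration:

```python
import csv
import io


def parse_blast_tabular(handle, outfmt: str):
    """Yield one dict per hit from BLAST tabular output (format 6).

    `outfmt` is the same string passed as the 'format' parameter,
    e.g. "6 qseqid sseqid evalue".
    """
    option, *columns = outfmt.split()
    if option != "6" or not columns:
        raise ValueError("this sketch only handles format 6 with explicit specifiers")
    for row in csv.reader(handle, delimiter="\t"):
        if row:  # skip empty lines
            yield dict(zip(columns, row))


# Made-up sample line, only to show the shape of the data.
sample = io.StringIO("query1\tsubject7\t1e-30\n")
hits = list(parse_blast_tabular(sample, "6 qseqid sseqid evalue"))
print(hits)
```

All values are returned as strings; converting `evalue` or `bitscore` to numbers is left to the caller.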
URL: https://blast.ncbi.nlm.nih.gov/
Example¶
This wrapper can be used in the following way:
rule blast_nucleotide:
    input:
        query = "{sample}.fasta",
        blastdb=multiext("blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    output:
        "{sample}.blast.txt"
    log:
        "logs/{sample}.blast.log"
    threads:
        2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra=""
    wrapper:
        "v2.2.1/bio/blast/blastn"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
blast=2.14.0
Input/Output¶
Input:
query
: FASTA file OR bare sequence file (more information) OR identifiers (more information)
blastdb
: Path to blast database
Output:
- Path to the result file. Depending on the formatting option, different output files can be generated (see tables above)
Params¶
extra
: Optional parameters besides -query, -db, -num_threads and -out.
Authors¶
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
format = snakemake.params.get("format", "")
blastdb = snakemake.input.get("blastdb", "")[0]
db_name = path.splitext(blastdb)[0]
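# Default to no -outfmt argument so the command can still be built
# when the 'format' parameter is not set.
out_format = ""
if format:
    out_format = " -outfmt '{}'".format(format)
shell(
    "blastn"
    " -query {snakemake.input.query}"
    " {out_format}"
    " {snakemake.params.extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
    # Redirect stderr to the rule's log file (log was defined above but unused).
    " {log}"
)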
BLAST MAKEBLASTDB FOR FASTA FILES¶
Makeblastdb produces local BLAST databases from nucleotide or protein FASTA files. For more information please see BLAST documentation.
URL: https://blast.ncbi.nlm.nih.gov/
Example¶
This wrapper can be used in the following way:
rule blast_makedatabase_nucleotide:
    input:
        fasta="genome/{genome}.fasta"
    output:
        multiext("results/{genome}.fasta",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    log:
        "logs/{genome}.log"
    params:
        "-input_type fasta -blastdb_version 5 -parse_seqids"
    wrapper:
        "v2.2.1/bio/blast/makeblastdb"
rule blast_makedatabase_protein:
    input:
        fasta="protein/{protein}.fasta"
    output:
        multiext("results/{protein}.fasta",
            ".pdb",
            ".phr",
            ".pin",
            ".pot",
            ".psq",
            ".ptf",
            ".pto"
        )
    log:
        "logs/{protein}.log"
    params:
        "-input_type fasta -blastdb_version 5"
    wrapper:
        "v2.2.1/bio/blast/makeblastdb"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
blast=2.14.0
Input/Output¶
Input:
fasta
: Path to FASTA file
Output:
- Paths to the database files, which share one name and differ in their extensions (e.g. .nin, .nsq, .nhr for nucleotides or .pin, .psq, .phr for proteins)
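The extension of the first output file matters: the wrapper code below derives both the database type and the database name from it. A sketch of that derivation (the function name `infer_db_type_and_name` and the sample paths are hypothetical):

```python
from os import path


def infer_db_type_and_name(first_output: str):
    """Return (db_type, out_name) the way the wrapper derives them."""
    out_name, ext = path.splitext(first_output)
    if ext.startswith(".n"):
        db_type = "nucl"
    elif ext.startswith(".p"):
        db_type = "prot"
    else:
        db_type = ""  # the wrapper would pass an empty -dbtype here
    return db_type, out_name


print(infer_db_type_and_name("results/genome.fasta.ndb"))
print(infer_db_type_and_name("results/p53.fasta.pdb"))
```

Listing a file with an unrelated extension first would therefore leave `-dbtype` empty.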
Params¶
- Optional parameters besides `-in`, `-dbtype`, and `-out`
Authors¶
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
log = snakemake.log
out = snakemake.output[0]
db_type = ""
(out_name, ext) = path.splitext(out)
if ext.startswith(".n"):
    db_type = "nucl"
elif ext.startswith(".p"):
    db_type = "prot"
shell(
    "makeblastdb"
    " -in {snakemake.input.fasta}"
    " -dbtype {db_type}"
    " {snakemake.params}"
    " -logfile {log}"
    " -out {out_name}"
)
BOWTIE2¶
For bowtie2, the following wrappers are available:
BOWTIE2¶
Map reads with bowtie2.
URL: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
Example¶
This wrapper can be used in the following way:
rule test_bowtie2:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        idx=multiext(
            "index/genome",
            ".1.bt2",
            ".2.bt2",
            ".3.bt2",
            ".4.bt2",
            ".rev.1.bt2",
            ".rev.2.bt2",
        ),
        # ref="genome.fasta", #Required for CRAM output
    output:
        "mapped/{sample}.bam",
        # idx="",
        # metrics="",
        # unaligned="",
        # unpaired="",
        # unconcordant="",
        # concordant="",
    log:
        "logs/bowtie2/{sample}.log",
    params:
        extra="", # optional parameters
    threads: 8 # Use at least two threads
    wrapper:
        "v2.2.1/bio/bowtie2/align"
use rule test_bowtie2 as test_bowtie2_se_gz with:
    input:
        sample=["reads/{sample}.1.fastq.gz"],
        idx=multiext(
            "index/genome",
            ".1.bt2",
            ".2.bt2",
            ".3.bt2",
            ".4.bt2",
            ".rev.1.bt2",
            ".rev.2.bt2",
        ),
    output:
        "mapped_se_gz/{sample}.bam",
rule test_bowtie2_index:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        idx=multiext(
            "index/genome",
            ".1.bt2",
            ".2.bt2",
            ".3.bt2",
            ".4.bt2",
            ".rev.1.bt2",
            ".rev.2.bt2",
        ),
    output:
        "mapped_idx/{sample}.bam",
        idx="mapped_idx/{sample}.bam.bai",
        metrics="mapped_idx/{sample}.metrics.txt",
        unaligned="mapped_idx/{sample}.unaligned.sam",
        unpaired="mapped_idx/{sample}.unpaired.sam",
        # unconcordant="",
        # concordant="",
    log:
        "logs/bowtie2/{sample}.log",
    params:
        extra="", # optional parameters
    threads: 8 # Use at least two threads
    wrapper:
        "v2.2.1/bio/bowtie2/align"
rule test_bowtie2_cram:
    input:
        sample=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        idx=multiext(
            "index/genome",
            ".1.bt2",
            ".2.bt2",
            ".3.bt2",
            ".4.bt2",
            ".rev.1.bt2",
            ".rev.2.bt2",
        ),
        ref="genome.fasta",
    output:
        "mapped_idx/{sample}.cram",
        # idx="",
        # metrics="",
        # unaligned="",
        # unpaired="",
        # unconcordant="",
        # concordant="",
    log:
        "logs/bowtie2/{sample}.log",
    params:
        extra="", # optional parameters
    threads: 8 # Use at least two threads
    wrapper:
        "v2.2.1/bio/bowtie2/align"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- This wrapper uses an inner pipe. Make sure to use at least two threads in your Snakefile.
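The thread requirement follows from how the wrapper code below splits the budget: bowtie2 receives one thread fewer than the rule declares, presumably leaving the remaining one for the samtools process on the other side of the pipe. A sketch of that arithmetic (the function name `bowtie2_thread_count` is hypothetical):

```python
def bowtie2_thread_count(total_threads: int) -> int:
    """Threads passed to bowtie2 for a rule declaring `total_threads`."""
    bowtie2_threads = total_threads - 1
    if bowtie2_threads < 1:
        # Same failure mode as the wrapper when a rule declares a single thread.
        raise ValueError(
            f"This wrapper expected at least two threads, got {total_threads}"
        )
    return bowtie2_threads


print(bowtie2_thread_count(8))
```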
Software dependencies¶
bowtie2=2.5.1
samtools=1.17
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
sample
: FASTQ file(s)
idx
: Bowtie2 indexed reference index
ref
: Optional path to genome sequence (FASTA)
ref_fai
: Optional path to reference genome sequence index (FAI)
Output:
- SAM/BAM/CRAM file. This must be the first output file in the output file list.
idx
: Optional path to bam index.
metrics
: Optional path to metrics file.
unaligned
: Optional path to unaligned unpaired reads.
unpaired
: Optional path to unpaired reads that aligned at least once.
unconcordant
: Optional path to pairs that didn’t align concordantly.
concordant
: Optional path to pairs that aligned concordantly at least once.
Params¶
extra
: Additional program arguments (except for -x, -U, -1, -2, --interleaved, -b, --met-file, --un, --al, --un-conc, --al-conc, -f, --tab6, --tab5, -q, or -p/--threads)
interleaved
: Input sample contains interleaved paired-end FASTQ/FASTA reads. `False` (default) or `True`.
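The format flags in the exclusion list above (-q, -f, the tab5 and tab6 options) are set by the wrapper itself, based on the file extensions of `sample`. The sketch below reproduces `get_format` from the wrapper code and wraps the extension checks in a hypothetical helper, `format_flag`, to show which flag a given input would receive:

```python
def get_format(path: str) -> str:
    # Same logic as in the wrapper: ignore a trailing .gz/.bz2 extension.
    if path.endswith((".gz", ".bz2")):
        return path.split(".")[-2].lower()
    return path.split(".")[-1].lower()


def format_flag(samples) -> str:
    """Input-format flag the wrapper appends to `extra`, or '' if none applies."""
    formats = [get_format(s) for s in samples]
    if all(f in ("fastq", "fq") for f in formats):
        return "-q"
    if all(f == "tab5" for f in formats):
        return "--tab5"
    if all(f == "tab6" for f in formats):
        return "--tab6"
    if all(f in ("fa", "mfa", "fasta") for f in formats):
        return "-f"
    return ""


print(format_flag(["reads/a.1.fastq.gz", "reads/a.2.fastq.gz"]))
print(format_flag(["reads/a.fa", "reads/b.fasta.bz2"]))
```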
Authors¶
- Johannes Köster
- Filipe G. Vieira
- Thibault Dayris
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
def get_format(path: str) -> str:
    """
    Return file format since Bowtie2 reads files that
    could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
    """
    if path.endswith((".gz", ".bz2")):
        return path.split(".")[-2].lower()
    return path.split(".")[-1].lower()
bowtie2_threads = snakemake.threads - 1
if bowtie2_threads < 1:
    raise ValueError(
        f"This wrapper expected at least two threads, got {snakemake.threads}"
    )
# Setting parse_threads to false since samtools performs only
# bam compression. Thus the wrapper would use *twice* the amount
# of threads reserved by user otherwise.
samtools_opts = get_samtools_opts(snakemake, parse_threads=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
n = len(snakemake.input.sample)
assert (
n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
reads = ""
if n == 1:
if get_format(snakemake.input.sample[0]) in ("bam", "sam"):
reads = f"-b {snakemake.input.sample}"
else:
if snakemake.params.get("interleaved", False):
reads = f"--interleaved {snakemake.input.sample}"
else:
reads = f"-U {snakemake.input.sample}"
else:
reads = "-1 {} -2 {}".format(*snakemake.input.sample)
if all(get_format(sample) in ("fastq", "fq") for sample in snakemake.input.sample):
extra += " -q "
elif all(get_format(sample) == "tab5" for sample in snakemake.input.sample):
extra += " --tab5 "
elif all(get_format(sample) == "tab6" for sample in snakemake.input.sample):
extra += " --tab6 "
elif all(
get_format(sample) in ("fa", "mfa", "fasta") for sample in snakemake.input.sample
):
extra += " -f "
metrics = snakemake.output.get("metrics")
if metrics:
extra += f" --met-file {metrics} "
unaligned = snakemake.output.get("unaligned")
if unaligned:
extra += f" --un {unaligned} "
unpaired = snakemake.output.get("unpaired")
if unpaired:
extra += f" --al {unpaired} "
unconcordant = snakemake.output.get("unconcordant")
if unconcordant:
extra += f" --un-conc {unconcordant} "
concordant = snakemake.output.get("concordant")
if concordant:
extra += f" --al-conc {concordant} "
index = os.path.commonprefix(snakemake.input.idx).rstrip(".")
shell(
"(bowtie2"
" --threads {bowtie2_threads}"
" {reads} "
" -x {index}"
" {extra}"
"| samtools view --with-header "
" {samtools_opts}"
" -"
") {log}"
)
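The read-selection logic of the wrapper code above can be summarised in a small standalone sketch. The helper names `guess_format` and `reads_argument` are hypothetical and not part of the wrapper, and the format guess assumes that a trailing `.gz` is the only compression suffix that needs to be skipped.

```python
def guess_format(path):
    """Guess the read format from the file name, skipping a trailing .gz (assumption)."""
    parts = path.lower().split(".")
    if parts[-1] == "gz" and len(parts) > 2:
        return parts[-2]
    return parts[-1]


def reads_argument(samples, interleaved=False):
    """Return the bowtie2 read arguments for one or two input files."""
    if len(samples) not in (1, 2):
        raise ValueError("expected 1 (single-end) or 2 (paired-end) files")
    if len(samples) == 2:
        # Paired-end: one file per mate.
        return "-1 {} -2 {}".format(*samples)
    sample = samples[0]
    if guess_format(sample) in ("bam", "sam"):
        # Unaligned BAM/SAM input.
        return f"-b {sample}"
    if interleaved:
        return f"--interleaved {sample}"
    return f"-U {sample}"


if __name__ == "__main__":
    print(reads_argument(["reads/a.1.fastq.gz", "reads/a.2.fastq.gz"]))
    print(reads_argument(["reads/a.bam"]))
```

Keeping this decision in a pure function makes it easy to check the mapping from input files to bowtie2 flags without running the aligner.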
BOWTIE2_BUILD¶
Build a bowtie2 index from a reference genome.
URL: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
Example¶
This wrapper can be used in the following way:
rule bowtie2_build:
input:
ref="genome.fasta",
output:
multiext(
"genome",
".1.bt2",
".2.bt2",
".3.bt2",
".4.bt2",
".rev.1.bt2",
".rev.2.bt2",
),
log:
"logs/bowtie2_build/build.log",
params:
extra="", # optional parameters
threads: 8
wrapper:
"v2.2.1/bio/bowtie2/build"
rule bowtie2_build_large:
input:
ref="genome.fasta",
output:
multiext(
"genome",
".1.bt2l",
".2.bt2l",
".3.bt2l",
".4.bt2l",
".rev.1.bt2l",
".rev.2.bt2l",
),
log:
"logs/bowtie2_build/build.log",
params:
extra="--large-index", # optional parameters
threads: 8
wrapper:
"v2.2.1/bio/bowtie2/build"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
bowtie2=2.5.1
Params¶
extra
: additional program arguments besides --threads and IO options.
Authors¶
- Daniel Standage
- Filipe G. Vieira
Code¶
__author__ = "Daniel Standage"
__copyright__ = "Copyright 2020, Daniel Standage"
__email__ = "daniel.standage@nbacc.dhs.gov"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
index = os.path.commonprefix(snakemake.output).rstrip(".")
shell(
"bowtie2-build"
" --threads {snakemake.threads}"
" {extra}"
" {snakemake.input.ref}"
" {index}"
" {log}"
)
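The wrapper derives the index prefix from the declared output files with `os.path.commonprefix`. A minimal sketch of that step, using the file names from the example rule (the helper name `index_prefix` is an assumption made for illustration):

```python
import os


def index_prefix(files):
    """Derive the shared index prefix from a list of bowtie2 index files."""
    # commonprefix works character by character, so the result ends with
    # the dot that precedes the first differing suffix; strip it.
    return os.path.commonprefix(list(files)).rstrip(".")


if __name__ == "__main__":
    suffixes = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")
    print(index_prefix([f"genome{ext}" for ext in suffixes]))
```

Because the comparison is purely textual, listing only a subset of the index files can yield a longer prefix than intended (for example, only the two `.rev.*` files share `genome.rev`), so it seems safest to declare all index files as outputs.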
BUSCO¶
Assess assembly and annotation completeness with BUSCO
Example¶
This wrapper can be used in the following way:
rule run_busco:
input:
"protein.fasta",
output:
short_json="txome_busco/short_summary.json",
short_txt="txome_busco/short_summary.txt",
full_table="txome_busco/full_table.tsv",
miss_list="txome_busco/busco_missing.tsv",
dataset_dir=directory("resources/busco_downloads"),
log:
"logs/proteins_busco.log",
params:
mode="proteins",
lineage="stramenopiles_odb10",
# optional parameters
extra="",
threads: 8
wrapper:
"v2.2.1/bio/busco"
rule run_busco_euk:
input:
"protein.fasta",
output:
out_dir=directory("txome_busco/euk"),
dataset_dir=directory("resources/busco_downloads"),
log:
"logs/proteins_busco_euk.log",
params:
mode="proteins",
# optional parameters
extra="--auto-lineage-euk",
threads: 8
wrapper:
"v2.2.1/bio/busco"
rule run_busco_prok:
input:
"protein.fasta",
output:
out_dir=directory("txome_busco/prok"),
dataset_dir=directory("resources/busco_downloads"),
log:
"logs/proteins_busco_prok.log",
params:
mode="proteins",
# optional parameters
extra="--auto-lineage-prok",
threads: 8
wrapper:
"v2.2.1/bio/busco"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The lineage parameter sets the lineage dataset to use (optional). In auto-lineage mode, the output or dataset folders need to be retrieved as a whole, since the output file names cannot be inferred (they depend on the best lineage match).
Software dependencies¶
busco=5.4.7
Input/Output¶
Input:
- Path to assembly fasta
Output:
out_dir
: Path to annotation quality files
dataset_dir
: Optional path to dataset directory
short_txt
: Optional path to plain text results summary. Requires parameter lineage.
short_json
: Optional path to JSON formatted results summary. Requires parameter lineage.
full_table
: Optional path to TSV formatted results. Requires parameter lineage.
miss_list
: Contains a list of missing BUSCOs. Requires parameter lineage.
Params¶
lineage
: Assembly lineage.
mode
: One of genome, transcriptome, or proteins.
extra
: Optional parameters besides --mode, --lineage, --cpu and IO files.
Authors¶
- Tessa Pierce
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for BUSCO assessment"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = snakemake.params.get("mode")
assert mode in [
"genome",
"transcriptome",
"proteins",
], "invalid run mode: only 'genome', 'transcriptome' or 'proteins' allowed"
lineage = lineage_opt = snakemake.params.get("lineage", "")
if lineage_opt:
lineage_opt = f"--lineage {lineage_opt}"
with tempfile.TemporaryDirectory() as tmpdir:
dataset_dir = snakemake.input.get("dataset_dir", "")
if not dataset_dir:
dataset_dir = f"{tmpdir}/dataset"
shell(
"busco"
" --cpu {snakemake.threads}"
" --in {snakemake.input}"
" --mode {mode}"
" {lineage_opt}"
" {extra}"
" --download_path {dataset_dir}"
" --out_path {tmpdir}"
" --out output"
" {log}"
)
if snakemake.output.get("short_txt"):
assert lineage, "parameter 'lineage' is required to output 'short_txt'"
shell(
"cat {tmpdir}/output/short_summary.specific.{lineage}.output.txt > {snakemake.output.short_txt:q}"
)
if snakemake.output.get("short_json"):
assert lineage, "parameter 'lineage' is required to output 'short_json'"
shell(
"cat {tmpdir}/output/short_summary.specific.{lineage}.output.json > {snakemake.output.short_json:q}"
)
if snakemake.output.get("full_table"):
assert lineage, "parameter 'lineage' is required to output 'full_table'"
shell(
"cat {tmpdir}/output/run_{lineage}/full_table.tsv > {snakemake.output.full_table:q}"
)
if snakemake.output.get("miss_list"):
assert lineage, "parameter 'lineage' is required to output 'miss_list'"
shell(
"cat {tmpdir}/output/run_{lineage}/missing_busco_list.tsv > {snakemake.output.miss_list:q}"
)
if snakemake.output.get("out_dir"):
shell("mv {tmpdir}/output {snakemake.output.out_dir:q}")
if snakemake.output.get("dataset_dir"):
shell("mv {dataset_dir} {snakemake.output.dataset_dir:q}")
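The per-file outputs only work with a fixed lineage because the file names BUSCO writes contain the lineage name. The sketch below reproduces the path layout used in the wrapper code above; the helper name `busco_result_paths` is hypothetical, and the layout is taken from this wrapper (BUSCO 5.4.7), so other BUSCO versions may differ.

```python
def busco_result_paths(tmpdir, lineage, run_name="output"):
    """Map wrapper output names to the files BUSCO writes for a fixed lineage."""
    out = f"{tmpdir}/{run_name}"
    return {
        "short_txt": f"{out}/short_summary.specific.{lineage}.{run_name}.txt",
        "short_json": f"{out}/short_summary.specific.{lineage}.{run_name}.json",
        "full_table": f"{out}/run_{lineage}/full_table.tsv",
        "miss_list": f"{out}/run_{lineage}/missing_busco_list.tsv",
    }


if __name__ == "__main__":
    for name, location in busco_result_paths("/tmp/busco", "stramenopiles_odb10").items():
        print(name, location)
```

In auto-lineage mode the lineage is only known after the run, which is why the whole `out_dir` has to be requested instead.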
BUSTOOLS¶
For bustools, the following wrappers are available:
BUSTOOLS COUNT¶
BUS files can be converted into a barcode-feature matrix
URL: https://github.com/BUStools/bustools#count
Example¶
This wrapper can be used in the following way:
rule test_bustools_count:
input:
bus="file.bus",
ecmap="matrix.ec",
txnames="transcripts.txt",
genemap="t2g.txt",
output:
multiext(
"buscount",
".barcodes.txt",
".CUPerCell.txt",
".cu.txt",
".genes.txt",
".hist.txt",
".mtx",
),
threads: 1
params:
extra="",
log:
"bustools.log",
wrapper:
"v2.2.1/bio/bustools/count"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
When multiple bus files are provided, only one count matrix is returned.
When an output ends with “.hist.txt”, the --hist parameter is automatically used.
When an output ends with “.genes.txt”, the --genecounts parameter is automatically used.
Software dependencies¶
bustools=0.43.0
Input/Output¶
Input:
bus
: Single bus-file, or list of bus-files
genemap
: Transcript to gene mapping
txnames
: List of transcripts
ecmap
: Equivalence classes for transcripts
Output:
- barcodes, equivalence classes, and count matrix
Params¶
extra
: Optional parameters, besides --output, --ecmap, and --genemap
Authors¶
- Thibault Dayris
Code¶
#!/usr/bin/env python3
# coding: utf-8
"""Snakemake wrapper for bustools count"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from os.path import commonprefix
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Get IO files and prefixes
bus_files = snakemake.input["bus"]
if isinstance(bus_files, list):
bus_files = " ".join(bus_files)
out_prefix = commonprefix(snakemake.output)[:-1]
# Fill extra parameters if needed
extra = snakemake.params.get("extra", "")
if any(outfile.endswith(".hist.txt") for outfile in snakemake.output):
if "--hist" not in extra:
extra += " --hist"
if any(outfile.endswith(".genes.txt") for outfile in snakemake.output):
if "--genecounts" not in extra:
extra += " --genecounts"
shell(
"bustools count {extra} "
"--output {out_prefix} "
"--genemap {snakemake.input.genemap} "
"--ecmap {snakemake.input.ecmap} "
"--txnames {snakemake.input.txnames} "
"{bus_files} "
"{log}"
)
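How the output prefix and the automatic flags follow from the declared output names can be sketched in isolation. The function name `count_arguments` is an assumption made for this illustration; the logic mirrors the wrapper code above.

```python
from os.path import commonprefix


def count_arguments(outputs, extra=""):
    """Return (output prefix, extra flags) inferred from the output file names."""
    # The shared prefix ends with a dot ("buscount."), which is dropped.
    out_prefix = commonprefix(list(outputs))[:-1]
    if any(o.endswith(".hist.txt") for o in outputs) and "--hist" not in extra:
        extra += " --hist"
    if any(o.endswith(".genes.txt") for o in outputs) and "--genecounts" not in extra:
        extra += " --genecounts"
    return out_prefix, extra.strip()


if __name__ == "__main__":
    suffixes = (".barcodes.txt", ".CUPerCell.txt", ".cu.txt", ".genes.txt", ".hist.txt", ".mtx")
    print(count_arguments([f"buscount{s}" for s in suffixes]))
```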
BUSTOOLS SORT¶
Sort raw BUS output from pseudoalignment programs
URL: https://github.com/BUStools/bustools#sort
Example¶
This wrapper can be used in the following way:
rule test_bustools_sort:
input:
"file.bus",
output:
"sorted.bus",
threads: 1
resources:
mem_mb=765,
params:
extra="--umi",
log:
"bustools.log",
wrapper:
"v2.2.1/bio/bustools/sort"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
--temp is automatically defined through resources.tmpdir
--memory is automatically defined through resources.mem_mb
Multiple bus files in input will result in a single bus file in output.
Software dependencies¶
bustools=0.42.0
snakemake-wrapper-utils=0.5.3
Params¶
extra
: Optional parameters
Authors¶
- Thibault Dayris
Code¶
#!/usr/bin/env python3
# coding: utf-8
"""Snakemake wrapper for bustools sort"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from tempfile import TemporaryDirectory
from snakemake_wrapper_utils.snakemake import get_mem
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
bus_files = snakemake.input
if isinstance(bus_files, list):
bus_files = " ".join(bus_files)
mem = get_mem(snakemake, "MiB")
with TemporaryDirectory() as tempdir:
shell(
# Pass the user-supplied extra parameters (e.g. --umi) on to bustools.
"bustools sort {extra} "
"--memory {mem} "
"--temp {tempdir} "
"--threads {snakemake.threads} "
"--output {snakemake.output[0]} "
"{bus_files} "
"{log}"
)
BUSTOOLS TEXT¶
Convert BUS files to TSV files
URL: https://github.com/BUStools/bustools#text
Example¶
This wrapper can be used in the following way:
rule test_bustools_text:
input:
"file.bus",
output:
"file.tsv",
threads: 1
params:
extra="",
log:
"logs/bustools.log",
wrapper:
"v2.2.1/bio/bustools/text"
rule test_bustools_text_list:
input:
["file.bus", "file2.bus"],
output:
"file2.tsv",
threads: 1
params:
extra="--flags --pad",
log:
"logs/bustools.log",
wrapper:
"v2.2.1/bio/bustools/text"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
When multiple bus files are provided, only one TSV file is produced.
Software dependencies¶
bustools=0.42.0
Params¶
extra
: Optional parameters, besides -o/--output
Authors¶
- Thibault Dayris
Code¶
#!/usr/bin/env python3
# coding: utf-8
"""Snakemake wrapper for bustools text"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Use all input files; snakemake.input[0] would keep only the first one.
bus_files = snakemake.input
if isinstance(bus_files, list):
bus_files = " ".join(bus_files)
shell("bustools text --output {snakemake.output[0]} {extra} {bus_files} {log}")
BWA¶
For bwa, the following wrappers are available:
BWA ALN¶
Map reads with bwa aln. For more information about BWA see BWA documentation.
Example¶
This wrapper can be used in the following way:
rule bwa_aln:
input:
fastq="reads/{sample}.{pair}.fastq",
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"sai/{sample}.{pair}.sai",
params:
extra="",
log:
"logs/bwa_aln/{sample}.{pair}.log",
threads: 8
wrapper:
"v2.2.1/bio/bwa/aln"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa=0.7.17
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa aln."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
index = snakemake.input.idx
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
shell(
"bwa aln"
" {extra}"
" -t {snakemake.threads}"
" {index}"
" {snakemake.input.fastq}"
" > {snakemake.output[0]} {log}"
)
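The bwa wrappers obtain the index prefix by stripping the extension from the first index file. A minimal sketch of that derivation (the helper name `bwa_index_prefix` is hypothetical):

```python
from os import path


def bwa_index_prefix(idx):
    """Return the bwa index prefix from a single index file or a list of them."""
    first = idx if isinstance(idx, str) else idx[0]
    # splitext removes only the last extension, e.g. ".amb".
    return path.splitext(first)[0]


if __name__ == "__main__":
    print(bwa_index_prefix(["genome.amb", "genome.ann", "genome.bwt"]))
    print(bwa_index_prefix("ref/hg38.fa.bwt"))
```

Since only the last extension is removed, the first file listed should carry a single-part suffix such as `.amb`; a multi-part suffix would leave part of it in the prefix.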
BWA INDEX¶
Creates a BWA index. For more information about BWA see BWA documentation.
Example¶
This wrapper can be used in the following way:
rule bwa_index:
input:
"{genome}.fasta",
output:
idx=multiext("{genome}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
log:
"logs/bwa_index/{genome}.log",
params:
algorithm="bwtsw",
wrapper:
"v2.2.1/bio/bwa/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa=0.7.17
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os.path import splitext
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if len(snakemake.input) == 0:
raise ValueError("A reference genome has to be provided!")
elif len(snakemake.input) > 1:
raise ValueError("Only one reference genome can be provided as input!")
# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", splitext(snakemake.output.idx[0])[0])
if len(prefix) > 0:
prefix = "-p " + prefix
# Construction algorithm used to build the database; if unset, bwa index uses its own default
construction_algorithm = snakemake.params.get("algorithm", "")
if len(construction_algorithm) != 0:
construction_algorithm = "-a " + construction_algorithm
shell(
"bwa index" " {prefix}" " {construction_algorithm}" " {snakemake.input[0]}" " {log}"
)
BWA MEM¶
Map reads using bwa mem, with optional sorting using samtools or picard.
URL: http://bio-bwa.sourceforge.net/bwa.shtml
Example¶
This wrapper can be used in the following way:
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.bam",
log:
"logs/bwa_mem/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sorting="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="queryname", # Can be 'queryname' or 'coordinate'.
sort_extra="", # Extra args for samtools/picard.
threads: 8
wrapper:
"v2.2.1/bio/bwa/mem"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional arguments for bwa-mem.
- The sorting param enables sorting, and can be either ‘none’, ‘samtools’ or ‘picard’.
- The sort_extra param allows for extra arguments for samtools/picard.
Software dependencies¶
bwa=0.7.17
samtools=1.16.1
picard-slim=2.27.4
snakemake-wrapper-utils=0.6.1
Authors¶
- Johannes Köster
- Julian de Ruiter
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster, Julian de Ruiter"
__copyright__ = "Copyright 2016, Johannes Köster and Julian de Ruiter"
__email__ = "koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
import tempfile
from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
from snakemake_wrapper_utils.samtools import get_samtools_opts
# Extract arguments.
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sorting", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
samtools_opts = get_samtools_opts(snakemake, param_name="sort_extra")
java_opts = get_java_opts(snakemake)
index = snakemake.input.idx
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view {samtools_opts}"
elif sort == "samtools":
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {samtools_opts} {sort_extra} -T {tmpdir}"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = "picard SortSam {java_opts} {sort_extra} --INPUT /dev/stdin --TMP_DIR {tmpdir} --SORT_ORDER {sort_order} --OUTPUT {snakemake.output[0]}"
else:
raise ValueError(f"Unexpected value for params.sorting ({sort})")
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(bwa mem"
" -t {snakemake.threads}"
" {extra}"
" {index}"
" {snakemake.input.reads}"
" | " + pipe_cmd + ") {log}"
)
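The three-way choice of the downstream command can be sketched as a pure function. This is a simplified illustration, not the wrapper's implementation: the real code obtains output and thread options from `get_samtools_opts` and `get_java_opts`, whereas the explicit `-b`/`-o` flags, the function name `build_pipe_cmd`, and the default paths below are assumptions.

```python
def build_pipe_cmd(sorting, sort_order="coordinate", sort_extra="",
                   output="out.bam", tmpdir="tmp"):
    """Build the command that receives the SAM stream from bwa mem."""
    if sort_order not in {"coordinate", "queryname"}:
        raise ValueError(f"Unexpected value for sort_order ({sort_order})")
    if sorting == "none":
        # Only convert SAM to BAM.
        parts = ["samtools", "view", "-b", "-o", output, "-"]
    elif sorting == "samtools":
        parts = ["samtools", "sort"]
        if sort_order == "queryname":
            parts.append("-n")
        parts += [sort_extra, "-T", tmpdir, "-o", output, "-"]
    elif sorting == "picard":
        parts = ["picard", "SortSam", sort_extra, "--INPUT", "/dev/stdin",
                 "--TMP_DIR", tmpdir, "--SORT_ORDER", sort_order,
                 "--OUTPUT", output]
    else:
        raise ValueError(f"Unexpected value for sorting ({sorting})")
    # Drop empty fragments so that an empty sort_extra leaves no double spaces.
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    for mode in ("none", "samtools", "picard"):
        print(build_pipe_cmd(mode, sort_order="queryname"))
```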
BWA MEM SAMBLASTER¶
Map reads using bwa mem, mark duplicates by samblaster and sort and index by sambamba.
Example¶
This wrapper can be used in the following way:
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
bam="mapped/{sample}.bam",
index="mapped/{sample}.bam.bai",
log:
"logs/bwa_mem_sambamba/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort_extra="", # Extra args for sambamba.
threads: 8
wrapper:
"v2.2.1/bio/bwa/mem-samblaster"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa=0.7.17
sambamba=1.0
samblaster=0.1.26
Authors¶
- Christopher Schröder
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")
index = snakemake.input.idx
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or " "2 (paired-end) elements")
shell(
"(bwa mem"
" -t {snakemake.threads}"
" {extra}"
" {index}"
" {snakemake.input.reads}"
" | samblaster"
" {samblaster_extra}"
" | sambamba view -S -f bam /dev/stdin"
" -t {snakemake.threads}"
" | sambamba sort /dev/stdin"
" -t {snakemake.threads}"
" -o {snakemake.output.bam}"
" {sort_extra}"
") {log}"
)
BWA SAMPE¶
Map paired-end reads with bwa sampe. For more information about BWA see BWA documentation.
Example¶
This wrapper can be used in the following way:
rule bwa_sampe:
input:
fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.bam",
params:
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none", # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
sort_order="queryname", # optional: Sort by 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
log:
"logs/bwa_sampe/{sample}.log",
wrapper:
"v2.2.1/bio/bwa/sampe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa=0.7.17
samtools=1.17
picard-slim=3.0.0
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa sampe."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
index = snakemake.input.get("idx", "")
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
# Check inputs.
if not len(snakemake.input.sai) == 2:
raise ValueError("input.sai must have 2 elements")
if not len(snakemake.input.fastq) == 2:
raise ValueError("input.fastq must have 2 elements")
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Use prefix for temp.
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Run command.
shell(
"(bwa sampe"
" {extra}"
" {index}"
" {snakemake.input.sai}"
" {snakemake.input.fastq}"
" | " + pipe_cmd + ") {log}"
)
BWA SAMSE¶
Map single-end reads with bwa samse. For more information about BWA see BWA documentation.
Example¶
This wrapper can be used in the following way:
rule bwa_samse:
input:
fastq="reads/{sample}.1.fastq",
sai="sai/{sample}.1.sai",
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.bam",
params:
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none", # optional: Enable sorting. Possible values: 'none', 'samtools' or 'picard'
sort_order="queryname", # optional: Sort by 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
log:
"logs/bwa_samse/{sample}.log",
wrapper:
"v2.2.1/bio/bwa/samse"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa=0.7.17
samtools=1.17
picard-slim=3.0.0
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for bwa samse."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
index = snakemake.input.get("idx", "")
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Use prefix for temp.
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Run command.
shell(
"(bwa samse"
" {extra}"
" {index}"
" {snakemake.input.sai}"
" {snakemake.input.fastq}"
" | " + pipe_cmd + ") {log}"
)
BWA SAM(SE/PE)¶
Map single-end or paired-end reads with bwa samse or bwa sampe, respectively.
Example¶
This wrapper can be used in the following way:
rule bwa_sam_pe:
input:
fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.pe.sam",
params:
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none",
log:
"logs/bwa_sam_pe/{sample}.log",
wrapper:
"v2.2.1/bio/bwa/samxe"
rule bwa_sam_se:
input:
fastq="reads/{sample}.1.fastq",
sai="sai/{sample}.1.sai",
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.se.sam",
params:
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none",
log:
"logs/bwa_sam_se/{sample}.log",
wrapper:
"v2.2.1/bio/bwa/samxe"
rule bwa_bam_pe:
input:
fastq=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
sai=["sai/{sample}.1.sai", "sai/{sample}.2.sai"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.pe.bam",
params:
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none",
log:
"logs/bwa_bam_pe/{sample}.log",
wrapper:
"v2.2.1/bio/bwa/samxe"
rule bwa_bam_se:
input:
fastq="reads/{sample}.1.fastq",
sai="sai/{sample}.1.sai",
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.se.bam",
params:
extra=r"-r '@RG\tID:{sample}\tSM:{sample}'", # optional: Extra parameters for bwa.
sort="none",
log:
"logs/bwa_bam_se/{sample}.log",
wrapper:
"v2.2.1/bio/bwa/samxe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- For more information, see http://bio-bwa.sourceforge.net/bwa.shtml
Software dependencies¶
bwa=0.7.17
samtools=1.17
picard-slim=3.0.0
Authors¶
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for both bwa samse and sampe."""
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
index = snakemake.input.get("idx", "")
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
# Check inputs.
fastq = (
snakemake.input.fastq
if isinstance(snakemake.input.fastq, list)
else [snakemake.input.fastq]
)
sai = (
snakemake.input.sai
if isinstance(snakemake.input.sai, list)
else [snakemake.input.sai]
)
if len(fastq) == 1 and len(sai) == 1:
alg = "samse"
elif len(fastq) == 2 and len(sai) == 2:
alg = "sampe"
else:
raise ValueError("input.fastq and input.sai must have 1 or 2 elements each")
# Extract output format
out_name, out_ext = path.splitext(snakemake.output[0])
out_ext = out_ext[1:].upper()
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to output format using samtools view.
pipe_cmd = (
"samtools view -h --output-fmt " + out_ext + " -o {snakemake.output[0]} -"
)
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Use prefix for temp.
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
# Define output format
sort_extra += " --output-fmt {}".format(out_ext)
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Run command.
shell(
"(bwa {alg}"
" {extra}"
" {index}"
" {snakemake.input.sai}"
" {snakemake.input.fastq}"
" | " + pipe_cmd + ") {log}"
)
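The wrapper decides between `samse` and `sampe` from the number of inputs and takes the output format from the file extension. A compact sketch of both decisions follows; the helper names `as_list`, `select_algorithm`, and `output_format` are assumptions for this example.

```python
from os import path


def as_list(value):
    """Wrap a single path in a list; leave lists and tuples as lists."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def select_algorithm(fastq, sai):
    """Pick the bwa subcommand from the number of FASTQ and SAI files."""
    fastq, sai = as_list(fastq), as_list(sai)
    if len(fastq) == 1 and len(sai) == 1:
        return "samse"
    if len(fastq) == 2 and len(sai) == 2:
        return "sampe"
    raise ValueError("input.fastq and input.sai must have 1 or 2 elements each")


def output_format(output):
    """Derive the samtools --output-fmt value from the output extension."""
    return path.splitext(output)[1][1:].upper()


if __name__ == "__main__":
    print(select_algorithm("reads/a.1.fastq", "sai/a.1.sai"), output_format("mapped/a.se.sam"))
    print(select_algorithm(["r1.fq", "r2.fq"], ["r1.sai", "r2.sai"]), output_format("mapped/a.pe.bam"))
```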
BWA-MEM2¶
For bwa-mem2, the following wrappers are available:
BWA-MEM2 INDEX¶
Creates a bwa-mem2 index.
URL: https://github.com/bwa-mem2/bwa-mem2
Example¶
This wrapper can be used in the following way:
rule bwa_mem2_index:
input:
"{genome}",
output:
"{genome}.0123",
"{genome}.amb",
"{genome}.ann",
"{genome}.bwt.2bit.64",
"{genome}.pac",
log:
"logs/bwa-mem2_index/{genome}.log",
wrapper:
"v2.2.1/bio/bwa-mem2/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa-mem2=2.2.1
Authors¶
- Christopher Schröder
- Patrik Smeds
Code¶
__author__ = "Christopher Schröder, Patrik Smeds"
__copyright__ = "Copyright 2020, Christopher Schröder, Patrik Smeds"
__email__ = "christopher.schroeder@tu-dortmund.de, patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Check inputs/arguments.
if len(snakemake.input) == 0:
raise ValueError("A reference genome has to be provided.")
elif len(snakemake.input) > 1:
raise ValueError("Please provide exactly one reference genome as input.")
valid_suffixes = {".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac"}
def get_valid_suffix(path):
for suffix in valid_suffixes:
if path.endswith(suffix):
return suffix
prefixes = set()
for s in snakemake.output:
suffix = get_valid_suffix(s)
if suffix is None:
raise ValueError(f"{s} cannot be generated by bwa-mem2 index (invalid suffix).")
prefixes.add(s[: -len(suffix)])
if len(prefixes) != 1:
raise ValueError("Output files must share common prefix up to their file endings.")
(prefix,) = prefixes
shell("bwa-mem2 index -p {prefix} {snakemake.input[0]} {log}")
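The prefix check above can be tried out on its own. The following sketch restates it as a function (the name `shared_prefix` is hypothetical) and applies it to the output names of the example rule with `genome.fasta` as the wildcard value.

```python
VALID_SUFFIXES = (".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac")


def shared_prefix(outputs):
    """Return the single prefix shared by all bwa-mem2 index outputs."""
    prefixes = set()
    for name in outputs:
        suffix = next((s for s in VALID_SUFFIXES if name.endswith(s)), None)
        if suffix is None:
            raise ValueError(f"{name} cannot be generated by bwa-mem2 index (invalid suffix).")
        prefixes.add(name[: -len(suffix)])
    if len(prefixes) != 1:
        raise ValueError("Output files must share common prefix up to their file endings.")
    return prefixes.pop()


if __name__ == "__main__":
    print(shared_prefix([f"genome.fasta{s}" for s in VALID_SUFFIXES]))
```

Matching against the known suffixes, rather than splitting at the last dot, is what lets the multi-part `.bwt.2bit.64` ending be handled correctly.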
BWA-MEM2¶
Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to bwa and is ~1.3-3.1x faster depending on the use case, dataset and the running machine. The wrapper optionally sorts the output using samtools or picard.
URL: https://github.com/bwa-mem2/bwa-mem2
Example¶
This wrapper can be used in the following way:
rule bwa_mem2_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
output:
"mapped/{sample}.bam",
log:
"logs/bwa_mem2/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
sort_extra="", # Extra args for samtools/picard.
threads: 8
wrapper:
"v2.2.1/bio/bwa-mem2/mem"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional arguments for bwa-mem2.
- The sort param enables sorting and can be either ‘none’, ‘samtools’ or ‘picard’.
- The sort_extra param allows for extra arguments for samtools/picard.
Software dependencies¶
bwa-mem2=2.2.1
samtools=1.17
picard-slim=3.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Christopher Schröder
- Johannes Köster
- Julian de Ruiter
Code¶
__author__ = "Christopher Schröder, Johannes Köster, Julian de Ruiter"
__copyright__ = (
"Copyright 2020, Christopher Schröder, Johannes Köster and Julian de Ruiter"
)
__email__ = "christopher.schroeder@tu-dortmund.de, koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
import tempfile
from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
from snakemake_wrapper_utils.samtools import get_samtools_opts
# Extract arguments.
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
samtools_opts = get_samtools_opts(snakemake, param_name="sort_extra")
java_opts = get_java_opts(snakemake)
# Derive the index prefix from the (first) index file given as input.idx.
idx = snakemake.input.get("idx", "")
if isinstance(idx, str):
    index = path.splitext(idx)[0]
else:
    index = path.splitext(idx[0])[0]
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
raise ValueError(f"Unexpected value for sort_order ({sort_order})")
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view {samtools_opts}"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {samtools_opts} {sort_extra} -T {tmpdir}"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = "picard SortSam {java_opts} {sort_extra} --INPUT /dev/stdin --TMP_DIR {tmpdir} --SORT_ORDER {sort_order} --OUTPUT {snakemake.output[0]}"
else:
raise ValueError(f"Unexpected value for params.sort ({sort})")
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(bwa-mem2 mem"
" -t {snakemake.threads}"
" {extra}"
" {index}"
" {snakemake.input.reads}"
" | " + pipe_cmd + ") {log}"
)
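The branching on `sort` and `sort_order` above can be summarised in a small function. This is a minimal sketch, assuming a hypothetical helper `choose_sorter` that only reports which tool and ordering flag would be used; it does not build the full command line of the wrapper:

```python
def choose_sorter(sort="none", sort_order="coordinate"):
    """Hypothetical helper: return (tool, ordering flag) for the given params."""
    if sort_order not in {"coordinate", "queryname"}:
        raise ValueError(f"Unexpected value for sort_order ({sort_order})")
    if sort == "none":
        # No sorting: the alignments are only converted with samtools view.
        return ("samtools view", "")
    if sort == "samtools":
        # samtools sort gets -n only when sorting by query name.
        return ("samtools sort", "-n" if sort_order == "queryname" else "")
    if sort == "picard":
        # picard SortSam receives the order explicitly.
        return ("picard SortSam", f"--SORT_ORDER {sort_order}")
    raise ValueError(f"Unexpected value for params.sort ({sort})")


print(choose_sorter("samtools", "queryname"))
print(choose_sorter("picard"))
```

Any other value for `sort` or `sort_order` raises a `ValueError`, as in the wrapper.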
BWA MEM SAMBLASTER¶
Map reads using bwa-mem2, mark duplicates by samblaster and sort and index by sambamba.
Example¶
This wrapper can be used in the following way:
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
output:
bam="mapped/{sample}.bam",
index="mapped/{sample}.bam.bai",
log:
"logs/bwa_mem2_sambamba/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort_extra="-q", # Extra args for sambamba.
threads: 8
wrapper:
"v2.2.1/bio/bwa-mem2/mem-samblaster"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa-mem2=2.2.1
sambamba=1.0
samblaster=0.1.26
Authors¶
- Christopher Schröder
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort_extra = snakemake.params.get("sort_extra", "")
samblaster_extra = snakemake.params.get("samblaster_extra", "")
# Derive the index prefix from the (first) index file given as input.idx.
idx = snakemake.input.get("idx", "")
if isinstance(idx, str):
    index = path.splitext(idx)[0]
else:
    index = path.splitext(idx[0])[0]
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
shell(
"(bwa-mem2 mem"
" -t {snakemake.threads}"
" {extra}"
" {index}"
" {snakemake.input.reads}"
" | samblaster"
" {samblaster_extra}"
" | sambamba view -S -f bam /dev/stdin"
" -t {snakemake.threads}"
" | sambamba sort /dev/stdin"
" -t {snakemake.threads}"
" -o {snakemake.output.bam}"
" {sort_extra}"
") {log}"
)
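The index prefix is obtained by stripping the last extension from the (first) index file. A minimal sketch, assuming a hypothetical helper `index_prefix` that accepts either a single path or a list of paths:

```python
from os import path


def index_prefix(idx):
    """Hypothetical helper: strip the last extension of the (first) index file."""
    first = idx if isinstance(idx, str) else idx[0]
    return path.splitext(first)[0]


print(index_prefix(["genome.fasta.amb", "genome.fasta.ann"]))  # genome.fasta
print(index_prefix("genome.fasta.bwt.2bit.64"))  # genome.fasta.bwt.2bit
```

Because `os.path.splitext` removes only the final extension, a file such as `genome.fasta.bwt.2bit.64` does not yield the expected prefix. Listing a single-extension file such as `.amb` first, as in the example rule, avoids this.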
BWA-MEME¶
For bwa-meme, the following wrappers are available:
BWA-MEME INDEX¶
Creates a bwa-meme index.
Example¶
This wrapper can be used in the following way:
rule bwa_meme_index:
input:
"{genome}",
output:
multiext(
"{genome}",
".0123",
".amb",
".ann",
".pac",
".pos_packed",
".suffixarray_uint64",
".suffixarray_uint64_L0_PARAMETERS",
".suffixarray_uint64_L1_PARAMETERS",
".suffixarray_uint64_L2_PARAMETERS",
),
log:
"logs/bwa-meme_index/{genome}.log",
threads: 8
wrapper:
"v2.2.1/bio/bwa-meme/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa-meme=1.0.6
Authors¶
- Christopher Schröder
- Patrik Smeds
Code¶
__author__ = "Christopher Schröder, Patrik Smeds"
__copyright__ = "Copyright 2022, Christopher Schröder, Patrik Smeds"
__email__ = "christopher.schroeder@tu-dortmund.de, patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Check inputs/arguments.
if len(snakemake.input) == 0:
raise ValueError("A reference genome has to be provided.")
elif len(snakemake.input) > 1:
raise ValueError("Please provide exactly one reference genome as input.")
valid_suffixes = {
".0123",
".amb",
".ann",
".pac",
".pos_packed",
".suffixarray_uint64",
".suffixarray_uint64_L0_PARAMETERS",
".suffixarray_uint64_L1_PARAMETERS",
".suffixarray_uint64_L2_PARAMETERS",
}
def get_valid_suffix(path):
for suffix in valid_suffixes:
if path.endswith(suffix):
return suffix
prefixes = set()
for s in snakemake.output:
suffix = get_valid_suffix(s)
if suffix is None:
raise ValueError(f"{s} cannot be generated by bwa-meme index (invalid suffix).")
prefixes.add(s[: -len(suffix)])
if len(prefixes) != 1:
raise ValueError("Output files must share common prefix up to their file endings.")
(prefix,) = prefixes
suffixarray = snakemake.input[0] + ".suffixarray_uint64"
dirname = path.dirname(suffixarray)
basename = path.basename(suffixarray)
num_models = snakemake.params.get("num_models", 268435456) # change only for testing!
if not dirname:
dirname = "."
shell(
"(bwa-meme index -a meme -p {prefix} {snakemake.input[0]} -t {snakemake.threads} && bwa-meme-train-prmi -t {snakemake.threads} --data-path {dirname} {suffixarray} {basename} pwl,linear,linear_spline {num_models}) {log}"
)
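The path handling for the suffix array can be illustrated in isolation. This is a minimal sketch, assuming a hypothetical helper `split_suffixarray`; it mirrors how the code above falls back to `.` when the reference has no directory component:

```python
from os import path


def split_suffixarray(reference):
    """Hypothetical helper: return (dirname, suffixarray, basename) for a reference."""
    suffixarray = reference + ".suffixarray_uint64"
    # An empty dirname means the current directory.
    dirname = path.dirname(suffixarray) or "."
    basename = path.basename(suffixarray)
    return dirname, suffixarray, basename


print(split_suffixarray("genome.fasta"))
print(split_suffixarray("resources/genome.fasta"))
```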
BWA-MEME¶
BWA-MEME is a practical and efficient seeding algorithm based on a suffix array search algorithm that solves the challenges in utilizing learned indices for SMEM search which is extensively used in the seeding phase. It achieves up to 3.45× speedup in seeding throughput over BWA-MEM2 by reducing the number of instructions by 4.60×, memory accesses by 8.77× and LLC misses by 2.21×, while ensuring the identical SAM output to BWA-MEM2.
Example¶
This wrapper can be used in the following way:
rule bwa_meme_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
reference="genome.fasta",
idx=multiext(
"genome.fasta",
".0123",
".amb",
".ann",
".pac",
".pos_packed",
".suffixarray_uint64",
".suffixarray_uint64_L0_PARAMETERS",
".suffixarray_uint64_L1_PARAMETERS",
".suffixarray_uint64_L2_PARAMETERS",
),
output:
"mapped/{sample}.cram", # Output can be .cram, .bam, or .sam
log:
"logs/bwa_meme/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}' -M",
sort="samtools", # Can be 'none', 'samtools' or 'picard'.
sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
sort_extra="", # Extra args for samtools.
dedup="mark", # Can be 'none' (default), 'mark' or 'remove'.
dedup_extra="-M", # Extra args for samblaster.
exceed_thread_limit=True, # Set threads also for samtools sort / view (total used CPU may exceed threads!)
embed_ref=True, # Embed reference when writing cram.
threads: 8
wrapper:
"v2.2.1/bio/bwa-meme/mem"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa-meme=1.0.6
samtools=1.17
samblaster=0.1.26
mbuffer=20160228
picard-slim=3.0.0
Authors¶
- Christopher Schröder
- Johannes Köster
- Julian de Ruiter
Code¶
__author__ = "Christopher Schröder, Johannes Köster, Julian de Ruiter"
__copyright__ = (
"Copyright 2020, Christopher Schröder, Johannes Köster and Julian de Ruiter"
)
__email__ = "christopher.schroeder@tu-dortmund.de, koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
embed_ref = snakemake.params.get("embed_ref", False)
# Option to set the threads of samtools sort and view to the snakemake limit.
# In theory, bwa and samtools sort alternate, and samtools view starts only
# when sort has finished, so that never more threads are used than the limit.
# But this cannot always be guaranteed.
exceed_thread_limit = snakemake.params.get("exceed_thread_limit", False)
dedup = snakemake.params.get("dedup", "none")
dedup_extra = snakemake.params.get("dedup_extra", "")
# Detect output format.
if snakemake.output[0].endswith(".sam"):
    output_format = "sam"
elif snakemake.output[0].endswith(".bam"):
    output_format = "bam"
elif snakemake.output[0].endswith(".cram"):
    output_format = "cram"
else:
    raise ValueError("output file format must be .sam, .bam or .cram")
if embed_ref:
output_format += ",embed_ref"
if exceed_thread_limit:
samtools_threads = snakemake.threads
else:
samtools_threads = 1
reference = snakemake.input.get("reference")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -h -O {output_format} -o {snakemake.output[0]} -T {reference} -@ {samtools_threads} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -O {output_format} -o {snakemake.output[0]} --reference {reference} -@ {samtools_threads} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    # Write temporary sort files next to the output file.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} -I /dev/stdin"
        " -O /dev/stdout -SO {sort_order} | samtools view -h -O {output_format} -o {snakemake.output[0]} -T {reference} -@ {samtools_threads} -"
    )
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Determine which command to use for duplicate handling.
if dedup == "none":
    # Do not detect duplicates.
    dedup_cmd = ""
elif dedup == "mark":
    # Mark duplicates using samblaster.
    dedup_cmd = "samblaster -q {dedup_extra} | "
elif dedup == "remove":
    # Remove duplicates using samblaster.
    dedup_cmd = "samblaster -q -r {dedup_extra} | "
else:
    raise ValueError("Unexpected value for params.dedup ({})".format(dedup))
shell(
"(bwa-meme mem -7"
" -t {snakemake.threads}"
" {extra}"
" {reference}"
" {snakemake.input.reads}"
" | mbuffer -q -m 2G "
" | " + dedup_cmd + pipe_cmd + ") {log}"
)
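The mapping from output file name to the `-O` format string can be written as a standalone function. This is a minimal sketch, assuming a hypothetical helper `detect_output_format` and assuming that a `.sam` output is meant to be written as SAM:

```python
def detect_output_format(output, embed_ref=False):
    """Hypothetical helper: derive the samtools -O value from the output path."""
    for ext in ("sam", "bam", "cram"):
        if output.endswith("." + ext):
            output_format = ext
            break
    else:
        # No known extension matched.
        raise ValueError("output file format must be .sam, .bam or .cram")
    if embed_ref:
        # Format options are appended after a comma, as in the wrapper.
        output_format += ",embed_ref"
    return output_format


print(detect_output_format("mapped/A.cram", embed_ref=True))
print(detect_output_format("mapped/A.bam"))
```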
BWA-MEMX¶
For bwa-memx, the following wrappers are available:
BWA-MEMX INDEX¶
Creates a bwa-mem, bwa-mem2 or bwa-meme index.
Example¶
This wrapper can be used in the following way:
rule bwa_mem_index:
input:
"{genome}",
output:
multiext(
"{genome}",
".amb",
".ann",
".bwt",
".pac",
".sa",
),
log:
"logs/bwa-mem_index/{genome}.log",
params:
bwa="bwa-mem",
threads: 8
wrapper:
"v2.2.1/bio/bwa-memx/index"
rule bwa_mem2_index:
input:
"{genome}",
output:
multiext(
"{genome}",
".0123",
".amb",
".ann",
".bwt.2bit.64",
".pac",
),
log:
"logs/bwa-mem2_index/{genome}.log",
params:
bwa="bwa-mem2",
threads: 8
wrapper:
"v2.2.1/bio/bwa-memx/index"
rule bwa_meme_index:
input:
"{genome}",
output:
multiext(
"{genome}",
".0123",
".amb",
".ann",
".pac",
".pos_packed",
".suffixarray_uint64",
".suffixarray_uint64_L0_PARAMETERS",
".suffixarray_uint64_L1_PARAMETERS",
".suffixarray_uint64_L2_PARAMETERS",
),
log:
"logs/bwa-meme_index/{genome}.log",
params:
bwa="bwa-meme",
threads: 8
wrapper:
"v2.2.1/bio/bwa-memx/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa=0.7.17
bwa-mem2=2.2.1
bwa-meme=1.0.6
Authors¶
- Christopher Schröder
- Patrik Smeds
Code¶
__author__ = "Christopher Schröder, Patrik Smeds"
__copyright__ = "Copyright 2022, Christopher Schröder, Patrik Smeds"
__email__ = "christopher.schroeder@tu-dortmund.de, patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
bwa = snakemake.params.get("bwa", "bwa-mem")
# Check inputs/arguments.
if len(snakemake.input) == 0:
raise ValueError("A reference genome has to be provided.")
elif len(snakemake.input) > 1:
raise ValueError("Please provide exactly one reference genome as input.")
if bwa == "bwa-mem":
valid_suffixes = {
".amb",
".ann",
".bwt",
".pac",
".sa",
}
cmd = "bwa index -p {prefix} {snakemake.input[0]}"
elif bwa == "bwa-mem2":
valid_suffixes = {
".0123",
".amb",
".ann",
".bwt.2bit.64",
".pac",
}
cmd = "bwa-mem2 index -p {prefix} {snakemake.input[0]}"
elif bwa == "bwa-meme":
valid_suffixes = {
".0123",
".amb",
".ann",
".pac",
".pos_packed",
".suffixarray_uint64",
".suffixarray_uint64_L0_PARAMETERS",
".suffixarray_uint64_L1_PARAMETERS",
".suffixarray_uint64_L2_PARAMETERS",
}
cmd = "bwa-meme index -a meme -p {prefix} {snakemake.input[0]} -t {snakemake.threads} && bwa-meme-train-prmi -t {snakemake.threads} --data-path {dirname} {suffixarray} {basename} pwl,linear,linear_spline {num_models}"
else:
raise ValueError(
"Unexpected value for params.bwa ({}). Must be bwa-mem, bwa-mem2 or bwa-meme.".format(
bwa
)
)
def get_valid_suffix(path):
for suffix in valid_suffixes:
if path.endswith(suffix):
return suffix
prefixes = set()
for s in snakemake.output:
suffix = get_valid_suffix(s)
if suffix is None:
raise ValueError(f"{s} cannot be generated by {bwa} index (invalid suffix).")
prefixes.add(s[: -len(suffix)])
if len(prefixes) != 1:
raise ValueError("Output files must share common prefix up to their file endings.")
(prefix,) = prefixes
suffixarray = snakemake.input[0] + ".suffixarray_uint64"
dirname = path.dirname(suffixarray)
basename = path.basename(suffixarray)
num_models = snakemake.params.get("num_models", 268435456) # change only for testing!
if not dirname:
dirname = "."
shell(f"({cmd}) {log}")
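The per-tool suffix sets lend themselves to a lookup table. The following is a minimal sketch, assuming a hypothetical helper `expected_index_files` that lists the files a rule should declare as output for a given `bwa` value; the suffixes are copied from the code above:

```python
# Index suffixes per aligner, as listed in the wrapper code.
INDEX_SUFFIXES = {
    "bwa-mem": {".amb", ".ann", ".bwt", ".pac", ".sa"},
    "bwa-mem2": {".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac"},
    "bwa-meme": {
        ".0123",
        ".amb",
        ".ann",
        ".pac",
        ".pos_packed",
        ".suffixarray_uint64",
        ".suffixarray_uint64_L0_PARAMETERS",
        ".suffixarray_uint64_L1_PARAMETERS",
        ".suffixarray_uint64_L2_PARAMETERS",
    },
}


def expected_index_files(genome, bwa="bwa-mem"):
    """Hypothetical helper: sorted list of index files for `genome`."""
    if bwa not in INDEX_SUFFIXES:
        raise ValueError(
            f"Unexpected value for params.bwa ({bwa}). Must be bwa-mem, bwa-mem2 or bwa-meme."
        )
    return sorted(genome + suffix for suffix in INDEX_SUFFIXES[bwa])


print(expected_index_files("genome.fasta", "bwa-mem"))
```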
BWA-MEMX¶
Collection of bwa-mem, bwa-mem2 and bwa-meme.
Example¶
This wrapper can be used in the following way:
rule bwa_memx_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
reference="genome.fasta",
idx=multiext(
"genome.fasta",
".amb",
".ann",
".bwt",
".pac",
".sa",
),
output:
"mapped/mem/{sample}.cram", # Output can be .cram, .bam, or .sam
log:
"logs/bwa_memx/{sample}.log",
params:
bwa="bwa-mem", # Can be 'bwa-mem', 'bwa-mem2' or 'bwa-meme'.
extra=r"-R '@RG\tID:{sample}\tSM:{sample}' -M",
sort="samtools", # Can be 'none', 'samtools' or 'picard'.
sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
sort_extra="", # Extra args for samtools.
dedup="mark", # Can be 'none' (default), 'mark' or 'remove'.
dedup_extra="-M", # Extra args for samblaster.
exceed_thread_limit=True, # Set threads also for samtools sort / view (total used CPU may exceed threads!)
embed_ref=True, # Embed reference when writing cram.
threads: 8
wrapper:
"v2.2.1/bio/bwa-memx/mem"
rule bwa_memx_mem2:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
reference="genome.fasta",
idx=multiext(
"genome.fasta",
".0123",
".amb",
".ann",
".bwt.2bit.64",
".pac",
),
output:
"mapped/mem2/{sample}.cram",
log:
"logs/bwa_memx/{sample}.log",
params:
bwa="bwa-mem2",
extra=r"-R '@RG\tID:{sample}\tSM:{sample}' -M",
sort="picard",
sort_order="queryname",
sort_extra="",
dedup="none",
dedup_extra="-M",
exceed_thread_limit=True,
embed_ref=True,
threads: 8
wrapper:
"v2.2.1/bio/bwa-memx/mem"
rule bwa_memx_meme:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
reference="genome.fasta",
idx=multiext(
"genome.fasta",
".0123",
".amb",
".ann",
".pac",
".pos_packed",
".suffixarray_uint64",
".suffixarray_uint64_L0_PARAMETERS",
".suffixarray_uint64_L1_PARAMETERS",
".suffixarray_uint64_L2_PARAMETERS",
),
output:
"mapped/meme/{sample}.cram",
log:
"logs/bwa_memx/{sample}.log",
params:
bwa="bwa-meme",
extra=r"-R '@RG\tID:{sample}\tSM:{sample}' -M",
sort="picard",
sort_order="coordinate",
sort_extra="",
dedup="remove",
dedup_extra="-M",
exceed_thread_limit=False,
embed_ref=False,
threads: 8
wrapper:
"v2.2.1/bio/bwa-memx/mem"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa=0.7.17
bwa-mem2=2.2.1
bwa-meme=1.0.6
samtools=1.17
samblaster=0.1.26
mbuffer=20160228
picard-slim=3.0.0
Authors¶
- Christopher Schröder
- Johannes Köster
- Julian de Ruiter
Code¶
__author__ = "Christopher Schröder, Johannes Köster, Julian de Ruiter"
__copyright__ = (
"Copyright 2020, Christopher Schröder, Johannes Köster and Julian de Ruiter"
)
__email__ = "christopher.schroeder@tu-dortmund.de, koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
embed_ref = snakemake.params.get("embed_ref", False)
bwa = snakemake.params.get("bwa", "bwa-mem")
# Option to set the threads of samtools sort and view to the snakemake limit.
# In theory, bwa and samtools sort alternate, and samtools view starts only
# when sort has finished, so that never more threads are used than the limit.
# But this cannot always be guaranteed.
exceed_thread_limit = snakemake.params.get("exceed_thread_limit", False)
dedup = snakemake.params.get("dedup", "none")
dedup_extra = snakemake.params.get("dedup_extra", "")
# Detect output format.
if snakemake.output[0].endswith(".sam"):
    output_format = "sam"
elif snakemake.output[0].endswith(".bam"):
    output_format = "bam"
elif snakemake.output[0].endswith(".cram"):
    output_format = "cram"
else:
    raise ValueError("output file format must be .sam, .bam or .cram")
if embed_ref:
output_format += ",embed_ref"
if exceed_thread_limit:
samtools_threads = snakemake.threads
else:
samtools_threads = 1
reference = snakemake.input.get("reference")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
    # Simply convert to bam using samtools view.
    pipe_cmd = "samtools view -h -O {output_format} -o {snakemake.output[0]} -T {reference} -@ {samtools_threads} -"
elif sort == "samtools":
    # Sort alignments using samtools sort.
    pipe_cmd = "samtools sort {sort_extra} -O {output_format} -o {snakemake.output[0]} --reference {reference} -@ {samtools_threads} -"
    # Add name flag if needed.
    if sort_order == "queryname":
        sort_extra += " -n"
    # Write temporary sort files next to the output file.
    prefix = path.splitext(snakemake.output[0])[0]
    sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
    # Sort alignments using picard SortSam.
    pipe_cmd = (
        "picard SortSam {sort_extra} -I /dev/stdin"
        " -O /dev/stdout -SO {sort_order} | samtools view -h -O {output_format} -o {snakemake.output[0]} -T {reference} -@ {samtools_threads} -"
    )
else:
    raise ValueError("Unexpected value for params.sort ({})".format(sort))
# Determine which command to use for duplicate handling.
if dedup == "none":
    # Do not detect duplicates.
    dedup_cmd = ""
elif dedup == "mark":
    # Mark duplicates using samblaster.
    dedup_cmd = "samblaster -q {dedup_extra} | "
elif dedup == "remove":
    # Remove duplicates using samblaster.
    dedup_cmd = "samblaster -q -r {dedup_extra} | "
else:
    raise ValueError("Unexpected value for params.dedup ({})".format(dedup))
if bwa == "bwa-mem":
bwa_cmd = "bwa mem"
elif bwa == "bwa-mem2":
bwa_cmd = "bwa-mem2 mem"
elif bwa == "bwa-meme":
bwa_cmd = "bwa-meme mem -7"
else:
raise ValueError(
"Unexpected value for params.bwa ({}). Must be bwa-mem, bwa-mem2 or bwa-meme.".format(
bwa
)
)
shell(
" ({bwa_cmd}"
" -t {snakemake.threads}"
" {extra}"
" {reference}"
" {snakemake.input.reads}"
" | mbuffer -q -m 2G "
" | " + dedup_cmd + pipe_cmd + ") {log}"
)
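How the `bwa` and `dedup` params shape the start of the pipeline can be sketched without Snakemake. The helper names `aligner_command` and `dedup_command` below are hypothetical; the command strings are the ones used in the code above:

```python
def aligner_command(bwa="bwa-mem"):
    """Hypothetical helper: map params.bwa to the aligner invocation."""
    commands = {
        "bwa-mem": "bwa mem",
        "bwa-mem2": "bwa-mem2 mem",
        "bwa-meme": "bwa-meme mem -7",
    }
    if bwa not in commands:
        raise ValueError(
            f"Unexpected value for params.bwa ({bwa}). Must be bwa-mem, bwa-mem2 or bwa-meme."
        )
    return commands[bwa]


def dedup_command(dedup="none", dedup_extra=""):
    """Hypothetical helper: return the samblaster stage, including the trailing pipe."""
    if dedup == "none":
        return ""
    if dedup == "mark":
        return f"samblaster -q {dedup_extra} | "
    if dedup == "remove":
        return f"samblaster -q -r {dedup_extra} | "
    raise ValueError(f"Unexpected value for params.dedup ({dedup})")


# The sorting/conversion stage would follow the optional samblaster stage.
print(aligner_command("bwa-meme") + " ... | " + dedup_command("remove", "-M") + "<pipe_cmd>")
```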
CAIROSVG¶
Convert SVG files with cairosvg.
Example¶
This wrapper can be used in the following way:
rule:
input:
"{prefix}.svg"
output:
"{prefix}.{fmt,(pdf|png)}"
wrapper:
"v2.2.1/utils/cairosvg"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cairosvg=2.7.0
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
_, ext = os.path.splitext(snakemake.output[0])
if ext not in (".png", ".pdf", ".ps", ".svg"):
raise ValueError("invalid file extension: '{}'".format(ext))
fmt = ext[1:]
# Pass the optional extra arguments and the log redirection collected above.
shell("cairosvg -f {fmt} {extra} {snakemake.input[0]} -o {snakemake.output[0]} {log}")
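The output format is taken from the file extension of the output. A minimal sketch of that check, assuming a hypothetical helper `cairosvg_format` that works on a plain path:

```python
import os


def cairosvg_format(output):
    """Hypothetical helper: return the value passed to `cairosvg -f`."""
    _, ext = os.path.splitext(output)
    if ext not in (".png", ".pdf", ".ps", ".svg"):
        raise ValueError("invalid file extension: '{}'".format(ext))
    # Drop the leading dot.
    return ext[1:]


print(cairosvg_format("plots/figure.pdf"))
```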
CLUSTALO¶
Multiple alignment of nucleic acid and protein sequences.
Example¶
This wrapper can be used in the following way:
rule clustalo:
input:
"{sample}.fa"
output:
"{sample}.msa.fa"
params:
extra=""
log:
"logs/clustalo/test/{sample}.log"
threads: 8
wrapper:
"v2.2.1/bio/clustalo"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
clustalo=1.2.4
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for clustal omega."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Executed shell command
shell(
"clustalo {extra}"
" --threads={snakemake.threads}"
" --in {snakemake.input[0]}"
" --out {snakemake.output[0]} "
" {log}"
)
COOLPUP.PY¶
Pileup features for a resolution in an .mcool file
URL: https://github.com/open2c/coolpuppy
Example¶
This wrapper can be used in the following way:
rule coolpuppy:
input:
cooler="CN.mm9.1000kb.mcool", ## Multiresolution cooler file
features="CN.mm9.toy_features.bed", ## Feature file
expected="CN.mm9.toy_expected.tsv", ## Expected file
view="CN.mm9.toy_regions.bed", ## File with the region names and coordinates
output:
"CN_{resolution,[0-9]+}.clpy",
params:
## Add optional parameters
features_format="bed", ## Format of the features file
extra="--local", ## Add extra parameters
threads: 2
log:
"logs/CN_{resolution}_coolpuppy.log",
wrapper:
"v2.2.1/bio/coolpuppy"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
coolpuppy
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- a file with features to pileup
- (optional) file with expected
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
- A file (.clpy, HDF5-based format) with the pileup. Can have a {resolution} wildcard that specifies the resolution for the analysis, then it doesn’t need to be specified as a parameter.
Params¶
- resolution: Optional, can instead be specified as a wildcard in the output
- extra: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
view = snakemake.input.get("view", "")
if view:
view = f"--view {view}"
expected = snakemake.input.get("expected", "")
if expected:
expected = f"--expected {expected}"
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
resolution = snakemake.params.get("resolution", snakemake.wildcards.get("resolution"))
if not resolution:
raise ValueError("Please specify resolution either as a wildcard or as a parameter")
shell(
"(coolpup.py"
" {snakemake.input.cooler}::resolutions/{resolution}"
" {snakemake.input.features}"
" {expected}"
" --features-format {snakemake.params.features_format}"
" {view}"
" -p {snakemake.threads}"
" {extra}"
" -o {snakemake.output}) {log}"
)
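The resolution lookup above prefers the parameter and falls back to the wildcard. This is a minimal sketch using plain dictionaries as stand-ins for `snakemake.params` and `snakemake.wildcards`; the helper names are hypothetical:

```python
def resolve_resolution(params, wildcards):
    """Hypothetical helper: take the resolution from params, else from wildcards."""
    resolution = params.get("resolution", wildcards.get("resolution"))
    if not resolution:
        raise ValueError("Please specify resolution either as a wildcard or as a parameter")
    return resolution


def cooler_uri(cooler, resolution):
    """Address one resolution inside a multiresolution cooler file."""
    return f"{cooler}::resolutions/{resolution}"


resolution = resolve_resolution({}, {"resolution": "1000000"})
print(cooler_uri("CN.mm9.1000kb.mcool", resolution))
```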
COOLTOOLS¶
For cooltools, the following wrappers are available:
COOLTOOLS DOTS¶
Detect dots for a resolution in an .mcool file
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_dots:
input:
cooler="small_test.mcool", ## Multiresolution cooler file
expected="test_expected.tsv", ## Expected file
view="test_view.txt", ## File with the region names and coordinates
output:
"HFF_{resolution,[0-9]+}.dots.bedpe",
params:
extra="", ## Add extra parameters
threads: 4
log:
"logs/HFF_{resolution}_dots.log",
wrapper:
"v2.2.1/bio/cooltools/dots"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- an expected file
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
- A .bedpe file with coordinates of detected dots. Can have a {resolution} wildcard that specifies the resolution for the analysis, then it doesn’t need to be specified as a parameter.
Params¶
- resolution: Optional, can instead be specified as a wildcard in the output
- extra: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
view = snakemake.input.get("view", "")
if view:
view = f"--view {view}"
expected = snakemake.input.get("expected", "")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
resolution = snakemake.params.get(
"resolution", snakemake.wildcards.get("resolution", 0)
)
if not resolution:
raise ValueError(
"Please specify resolution either as a wildcard or as a parameter"
)
shell(
"(cooltools dots"
" {snakemake.input.cooler}::resolutions/{resolution} "
" {expected} "
" {view} "
" -p {snakemake.threads} "
" {extra} "
" -o {snakemake.output}) {log}"
)
COOLTOOLS EIGS_CIS¶
Calculate cis eigenvectors for a resolution in an .mcool file
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_eigs_cis:
input:
cooler="CN.mm9.1000kb.mcool", ## Multiresolution cooler file
view="mm9_view.txt", ## File with the region names and coordinates
track="mm9_1000000_gc.bed",
output:
vecs="CN_{resolution,[0-9]+}.cis.vecs.tsv",
lam="CN_{resolution,[0-9]+}.cis.lam.tsv",
bigwig="CN_{resolution,[0-9]+}.cis.bw",
params:
## Add optional parameters
track_col_name="GC",
extra="",
log:
"logs/CN_{resolution}_cis_eigs.log",
wrapper:
"v2.2.1/bio/cooltools/eigs_cis"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Output files can have a {resolution} wildcard that specifies the resolution for the analysis; it then doesn’t need to be specified as a parameter.
Software dependencies¶
ucsc-bedgraphtobigwig
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- (optional) phasing track file
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
- vecs
- lams
- bigwig
Params¶
- resolution: Optional, can instead be specified as a wildcard in the output
- track_col_name: Name of the column in the track file to use
- extra: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import tempfile
## Extract arguments
view = snakemake.input.get("view", "")
if view:
    view = f"--view {view}"
track = snakemake.input.get("track", "")
track_col_name = snakemake.params.get("track_col_name", "")
if track and track_col_name:
    track = f"--phasing-track {track}::{track_col_name}"
elif track:
    track = f"--phasing-track {track}"
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
bigwig = snakemake.output.get("bigwig", "")
if bigwig:
    bigwig = "--bigwig"
resolution = snakemake.params.get("resolution", snakemake.wildcards.get("resolution"))
assert (
    resolution
), "Please specify resolution either as a `wildcard` or as a `parameter`"
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "cooltools eigs-cis"
        " {snakemake.input.cooler}::resolutions/{resolution} "
        " {track}"
        " {view} "
        " {bigwig}"
        " {extra} "
        " -o {tmpdir}/out"
        " {log}"
    )
    shell("mv {tmpdir}/out.cis.vecs.tsv {snakemake.output.vecs}")
    shell("mv {tmpdir}/out.cis.lam.txt {snakemake.output.lam}")
    if bigwig:
        shell("mv {tmpdir}/out.cis.bw {snakemake.output.bigwig}")
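The optional phasing-track handling in the wrapper code above can be tried outside Snakemake. The following is a minimal sketch; `phasing_track_arg` is a hypothetical helper name that is not part of the wrapper or of cooltools, and it only mirrors the string-building logic shown above.

```python
def phasing_track_arg(track: str = "", track_col_name: str = "") -> str:
    """Build the --phasing-track fragment the same way the wrapper code does."""
    if track and track_col_name:
        # "path::column" selects a named column of the track file
        return f"--phasing-track {track}::{track_col_name}"
    if track:
        return f"--phasing-track {track}"
    return ""  # no track given: nothing is added to the command line


print(phasing_track_arg("mm9_1000000_gc.bed", "GC"))
```

Returning an empty string when no track is given lets the fragment be dropped into the command template unconditionally, which is the approach the wrapper takes for all of its optional arguments.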
COOLTOOLS EIGS_TRANS¶
Calculate trans eigenvectors for a resolution in an .mcool file
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_eigs_trans:
    input:
        cooler="CN.mm9.1000kb.mcool",  ## Multiresolution cooler file
        track="mm9_1000000_gc.bed",
    output:
        vecs="CN_{resolution,[0-9]+}.trans.vecs.tsv",
        lam="CN_{resolution,[0-9]+}.trans.lam.tsv",
        bigwig="CN_{resolution,[0-9]+}.trans.bw",
    params:
        ## Add optional parameters
        track_col_name="GC",
        extra="",
    log:
        "logs/CN_{resolution}_trans_eigs.log",
    wrapper:
        "v2.2.1/bio/cooltools/eigs_trans"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Output files can have a {resolution} wildcard that specifies the resolution for the analysis; in that case it doesn’t need to be specified as a parameter.
Software dependencies¶
ucsc-bedgraphtobigwig
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- (optional) phasing track file
Output:
- vecs
- lams
- bigwig
Params¶
resolution
: Optional, can be instead specified as a wildcard in the output
track_col_name
: Name of the column in the track file to use
extra
: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import tempfile
## Extract arguments
# view = snakemake.input.get("view", "") # Not yet implemented
# if view:
#     view = f"--view {view}"
view = ""
track = snakemake.input.get("track", "")
track_col_name = snakemake.params.get("track_col_name", "")
if track and track_col_name:
    track = f"--phasing-track {track}::{track_col_name}"
elif track:
    track = f"--phasing-track {track}"
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
bigwig = snakemake.output.get("bigwig", "")
if bigwig:
    bigwig = "--bigwig"
resolution = snakemake.params.get("resolution", snakemake.wildcards.get("resolution"))
assert (
    resolution
), "Please specify resolution either as a `wildcard` or as a `parameter`"
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "cooltools eigs-trans"
        " {snakemake.input.cooler}::resolutions/{resolution} "
        " {track}"
        " {view} "  # Not yet implemented, hardcoded to ""
        " {bigwig}"
        " {extra} "
        " -o {tmpdir}/out"
        " {log}"
    )
    shell("mv {tmpdir}/out.trans.vecs.tsv {snakemake.output.vecs}")
    shell("mv {tmpdir}/out.trans.lam.txt {snakemake.output.lam}")
    if bigwig:
        shell("mv {tmpdir}/out.trans.bw {snakemake.output.bigwig}")
COOLTOOLS EXPECTED_CIS¶
Calculate cis expected for a resolution in an .mcool file
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_expected_cis:
    input:
        cooler="CN.mm9.1000kb.mcool",  ## Multiresolution cooler file
        view="mm9_view.txt",  ## File with the region names and coordinates
    output:
        "CN_{resolution,[0-9]+}.cis.expected.tsv",
    params:
        ## Add optional parameters
        extra="",  ## Any additional arguments to pass
    threads: 4
    log:
        "logs/CN_{resolution}_cis_expected.log",
    wrapper:
        "v2.2.1/bio/cooltools/expected_cis"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
- A .tsv file with the mean interaction frequency at each diagonal. The path can contain a {resolution} wildcard that specifies the resolution for the analysis; in that case the resolution doesn’t need to be specified as a parameter.
Params¶
resolution
: Optional, can be instead specified as a wildcard in the output
extra
: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
view = snakemake.input.get("view", "")
if view:
    view = f"--view {view}"
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
resolution = snakemake.params.get(
    "resolution", snakemake.wildcards.get("resolution", 0)
)
if not resolution:
    raise ValueError("Please specify resolution either as a wildcard or as a parameter")
shell(
    "(cooltools expected-cis"
    " {snakemake.input.cooler}::resolutions/{resolution} "
    " {view} "
    " {extra} "
    " -p {snakemake.threads} "
    " -o {snakemake.output}) {log}"
)
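To illustrate what "mean interaction frequency at each diagonal" means, here is a minimal pure-Python sketch on a toy symmetric matrix. It is not how cooltools computes expected values (balancing, masking of bad bins, and view regions are all ignored), and `mean_per_diagonal` is a hypothetical name used only for this example.

```python
def mean_per_diagonal(matrix):
    """Return the mean of each upper diagonal k = 0 .. n-1 of a square matrix."""
    n = len(matrix)
    means = []
    for k in range(n):
        # Diagonal k holds the contacts between bins that are k bins apart
        values = [matrix[i][i + k] for i in range(n - k)]
        means.append(sum(values) / len(values))
    return means


toy = [
    [10, 4, 1],
    [4, 12, 6],
    [1, 6, 8],
]
print(mean_per_diagonal(toy))  # [10.0, 5.0, 1.0]
```

The decreasing values reflect the usual decay of contact frequency with genomic distance, which is the signal the expected table is meant to capture.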
COOLTOOLS EXPECTED_TRANS¶
Calculate trans expected for a resolution in an .mcool file
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_expected_trans:
    input:
        cooler="CN.mm9.1000kb.mcool",  ## Multiresolution cooler file
        view="mm9_view.txt",  ## File with the region names and coordinates
    output:
        "{sample}_{resolution,[0-9]+}.trans.expected.tsv",
    params:
        ## Add optional parameters
        extra="",
    threads: 4
    log:
        "logs/{sample}_{resolution}_trans_expected.log",
    wrapper:
        "v2.2.1/bio/cooltools/expected_trans"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
- A .tsv file with the mean interaction frequency between chromosomes. The path can contain a {resolution} wildcard that specifies the resolution for the analysis; in that case the resolution doesn’t need to be specified as a parameter.
Params¶
resolution
: Optional, can be instead specified as a wildcard in the output
extra
: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
view = snakemake.input.get("view", "")
if view:
    view = f"--view {view}"
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
resolution = snakemake.params.get(
    "resolution", snakemake.wildcards.get("resolution", 0)
)
if not resolution:
    raise ValueError("Please specify resolution either as a wildcard or as a parameter")
shell(
    "(cooltools expected-trans"
    " {snakemake.input.cooler}::resolutions/{resolution} "
    " {view} "
    " -p {snakemake.threads} "
    " {extra} "
    " -o {snakemake.output}) {log}"
)
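Several cooltools wrappers resolve the resolution the same way: a `resolution` param wins, otherwise the `{resolution}` wildcard is used, and the result is appended to the cooler path. The sketch below reproduces that lookup with plain dictionaries standing in for `snakemake.params` and `snakemake.wildcards` (an assumption made so the example runs on its own); `cooler_uri` is a hypothetical helper name.

```python
def cooler_uri(cooler_path, params, wildcards):
    """Pick the resolution (params first, then wildcards) and build the cooler URI."""
    resolution = params.get("resolution", wildcards.get("resolution", 0))
    if not resolution:
        raise ValueError(
            "Please specify resolution either as a wildcard or as a parameter"
        )
    # "::resolutions/<N>" addresses one resolution inside the .mcool file
    return f"{cooler_path}::resolutions/{resolution}"


print(cooler_uri("CN.mm9.1000kb.mcool", {}, {"resolution": "1000000"}))
```

Because the param is checked first, a rule that sets both a `resolution` param and a `{resolution}` wildcard will silently use the param.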
COOLTOOLS GENOME_BINNIFY¶
Split chromosomes into equal sized bins
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_genome_binnify:
    input:
        chromsizes="hg38_chromsizes.txt",  ## Chromsizes file
    output:
        "hg38_1000000_bins.bed",
    params:
        binsize=1000000,
    threads: 1
    log:
        "logs/binnify.log",
    wrapper:
        "v2.2.1/bio/cooltools/genome/binnify"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.2
Input/Output¶
Input:
- a chromsizes file
Output:
- A .bed file with bin coordinates. The path can contain a {binsize} wildcard that specifies the bin size; in that case the bin size doesn’t need to be specified as a parameter.
Params¶
binsize
: Size of bins in bp. Optional, can be instead specified as a wildcard in the output
extra
: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
binsize = snakemake.params.get("binsize", snakemake.wildcards.get("binsize", 0))
if not binsize:
    raise ValueError("Please specify binsize either as a wildcard or as a parameter")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
    "(cooltools genome binnify"
    " {snakemake.input.chromsizes} {binsize} "
    " {extra} "
    " > {snakemake.output}) {log}"
)
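As a rough picture of what splitting chromosomes into fixed-size bins produces, the following pure-Python sketch generates bin coordinates from a chromosome-size mapping. It is an illustration only: `binnify` here is a hypothetical stand-alone function, not the cooltools implementation, and it assumes that the last bin of each chromosome is simply truncated at the chromosome end.

```python
def binnify(chromsizes, binsize):
    """Yield (chrom, start, end) tuples covering each chromosome in fixed-size bins."""
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            # Assumed behaviour: the final bin is cut at the chromosome end
            yield chrom, start, min(start + binsize, length)


for chrom, start, end in binnify({"chr1": 2_500_000, "chr2": 1_000_000}, 1_000_000):
    print(chrom, start, end, sep="\t")
```

With these toy sizes, chr1 yields three bins (the last one 500 kb long) and chr2 yields one.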
COOLTOOLS GENOME_GC¶
Calculate GC content for a genome in bins
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_genome_gc:
    input:
        bins="ASM584v2/bins_100000.bed",  # 100000 bins
        fasta="ASM584v2/ASM584v2.fa",  # genome fasta for E. coli
    output:
        "gc_100000.tsv",
    params:
        extra="",
    threads: 1
    log:
        "logs/gc.log",
    wrapper:
        "v2.2.1/bio/cooltools/genome/gc"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.2
Input/Output¶
Input:
- .bed file with bin coordinates
- fasta file with the genome sequence
Output:
- A tsv file with GC content in bins
Params¶
extra
: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"(cooltools genome gc"
" {snakemake.input.bins} {snakemake.input.fasta} {extra} > {snakemake.output})"
" {log} "
)
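For intuition about the output, GC content per bin can be sketched in a few lines of plain Python. This is not the cooltools implementation: `gc_fraction` and `gc_per_bin` are hypothetical names, the sequence is a toy string rather than a FASTA record, and ignoring ambiguous bases such as N is an assumption of this sketch.

```python
def gc_fraction(sequence):
    """Fraction of G/C among A/C/G/T bases; None if the bin has no such bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else None


def gc_per_bin(sequence, binsize):
    """Return (start, end, gc_fraction) for consecutive bins along one sequence."""
    return [
        (start, min(start + binsize, len(sequence)), gc_fraction(sequence[start:start + binsize]))
        for start in range(0, len(sequence), binsize)
    ]


print(gc_per_bin("GGCCAATT", 4))  # [(0, 4, 1.0), (4, 8, 0.0)]
```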
COOLTOOLS INSULATION¶
Calculate insulation score for a resolution in an .mcool file
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_insulation:
    input:
        cooler="CN.mm9.1000kb.mcool",  ## Multiresolution cooler file
        view="mm9_view.txt",  ## File with the region names and coordinates
    output:
        "CN_{resolution,[0-9]+}.insulation.tsv",
    params:
        ## Add optional parameters
        window=[10000000, 12000000],  ## In this example, we test with two window sizes
        chunksize=20000000,  ## How many pixels are loaded in memory at once
    threads: 4  ## Number of threads to use
    log:
        "logs/CN_{resolution}_insulation.log",
    wrapper:
        "v2.2.1/bio/cooltools/insulation"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
- A .tsv file with insulation score and called boundaries for all window sizes. The path can contain a {resolution} wildcard that specifies the resolution for the analysis; in that case the resolution doesn’t need to be specified as a parameter.
Params¶
window
: Window size for insulation score calculation, in bp. Can be a list of multiple sizes, then all are calculated in one go
resolution
: Optional, can be instead specified as a wildcard in the output
chunksize
: How many pixels to process in each chunk
extra
: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
window = snakemake.params.get("window", "")
if isinstance(window, list):
    window = " ".join([str(w) for w in window])
else:
    window = str(window)
view = snakemake.input.get("view", "")
if view:
    view = f"--view {view}"
chunksize = snakemake.params.get("chunksize", 20000000)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
resolution = snakemake.params.get(
    "resolution", snakemake.wildcards.get("resolution", 0)
)
if not resolution:
    raise ValueError("Please specify resolution either as a wildcard or as a parameter")
shell(
    "(cooltools insulation"
    " {snakemake.input.cooler}::resolutions/{resolution} "
    " {window} --chunksize {chunksize} "
    " {view} "
    " -p {snakemake.threads} "
    " {extra} "
    " -o {snakemake.output}) {log}"
)
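The `window` param may be a single value or a list. The sketch below isolates the normalisation step from the wrapper code so its behaviour can be checked directly; `format_windows` is a hypothetical helper name.

```python
def format_windows(window):
    """Turn one window size or a list of sizes into the command-line fragment."""
    if isinstance(window, list):
        # Several sizes are passed as space-separated positional arguments
        return " ".join(str(w) for w in window)
    return str(window)


print(format_windows([10000000, 12000000]))  # 10000000 12000000
print(format_windows(10000000))              # 10000000
```

Note that only `list` is special-cased: a tuple such as `(10000000, 12000000)` would be rendered with parentheses and commas, so a list is the safer choice in the rule.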
COOLTOOLS PILEUP¶
Pileup features for a resolution in an .mcool file
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_pileup:
    input:
        cooler="CN.mm9.1000kb.mcool",  ## Multiresolution cooler file
        features="CN.mm9.toy_features.bed",  ## Feature file
        expected="CN.mm9.toy_expected.tsv",  ## Expected file
        view="CN.mm9.toy_regions.bed",  ## File with the region names and coordinates
    output:
        "CN_{resolution,[0-9]+}.pileup.npz",
    params:
        ## Add optional parameters
        features_format="bed",  ## Format of the features file
        extra="--aggregate mean",  ## Add extra parameters
    threads: 4
    log:
        "logs/CN_{resolution}_pileup.log",
    wrapper:
        "v2.2.1/bio/cooltools/pileup"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- a file with features to pileup
- (optional) file with expected
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
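- A file (.npz or .h5) with piled-up snippets. The path can contain a {resolution} wildcard that specifies the resolution for the analysis; in that case the resolution doesn’t need to be specified as a parameter.
Params¶
resolution
: Optional, can be instead specified as a wildcard in the output
features_format
: Format of the features file (for example "bed"); the wrapper code reads this parameter unconditionally, so it has to be set
extra
: Any additional arguments to pass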
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
## Extract arguments
view = snakemake.input.get("view", "")
if view:
    view = f"--view {view}"
expected = snakemake.input.get("expected", "")
if expected:
    expected = f"--expected {expected}"
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
resolution = snakemake.params.get(
    "resolution", snakemake.wildcards.get("resolution", 0)
)
if not resolution:
    raise ValueError("Please specify resolution either as a wildcard or as a parameter")
shell(
    "(cooltools pileup"
    " {snakemake.input.cooler}::resolutions/{resolution}"
    " {snakemake.input.features}"
    " {expected}"
    " --features-format {snakemake.params.features_format}"
    " {view}"
    " -p {snakemake.threads}"
    " {extra}"
    " -o {snakemake.output}) {log}"
)
COOLTOOLS SADDLE¶
Calculate a saddle for a resolution in an .mcool file using a track
URL: https://github.com/open2c/cooltools
Example¶
This wrapper can be used in the following way:
rule cooltools_saddle:
    input:
        cooler="CN.mm9.1000kb.mcool",  ## Multiresolution cooler file
        track="CN_1000000.eigs.tsv",  ## Track file
        expected="CN_1000000.cis.expected.tsv",  ## Expected file
        view="mm9_view.txt",  ## File with the region names and coordinates
    output:
        saddle="CN_{resolution,[0-9]+}.saddledump.npz",
        digitized_track="CN_{resolution,[0-9]+}.digitized.tsv",
        fig="CN_{resolution,[0-9]+}.saddle.pdf",
    params:
        ## Add optional parameters
        range="--qrange 0.01 0.99",
        extra="",
    log:
        "logs/CN_{resolution}_saddle.log",
    wrapper:
        "v2.2.1/bio/cooltools/saddle"
# Note that in this test files are edited to remove
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
cooltools=0.5.4
Input/Output¶
Input:
- a multiresolution cooler file (.mcool)
- track file
- expected file
- (optional) view, a bed-style file with region coordinates and names to use for analysis
Output:
- Saves a binary .npz file with saddles and extra information about it, and a track file with digitized values. Can also save saddle plots using extra --fig argument. All output files have the same prefix, taken from the first output argument (i.e. enough to give one output argument). Can have a {resolution} wildcard that specifies the resolution for the analysis, then it doesn’t need to be specified as a parameter.
Params¶
range
: What range of values from the track to use. Typically used to ignore outliers. --qrange 0 1 will use all data (default); --qrange 0.01 0.99 will ignore the first and last percentile; --range 0 5 will use values from 0 to 5
resolution
: Optional, can be instead specified as a wildcard in the output
extra
: Any additional arguments to pass
Authors¶
- Ilya Flyamer
Code¶
__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
import tempfile
## Extract arguments
view = snakemake.input.get("view", "")
if view:
    view = f"--view {view}"
track = snakemake.input.get("track", "")
track_col_name = snakemake.params.get("track_col_name", "")
if track and track_col_name:
    track = f"{track}::{track_col_name}"
expected = snakemake.input.get("expected", "")
range = snakemake.params.get("range", "--qrange 0 1")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
resolution = snakemake.params.get(
    "resolution", snakemake.wildcards.get("resolution", 0)
)
if not resolution:
    raise ValueError("Please specify resolution either as a wildcard or as a parameter")
fig = snakemake.output.get("fig", "")
if fig:
    ext = path.splitext(fig)[1][1:]
    fig = f"--fig {ext}"
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "(cooltools saddle"
        " {snakemake.input.cooler}::resolutions/{resolution} "
        " {track} "
        " {expected} "
        " {view} "
        " {range} "
        " {fig} "
        " {extra} "
        " -o {tmpdir}/out)"
        " {log}"
    )
    shell("mv {tmpdir}/out.saddledump.npz {snakemake.output.saddle}")
    shell("mv {tmpdir}/out.digitized.tsv {snakemake.output.digitized_track}")
    if fig:
        shell("mv {tmpdir}/out.{ext} {snakemake.output.fig}")
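The figure format is not a separate parameter: the wrapper code derives it from the extension of the `fig` output path. A minimal sketch of that derivation follows; `fig_flag` is a hypothetical helper name, and the returned temporary file name simply follows the `out.<ext>` pattern used in the wrapper code.

```python
from os import path


def fig_flag(fig_path):
    """Return the --fig fragment and the temporary file name for a figure output."""
    if not fig_path:
        return "", ""  # no figure requested
    # splitext gives (".../name", ".pdf"); drop the leading dot
    ext = path.splitext(fig_path)[1][1:]
    return f"--fig {ext}", f"out.{ext}"


print(fig_flag("CN_1000000.saddle.pdf"))  # ('--fig pdf', 'out.pdf')
```

Changing the output to, say, `CN_{resolution}.saddle.png` would therefore request a PNG, provided the underlying tool supports that format.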
CUTADAPT¶
For cutadapt, the following wrappers are available:
CUTADAPT-PE¶
Trim paired-end reads using cutadapt.
URL: https://github.com/marcelm/cutadapt
Example¶
This wrapper can be used in the following way:
rule cutadapt:
    input:
        ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
    output:
        fastq1="trimmed/{sample}.1.fastq",
        fastq2="trimmed/{sample}.2.fastq",
        qc="trimmed/{sample}.qc.txt",
    params:
        # https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
        adapters="-a AGAGCACACGTCTGAACTCCAGTCAC -g AGATCGGAAGAGCACACGT -A AGAGCACACGTCTGAACTCCAGTCAC -G AGATCGGAAGAGCACACGT",
        # https://cutadapt.readthedocs.io/en/stable/guide.html#
        extra="--minimum-length 1 -q 20",
    log:
        "logs/cutadapt/{sample}.log",
    threads: 4  # set desired number of threads here
    wrapper:
        "v2.2.1/bio/cutadapt/pe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- The adapters param allows for separately specifying adapter options (optional).
Software dependencies¶
cutadapt=4.4
Input/Output¶
Input:
- two (paired-end) fastq files
Output:
- two trimmed (paired-end) fastq files
- text file containing trimming statistics
Authors¶
- Julian de Ruiter
- David Laehnemann
Code¶
"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
n = len(snakemake.input)
assert n == 2, "Input must contain 2 (paired-end) elements."
extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
assert (
extra != "" or adapters != ""
), "No options provided to cutadapt. Please use 'params: adapters=' or 'params: extra='."
shell(
"cutadapt"
" --cores {snakemake.threads}"
" {adapters}"
" {extra}"
" -o {snakemake.output.fastq1}"
" -p {snakemake.output.fastq2}"
" {snakemake.input}"
" > {snakemake.output.qc} {log}"
)
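To see which command line the wrapper code above ends up running, the assembly can be reproduced as a plain function. This is a sketch under stated assumptions: `cutadapt_pe_command` is a hypothetical name, the file names are placeholders, log redirection is left out, and the two assertions of the wrapper are expressed as exceptions.

```python
def cutadapt_pe_command(inputs, fastq1, fastq2, qc, threads=1, adapters="", extra=""):
    """Assemble the paired-end cutadapt call in the same order as the wrapper code."""
    if len(inputs) != 2:
        raise ValueError("Input must contain 2 (paired-end) elements.")
    if not (extra or adapters):
        raise ValueError("No options provided to cutadapt.")
    parts = [
        "cutadapt",
        f"--cores {threads}",
        adapters,
        extra,
        f"-o {fastq1}",  # trimmed first reads
        f"-p {fastq2}",  # trimmed second reads
        *inputs,
        f"> {qc}",       # the trimming report is written to stdout
    ]
    return " ".join(part for part in parts if part)


print(cutadapt_pe_command(["r1.fq", "r2.fq"], "t1.fq", "t2.fq", "qc.txt", threads=4, extra="-q 20"))
```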
CUTADAPT-SE¶
Trim single-end reads using cutadapt.
URL: https://github.com/marcelm/cutadapt
Example¶
This wrapper can be used in the following way:
rule cutadapt:
    input:
        "reads/{sample}.fastq",
    output:
        fastq="trimmed/{sample}.fastq",
        qc="trimmed/{sample}.qc.txt",
    params:
        adapters="-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="-q 20",
    log:
        "logs/cutadapt/{sample}.log",
    threads: 4  # set desired number of threads here
    wrapper:
        "v2.2.1/bio/cutadapt/se"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- The adapters param allows for separately specifying adapter options (optional).
Software dependencies¶
cutadapt=4.4
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for trimming single-end reads using cutadapt."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
n = len(snakemake.input)
assert n == 1, "Input must contain 1 (single-end) element."
extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
assert (
extra != "" or adapters != ""
), "No options provided to cutadapt. Please use 'params: adapters=' or 'params: extra='."
shell(
"cutadapt"
" --cores {snakemake.threads}"
" {adapters}"
" {extra}"
" -o {snakemake.output.fastq}"
" {snakemake.input[0]}"
" > {snakemake.output.qc} {log}"
)
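The `-a` option in the example rule declares a 3' adapter: the adapter and everything after it are removed from the read. The following toy function shows that idea for exact matches only. It is a deliberate simplification and not how cutadapt works internally, since cutadapt also tolerates mismatches and partial adapter occurrences at the read end; `trim_3prime_adapter` is a hypothetical name.

```python
def trim_3prime_adapter(read, adapter):
    """Remove the first exact adapter occurrence and everything after it."""
    index = read.find(adapter)
    # Leave the read untouched when the adapter is not present
    return read if index == -1 else read[:index]


print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGC", "AGATCGGAAGAGC"))  # ACGTACGT
```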
DADA2¶
For dada2, the following wrappers are available:
DADA2_ADD_SPECIES¶
Adding species-level annotation using the dada2 addSpecies function. Optional parameters are documented in the manual, and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_add_species:
    input:
        taxtab="results/dada2/taxa.RDS",  # Taxonomic assignments
        refFasta="resources/example_species_assignment.fa.gz"  # Reference FASTA
    output:
        "results/dada2/taxa-sp.RDS",  # Taxonomic + Species assignments
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    # (`named_list={"name1": arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/add-species/add-species.log"
    threads: 1  # set desired number of threads here
    wrapper:
        "v2.2.1/bio/dada2/add-species"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
taxtab
: RDS file containing the taxonomic assignments
refFasta
: A string with the path to the FASTA reference database
Output:
- The input RDS file augmented by the species-level annotation
Params¶
- optional arguments for addSpecies(); please provide them as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for adding species-level
# annotation using dada2 addSpecies function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
taxtab = readRDS(snakemake@input[["taxtab"]]),
refFasta = snakemake@input[["refFasta"]]
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::addSpecies()")
}
# Add species-level annotation to the taxonomic table
taxa.sp<-do.call(addSpecies, args)
# Store the taxonomic assignments as a RDS file
saveRDS(taxa.sp, snakemake@output[[1]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
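The DADA2 wrappers all assemble their function call the same way: required inputs first, then every named entry from `params`. Although the wrappers themselves are written in R, the idea can be sketched in Python. In this sketch, `build_call_args` is a hypothetical name and the list of `(name, value)` pairs is only a stand-in for `snakemake@params`, where unnamed entries carry an empty name.

```python
def build_call_args(required, params):
    """Merge required arguments with the named entries of a params list."""
    # Keep only named entries, mirroring names(snakemake@params) != "" in R
    extra = {name: value for name, value in params if name != ""}
    return {**required, **extra}


args = build_call_args(
    {"taxtab": "results/dada2/taxa.RDS", "refFasta": "resources/ref.fa.gz"},
    [("", True), ("verbose", True)],
)
print(args)
```

One difference from the R code is worth keeping in mind: the dictionary merge lets a param override a required argument of the same name, whereas `c(args, extra)` in R keeps both entries, which `do.call()` would then reject as a duplicated argument.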
DADA2_ASSIGN_SPECIES¶
Classifying sequences against a reference database using the dada2 assignSpecies function. Optional parameters are documented in the manual, and an example of the function can be found in the dedicated section of the DADA2 website.
Example¶
This wrapper can be used in the following way:
rule dada2_assign_species:
    input:
        seqs="results/dada2/seqTab.nochim.RDS",  # Chimera-free sequence table
        refFasta="resources/species.fasta"  # Reference FASTA for Genus-Species taxonomy
    output:
        "results/dada2/genus-species-taxa.RDS"  # Genus-Species taxonomic assignments
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    # (`named_list={"name1": arg1}`).
    #params:
    #    allowMultiple=True
    log:
        "logs/dada2/assign-species/assign-species.log"
    threads: 1  # set desired number of threads here
    wrapper:
        "v2.2.1/bio/dada2/assign-species"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
seqs
: RDS file with the chimera-free sequence table
refFasta
: A string with the path to the genus-species FASTA reference database
Output:
- RDS file containing the genus and species taxonomic assignments
Params¶
- optional arguments for assignSpecies(); please provide them as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for exact matching of sequences against
# a genus-species reference database using dada2 assignSpecies function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
seqs = readRDS(snakemake@input[["seqs"]]),
refFasta = snakemake@input[["refFasta"]]
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::assignSpecies()")
}
# Perform Genus-Species taxonomic assignments
taxa<-do.call(assignSpecies, args)
# Store the taxonomic assignments as a RDS file
saveRDS(taxa, snakemake@output[[1]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_ASSIGN_TAXONOMY¶
Classifying sequences against a reference database using the dada2 assignTaxonomy function. Optional parameters are documented in the manual, and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_assign_taxonomy:
    input:
        seqs="results/dada2/seqTab.nochim.RDS",  # Chimera-free sequence table
        refFasta="resources/example_train_set.fa.gz"  # Reference FASTA for taxonomy
    output:
        "results/dada2/taxa.RDS"  # Taxonomic assignments
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    # (`named_list={"name1": arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/assign-taxonomy/assign-taxonomy.log"
    threads: 1  # set desired number of threads here
    wrapper:
        "v2.2.1/bio/dada2/assign-taxonomy"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
seqs
: RDS file with the chimera-free sequence table
refFasta
: A string with the path to the FASTA reference database
Output:
- RDS file containing the taxonomic assignments
Params¶
- optional arguments for assignTaxonomy(); please provide them as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for classifying sequences against
# a reference database using dada2 assignTaxonomy function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
seqs = readRDS(snakemake@input[["seqs"]]),
refFasta = snakemake@input[["refFasta"]],
multithread=snakemake@threads
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::assignTaxonomy()")
}
# Perform the taxonomic assignments
taxa<-do.call(assignTaxonomy, args)
# Store the taxonomic assignments as a RDS file
saveRDS(taxa, snakemake@output[[1]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_COLLAPSE_NOMISMATCH¶
Combine sequences that are identical up to shifts and/or indels using the dada2 collapseNoMismatch function. Optional parameters are documented in the manual. While the function is not included in the tutorial, feel free to browse the dada2 issues for showcases.
Example¶
This wrapper can be used in the following way:
rule dada2_collapse_nomismatch:
    input:
        "results/dada2/seqTab.nochimeras.RDS"  # Chimera-free sequence table
    output:
        "results/dada2/seqTab.collapsed.RDS"
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    # (`named_list={"name1": arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
    threads: 1  # set desired number of threads here
    wrapper:
        "v2.2.1/bio/dada2/collapse-nomismatch"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- RDS file with the chimera-free sequence table
Output:
- RDS file with the sequence table where the needed sequences were collapsed
Params¶
- optional arguments for collapseNoMismatch(), provided as Python key=value pairs
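The way the wrapper code assembles the final call can be sketched in Python. This is only an illustration of the logic in the R script shown in the Code section: `build_call_args` is a hypothetical helper, not part of Snakemake or dada2, and the placeholder string stands in for the sequence table read from the RDS file.

```python
def build_call_args(required, positional_params, named_params):
    """Mimic the wrapper's argument assembly (hypothetical helper).

    Unnamed params are dropped, named params are added to the required
    arguments. Note: R's c(args, extra) appends, whereas dict.update()
    would overwrite a duplicated key; avoid duplicates either way.
    """
    args = dict(required)
    if not positional_params and not named_params:
        print("No optional parameters. Using default parameters.")
        return args
    args.update(named_params)
    return args


args = build_call_args(
    required={"seqtab": "<sequence table>"},
    positional_params=["this unnamed value is ignored"],
    named_params={"verbose": True, "minOverlap": 20},
)
print(args)
```

Only named values reach the dada2 function, so an unnamed entry under `params:` has no effect on the call.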
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for combining together sequences that are identical
# up to shifts and/or indels using dada2 collapseNoMismatch function
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
seqtab = readRDS(snakemake@input[[1]])
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::collapseNoMismatch()")
}
# Collapse sequences
taxa<-do.call(collapseNoMismatch, args)
# Store the resulting table as a RDS file
saveRDS(taxa, snakemake@output[[1]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_DEREPLICATE_FASTQ¶
DADA2
Dereplication of FASTQ files using dada2 derepFastq
function. Optional parameters are documented in the manual. Although the function is not introduced explicitly in the tutorial, it is used under the hood in the learnErrors section.
Example¶
This wrapper can be used in the following way:
rule dada2_dereplicate_fastq:
input:
# Quality filtered FASTQ file
"filtered/{fastq}.fastq"
output:
# Dereplicated sequences stored as `derep-class` object in a RDS file
"uniques/{fastq}.RDS"
log:
"logs/dada2/dereplicate-fastq/{fastq}.log"
wrapper:
"v2.2.1/bio/dada2/dereplicate-fastq"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Params¶
- optional arguments for derepFastq(), provided as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for dereplicating FASTQ files using dada2 derepFastq function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list( fls = unlist(snakemake@input))
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::derepFastq()")
}
# Dereplicate
uniques<-do.call(derepFastq, args)
# Store as RDS file
saveRDS(uniques,snakemake@output[[1]])
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_FILTER_TRIM¶
DADA2
Quality filtering of single or paired-end reads using dada2 filterAndTrim
function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_filter_trim_se:
input:
# Single-end files without primers sequences
fwd="trimmed/{sample}.1.fastq.gz"
output:
filt="filtered-se/{sample}.1.fastq.gz",
stats="reports/dada2/filter-trim-se/{sample}.tsv"
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
params:
# Set the maximum expected errors tolerated in filtered reads
maxEE=1,
# Set the number of kept bases to 7 for the toy example
truncLen=7,
# Set minLen to 1 for the toy example but default is 20
minLen=1
log:
"logs/dada2/filter-trim-se/{sample}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/filter-trim"
rule dada2_filter_trim_pe:
input:
# Paired-end files without primers sequences
fwd="trimmed/{sample}.1.fastq",
rev="trimmed/{sample}.2.fastq"
output:
filt="filtered-pe/{sample}.1.fastq.gz",
filt_rev="filtered-pe/{sample}.2.fastq.gz",
stats="reports/dada2/filter-trim-pe/{sample}.tsv"
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
params:
# Set the maximum expected errors tolerated in filtered reads
maxEE=1,
# Set the number of kept bases in forward and reverse reads
# respectively to 7 for the toy example
truncLen=[7,6],
# Set minLen to 1 for the toy example but default is 20
minLen=1
log:
"logs/dada2/filter-trim-pe/{sample}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/filter-trim"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- fwd: a forward FASTQ file (potentially compressed) without primer sequences
- rev: an (optional) reverse FASTQ file (potentially compressed) without primer sequences
Output:
- filt: a compressed filtered forward FASTQ file
- filt_rev: an (optional) compressed filtered reverse FASTQ file
- stats: a .tsv file with the number of processed and filtered reads per sample
Params¶
- optional arguments for filterAndTrim(), provided as Python key=value pairs
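One option is reserved: the wrapper code rejects `compress=` in `params` and derives it from the output file extensions instead. A minimal Python sketch of that rule follows; `infer_compress` is a hypothetical name used for illustration, and the R script in the Code section remains the reference.

```python
def infer_compress(filt, filt_rev=None):
    """Derive the compress setting from the output paths (illustrative).

    Returns True when every output ends in '.gz', False when none does,
    and raises when forward and reverse outputs disagree.
    """
    outputs = [filt] if filt_rev is None else [filt, filt_rev]
    compressed = [path.endswith(".gz") for path in outputs]
    if all(compressed):
        return True
    if any(compressed):
        raise ValueError(
            "Either all or no FASTQ outputs should be compressed."
        )
    return False


# Single-end and paired-end outputs taken from the example rules
print(infer_compress("filtered-se/a.1.fastq.gz"))
print(infer_compress("filtered-pe/a.1.fastq.gz", "filtered-pe/a.2.fastq.gz"))
```

In practice this means the forward and reverse outputs should use the same extension, either both ending in .gz or neither.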
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for filtering single or paired-end reads using dada2 filterAndTrim function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
fwd = snakemake@input[["fwd"]],
filt = snakemake@output[["filt"]],
multithread=snakemake@threads
)
# Test if paired end input is passed
if(!is.null(snakemake@input[["rev"]]) & !is.null(snakemake@output[["filt_rev"]])){
args<-c(args,
rev = snakemake@input[["rev"]],
filt.rev = snakemake@output[["filt_rev"]]
)
}
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Check if 'compress=' option is passed
if(!is.null(extra[["compress"]])){
stop("Remove the `compress=` option from `params`.\n",
"The `compress` option is implicitly set here from the file extension.")
} else {
# Check if output files are given as compressed files
# ex: in se version, all(TRUE, NULL) gives TRUE
compressed <- c(
endsWith(args[["filt"]], '.gz'),
if(is.null(args[["filt.rev"]])) NULL else {endsWith(args[["filt.rev"]], '.gz')}
)
if ( all(compressed) ) {
extra[["compress"]] <- TRUE
} else if ( any(compressed) ) {
stop("Either all or no fastq output should be compressed. Please check `output.filt` and `output.filt_rev` for consistency.")
} else {
extra[["compress"]] <- FALSE
}
}
# Add them to the list of arguments
args<-c(args, extra)
} else {
message("No optional parameters. Using default parameters from dada2::filterAndTrim()")
}
# Call the function with arguments
filt.stats<-do.call(filterAndTrim, args)
# Write processed reads report
write.table(filt.stats, snakemake@output[["stats"]], sep="\t", quote=F)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_LEARN_ERRORS¶
DADA2
Learning error rates separately on paired-end data using dada2 learnErrors
function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule learn_pe:
# Run twice dada2_learn_errors: on forward and on reverse reads
input: expand("results/dada2/model_{orientation}.RDS", orientation=[1,2])
rule dada2_learn_errors:
input:
# Quality filtered and trimmed forward FASTQ files (potentially compressed)
expand("filtered/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
output:
err="results/dada2/model_{orientation}.RDS",# save the error model
plot="reports/dada2/errors_{orientation}.png",# plot observed and estimated rates
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
#params:
# randomize=True
log:
"logs/dada2/learn-errors/learn-errors_{orientation}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/learn-errors"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
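The doubled braces in the input pattern of dada2_learn_errors keep {orientation} as a wildcard while {sample} is filled in. The sketch below imitates this with plain `str.format()`; `expand_samples` is a hypothetical stand-in written for this illustration and is not Snakemake's own `expand()`.

```python
def expand_samples(pattern, samples):
    """Fill {sample} and leave {{orientation}} behind as {orientation}.

    Hypothetical stand-in for the expand() call in the rule above,
    built on str.format(), where '{{' and '}}' are escaped braces.
    """
    return [pattern.format(sample=sample) for sample in samples]


paths = expand_samples("filtered/{sample}.{{orientation}}.fastq.gz", ["a", "b"])
print(paths)
# ['filtered/a.{orientation}.fastq.gz', 'filtered/b.{orientation}.fastq.gz']
```

The remaining {orientation} wildcard is then resolved per job, which is why the rule runs once for the forward reads and once for the reverse reads.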
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- A list of quality filtered and trimmed forward FASTQ files (potentially compressed)
Output:
- err: RDS file with the stored error model
- plot: plot of observed vs estimated error rates
Params¶
- optional arguments for learnErrors(), provided as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for learning error rates on sequence data using dada2 learnErrors function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
fls = snakemake@input,
multithread=snakemake@threads
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::learnErrors()")
}
# Learn the error rates from the input reads
err<-do.call(learnErrors, args)
# Plot estimated versus observed error rates to validate models
perr<-plotErrors(err, nominalQ = TRUE)
# Save the plots
library(ggplot2)
ggsave(snakemake@output[["plot"]], perr, width = 8, height = 8, dpi = 300)
# Store the estimated errors as RDS files
saveRDS(err, snakemake@output[["err"]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_MAKE_TABLE¶
DADA2
Build a sequence - sample table from denoised samples using dada2 makeSequenceTable
function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_make_table_se:
input:
# Inferred composition
expand("denoised/{sample}.1.RDS", sample=['a','b'])
output:
"results/dada2/seqTab-se.RDS"
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
params:
names=['a','b'] # Sample names instead of paths
log:
"logs/dada2/make-table/make-table-se.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/make-table"
rule dada2_make_table_pe:
input:
# Merged composition
expand("merged/{sample}.RDS", sample=['a','b'])
output:
"results/dada2/seqTab-pe.RDS"
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
params:
names=['a','b'], # Sample names instead of paths
orderBy="nsamples" # Change the ordering of samples
log:
"logs/dada2/make-table/make-table-pe.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/make-table"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- A list of RDS files with denoised samples (se), or denoised and merged samples (pe)
Output:
- RDS file with the table
Params¶
- names: A list of sample names instead of paths
- params: Any other optional arguments for makeSequenceTable(), provided as Python key=value pairs
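The special role of `names` can be sketched as follows: it labels the input files and is kept out of the arguments forwarded to makeSequenceTable(). Both helpers are hypothetical and only mirror the logic of the R script in the Code section; the length check is an added assumption, not something the wrapper enforces.

```python
def split_make_table_params(params):
    """Separate the sample names from the forwarded options (illustrative)."""
    names = params.get("names")
    extra = {key: value for key, value in params.items() if key != "names"}
    return names, extra


def label_inputs(paths, names):
    """Pair each input path with its sample name.

    Assumption for this sketch: one name per input file, in input order.
    """
    if len(names) != len(paths):
        raise ValueError("Provide exactly one sample name per input file.")
    return dict(zip(names, paths))


names, extra = split_make_table_params(
    {"names": ["a", "b"], "orderBy": "nsamples"}
)
samples = label_inputs(["merged/a.RDS", "merged/b.RDS"], names)
print(samples)
print(extra)
```

The order of `names` therefore has to match the order of the input files.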
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for building a sequence - sample table from denoised samples using dada2 makeSequenceTable function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# If names are provided use them
nm<-if(is.null(snakemake@params[["names"]])) NULL else snakemake@params[["names"]]
# From a list of n lists to one named list of n elements
smps<-setNames(
object=unlist(snakemake@input),
nm=nm
)
# Read the RDS into the list
smps<-lapply(smps, readRDS)
# Prepare arguments (no matter the order)
args<-list( samples = smps)
# Check if extra params are passed (apart from [["names"]])
if(length(snakemake@params) > 1 ){
# Keeping only the named elements of the list for do.call() (apart from [["names"]])
extra<-snakemake@params[ names(snakemake@params) != "" & names(snakemake@params) != "names" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::makeSequenceTable()")
}
# Make table
seqTab<-do.call(makeSequenceTable, args)
# Store the table as a RDS file
saveRDS(seqTab, snakemake@output[[1]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_MERGE_PAIRS¶
DADA2
Merging denoised forward and reverse reads using dada2 mergePairs
function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_merge_pairs:
input:
dadaF="denoised/{sample}.1.RDS",# Inferred composition
dadaR="denoised/{sample}.2.RDS",
derepF="uniques/{sample}.1.RDS",# Dereplicated sequences
derepR="uniques/{sample}.2.RDS"
output:
"merged/{sample}.RDS"
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
#params:
# verbose=True
log:
"logs/dada2/merge-pairs/{sample}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/merge-pairs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
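- dadaF: RDS file with the inferred sample composition from forward reads
- dadaR: RDS file with the inferred sample composition from reverse reads
- derepF: RDS file with the dereplicated forward reads
- derepR: RDS file with the dereplicated reverse reads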
Output:
- RDS file with the merged pairs
Params¶
- optional arguments for mergePairs(), provided as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for merging denoised forward and reverse reads using dada2 mergePairs function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
dadaF = snakemake@input[["dadaF"]],
derepF = snakemake@input[["derepF"]],
dadaR = snakemake@input[["dadaR"]],
derepR = snakemake@input[["derepR"]]
)
# Read RDS from the list
args<-sapply(args,readRDS)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::mergePairs()")
}
# Merge pairs
merger<-do.call(mergePairs, args)
# Store the merged pairs as a RDS file
saveRDS(merger, snakemake@output[[1]],compress = T)
# Close the connection for the log file
sink(type="message")
sink()
DADA2_QUALITY_PROFILES¶
DADA2
Plotting the quality profile of reads using dada2 plotQualityProfile
function. The function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_quality_profile_se:
input:
# FASTQ file without primers sequences
"trimmed/{sample}.{orientation}.fastq"
output:
"reports/dada2/quality-profile/{sample}.{orientation}-quality-profile.png"
log:
"logs/dada2/quality-profile/{sample}.{orientation}-quality-profile-se.log"
wrapper:
"v2.2.1/bio/dada2/quality-profile"
rule dada2_quality_profile_pe:
input:
# FASTQ file without primers sequences
expand("trimmed/{{sample}}.{orientation}.fastq",orientation=[1,2])
output:
"reports/dada2/quality-profile/{sample}-quality-profile.png"
log:
"logs/dada2/quality-profile/{sample}-quality-profile-pe.log"
wrapper:
"v2.2.1/bio/dada2/quality-profile"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- a FASTQ file (potentially compressed) without primers sequences
Output:
- A PNG file of the quality plot
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for plotting the quality profile of reads using dada2 plotQualityProfile function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Plot the quality profile for a given FASTQ file or a list of files
pquality<-plotQualityProfile(unlist(snakemake@input))
# Write the plots to files
library(ggplot2)
ggsave(snakemake@output[[1]], pquality, width = 4, height = 3, dpi = 300)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_REMOVE_CHIMERAS¶
DADA2
Remove chimera sequences from the sequence table data using dada2 removeBimeraDenovo
function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_remove_chimeras:
input:
"results/dada2/seqTab.RDS" # Sequence table
output:
"results/dada2/seqTab.nochim.RDS" # Chimera-free sequence table
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
#params:
# verbose=True
log:
"logs/dada2/remove-chimeras/remove-chimeras.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/remove-chimeras"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- RDS file with the sequence table
Output:
- RDS file with the chimera-free sequence table
Params¶
- optional arguments for removeBimeraDenovo(), provided as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for removing chimeras sequences from
# the sequence table data using dada2 removeBimeraDenovo function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
unqs = readRDS(snakemake@input[[1]]),
multithread=snakemake@threads
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::removeBimeraDenovo()")
}
# Remove chimeras
seqTab_nochimeras<-do.call(removeBimeraDenovo, args)
# Store the chimera-free sequence table as a RDS file
saveRDS(seqTab_nochimeras, snakemake@output[[1]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DADA2_SAMPLE_INFERENCE¶
DADA2
Inferring sample composition using dada2 dada
function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule dada2_sample_inference:
input:
# Dereplicated (aka unique) sequences of the sample
derep="uniques/{fastq}.RDS",
err="results/dada2/model_1.RDS" # Error model
output:
"denoised/{fastq}.RDS" # Inferred sample composition
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={"name1": arg1}`).
#params:
# verbose=True
log:
"logs/dada2/sample-inference/{fastq}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/sample-inference"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- derep: RDS file with the dereplicated sequences
- err: RDS file with the error model
Output:
- RDS file with the stored inferred sample composition
Params¶
- optional arguments for dada(), provided as Python key=value pairs
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for inferring sample composition using dada2 dada function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
derep = readRDS(snakemake@input[["derep"]]),
err = readRDS(snakemake@input[["err"]]),
multithread = snakemake@threads
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using default parameters from dada2::dada()")
}
# Infer the sample composition
inferred_composition<-do.call(dada, args)
# Store the inferred sample composition as RDS files
saveRDS(inferred_composition, snakemake@output[[1]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()
DATAVZRD¶
datavzrd renders tables from a configuration file. Configuration templates can be customized dynamically using the rendering integration. Any file referenced in the configuration file must also be listed as an additional input file of the datavzrd rule.
URL: https://github.com/datavzrd/datavzrd
Example¶
This wrapper can be used in the following way:
rule datavzrd:
input:
config="resources/{sample}.datavzrd.yaml",
# optional files required for rendering the given config
table="data/A.tsv",
params:
extra="",
output:
report(
directory("results/datavzrd-report/{sample}"),
htmlindex="index.html",
# see https://snakemake.readthedocs.io/en/stable/snakefiles/reporting.html
# for additional options like caption, categories and labels
),
log:
"logs/datavzrd_report/{sample}.log",
wrapper:
"v2.2.1/utils/datavzrd"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
datavzrd=2.21.1
Authors¶
- Felix Mölder
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("datavzrd {snakemake.input.config} {extra} --output {snakemake.output[0]} {log}")
DEEPTOOLS¶
For deeptools, the following wrappers are available:
DEEPTOOLS ALIGNMENT-SIEVE¶
Filters/shifts alignments in a BAM/CRAM file according to the specified parameters. It can optionally output to BEDPE format.
URL: https://deeptools.readthedocs.io/en/develop/content/tools/alignmentSieve.html
Example¶
This wrapper can be used in the following way:
rule test_deeptools_alignment_sieve:
input:
aln="a.bam",
output:
"filtered.bam",
threads: 1
log:
"logs/deeptools.log",
params:
extra="",
wrapper:
"v2.2.1/bio/deeptools/alignmentsieve"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Input/output formats are automatically detected.
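As a rough illustration of what this detection amounts to on the output side, the Python sketch below reproduces how the wrapper code extends the extra options; `alignment_sieve_extra` is a hypothetical function name, and the flags are the ones that appear in the Code section.

```python
def alignment_sieve_extra(out_file, extra="", blacklist=""):
    """Build the extra options from the rule's files (illustrative)."""
    if blacklist:
        # An optional blacklist input is forwarded as a command line option
        extra += f" --blackListFileName {blacklist} "
    if out_file.endswith(".bed"):
        # A .bed output switches alignmentSieve to BEDPE output
        extra += " --BED "
    return extra


print(alignment_sieve_extra("filtered.bam"))
print(alignment_sieve_extra("filtered.bed", blacklist="regions.bed"))
```

Naming the output with a .bed extension is therefore enough to request BEDPE intervals.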
Software dependencies¶
deeptools=3.5.1
Input/Output¶
Input:
- aln: Path to BAM/CRAM formatted alignments. BAM files must be indexed.
Output:
- Path to filtered bam alignments or bedpe intervals.
Params¶
extra
: Optional arguments for alignmentSieve.py
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
blacklist = snakemake.input.get("blacklist", "")
if blacklist:
extra += f" --blackListFileName {blacklist} "
out_file = snakemake.output[0]
if out_file.endswith(".bed"):
extra += " --BED "
shell(
"alignmentSieve "
"{extra} "
"--numberOfProcessors {snakemake.threads} "
"--bam {snakemake.input.aln} "
"--outFile {out_file} "
"{log} "
)
DEEPTOOLS BAMCOVERAGE¶
deepTools bamcoverage
takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output. For more information about deepTools
, also see the source code.
URL: https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html?highlight=bamcoverage
Example¶
This wrapper can be used in the following way:
rule test_deeptools_bamcoverage:
input:
bam="a.sorted.bam",
bai="a.sorted.bam.bai",
# Optional path to a blacklist bed file
# blacklist="",
output:
"a.coverage.bw",
params:
effective_genome_size=1000,
extra="",
log:
"logs/coverage.log",
wrapper:
"v2.2.1/bio/deeptools/bamcoverage"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
deeptools=3.5.2
Input/Output¶
Input:
- bam: Path to alignment (BAM) file
- blacklist: Path to optional blacklist region file (BED)
Output:
- Path to coverage file
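The coverage format is chosen from the extension of the output file. The following Python sketch mirrors the check in the wrapper code; `coverage_output_format` is a hypothetical name introduced for this example.

```python
def coverage_output_format(path):
    """Map an output file name to the bamCoverage output format (illustrative)."""
    extension = str(path).split(".")[-1].lower()
    if extension in ("bw", "bigwig"):
        return "bigwig"
    if extension in ("bg", "bedgraph"):
        return "bedgraph"
    raise ValueError("Output file should be either a bigwig or a bedgraph file")


print(coverage_output_format("a.coverage.bw"))
print(coverage_output_format("a.coverage.bedGraph"))
```

Any other extension is rejected before bamCoverage is started.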
Params¶
- effective_genome_size: Optional effective genome size value
- genome: Optional parameter used to fill effective genome size with pre-computed parameters. Can only be one of GRCm37, GRCm38, GRCh37, GRCh38, dm3, dm6, WBcel235, or GRCz10.
- read_length: Optional parameter used to fill effective genome size with pre-computed parameters. Can only be one of 50, 75, 100, 150, or 200.
- extra: Optional parameters to be given to deepTools bamcoverage
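How these three params interact can be sketched in Python: an explicit effective_genome_size wins, otherwise genome and read_length select a pre-computed value. This is a simplified illustration; `effective_genome_size_flag` is a hypothetical helper, the table is a one-genome excerpt of the values listed in the Code section, and returning an empty string when nothing is set is an assumption of this sketch.

```python
# Excerpt of the pre-computed table (GRCh38 only), values as listed in the wrapper code
PRECOMPUTED = {
    "GRCh38": {
        "50": 2701495761,
        "75": 2747877777,
        "100": 2805636331,
        "150": 2862010578,
        "200": 2887553303,
    },
}


def effective_genome_size_flag(params):
    """Return the --effectiveGenomeSize option, or '' if it cannot be set."""
    size = params.get("effective_genome_size")
    if not size:
        genome = params.get("genome")
        read_length = params.get("read_length")
        if genome and read_length:
            # Table keys are strings, so an integer read length is converted
            size = PRECOMPUTED[genome][str(read_length)]
    return f"--effectiveGenomeSize {size}" if size else ""


print(effective_genome_size_flag({"effective_genome_size": 1000}))
print(effective_genome_size_flag({"genome": "GRCh38", "read_length": 100}))
```

A genome or read length outside the listed values raises a KeyError in this sketch, which matches the restriction stated in the parameter descriptions.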
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# See: https://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html
default_effective_genome_size = {
"GRCz10": {
"50": 1195445591,
"75": 1251132686,
"100": 1280189044,
"150": 1312207169,
"200": 1321355241,
},
"WBcel235": {
"50": 95159452,
"75": 96945445,
"100": 98259998,
"150": 98721253,
"200": 98672758,
},
"dm3": {
"50": 130428560,
"75": 135004462,
"100": 139647232,
"150": 144307808,
"200": 148524010,
},
"dm6": {
"50": 125464728,
"75": 127324632,
"100": 129789873,
"150": 129941135,
"200": 132509163,
},
"GRCh37": {
"50": 2685511504,
"75": 2736124973,
"100": 2776919808,
"150": 2827437033,
"200": 2855464000,
},
"GRCh38": {
"50": 2701495761,
"75": 2747877777,
"100": 2805636331,
"150": 2862010578,
"200": 2887553303,
},
"GRCm37": {
"50": 2304947926,
"75": 2404646224,
"100": 2462481010,
"150": 2489384235,
"200": 2513019276,
},
"GRCm38": {
"50": 2308125349,
"75": 2407883318,
"100": 2467481108,
"150": 2494787188,
"200": 2520869189,
},
}
effective_genome_size = snakemake.params.get("effective_genome_size")
if not effective_genome_size:
    genome = snakemake.params.get("genome")
    read_length = snakemake.params.get("read_length")
    if genome and read_length:
        # Look up the pre-computed size; cast to str before concatenating
        effective_genome_size = "--effectiveGenomeSize " + str(
            default_effective_genome_size[genome][str(read_length)]
        )
    else:
        # No size available: do not pass the option at all
        effective_genome_size = ""
else:
    effective_genome_size = "--effectiveGenomeSize " + str(effective_genome_size)
output_format = ""
bigwig_format = ["bw", "bigwig"]
bedgraph_format = ["bg", "bedgraph"]
output_ext = str(snakemake.output[0]).split(".")[-1].lower()
if output_ext in bigwig_format:
    output_format = "bigwig"
elif output_ext in bedgraph_format:
    output_format = "bedgraph"
else:
    raise ValueError("Output file should be either a bigwig or a bedgraph file")
blacklist = snakemake.input.get("blacklist", "")
if blacklist:
    blacklist = "--blackListFileName " + blacklist
shell(
"bamCoverage "
"{blacklist} {extra} "
"--numberOfProcessors {snakemake.threads} "
"{effective_genome_size} "
"--bam {snakemake.input.bam} "
"--outFileName {snakemake.output} "
"--outFileFormat {output_format} "
"{log} "
)
DEEPTOOLS COMPUTEMATRIX¶
deepTools computeMatrix calculates scores per genomic region. The matrix file can be used as input for other tools or for the generation of a deepTools plotHeatmap or deepTools plotProfile. For usage information about deepTools computeMatrix, please see the documentation. For more information about deepTools, also see the source code.
| computeMatrix option | Output format | Name of output variable to be used | Recommended extension |
|---|---|---|---|
| --outFileName, -out, -o | gzipped matrix file | matrix_gz (required) | ".gz" |
| --outFileNameMatrix | tab-separated table of matrix file | matrix_tab | ".tab" |
| --outFileSortedRegions | BED matrix file with sorted regions after skipping zeros or min/max threshold values | matrix_bed | ".bed" |
URL: https://deeptools.readthedocs.io/en/develop/content/tools/computeMatrix.html
Example¶
This wrapper can be used in the following way:
rule compute_matrix:
    input:
        # Please note that the -R and -S options are defined via input files
        bed=expand("{sample}.bed", sample=["a", "b"]),
        bigwig=expand("{sample}.bw", sample=["a", "b"]),
        # Optional blacklist file
        # blacklist="",
    output:
        # Please note that --outFileName, --outFileNameMatrix and --outFileSortedRegions are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/computematrix.html.
        matrix_gz="matrix_files/matrix.gz",  # required
        # optional output files
        matrix_tab="matrix_files/matrix.tab",
        matrix_bed="matrix_files/matrix.bed",
    log:
        "logs/deeptools/compute_matrix.log",
    params:
        # required argument, choose "scale-regions" or "reference-point"
        command="scale-regions",
        # optional parameters
        extra="--regionBodyLength 200 --verbose",
    wrapper:
        "v2.2.1/bio/deeptools/computematrix"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Bigwig DeepBlue URL, if any, should be given in params.extra, or downloaded separately.
Software dependencies¶
deeptools=3.5.2
Input/Output¶
Input:
- bed: Path to BED or GTF files (.bed or .gtf) AND
- bigwig: Path to bigWig files (.bw)
Output:
- matrix_gz: gzipped matrix file (.gz) AND/OR
- matrix_tab: tab-separated table of matrix file (.tab) AND/OR
- matrix_bed: BED matrix file with sorted regions after skipping zeros or min/max threshold values (.bed)
Params¶
- command: Either scale-regions or reference-point
- extra: Optional parameters given to computeMatrix
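The link between named outputs and computeMatrix options can be sketched as below. This is a simplified illustration of the mapping listed for this wrapper, not the wrapper code itself; the helper name `build_optional_output` and the plain dict standing in for `snakemake.output` are assumptions.

```python
# Output variable -> computeMatrix option, as listed in the option table for this wrapper.
OPTIONAL_OUTPUTS = {
    "matrix_tab": "--outFileNameMatrix",
    "matrix_bed": "--outFileSortedRegions",
}


def build_optional_output(outputs):
    """Hypothetical helper: turn named optional outputs into a command-line fragment."""
    parts = []
    for name, option in OPTIONAL_OUTPUTS.items():
        path = outputs.get(name)
        if path:
            # Only outputs that were actually declared add an option
            parts.append(f"{option} {path}")
    return " ".join(parts)


print(build_optional_output({"matrix_gz": "m.gz", "matrix_tab": "m.tab"}))
```

Because the options are derived from the declared outputs, passing them again through extra would duplicate them on the command line.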
Authors¶
- Antonie Vietor
- Thibault Dayris
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
from tempfile import TemporaryDirectory
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
blacklist = snakemake.input.get("blacklist", "")
if blacklist:
    blacklist = f"--blackListFileName {blacklist}"
out_tab = snakemake.output.get("matrix_tab")
out_bed = snakemake.output.get("matrix_bed")
optional_output = ""
if out_tab:
    optional_output += " --outFileNameMatrix {out_tab} ".format(out_tab=out_tab)
if out_bed:
    optional_output += " --outFileSortedRegions {out_bed} ".format(out_bed=out_bed)
with TemporaryDirectory() as tempdir:
    temp = ""
    if "deepBlueURL" in snakemake.params.extra:
        temp = f"--deepBlueTempDir {tempdir}"
    shell(
        "computeMatrix "
        "{snakemake.params.command} "
        "{snakemake.params.extra} "
        "--numberOfProcessors {snakemake.threads} "
        "-R {snakemake.input.bed} "
        "-S {snakemake.input.bigwig} "
        "-o {snakemake.output.matrix_gz} "
        "{blacklist} {optional_output} {temp} {log}"
    )
DEEPTOOLS PLOTCOVERAGE¶
deepTools plotCoverage assesses the sequencing depth of given samples. For more information about deepTools, also see the source code.
URL: https://deeptools.readthedocs.io/en/develop/content/tools/plotCoverage.html
Example¶
This wrapper can be used in the following way:
rule test_deeptools_plotcoverage:
    input:
        # Optional blacklisted regions
        # blacklist="",
        # Optional region file
        # bed="",
        bams=["a.bam"],
        bais=["a.bam.bai"],
    output:
        plot="coverage.png",
        # Optional raw counts
        raw_counts="coverage.raw",
        # Optional metrics
        metrics="coverage.metrics",
    params:
        extra="--coverageThresholds 1",
    log:
        "logs/deeptools/coverage.log"
    wrapper:
        "v2.2.1/bio/deeptools/plotcoverage"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
deeptools=3.5.2
Input/Output¶
Input:
- bams: Path to alignment (BAM)
- bed: Path to region file (BED)
- blacklist: Path to blacklisted regions (BED)
Output:
- raw_counts: Raw coverage plot
- metrics: Raw coverage metrics
- plot: Path to image
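The wrapper validates these outputs before calling plotCoverage. The following sketch mirrors the two checks visible in the wrapper code (image extension, and metrics requiring `--coverageThresholds` in extra). The function name `check_plotcoverage_outputs` and the error wording are illustrative assumptions.

```python
ACCEPTED_EXTENSIONS = ("eps", "png", "svg", "pdf")


def check_plotcoverage_outputs(plot, metrics="", extra=""):
    """Hypothetical helper mirroring the wrapper's sanity checks; returns the plot format."""
    extension = str(plot).split(".")[-1]
    if extension not in ACCEPTED_EXTENSIONS:
        raise ValueError(
            f"Wrong image format: {extension}, expected one of {ACCEPTED_EXTENSIONS}"
        )
    # A metrics file without thresholds would be empty, so it is rejected up front
    if metrics and "--coverageThresholds" not in extra:
        raise ValueError("metrics output requires --coverageThresholds in extra")
    return extension


print(check_plotcoverage_outputs("coverage.png", "coverage.metrics", "--coverageThresholds 1"))
```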
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
bed = snakemake.input.get("bed", "")
if bed:
    bed = " --BED " + bed
raw_counts = snakemake.output.get("raw_counts", "")
if raw_counts:
    raw_counts = " --outRawCounts " + raw_counts
metrics = snakemake.output.get("metrics", "")
if metrics:
    metrics = " --outCoverageMetrics " + metrics
    # Metrics are only written when coverage thresholds are requested
    if "--coverageThresholds" not in extra:
        raise ValueError(
            "Coverage metrics without a `--coverageThresholds` in "
            "extra parameters will result in an empty file. Please "
            "provide `--coverageThresholds` or remove "
            "metrics file from expected output files."
        )
blacklist = snakemake.input.get("blacklist", "")
if blacklist:
    blacklist = " --blackListFileName " + blacklist
accepted_extensions = ["eps", "png", "svg", "pdf"]
out_image_extension = str(snakemake.output["plot"]).split(".")[-1]
if out_image_extension not in accepted_extensions:
    raise ValueError(
        "Wrong image format: {ext}, expected: {expected}".format(
            ext=out_image_extension, expected=str(accepted_extensions)
        )
    )
shell(
"plotCoverage "
"{extra} {bed} {raw_counts} {metrics} {blacklist} "
"--numberOfProcessors {snakemake.threads} "
"--bamfiles {snakemake.input.bams} "
"--plotFile {snakemake.output.plot} "
"--plotFileFormat {out_image_extension} "
" {log}"
)
DEEPTOOLS PLOTFINGERPRINT¶
deepTools plotFingerprint plots a profile of cumulative read coverages from a list of indexed BAM files. For usage information about deepTools plotFingerprint, please see the documentation. For more information about deepTools, also see the source code.
In addition to required output, an optional output file of read counts can be generated by setting the output variable “counts” (see example Snakemake rule below). Also an optional output file of quality control metrics can be generated by setting the variable “qc_metrics”. If the jsd_sample is specified in the input, the results of the Jensen-Shannon distance calculation are also written to this file.
| plotFingerprint option | Output | Name of output variable to be used | Recommended extension(s) |
|---|---|---|---|
| --plotFile, -plot, -o | coverage plot | fingerprint (required) | ".png" or ".eps" or ".pdf" or ".svg" |
| --outRawCounts | tab-separated table of read counts per bin | counts | ".tab" |
| --outQualityMetrics | tab-separated table of metrics for quality control and for results of Jensen-Shannon distance calculation (optional) | qc_metrics | ".txt" |
Example¶
This wrapper can be used in the following way:
rule plot_fingerprint:
    input:
        bam_files=expand("samples/{sample}.bam", sample=["a", "b"]),
        bam_idx=expand("samples/{sample}.bam.bai", sample=["a", "b"]),
        jsd_sample="samples/b.bam"  # optional, requires qc_metrics output
    output:
        # Please note that --plotFile and --outRawCounts are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotfingerprint.html.
        fingerprint="plot_fingerprint/plot_fingerprint.png",  # required
        # optional output
        counts="plot_fingerprint/raw_counts.tab",
        qc_metrics="plot_fingerprint/qc_metrics.txt"
    log:
        "logs/deeptools/plot_fingerprint.log"
    params:
        # optional parameters
        "--numberOfSamples 200 "
    threads:
        8
    wrapper:
        "v2.2.1/bio/deeptools/plotfingerprint"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
deeptools=3.5.2
Input/Output¶
Input:
- list of BAM files (.bam) AND
- list of their index files (.bam.bai)
Output:
- plot file in image format (.png, .eps, .pdf or .svg)
- tab-separated table of read counts per bin (.tab) (optional)
- tab-separated table of metrics and JSD calculation (.txt) (optional)
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
import re
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
jsd_sample = snakemake.input.get("jsd_sample")
out_counts = snakemake.output.get("counts")
out_metrics = snakemake.output.get("qc_metrics")
optional_output = ""
jsd = ""
if jsd_sample:
    jsd += " --JSDsample {jsd} ".format(jsd=jsd_sample)
if out_counts:
    optional_output += " --outRawCounts {out_counts} ".format(out_counts=out_counts)
if out_metrics:
    optional_output += " --outQualityMetrics {metrics} ".format(metrics=out_metrics)
shell(
"(plotFingerprint "
"-b {snakemake.input.bam_files} "
"-o {snakemake.output.fingerprint} "
"{optional_output} "
"--numberOfProcessors {snakemake.threads} "
"{jsd} "
"{snakemake.params}) {log}"
)
# ToDo: remove the 'NA' string replacement when fixed in deepTools, see:
# https://github.com/deeptools/deepTools/pull/999
regex_passes = 2
# The metrics file is optional, so only post-process it when it was requested
if out_metrics:
    with open(out_metrics, "rt") as f:
        metrics = f.read()
    for i in range(regex_passes):
        metrics = re.sub("\tNA(\t|\n)", "\tnan\\1", metrics)
    with open(out_metrics, "wt") as f:
        f.write(metrics)
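The wrapper runs the NA replacement twice (regex_passes = 2). A likely reason, shown in the sketch below, is that `re.sub` only replaces non-overlapping matches: two adjacent NA fields share a tab, so the second one is skipped on the first pass. The function name `replace_na` is illustrative; the pattern and replacement are the ones used in the wrapper.

```python
import re


def replace_na(text, passes=2):
    """Replace tab-delimited 'NA' fields with 'nan', using the wrapper's pattern."""
    for _ in range(passes):
        text = re.sub("\tNA(\t|\n)", "\tnan\\1", text)
    return text


row = "sample\tNA\tNA\tNA\n"
# One pass leaves the middle NA, because its leading tab was consumed by the first match
print(repr(replace_na(row, passes=1)))  # 'sample\tnan\tNA\tnan\n'
# A second pass catches the remaining field
print(repr(replace_na(row, passes=2)))  # 'sample\tnan\tnan\tnan\n'
```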
DEEPTOOLS PLOTHEATMAP¶
deepTools plotHeatmap creates a heatmap for scores associated with genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotHeatmap, please see the documentation. For more information about deepTools, also see the source code.
You can select which optional output files are generated by adding the respective output variable with the recommended extension(s) for them (see example Snakemake rule below).
| plotHeatmap option | Output | Name of output variable to be used | Recommended extension(s) |
|---|---|---|---|
| --outFileName, -out, -o | plot image | heatmap_img (required) | ".png" or ".eps" or ".pdf" or ".svg" |
| --outFileSortedRegions | BED file with sorted regions | regions | ".bed" |
| --outFileNameMatrix | tab-separated matrix of values underlying the heatmap | heatmap_matrix | ".tab" |
Example¶
This wrapper can be used in the following way:
rule plot_heatmap:
    input:
        # matrix file from deepTools computeMatrix tool
        "matrix.gz"
    output:
        # Please note that --outFileSortedRegions and --outFileNameMatrix are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotheatmap.html.
        heatmap_img="plot_heatmap/heatmap.png",  # required
        # optional output files
        regions="plot_heatmap/heatmap_regions.bed",
        heatmap_matrix="plot_heatmap/heatmap_matrix.tab"
    log:
        "logs/deeptools/heatmap.log"
    params:
        # optional parameters
        "--plotType=fill "
    wrapper:
        "v2.2.1/bio/deeptools/plotheatmap"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
deeptools=3.5.1
Input/Output¶
Input:
- gzipped matrix file from deepTools computeMatrix (.gz)
Output:
- plot file in image format (.png, .eps, .pdf or .svg) AND/OR
- file with sorted regions after skipping zeros or min/max threshold values (.bed) AND/OR
- tab-separated matrix of values underlying the heatmap (.tab)
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
out_region = snakemake.output.get("regions")
out_matrix = snakemake.output.get("heatmap_matrix")
optional_output = ""
if out_region:
    optional_output += " --outFileSortedRegions {out_region} ".format(
        out_region=out_region
    )
if out_matrix:
    optional_output += " --outFileNameMatrix {out_matrix} ".format(
        out_matrix=out_matrix
    )
shell(
"(plotHeatmap "
"-m {snakemake.input[0]} "
"-o {snakemake.output.heatmap_img} "
"{optional_output} "
"{snakemake.params}) {log}"
)
DEEPTOOLS PLOTPROFILE¶
deepTools plotProfile plots scores over sets of genomic regions. As input, it requires a matrix file generated by deepTools computeMatrix. For usage information about deepTools plotProfile, please see the documentation. For more information about deepTools, also see the source code.
You can select which optional output files are generated by adding the respective output variable with the recommended extension for them (see example Snakemake rule below).
| plotProfile option | Output | Name of output variable to be used | Recommended extension(s) |
|---|---|---|---|
| --outFileName, -out, -o | profile plot | plot_img (required) | ".png" or ".eps" or ".pdf" or ".svg" |
| --outFileSortedRegions | BED file with sorted regions | regions | ".bed" |
| --outFileNameData | tab-separated table for average profile | data | ".tab" |
Example¶
This wrapper can be used in the following way:
rule plot_profile:
    input:
        # matrix file from deepTools computeMatrix tool
        "matrix.gz"
    output:
        # Please note that --outFileSortedRegions and --outFileNameData are exclusively defined via output files.
        # Usable output variables, their extensions and which option they implicitly call are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/deeptools/plotprofile.html.
        # Through the output variables image file and more output options for plot profile can be selected.
        plot_img="plot_profile/plot.png",  # required
        # optional output files
        regions="plot_profile/regions.bed",
        data="plot_profile/data.tab"
    log:
        "logs/deeptools/plot_profile.log"
    params:
        # optional parameters
        "--plotType=fill "
        "--perGroup "
        "--colors red yellow blue "
        "--dpi 150 "
    wrapper:
        "v2.2.1/bio/deeptools/plotprofile"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
deeptools=3.5.2
Input/Output¶
Input:
- gzipped matrix file from deepTools computeMatrix (.gz)
Output:
- plot file in image format (.png, .eps, .pdf or .svg) AND/OR
- file with sorted regions after skipping zeros or min/max threshold values (.bed) AND/OR
- tab-separated table for average profile (.tab)
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
out_region = snakemake.output.get("regions")
out_data = snakemake.output.get("data")
optional_output = ""
if out_region:
    optional_output += " --outFileSortedRegions {out_region} ".format(
        out_region=out_region
    )
if out_data:
    optional_output += " --outFileNameData {out_data} ".format(out_data=out_data)
shell(
"(plotProfile "
"-m {snakemake.input[0]} "
"-o {snakemake.output.plot_img} "
"{optional_output} "
"{snakemake.params}) {log}"
)
DEEPVARIANT¶
Call genetic variants using a deep neural network. Copyright 2017 Google LLC. BSD 3-Clause "New" or "Revised" license. URL: https://github.com/google/deepvariant
Example¶
This wrapper can be used in the following way:
rule deepvariant:
    input:
        bam="mapped/{sample}.bam",
        ref="genome/genome.fasta"
    output:
        vcf="calls/{sample}.vcf.gz"
    params:
        model="wgs",  # {wgs, wes, pacbio, hybrid}
        sample_name=lambda w: w.sample,  # optional
        extra=""
    threads: 2
    log:
        "logs/deepvariant/{sample}/stdout.log"
    wrapper:
        "v2.2.1/bio/deepvariant"

rule deepvariant_gvcf:
    input:
        bam="mapped/{sample}.bam",
        ref="genome/genome.fasta"
    output:
        vcf="gvcf_calls/{sample}.vcf.gz",
        gvcf="gvcf_calls/{sample}.g.vcf.gz"
    params:
        model="wgs",  # {wgs, wes, pacbio, hybrid}
        extra=""
    threads: 2
    log:
        "logs/deepvariant/{sample}/stdout.log"
    wrapper:
        "v2.2.1/bio/deepvariant"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- This snakemake wrapper uses the bioconda deepvariant package. Copyright 2018 Brad Chapman.
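When the optional sample_name param is omitted, the wrapper code falls back to the BAM file name without its directory and final extension. A minimal sketch of that fallback, using a hypothetical helper name `default_sample_name`:

```python
import os


def default_sample_name(bam_path):
    """Hypothetical helper: BAM file name without directory and final extension."""
    return os.path.splitext(os.path.basename(bam_path))[0]


print(default_sample_name("mapped/a.bam"))  # a
print(default_sample_name("mapped/a.sorted.bam"))  # a.sorted
```

Only the last extension is removed, so setting sample_name explicitly may be preferable for file names with several suffixes.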
Software dependencies¶
deepvariant=1.4
numpy=1.23
Authors¶
- Tetsuro Hisayoshi
- Nikos Tsardakas Renhuldt
Code¶
__author__ = "Tetsuro Hisayoshi"
__copyright__ = "Copyright 2020, Tetsuro Hisayoshi"
__email__ = "hisayoshi0530@gmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
log_dir = os.path.dirname(snakemake.log[0])
output_dir = os.path.dirname(snakemake.output[0])
# sample name defaults to basename
sample_name = snakemake.params.get(
"sample_name", os.path.splitext(os.path.basename(snakemake.input.bam))[0]
)
make_examples_gvcf = postprocess_gvcf = ""
gvcf = snakemake.output.get("gvcf", None)
if gvcf:
    make_examples_gvcf = "--gvcf {tmp_dir} "
    postprocess_gvcf = (
        "--gvcf_infile {tmp_dir}/{sample_name}.gvcf.tfrecord@{snakemake.threads}.gz "
        "--gvcf_outfile {snakemake.output.gvcf} "
    )
with tempfile.TemporaryDirectory() as tmp_dir:
    shell(
        "(dv_make_examples.py "
        "--cores {snakemake.threads} "
        "--ref {snakemake.input.ref} "
        "--reads {snakemake.input.bam} "
        "--sample {sample_name} "
        "--examples {tmp_dir} "
        "--logdir {log_dir} " + make_examples_gvcf + "{extra} \n"
        "dv_call_variants.py "
        "--cores {snakemake.threads} "
        "--outfile {tmp_dir}/{sample_name}.tmp "
        "--sample {sample_name} "
        "--examples {tmp_dir} "
        "--model {snakemake.params.model} \n"
        "dv_postprocess_variants.py "
        "--ref {snakemake.input.ref} "
        + postprocess_gvcf
        + "--infile {tmp_dir}/{sample_name}.tmp "
        "--outfile {snakemake.output.vcf} ) {log}"
    )
DELLY¶
Call variants with delly.
URL: https://github.com/dellytools/delly
Example¶
This wrapper can be used in the following way:
rule delly_bcf:
    input:
        ref="genome.fasta",
        alns=["mapped/a.bam"],
        # optional
        exclude="human.hg19.excl.tsv",
    output:
        "sv/calls.bcf",
    params:
        uncompressed_bcf=True,
        extra="",  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log",
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "v2.2.1/bio/delly"

rule delly_vcfgz:
    input:
        ref="genome.fasta",
        alns=["mapped/a.bam"],
        # optional
        exclude="human.hg19.excl.tsv",
    output:
        "sv/calls.vcf.gz",
    params:
        extra="",  # optional parameters for delly (except -g, -x)
    log:
        "logs/delly.log",
    threads: 2  # It is best to use as many threads as samples
    wrapper:
        "v2.2.1/bio/delly"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The uncompressed_bcf param sets output to uncompressed BCF (ignored if output is vcf or vcf.gz)
- The extra param allows for additional program arguments
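The output format is derived from the output file name, with uncompressed_bcf only affecting .bcf outputs. The sketch below illustrates that idea under stated assumptions: it is not the code of snakemake-wrapper-utils, the function name is hypothetical, and the returned letters are the `bcftools view -O` output types (v, z, b, u).

```python
def infer_bcftools_output_type(path, uncompressed_bcf=False):
    """Hypothetical helper: map an output file name to a bcftools -O output type."""
    if path.endswith(".vcf.gz"):
        return "z"  # compressed VCF
    if path.endswith(".vcf"):
        return "v"  # uncompressed VCF
    if path.endswith(".bcf"):
        # uncompressed_bcf is only meaningful for BCF output
        return "u" if uncompressed_bcf else "b"
    raise ValueError(f"Unsupported output extension: {path}")


print(infer_bcftools_output_type("sv/calls.bcf", uncompressed_bcf=True))  # u
print(infer_bcftools_output_type("sv/calls.vcf.gz", uncompressed_bcf=True))  # z
```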
Software dependencies¶
delly=1.1.6
bcftools=1.17
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
- BAM/CRAM file(s)
- reference genome
- BED file (optional)
Output:
- VCF/BCF with SVs.
Authors¶
- Johannes Köster
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
bcftools_opts = get_bcftools_opts(snakemake, parse_ref=False, parse_memory=False)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
exclude = snakemake.input.get("exclude", "")
if exclude:
    exclude = f"-x {exclude}"
shell(
"(OMP_NUM_THREADS={snakemake.threads} delly call"
" -g {snakemake.input.ref}"
" {exclude}"
" {extra}"
" {snakemake.input.alns} | "
# Convert output to specified format
"bcftools view"
" {bcftools_opts}"
") {log}"
)
DESEQ2¶
For deseq2, the following wrappers are available:
DESEQDATASET¶
Create a DESeqDataSet object from one of the following: a tximport SummarizedExperiment, a directory containing HTSeq counts, a sample table containing paths to count matrices, or a RangedSummarizedExperiment object. Then optionally run DESeq2 pre-filtering.
URL: https://bioconductor.org/packages/3.16/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#input-data
Example¶
This wrapper can be used in the following way:
rule test_DESeqDataSet_filtering:
    input:
        dds="dataset/dds.RDS",
    output:
        "dds_minimal.RDS",
    threads: 1
    log:
        "logs/DESeqDataSet/txi.log",
    params:
        formula="~condition",  # Required R statistical formula
        factor="condition",  # Optionally used for relevel
        reference_level="A",  # Optionally used for relevel
        tested_level="B",  # Optionally used for relevel
        min_counts=0,  # Optionally used to filter low counts
        extra="",  # Optional parameters provided to import function
    wrapper:
        "v2.2.1/bio/deseq2/deseqdataset"
rule test_DESeqDataSet_from_tximport:
    input:
        txi="dataset/txi.RDS",
        colData="coldata.tsv",
    output:
        "dds_txi.RDS",
    threads: 1
    log:
        "logs/DESeqDataSet/txi.log",
    params:
        formula="~condition",  # Required R statistical formula
        # factor="condition", # Optionally used for relevel
        # reference_level="A", # Optionally used for relevel
        # tested_level="B", # Optionally used for relevel
        # min_counts=0, # Optionally used to filter low counts
        # extra="", # Optional parameters provided to import function
    wrapper:
        "v2.2.1/bio/deseq2/deseqdataset"
rule test_DESeqDataSet_from_ranged_se:
    input:
        se="dataset/se.RDS",
    output:
        "dds_se.RDS",
    threads: 1
    log:
        "logs/DESeqDataSet/se.log",
    params:
        formula="~condition",  # Required R statistical formula
        # factor="condition", # Optionally used for relevel
        # reference_level="A", # Optionally used for relevel
        # tested_level="B", # Optionally used for relevel
        # min_counts=0, # Optionally used to filter low counts
        # extra="", # Optional parameters provided to import function
    wrapper:
        "v2.2.1/bio/deseq2/deseqdataset"
rule test_DESeqDataSet_from_r_matrix:
    input:
        matrix="dataset/matrix.RDS",
        colData="coldata.tsv",
    output:
        "dds_rmatrix.RDS",
    threads: 1
    log:
        "logs/DESeqDataSet/r_matrix.log",
    params:
        formula="~condition",  # Required R statistical formula
        # factor="condition", # Optionally used for relevel
        # reference_level="A", # Optionally used for relevel
        # tested_level="B", # Optionally used for relevel
        # min_counts=0, # Optionally used to filter low counts
        # extra="", # Optional parameters provided to import function
    wrapper:
        "v2.2.1/bio/deseq2/deseqdataset"
rule test_DESeqDataSet_from_tsv_matrix:
    input:
        counts="dataset/counts.tsv",
        colData="coldata.tsv",
    output:
        "dds_matrix.RDS",
    threads: 1
    log:
        "logs/DESeqDataSet/txt_matrix.log",
    params:
        formula="~condition",  # Required R statistical formula
        # factor="condition", # Optionally used for relevel
        # reference_level="A", # Optionally used for relevel
        # tested_level="B", # Optionally used for relevel
        # min_counts=0, # Optionally used to filter low counts
        # extra="", # Optional parameters provided to import function
    wrapper:
        "v2.2.1/bio/deseq2/deseqdataset"
rule test_DESeqDataSet_from_htseqcount:
    input:
        htseq_dir="dataset/htseq_dir",
        sample_table="sample_table.tsv",
    output:
        "dds_htseq.RDS",
    threads: 1
    log:
        "logs/DESeqDataSet/txt_matrix.log",
    params:
        formula="~condition",  # Required R statistical formula
        # factor="condition", # Optionally used for relevel
        # reference_level="A", # Optionally used for relevel
        # tested_level="B", # Optionally used for relevel
        # min_counts=0, # Optionally used to filter low counts
        # extra="", # Optional parameters provided to import function
    wrapper:
        "v2.2.1/bio/deseq2/deseqdataset"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-tximport=1.26.0
r-readr=2.1.4
r-jsonlite=1.8.5
bioconductor-deseq2=1.38.0
Input/Output¶
Input:
- colData: Path to the file describing the experiment design (TSV formatted file). First column contains sample names.
- dds: Path to the DESeqDataSet object (RDS formatted file) OR
- txi: Path to the tximport/tximeta SummarizedExperiment object (RDS formatted file) OR
- se: Path to the RangedSummarizedExperiment object (RDS formatted file) OR
- matrix: Path to the R `matrix(…)` containing counts. Sample names must be in rownames. (RDS formatted file) OR
- counts: Path to the text matrix containing counts. Sample names should be in the first column. (TSV formatted file) OR
- htseq_dir: Path to the directory containing HTSeq/FeatureCount count matrices AND
- sample_table: Path to the table containing sample names and path to HTSeq/FeatureCount count matrices
Output:
- Path to the DESeqDataSet object (RDS formatted file)
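Since several input types are accepted, it can help to see which DESeq2 constructor each one selects. The sketch below restates, in Python, the order of the if/else chain in the wrapper's R script and its colData requirement. The names `CONSTRUCTORS`, `NEEDS_COLDATA` and `select_constructor` are illustrative and not part of the wrapper.

```python
# Order mirrors the if/else chain in the wrapper's R code: the first matching input wins.
CONSTRUCTORS = [
    ("txi", "DESeq2::DESeqDataSetFromTximport"),
    ("se", "DESeq2::DESeqDataSet"),
    ("htseq_dir", "DESeq2::DESeqDataSetFromHTSeqCount"),
    ("matrix", "DESeq2::DESeqDataSetFromMatrix"),
    ("counts", "DESeq2::DESeqDataSetFromMatrix"),
    ("dds", "base::readRDS"),
]
# Inputs for which the R script stops if colData is missing
NEEDS_COLDATA = {"txi", "matrix", "counts"}


def select_constructor(input_names):
    """Hypothetical helper: pick the constructor for a set of named inputs."""
    names = set(input_names)
    for key, constructor in CONSTRUCTORS:
        if key in names:
            if key in NEEDS_COLDATA and "colData" not in names:
                raise ValueError(f"`{key}` requires a `colData` input")
            return constructor
    raise ValueError("No counts provided")


print(select_constructor(["txi", "colData"]))
```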
Params¶
- formula: Required R statistical formula
- reference_level: Optional reference level name, in case relevel is needed
- tested_level: Optional tested level name, in case relevel is needed
- factor: Factor of interest, in case relevel is needed
- min_counts: Minimum number of counted/estimated reads threshold (do not filter by default)
- extra: Optional arguments passed to the DESeq2 import function, apart from txi, colData, design, htseq, directory, se, sampleTable, or tidy.
Authors¶
- Thibault Dayris
Code¶
# __author__ = "Thibault Dayris"
# __copyright__ = "Copyright 2023, Thibault Dayris"
# __email__ = "thibault.dayris@gustaveroussy.fr"
# __license__ = "MIT"
# This script builds a deseq2 dataset from a range of possible input
# files. It also performs relevel if needed,
# as well as count filtering.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]], open = "wt")
base::sink(log.file)
base::sink(log.file, type = "message")
# Loading libraries (order matters)
base::library(package = "tximport", character.only = TRUE)
base::library(package = "readr", character.only = TRUE)
base::library(package = "jsonlite", character.only = TRUE)
base::library(package = "DESeq2", character.only = TRUE)
base::message("Libraries loaded")
# A small function to add user-defined parameters
# if and only if this parameter is not null **and** not
# empty (R does not like trailing commas on function calls)
add_extra <- function(wrapper_defined) {
if ("extra" %in% base::names(snakemake@params)) {
# Then user defined optional parameters
user_defined <- snakemake@params[["extra"]]
if ((user_defined != "") && inherits(user_defined, "character")) {
# Then these parameters are non-empty characters
base::return(
base::paste(
wrapper_defined,
user_defined,
sep = ", "
)
)
}
}
# Case user did not provide any optional parameter
# or provided an empty or non-character value
base::return(wrapper_defined)
}
colData <- NULL
if ("colData" %in% base::names(snakemake@input)) {
# Load colData
colData <- utils::read.table(
file = snakemake@input[["colData"]],
header = TRUE,
row.names = 1,
sep = "\t",
stringsAsFactors = FALSE
)
base::print(head(colData))
}
# Cast formula from string to R formula
formula <- stats::as.formula(object = snakemake@params[["formula"]])
base::print(formula)
dds_command <- NULL
# Case user provides a Tximport/Tximeta object
if ("txi" %in% base::names(x = snakemake@input)) {
if (base::is.null(colData)) {
base::stop(
"When a `txi` dataset is provided in input,",
" then a `colData` is expected"
)
}
# Loading tximport object
txi <- base::readRDS(file = snakemake@input[["txi"]])
# Acquiring user-defined optional parameters
dds_parameters <- add_extra(
wrapper_defined = "txi = txi, colData = colData, design = formula"
)
# Building command line
dds_command <- base::paste0(
"DESeq2::DESeqDataSetFromTximport(",
dds_parameters,
")"
)
# Case user provides a RangedSummarizedExperiment object
} else if ("se" %in% base::names(x = snakemake@input)) {
# Loading RangedSummarizedExperiment object
se <- base::readRDS(file = snakemake@input[["se"]])
# Acquiring user-defined optional parameters
dds_parameters <- add_extra(
wrapper_defined = "se = se, design = formula, ignoreRank = FALSE"
)
# Building command line
dds_command <- base::paste0(
"DESeq2::DESeqDataSet(",
dds_parameters,
")"
)
# Case user provides HTSeq-Count/Feature-Count input files
} else if ("htseq_dir" %in% base::names(x = snakemake@input)) {
# Casting path in case it contains only numbers
hts_dir <- base::as.character(x = snakemake@input[["htseq_dir"]])
base::message(hts_dir)
# Loading sample table, and casting factors
sample_table <- utils::read.table(
file = snakemake@input[["sample_table"]],
sep = "\t",
header = TRUE,
stringsAsFactors = TRUE
)
# The columns `sampleName` and `fileName`
# are expected to be characters, while the rest
# (if any) is supposed to be factors.
sample_table$sampleName <- base::lapply(
sample_table$sampleName, base::as.character
)
sample_table$fileName <- base::lapply(
sample_table$fileName, base::as.character
)
# Acquiring user-defined optional parameters
dds_parameters <- add_extra(
"sampleTable = sample_table, directory = hts_dir, design = formula"
)
# Building command line
dds_command <- base::paste0(
"DESeq2::DESeqDataSetFromHTSeqCount(",
dds_parameters,
")"
)
# Case user provides an R count matrix as input
} else if ("matrix" %in% base::names(x = snakemake@input)) {
if (base::is.null(colData)) {
base::stop(
"When a R `matrix` is provided in input,",
" then a `colData` is expected"
)
}
# Loading R count matrix
count_matrix <- base::readRDS(file = snakemake@input[["matrix"]])
base::print(head(count_matrix))
# Acquiring user-defined optional parameters
dds_parameters <- add_extra(
"countData = count_matrix, colData = colData, design = formula"
)
# Building command line
dds_command <- base::paste0(
"DESeq2::DESeqDataSetFromMatrix(",
dds_parameters,
")"
)
# Case user provides a TSV count matrix as input
} else if ("counts" %in% base::names(x = snakemake@input)) {
if (base::is.null(colData)) {
base::stop(
"When `counts` are provided in input, then a `colData` is expected"
)
}
# Loading count table
count_matrix <- utils::read.table(
file = snakemake@input[["counts"]],
header = TRUE,
sep = "\t",
row.names = 1,
stringsAsFactors = FALSE
)
base::print(head(count_matrix))
# Acquiring user-defined optional parameters
dds_parameters <- add_extra(
"countData = count_matrix, colData = colData, design = formula"
)
# Building command line
dds_command <- base::paste0(
"DESeq2::DESeqDataSetFromMatrix(",
dds_parameters,
")"
)
# Case user provides a DDS object to filter
} else if ("dds" %in% base::names(x = snakemake@input)) {
# Path to the RDS-formatted DESeqDataSet provided by the user
dds_path <- base::as.character(
x = snakemake@input[["dds"]]
)
# Building command line
dds_command <- "base::readRDS(file = dds_path)"
} else {
base::stop("Error: No counts provided !")
}
base::message("Command line used to build DESeqDataSet object:")
base::message(dds_command)
dds <- base::eval(base::parse(text = dds_command))
# Dropping unused factors and ensuring level ranks on user demand
is_factor <- "factor" %in% base::names(x = snakemake@params)
is_reference <- "reference_level" %in% base::names(x = snakemake@params)
is_test <- "tested_level" %in% base::names(x = snakemake@params)
if (is_factor && is_reference && is_test) {
# Casting characters in case of factors/levels being numbers
factor_name <- base::as.character(
x = snakemake@params[["factor"]]
)
reference_name <- base::as.character(
x = snakemake@params[["reference_level"]]
)
test_name <- base::as.character(
x = snakemake@params[["tested_level"]]
)
# Actual relevel
levels <- c(reference_name, test_name)
dds[[factor_name]] <- base::factor(
dds[[factor_name]], levels = levels
)
dds[[factor_name]] <- stats::relevel(
dds[[factor_name]], ref = reference_name
)
dds[[factor_name]] <- base::droplevels(dds[[factor_name]])
base::message(
"Factors have been relevel-ed. Reference level: `",
reference_name,
"`, tested level: `",
test_name,
"`. Factor of interest: `",
factor_name,
"`. Other levels have been filtered out."
)
} else {
base::message(
"No relevel performed, since either `factor`, `reference_level`,",
" and/or `tested_level` are missing in `snakemake@params`."
)
}
# Dropping null counts (or below threshold) on user demand
if ("min_count" %in% base::names(x = snakemake@params)) {
# Casting count filter since integer/numeric cannot be compared
# to double, and other number-like types in R (depending on R version)
count_filter <- base::as.double(x = snakemake@params[["min_count"]])
base::message(
"Genes with less than ",
count_filter,
" estimated/counted reads are filtered out."
)
keep <- rowSums(counts(dds)) >= count_filter
dds <- dds[keep, ]
} else {
base::message(
"No count filtering performed since `min_count` is missing ",
"in `snakemake@params`"
)
}
# Saving DESeqDataSet object
base::saveRDS(
object = dds,
file = base::as.character(x = snakemake@output[[1]])
)
base::message("DDS object saved, process over")
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
base::sink(type = "message")
base::sink()
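The `min_count` filter at the end of the script above keeps a gene only when its counts, summed over all samples, reach the threshold. The following Python sketch illustrates that rule on a plain dictionary; the function name `filter_low_counts` and the example data are hypothetical, and the wrapper itself performs this step in R on the DESeqDataSet.

```python
def filter_low_counts(counts_by_gene, min_count):
    """Keep genes whose counts, summed over all samples, reach `min_count`.

    Hypothetical helper: `counts_by_gene` maps a gene name to its per-sample counts.
    """
    threshold = float(min_count)
    return {
        gene: counts
        for gene, counts in counts_by_gene.items()
        if sum(counts) >= threshold
    }


# Made-up counts for three genes across three samples
example_counts = {"geneA": [0, 0, 0], "geneB": [5, 7, 3], "geneC": [1, 0, 0]}
print(sorted(filter_low_counts(example_counts, min_count=2)))
```

Note that the comparison is inclusive: a gene whose total equals `min_count` is kept.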
DESEQ2¶
Call differentially expressed genes with DESeq2
URL: https://bioconductor.org/packages/3.16/bioc/html/DESeq2.html
Example¶
This wrapper can be used in the following way:
rule test_deseq2_wald:
input:
dds="dds.RDS",
output:
wald_rds="wald.RDS",
wald_tsv="dge.tsv",
deseq2_result_dir=directory("deseq_results"),
normalized_counts_table="counts.tsv",
normalized_counts_rds="counts.RDS",
params:
deseq_extra="",
shrink_extra="",
results_extra="",
contrast=["condition", "A", "B"],
threads: 1
log:
"logs/deseq2.log",
wrapper:
"v2.2.1/bio/deseq2/wald"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-deseq2=1.38.0
bioconductor-biocparallel=1.32.5
r-ashr=2.2_54
Input/Output¶
Input:
dds
: Path to RDS-formatted DESeq2-object
Output:
wald_rds
: Optional path to wald test results (RDS formatted)
wald_tsv
: Optional path to wald test results (TSV formatted). Requires the optional parameter contrast (see below)
deseq2_result_dir
: Optional path to a directory that shall contain all DESeq2 results for each comparison (each file is TSV formatted)
normalized_counts_table
: Optional path to normalized counts (TSV formatted)
normalized_counts_rds
: Optional path to normalized counts (RDS formatted)
Params¶
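deseq_extra
: Optional parameters provided to the function DESeq()
shrink_extra
: Optional parameters provided to the function lfcShrink()
results_extra
: Optional parameters provided to the function results()
contrast
: List of one to three character strings selecting the result saved to wald_tsv: a single result name, two elements, or a factor name followed by two of its levels.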
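To make the accepted shapes of `contrast` concrete, here is a minimal Python sketch that mirrors how the wrapper code below turns the parameter into an argument string for DESeq2's `results()` call. The function name `contrast_to_results_argument` is hypothetical and exists only for illustration; the wrapper performs this step in R.

```python
def contrast_to_results_argument(contrast):
    """Return the R argument string built from the `contrast` parameter.

    Hypothetical helper mirroring the wrapper's R logic:
    - 1 element: a result name, passed as `name=`
    - 2 elements: passed as an R list
    - 3 elements: passed as an R character vector
    """
    values = [str(value) for value in contrast]
    if len(values) == 1:
        return f"name='{values[0]}'"
    if len(values) == 2:
        return f"contrast=list('{values[0]}', '{values[1]}')"
    if len(values) == 3:
        quoted = ", ".join(f"'{value}'" for value in values)
        return f"contrast=c({quoted})"
    raise ValueError("contrast must contain 1, 2 or 3 elements")


# The value used in the example rule above
print(contrast_to_results_argument(["condition", "A", "B"]))
```

Values are cast to strings first, which matches the wrapper's handling of factor or level names that consist only of digits.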
Authors¶
- Thibault Dayris
Code¶
# This script takes a deseq2 dataset object, performs
# a DESeq2 wald test, and saves results as requested by user
# __author__ = "Thibault Dayris"
# __copyright__ = "Copyright 2023, Thibault Dayris"
# __email__ = "thibault.dayris@gustaveroussy.fr"
# __license__ = "MIT"
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log_file <- base::file(description = snakemake@log[[1]], open = "wt")
base::sink(file = log_file)
base::sink(file = log_file, type = "message")
# Loading libraries (order matters)
base::library(package = "BiocParallel", character.only = TRUE)
base::library(package = "SummarizedExperiment", character.only = TRUE)
base::library(package = "DESeq2", character.only = TRUE)
base::library(package = "ashr", character.only = TRUE)
# Function to handle optional user-defined parameters
# and still follow R syntax
add_extra <- function(wrapper_extra, snakemake_param_name) {
if (snakemake_param_name %in% base::names(snakemake@params)) {
# Case user provides snakemake_param_name in snakemake rule
user_param <- snakemake@params[[snakemake_param_name]]
param_is_empty <- user_param == ""
param_is_character <- inherits(x = user_param, what = "character")
if ((! param_is_empty) && (param_is_character)) {
# Case user does not provide an empty string
# (R does not like trailing commas at the end
# of a function call)
wrapper_extra <- base::paste(
wrapper_extra,
user_param,
sep = ", "
)
} # Nothing to do if user provides an empty / NULL parameter value
} # Nothing to do if user did not provide snakemake_param_name
# In any case, required parameters must be returned
base::return(wrapper_extra)
}
# Setting up multithreading if required
parallel <- FALSE
if (snakemake@threads > 1) {
BiocParallel::register(
BPPARAM = BiocParallel::MulticoreParam(snakemake@threads)
)
parallel <- TRUE
}
# Load DESeq2 dataset
dds_path <- base::as.character(x = snakemake@input[["dds"]])
dds <- base::readRDS(file = dds_path)
base::message("Libraries and dataset loaded")
# Build extra parameters for DESeq2
extra_deseq2 <- add_extra(
wrapper_extra = "object = dds, test = 'Wald', parallel = parallel",
snakemake_param_name = "deseq_extra"
)
deseq2_cmd <- base::paste0(
"DESeq2::DESeq(", extra_deseq2, ")"
)
base::message("DESeq2 command line:")
base::message(deseq2_cmd)
# Running DESeq2::DESeq for wald test result
wald <- base::eval(base::parse(text = deseq2_cmd))
# The rest of the script is here to save part or complete
# list of results in RDS or plain text (TSV) formats.
# Save main result on user request (RDS)
# This includes counts, wald tests for all levels
# assays, design, etc.
if ("wald_rds" %in% base::names(x = snakemake@output)) {
output_rds <- base::as.character(x = snakemake@output[["wald_rds"]])
base::saveRDS(obj = wald, file = output_rds)
base::message("Wald test saved as RDS file")
}
# Saving normalized counts on demand
table <- counts(wald, normalized = TRUE)
# TSV-formatted count table
if ("normalized_counts_table" %in% base::names(snakemake@output)) {
output_table <- base::as.character(
x = snakemake@output[["normalized_counts_table"]]
)
utils::write.table(x = table, file = output_table, sep = "\t", quote = FALSE)
base::message("Normalized counts saved as TSV")
}
# RDS-formatted count object with additional information,
# including counts, assays, etc.
if ("normalized_counts_rds" %in% base::names(snakemake@output)) {
output_rds <- base::as.character(
x = snakemake@output[["normalized_counts_rds"]]
)
base::saveRDS(obj = table, file = output_rds)
base::message("Normalized counts saved as RDS")
}
# On user request: save all results as TSV in a directory.
# User can later access the directory content, e.g. with
# a snakemake checkpoint-rule.
if ("deseq2_result_dir" %in% base::names(snakemake@output)) {
# Acquire list of available results in DESeqDataSet
wald_results_names <- DESeq2::resultsNames(object = wald)
# Recovering extra parameters for TSV tables
# The variable `result_name` is built below in `for` loop.
results_extra <- add_extra(
wrapper_extra = "object = wald, name = result_name, parallel = parallel",
snakemake_param_name = "results_extra"
)
# DESeq2 result dir will contain all results available in the Wald object
output_prefix <- snakemake@output[["deseq2_result_dir"]]
if (! base::file.exists(output_prefix)) {
base::dir.create(path = output_prefix, recursive = TRUE)
}
# Building command lines for both wald results and fold-change shrinkage
results_cmd <- base::paste0("DESeq2::results(", results_extra, ")")
base::message("Command line used for TSV results creation:")
base::message(results_cmd)
shrink_extra <- add_extra(
"dds = wald, res = results_frame, contrast = contrast, parallel = parallel, type = 'ashr'",
"shrink_extra"
)
shrink_cmd <- base::paste0("DESeq2::lfcShrink(", shrink_extra, ")")
base::message("Command line used for log(FC) shrinkage:")
base::message(shrink_cmd)
# For each available comparison in the wald-dds object
for (result_name in wald_results_names) {
# Building table
base::message(base::paste("Saving results for", result_name))
results_frame <- base::eval(base::parse(text = results_cmd))
shrink_frame <- base::eval(base::parse(text = shrink_cmd))
results_frame$log2FoldChange <- shrink_frame$log2FoldChange
results_path <- base::file.path(
output_prefix,
base::paste0(result_name, ".tsv")
)
# Saving table
utils::write.table(
x = results_frame,
file = results_path,
quote = FALSE,
sep = "\t",
row.names = TRUE
)
}
}
# If user provides contrasts, then a precise result
# can be extracted from DESeq2 object.
if ("wald_tsv" %in% base::names(x = snakemake@output)) {
if ("contrast" %in% base::names(x = snakemake@params)) {
contrast_length <- base::length(x = snakemake@params[["contrast"]])
results_extra <- "object=wald, parallel = parallel"
contrast <- NULL
if (contrast_length == 1) {
# Case user provided a result name in the `contrast` parameter
contrast <- base::as.character(x = snakemake@params[["contrast"]])
contrast <- base::paste0("name='", contrast[1], "'")
} else if (contrast_length == 2) {
# Case user provided both tested and reference level
# In that order! Order matters.
contrast <- sapply(
snakemake@params[["contrast"]],
function(extra) base::as.character(x = extra)
)
contrast <- base::paste0(
"contrast=list('", contrast[1], "', '", contrast[2], "')"
)
} else if (contrast_length == 3) {
# Case user provided both tested and reference level,
# and studied factor.
contrast <- sapply(
snakemake@params[["contrast"]],
function(extra) base::as.character(x = extra)
)
contrast <- base::paste0(
"contrast=c('",
contrast[1],
"', '",
contrast[2],
"', '",
contrast[3],
"')"
)
}
# Finally saving results as contrast has been
# built from user input.
results_extra <- base::paste(results_extra, contrast, sep = ", ")
results_cmd <- base::paste0("DESeq2::results(", results_extra, ")")
base::message("Result extraction command: ", results_cmd)
shrink_extra <- add_extra(
"dds = wald, res = results_frame, contrast = contrast[1], parallel = parallel, type = 'ashr'",
"shrink_extra"
)
shrink_cmd <- base::paste0("DESeq2::lfcShrink(", shrink_extra, ")")
base::message("Command line used for log(FC) shrinkage:")
base::message(shrink_cmd)
results_frame <- base::eval(base::parse(text = results_cmd))
shrink_frame <- base::eval(base::parse(text = shrink_cmd))
results_frame$log2FoldChange <- shrink_frame$log2FoldChange
# Saving table
utils::write.table(
x = results_frame,
file = base::as.character(x = snakemake@output[["wald_tsv"]]),
quote = FALSE,
sep = "\t",
row.names = TRUE
)
} else {
base::stop(
"No contrast provided. ",
"In absence of contrast, it is not possible ",
"to guess the expected result name."
)
}
}
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
base::sink(type = "message")
base::sink()
DIAMOND¶
For diamond, the following wrappers are available:
DIAMOND BLASTP¶
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. For documentation, see https://github.com/bbuchfink/diamond/wiki
Example¶
This wrapper can be used in the following way:
rule diamond_blastp:
input:
fname_fasta="{sample}.fasta", # Query fasta file
fname_db="db.dmnd", # Diamond db
output:
fname="{sample}.tsv.gz", # Output file
log:
"logs/diamond_blastp/{sample}.log",
params:
extra="--header --compress 1", # Additional arguments
threads: 8
wrapper:
"v2.2.1/bio/diamond/blastp"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
diamond=2.1.6
Authors¶
- Nikos Tsardakas Renhuldt
- Kim Philipp Jablonski
Code¶
__author__ = "Kim Philipp Jablonski, Nikos Tsardakas Renhuldt"
__copyright__ = "Copyright 2020, Kim Philipp Jablonski, Nikos Tsardakas Renhuldt"
__email__ = "kim.philipp.jablonski@gmail.com, nikos.tsardakas_renhuldt@tbiokem.lth.se"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"diamond blastp"
" --threads {snakemake.threads}"
" --db {snakemake.input.fname_db}"
" --query {snakemake.input.fname_fasta}"
" --out {snakemake.output.fname}"
" {extra}"
" {log}"
)
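The command assembled by the code above can be previewed without running DIAMOND. The sketch below rebuilds the same string from plain values; the function name `build_diamond_command` is hypothetical, and the log redirection that `snakemake.log_fmt_shell` appends is deliberately left out.

```python
def build_diamond_command(mode, threads, database, query, output, extra=""):
    """Assemble a diamond command line from the same pieces the wrapper uses.

    Hypothetical helper for illustration; `mode` is e.g. "blastp" or "blastx".
    """
    parts = [
        f"diamond {mode}",
        f"--threads {threads}",
        f"--db {database}",
        f"--query {query}",
        f"--out {output}",
        extra,  # additional arguments, passed through verbatim
    ]
    # Skip empty pieces so that an empty `extra` leaves no trailing space
    return " ".join(part for part in parts if part)


print(
    build_diamond_command(
        "blastp", 8, "db.dmnd", "sample.fasta", "sample.tsv.gz", "--header --compress 1"
    )
)
```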
DIAMOND BLASTX¶
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.
Example¶
This wrapper can be used in the following way:
rule diamond_blastx:
input:
fname_fastq = "{sample}.fastq",
fname_db = "db.dmnd"
output:
fname = "{sample}.tsv.gz"
log:
"logs/diamond_blastx/{sample}.log"
params:
extra="--header --compress 1"
threads: 8
wrapper:
"v2.2.1/bio/diamond/blastx"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
diamond=2.1.6
Authors¶
- Kim Philipp Jablonski
Code¶
__author__ = "Kim Philipp Jablonski"
__copyright__ = "Copyright 2020, Kim Philipp Jablonski"
__email__ = "kim.philipp.jablonski@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"diamond blastx"
" --threads {snakemake.threads}"
" --db {snakemake.input.fname_db}"
" --query {snakemake.input.fname_fastq}"
" --out {snakemake.output.fname}"
" {extra}"
" {log}"
)
DIAMOND MAKEDB¶
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.
Example¶
This wrapper can be used in the following way:
rule diamond_makedb:
input:
fname = "{reference}.fasta",
output:
fname = "{reference}.dmnd"
log:
"logs/diamond_makedb/{reference}.log"
params:
extra=""
threads: 8
wrapper:
"v2.2.1/bio/diamond/makedb"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
diamond=2.1.7
Authors¶
- Kim Philipp Jablonski
Code¶
__author__ = "Kim Philipp Jablonski"
__copyright__ = "Copyright 2020, Kim Philipp Jablonski"
__email__ = "kim.philipp.jablonski@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"diamond makedb"
" --threads {snakemake.threads}"
" --in {snakemake.input.fname}"
" --db {snakemake.output.fname}"
" {extra}"
" {log}"
)
DRAGMAP¶
For dragmap, the following wrappers are available:
DRAGMAP¶
Map reads with Dragmap.
URL: https://github.com/Illumina/DRAGMAP
Example¶
This wrapper can be used in the following way:
rule dragmap_align:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
idx="genome",
output:
"mapped/{sample}.bam",
log:
"logs/dragmap/{sample}.align.log",
params:
extra="",
sorting="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="queryname", # Can be 'queryname' or 'coordinate'.
sort_extra="", # Extra args for samtools/picard.
threads: 8
wrapper:
"v2.2.1/bio/dragmap/align"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
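The `sorting` and `sort_order` params decide which command the alignments are piped into. The following sketch reproduces that selection in isolation, assuming the behaviour of the wrapper code shown below; `choose_pipe_command` is a hypothetical name, and the samtools and Java options, the output path and the temporary directory that the wrapper adds are omitted for brevity.

```python
def choose_pipe_command(sorting="none", sort_order="coordinate", sort_extra=""):
    """Return the command that receives the aligner output on stdin.

    Hypothetical helper mirroring the wrapper's branching on `sorting`.
    """
    if sort_order not in {"coordinate", "queryname"}:
        raise ValueError(f"Unexpected value for sort_order ({sort_order})")

    if sorting == "none":
        # No sorting: only convert to BAM
        parts = ["samtools view", sort_extra, "-"]
    elif sorting == "samtools":
        if sort_order == "queryname":
            # samtools sorts by read name with -n
            sort_extra = f"{sort_extra} -n".strip()
        parts = ["samtools sort", sort_extra, "-"]
    elif sorting == "picard":
        parts = [
            "picard SortSam",
            sort_extra,
            "--INPUT /dev/stdin",
            f"--SORT_ORDER {sort_order}",
        ]
    else:
        raise ValueError(f"Unexpected value for sorting ({sorting})")
    return " ".join(part for part in parts if part)


print(choose_pipe_command("samtools", "queryname"))
```

With `sorting="none"`, the value of `sort_order` is validated but otherwise has no effect.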
Software dependencies¶
dragmap=1.2
samtools=1.14
picard=2.26
snakemake-wrapper-utils=0.5.2
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from os import path
import re
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
samtools_opts = get_samtools_opts(snakemake)
java_opts = get_java_opts(snakemake)
sort = snakemake.params.get("sorting", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
n = len(snakemake.input.reads)
assert (
n == 1 or n == 2
), "input->reads must have 1 (single-end) or 2 (paired-end) elements."
if n == 1:
reads = "-1 {}".format(*snakemake.input.reads)
else:
reads = "-1 {} -2 {}".format(*snakemake.input.reads)
index = snakemake.input.idx
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
if sort_order not in {"coordinate", "queryname"}:
raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view {samtools_opts} {sort_extra} -"
elif sort == "samtools":
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {samtools_opts} {sort_extra} -"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {java_opts} {sort_extra} --INPUT /dev/stdin"
" --OUTPUT {snakemake.output[0]} --SORT_ORDER {sort_order} --TMP_DIR {tmpdir}"
)
else:
raise ValueError("Unexpected value for params.sorting ({})".format(sort))
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(dragen-os"
" --num-threads {snakemake.threads}"
" -r {snakemake.input.idx}"
" {reads}"
" {extra}"
" | " + pipe_cmd + ") {log}"
)
DRAGMAP¶
Build hash table for Dragmap read mapper.
URL: https://github.com/Illumina/DRAGMAP
Example¶
This wrapper can be used in the following way:
rule dragmap_build:
input:
ref="{genome}.fasta",
output:
idx=multiext(
"{genome}/",
"hash_table.cfg",
"hash_table.cfg.bin",
"hash_table.cmp",
"hash_table_stats.txt",
"reference.bin",
"ref_index.bin",
"repeat_mask.bin",
"str_table.bin",
),
log:
"logs/dragmap/{genome}.build.log",
params:
extra="",
threads: 2
wrapper:
"v2.2.1/bio/dragmap/build"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
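The wrapper code below passes the parent directory of the first output file to `--output-directory`, so all listed output files are expected to share one directory. A minimal sketch of that derivation, with the hypothetical helper name `hash_table_directory`:

```python
from pathlib import Path


def hash_table_directory(first_output):
    """Return the directory the hash table files are written to.

    Hypothetical helper: mirrors `Path(snakemake.output[0]).parent`.
    """
    return Path(first_output).parent


# Outputs as they would look for the wildcard value genome="genome"
outputs = ["genome/hash_table.cfg", "genome/reference.bin"]
print(hash_table_directory(outputs[0]))
```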
Software dependencies¶
dragmap=1.3.0
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from pathlib import Path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Prefix that should be used for the database
prefix = Path(snakemake.output[0]).parent
shell(
"dragen-os"
" --ht-num-threads {snakemake.threads}"
" --build-hash-table true"
" --ht-reference {snakemake.input[0]}"
" --output-directory {prefix}"
" {extra}"
" {log}"
)
EPIC¶
For epic, the following wrappers are available:
EPIC¶
Find broad enriched domains in ChIP-Seq data with epic
Example¶
This wrapper can be used in the following way:
rule epic:
input:
treatment = "bed/test.bed",
background = "bed/control.bed"
output:
enriched_regions = "epic/enriched_regions.csv", # required
bed = "epic/enriched_regions.bed", # optional
matrix = "epic/matrix.gz" # optional
log:
"logs/epic/epic.log"
params:
genome = "hg19", # optional, default hg19
extra="-g 3 -w 200" # "--bigwig epic/bigwigs"
threads: 1 # optional, defaults to 1
wrapper:
"v2.2.1/bio/epic/peaks"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- All/any of the different bigwig options must be given as extra parameters
Software dependencies¶
epic=0.2.12
pandas=1.1.5
Input/Output¶
Input:
treatment
: chip .bed(.gz/.bz) files
background
: input .bed(.gz/.bz) files
Output:
enriched_regions
: main output file with enriched peaks
bed
: (optional) contains much of the same info as enriched_regions but in a bed format, suitable for viewing in the UCSC genome browser or downstream use with bedtools
matrix
: (optional) a gzipped matrix of read counts
Params¶
extra
: additional parameters
log
: (optional) file to write the log output to
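How the required and optional entries combine into one command can be sketched as follows. This is an illustration based on the wrapper code shown below, not a replacement for it; `build_epic_command` is a hypothetical name, and the `hg19` default follows the comment in the example rule.

```python
def build_epic_command(
    treatment,
    background,
    enriched_regions,
    genome="hg19",
    threads=1,
    bed=None,
    matrix=None,
    log=None,
    extra="",
):
    """Assemble the epic command; optional outputs add their flag only when set."""
    cmd = (
        f"epic -cpu {threads} -t {treatment} -c {background}"
        f" -o {enriched_regions} -gn {genome}"
    )
    if bed:
        cmd += f" -b {bed}"
    if matrix:
        cmd += f" -sm {matrix}"
    if log:
        cmd += f" -l {log}"
    if extra:
        cmd += f" {extra}"
    return cmd


print(
    build_epic_command(
        "bed/test.bed",
        "bed/control.bed",
        "epic/enriched_regions.csv",
        bed="epic/enriched_regions.bed",
        extra="-g 3 -w 200",
    )
)
```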
Authors¶
- Endre Bakken Stovner
Code¶
__author__ = "Endre Bakken Stovner"
__copyright__ = "Copyright 2017, Endre Bakken Stovner"
__email__ = "endrebak85@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
threads = snakemake.threads or 1
treatment = snakemake.input.get("treatment")
background = snakemake.input.get("background")
# Output files (bed and matrix are optional)
enriched_regions = snakemake.output.get("enriched_regions")
bed = snakemake.output.get("bed")
matrix = snakemake.output.get("matrix")
# The log file is optional; default to None so the check below cannot fail
log = snakemake.log[0] if len(snakemake.log) > 0 else None
genome = snakemake.params.get("genome", "hg19")
cmd = "epic -cpu {threads} -t {treatment} -c {background} -o {enriched_regions} -gn {genome}"
if bed:
cmd += " -b {bed}"
if matrix:
cmd += " -sm {matrix}"
if log:
cmd += " -l {log}"
cmd += " {extra}"
shell(cmd)
FASTP¶
trim and QC fastq reads with fastp
Example¶
This wrapper can be used in the following way:
rule fastp_se:
input:
sample=["reads/se/{sample}.fastq"]
output:
trimmed="trimmed/se/{sample}.fastq",
failed="trimmed/se/{sample}.failed.fastq",
html="report/se/{sample}.html",
json="report/se/{sample}.json"
log:
"logs/fastp/se/{sample}.log"
params:
adapters="--adapter_sequence ACGGCTAGCTA",
extra=""
threads: 1
wrapper:
"v2.2.1/bio/fastp"
rule fastp_pe:
input:
sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
output:
trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
# Unpaired reads separately
unpaired1="trimmed/pe/{sample}.u1.fastq",
unpaired2="trimmed/pe/{sample}.u2.fastq",
# or in a single file
# unpaired="trimmed/pe/{sample}.singletons.fastq",
merged="trimmed/pe/{sample}.merged.fastq",
failed="trimmed/pe/{sample}.failed.fastq",
html="report/pe/{sample}.html",
json="report/pe/{sample}.json"
log:
"logs/fastp/pe/{sample}.log"
params:
adapters="--adapter_sequence ACGGCTAGCTA --adapter_sequence_r2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
extra="--merge"
threads: 2
wrapper:
"v2.2.1/bio/fastp"
rule fastp_pe_wo_trimming:
input:
sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
output:
html="report/pe_wo_trimming/{sample}.html",
json="report/pe_wo_trimming/{sample}.json"
log:
"logs/fastp/pe_wo_trimming/{sample}.log"
params:
extra=""
threads: 2
wrapper:
"v2.2.1/bio/fastp"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The adapters param allows specifying adapter sequences.
- The extra param allows for additional program arguments.
- For more information, see https://github.com/OpenGene/fastp
Software dependencies¶
fastp=0.23.4
Input/Output¶
Input:
- fastq file(s)
Output:
- trimmed fastq file(s)
- unpaired reads (optional; either in a single file or separate)
- merged reads (optional)
- failed reads (optional)
- json file containing trimming statistics
- html file containing trimming statistics
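The optional outputs listed above each map to one fastp option. The sketch below shows that mapping under the assumption that it follows the wrapper code further down; `fastp_output_options` is a hypothetical name, and the input, JSON and HTML options are left out.

```python
import re


def fastp_output_options(
    trimmed=(),
    unpaired=None,
    unpaired1=None,
    unpaired2=None,
    merged=None,
    failed=None,
    extra="",
):
    """Translate the declared outputs into fastp command-line options."""
    options = []
    if trimmed:
        if len(trimmed) == 1:
            options.append(f"--out1 {trimmed[0]}")
        else:
            options.append(f"--out1 {trimmed[0]} --out2 {trimmed[1]}")
        if unpaired:
            # A single file collects the unpaired reads of both mates
            options.append(f"--unpaired1 {unpaired} --unpaired2 {unpaired}")
        else:
            if unpaired1:
                options.append(f"--unpaired1 {unpaired1}")
            if unpaired2:
                options.append(f"--unpaired2 {unpaired2}")
        if merged:
            if not re.search(r"--merge\b", extra):
                raise ValueError(
                    "output.merged specified but '--merge' option missing from params.extra"
                )
            options.append(f"--merged_out {merged}")
    if failed:
        options.append(f"--failed_out {failed}")
    return " ".join(options)


print(fastp_output_options(trimmed=["a.1.fastq", "a.2.fastq"], failed="a.failed.fastq"))
```

The unpaired and merged outputs are only considered when trimmed outputs are declared, which is why a report-only rule needs just `html` and `json`.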
Authors¶
- Sebastian Kurscheid (sebastian.kurscheid@unibas.ch)
- Filipe G. Vieira
Code¶
__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "sebastian.kurscheid@anu.edu.au"
__license__ = "MIT"
from snakemake.shell import shell
import re
extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Assert input
n = len(snakemake.input.sample)
assert (
n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
# Input files
if n == 1:
reads = "--in1 {}".format(snakemake.input.sample)
else:
reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)
# Output files
trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths:
if n == 1:
trimmed = "--out1 {}".format(snakemake.output.trimmed)
else:
trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)
# Output unpaired files
unpaired = snakemake.output.get("unpaired", None)
if unpaired:
trimmed += f" --unpaired1 {unpaired} --unpaired2 {unpaired}"
else:
unpaired1 = snakemake.output.get("unpaired1", None)
if unpaired1:
trimmed += f" --unpaired1 {unpaired1}"
unpaired2 = snakemake.output.get("unpaired2", None)
if unpaired2:
trimmed += f" --unpaired2 {unpaired2}"
# Output merged PE reads
merged = snakemake.output.get("merged", None)
if merged:
if not re.search(r"--merge\b", extra):
raise ValueError(
"output.merged specified but '--merge' option missing from params.extra"
)
trimmed += f" --merged_out {merged}"
else:
trimmed = ""
# Output failed reads
failed = snakemake.output.get("failed", None)
if failed:
trimmed += f" --failed_out {failed}"
# Stats
html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)
shell(
"(fastp --thread {snakemake.threads} "
"{extra} "
"{adapters} "
"{reads} "
"{trimmed} "
"{json} "
"{html} ) {log}"
)
FASTQ_SCREEN¶
fastq_screen screens a library of sequences in FASTQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
This wrapper allows the configuration to be passed either as a filename or as a dictionary in the rule’s params.fastq_screen_config. So the following configuration file:
DATABASE ecoli /data/Escherichia_coli/Bowtie2Index/genome BOWTIE2
DATABASE ecoli /data/Escherichia_coli/BowtieIndex/genome BOWTIE
DATABASE hg19 /data/hg19/Bowtie2Index/genome BOWTIE2
DATABASE mm10 /data/mm10/Bowtie2Index/genome BOWTIE2
BOWTIE /path/to/bowtie
BOWTIE2 /path/to/bowtie2
becomes:
fastq_screen_config = {
'database': {
'ecoli': {
'bowtie2': '/data/Escherichia_coli/Bowtie2Index/genome',
'bowtie': '/data/Escherichia_coli/BowtieIndex/genome'},
'hg19': {
'bowtie2': '/data/hg19/Bowtie2Index/genome'},
'mm10': {
'bowtie2': '/data/mm10/Bowtie2Index/genome'}
},
'aligner_paths': {'bowtie': 'bowtie', 'bowtie2': 'bowtie2'}
}
By default, the wrapper will use bowtie2 as the aligner and a subset of 100000 reads. These can be overridden using params.aligner and params.subset respectively. Furthermore, params.extra can be used to pass additional arguments verbatim to fastq_screen, for example extra="--illumina1_3" or extra="--bowtie2 '--trim5=8'".
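The translation from the dictionary form back to configuration-file lines can be sketched as below. It follows the loop in the wrapper code further down but returns the text instead of writing a temporary file; the function name `render_fastq_screen_config` is hypothetical.

```python
def render_fastq_screen_config(config):
    """Render the dictionary form of fastq_screen_config as config-file text."""
    lines = []
    for label, indexes in config["database"].items():
        for aligner_name, index in indexes.items():
            # One DATABASE line per (genome label, aligner) pair
            lines.append("\t".join(["DATABASE", label, index, aligner_name.upper()]))
    for aligner_name, aligner_path in config["aligner_paths"].items():
        lines.append("\t".join([aligner_name.upper(), aligner_path]))
    return "\n".join(lines) + "\n"


example_config = {
    "database": {
        "hg19": {"bowtie2": "/data/hg19/Bowtie2Index/genome"},
    },
    "aligner_paths": {"bowtie2": "bowtie2"},
}
print(render_fastq_screen_config(example_config), end="")
```

Fields are separated by tabs, and aligner names are upper-cased to match the keywords used in the configuration file.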
Example¶
This wrapper can be used in the following way:
rule fastq_screen:
input:
"samples/{sample}.fastq"
output:
txt="qc/{sample}.fastq_screen.txt",
png="qc/{sample}.fastq_screen.png"
params:
fastq_screen_config="fastq_screen.conf",
subset=100000,
aligner='bowtie2'
threads: 8
wrapper:
"v2.2.1/bio/fastq_screen"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- fastq_screen hard-codes the output filenames. This wrapper moves the hard-coded output files to those specified by the rule.
- While the dictionary form of fastq_screen_config is convenient, the unordered nature of the dictionary may cause snakemake --list-params-changed to incorrectly report changed parameters even though the contents remain the same. If you plan on using --list-params-changed then it will be better to write a config file and pass that as fastq_screen_config. This problem will disappear with Python 3.6.
- When providing the dictionary form of fastq_screen_config, the wrapper will write a temp file using Python’s tempfile module. To control the temp file directory, make sure the $TMPDIR environmental variable is set (see the tempfile docs for details). One way of doing this is by adding something like shell.prefix("export TMPDIR=/scratch; ") to the snakefile calling this wrapper.
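The hard-coded file names mentioned in the first note are derived from the input file name. A small sketch of that derivation, assuming the prefix rule and the `_screen` suffixes used in the wrapper code below (`screen_output_names` is a hypothetical name):

```python
import os
import re


def screen_output_names(input_path):
    """Return the (txt, png) file names fastq_screen is expected to write."""
    # Everything before the first .fastq, .fq, .txt or .seq is the prefix
    prefix = re.split(r"\.fastq|\.fq|\.txt|\.seq", os.path.basename(input_path))[0]
    return f"{prefix}_screen.txt", f"{prefix}_screen.png"


print(screen_output_names("samples/a.fastq.gz"))
```

The wrapper then moves these two files from its temporary directory to the `txt` and `png` outputs declared in the rule.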
Software dependencies¶
fastq-screen=0.15.3
bowtie2=2.5.1
bowtie=1.3.1
Input/Output¶
Input:
- A FASTQ file, gzipped or not.
Output:
txt
: a text file containing the fraction of reads mapping to each provided index
png
: a bar plot of the contents of txt, saved as a PNG file
Authors¶
- Ryan Dale
Code¶
import os
import re
from snakemake.shell import shell
import tempfile
__author__ = "Ryan Dale"
__copyright__ = "Copyright 2016, Ryan Dale"
__email__ = "dalerr@niddk.nih.gov"
__license__ = "MIT"
_config = snakemake.params["fastq_screen_config"]
subset = snakemake.params.get("subset", 100000)
aligner = snakemake.params.get("aligner", "bowtie2")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
# snakemake.params.fastq_screen_config can be either a dict or a string. If
# string, interpret as a filename pointing to the fastq_screen config file.
# Otherwise, create a new tempfile out of the contents of the dict:
if isinstance(_config, dict):
tmp = tempfile.NamedTemporaryFile(delete=False).name
with open(tmp, "w") as fout:
# Loop variables are named so that they do not overwrite `aligner` from params
for label, indexes in _config["database"].items():
for db_aligner, index in indexes.items():
fout.write(
"\t".join(["DATABASE", label, index, db_aligner.upper()]) + "\n"
)
for aligner_name, aligner_path in _config["aligner_paths"].items():
fout.write("\t".join([aligner_name.upper(), aligner_path]) + "\n")
config_file = tmp
else:
config_file = _config
# fastq_screen hard-codes filenames according to this prefix. We will send
# hard-coded output to a temp dir, and then move them later.
prefix = re.split(r"\.fastq|\.fq|\.txt|\.seq", os.path.basename(snakemake.input[0]))[0]
tempdir = tempfile.mkdtemp()
shell(
"fastq_screen --outdir {tempdir} "
"--force "
"--aligner {aligner} "
"--conf {config_file} "
"--subset {subset} "
"--threads {snakemake.threads} "
"{extra} "
"{snakemake.input[0]} "
"{log}"
)
# Move output to the filenames specified by the rule
shell("mv {tempdir}/{prefix}_screen.txt {snakemake.output.txt}")
shell("mv {tempdir}/{prefix}_screen.png {snakemake.output.png}")
# Clean up temp
shell("rm -r {tempdir}")
if isinstance(_config, dict):
shell("rm {tmp}")
FASTQC¶
Generate fastq qc statistics using fastqc.
URL: https://github.com/s-andrews/FastQC
Example¶
This wrapper can be used in the following way:
rule fastqc:
input:
"reads/{sample}.fastq"
output:
html="qc/fastqc/{sample}.html",
zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params:
extra = "--quiet"
log:
"logs/fastqc/{sample}.log"
threads: 1
resources:
mem_mb = 1024
wrapper:
"v2.2.1/bio/fastqc"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
fastqc=0.12.1
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- fastq file
Output:
- html file containing statistics
- zip file containing statistics
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for fastqc."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
import re
from tempfile import TemporaryDirectory
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Define memory per thread (https://github.com/s-andrews/FastQC/blob/master/fastqc#L201-L222)
mem_mb = int(get_mem(snakemake, "MiB") / snakemake.threads)
def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""
    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)
    return base
# With multiple input files, fastqc doesn't know what to do. Silently taking only the first one gives unwanted results.
if len(snakemake.input) > 1:
    raise IOError("Got multiple input files, I don't know how to process them!")
# Run fastqc. Since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc"
        " --threads {snakemake.threads}"
        " --memory {mem_mb}"
        " {extra}"
        " --outdir {tempdir:q}"
        " {snakemake.input[0]:q}"
        " {log}"
    )
    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")
    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")
    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
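To make the renaming step easier to follow, here is a small sketch of which file names the wrapper expects fastqc to write into the temporary directory for a given input. It reuses the extension-stripping order of the wrapper code; the helper expected_fastqc_outputs and the example paths are illustrative assumptions.

```python
import re
from os import path

# Extensions are stripped one after another, in the same order as in the wrapper.
_EXTENSIONS = ("gz", "bz2", "txt", "fastq", "fq", "sam", "bam")


def basename_without_ext(file_path):
    """Return the basename of file_path with known extensions removed."""
    base = path.basename(file_path)
    for ext in _EXTENSIONS:
        base = re.sub(r"\." + ext + "$", "", base)
    return base


def expected_fastqc_outputs(file_path):
    """Return the (html, zip) file names expected in the fastqc output dir."""
    base = basename_without_ext(file_path)
    return base + "_fastqc.html", base + "_fastqc.zip"


# Hypothetical input paths, for illustration only.
for example in ("reads/a.fastq", "reads/a.fastq.gz", "mapped/b.bam"):
    print(example, "->", expected_fastqc_outputs(example))
```

These temporary names are then moved to whatever html and zip paths the rule declares, which is why the output paths can be chosen freely.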
FASTTREE¶
build phylogenetic trees using fasttree. Documentation found at http://www.microbesonline.org/fasttree/
Example¶
This wrapper can be used in the following way:
rule fasttree:
input:
alignment="{sample}.fa", # Input alignment file
output:
tree="{sample}.nwk", # Output tree file
log:
"logs/fasttree/{sample}.log",
params:
extra="", # Additional arguments
wrapper:
"v2.2.1/bio/fasttree"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- fasttree can only be run with a single thread.
Software dependencies¶
fasttree=2.1.11
Authors¶
- Nikos Tsardakas Renhuldt
Code¶
__author__ = "Nikos Tsardakas Renhuldt"
__copyright__ = "Copyright 2021, Nikos Tsardakas Renhuldt"
__email__ = "nikos.tsardakas_renhuldt@tbiokem.lth.se"
__license__ = "MIT"
from snakemake.shell import shell
import os
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"fasttree "
"{extra} "
"{snakemake.input.alignment} "
"> {snakemake.output.tree} "
"{log}"
)
FGBIO¶
For fgbio, the following wrappers are available:
FGBIO ANNOTATEBAMWITHUMIS¶
Annotates existing BAM files with UMIs (Unique Molecular Indices, aka Molecular IDs, Molecular barcodes) from a separate FASTQ file.
URL: https://fulcrumgenomics.github.io/fgbio/
Example¶
This wrapper can be used in the following way:
rule AnnotateBam:
input:
bam="mapped/{sample}.bam",
umi="umi/{sample}.fastq",
output:
"mapped/{sample}.annotated.bam",
params: ""
resources:
# suggestion assuming unsorted input, so that memory should
# be proportional to input size:
# https://fulcrumgenomics.github.io/fgbio/tools/latest/AnnotateBamWithUmis.html
mem_mb=lambda wildcards, input: max([input.size_mb * 1.3, 200])
log:
"logs/fgbio/annotate_bam/{sample}.log",
wrapper:
"v2.2.1/bio/fgbio/annotatebamwithumis"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
fgbio=2.1.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
bam_input = snakemake.input.bam
if bam_input is None:
raise ValueError("Missing bam input file!")
elif not isinstance(bam_input, str):
raise ValueError("Input bam should be a string: " + str(bam_input) + "!")
umi_input = snakemake.input.umi
if umi_input is None:
raise ValueError("Missing input file with UMIs")
elif not isinstance(umi_input, str):
raise ValueError("Input UMIs-file should be a string: " + str(umi_input) + "!")
if not len(snakemake.output) == 1:
raise ValueError("Only one output value expected: " + str(snakemake.output) + "!")
output_file = snakemake.output[0]
if output_file is None:
raise ValueError("Missing output file!")
elif not isinstance(output_file, str):
raise ValueError("Output bam-file should be a string: " + str(output_file) + "!")
shell(
"fgbio {java_opts} AnnotateBamWithUmis"
" -i {bam_input}"
" -f {umi_input}"
" -o {output_file}"
" {extra_params}"
" {log}"
)
FGBIO CALLMOLECULARCONSENSUSREADS¶
Calls consensus sequences from reads with the same unique molecular tag.
Example¶
This wrapper can be used in the following way:
rule ConsensusReads:
input:
"mapped/a.bam"
output:
"mapped/{sample}.m3.bam"
params:
extra="-M 3"
log:
"logs/fgbio/consensus_reads/{sample}.log"
wrapper:
"v2.2.1/bio/fgbio/callmolecularconsensusreads"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
fgbio=2.1.0
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_file = snakemake.output[0]
if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")
shell(
"fgbio CallMolecularConsensusReads"
" -i {bam_input}"
" -o {output_file}"
" {extra_params}"
" {log}"
)
FGBIO COLLECTDUPLEXSEQMETRICS¶
Collects a suite of metrics to QC duplex sequencing data.
URL: https://fulcrumgenomics.github.io/fgbio/
Example¶
This wrapper can be used in the following way:
rule CollectDuplexSeqMetrics:
input:
"mapped/{sample}.gu.bam"
output:
family_sizes="stats/{sample}.family_sizes.txt",
duplex_family_sizes="stats/{sample}.duplex_family_sizes.txt",
duplex_yield_metrics="stats/{sample}.duplex_yield_metrics.txt",
umi_counts="stats/{sample}.umi_counts.txt",
duplex_qc="stats/{sample}.duplex_qc.pdf",
duplex_umi_counts="stats/{sample}.duplex_umi_counts.txt",
params:
extra=lambda wildcards: "-d " + wildcards.sample
log:
"logs/fgbio/collectduplexseqmetrics/{sample}.log"
wrapper:
"v2.2.1/bio/fgbio/collectduplexseqmetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
fgbio=2.1.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
bam_input = snakemake.input[0]
family_sizes = snakemake.output.family_sizes
duplex_family_sizes = snakemake.output.duplex_family_sizes
duplex_yield_metrics = snakemake.output.duplex_yield_metrics
umi_counts = snakemake.output.umi_counts
duplex_qc = snakemake.output.duplex_qc
duplex_umi_counts = snakemake.output.get("duplex_umi_counts", None)
file_path = str(path.dirname(family_sizes))
name = str(path.basename(family_sizes)).split(".")[0]
path_name_prefix = str(path.join(file_path, name))
if not family_sizes == path_name_prefix + ".family_sizes.txt":
raise Exception(
"Unexpected family_sizes path/name format, expected {}, got {}.".format(
path_name_prefix + ".family_sizes.txt", family_sizes
)
)
if not duplex_family_sizes == path_name_prefix + ".duplex_family_sizes.txt":
raise Exception(
"Unexpected duplex_family_sizes path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_family_sizes.txt", duplex_family_sizes
)
)
if not duplex_yield_metrics == path_name_prefix + ".duplex_yield_metrics.txt":
raise Exception(
"Unexpected duplex_yield_metrics path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_yield_metrics.txt", duplex_yield_metrics
)
)
if not umi_counts == path_name_prefix + ".umi_counts.txt":
raise Exception(
"Unexpected umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".umi_counts.txt", umi_counts
)
)
if not duplex_qc == path_name_prefix + ".duplex_qc.pdf":
raise Exception(
"Unexpected duplex_qc path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_qc.pdf", duplex_qc
)
)
if (
duplex_umi_counts is not None
and not duplex_umi_counts == path_name_prefix + ".duplex_umi_counts.txt"
):
raise Exception(
"Unexpected duplex_umi_counts path/name format, expected {}, got {}. Note that dirname will be extracted from family_sizes variable.".format(
path_name_prefix + ".duplex_umi_counts.txt", duplex_umi_counts
)
)
duplex_umi_counts_flag = ""
if duplex_umi_counts is not None:
duplex_umi_counts_flag = "-u "
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
shell(
"fgbio {java_opts} CollectDuplexSeqMetrics"
" -i {bam_input}"
" -o {path_name_prefix}"
" {duplex_umi_counts_flag}"
" {extra_params}"
" {log}"
)
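The naming checks in this wrapper can be hard to read at a glance. The sketch below derives the output prefix the same way (directory of family_sizes plus the basename up to its first dot) and lists the file names the checks compare against. The helper expected_outputs and the example paths are hypothetical; posixpath is used only to keep the example platform-independent.

```python
import posixpath

# Suffixes that the wrapper checks against the derived prefix.
SUFFIXES = (
    ".family_sizes.txt",
    ".duplex_family_sizes.txt",
    ".duplex_yield_metrics.txt",
    ".umi_counts.txt",
    ".duplex_qc.pdf",
    ".duplex_umi_counts.txt",
)


def expected_outputs(family_sizes):
    """Return (prefix, expected paths) derived from the family_sizes path."""
    dirname = posixpath.dirname(family_sizes)
    # Only the part of the basename before the first dot is kept.
    name = posixpath.basename(family_sizes).split(".")[0]
    prefix = posixpath.join(dirname, name)
    return prefix, [prefix + suffix for suffix in SUFFIXES]


prefix, outputs = expected_outputs("stats/a.family_sizes.txt")
print(prefix)
for output in outputs:
    print(output)
```

One consequence of splitting on the first dot is that a sample name containing a dot (for example stats/a.b.family_sizes.txt) yields the prefix stats/a, so the wrapper's path checks would reject it.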
FGBIO FILTERCONSENSUSREADS¶
Filters consensus reads generated by CallMolecularConsensusReads or CallDuplexConsensusReads.
Example¶
This wrapper can be used in the following way:
rule FilterConsensusReads:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.filtered.bam"
params:
extra="",
min_base_quality=2,
min_reads=[2, 2, 2],
ref="genome.fasta"
log:
"logs/fgbio/filterconsensusreads/{sample}.log"
threads: 1
wrapper:
"v2.2.1/bio/fgbio/filterconsensusreads"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- min_base_quality: a single value (Int). Mask (make N) consensus bases with quality less than this threshold. (default: 5)
- min_reads: an array of Ints, max length 3, min length 1. Number of reads that need to support a UMI. For filtering BAM files processed with CallMolecularConsensusReads, one value is required. Three values can be provided for BAM files processed with CallDuplexConsensusReads; if fewer than three are provided, the last value is repeated. The first value applies to the final consensus sequence and the last two to each strand's consensus.
- For more information, see http://fulcrumgenomics.github.io/fgbio/tools/latest/FilterConsensusReads.html
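The rule for min_reads described in the notes above (one to three values, with the last value repeated when fewer than three are given) can be illustrated with a short sketch. This is only a model of the documented behaviour, not fgbio's implementation; the function name expand_min_reads is hypothetical.

```python
def expand_min_reads(min_reads):
    """Pad a list of 1 to 3 ints to length 3 by repeating the last value."""
    if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
        raise ValueError("min_reads must be a list of 1 to 3 ints")
    return min_reads + [min_reads[-1]] * (3 - len(min_reads))


print(expand_min_reads([3]))        # [3, 3, 3]
print(expand_min_reads([5, 2]))     # [5, 2, 2]
print(expand_min_reads([2, 2, 2]))  # [2, 2, 2]
```

In the expanded list, the first entry is the threshold for the final consensus and the remaining two are the per-strand thresholds.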
Software dependencies¶
fgbio=2.1.0
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
min_base_quality = snakemake.params.get("min_base_quality", None)
if not isinstance(min_base_quality, int):
raise ValueError("min_base_quality needs to be provided as an Int!")
min_reads = snakemake.params.get("min_reads", None)
if not isinstance(min_reads, list) or not (1 <= len(min_reads) <= 3):
raise ValueError(
"min_reads needs to be provided as list of Ints, min length 1, max length 3!"
)
ref = snakemake.params.get("ref", None)
if ref is None:
raise ValueError("A reference needs to be provided!")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
bam_output = snakemake.output[0]
if not isinstance(bam_output, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(bam_output) + "!")
shell(
"fgbio FilterConsensusReads"
" -i {bam_input}"
" -o {bam_output}"
" -r {ref}"
" --min-reads {min_reads}"
" --min-base-quality {min_base_quality}"
" {extra_params}"
" {log}"
)
FGBIO GROUPREADSBYUMI¶
Groups reads together that appear to have come from the same original molecule.
Example¶
This wrapper can be used in the following way:
rule GroupReads:
input:
"mapped/a.bam"
output:
bam="mapped/{sample}.gu.bam",
hist="mapped/{sample}.gu.histo.tsv",
params:
extra="-s adjacency --edits 1"
log:
"logs/fgbio/group_reads/{sample}.log"
wrapper:
"v2.2.1/bio/fgbio/groupreadsbyumi"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
fgbio=2.1.0
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_bam_file = snakemake.output.bam
if not isinstance(output_bam_file, str) and len(output_bam_file) != 1:
raise ValueError("Bam output should be one bam file: " + str(output_bam_file) + "!")
output_histo_file = snakemake.output.hist
if not isinstance(output_histo_file, str) and len(output_histo_file) != 1:
raise ValueError(
"Histo output should be one histogram file path: "
+ str(output_histo_file)
+ "!"
)
shell(
"fgbio GroupReadsByUmi"
" -i {bam_input}"
" -o {output_bam_file}"
" -f {output_histo_file}"
" {extra_params}"
" {log}"
)
FGBIO SETMATEINFORMATION¶
Adds and/or fixes mate information on paired-end reads. Sets the MQ (mate mapping quality), MC (mate cigar string), ensures all mate-related flag fields are set correctly, and that the mate reference and mate start position are correct.
Example¶
This wrapper can be used in the following way:
rule SetMateInfo:
input:
"mapped/a.bam"
output:
"mapped/{sample}.mi.bam"
params: ""
log:
"logs/fgbio/set_mate_info/{sample}.log"
wrapper:
"v2.2.1/bio/fgbio/setmateinformation"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
fgbio=2.1.0
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra_params = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if not isinstance(bam_input, str) or len(snakemake.input) != 1:
    raise ValueError("Input bam should be one bam file: " + str(bam_input) + "!")
output_file = snakemake.output[0]
if not isinstance(output_file, str) or len(snakemake.output) != 1:
    raise ValueError("Output should be one bam file: " + str(output_file) + "!")
shell(
"fgbio SetMateInformation"
" -i {bam_input}"
" -o {output_file}"
" {extra_params}"
" {log}"
)
FILTLONG¶
Quality filtering tool for long reads.
Example¶
This wrapper can be used in the following way:
rule filtlong:
input:
reads = "{sample}.fastq"
output:
"{sample}.filtered.fastq"
params:
extra=" --mean_q_weight 5.0",
target_bases = 10
log:
"logs/filtlong/test/{sample}.log"
wrapper:
"v2.2.1/bio/filtlong"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
filtlong=0.2.1
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for filtlong."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
target_bases = int(snakemake.params.get("target_bases", 0))
if target_bases > 0:
extra += " --target_bases {}".format(target_bases)
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Executed shell command
shell("filtlong {extra}" " {snakemake.input.reads} > {snakemake.output} {log}")
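The way the wrapper combines extra and target_bases can be sketched outside Snakemake as follows. A plain dict stands in for snakemake.params, and the function name build_filtlong_args is a hypothetical helper used only for illustration.

```python
def build_filtlong_args(params):
    """Combine 'extra' and an optional positive 'target_bases' into one string."""
    extra = params.get("extra", "")
    target_bases = int(params.get("target_bases", 0))
    # --target_bases is only appended for positive values.
    if target_bases > 0:
        extra += " --target_bases {}".format(target_bases)
    return extra


args = build_filtlong_args({"extra": " --mean_q_weight 5.0", "target_bases": 10})
print("filtlong" + args)
```

With the parameters of the example rule, this yields the option string --mean_q_weight 5.0 --target_bases 10; leaving out target_bases (or setting it to 0) passes only extra.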
FREEBAYES¶
Call small genomic variants with freebayes.
URL: https://github.com/freebayes/freebayes
Example¶
This wrapper can be used in the following way:
rule freebayes:
input:
alns="mapped/{sample}.bam",
idxs="mapped/{sample}.bam.bai",
ref="genome.fasta",
output:
vcf = "calls/{sample}.vcf",
log:
"logs/freebayes/{sample}.log",
params:
normalize="-a",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/freebayes"
rule freebayes_bcf:
input:
alns="mapped/{sample}.bam",
ref="genome.fasta",
output:
bcf="calls/{sample}.bcf",
log:
"logs/freebayes/{sample}.bcf.log",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/freebayes"
rule freebayes_bed:
input:
alns="mapped/{sample}.bam",
ref="genome.fasta",
regions="regions.bed",
output:
vcf="calls/{sample}.vcf.gz",
log:
"logs/freebayes/{sample}.bed.log",
params:
chunksize=50000,
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/freebayes"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
freebayes=1.3.6
bcftools=1.17
parallel=20230522
bedtools=2.31.0
sed=4.8
snakemake-wrapper-utils=0.6.1
Params¶
extra
: additional arguments for freebayes
normalize
: use bcftools norm to normalize indels (one of -a, -f, -m, -D or -d must be used)
chunksize
: reference genome chunk size for parallelization (default 100000)
Authors¶
- Johannes Köster
- Felix Mölder
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster, Felix Mölder, Christopher Schröder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
from tempfile import TemporaryDirectory
from snakemake_wrapper_utils.bcftools import get_bcftools_opts
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
bcftools_sort_opts = get_bcftools_opts(
snakemake,
parse_threads=False,
parse_ref=False,
parse_regions=False,
parse_samples=False,
parse_targets=False,
parse_output=False,
parse_output_format=False,
)
pipe = ""
norm_params = snakemake.params.get("normalize")
if norm_params:
bcftools_norm_opts = get_bcftools_opts(
snakemake, parse_regions=False, parse_targets=False, parse_memory=False
)
pipe = f"bcftools norm {bcftools_norm_opts} {norm_params}"
else:
bcftools_view_opts = get_bcftools_opts(
snakemake,
parse_ref=False,
parse_regions=False,
parse_targets=False,
parse_memory=False,
)
pipe = f"bcftools view {bcftools_view_opts}"
if snakemake.threads == 1:
freebayes = "freebayes"
else:
chunksize = snakemake.params.get("chunksize", 100000)
regions = f"<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})"
if snakemake.input.get("regions"):
regions = (
"<(bedtools intersect -a "
+ r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
+ f"{regions}) -b {snakemake.input.regions} | "
+ r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
)
freebayes = f"freebayes-parallel {regions} {snakemake.threads}"
with TemporaryDirectory() as tempdir:
shell(
"({freebayes}"
" --fasta-reference {snakemake.input.ref}"
" {extra}"
" {snakemake.input.alns}"
" | bcftools sort {bcftools_sort_opts} --temp-dir {tempdir}"
" | {pipe}"
") {log}"
)
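When a regions file is given, the wrapper converts region strings to tab-separated lines for bedtools intersect and back again using two sed expressions. The following sketch reproduces those two substitutions with Python's re module. The function names and the example region are hypothetical, and the chrom:start-end layout is taken from the sed patterns above.

```python
import re


def region_to_bed(region):
    """Turn 'chrom:start-end' into 'chrom<TAB>start<TAB>end'."""
    return re.sub(r":([0-9]*)-([0-9]*)$", r"\t\1\t\2", region)


def bed_to_region(bed_line):
    """Turn 'chrom<TAB>start<TAB>end' back into 'chrom:start-end'."""
    return re.sub(r"\t([0-9]*)\t([0-9]*)$", r":\1-\2", bed_line)


region = "chr1:0-50000"  # hypothetical example region
bed_line = region_to_bed(region)
print(repr(bed_line))
print(bed_to_region(bed_line))
```

Anchoring the pattern at the end of the line means that only the trailing coordinates are rewritten, so contig names that themselves contain a colon are left intact.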
GATK¶
For gatk, the following wrappers are available:
GATK APPLYBQSR¶
Run gatk ApplyBQSR.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570337264923-ApplyBQSR
Example¶
This wrapper can be used in the following way:
rule gatk_applybqsr:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
recal_table="recal/{sample}.grp",
output:
bam="recal/{sample}.bam",
log:
"logs/gatk/gatk_applybqsr/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
embed_ref=True, # embed the reference in cram output
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/applybqsr"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (but not -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
samtools=1.17
Input/Output¶
Input:
- BAM file
- FASTA reference
- recalibration table for the bam
Output:
- recalibrated bam file
Authors¶
- Christopher Schröder
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
# Extract arguments
extra = snakemake.params.get("extra", "")
reference = snakemake.input.get("ref")
embed_ref = snakemake.params.get("embed_ref", False)
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
if snakemake.output.bam.endswith(".cram") and embed_ref:
output = "/dev/stdout"
pipe_cmd = " | samtools view -h -O cram,embed_ref -T {reference} -o {snakemake.output.bam} -"
else:
output = snakemake.output.bam
pipe_cmd = ""
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(gatk --java-options '{java_opts}' ApplyBQSR"
" --input {snakemake.input.bam}"
" --bqsr-recal-file {snakemake.input.recal_table}"
" --reference {reference}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {output}" + pipe_cmd + ") {log}"
)
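The branching on CRAM output in this wrapper can be summarised with a small stand-alone sketch. It returns the value passed to --output and the command appended after GATK, following the same condition as the code above. The function name plan_output and the example paths are illustrative assumptions.

```python
def plan_output(output_bam, embed_ref, reference):
    """Return (gatk_output, pipe_cmd) following the wrapper's branching."""
    if output_bam.endswith(".cram") and embed_ref:
        # GATK writes to stdout and samtools produces the final CRAM.
        gatk_output = "/dev/stdout"
        pipe_cmd = " | samtools view -h -O cram,embed_ref -T {} -o {} -".format(
            reference, output_bam
        )
    else:
        # GATK writes the output file directly.
        gatk_output = output_bam
        pipe_cmd = ""
    return gatk_output, pipe_cmd


print(plan_output("recal/a.bam", True, "genome.fasta"))
print(plan_output("recal/a.cram", True, "genome.fasta"))
```

In other words, embed_ref only has an effect when the declared output ends in .cram; for BAM output GATK writes the file itself and no samtools step is added.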
GATK APPLYBQSRSPARK¶
ApplyBQSRSpark: Apply base quality score recalibration on Spark; uses output of the BaseRecalibrator tool.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570424849051-ApplyBQSRSpark-BETA-
Example¶
This wrapper can be used in the following way:
rule gatk_applybqsr_spark:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
recal_table="recal/{sample}.grp",
output:
bam="recal/{sample}.bam",
log:
"logs/gatk/gatk_applybqsr_spark/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
#spark_runner="", # optional, local by default
#spark_master="", # optional
#spark_extra="", # optional
embed_ref=True, # embed reference in cram output
exceed_thread_limit=True, # samtools is also parallelized and thread limit is not guaranteed anymore
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/applybqsrspark"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (but not -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments for applybqsrspark.
- The spark_runner param (“LOCAL”, “SPARK” or “GCS”) sets the Spark runner. Set it to “LOCAL”, or leave it unset, to run on the local machine.
- The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or leave it unset for local execution with the number of cores determined by Snakemake.
- The spark_extra param allows for additional spark arguments.
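The defaults described in the notes above can be sketched as follows. A plain dict stands in for snakemake.params and the threads argument for snakemake.threads. The function name spark_arguments is hypothetical, and the option layout mirrors the wrapper's command line.

```python
def spark_arguments(params, threads):
    """Build the Spark part of the command line from optional params."""
    spark_runner = params.get("spark_runner", "LOCAL")
    # Without an explicit master, run locally on the rule's thread count.
    spark_master = params.get("spark_master", "local[{}]".format(threads))
    spark_extra = params.get("spark_extra", "")
    return "--spark-runner {} --spark-master {} {}".format(
        spark_runner, spark_master, spark_extra
    ).strip()


print(spark_arguments({}, threads=8))
print(spark_arguments({"spark_master": "local[2]"}, threads=8))
```

Leaving all three parameters unset therefore gives a local run that uses as many cores as the rule's threads setting.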
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
samtools=1.17
Input/Output¶
Input:
- bam file
- fasta reference
- recalibration table for the bam
Output:
- recalibrated bam file
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira, Christopher Schröder"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
import random
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
"spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
reference = snakemake.input.get("ref")
embed_ref = snakemake.params.get("embed_ref", False)
exceed_thread_limit = snakemake.params.get("exceed_thread_limit", False)
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if exceed_thread_limit:
samtools_threads = snakemake.threads
else:
samtools_threads = 1
if snakemake.output.bam.endswith(".cram") and embed_ref:
output = "/dev/stdout --create-output-bam-splitting-index false"
pipe_cmd = " | samtools view -h -O cram,embed_ref -T {reference} -o {snakemake.output.bam} -@ {samtools_threads} -"
else:
output = snakemake.output.bam
pipe_cmd = ""
with tempfile.TemporaryDirectory() as tmpdir:
# This folder must not exist; it is created by GATK
tmpdir_shards = Path(tmpdir) / "shards_{:06d}".format(random.randrange(10**6))
shell(
"(gatk --java-options '{java_opts}' ApplyBQSRSpark"
" --input {snakemake.input.bam}"
" --bqsr-recal-file {snakemake.input.recal_table}"
" --reference {snakemake.input.ref}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output-shard-tmp-dir {tmpdir_shards}"
" --output {output}"
" -- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra}"
+ pipe_cmd
+ ") {log}"
)
GATK APPLYVQSR¶
Run gatk ApplyVQSR.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570419503259-ApplyVQSR
Example¶
This wrapper can be used in the following way:
rule apply_vqsr:
input:
vcf="test.vcf",
recal="snps.recal",
tranches="snps.tranches",
ref="ref.fasta",
output:
vcf="test.snp_recal.vcf",
log:
"logs/gatk/applyvqsr.log",
params:
mode="SNP", # set mode, must be either SNP, INDEL or BOTH
extra="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/applyvqsr"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (but not -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- VCF file
- Recalibration file
- Tranches file
Output:
- Variant quality score-recalibrated VCF
Authors¶
- Brett Copeland
- Filipe G. Vieira
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' ApplyVQSR"
" --variant {snakemake.input.vcf}"
" --recal-file {snakemake.input.recal}"
" --reference {snakemake.input.ref}"
" --tranches-file {snakemake.input.tranches}"
" --mode {snakemake.params.mode}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
GATK BASERECALIBRATOR¶
Run gatk BaseRecalibrator.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570376886683-BaseRecalibrator
Example¶
This wrapper can be used in the following way:
rule gatk_baserecalibrator:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
known="dbsnp.vcf.gz", # optional known sites - single or a list
output:
recal_table="recal/{sample}.grp",
log:
"logs/gatk/baserecalibrator/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/baserecalibrator"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (but not -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- bam file
- fasta reference
- vcf.gz of known variants
Output:
- recalibration table for the bam
Authors¶
- Christopher Schröder
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
    if isinstance(known, str):
        known = [known]
    known = list(map("--known-sites {}".format, known))
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' BaseRecalibrator"
" --input {snakemake.input.bam}"
" --reference {snakemake.input.ref}"
" {known}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.recal_table}"
" {log}"
)
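The handling of the optional known input in the code above (a single path or a list of paths, each turned into its own --known-sites argument) can be tried outside of Snakemake. The following is a minimal sketch, not the wrapper itself: known_sites_args is a hypothetical helper name, and the explicit join stands in for the space-separated formatting that Snakemake's shell() is assumed to apply to lists.

```python
def known_sites_args(known):
    """Build the --known-sites part of a BaseRecalibrator command line.

    `known` may be an empty string (no known sites), a single path,
    or a list of paths. Hypothetical helper, for illustration only.
    """
    if not known:
        return ""
    if isinstance(known, str):
        # Normalise a single path to a one-element list.
        known = [known]
    # One --known-sites flag per file.
    return " ".join("--known-sites {}".format(path) for path in known)
```

Calling known_sites_args("dbsnp.vcf.gz") yields a single flag, while a list yields one flag per file, which is why the example rule can accept either form.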
GATK BASERECALIBRATORSPARK¶
Run gatk BaseRecalibratorSpark.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570309302171-BaseRecalibratorSpark-BETA-
Example¶
This wrapper can be used in the following way:
rule gatk_baserecalibratorspark:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
known="dbsnp.vcf.gz", # optional known sites
output:
recal_table="recal/{sample}.grp",
log:
"logs/gatk/baserecalibrator/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
#spark_runner="", # optional, local by default
#spark_master="", # optional
#spark_extra="", # optional
resources:
mem_mb=1024,
threads: 8
wrapper:
"v2.2.1/bio/gatk/baserecalibratorspark"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments for BaseRecalibratorSpark.
- The spark_runner param (“LOCAL”, “SPARK” or “GCS”) sets the Spark runner. Set it to “LOCAL”, or leave it unset, to run on the local machine.
- The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or leave it unset for local execution with the number of cores taken from the rule's threads.
- The spark_extra param allows for additional Spark arguments.
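The defaults described in these notes can be sketched in plain Python. This is an illustration under stated assumptions rather than the wrapper code: spark_args is a hypothetical helper, params is an ordinary dict standing in for snakemake.params, and the Spark master URL in the usage note is a made-up placeholder.

```python
def spark_args(params, threads):
    """Assemble the Spark arguments that follow the `--` separator.

    Falls back to a local runner and to `local[<threads>]` as master
    when the corresponding params are not set.
    """
    runner = params.get("spark_runner", "LOCAL")
    master = params.get("spark_master", "local[{}]".format(threads))
    extra = params.get("spark_extra", "")
    parts = ["--spark-runner", runner, "--spark-master", master, extra]
    # Drop empty entries so that no stray spaces end up in the command.
    return " ".join(part for part in parts if part)
```

With no Spark params and threads: 8, the sketch produces a local run on eight cores; passing, say, {"spark_runner": "SPARK", "spark_master": "spark://host:7077"} would override both defaults.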
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- bam file
- fasta reference
- vcf.gz of known variants
Output:
- recalibration table for the bam
Authors¶
- Christopher Schröder
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
"spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
known = snakemake.input.get("known", "")
if known:
known = "--known-sites {}".format(known)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' BaseRecalibratorSpark"
" --input {snakemake.input.bam}"
" --reference {snakemake.input.ref}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.recal_table} {known}"
" -- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra}"
" {log}"
)
GATK CALLCOPYRATIOSEGMENTS¶
Calls copy-ratio segments as amplified, deleted, or copy-number neutral
URL: https://gatk.broadinstitute.org/hc/en-us/articles/13832751795227-CallCopyRatioSegments
Example¶
This wrapper can be used in the following way:
rule call_copy_ratio_segments:
input:
copy_ratio_seg="a.cr.seg",
output:
called_copy_ratio_seg="a.called.seg",
igv_seg="a.called.igv.seg",
log:
"logs/gatk/call_copy_ratio_segments.log",
params:
#prefix="a.den.test",
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/callcopyratiosegments"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
copy_ratio_seg
: cr.seg file from ModelSegments
Output:
called_copy_ratio_seg
: called copy ratio segments file
igv_seg
: CBS formatted igv.seg file, optional
Params¶
java_opts
: additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
extra
: additional program arguments.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smed"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
input_copy_ratio_seg = snakemake.input.copy_ratio_seg
called_copy_ratio_seg = snakemake.output.called_copy_ratio_seg
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
outputfile_call = os.path.join(tmpdir, "temp.seq")
outputfile_igv = os.path.join(tmpdir, "temp.igv.seg")
shell(
"gatk --java-options '{java_opts}' CallCopyRatioSegments"
" -I {input_copy_ratio_seg}"
" -O {outputfile_call}"
" --tmp-dir {tmpdir}"
" {extra}"
" {log}"
)
shell("cp {outputfile_call} {called_copy_ratio_seg}")
if snakemake.output.get("igv_seg", None):
shell("cp {outputfile_igv} {snakemake.output.igv_seg}")
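The temporary file names used above suggest that GATK writes the IGV-compatible file next to the main output, with the last extension replaced by .igv.seg, which would explain why the wrapper runs the tool in a temporary directory and then copies both files to the requested paths. Under that assumption (it is inferred from the code, not stated in this document), the naming can be sketched as follows; igv_seg_path is a hypothetical helper.

```python
import os


def igv_seg_path(called_seg_path):
    """Derive the assumed location of the IGV segments file.

    Replaces the last extension of the called-segments path
    with `.igv.seg` (assumption inferred from the wrapper code).
    """
    root, _ext = os.path.splitext(called_seg_path)
    return root + ".igv.seg"
```

For the temporary output temp.seq this gives temp.igv.seg, matching the second path the wrapper copies from.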
GATK CLEANSAM¶
Run gatk CleanSam
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570531983643-CleanSam-Picard-
Example¶
This wrapper can be used in the following way:
rule gatk_clean_sam:
input:
bam="{sample}.bam",
output:
clean="{sample}.clean.bam",
log:
"logs/{sample}.log",
params:
extra="",
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/cleansam"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' CleanSam"
" --INPUT {snakemake.input.bam}"
" {extra}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output.clean}"
" {log}"
)
GATK COLLECTALLELICCOUNTS¶
Collects reference and alternate allele counts at specified sites.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/13832754187035-CollectAllelicCounts
Example¶
This wrapper can be used in the following way:
rule collectalleliccounts:
input:
bam=["mapped/a.bam"],
intervals=["a.interval_list"],
ref="ref/genome.fasta"
output:
counts="a.counts.tsv",
log:
"logs/gatk/collectalleliccounts.log",
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/collectalleliccounts"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
bam
: BAM/SAM/CRAM file containing reads
intervals
: one or more genomic intervals over which to operate
ref
: reference FASTA file
Output:
counts
: tab-separated values (TSV) file with allelic counts and a SAM-style header
Params¶
java_opts
: additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
extra
: additional program arguments.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smed"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' CollectAllelicCounts"
" -I {snakemake.input.bam}"
" -L {snakemake.input.intervals}"
" -R {snakemake.input.ref}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.counts}"
" {log}"
)
GATK COLLECTREADCOUNTS¶
Collects read counts at specified intervals. The count for each interval is calculated by counting the number of read starts that lie in the interval.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/360037592671-CollectReadCounts
Example¶
This wrapper can be used in the following way:
rule collectreadcounts:
input:
bam=["mapped/a.bam"],
intervals=["a.interval_list"],
output:
counts="a.counts.hdf5",
log:
"logs/gatk/collectreadcounts.log",
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/collectreadcounts"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
bam
: BAM/SAM/CRAM file containing reads
intervals
: one or more genomic intervals over which to operate
Output:
counts
: output file for read counts, tsv or hdf5
Params¶
mergingRule
: interval merging rule for abutting intervals (default: OVERLAPPING_ONLY)
java_opts
: additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
extra
: additional program arguments.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smed"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
mergingRule = snakemake.params.get("mergingRule", "OVERLAPPING_ONLY")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' CollectReadCounts"
" -I {snakemake.input.bam}"
" -L {snakemake.input.intervals}"
" --interval-merging-rule {mergingRule}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.counts}"
" {log}"
)
GATK COMBINEGVCFS¶
Run gatk CombineGVCFs.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570423318427-CombineGVCFs
Example¶
This wrapper can be used in the following way:
rule combine_gvcfs:
input:
gvcfs=["calls/a.g.vcf", "calls/b.g.vcf"],
ref="genome.fasta",
output:
gvcf="calls/all.g.vcf",
log:
"logs/gatk/combinegvcfs.log",
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/combinegvcfs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
gvcfs = list(map("--variant {}".format, snakemake.input.gvcfs))
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' CombineGVCFs"
" {gvcfs}"
" --reference {snakemake.input.ref}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.gvcf}"
" {log}"
)
GATK DENOISEREADCOUNTS¶
Denoises read counts to produce denoised copy ratios
URL: https://gatk.broadinstitute.org/hc/en-us/articles/13832751133851-DenoiseReadCounts
Example¶
This wrapper can be used in the following way:
rule denoisereadcounts:
input:
hdf5=["a.counts.hdf5"],
output:
std_copy_ratio="a.standardizedCR.tsv",
denoised_copy_ratio="a.denoisedCR.tsv",
log:
"logs/gatk/denoisereadcounts.log",
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/denoisereadcounts"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
hdf5
: TSV or HDF5 file with counts from CollectReadCounts.
pon
: Panel-of-normals from CreateReadCountPanelOfNormals (optional)
gc_interval
: GC-content annotated-intervals from AnnotateIntervals (optional)
Output:
std_copy_ratio
: Standardized-copy-ratios file
denoised_copy_ratio
: Denoised-copy-ratios file
Params¶
java_opts
: additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
extra
: additional program arguments.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smed"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
pon = snakemake.input.get("pon", "")
if pon:
pon = f"--count-panel-of-normals {snakemake.input.pon}"
gc_interval = snakemake.input.get("gc_interval", "")
if gc_interval:
gc_interval = f"--annotated-intervals {snakemake.input.gc_interval}"
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' DenoiseReadCounts"
" --input {snakemake.input.hdf5}"
" {pon}"
" {gc_interval}"
" {extra}"
" --standardized-copy-ratios {snakemake.output.std_copy_ratio}"
" --denoised-copy-ratios {snakemake.output.denoised_copy_ratio}"
" --tmp-dir {tmpdir}"
" {log}"
)
GATK DEPTHOFCOVERAGE¶
Run gatk DepthOfCoverage (BETA).
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570475259291-DepthOfCoverage-BETA-
Example¶
This wrapper can be used in the following way:
rule gatk_depth_of_coverage:
input:
bam="mapped/a.bam", # File containing reads
fasta="genome.fasta",
intervals="regions.interval_list", # Regions where the coverage is computed
output:
multiext(
"depth/a",
"",
".sample_interval_summary",
".sample_cumulative_coverage_counts",
".sample_cumulative_coverage_proportions",
".sample_interval_statistics",
".sample_statistics",
".sample_summary",
),
log:
"logs/gatk/depthofcoverage.log",
params:
extra="",
java_opts="",
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/depthofcoverage"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.3.0.0
snakemake-wrapper-utils=0.5.2
Input/Output¶
Input:
- one bam file to be analyzed for coverage statistics
- one or more genomic intervals over which to operate
- reference genome
Output:
- base file location to which to write coverage summary information
Authors¶
- Lauri Mesilaakso
Code¶
__author__ = "Lauri Mesilaakso"
__copyright__ = "Copyright 2022, Lauri Mesilaakso"
__email__ = "lauri.mesilaakso@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
from os import path
java_opts = get_java_opts(snakemake)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Extract basename from the output file names
out_basename = path.commonprefix(snakemake.output).rstrip(".")
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' DepthOfCoverage"
" --input {snakemake.input.bam}"
" --intervals {snakemake.input.intervals}"
" --reference {snakemake.input.fasta}"
" --output {out_basename}"
" --tmp-dir {tmpdir}"
" {extra}"
" {log}"
)
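The basename extraction in the code above relies on the common prefix of all output paths, which is why the example rule lists the empty extension "" in multiext. A small sketch shows the difference; output_basename is a hypothetical helper that mirrors the wrapper's expression, and the paths are those of the example rule.

```python
from os import path


def output_basename(outputs):
    """Common prefix of all output paths, without a trailing dot."""
    return path.commonprefix(list(outputs)).rstrip(".")


SUFFIXES = [
    ".sample_interval_summary",
    ".sample_cumulative_coverage_counts",
    ".sample_cumulative_coverage_proportions",
    ".sample_interval_statistics",
    ".sample_statistics",
    ".sample_summary",
]

# With the bare base path included (as in the example rule) ...
with_base = ["depth/a" + suffix for suffix in [""] + SUFFIXES]
# ... and without it.
without_base = ["depth/a" + suffix for suffix in SUFFIXES]

print(output_basename(with_base))     # depth/a
print(output_basename(without_base))  # depth/a.sample_
```

Without the bare base path among the outputs, the shared ".sample_" part of the suffixes becomes part of the prefix, so the value passed to --output would no longer be the intended base.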
GATK ESTIMATELIBRARYCOMPLEXITY¶
Run gatk EstimateLibraryComplexity
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570264328603-EstimateLibraryComplexity-Picard-
Example¶
This wrapper can be used in the following way:
rule gatk_estimate_library_complexity:
input:
bam="{sample}.bam",
output:
metrics="{sample}.metrics",
log:
"logs/{sample}.log",
params:
extra="",
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/estimatelibrarycomplexity"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' EstimateLibraryComplexity"
" --INPUT {snakemake.input}"
" {extra}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output.metrics}"
" {log}"
)
GATK FILTERMUTECTCALLS¶
Run gatk FilterMutectCalls to filter variants in a Mutect2 VCF callset.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570331605531-FilterMutectCalls
Example¶
This wrapper can be used in the following way:
rule gatk_filtermutectcalls:
input:
vcf="calls/snvs.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.mutect.filtered.vcf",
log:
"logs/gatk/filter/snvs.log",
params:
extra="--max-alt-allele-count 3", # optional arguments, see GATK docs
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/filtermutectcalls"
rule gatk_filtermutectcalls_complete:
input:
vcf="calls/snvs.vcf",
ref="genome.fasta",
aln="mapped/a.bam",
intervals="intervals.bed",
# contamination="", # from gatk CalculateContamination
# segmentation="", # from gatk CalculateContamination
# f1r2="", # from gatk LearnReadOrientationBias
output:
vcf="calls/snvs.mutect.filtered.b.vcf",
log:
"logs/gatk/filter/snvs.log",
params:
extra="--max-alt-allele-count 3", # optional arguments, see GATK docs
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/filtermutectcalls"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
vcf
: Path to vcf file (pbgzipped, indexed)
ref
: Path to reference genome (with .dict file alongside)
aln
: Optional path to SAM/BAM/CRAM files
contamination
: Optional path to contamination table (from gatk CalculateContamination)
segmentation
: Optional path to tumor segments
f1r2
: Optional path to prior artefact (tar.gz)
intervals
: Optional path to BED intervals file
Output:
vcf
: filtered vcf file
stats
: Optional stats from Mutect2
Authors¶
- Patrik Smeds
- Filipe G. Vieira
- Thibault Dayris
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
aln = snakemake.input.get("aln", "")
if aln:
aln = f"--input {aln}"
contamination = snakemake.input.get("contamination", "")
if contamination:
contamination = f"--contamination-table {contamination}"
segmentation = snakemake.input.get("segmentation", "")
if segmentation:
segmentation = f"--tumor-segmentation {segmentation}"
f1r2 = snakemake.input.get("f1r2", "")
if f1r2:
f1r2 = f"--orientation-bias-artifact-priors {f1r2}"
intervals = snakemake.input.get("intervals", "")
if intervals:
intervals = f"--intervals {intervals}"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' FilterMutectCalls"
" --variant {snakemake.input.vcf}"
" --reference {snakemake.input.ref}"
" {aln}" # BAM/SAM/CRAM file containing reads
" {contamination}" # Tables containing contamination information
" {segmentation}" # Tumor segments' minor allele fractions
" {f1r2}" # .tar.gz files containing tables of prior artifact
" {intervals}" # Genomic intervals over which to operate
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
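The pattern used above, where each optional input adds one command-line flag only when it is present, can be condensed into a table-driven sketch. It is an illustration, not the wrapper's implementation: optional_args and OPTIONAL_INPUT_FLAGS are hypothetical names, inputs is a plain dict standing in for snakemake.input, and the keys follow the input names listed under Input/Output.

```python
# Optional input name -> FilterMutectCalls flag (flags as used in the wrapper code).
OPTIONAL_INPUT_FLAGS = {
    "aln": "--input",
    "contamination": "--contamination-table",
    "segmentation": "--tumor-segmentation",
    "f1r2": "--orientation-bias-artifact-priors",
    "intervals": "--intervals",
}


def optional_args(inputs):
    """Return the flags for all optional inputs that are actually set."""
    args = []
    for name, flag in OPTIONAL_INPUT_FLAGS.items():
        value = inputs.get(name, "")
        if value:
            args.append("{} {}".format(flag, value))
    return " ".join(args)
```

An input that is absent or empty contributes nothing, so the minimal rule with only vcf and ref results in a command without any of these flags.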
GATK GENOMICSDBIMPORT¶
Run gatk GenomicsDBImport.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570326648475-GenomicsDBImport
Example¶
This wrapper can be used in the following way:
rule genomics_db_import:
input:
gvcfs=["calls/a.g.vcf.gz", "calls/b.g.vcf.gz"],
output:
db=directory("db"),
log:
"logs/gatk/genomicsdbimport.log",
params:
intervals="ref",
db_action="create", # optional
extra="", # optional
java_opts="", # optional
resources:
mem_mb=lambda wildcards, input: max([input.size_mb * 1.6, 200]),
wrapper:
"v2.2.1/bio/gatk/genomicsdbimport"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. -XX:ParallelGCThreads=10 (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- Intervals are mandatory; the wrapper reads them from input.intervals and falls back to params.intervals.
- By default, the wrapper will create a new database (output directory must be empty or non-existent). If you want to update an existing DB, set db_action param to update.
- The extra param allows for additional program arguments.
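The choice between creating and updating a database, as described in the notes above, comes down to selecting one of two GATK flags. A minimal sketch, with workspace_flag as a hypothetical helper name:

```python
def workspace_flag(db_action="create"):
    """Map the db_action param to the matching GenomicsDBImport flag."""
    flags = {
        "create": "--genomicsdb-workspace-path",
        "update": "--genomicsdb-update-workspace-path",
    }
    try:
        return flags[db_action]
    except KeyError:
        # Any other value is rejected rather than silently ignored.
        raise ValueError(
            "invalid db_action {!r}; choose either 'create' or 'update'".format(db_action)
        ) from None
```

The returned flag is followed by the output directory on the command line, so the same output path serves both modes.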
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
# uses Java native library TileDB, which requires a lot of memory outside
# of the `-Xmx` memory, so we reserve 40% instead of the default 20%. See:
# https://gatk.broadinstitute.org/hc/en-us/articles/9570326648475-GenomicsDBImport
java_opts = get_java_opts(snakemake, java_mem_overhead_factor=0.4)
gvcfs = list(map("--variant {}".format, snakemake.input.gvcfs))
db_action = snakemake.params.get("db_action", "create")
if db_action == "create":
db_action = "--genomicsdb-workspace-path"
elif db_action == "update":
db_action = "--genomicsdb-update-workspace-path"
else:
raise ValueError(
"invalid option provided to 'params.db_action'; please choose either 'create' or 'update'."
)
intervals = snakemake.input.get("intervals")
if not intervals:
intervals = snakemake.params.get("intervals")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' GenomicsDBImport"
" {gvcfs}"
" --intervals {intervals}"
" {extra}"
" --tmp-dir {tmpdir}"
" {db_action} {snakemake.output.db}"
" {log}"
)
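The comment in the code above states that 40% of the memory is reserved for use outside the Java heap, instead of the default 20%. How get_java_opts derives -Xmx from resources.mem_mb is not shown in this document; the sketch below assumes the heap is simply the rule's memory minus the reserved fraction, and heap_size_flag is a hypothetical name.

```python
def heap_size_flag(mem_mb, overhead_factor=0.2):
    """Sketch of a heap-size flag derived from a rule's mem_mb.

    Assumption: the JVM heap gets mem_mb reduced by `overhead_factor`,
    leaving the rest for native (non-heap) memory.
    """
    heap_mb = round(mem_mb * (1.0 - overhead_factor))
    return "-Xmx{}M".format(heap_mb)
```

Under this assumption, a rule with mem_mb=1000 would get a heap of 800 MB with the default factor and 600 MB with the factor 0.4 used by this wrapper.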
GATK GENOTYPEGVCFS¶
Run gatk GenotypeGVCFs.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570489472411-GenotypeGVCFs
Example¶
This wrapper can be used in the following way:
rule genotype_gvcfs:
input:
gvcf="calls/all.g.vcf", # combined gvcf over multiple samples
# N.B. gvcf or genomicsdb must be specified
# in the latter case, this is a GenomicsDB data store
ref="genome.fasta"
output:
vcf="calls/all.vcf",
log:
"logs/gatk/genotypegvcfs.log"
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/gatk/genotypegvcfs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. -XX:ParallelGCThreads=10 (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- GVCF files or GenomicsDB workspace
- reference genome
Output:
- VCF file with genotypes
Authors¶
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
intervals = snakemake.input.get("intervals", "")
if not intervals:
intervals = snakemake.params.get("intervals", "")
if intervals:
intervals = "--intervals {}".format(intervals)
dbsnp = snakemake.input.get("known", "")
if dbsnp:
dbsnp = "--dbsnp {}".format(dbsnp)
# Allow for either an input gvcf or GenomicsDB
gvcf = snakemake.input.get("gvcf", "")
genomicsdb = snakemake.input.get("genomicsdb", "")
if gvcf:
    if genomicsdb:
        raise Exception("Only input.gvcf or input.genomicsdb expected, got both.")
    input_string = gvcf
else:
    if genomicsdb:
        input_string = "gendb://{}".format(genomicsdb)
    else:
        raise Exception("Expected input.gvcf or input.genomicsdb.")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' GenotypeGVCFs"
" --variant {input_string}"
" --reference {snakemake.input.ref}"
" {dbsnp}"
" {intervals}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
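The rule that exactly one of input.gvcf and input.genomicsdb must be given, and that a GenomicsDB workspace is passed with a gendb:// prefix, can be sketched as a small function. variant_argument is a hypothetical helper; it raises ValueError where the wrapper raises a plain Exception.

```python
def variant_argument(gvcf="", genomicsdb=""):
    """Return the value passed to --variant for GenotypeGVCFs."""
    if gvcf and genomicsdb:
        raise ValueError("Only one of gvcf or genomicsdb expected, got both.")
    if gvcf:
        return gvcf
    if genomicsdb:
        # GenomicsDB workspaces are addressed with the gendb:// scheme.
        return "gendb://{}".format(genomicsdb)
    raise ValueError("Expected gvcf or genomicsdb.")
```

For the example rule this returns calls/all.g.vcf unchanged, whereas a workspace directory named db would be passed as gendb://db.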
GATK GETPILEUPSUMMARIES¶
Summarizes counts of reads that support reference, alternate and other alleles
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570416554907-GetPileupSummaries
Example¶
This wrapper can be used in the following way:
rule test_gatk_get_pileup_summaries:
input:
bam="mapped/a.bam",
intervals="genome/intervals.bed",
variants="genome/variants.vcf.gz",
output:
"summaries.table",
threads: 1
resources:
mem_mb=1024,
params:
extra="",
log:
"logs/summary.log",
wrapper:
"v2.2.1/bio/gatk/getpileupsummaries"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
bam
: Path to bam file (sorted and indexed)
intervals
: Path to one or more BED genomic intervals over which to operate
variants
: Path to a VCF containing allele frequencies (pbgzipped and tabix indexed)
Output:
- Path to output table
Params¶
extra
: Optional parameters
Authors¶
- Thibault Dayris
Code¶
#!/usr/bin/env python3
# coding: utf-8
"""Snakemake wrapper for GATK GetPileupSummaries"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tempdir:
shell(
"gatk GetPileupSummaries "
"--java-options '{java_opts}' "
"--input {snakemake.input.bam} "
"--intervals {snakemake.input.intervals} "
"--variant {snakemake.input.variants} "
"--output {snakemake.output[0]} "
"--tmp-dir {tempdir} "
"{extra} "
"{log} "
)
GATK HAPLOTYPECALLER¶
Run gatk HaplotypeCaller.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570334998171-HaplotypeCaller
Example¶
This wrapper can be used in the following way:
rule haplotype_caller:
input:
# single or list of bam files
bam="mapped/{sample}.bam",
ref="genome.fasta",
# known="dbsnp.vcf" # optional
output:
vcf="calls/{sample}.vcf",
# bam="{sample}.assemb_haplo.bam",
log:
"logs/gatk/haplotypecaller/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
threads: 4
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/haplotypecaller"
rule haplotype_caller_gvcf:
input:
# single or list of bam files
bam="mapped/{sample}.bam",
ref="genome.fasta",
# known="dbsnp.vcf" # optional
output:
gvcf="calls/{sample}.g.vcf",
# bam="{sample}.assemb_haplo.bam",
log:
"logs/gatk/haplotypecaller/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
threads: 4
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/haplotypecaller"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. -XX:ParallelGCThreads=10 (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
bams = snakemake.input.bam
if isinstance(bams, str):
bams = [bams]
bams = list(map("--input {}".format, bams))
intervals = snakemake.input.get("intervals", "")
if not intervals:
intervals = snakemake.params.get("intervals", "")
if intervals:
intervals = "--intervals {}".format(intervals)
known = snakemake.input.get("known", "")
if known:
known = "--dbsnp " + str(known)
vcf_output = snakemake.output.get("vcf", "")
if vcf_output:
output = " --output " + str(vcf_output)
gvcf_output = snakemake.output.get("gvcf", "")
if gvcf_output:
output = " --emit-ref-confidence GVCF " + " --output " + str(gvcf_output)
if (vcf_output and gvcf_output) or (not gvcf_output and not vcf_output):
if vcf_output and gvcf_output:
raise ValueError(
"please set vcf or gvcf as output, not both! It's not supported by gatk"
)
else:
raise ValueError("please set one of vcf or gvcf as output (not both)!")
bam_output = snakemake.output.get("bam", "")
if bam_output:
bam_output = " --bam-output " + str(bam_output)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' HaplotypeCaller"
" --native-pair-hmm-threads {snakemake.threads}"
" {bams}"
" --reference {snakemake.input.ref}"
" {intervals}"
" {known}"
" {extra}"
" --tmp-dir {tmpdir}"
" {output}"
" {bam_output}"
" {log}"
)
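The output handling in the code above amounts to a small decision table: exactly one of vcf or gvcf must be set, and gvcf additionally switches on --emit-ref-confidence GVCF. The following is a minimal sketch of that logic as a standalone function; `resolve_output` is a hypothetical name used only for illustration and is not part of the wrapper.

```python
def resolve_output(vcf="", gvcf=""):
    """Return HaplotypeCaller output flags for exactly one of vcf/gvcf.

    Hypothetical helper that mirrors the checks in the wrapper code.
    """
    if vcf and gvcf:
        raise ValueError("please set vcf or gvcf as output, not both!")
    if not vcf and not gvcf:
        raise ValueError("please set one of vcf or gvcf as output!")
    if gvcf:
        # GVCF mode needs the reference-confidence flag in addition to --output
        return f"--emit-ref-confidence GVCF --output {gvcf}"
    return f"--output {vcf}"


print(resolve_output(vcf="calls/a.vcf"))
print(resolve_output(gvcf="calls/a.g.vcf"))
```

Checking both error cases before building the flag string keeps the failure independent of the order in which the two outputs are inspected.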
GATK INTERVALLISTTOBED¶
Run gatk IntervalListToBed.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570392740123-IntervalListToBed-Picard-
Example¶
This wrapper can be used in the following way:
rule gatk_interval_list_to_bed:
input:
intervals="genome.intervals",
output:
bed="genome.bed",
log:
"logs/genome.log",
params:
extra="",
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/intervallisttobed"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' IntervalListToBed"
" --INPUT {snakemake.input.intervals}"
" {extra}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output.bed}"
" {log}"
)
GATK LEARNREADORIENTATIONMODEL¶
Get the maximum likelihood estimates of artifact prior probabilities in the orientation bias mixture model filter
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570329571227-LearnReadOrientationModel
Example¶
This wrapper can be used in the following way:
rule test_gatk_learnreadorientationmodel:
input:
f1r2="f1r2.tar.gz",
output:
"artifacts_prior.tar.gz",
resources:
mem_mb=1024,
params:
extra="",
log:
"learnreadorientationbias.log",
wrapper:
"v2.2.1/bio/gatk/learnreadorientationmodel"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
f1r2
: Path to one or multiple f1r2 files
Output:
- Path to tar.gz of artifact prior tables
Authors¶
- Thibault Dayris
Code¶
#!/usr/bin/env python3
# coding: utf-8
"""Snakemake wrapper for GATK4 LearnReadOrientationModel"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
f1r2 = "--input "
if isinstance(snakemake.input["f1r2"], list):
    # Case user provided a list of archives: every archive needs its own
    # whitespace-separated --input flag
    f1r2 += " --input ".join(snakemake.input["f1r2"])
else:
    # Case user provided a single archive as a string
    f1r2 += snakemake.input["f1r2"]
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "gatk --java-options '{java_opts}' LearnReadOrientationModel"  # Tool and its subprocess
        " {f1r2}"  # Path(s) to input f1r2 archive(s)
        " {extra}"  # Extra parameters
        " --tmp-dir {tmpdir}"
        " --output {snakemake.output[0]}"  # Path to output tar.gz of artifact prior tables
        " {log}"  # Logging behaviour
    )
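When several f1r2 archives are given, each one needs its own --input flag, separated from its neighbours by whitespace. A minimal sketch of that flag construction follows; `input_flags` is a hypothetical helper written for illustration, not a function of the wrapper.

```python
def input_flags(f1r2):
    """Build one `--input` flag per f1r2 archive (hypothetical helper)."""
    # Accept either a single path or any iterable of paths
    archives = [f1r2] if isinstance(f1r2, str) else list(f1r2)
    return " ".join(f"--input {path}" for path in archives)


print(input_flags("f1r2.tar.gz"))
print(input_flags(["a.f1r2.tar.gz", "b.f1r2.tar.gz"]))
```

Normalising to a list first means the single-archive and multi-archive cases share one code path.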
GATK LEFTALIGNANDTRIMVARIANTS¶
Run gatk LeftAlignAndTrimVariants
URL: https://gatk.broadinstitute.org/hc/en-us/articles/360037225872-LeftAlignAndTrimVariants
Example¶
This wrapper can be used in the following way:
rule gatk_leftalignandtrimvariants:
input:
vcf="calls/test_split_with_AS_filters.vcf",
ref="Homo_sapiens_assembly38.chrM.fasta",
fai="Homo_sapiens_assembly38.chrM.fasta.fai",
dict="Homo_sapiens_assembly38.chrM.dict",
# intervals="intervals.bed", # optional
output:
vcf="calls/split_multiallelics.vcf",
log:
"logs/gatk/leftalignandtrimvariants.log",
params:
extra="--split-multi-allelics", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/leftalignandtrimvariants"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. -XX:ParallelGCThreads=10 (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Dmitry Bespiatykh
Code¶
"""Snakemake wrapper for GATK LeftAlignAndTrimVariants"""
__author__ = "Dmitry Bespiatykh"
__copyright__ = "Copyright 2023, Dmitry Bespiatykh"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
intervals = snakemake.input.get("intervals", "")
if not intervals:
intervals = snakemake.params.get("intervals", "")
if intervals:
intervals = "--intervals {}".format(intervals)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' LeftAlignAndTrimVariants"
" --variant {snakemake.input.vcf}"
" --reference {snakemake.input.ref}"
" {intervals}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
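Several wrappers in this section accept intervals either as an input file or as a param, with the input taking precedence. A minimal sketch of that fallback is shown below, using plain dictionaries as stand-ins for `snakemake.input` and `snakemake.params`; `intervals_flag` is a hypothetical name.

```python
def intervals_flag(inputs, params):
    """Return the --intervals flag, preferring the input over the param.

    `inputs` and `params` are plain dicts standing in for the Snakemake
    objects; this helper is illustrative only.
    """
    intervals = inputs.get("intervals", "") or params.get("intervals", "")
    return f"--intervals {intervals}" if intervals else ""


print(intervals_flag({"intervals": "intervals.bed"}, {"intervals": "chrM"}))
print(intervals_flag({}, {"intervals": "chrM"}))
print(repr(intervals_flag({}, {})))
```

Returning an empty string when nothing is set lets the flag be interpolated into the command line unconditionally.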
GATK MARKDUPLICATESSPARK¶
Spark implementation of Picard MarkDuplicates that allows the tool to be run in parallel on multiple cores on a local machine or multiple machines on a Spark cluster while still matching the output of the non-Spark Picard version of the tool. Since the tool requires holding all of the readnames in memory while it groups read information, machine configuration and starting sort-order impact tool performance.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570319741083-MarkDuplicatesSpark
Example¶
This wrapper can be used in the following way:
rule mark_duplicates_spark:
input:
"mapped/{sample}.bam",
output:
bam="dedup/{sample}.bam",
metrics="dedup/{sample}.metrics.txt",
log:
"logs/dedup/{sample}.log",
params:
extra="--remove-sequencing-duplicates", # optional
java_opts="", # optional
#spark_runner="", # optional, local by default
#spark_master="", # optional
#spark_extra="", # optional
resources:
# Memory needs to be at least 471859200 bytes for Spark, so 589824000 bytes when
# accounting for default JVM overhead of 20%. We round up to 650M.
mem_mb=lambda wildcards, input: max([input.size_mb * 0.25, 650]),
threads: 8
wrapper:
"v2.2.1/bio/gatk/markduplicatesspark"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
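The memory figures in the rule comment can be checked with a few lines of arithmetic. The sketch below assumes that the 20% JVM overhead is taken off the total, so that the Java heap receives the remaining 80% of mem_mb; that reading is inferred from the two numbers quoted in the comment. `mem_mb` repeats the expression of the rule's lambda as a plain function.

```python
SPARK_MIN_BYTES = 471_859_200  # minimum quoted in the rule comment
JVM_OVERHEAD_PERCENT = 20      # default overhead quoted in the rule comment

# Total memory such that the non-overhead share still covers the Spark minimum
# (integer arithmetic avoids floating-point rounding).
required_bytes = SPARK_MIN_BYTES * 100 // (100 - JVM_OVERHEAD_PERCENT)
required_mib = required_bytes / 2**20


def mem_mb(input_size_mb):
    """Same expression as the rule's lambda: 25% of the input size, at least 650."""
    return max([input_size_mb * 0.25, 650])


print(required_bytes)  # 589824000
print(required_mib)    # 562.5
print(mem_mb(1000))    # 650
print(mem_mb(4000))    # 1000.0
```

The 650 floor therefore leaves some headroom above the roughly 562.5 MiB minimum, and the 25% term only takes over once the input exceeds 2600 MB.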
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- The spark_runner param (“LOCAL”, “SPARK” or “GCS”) sets the Spark runner. Set it to “LOCAL”, or leave it unset, to run on the local machine.
- The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or leave it unset for local execution with the number of cores determined by Snakemake.
- The spark_extra param allows for additional spark arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
"spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = get_java_opts(snakemake)
metrics = snakemake.output.get("metrics", "")
if metrics:
metrics = f"--metrics-file {metrics}"
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' MarkDuplicatesSpark"
" --input {snakemake.input}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.bam}"
" {metrics}"
" -- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra}"
" {log}"
)
GATK MODELSEGMENTS¶
Models segmented copy ratios from denoised copy ratios and segmented minor-allele fractions from allelic counts
URL: https://gatk.broadinstitute.org/hc/en-us/articles/13832747657883-ModelSegments
Example¶
This wrapper can be used in the following way:
rule modelsegments_denoise_input:
input:
denoised_copy_ratios="a.denoisedCR.tsv",
output:
"a.den.modelFinal.seg",
"a.n.cr.seg",
log:
"logs/gatk/modelsegments_denoise.log",
params:
#prefix="a.den.test",
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/modelsegments"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
denoised_copy_ratios
: denoised_copy_ratios file (optional)
allelic_counts
: allelic_counts file (optional)
normal_allelic_counts
: matched_normal allelic-counts (optional)
segments
: segments Picard interval-list file containing a multisample segmentation output by a previous run (optional)
Output:
- list of files ending with either ‘.modelFinal.seg’, ‘.cr.seg’, ‘.af.igv.seg’, ‘.cr.igv.seg’, ‘.hets.tsv’, ‘.modelBegin.cr.param’, ‘.modelBegin.af.param’, ‘.modelBegin.seg’, ‘.modelFinal.af.param’ or ‘.modelFinal.cr.param’
Params¶
java_opts
: additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
extra
: additional program arguments.
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2023, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
denoised_copy_ratios = ""
if snakemake.input.get("denoised_copy_ratios", None):
denoised_copy_ratios = (
f"--denoised-copy-ratios {snakemake.input.denoised_copy_ratios}"
)
allelic_counts = ""
if snakemake.input.get("allelic_counts", None):
allelic_counts = f"--allelic-counts {snakemake.input.allelic_counts}"
normal_allelic_counts = ""
if snakemake.input.get("normal_allelic_counts", None):
    normal_allelic_counts = (
        f"--normal-allelic-counts {snakemake.input.normal_allelic_counts}"
    )
segments = ""
if snakemake.input.get("segments", None):
    segments = f"--segments {snakemake.input.segments}"
if not allelic_counts and not denoised_copy_ratios:
    raise Exception(
        "wrapper input requires either 'allelic_counts' or 'denoised_copy_ratios' to be set"
    )
if normal_allelic_counts and not allelic_counts:
    raise Exception(
        "'allelic_counts' is required when 'normal_allelic_counts' is an input to the rule!"
    )
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
output_folder = os.path.join(tmpdir, "output_folder")
shell(
"gatk --java-options '{java_opts}' ModelSegments"
" {segments}"
" {denoised_copy_ratios}"
" {allelic_counts}"
" {normal_allelic_counts}"
" --output-prefix temp_name__"
" -O {output_folder}"
" --tmp-dir {tmpdir}"
" {extra}"
" {log}"
)
created_files = {}
# Find all created files
for new_file in os.listdir(output_folder):
file_path = os.path.join(output_folder, new_file)
if os.path.isfile(file_path):
file_end = os.path.basename(file_path).split("__")[1]
created_files[file_end] = file_path
# Match expected output with found files
for output in snakemake.output:
file_found = False
for file_ending in created_files:
if output.endswith(file_ending):
shell(f"cp {created_files[file_ending]} {output}")
file_found = True
break
if not file_found:
created_files_list = [f"{e}" for e in created_files]
raise IOError(
f"Could not create file {output}; possible file endings are {created_files_list}"
)
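The final step of the code above maps the files that ModelSegments wrote under a temporary prefix onto the output paths requested by the rule, by comparing file endings. The following is a minimal sketch of that matching step outside Snakemake. `collect_outputs` is a hypothetical helper, and it uses `shutil.copy` in place of the wrapper's shell `cp` so that it runs without a shell.

```python
import os
import shutil
import tempfile


def collect_outputs(output_folder, requested, prefix="temp_name__"):
    """Copy '<prefix><ending>' files to the requested paths by matching endings."""
    created = {}
    for name in os.listdir(output_folder):
        path = os.path.join(output_folder, name)
        if os.path.isfile(path) and name.startswith(prefix):
            # Key each created file by whatever follows the temporary prefix
            created[name[len(prefix):]] = path
    for target in requested:
        match = next((end for end in created if target.endswith(end)), None)
        if match is None:
            raise IOError(
                f"no created file matches {target}; endings: {sorted(created)}"
            )
        shutil.copy(created[match], target)
    return created


# Small demonstration with fake tool output in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    out_dir = os.path.join(tmpdir, "output_folder")
    os.makedirs(out_dir)
    for ending in (".modelFinal.seg", ".cr.seg"):
        with open(os.path.join(out_dir, "temp_name__" + ending), "w") as handle:
            handle.write(ending)
    target = os.path.join(tmpdir, "a.den.modelFinal.seg")
    collect_outputs(out_dir, [target])
    with open(target) as handle:
        copied = handle.read()

print(copied)  # .modelFinal.seg
```

Because matching is done on endings only, the requested output names are free to carry any sample-specific prefix.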
GATK MUTECT2¶
Call somatic SNVs and indels via local assembly of haplotypes
URL: https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2
Example¶
This wrapper can be used in the following way:
rule mutect2:
input:
fasta="genome/genome.fasta",
map="mapped/{sample}.bam",
output:
vcf="variant/{sample}.vcf",
message:
"Testing Mutect2 with {wildcards.sample}"
threads: 1
resources:
mem_mb=1024,
log:
"logs/mutect_{sample}.log",
wrapper:
"v2.2.1/bio/gatk/mutect"
rule mutect2_bam:
input:
fasta="genome/genome.fasta",
map="mapped/{sample}.bam",
output:
vcf="variant_bam/{sample}.vcf",
bam="variant_bam/{sample}.bam",
message:
"Testing Mutect2 with {wildcards.sample}"
threads: 1
resources:
mem_mb=1024,
log:
"logs/mutect_{sample}.log",
wrapper:
"v2.2.1/bio/gatk/mutect"
rule mutect2_complete:
input:
fasta="genome/genome.fasta",
map="mapped/{sample}.bam",
intervals="genome/intervals.bed",
output:
vcf="variant_complete/{sample}.vcf",
bam="variant_complete/{sample}.bam",
f1r2="counts/{sample}.f1r2.tar.gz",
message:
"Testing Mutect2 with {wildcards.sample}"
threads: 1
resources:
mem_mb=1024,
log:
"logs/mutect_{sample}.log",
wrapper:
"v2.2.1/bio/gatk/mutect"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
map
: Mapped reads (SAM/BAM/CRAM)
fasta
: Reference Fasta file
intervals
: Optional path to a BED interval file
pon
: Optional path to Panel of Normals (flagged as BETA)
germline
: Optional path to known germline variants
Output:
vcf
: Path to variant file
bam
: Optional path to output bam file
f1r2
: Optional path to f1r2 count file
Params¶
extra
: Optional parameters for GATK Mutect2
use_parallelgc
: Automatically add “-XX:ParallelGCThreads={snakemake.threads}” to your command line. Set to True if your architecture supports ParallelGCThreads.
use_omp
: Automatically set OMP_NUM_THREADS environment variable. Set to True if your java architecture uses OMP threads.
java_opts
: allows for additional arguments to be passed to the Java virtual machine (not for -Xmx, -Djava.io.tmpdir or -XX:ParallelGCThreads, since they are handled automatically).
Authors¶
- Thibault Dayris
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for GATK4 Mutect2"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# On non-omp systems, and in case OMP_NUM_THREADS
# was not set, define OMP_NUM_THREADS through python
if "OMP_NUM_THREADS" not in os.environ.keys() and snakemake.params.get(
    "use_omp", False
):
    # Environment variable values must be strings, not integers
    os.environ["OMP_NUM_THREADS"] = str(snakemake.threads)
bam_output = snakemake.output.get("bam", "")
if bam_output:
bam_output = f"--bam-output {bam_output}"
germline_resource = snakemake.input.get("germline", "")
if germline_resource:
germline_resource = f"--germline-resource {germline_resource}"
intervals = snakemake.input.get("intervals", "")
if intervals:
intervals = f"--intervals {intervals}"
f1r2 = snakemake.output.get("f1r2", "")
if f1r2:
f1r2 = f"--f1r2-tar-gz {f1r2}"
pon = snakemake.input.get("pon", "")
if pon:
pon = f"--panel-of-normals {pon}"
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
# In case Java execution environment suits GC parallel
# calls, these must be given as optional java parameters
if snakemake.params.get("use_parallelgc", False):
if "UseParallelGC" not in java_opts:
java_opts += " -XX:+UseParallelGC "
if "ParallelGCThreads" not in java_opts:
java_opts += f" -XX:ParallelGCThreads={snakemake.threads}"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' Mutect2" # Tool and its subprocess
" --native-pair-hmm-threads {snakemake.threads}"
" --input {snakemake.input.map}" # Path to input mapping file
" --reference {snakemake.input.fasta}" # Path to reference fasta file
" {f1r2}" # Optional path to output f1r2 count file
" {germline_resource}" # Optional path to optional germline resource VCF
" {intervals}" # Optional path to optional bed intervals
" {pon} " # Optional path to panel of normals
" {extra}" # Extra parameters
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}" # Path to output vcf file
" {bam_output}" # Path to output bam file, optional
" {log}" # Logging behaviour
)
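The use_parallelgc handling in the code above appends the two garbage-collector options only when they are not already present in java_opts. A minimal sketch of that behaviour as a pure function follows; `with_parallel_gc` is a hypothetical name, and the sketch trims surrounding whitespace, which the wrapper itself does not do.

```python
def with_parallel_gc(java_opts, threads):
    """Append parallel-GC options unless java_opts already sets them.

    Hypothetical helper mirroring the substring checks in the wrapper.
    """
    if "UseParallelGC" not in java_opts:
        java_opts += " -XX:+UseParallelGC"
    if "ParallelGCThreads" not in java_opts:
        java_opts += f" -XX:ParallelGCThreads={threads}"
    return java_opts.strip()


print(with_parallel_gc("", 4))
print(with_parallel_gc("-XX:ParallelGCThreads=2", 4))
```

In the second call the user-supplied thread count is kept and only the missing -XX:+UseParallelGC switch is added.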
GATK PRINTREADSSPARK¶
Write reads from SAM format file (SAM/BAM/CRAM) that pass specified criteria to a new file. This is the version that can be run on Spark.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570521694747-PrintReadsSpark
Example¶
This wrapper can be used in the following way:
rule gatk_printreadsspark:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
dict="genome.dict",
output:
bam="{sample}.bam",
log:
"logs/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
#spark_runner="", # optional, local by default
#spark_master="", # optional
#spark_extra="", # optional
resources:
mem_mb=1024,
threads: 8
wrapper:
"v2.2.1/bio/gatk/printreadsspark"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments for printreadsspark.
- The spark_runner param (“LOCAL”, “SPARK” or “GCS”) sets the Spark runner. Set it to “LOCAL”, or leave it unset, to run on the local machine.
- The spark_master param sets the URL of the Spark master to which the job is submitted. Set it to “local[number_of_cores]” for local execution, or leave it unset for local execution with the number of cores determined by Snakemake.
- The spark_extra param allows for additional spark arguments.
Software dependencies¶
gatk4=4.3.0.0
snakemake-wrapper-utils=0.5.2
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
spark_runner = snakemake.params.get("spark_runner", "LOCAL")
spark_master = snakemake.params.get(
"spark_master", "local[{}]".format(snakemake.threads)
)
spark_extra = snakemake.params.get("spark_extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' PrintReadsSpark"
" --input {snakemake.input.bam}"
" --reference {snakemake.input.ref}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.bam}"
" -- --spark-runner {spark_runner} --spark-master {spark_master} {spark_extra}"
" {log}"
)
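The Spark-related params fall back to local execution when they are left unset, with the core count taken from the rule's threads. A minimal sketch of how those defaults combine is given below, using a plain dictionary as a stand-in for `snakemake.params`. `spark_args` is a hypothetical helper, and the `spark://host:7077` URL is a placeholder rather than a real endpoint.

```python
def spark_args(params, threads):
    """Assemble the trailing Spark arguments with the wrapper's defaults.

    `params` is a plain dict standing in for snakemake.params.
    """
    runner = params.get("spark_runner", "LOCAL")
    master = params.get("spark_master", f"local[{threads}]")
    extra = params.get("spark_extra", "")
    # rstrip drops the trailing space left when no extra arguments are given
    return f"-- --spark-runner {runner} --spark-master {master} {extra}".rstrip()


print(spark_args({}, 8))
print(spark_args({"spark_runner": "SPARK", "spark_master": "spark://host:7077"}, 8))
```

With no params set and `threads: 8`, the job runs on the local machine with eight cores.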
GATK SCATTERINTERVALSBYNS¶
Run gatk ScatterIntervalsByNs.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570421542811-ScatterIntervalsByNs-Picard-
Example¶
This wrapper can be used in the following way:
rule gatk_scatter_interval_by_ns:
input:
ref="genome.fasta",
fai="genome.fasta.fai",
dict="genome.dict",
output:
intervals="genome.intervals",
log:
"logs/genome.log",
params:
extra="--MAX_TO_MERGE 10 --OUTPUT_TYPE ACGT",
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/scatterintervalsbyns"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' ScatterIntervalsByNs"
" --REFERENCE {snakemake.input.ref}"
" {extra}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output.intervals}"
" {log}"
)
GATK SELECTVARIANTS¶
Run gatk SelectVariants.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570332289307-SelectVariants
Example¶
This wrapper can be used in the following way:
rule gatk_select:
input:
vcf="calls/all.vcf",
ref="genome.fasta",
output:
vcf="calls/snvs.vcf",
log:
"logs/gatk/select/snvs.log",
params:
extra="--select-type-to-include SNP", # optional filter arguments, see GATK docs
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/selectvariants"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' SelectVariants"
" --variant {snakemake.input.vcf}"
" --reference {snakemake.input.ref}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
GATK SPLITINTERVALS¶
This tool takes in intervals via the standard arguments of IntervalArgumentCollection and splits them into interval files for scattering. The resulting files contain an equal number of bases. Standard GATK engine arguments include -L and -XL, interval padding, and interval set rule etc. For example, for the -L argument, the tool accepts GATK-style intervals (.list or .intervals), BED files and VCF files. See the --subdivision-mode parameter for more options.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570513631387-SplitIntervals
Example¶
This wrapper can be used in the following way:
rule gatk_split_interval_list:
input:
intervals="genome.interval_list",
ref="genome.fasta",
output:
bed=multiext("out/genome", ".00.bed", ".01.bed", ".02.bed"),
log:
"logs/genome.log",
params:
extra="--subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW",
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/splitintervals"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments, but not --scatter-count, --output, --interval-file-prefix, --interval-file-num-digits, or --extension (automatically inferred from output files).
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import os
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
n_out_files = len(snakemake.output)
assert n_out_files > 1, "you need to specify at least 2 output files!"
prefix = Path(os.path.commonprefix(snakemake.output))
suffix = os.path.commonprefix([file[::-1] for file in snakemake.output])[::-1]
chunk_labels = [
out.removeprefix(str(prefix)).removesuffix(suffix) for out in snakemake.output
]
assert all(
[chunk_label.isnumeric() for chunk_label in chunk_labels]
), "all chunk labels have to be numeric!"
len_chunk_labels = set([len(chunk_label) for chunk_label in chunk_labels])
assert len(len_chunk_labels) == 1, "all chunk labels must have the same length!"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' SplitIntervals"
" --intervals {snakemake.input.intervals}"
" --reference {snakemake.input.ref}"
" --scatter-count {n_out_files}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {prefix.parent}"
" --interval-file-prefix {prefix.name:q}"
" --interval-file-num-digits {len_chunk_labels}"
" --extension {suffix:q}"
" {log}"
)
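The naming arguments passed to SplitIntervals are all derived from the output file names: their common prefix, their common suffix, and the numeric label left in between. The sketch below reproduces that derivation outside Snakemake for the output list of the example rule. `scatter_naming` is a hypothetical helper, it requires Python 3.9 or later for `str.removeprefix`, and it uses `PurePosixPath` on the assumption that paths use forward slashes.

```python
import os
from pathlib import PurePosixPath


def scatter_naming(outputs):
    """Derive directory, file prefix, digit count and extension from output names."""
    prefix = os.path.commonprefix(outputs)
    # The common suffix is the common prefix of the reversed names, reversed back
    suffix = os.path.commonprefix([name[::-1] for name in outputs])[::-1]
    labels = [name.removeprefix(prefix).removesuffix(suffix) for name in outputs]
    if not all(label.isnumeric() for label in labels):
        raise ValueError("all chunk labels have to be numeric")
    widths = {len(label) for label in labels}
    if len(widths) != 1:
        raise ValueError("all chunk labels must have the same length")
    path = PurePosixPath(prefix)
    return {
        "dir": str(path.parent),
        "file_prefix": path.name,
        "digits": widths.pop(),
        "extension": suffix,
        "labels": labels,
    }


naming = scatter_naming(
    ["out/genome.00.bed", "out/genome.01.bed", "out/genome.02.bed"]
)
print(naming)
```

For these three names the shared leading "0" of the labels is absorbed into the common prefix, so the sketch reports the file prefix `genome.0`, one digit, and the labels `0`, `1` and `2`.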
GATK SPLITNCIGARREADS¶
Run gatk SplitNCigarReads.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570487998491-SplitNCigarReads
Example¶
This wrapper can be used in the following way:
rule splitncigarreads:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
output:
"split/{sample}.bam",
log:
"logs/gatk/splitNCIGARreads/{sample}.log",
params:
extra="", # optional
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/splitncigarreads"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Jan Forster
- Filipe G. Vieira
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' SplitNCigarReads"
" --reference {snakemake.input.ref}"
" --input {snakemake.input.bam}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output[0]}"
" {log}"
)
GATK VARIANTANNOTATOR¶
Run gatk VariantAnnotator.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570271608219-VariantAnnotator
Example¶
This wrapper can be used in the following way:
rule gatk_annotator:
input:
vcf="calls/snvs.vcf.gz",
aln="mapped/a.bam",
bai="mapped/a.bam.bai",
ref="genome.fasta",
db="calls/snvs.vcf.gz",
# intervals="targets.bed",
output:
vcf="snvs.annot.vcf",
log:
"logs/gatk/annotator/snvs.log",
params:
extra="--resource-allele-concordance -A Coverage --expression db.END",
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/variantannotator"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
- VCF file
- BAM file
- reference genome
- VCF of known variation
Output:
- annotated VCF file
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
input = snakemake.input.get("aln", "")
if input:
input = f"--input {input}"
reference = snakemake.input.get("ref", "")
if reference:
reference = f"--reference {reference}"
dbsnp = snakemake.input.get("known", "")
if dbsnp:
dbsnp = f"--dbsnp {dbsnp}"
intervals = snakemake.input.get("intervals", "")
if not intervals:
intervals = snakemake.params.get("intervals", "")
if intervals:
intervals = "--intervals {}".format(intervals)
resources = [
f"--resource:{name} {file}"
for name, file in snakemake.input.items()
if name not in ["vcf", "aln", "ref", "known", "intervals", "bai"]
]
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' VariantAnnotator"
" --variant {snakemake.input.vcf}"
" {input}"
" {reference}"
" {dbsnp}"
" {intervals}"
" {resources}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
GATK VARIANTEVAL¶
Run gatk VariantEval.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570243836187-VariantEval-BETA-
Example¶
This wrapper can be used in the following way:
rule gatk_varianteval:
input:
vcf="calls/snvs.vcf",
ref="genome.fasta",
dict="genome.dict",
# comp="calls/comp.vcf", # optional comparison VCF
output:
vcf="snvs.varianteval.grp",
log:
"logs/gatk/varianteval/snvs.log",
params:
extra="", # optional arguments, see GATK docs
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/varianteval"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
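Several inputs of this wrapper (vcf, bam, comp) may be given either as a single path or as a list of paths, and each path gets its own flag. A rough sketch of that pattern follows, assuming plain string joining in place of Snakemake's shell formatting; `repeat_flag` is a hypothetical helper name.

```python
def repeat_flag(flag, value):
    """Format one path or a list of paths as repeated '<flag> <path>' arguments."""
    if not value:
        return ""
    if isinstance(value, str):
        value = [value]
    return " ".join(f"{flag} {path}" for path in value)


print(repeat_flag("--eval", "calls/snvs.vcf"))
print(repeat_flag("--comparison", ["calls/a.vcf", "calls/b.vcf"]))
```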
Software dependencies¶
gatk4=4.3.0.0
snakemake-wrapper-utils=0.5.2
Input/Output¶
Input:
- vcf files
- BAM/CRAM files (optional)
- reference genome (optional)
- reference dictionary (optional)
- vcf.gz of known variants (optional)
- PED (pedigree) file (optional)
Output:
- Evaluation tables detailing the results of the eval modules on VCF file
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
vcf = snakemake.input.vcf
if isinstance(vcf, str):
vcf = "--eval {}".format(vcf)
else:
vcf = list(map("--eval {}".format, vcf))
bam = snakemake.input.get("bam", "")
if bam:
if isinstance(bam, str):
bam = "--input {}".format(bam)
else:
bam = list(map("--input {}".format, bam))
ref = snakemake.input.get("ref", "")
if ref:
ref = "--reference " + ref
ref_dict = snakemake.input.get("dict", "")
if ref_dict:
ref_dict = "--sequence-dictionary " + ref_dict
known = snakemake.input.get("known", "")
if known:
known = "--dbsnp " + known
comp = snakemake.input.get("comp", "")
if comp:
if isinstance(comp, str):
comp = "--comparison {}".format(comp)
else:
comp = list(map("--comparison {}".format, comp))
ped = snakemake.input.get("ped", "")
if ped:
ped = "--pedigree " + ped
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' VariantEval"
" {vcf}"
" {bam}"
" {ref}"
" {ref_dict}"
" {known}"
" {ped}"
" {comp}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output[0]}"
" {log}"
)
GATK VARIANTFILTRATION¶
Run gatk VariantFiltration.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570403488667-VariantFiltration
Example¶
This wrapper can be used in the following way:
rule gatk_filter:
input:
vcf="calls/snvs.vcf",
ref="genome.fasta",
# intervals="targets.bed",
output:
vcf="calls/snvs.filtered.vcf",
log:
"logs/gatk/filter/snvs.log",
params:
filters={"myfilter": "AB < 0.2 || MQ0 > 50"},
extra="", # optional arguments, see GATK docs
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/variantfiltration"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
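The filters param is a dict that maps a filter name to a filter expression, and each entry becomes one `--filter-name`/`--filter-expression` pair. The sketch below mirrors the list comprehension in the wrapper code of this section as a standalone function; the name `filter_args` is illustrative.

```python
def filter_args(filters):
    """Turn {name: expression} into one GATK filter argument pair per entry."""
    return [
        "--filter-name {} --filter-expression '{}'".format(
            name, expr.replace("'", "\\'")  # escape single quotes inside the expression
        )
        for name, expr in filters.items()
    ]


print(filter_args({"myfilter": "AB < 0.2 || MQ0 > 50"}))
```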
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
filters = [
"--filter-name {} --filter-expression '{}'".format(name, expr.replace("'", "\\'"))
for name, expr in snakemake.params.filters.items()
]
intervals = snakemake.input.get("intervals", "")
if not intervals:
intervals = snakemake.params.get("intervals", "")
if intervals:
intervals = "--intervals {}".format(intervals)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' VariantFiltration"
" --variant {snakemake.input.vcf}"
" --reference {snakemake.input.ref}"
" {filters}"
" {intervals}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
GATK VARIANTRECALIBRATOR¶
Run gatk VariantRecalibrator.
URL: https://gatk.broadinstitute.org/hc/en-us/articles/9570466678811-VariantRecalibrator
Example¶
This wrapper can be used in the following way:
from snakemake.remote import HTTP
https = HTTP.RemoteProvider(allow_redirects=True)
rule gatk_variantrecalibrator:
input:
vcf=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/phase1.projectConsensus.chr20.1M-10M.raw.snps.vcf"
),
ref=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/human_g1k_v37.20.21.fasta"
),
fai=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/human_g1k_v37.20.21.fasta.fai"
),
dict=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/human_g1k_v37.20.21.dict"
),
mills=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/ALL.wgs.indels_mills_devine_hg19_leftAligned_collapsed_double_hit.sites.20.1M-10M.vcf"
),
mills_idx=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/ALL.wgs.indels_mills_devine_hg19_leftAligned_collapsed_double_hit.sites.20.1M-10M.vcf.idx"
),
omni=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/Omni25_sites_1525_samples.b37.20.1M-10M.vcf"
),
omni_idx=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/Omni25_sites_1525_samples.b37.20.1M-10M.vcf.idx"
),
g1k=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/combined.phase1.chr20.raw.indels.filtered.sites.1M-10M.vcf"
),
g1k_idx=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/combined.phase1.chr20.raw.indels.filtered.sites.1M-10M.vcf.idx"
),
dbsnp=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/dbsnp_132_b37.leftAligned.20.1M-10M.vcf"
),
dbsnp_idx=https.remote(
"github.com/broadinstitute/gatk/raw/4.2.5.0/src/test/resources/large/VQSR/dbsnp_132_b37.leftAligned.20.1M-10M.vcf.idx"
),
output:
vcf="calls/all.recal.vcf",
idx="calls/all.recal.vcf.idx",
tranches="calls/all.tranches",
log:
"logs/gatk/variantrecalibrator.log",
params:
mode="SNP", # set mode, must be either SNP, INDEL or BOTH
resources={
"mills": {"known": False, "training": True, "truth": True, "prior": 15.0},
"omni": {"known": False, "training": True, "truth": False, "prior": 12.0},
"g1k": {"known": False, "training": True, "truth": False, "prior": 10.0},
"dbsnp": {"known": True, "training": False, "truth": False, "prior": 2.0},
},
annotation=["MQ", "QD", "SB"],
extra="--max-gaussians 2", # optional
threads: 1
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/variantrecalibrator"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
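Each entry of the resources param must have a named input file of the same name, and the two are combined into one `--resource:` argument. The following sketch shows the resulting string with plain dicts standing in for `snakemake.params` and `snakemake.input`; the function name and the file path are made up for illustration.

```python
def recal_resource_args(resources, inputs):
    """Build one '--resource:' argument per resource, pairing flags with input files."""
    args = []
    for name, flags in resources.items():
        path = inputs.get(name)
        if path is None:
            raise RuntimeError(
                f"There must be a named input file for every resource (missing: {name})"
            )
        args.append(
            "--resource:{},known={},training={},truth={},prior={} {}".format(
                name,
                str(flags["known"]).lower(),
                str(flags["training"]).lower(),
                str(flags["truth"]).lower(),
                flags["prior"],
                path,
            )
        )
    return args


mills = {"known": False, "training": True, "truth": True, "prior": 15.0}
print(recal_resource_args({"mills": mills}, {"mills": "resources/mills.vcf"}))
```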
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.6.1
google-cloud-sdk
google-crc32c
Authors¶
- Johannes Köster
- Jake VanCampen
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
def fmt_res(resname, resparams):
fmt_bool = lambda b: str(b).lower()
f = snakemake.input.get(resname)
if f is None:
raise RuntimeError(
f"There must be a named input file for every resource (missing: {resname})"
)
return "{},known={},training={},truth={},prior={} {}".format(
resname,
fmt_bool(resparams["known"]),
fmt_bool(resparams["training"]),
fmt_bool(resparams["truth"]),
resparams["prior"],
f,
)
annotation_resources = [
"--resource:{}".format(fmt_res(resname, resparams))
for resname, resparams in snakemake.params["resources"].items()
]
annotation = list(map("-an {}".format, snakemake.params.annotation))
tranches = snakemake.output.get("tranches", "")
if tranches:
tranches = f"--tranches-file {tranches}"
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' VariantRecalibrator"
" --variant {snakemake.input.vcf}"
" --reference {snakemake.input.ref}"
" --mode {snakemake.params.mode}"
" {annotation_resources}"
" {tranches}"
" {annotation}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.vcf}"
" {log}"
)
GATK VARIANTSTOTABLE¶
Run gatk VariantsToTable
URL: https://gatk.broadinstitute.org/hc/en-us/articles/360036896892-VariantsToTable
Example¶
This wrapper can be used in the following way:
rule gatk_variantstotable:
input:
vcf="calls/snvs.vcf",
# intervals="intervals.bed",
output:
tab="calls/snvs.tab",
log:
"logs/gatk/variantstotable.log",
params:
extra="-F CHROM -F POS -F TYPE -GF AD",
java_opts="", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk/variantstotable"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. -XX:ParallelGCThreads=10 (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
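Intervals can be supplied either as an input file or as a param, and the wrapper code in this section checks the input first. A small sketch of that fallback follows, with plain dicts in place of `snakemake.input` and `snakemake.params`; `intervals_arg` is a hypothetical name.

```python
def intervals_arg(inputs, params):
    """Prefer input.intervals, fall back to params.intervals, else emit nothing."""
    intervals = inputs.get("intervals", "")
    if not intervals:
        intervals = params.get("intervals", "")
    return f"--intervals {intervals}" if intervals else ""


print(intervals_arg({"intervals": "intervals.bed"}, {}))
print(intervals_arg({}, {"intervals": "chr1"}))
```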
Software dependencies¶
gatk4=4.4.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- A VCF file to convert to a table
Output:
- A tab-delimited file containing the values of the requested fields in the VCF file
Authors¶
- Dmitry Bespiatykh
Code¶
"""Snakemake wrapper for GATK VariantsToTable"""
__author__ = "Dmitry Bespiatykh"
__copyright__ = "Copyright 2023, Dmitry Bespiatykh"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
intervals = snakemake.input.get("intervals", "")
if not intervals:
intervals = snakemake.params.get("intervals", "")
if intervals:
intervals = "--intervals {}".format(intervals)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"gatk --java-options '{java_opts}' VariantsToTable"
" --variant {snakemake.input.vcf}"
" {intervals}"
" {extra}"
" --tmp-dir {tmpdir}"
" --output {snakemake.output.tab}"
" {log}"
)
GATK3¶
For gatk3, the following wrappers are available:
GATK3 BASERECALIBRATOR¶
Run gatk3 BaseRecalibrator.
Example¶
This wrapper can be used in the following way:
rule baserecalibrator:
input:
bam="{sample}.bam",
bai="{sample}.bai",
ref="genome.fasta",
fai="genome.fasta.fai",
dict="genome.dict",
known="dbsnp.vcf.gz",
known_idx="dbsnp.vcf.gz.tbi",
output:
recal_table="{sample}.recal_data_table",
log:
"logs/gatk3/bqsr/{sample}.log",
params:
extra="--defaultBaseQualities 20 --filter_reads_with_N_cigar", # optional
resources:
mem_mb=1024,
threads: 16
wrapper:
"v2.2.1/bio/gatk3/baserecalibrator"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Software dependencies¶
gatk=3.8
python=3.11.3
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
java_opts = get_java_opts(snakemake)
bed = snakemake.params.get("bed", "")
if bed:
bed = f"--intervals {bed}"
input_known = snakemake.input.get("known", "")
if input_known:
if isinstance(input_known, str):
input_known = [input_known]
input_known = list(map("--knownSites {}".format, input_known))
shell(
"gatk3 {java_opts}"
" --analysis_type BaseRecalibrator"
" --num_cpu_threads_per_data_thread {snakemake.threads}"
" --input_file {snakemake.input.bam}"
" {input_known}"
" --reference_sequence {snakemake.input.ref}"
" {bed}"
" {extra}"
" --out {snakemake.output}"
" {log}"
)
GATK3 INDELREALIGNER¶
Run gatk3 IndelRealigner
Example¶
This wrapper can be used in the following way:
rule indelrealigner:
input:
bam="{sample}.bam",
bai="{sample}.bai",
ref="genome.fasta",
fai="genome.fasta.fai",
dict="genome.dict",
known="dbsnp.vcf.gz",
known_idx="dbsnp.vcf.gz.tbi",
target_intervals="{sample}.intervals",
output:
bam="{sample}.realigned.bam",
bai="{sample}.realigned.bai",
log:
"logs/gatk3/indelrealigner/{sample}.log",
params:
extra="--defaultBaseQualities 20 --filter_reads_with_N_cigar", # optional
threads: 16
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/gatk3/indelrealigner"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (memory is automatically inferred from resources and temp dir from output.java_temp).
- The extra param allows for additional program arguments.
- For more information, see https://github.com/broadinstitute/gatk-docs/blob/master/gatk3-tutorials/(howto)_Perform_local_realignment_around_indels.md
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
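In the wrapper code of this section, leaving out the bai output makes the wrapper append `--disable_bam_indexing` to extra. A minimal sketch of that decision follows, with a plain dict standing in for `snakemake.output`; the function name is illustrative.

```python
def extra_with_indexing(extra, outputs):
    """Append --disable_bam_indexing when no 'bai' output was requested."""
    if outputs.get("bai") is None:
        extra += " --disable_bam_indexing"
    return extra


print(extra_with_indexing("--filter_reads_with_N_cigar", {"bam": "a.realigned.bam"}))
```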
Software dependencies¶
gatk=3.8
python=3.11.4
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- bam file
- reference genome
- target intervals to realign
- bed file (optional)
- vcf files known variation (optional)
Output:
- indel realigned bam file
- indel realigned bai file (optional)
- temp dir (optional)
Authors¶
- Patrik Smeds
- Filipe G. Vieira
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
java_opts = get_java_opts(snakemake)
bed = snakemake.input.get("bed", "")
if bed:
bed = f"--intervals {bed}"
known = snakemake.input.get("known", "")
if known:
if isinstance(known, str):
known = f"--knownAlleles {known}"
else:
known = list(map("--knownAlleles {}".format, known))
output_bai = snakemake.output.get("bai", None)
if output_bai is None:
extra += " --disable_bam_indexing"
shell(
"gatk3 {java_opts}"
" --analysis_type IndelRealigner"
" --input_file {snakemake.input.bam}"
" --reference_sequence {snakemake.input.ref}"
" {known}"
" {bed}"
" --targetIntervals {snakemake.input.target_intervals}"
" {extra}"
" --out {snakemake.output.bam}"
" {log}"
)
GATK3 PRINTREADS¶
Run gatk3 PrintReads
Example¶
This wrapper can be used in the following way:
rule printreads:
input:
bam="{sample}.bam",
bai="{sample}.bai",
# recal_data="{sample}.recal_data_table",
ref="genome.fasta",
fai="genome.fasta.fai",
dict="genome.dict",
output:
bam="{sample}.bqsr.bam",
bai="{sample}.bqsr.bai",
log:
"logs/gatk/bqsr/{sample}.log",
params:
extra="--defaultBaseQualities 20 --filter_reads_with_N_cigar", # optional
resources:
mem_mb=1024,
threads: 16
wrapper:
"v2.2.1/bio/gatk3/printreads"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-Xmx4G” for one, and “-Xmx4G -XX:ParallelGCThreads=10” for two options.
- The extra param allows for additional program arguments.
- For more information, see https://software.broadinstitute.org/gatk/documentation/article?id=11050
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Software dependencies¶
gatk=3.8
python=3.11.3
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
java_opts = get_java_opts(snakemake)
bqsr = snakemake.input.get("recal_data", "")
if bqsr:
bqsr = f"--BQSR {bqsr}"
shell(
"gatk3 {java_opts}"
" --analysis_type PrintReads"
" --input_file {snakemake.input.bam}"
" --reference_sequence {snakemake.input.ref}"
" {bqsr}"
" {extra}"
" --out {snakemake.output.bam}"
" {log}"
)
GATK3 REALIGNERTARGETCREATOR¶
Run gatk3 RealignerTargetCreator
Example¶
This wrapper can be used in the following way:
rule realignertargetcreator:
input:
bam="{sample}.bam",
bai="{sample}.bai",
ref="genome.fasta",
fai="genome.fasta.fai",
dict="genome.dict",
known="dbsnp.vcf.gz",
known_idx="dbsnp.vcf.gz.tbi",
output:
intervals="{sample}.intervals",
log:
"logs/gatk/realignertargetcreator/{sample}.log",
params:
extra="--defaultBaseQualities 20 --filter_reads_with_N_cigar", # optional
resources:
mem_mb=1024,
threads: 16
wrapper:
"v2.2.1/bio/gatk3/realignertargetcreator"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (memory is automatically inferred from resources and temp dir from output.java_temp).
- The extra param allows for additional program arguments.
- For more information, see https://github.com/broadinstitute/gatk-docs/blob/master/gatk3-tutorials/(howto)_Perform_local_realignment_around_indels.md
- Gatk3.jar is not included in the bioconda package, i.e. it needs to be added to the conda environment manually.
Software dependencies¶
gatk=3.8
python=3.11.0
snakemake-wrapper-utils=0.5.2
Input/Output¶
Input:
- bam file
- reference genome
- bed file (optional)
- vcf files known variation (optional)
Output:
- target intervals
- temp dir (optional)
Authors¶
- Patrik Smeds
- Filipe G. Vieira
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
java_opts = get_java_opts(snakemake)
bed = snakemake.input.get("bed", "")
if bed:
bed = f"--intervals {bed}"
known = snakemake.input.get("known", "")
if known:
if isinstance(known, str):
known = f"--known {known}"
else:
known = list(map("--known {}".format, known))
shell(
"gatk3 {java_opts}"
" --analysis_type RealignerTargetCreator"
" --num_threads {snakemake.threads}"
" --input_file {snakemake.input.bam}"
" --reference_sequence {snakemake.input.ref}"
" {known}"
" {bed}"
" {extra}"
" --out {snakemake.output.intervals}"
" {log}"
)
GDC-API¶
For gdc-api, the following wrappers are available:
GDC API-BASED DATA DOWNLOAD OF BAM SLICES¶
Download slices of GDC BAM files using curl and the GDC API for BAM Slicing.
Example¶
This wrapper can be used in the following way:
rule gdc_api_bam_slice_download:
output:
bam="raw/{sample}.bam",
log:
"logs/gdc-api/bam-slicing/{sample}.log"
params:
# to use this rule flexibly, make uuid a function that maps your
# sample names of choice to the UUIDs they correspond to (they are
# the column `id` in the GDC manifest files, which can be used to
# systematically construct sample sheets)
uuid="092c8a6d-aad5-41bf-b186-e68e613c0e89",
# a gdc_token is required for controlled access and all BAM files
# on GDC seem to be controlled access (adjust if this changes)
gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
# provide wanted `region=` or `gencode=` slices joined with `&`
slices="region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2",
# extra command line arguments passed to curl
extra=""
wrapper:
"v2.2.1/bio/gdc-api/bam-slicing"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- BAM file UUIDs can be found via the GDC repository query, either by clicking on individual files or systematically by creating a cart and downloading a manifest file.
- Slicing can be performed using region syntax like ‘region=chr20:3000-4000’, gene name syntax like ‘gencode=BRCA2’ (this uses Gene symbols of GENCODE v22) or ‘region=unmapped’ to get unmapped reads. Multiple such entries can be joined with ampersands (e.g. region=chr5:200-300&region=unmapped&gencode=BRCA1).
- All BAM data files in GDC are controlled access according to this GDC repository query, thus a GDC access token file is always required and must be provided via params: gdc_token: "path/to/access_token.txt". Should this change in the future, feel free to adjust this wrapper or contact the original author.
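The slices string described in the notes can be assembled programmatically rather than typed by hand. The sketch below joins region and gene entries with ampersands and builds the request URL used in the wrapper code of this section; both function names are hypothetical.

```python
def build_slices(regions=(), genes=()):
    """Join 'region=' and 'gencode=' entries with '&' as the slices param expects."""
    parts = [f"region={region}" for region in regions]
    parts += [f"gencode={gene}" for gene in genes]
    return "&".join(parts)


def slicing_url(uuid, slices):
    """Compose the GDC BAM slicing URL for one file UUID."""
    return f"https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}"


slices = build_slices(["chr22", "chr5:1000-2000", "unmapped"], ["BRCA2"])
print(slicing_url("092c8a6d-aad5-41bf-b186-e68e613c0e89", slices))
```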
Software dependencies¶
curl=8.0.1
Authors¶
- David Lähnemann
Code¶
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
import os
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
uuid = snakemake.params.get("uuid", "")
if uuid == "":
raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")
token_file = snakemake.params.get("gdc_token", "")
if token_file == "":
raise ValueError(
"You need to provide a GDC data access token file via the 'gdc_token' in 'params'."
)
token = ""
with open(token_file) as tf:
token = tf.read()
os.environ["CURL_HEADER_TOKEN"] = "'X-Auth-Token: {}'".format(token)
slices = snakemake.params.get("slices", "")
if slices == "":
raise ValueError(
"You need to provide 'region=chr1:1000-2000' or 'gencode=BRCA2' slice(s) via the 'slices' in 'params'."
)
extra = snakemake.params.get("extra", "")
shell(
"curl --silent"
" --header $CURL_HEADER_TOKEN"
" 'https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}'"
" {extra}"
" --output {snakemake.output.bam} {log}"
)
if os.path.getsize(snakemake.output.bam) < 100000:
with open(snakemake.output.bam) as f:
if "error" in f.read():
shell("cat {snakemake.output.bam} {log}")
raise RuntimeError(
"Your GDC API request returned an error, check your log file for the error message."
)
GDC-CLIENT¶
For gdc-client, the following wrappers are available:
GDC DATA TRANSFER TOOL DATA DOWNLOAD¶
Download GDC data files with the gdc-client.
Example¶
This wrapper can be used in the following way:
rule gdc_download:
output:
# the file extension (up to two components, here .maf.gz), has
# to uniquely map to one of the files downloaded for that UUID
"raw/{sample}.maf.gz"
log:
"logs/gdc-client/download/{sample}.log"
params:
# to use this rule flexibly, make uuid a function that maps your
# sample names of choice to the UUIDs they correspond to (they are
# the column `id` in the GDC manifest files, which can be used to
# systematically construct sample sheets)
uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
# a gdc_token is only required for controlled access samples,
# leave blank otherwise (`gdc_token=""`) or skip this param entirely
gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
# for valid extra command line arguments, check command line help or:
# https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
extra = ""
threads: 4
wrapper:
"v2.2.1/bio/gdc-client/download"
rule gdc_download_bam:
output:
# specify all the downloaded files you want to keep, as all other
# downloaded files will be removed automatically e.g. for
# BAM data this could be
"raw/{sample}.bam",
"raw/{sample}.bam.bai",
"raw/{sample}.annotations.txt",
directory("raw/{sample}/logs")
log:
"logs/gdc-client/download/{sample}.log"
params:
# to use this rule flexibly, make uuid a function that maps your
# sample names of choice to the UUIDs they correspond to (they are
# the column `id` in the GDC manifest files, which can be used to
# systematically construct sample sheets)
uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
# a gdc_token is only required for controlled access samples,
# leave blank otherwise (`gdc_token=""`) or skip this param entirely
gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
# for valid extra command line arguments, check command line help or:
# https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
extra = ""
threads: 4
wrapper:
"v2.2.1/bio/gdc-client/download"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
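The comments in the example rules suggest making uuid a function that maps sample names to the `id` column of a GDC manifest. One possible sketch follows. It assumes a tab-separated manifest that also has a `filename` column, and that the sample name is the file name up to the first dot; the file name below is made up, and a real sample sheet may need a different rule.

```python
import csv
import io


def load_uuid_map(manifest_handle):
    """Map sample name (file name up to the first dot) to the manifest 'id'."""
    reader = csv.DictReader(manifest_handle, delimiter="\t")
    return {row["filename"].split(".")[0]: row["id"] for row in reader}


# In-memory stand-in for a manifest file; a real workflow would open the file instead.
manifest = io.StringIO(
    "id\tfilename\n"
    "34b80c89-c41e-47be-84fb-0c0ea493b5bb\tsampleA.maf.gz\n"
)
UUIDS = load_uuid_map(manifest)


def get_uuid(wildcards):
    """Params function: Snakemake calls it with the rule's wildcards."""
    return UUIDS[wildcards.sample]
```

In a rule this would be used as `uuid=get_uuid` in the params section instead of a hard-coded string.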
Software dependencies¶
gdc-client=1.6.1
Authors¶
- David Lähnemann
Code¶
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
from tempfile import TemporaryDirectory
import glob
uuid = snakemake.params.get("uuid", "")
if uuid == "":
raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")
extra = snakemake.params.get("extra", "")
token = snakemake.params.get("gdc_token", "")
if token != "":
token = "--token-file {}".format(token)
with TemporaryDirectory() as tempdir:
shell(
"gdc-client download"
" {token}"
" {extra}"
" -n {snakemake.threads} "
" --log-file {snakemake.log} "
" --dir {tempdir}"
" {uuid}"
)
for out_path in snakemake.output:
tmp_path = path.join(tempdir, uuid, path.basename(out_path))
if not path.exists(tmp_path):
(root, ext1) = path.splitext(out_path)
paths = glob.glob(path.join(tempdir, uuid, "*" + ext1))
if len(paths) > 1:
(root, ext2) = path.splitext(root)
paths = glob.glob(path.join(tempdir, uuid, "*" + ext2 + ext1))
if len(paths) == 0:
raise ValueError(
"{} file extension {} does not match any downloaded file.\n"
"Are you sure that UUID {} provides a file of such format?\n".format(
out_path, ext1, uuid
)
)
if len(paths) > 1:
raise ValueError(
"Found more than one downloaded file with extension '{}':\n"
"{}\n"
"Cannot match requested output file {} unambiguously.\n".format(
ext2 + ext1, paths, out_path
)
)
tmp_path = paths[0]
shell("mv {tmp_path} {out_path}")
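The matching of requested outputs to downloaded files by extension can be sketched without touching the file system. The function below works on a list of file names instead of globbing a download directory and skips the exact-name check the wrapper performs first; `match_download` is an illustrative name.

```python
import fnmatch
import os.path as path


def match_download(out_path, downloaded):
    """Pick the single downloaded file name whose extension matches out_path."""
    root, ext1 = path.splitext(out_path)
    matches = fnmatch.filter(downloaded, "*" + ext1)
    if len(matches) > 1:
        # Ambiguous on the last component: also require the second-to-last one.
        _, ext2 = path.splitext(root)
        matches = fnmatch.filter(downloaded, "*" + ext2 + ext1)
    if len(matches) != 1:
        raise ValueError(f"Cannot match {out_path} unambiguously: {matches}")
    return matches[0]


print(match_download("raw/s1.maf.gz", ["x.maf.gz", "y.vcf.gz"]))
```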
GENEFUSE¶
A tool to detect and visualize target gene fusions by scanning FASTQ files directly.
URL: https://github.com/OpenGene/GeneFuse
Example¶
This wrapper can be used in the following way:
rule genefuse:
input:
fastq1="reads/{sample}_R1.fastq",
fastq2="reads/{sample}_R2.fastq",
config="genes.csv",
reference="genome.fasta",
output:
html="{sample}_genefuse_report.html",
json="{sample}_genefuse_report.json",
fusions="{sample}_fusions.txt",
log:
"logs/{sample}_genefuse.log",
params:
# optional parameters
extra="",
threads:1
wrapper:
"v2.2.1/bio/genefuse"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
genefuse=0.8.0
Input/Output¶
Input:
- fastq files
- gene fuse settings files
- reference genome
Output:
- txt file with fusions
- html report
- json report
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2022, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"
from snakemake.shell import shell
from tempfile import TemporaryDirectory
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
with TemporaryDirectory() as tempdir:
# Executed shell command
html_path = f"{tempdir}/genefuse.html"
json_path = f"{tempdir}/genefuse.json"
txt_path = f"{tempdir}/gene_fuse_fusions.txt"
shell(
"(genefuse "
"-r {snakemake.input.reference} "
"-t {snakemake.threads} "
"-f {snakemake.input.config} "
"-1 {snakemake.input.fastq1} "
"-2 {snakemake.input.fastq2} "
"-h {html_path} "
"-j {json_path} "
"{extra} > "
"{txt_path}) "
"{log}"
)
if snakemake.output.get("html", None):
shell("mv {html_path} {snakemake.output.html}")
if snakemake.output.get("json", None):
shell("mv {json_path} {snakemake.output.json}")
if snakemake.output.get("fusions", None):
shell("mv {txt_path} {snakemake.output.fusions}")
GENOMEPY¶
Download genomes the easy way: https://github.com/vanheeringen-lab/genomepy
Example¶
This wrapper can be used in the following way:
rule genomepy:
output:
multiext(
"{assembly}/{assembly}",
".fa",
".fa.fai",
".fa.sizes",
".gaps.bed",
".annotation.gtf.gz",
".blacklist.bed",
),
log:
"logs/genomepy_{assembly}.log",
params:
provider="UCSC", # optional, defaults to ucsc. Choose from ucsc, ensembl, and ncbi
cache: "omit-software" # mark as eligible for between workflow caching
wrapper:
"v2.2.1/bio/genomepy"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
genomepy=0.15.0
Params¶
provider
: which provider to download from, defaults to UCSC (choose from UCSC, Ensembl, NCBI).
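Besides provider, the wrapper derives two options from the requested output names: an output containing "blacklist" enables the blacklist plugin, and an output containing "annotation" adds `--annotation`. A minimal sketch of that decision as a standalone function follows; the function name is hypothetical and the file names are examples.

```python
def genomepy_options(outputs):
    """Derive the required plugin string and the annotation flag from output names."""
    req_plugins = "blacklist," if any("blacklist" in out for out in outputs) else ","
    annotation = "--annotation" if any("annotation" in out for out in outputs) else ""
    return req_plugins, annotation


print(genomepy_options(["hg38/hg38.fa", "hg38/hg38.annotation.gtf.gz"]))
```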
Authors¶
- Maarten van der Sande
Code¶
__author__ = "Maarten van der Sande"
__copyright__ = "Copyright 2020, Maarten van der Sande"
__email__ = "M.vanderSande@science.ru.nl"
__license__ = "MIT"
from snakemake.shell import shell
# Optional parameters
provider = snakemake.params.get("provider", "UCSC")
# set options for plugins
all_plugins = "blacklist,bowtie2,bwa,gmap,hisat2,minimap2,star"
req_plugins = ","
if any(["blacklist" in out for out in snakemake.output]):
req_plugins = "blacklist,"
annotation = ""
if any(["annotation" in out for out in snakemake.output]):
annotation = "--annotation"
# parse the genome dir
genome_dir = "./"
if snakemake.output[0].count("/") > 1:
genome_dir = "/".join(snakemake.output[0].split("/")[:-1])
log = snakemake.log
# Finally execute genomepy
shell(
"""
# set a trap so we can reset to original user's settings
active_plugins=$(genomepy config show | grep -Po '(?<=- ).*' | paste -s -d, -) || echo ""
trap "genomepy plugin disable {{{all_plugins}}} >> {log} 2>&1;\
genomepy plugin enable {{$active_plugins,}} >> {log} 2>&1" EXIT
# disable all, then enable the ones we need
genomepy plugin disable {{{all_plugins}}} > {log} 2>&1
genomepy plugin enable {{{req_plugins}}} >> {log} 2>&1
# install the genome
genomepy install {snakemake.wildcards.assembly} \
{provider} {annotation} -g {genome_dir} >> {log} 2>&1
"""
)
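The wrapper decides which genomepy options to use purely from the requested output paths. The sketch below mirrors those three checks in a standalone function so they can be tried without Snakemake. The function name `plan_genomepy` and the `hg38` paths are assumptions made for illustration.

```python
def plan_genomepy(outputs):
    """Mirror the wrapper's checks on the requested output paths."""
    # Enable the blacklist plugin only if a blacklist file is requested.
    req_plugins = "blacklist," if any("blacklist" in out for out in outputs) else ","
    # Ask genomepy for the annotation only if an annotation file is requested.
    annotation = "--annotation" if any("annotation" in out for out in outputs) else ""
    # Derive the genome directory from the first output path.
    genome_dir = "./"
    if outputs[0].count("/") > 1:
        genome_dir = "/".join(outputs[0].split("/")[:-1])
    return req_plugins, annotation, genome_dir


if __name__ == "__main__":
    example = [
        "hg38/hg38.fa",
        "hg38/hg38.annotation.gtf.gz",
        "hg38/hg38.blacklist.bed",
    ]
    print(plan_genomepy(example))
```

Because the checks are substring tests, an assembly name that itself contains "blacklist" or "annotation" would also trigger the corresponding option.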
GENOMESCOPE¶
Reference-free profiling of polyploid genomes
URL: https://github.com/tbenavi1/genomescope2.0
Example¶
This wrapper can be used in the following way:
rule genomescope:
input:
hist="{sample}.hist",
output:
multiext(
"{sample}/",
"linear_plot.png",
"log_plot.png",
"model.txt",
"progress.txt",
"SIMULATED_testing.tsv",
"summary.txt",
"transformed_linear_plot.png",
"transformed_log_plot.png",
),
log:
"logs/genomescope/{sample}.log",
params:
extra="--kmer_length 32 --testing",
wrapper:
"v2.2.1/bio/genomescope"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (kmer length -k/--kmer_length is mandatory).
Software dependencies¶
genomescope2=2.0
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
out_basename = os.path.commonpath(snakemake.output)
shell("genomescope2 --input {snakemake.input} {extra} --output {out_basename} {log}")
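The wrapper passes the common path of all declared outputs to `--output`. A minimal sketch of that step, using made-up file names (`sampleA/...`) and a hypothetical helper name `output_basename`:

```python
import os


def output_basename(outputs):
    """Return the directory shared by all output paths, as the wrapper does."""
    return os.path.commonpath(outputs)


if __name__ == "__main__":
    outputs = [
        "sampleA/linear_plot.png",
        "sampleA/log_plot.png",
        "sampleA/summary.txt",
    ]
    print(output_basename(outputs))
```

Note that `os.path.commonpath` returns the file path itself when only a single output is declared, so this approach presumably relies on at least two outputs in the same directory.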
GFATOOLS¶
Tools for manipulating sequence graphs in the GFA and rGFA formats
URL: https://github.com/lh3/gfatools
Example¶
This wrapper can be used in the following way:
rule gfatools_stat:
input:
"{sample}.gfa",
output:
"{sample}.stat",
log:
"logs/{sample}.stat.log",
params:
command="stat",
wrapper:
"v2.2.1/bio/gfatools"
rule gfatools_gfa2fa:
input:
"{sample}.gfa",
output:
"{sample}.fas",
log:
"logs/{sample}.gfa2fa.log",
params:
command="gfa2fa",
extra="-l 90",
wrapper:
"v2.2.1/bio/gfatools"
rule gfatools_gfa2bed:
input:
"{sample}.gfa",
output:
"{sample}.bed",
log:
"logs/{sample}.gfa2bed.log",
params:
command="gfa2bed",
wrapper:
"v2.2.1/bio/gfatools"
rule gfatools_blacklist:
input:
"{sample}.gfa",
output:
"{sample}.blacklist",
log:
"logs/{sample}.blacklist.log",
params:
command="blacklist",
extra="-l 100",
wrapper:
"v2.2.1/bio/gfatools"
rule gfatools_bubble:
input:
"{sample}.gfa",
output:
"{sample}.bubble",
log:
"logs/{sample}.bubble.log",
params:
command="bubble",
wrapper:
"v2.2.1/bio/gfatools"
rule gfatools_asm:
input:
"{sample}.gfa",
output:
"{sample}.asm",
log:
"logs/{sample}.asm.log",
params:
command="asm",
extra="-u",
wrapper:
"v2.2.1/bio/gfatools"
rule gfatools_sql:
input:
"{sample}.gfa",
output:
"{sample}.sql",
log:
"logs/{sample}.sql.log",
params:
command="sql",
wrapper:
"v2.2.1/bio/gfatools"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The command param selects the gfatools subcommand to run on the GFA: view [default], stat, gfa2fa, gfa2bed, blacklist, bubble, asm, sql, or version.
Software dependencies¶
gfatools=0.5
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
command = snakemake.params.get("command", "view")
assert command in [
"view",
"stat",
"gfa2fa",
"gfa2bed",
"blacklist",
"bubble",
"asm",
"sql",
"version",
], "invalid command specified."
shell("gfatools {command} {extra} {snakemake.input[0]} > {snakemake.output[0]} {log}")
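The wrapper validates the `command` param against a fixed list before building the command line. The sketch below reproduces that check in a standalone helper. The name `gfatools_command` is hypothetical, and a `ValueError` is raised here instead of the wrapper's `assert`, which is a deliberate substitution for illustration.

```python
VALID_COMMANDS = (
    "view",
    "stat",
    "gfa2fa",
    "gfa2bed",
    "blacklist",
    "bubble",
    "asm",
    "sql",
    "version",
)


def gfatools_command(gfa, out, command="view", extra=""):
    """Build the shell command string for one gfatools subcommand."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"invalid command specified: {command!r}")
    # Skip empty pieces so an empty `extra` leaves no double spaces.
    args = [piece for piece in ("gfatools", command, extra, gfa) if piece]
    return " ".join(args) + f" > {out}"


if __name__ == "__main__":
    print(gfatools_command("s1.gfa", "s1.fas", command="gfa2fa", extra="-l 90"))
```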
GFFREAD¶
Validate, filter, convert and perform various other operations on GFF/GTF files with Gffread
URL: http://ccb.jhu.edu/software/stringtie/gff.shtml
Example¶
This wrapper can be used in the following way:
rule test_gffread:
input:
fasta="genome.fasta",
annotation="annotation.gtf",
# ids="", # Optional path to records to keep
# nids="", # Optional path to records to drop
# seq_info="", # Optional path to sequence information
# sort_by="", # Optional path to the ordered list of reference sequences
# attr="", # Optional annotation attributes to keep.
# chr_replace="", # Optional path to <original_ref_ID> <new_ref_ID>
output:
records="transcripts.fa",
# dupinfo="", # Optional path to clustering/merging information
threads: 1
log:
"logs/gffread.log",
params:
extra="",
wrapper:
"v2.2.1/bio/gffread"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Input/output formats are automatically detected from their file extension.
Software dependencies¶
gffread=0.12.7
Input/Output¶
Input:
fasta
: Path to genome file (FASTA formatted).
annotation
: Path to genome annotation (GTF/BED/TLF formatted).
ids
: Optional path to records/transcripts to keep.
nids
: Optional path to records/transcripts to discard.
seq_info
: Optional path to sequence information, a TSV-formatted text file containing <seq-name> <seq-length> <seq-description>.
sort_by
: Optional path to a text file containing the ordered list of reference sequences.
attr
: Optional text file containing a comma-separated list of annotation attributes to keep.
chr_replace
: Optional path to a TSV-formatted text file containing <original_ref_ID> <new_ref_ID>.
Output:
records
: Path to genome sequence/annotation in the requested format, containing the requested information.
dupinfo
: Optional path to clustering/merging information.
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__mail__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
annotation = snakemake.input.annotation
records = snakemake.output.records
# Input format control
if annotation.endswith(".bed"):
extra += " --in-bed "
elif annotation.endswith(".tlf"):
extra += " --in-tlf "
elif annotation.endswith(".gtf"):
pass
else:
raise ValueError("Unknown annotation format")
# Output format control
if records.endswith((".gtf", ".gff", ".gff3")):
extra += " -T "
elif records.endswith(".bed"):
extra += " --bed "
elif records.endswith(".tlf"):
extra += " --tlf "
elif records.endswith((".fasta", ".fa", ".fna")):
pass
else:
raise ValueError("Unknown records format")
# Optional input files
ids = snakemake.input.get("ids", "")
if ids:
extra += f" --ids {ids} "
nids = snakemake.input.get("nids", "")
if nids:
if ids:
raise ValueError(
"Provide either sequences ids to keep, or to drop."
" Or else, an empty file is produced."
)
extra += f" --nids {nids} "
seq_info = snakemake.input.get("seq_info", "")
if seq_info:
extra += f" -s {seq_info} "
sort_by = snakemake.input.get("sort_by", "")
if sort_by:
extra += f" --sort-by {sort_by} "
attr = snakemake.input.get("attr", "")
if attr:
if not records.endswith((".gtf", ".gff", ".gff3")):
raise ValueError(
"GTF attributes specified in input, "
"but records are not in GTF/GFF format."
)
extra += f" --attrs {attr} "
chr_replace = snakemake.input.get("chr_replace", "")
if chr_replace:
extra += f" -m {chr_replace} "
# Optional output files
dupinfo = snakemake.output.get("dupinfo", "")
if dupinfo:
extra += f" -d {dupinfo} "
shell(
"gffread {extra} "
"-o {records} "
"{snakemake.input.fasta} "
"{annotation} "
"{log} "
)
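The format detection in the wrapper is a mapping from file extensions to gffread flags. The following sketch isolates that mapping so the accepted extensions are easy to check. The function name `gffread_format_flags` is an assumption, and the file names are placeholders.

```python
def gffread_format_flags(annotation, records):
    """Return the format flags the wrapper would add for these two paths."""
    flags = []
    # Input format: GTF needs no flag, BED and TLF need an explicit one.
    if annotation.endswith(".bed"):
        flags.append("--in-bed")
    elif annotation.endswith(".tlf"):
        flags.append("--in-tlf")
    elif not annotation.endswith(".gtf"):
        raise ValueError("Unknown annotation format")
    # Output format: FASTA needs no flag, the others need an explicit one.
    if records.endswith((".gtf", ".gff", ".gff3")):
        flags.append("-T")
    elif records.endswith(".bed"):
        flags.append("--bed")
    elif records.endswith(".tlf"):
        flags.append("--tlf")
    elif not records.endswith((".fasta", ".fa", ".fna")):
        raise ValueError("Unknown records format")
    return flags


if __name__ == "__main__":
    print(gffread_format_flags("annotation.gtf", "transcripts.fa"))
    print(gffread_format_flags("regions.bed", "converted.gff3"))
```

As written in the wrapper, an annotation ending in `.gff` or `.gff3` is rejected on the input side even though those extensions are accepted for the output.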
GRIDSS¶
For gridss, the following wrappers are available:
GRIDSS ASSEMBLE¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. assemble
performs GRIDSS breakend assembly. Documentation at: https://github.com/PapenfussLab/gridss
Example¶
This wrapper can be used in the following way:
WORKING_DIR = "working_dir"
samples = ["A", "B"]
preprocess_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".idsv_metrics",
".insert_size_histogram.pdf",
".insert_size_metrics",
".mapq_metrics",
".sv.bam",
".sv.bam.bai",
".sv_metrics",
".tag_metrics",
)
assembly_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".downsampled_0.bed",
".excluded_0.bed",
".idsv_metrics",
".mapq_metrics",
".quality_distribution.pdf",
".quality_distribution_metrics",
".subsetCalled_0.bed",
".sv.bam",
".sv.bam.bai",
".tag_metrics",
)
reference_index_endings = (".amb",".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
rule gridss_assemble:
input:
bams=expand("mapped/{sample}.bam", sample=samples),
bais=expand("mapped/{sample}.bam.bai", sample=samples),
reference="reference/genome.fasta",
dictionary="reference/genome.dict",
indices=multiext("reference/genome.fasta", *reference_index_endings),
preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings)
output:
assembly="assembly/group.bam",
assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
params:
extra="--jvmheap 1g",
workingdir=WORKING_DIR
log:
"log/gridss/assemble/group.log"
threads:
100
wrapper:
"v2.2.1/bio/gridss/assemble"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gridss=2.13.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss assemble"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference")
if not snakemake.params.workingdir:
raise ValueError("Please set params.workingdir to provide a working directory.")
if not snakemake.input.reference:
raise ValueError("Please set input.reference to provide reference genome.")
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
if not path.exists("{}{}".format(reference, ending)):
raise ValueError(
"{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
reference=reference, ending=ending
)
)
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
raise ValueError(
"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished, for example, with CreateSequenceDictionary.jar from Picard.".format(
dictionary=dictionary
)
)
shell(
"(gridss -s assemble " # Tool
"--reference {reference} " # Reference
"--threads {snakemake.threads} " # Threads
"--workingdir {snakemake.params.workingdir} " # Working directory
"--assembly {snakemake.output.assembly} " # Assembly output
"{snakemake.input.bams} "
"{extra}) {log}"
)
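Before calling GRIDSS, the wrapper checks that the bwa index files and the sequence dictionary exist next to the reference. The sketch below lists the paths it looks for, which can help when debugging a "missing" error. The helper names `expected_reference_files` and `missing_reference_files` are hypothetical.

```python
from os import path

BWA_INDEX_ENDINGS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_reference_files(reference):
    """List the companion files the wrapper checks for a reference FASTA."""
    files = [reference + ending for ending in BWA_INDEX_ENDINGS]
    # The dictionary path replaces the last extension of the reference.
    files.append(path.splitext(reference)[0] + ".dict")
    return files


def missing_reference_files(reference):
    """Return the expected companion files that do not exist on disk."""
    return [f for f in expected_reference_files(reference) if not path.exists(f)]


if __name__ == "__main__":
    for name in expected_reference_files("reference/genome.fasta"):
        print(name)
```

Only the last extension is replaced, so a reference named `genome.fa.gz` would be paired with `genome.fa.dict` under this rule.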
GRIDSS CALL¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. call
performs variant calling. Documentation at: https://github.com/PapenfussLab/gridss
Example¶
This wrapper can be used in the following way:
WORKING_DIR = "working_dir"
samples = ["A", "B"]
preprocess_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".idsv_metrics",
".insert_size_histogram.pdf",
".insert_size_metrics",
".mapq_metrics",
".sv.bam",
".sv.bam.bai",
".sv_metrics",
".tag_metrics",
)
assembly_endings = (
".cigar_metrics",
".coverage.blacklist.bed",
".downsampled_0.bed",
".excluded_0.bed",
".idsv_metrics",
".mapq_metrics",
".quality_distribution.pdf",
".quality_distribution_metrics",
".subsetCalled_0.bed",
".sv.bam",
".sv.bam.bai",
".tag_metrics",
)
reference_index_endings = (".amb",".ann", ".bwt", ".pac", ".sa", ".gridsscache", ".img")
rule gridss_call:
input:
bams=expand("mapped/{sample}.bam", sample=samples),
bais=expand("mapped/{sample}.bam.bai", sample=samples),
reference="reference/genome.fasta",
dictionary="reference/genome.dict",
indices=multiext("reference/genome.fasta", *reference_index_endings),
preprocess=expand("{working_dir}/{sample}.bam.gridss.working/{sample}.bam{ending}", working_dir=[WORKING_DIR], sample=samples, ending=preprocess_endings),
assembly="assembly/group.bam",
assembly_others=expand("{working_dir}/group.bam.gridss.working/group.bam{ending}", working_dir=[WORKING_DIR], ending=assembly_endings)
output:
vcf="vcf/group.vcf",
idx="vcf/group.vcf.idx",
tmpidx=temp(WORKING_DIR + "/group.vcf.gridss.working/group.vcf.allocated.vcf.idx") # be aware the group occurs two times here
params:
extra="--jvmheap 1g",
workingdir=WORKING_DIR
log:
"log/gridss/call/group.log"
threads:
100
wrapper:
"v2.2.1/bio/gridss/call"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gridss=2.13.2
cpulimit=0.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss call"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
raise ValueError("Please set params.workingdir to provide a working directory.")
if not snakemake.input.reference:
raise ValueError("Please set input.reference to provide reference genome.")
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
if not path.exists("{}{}".format(reference, ending)):
raise ValueError(
"{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
reference=reference, ending=ending
)
)
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
raise ValueError(
"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished, for example, with CreateSequenceDictionary.jar from Picard.".format(
dictionary=dictionary
)
)
shell(
"(export JAVA_OPTS='-XX:ActiveProcessorCount={snakemake.threads}' && "
"gridss -s call " # Tool
"--reference {reference} " # Reference
"--threads {snakemake.threads} " # Threads
"--workingdir {snakemake.params.workingdir} " # Working directory
"--assembly {snakemake.input.assembly} " # Assembly input from gridss assemble
"--output {snakemake.output.vcf} " # Output VCF
"{snakemake.input.bams} "
"{extra}) {log}"
)
GRIDSS PREPROCESS¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. preprocess
pre-processes input BAM files (can be run per file).
URL: https://github.com/PapenfussLab/gridss
Example¶
This wrapper can be used in the following way:
WORKING_DIR="working_dir"
rule gridss_preprocess:
input:
bam="mapped/{sample}.bam",
bai="mapped/{sample}.bam.bai",
reference="reference/genome.fasta",
dictionary="reference/genome.dict",
refindex=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa")
output:
multiext("{WORKING_DIR}/{sample}.bam.gridss.working/{sample}.bam", ".cigar_metrics", ".computesamtags.changes.tsv", ".coverage.blacklist.bed", ".idsv_metrics", ".insert_size_histogram.pdf", ".insert_size_metrics", ".mapq_metrics", ".sv.bam", ".sv.bam.csi", ".tag_metrics")
params:
extra="--jvmheap 1g",
workingdir=WORKING_DIR
log:
"log/gridss/preprocess/{WORKING_DIR}/{sample}.preprocess.log"
threads:
8
wrapper:
"v2.2.1/bio/gridss/preprocess"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gridss=2.13.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss preprocess"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
reference = snakemake.input.get("reference")
dictionary = snakemake.input.get("dictionary")
if not snakemake.params.workingdir:
raise ValueError("Please set params.workingdir to provide a working directory.")
if not snakemake.input.reference:
raise ValueError("Please set input.reference to provide reference genome.")
for ending in (".amb", ".ann", ".bwt", ".pac", ".sa"):
if not path.exists("{}{}".format(reference, ending)):
raise ValueError(
"{reference}{ending} missing. Please make sure the reference was properly indexed by bwa.".format(
reference=reference, ending=ending
)
)
dictionary = path.splitext(reference)[0] + ".dict"
if not path.exists(dictionary):
raise ValueError(
"{dictionary} missing. Please make sure the reference dictionary was properly created. This can be accomplished, for example, with CreateSequenceDictionary.jar from Picard.".format(
dictionary=dictionary
)
)
shell(
"(gridss -s preprocess " # Tool
"--reference {reference} " # Reference
"--threads {snakemake.threads} "
"--workingdir {snakemake.params.workingdir} "
"{snakemake.input.bam} "
"{extra}) {log}"
)
GRIDSS SETUPREFERENCE¶
GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. It includes a genome-wide break-end assembler, as well as a structural variation caller for Illumina sequencing data. setupreference
is a once-off setup generating additional files in the same directory as the reference. WARNING: multiple instances of GRIDSS attempting to perform setupreference at the same time will result in file corruption. Make sure these files are generated before running parallel GRIDSS jobs.
URL: https://github.com/PapenfussLab/gridss
Example¶
This wrapper can be used in the following way:
rule gridss_setupreference:
input:
reference="reference/genome.fasta",
output:
idx=multiext("reference/genome.fasta", ".amb", ".ann", ".bwt", ".dict", ".fai", ".pac", ".sa")
params:
extra="--jvmheap 1g"
log:
"log/gridss/setupreference.log"
wrapper:
"v2.2.1/bio/gridss/setupreference"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
gridss=2.13.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for gridss setupreference"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
shell(
"(gridss -s setupreference " # Tool
"--reference {snakemake.input.reference} " # Reference
"{extra}) {log}"
)
HAP.PY¶
For hap.py, the following wrappers are available:
HAP.PY¶
Comparison of vcf files and calculating performance metrics following GA4GH defined best practices for benchmarking small variant call sets (Krusche, P. et al. 2019, https://doi.org/10.1038/s41587-019-0054-x). Part of the hap.py suite by Illumina (see https://github.com/Illumina/hap.py/blob/master/doc/normalisation.md).
Example¶
This wrapper can be used in the following way:
rule benchmark_variants:
input:
truth="truth.vcf",
query="query.vcf",
truth_regions="truth.bed",
strats="stratifications.tsv",
strat_dir="strats_dir",
genome="genome.fasta",
genome_index="genome.fasta.fai"
output:
multiext("results",".runinfo.json",".vcf.gz",".summary.csv",
".extended.csv",".metrics.json.gz",".roc.all.csv.gz",
".roc.Locations.INDEL.csv.gz",".roc.Locations.INDEL.PASS.csv.gz",
".roc.Locations.SNP.csv.gz",".roc.tsv")
params:
engine="vcfeval",
prefix=lambda wc, input, output: output[0].split('.')[0],
## parameters such as -L to left-align variants
extra="--verbose"
log: "happy.log"
threads: 2
wrapper: "v2.2.1/bio/hap.py/hap.py"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
hap.py=0.3.15
rtg-tools=3.12.1
Authors¶
- Nathan D. Olson
Code¶
__author__ = "Nathan Olson"
__copyright__ = "This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply "
__email__ = "nolson@nist.gov"
__license__ = """
This software was developed by employees of the National Institute of Standards and Technology (NIST),
an agency of the Federal Government and is being made available as a public service. Pursuant to title
17 United States Code Section 105, works of NIST employees are not subject to copyright protection in
the United States. This software may be subject to foreign copyright. Permission in the United States
and in foreign countries, to the extent that NIST may hold copyright, to use, copy, modify, create
derivative works, and distribute this software and its documentation without fee is hereby granted on
a non-exclusive basis, provided that this notice and disclaimer of warranty appears in all copies.
THE SOFTWARE IS PROVIDED 'AS IS' WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY,
INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND FREEDOM FROM INFRINGEMENT, AND ANY
WARRANTY THAT THE DOCUMENTATION WILL CONFORM TO THE SOFTWARE, OR ANY WARRANTY THAT THE SOFTWARE WILL BE
ERROR FREE. IN NO EVENT SHALL NIST BE LIABLE FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO, DIRECT,
INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES, ARISING OUT OF, RESULTING FROM, OR IN ANY WAY CONNECTED WITH
THIS SOFTWARE, WHETHER OR NOT BASED UPON WARRANTY, CONTRACT, TORT, OR OTHERWISE, WHETHER OR NOT INJURY WAS
SUSTAINED BY PERSONS OR PROPERTY OR OTHERWISE, AND WHETHER OR NOT LOSS WAS SUSTAINED FROM, OR AROSE OUT OF
THE RESULTS OF, OR USE OF, THE SOFTWARE OR SERVICES PROVIDED HEREUNDER.
"""
from os import path
from snakemake.shell import shell
# Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Optional parameters
engine = snakemake.params.get("engine", "")
if engine:
engine = "--engine {}".format(engine)
truth_regions = snakemake.input.get("truth_regions", "")
if truth_regions:
truth_regions = "-f {}".format(truth_regions)
strats = snakemake.input.get("strats", "")
if strats:
strats = "--stratification {}".format(strats)
shell(
"(hap.py"
" --threads {snakemake.threads}"
" {engine}"
" -r {snakemake.input.genome}"
" {extra}"
" {truth_regions}"
" {strats}"
" -o {snakemake.params.prefix}"
" {snakemake.input.truth}"
" {snakemake.input.query})"
" {log}"
)
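The optional arguments in the wrapper are only added when the corresponding param or input is set, and the example rule derives the output prefix from the first output file. A minimal sketch of both steps, with hypothetical helper names `happy_optional_args` and `prefix_from_output`:

```python
def happy_optional_args(engine="", truth_regions="", strats=""):
    """Build the optional part of the hap.py command line."""
    args = []
    if engine:
        args.append(f"--engine {engine}")
    if truth_regions:
        args.append(f"-f {truth_regions}")
    if strats:
        args.append(f"--stratification {strats}")
    return " ".join(args)


def prefix_from_output(first_output):
    """Same expression as the example rule's `prefix` lambda."""
    return first_output.split(".")[0]


if __name__ == "__main__":
    print(happy_optional_args(engine="vcfeval", truth_regions="truth.bed"))
    print(prefix_from_output("results.runinfo.json"))
```

The prefix expression cuts at the first dot anywhere in the path, so an output directory whose name contains a dot would need a different rule for `prefix`.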
PRE.PY¶
Preprocessing/normalisation of vcf/bcf files. Part of the hap.py suite by Illumina (see https://github.com/Illumina/hap.py/blob/master/doc/normalisation.md).
Example¶
This wrapper can be used in the following way:
rule preprocess_variants:
input:
##vcf/bcf
variants="variants.vcf",
output:
"normalized/variants.vcf.gz",
log:
"log/pre.log",
params:
## path to reference genome
genome="genome.fasta",
## parameters such as -L to left-align variants
extra="-L",
threads: 2
wrapper:
"v2.2.1/bio/hap.py/pre.py"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
hap.py=0.3.15
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
## Extract arguments
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"(pre.py"
" --threads {snakemake.threads}"
" -r {snakemake.params.genome}"
" {extra}"
" {snakemake.input.variants}"
" {snakemake.output})"
" {log}"
)
HIFIASM¶
A haplotype-resolved assembler for accurate HiFi reads
URL: https://github.com/chhylp123/hifiasm
Example¶
This wrapper can be used in the following way:
rule hifiasm:
input:
fasta=[
"reads/HiFi_dataset_01.fasta.gz",
"reads/HiFi_dataset_02.fasta.gz",
],
# optional
# hic1="reads/Hi-C_dataset_R1.fastq.gz",
# hic2="reads/Hi-C_dataset_R2.fastq.gz",
output:
multiext(
"hifiasm/{sample}.",
"a_ctg.gfa",
"a_ctg.lowQ.bed",
"a_ctg.noseq.gfa",
"p_ctg.gfa",
"p_ctg.lowQ.bed",
"p_ctg.noseq.gfa",
"p_utg.gfa",
"p_utg.lowQ.bed",
"p_utg.noseq.gfa",
"r_utg.gfa",
"r_utg.lowQ.bed",
"r_utg.noseq.gfa",
),
log:
"logs/hifiasm/{sample}.log",
params:
extra="--primary -f 37 -l 1 -s 0.75 -O 1",
threads: 2
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/hifiasm"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
hifiasm=0.18.5
Input/Output¶
Input:
- PacBio HiFi reads (fasta)
- Hi-C reads (fastq; optional)
Output:
- assembly graphs (GFA)
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
hic1 = snakemake.input.get("hic1", "")
if hic1:
if isinstance(hic1, list):
hic1 = ",".join(hic1)
hic1 = "--h1 {}".format(hic1)
hic2 = snakemake.input.get("hic2", "")
if hic2:
if isinstance(hic2, list):
hic2 = ",".join(hic2)
hic2 = "--h2 {}".format(hic2)
out_prefix = os.path.commonprefix(snakemake.output).rstrip(".")
shell(
"hifiasm"
" -t {snakemake.threads}"
" {extra}"
" {hic1} {hic2}"
" -o {out_prefix}"
" {snakemake.input.fasta}"
" {log}"
)
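The wrapper derives the `-o` prefix from the character-wise common prefix of the outputs and joins multiple Hi-C files with commas. The sketch below isolates both steps. The helper names `hifiasm_prefix` and `hic_flag` and the sample paths are assumptions for illustration.

```python
import os


def hifiasm_prefix(outputs):
    """Common string prefix of all outputs, without a trailing dot."""
    return os.path.commonprefix(outputs).rstrip(".")


def hic_flag(flag, reads):
    """Format one Hi-C option; a list of files is joined with commas."""
    if not reads:
        return ""
    if isinstance(reads, list):
        reads = ",".join(reads)
    return f"{flag} {reads}"


if __name__ == "__main__":
    outputs = [
        "hifiasm/s1.a_ctg.gfa",
        "hifiasm/s1.p_ctg.gfa",
        "hifiasm/s1.r_utg.gfa",
    ]
    print(hifiasm_prefix(outputs))
    print(hic_flag("--h1", ["hic_a_R1.fastq.gz", "hic_b_R1.fastq.gz"]))
```

Since `os.path.commonprefix` compares characters rather than path components, declaring only outputs that share a longer stem (for example only `p_ctg` and `p_utg` files) would yield a longer prefix than intended.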
HISAT2¶
For hisat2, the following wrappers are available:
HISAT2 ALIGN¶
Map reads with hisat2.
URL: http://daehwankimlab.github.io/hisat2
Example¶
This wrapper can be used in the following way:
rule hisat2_align:
input:
reads=["reads/{sample}_R1.fastq", "reads/{sample}_R2.fastq"],
idx="index/",
output:
"mapped/{sample}.bam",
log:
"logs/hisat2_align_{sample}.log",
params:
extra="",
threads: 2
wrapper:
"v2.2.1/bio/hisat2/align"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The -S flag must not be used since output is already directly piped to samtools for compression.
- The --threads/-p flag must not be used since threads is set separately via the snakemake threads directive.
- The wrapper does not yet handle SRA input accessions.
- No reference index files checking is done since the actual number of files may differ depending on the reference sequence size. The wrapper code instead takes the index directory from input.idx and derives the index prefix from the *.ht2 files found there.
Software dependencies¶
hisat2=2.2.1
samtools=1.17
Params¶
idx
: prefix of index file path (required)
extra
: additional parameters
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
import os
from pathlib import Path
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Run log
log = snakemake.log_fmt_shell()
# Input file wrangling
reads = snakemake.input.get("reads")
if isinstance(reads, str):
input_flags = "-U {0}".format(reads)
elif len(reads) == 1:
input_flags = "-U {0}".format(reads[0])
elif len(reads) == 2:
input_flags = "-1 {0} -2 {1}".format(*reads)
else:
raise RuntimeError(
"Reads parameter must contain at least 1 and at most 2" " input files."
)
ht2_files = Path(snakemake.input.idx).glob("*.ht2")
idx_prefix = os.path.commonprefix(list(ht2_files)).rstrip(".")
# Executed shell command
shell(
"(hisat2 {extra} "
"--threads {snakemake.threads} "
" -x {idx_prefix} {input_flags} "
" | samtools view -Sbh -o {snakemake.output[0]} -) "
" {log}"
)
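The wrapper chooses between single-end (`-U`) and paired-end (`-1`/`-2`) flags from the number of read files, and derives the index prefix from the `*.ht2` files. A standalone sketch of both steps follows. The function names `hisat2_input_flags` and `index_prefix` and the file names are hypothetical.

```python
import os


def hisat2_input_flags(reads):
    """Return the read flags for one or two FASTQ files."""
    if isinstance(reads, str):
        return f"-U {reads}"
    if len(reads) == 1:
        return f"-U {reads[0]}"
    if len(reads) == 2:
        return "-1 {0} -2 {1}".format(*reads)
    raise RuntimeError(
        "Reads parameter must contain at least 1 and at most 2 input files."
    )


def index_prefix(ht2_files):
    """Common string prefix of the index files, without a trailing dot."""
    return os.path.commonprefix([str(f) for f in ht2_files]).rstrip(".")


if __name__ == "__main__":
    print(hisat2_input_flags(["reads/a_R1.fastq", "reads/a_R2.fastq"]))
    print(index_prefix(["index/genome.1.ht2", "index/genome.2.ht2"]))
```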
HISAT2 INDEX¶
Create index with hisat2.
Example¶
This wrapper can be used in the following way:
rule hisat2_index:
input:
fasta = "{genome}.fasta"
output:
directory("index_{genome}")
params:
prefix = "index_{genome}/"
log:
"logs/hisat2_index_{genome}.log"
threads: 2
wrapper:
"v2.2.1/bio/hisat2/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
hisat2=2.2.1
samtools=1.17
Input/Output¶
Input:
fasta
: list of FASTA files or list of sequences
Output:
- Directory of the hisat2 custom index.
Params¶
prefix
: prefix of index file path (required). Must be related to output.
extra
: additional parameters
Authors¶
- Joël Simoneau
Code¶
"""Snakemake wrapper for HISAT2 index"""
__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file or a sequence is required"
input_seq = ""
if not "." in fasta:
input_seq += "-c "
input_seq += ",".join(fasta) if isinstance(fasta, list) else fasta
hisat_dir = snakemake.params.get("prefix", "")
if hisat_dir:
os.makedirs(hisat_dir)
shell(
"hisat2-build {extra} "
"-p {snakemake.threads} "
"{input_seq} "
"{snakemake.params.prefix} "
"{log}"
)
HMMER¶
For hmmer, the following wrappers are available:
HMMBUILD¶
hmmbuild: construct profile HMM(s) from multiple sequence alignment(s)
Example¶
This wrapper can be used in the following way:
rule hmmbuild_profile:
input:
"test-profile.sto"
output:
"test-profile.hmm"
log:
"logs/test-profile-hmmbuild.log"
params:
extra="",
threads: 4
wrapper:
"v2.2.1/bio/hmmer/hmmbuild"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
hmmer=3.3.2
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmbuild"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
" hmmbuild {extra} --cpu {snakemake.threads} "
" {snakemake.output} {snakemake.input} {log} "
)
HMMPRESS¶
Format an HMM database into a binary format for hmmscan.
Example¶
This wrapper can be used in the following way:
rule hmmpress_profile:
input:
"test-profile.hmm"
output:
"test-profile.hmm.h3f",
"test-profile.hmm.h3i",
"test-profile.hmm.h3m",
"test-profile.hmm.h3p"
log:
"logs/hmmpress.log"
params:
extra="",
threads: 4
wrapper:
"v2.2.1/bio/hmmer/hmmpress"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
hmmer=3.3.2
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmpress"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# -f Force; overwrites any previous hmmpress-ed datafiles. By default, hmmpress reports an error for any existing files and asks you to delete them first.
shell("hmmpress -f {snakemake.input} {log}")
HMMSCAN¶
search protein sequence(s) against a protein profile database
Example¶
This wrapper can be used in the following way:
rule hmmscan_profile:
input:
fasta="test-protein.fa",
profile="test-profile.hmm.h3f",
output:
# only one of these is required
tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
pfamtblout="test-prot-pfamtbl.txt", # save table of hits and domains to file, in Pfam format <f>
outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
log:
"logs/hmmscan.log"
params:
evalue_threshold=0.00001,
# if bitscore threshold provided, hmmscan will use that instead
#score_threshold=50,
extra="",
threads: 4
wrapper:
"v2.2.1/bio/hmmer/hmmscan"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
hmmer=3.3.2
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmscan"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
profile = snakemake.input.get("profile")
profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '
# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)
# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)
# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)
# save table of hits and domains to file, in Pfam format <f>
pfamtblout = snakemake.output.get("pfamtblout", "")
if pfamtblout:
    out_cmd += " --pfamtblout {} ".format(pfamtblout)
## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report models <= this E-value threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
# report models >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")
if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))
# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"hmmscan {out_cmd} {thresh_cmd} --cpu {snakemake.threads}"
" {extra} {profile} {snakemake.input.fasta} {log}"
)
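The threshold selection in the wrapper code above can be isolated into a small function. The sketch below is illustrative only: `threshold_flag` is a hypothetical name, and a plain dict stands in for `snakemake.params`.

```python
def threshold_flag(params):
    """Pick the reporting threshold flag from a params mapping (hypothetical helper)."""
    score = params.get("score_threshold", "")
    if score:
        # A bit-score threshold, when given, takes precedence over the E-value.
        return "-T {}".format(float(score))
    evalue = params.get("evalue_threshold", 0.00001)
    return "-E {}".format(float(evalue))


print(threshold_flag({}))
print(threshold_flag({"score_threshold": 50}))
```

Because the check relies on truthiness, a score threshold of `0` counts as "not provided" and the E-value threshold is used instead.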
HMMSEARCH¶
search profile(s) against a sequence database
Example¶
This wrapper can be used in the following way:
rule hmmsearch_profile:
input:
fasta="test-protein.fa",
profile="test-profile.hmm.h3f",
output:
# only one of these is required
tblout="test-prot-tbl.txt", # save parseable table of per-sequence hits to file <f>
domtblout="test-prot-domtbl.txt", # save parseable table of per-domain hits to file <f>
alignment_hits="test-prot-alignment-hits.txt", # Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
outfile="test-prot-out.txt", # Direct the main human-readable output to a file <f> instead of the default stdout.
log:
"logs/hmmsearch.log"
params:
evalue_threshold=0.00001,
# if bitscore threshold provided, hmmsearch will use that instead
#score_threshold=50,
extra="",
threads: 4
wrapper:
"v2.2.1/bio/hmmer/hmmsearch"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
hmmer=3.3.2
Input/Output¶
Input:
- hmm profile(s)
- sequence database
Output:
- matches between sequences and hmm profiles
Authors¶
- N Tessa Pierce
Code¶
"""Snakemake wrapper for hmmsearch"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
profile = snakemake.input.get("profile")
profile = profile.rsplit(".h3", 1)[0]
assert profile.endswith(".hmm"), 'your profile file should end with ".hmm" '
# Direct the main human-readable output to a file <f> instead of the default stdout.
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)
# save parseable table of per-sequence hits to file <f>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)
# save parseable table of per-domain hits to file <f>
domtblout = snakemake.output.get("domtblout", "")
if domtblout:
    out_cmd += " --domtblout {} ".format(domtblout)
# Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file <f>
alignment_hits = snakemake.output.get("alignment_hits", "")
if alignment_hits:
    out_cmd += " -A {} ".format(alignment_hits)
## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report sequences <= this E-value threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 0.00001)
# report sequences >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")
if score_threshold:
    thresh_cmd = " -T {} ".format(float(score_threshold))
else:
    thresh_cmd = " -E {} ".format(float(evalue_threshold))
# all other params should be entered in "extra" param
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
" hmmsearch --cpu {snakemake.threads} "
" {out_cmd} {thresh_cmd} {extra} {profile} "
" {snakemake.input.fasta} {log}"
)
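The chain of `if` blocks that maps named outputs to hmmsearch flags can also be written as a table lookup. The following is a minimal sketch of that alternative, not the wrapper's code: `OUTPUT_FLAGS` and `build_out_cmd` are hypothetical names, and a dict stands in for `snakemake.output`.

```python
# Named rule outputs and the hmmsearch option each one sets.
OUTPUT_FLAGS = {
    "outfile": "-o",
    "tblout": "--tblout",
    "domtblout": "--domtblout",
    "alignment_hits": "-A",
}


def build_out_cmd(outputs):
    """Assemble output options for every named output that is present."""
    parts = []
    for name, flag in OUTPUT_FLAGS.items():
        path = outputs.get(name, "")
        if path:
            parts.append("{} {}".format(flag, path))
    return " ".join(parts)


print(build_out_cmd({"tblout": "hits.tbl", "alignment_hits": "hits.sto"}))
```

The table form keeps the flag order fixed by the dict definition, independent of the order in which outputs are declared in the rule.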
HOMER¶
For homer, the following wrappers are available:
HOMER ANNOTATEPEAKS¶
Annotate peaks to associate them with nearby genes. For more information, please see the documentation.
Example¶
This wrapper can be used in the following way:
rule homer_annotatepeaks:
input:
peaks="peaks_refs/{sample}.peaks",
genome="peaks_refs/gene.fasta",
# optional input files
# gtf="", # implicitly sets the -gtf flag
# gene="", # implicitly sets the -gene flag for gene data file to add gene expression or other data types
motif_files="peaks_refs/motives.txt", # implicitly sets the -m flag
# filter_motiv="", # implicitly sets the -fm flag
# center="", # implicitly sets the -center flag
nearest_peak="peaks_refs/b.peaks", # implicitly sets the -p flag
# tag="", # implicitly sets the -d flag for tagDirectories
# vcf="", # implicitly sets the -vcf flag
# bed_graph="", # implicitly sets the -bedGraph flag
# wig="", # implicitly sets the -wig flag
# map="", # implicitly sets the -map flag
# cmp_genome="", # implicitly sets the -cmpGenome flag
# cmp_Liftover="", # implicitly sets the -cmpLiftover flag
# advanced_annotation="" # optional, implicitly sets the -ann flag, see http://homer.ucsd.edu/homer/ngs/advancedAnnotation.html
output:
annotations="{sample}_annot.txt",
# optional output, implicitly sets the -matrix flag, requires motif_files as input
matrix=multiext("{sample}",
".count.matrix.txt",
".ratio.matrix.txt",
".logPvalue.matrix.txt",
".stats.txt"
),
# optional output, implicitly sets the -mfasta flag, requires motif_files as input
mfasta="{sample}_motif.fasta",
# optional output, implicitly sets the -mbed flag, requires motif_files as input
mbed="{sample}_motif.bed",
# optional output, implicitly sets the -mlogic flag, requires motif_files as input
mlogic="{sample}_motif.logic"
threads:
2
params:
mode="", # add tss, tts or rna mode and options here, e.g. "tss mm8"
extra="-gid" # optional params, see http://homer.ucsd.edu/homer/ngs/annotation.html
log:
"logs/annotatePeaks/{sample}.log"
wrapper:
"v2.2.1/bio/homer/annotatePeaks"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
homer=4.11
Input/Output¶
Input:
- peak or BED file
- various optional input files, e.g. gtf, bedGraph, wiggle
Output:
- annotation file (.txt)
- various optional output files
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
import os
import sys
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
genome = snakemake.input.get("genome", "")
extra = snakemake.params.get("extra", "")
motif_files = snakemake.input.get("motif_files", "")
matrix = snakemake.output.get("matrix", "")
if genome == "":
    genome = "none"
# optional files
opt_files = {
"gtf": "-gtf",
"gene": "-gene",
"motif_files": "-m",
"filter_motiv": "-fm",
"center": "-center",
"nearest_peak": "-p",
"tag": "-d",
"vcf": "-vcf",
"bed_graph": "-bedGraph",
"wig": "-wig",
"map": "-map",
"cmp_genome": "-cmpGenome",
"cmp_Liftover": "-cmpLiftover",
"advanced_annotation": "-ann",
"mfasta": "-mfasta",
"mbed": "-mbed",
"mlogic": "-mlogic",
}
requires_motives = False
for i in opt_files:
    file = None
    if i == "mfasta" or i == "mbed" or i == "mlogic":
        file = snakemake.output.get(i, "")
        if file:
            requires_motives = True
    else:
        file = snakemake.input.get(i, "")
    if file:
        extra += " {flag} {file}".format(flag=opt_files[i], file=file)
if requires_motives and motif_files == "":
    sys.exit(
        "The optional output files require motif_file(s) as input. For more information please see http://homer.ucsd.edu/homer/ngs/annotation.html."
    )
# optional matrix output files:
if matrix:
    if motif_files == "":
        sys.exit(
            "The matrix output files require motif_file(s) as input. For more information please see http://homer.ucsd.edu/homer/ngs/annotation.html."
        )
    ext = ".count.matrix.txt"
    matrix_out = [i for i in snakemake.output if i.endswith(ext)][0]
    matrix_name = os.path.basename(matrix_out[: -len(ext)])
    extra += " -matrix {}".format(matrix_name)
shell(
"(annotatePeaks.pl"
" {snakemake.params.mode}"
" {snakemake.input.peaks}"
" {genome}"
" {extra}"
" -cpu {snakemake.threads}"
" > {snakemake.output.annotations})"
" {log}"
)
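How the wrapper derives the value passed to `-matrix` from the declared outputs can be illustrated in isolation. This is a sketch with a hypothetical function name (`matrix_prefix`) and made-up paths; a plain list stands in for `snakemake.output`.

```python
import os


def matrix_prefix(outputs, ext=".count.matrix.txt"):
    """Return the prefix handed to -matrix, derived from the count matrix output."""
    # Take the first output that carries the count-matrix suffix ...
    matrix_out = [path for path in outputs if path.endswith(ext)][0]
    # ... strip the suffix and drop any directory component.
    return os.path.basename(matrix_out[: -len(ext)])


outputs = [
    "results/s1_annot.txt",
    "results/s1.count.matrix.txt",
    "results/s1.ratio.matrix.txt",
]
print(matrix_prefix(outputs))
```

Since only the base name survives, any directory given in the `matrix` output paths is not part of the argument passed to `-matrix`.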
HOMER FINDPEAKS¶
Find ChIP- or ATAC-Seq peaks with the HOMER suite. For more information, please see the documentation.
Example¶
This wrapper can be used in the following way:
rule homer_findPeaks:
input:
# tagDirectory of sample
tag="tagDir/{sample}",
# tagDirectory of control background sample - optional
control="tagDir/control"
output:
"{sample}_peaks.txt"
params:
# one of 7 basic modes of operation, see homer manual
style="histone",
extra="" # optional params, see homer manual
log:
"logs/findPeaks/{sample}.log"
wrapper:
"v2.2.1/bio/homer/findPeaks"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
homer=4.11
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
control = snakemake.input.get("control", "")
if control == "":
    control_command = ""
else:
    control_command = "-i " + control
shell(
"(findPeaks"
" {snakemake.input.tag}"
" -style {snakemake.params.style}"
" {extra}"
" {control_command}"
" -o {snakemake.output})"
" {log}"
)
HOMER GETDIFFERENTIALPEAKS¶
Detect differentially bound ChIP peaks between samples. For more information, please see the documentation.
Example¶
This wrapper can be used in the following way:
rule homer_getDifferentialPeaks:
input:
# peak/bed file to be tested
peaks="{sample}.peaks.bed",
# tagDirectory of first sample
first="tagDir/{sample}",
# tagDirectory of sample to compare
second="tagDir/second"
output:
"{sample}_diffPeaks.txt"
params:
extra="" # optional params, see homer manual
log:
"logs/diffPeaks/{sample}.log"
wrapper:
"v2.2.1/bio/homer/getDifferentialPeaks"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
homer=4.11
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(getDifferentialPeaks"
" {snakemake.input.peaks}"
" {snakemake.input.first}"
" {snakemake.input.second}"
" {extra}"
" > {snakemake.output})"
" {log}"
)
HOMER MAKETAGDIRECTORY¶
Create a tag directory with the HOMER suite. For more information, please see the documentation.
Example¶
This wrapper can be used in the following way:
rule homer_makeTagDir:
input:
# input bam, can be one or a list of files
bam="{sample}.bam",
output:
directory("tagDir/{sample}")
params:
extra="" # optional params, see homer manual
log:
"logs/makeTagDir/{sample}.log"
wrapper:
"v2.2.1/bio/homer/makeTagDirectory"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
homer=4.11
samtools=1.17
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("(makeTagDirectory {snakemake.output} {extra} {snakemake.input}) {log}")
HOMER MERGEPEAKS¶
Merge ChIP-Seq peaks from multiple peak files. For more information, please see the documentation. Please be aware that this wrapper does not yet support use of the -prefix parameter.
Example¶
This wrapper can be used in the following way:
rule homer_mergePeaks:
input:
# input peak files
"peaks/{sample1}.peaks",
"peaks/{sample2}.peaks"
output:
"merged/{sample1}_{sample2}.peaks"
params:
extra="-d given" # optional params, see homer manual
log:
"logs/mergePeaks/{sample1}_{sample2}.log"
wrapper:
"v2.2.1/bio/homer/mergePeaks"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
homer=4.11
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
from snakemake.shell import shell
import os.path as path
import sys
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
class PrefixNotSupportedError(Exception):
    pass
if "-prefix" in extra:
    raise PrefixNotSupportedError(
        "The use of the -prefix parameter is not yet supported in this wrapper"
    )
shell("(mergePeaks {snakemake.input} {extra} > {snakemake.output}) {log}")
IGV-REPORTS¶
Create self-contained igv.js HTML pages.
Example¶
This wrapper can be used in the following way:
rule igv_report:
input:
fasta="minigenome.fa",
vcf="variants.vcf",
# any number of additional optional tracks, see igv-reports manual
tracks=["alignments.bam"]
output:
"igv-report.html"
params:
extra="" # optional params, see igv-reports manual
log:
"logs/igv-report.log"
wrapper:
"v2.2.1/bio/igv-reports"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
igv-reports=1.7.0
Authors¶
- Johannes Köster
Code¶
"""Snakemake wrapper for igv-reports."""
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
tracks = snakemake.input.get("tracks", [])
if tracks:
    if isinstance(tracks, str):
        tracks = [tracks]
    tracks = "--tracks {}".format(" ".join(tracks))
shell(
"create_report {extra} --standalone --output {snakemake.output[0]} {snakemake.input.vcf} {snakemake.input.fasta} {tracks} {log}"
)
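The handling of the optional `tracks` input, which may be a single path or a list of paths, can be sketched on its own. `tracks_argument` is a hypothetical name used only for this illustration; unlike the wrapper, it returns an empty string when no tracks are given.

```python
def tracks_argument(tracks):
    """Turn one track or a list of tracks into a --tracks option string."""
    if not tracks:
        return ""
    if isinstance(tracks, str):
        # A single path is wrapped so that it is handled like a one-element list.
        tracks = [tracks]
    return "--tracks {}".format(" ".join(tracks))


print(tracks_argument("alignments.bam"))
print(tracks_argument(["alignments.bam", "genes.bed"]))
```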
INFERNAL¶
For infernal, the following wrappers are available:
INFERNAL CMPRESS¶
Starting from a CM database <cmfile> in standard Infernal-1.1 format, construct binary compressed datafiles for cmscan. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
Example¶
This wrapper can be used in the following way:
rule infernal_cmpress:
input:
"test-covariance-model.cm"
output:
"test-covariance-model.cm.i1i",
"test-covariance-model.cm.i1f",
"test-covariance-model.cm.i1m",
"test-covariance-model.cm.i1p"
log:
"logs/cmpress.log"
params:
extra="",
wrapper:
"v2.2.1/bio/infernal/cmpress"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
infernal=1.1.4
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Infernal CMpress"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# -F enables overwrite of old (otherwise cmpress will fail if old versions exist)
shell("cmpress -F {snakemake.input} {log}")
INFERNAL CMSCAN¶
cmscan is used to search sequences against collections of covariance models that have been prepared with cmpress. The output format is designed to be human-readable, but it is often so voluminous that reading it is impractical and parsing it is tedious. The --tblout option saves output in a simple tabular format that is concise and easier to parse. The -o option allows redirecting the main output, including discarding it by writing to /dev/null. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
Example¶
This wrapper can be used in the following way:
rule cmscan_profile:
input:
fasta="test-transcript.fa",
profile="test-covariance-model.cm.i1i"
output:
tblout="tr-infernal-tblout.txt",
log:
"logs/cmscan.log"
params:
evalue_threshold=10, # In the per-target output, report target sequences with an E-value of <= <x>. default=10.0 (on average, ~10 false positives reported per query)
extra= "",
#score_threshold=50, # Instead of thresholding per-CM output on E-value, report target sequences with a bit score of >= <x>.
threads: 4
wrapper:
"v2.2.1/bio/infernal/cmscan"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
infernal=1.1.4
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Infernal CMscan"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
profile = snakemake.input.get("profile")
profile = profile.rsplit(".i", 1)[0]
assert profile.endswith(".cm"), 'your profile file should end with ".cm"'
# direct output to file <f>, not stdout
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)
# save parseable table of hits to file <s>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)
## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)
# report <= this evalue threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 10)  # use cmscan default
# report >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")
if score_threshold:
    thresh_cmd = f" -T {float(score_threshold)} "
else:
    thresh_cmd = f" -E {float(evalue_threshold)} "
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"cmscan {out_cmd} {thresh_cmd} {extra} --cpu {snakemake.threads} {profile} {snakemake.input.fasta} {log}"
)
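The rule takes one of the cmpress index files as input, while cmscan itself is given the path of the original `.cm` file. A small sketch of that suffix stripping follows; `cm_prefix` is a hypothetical name, and it raises `ValueError` where the wrapper uses an `assert`.

```python
def cm_prefix(index_file):
    """Strip a cmpress suffix such as ".i1i" to recover the .cm database path."""
    # rsplit with maxsplit=1 only cuts at the last ".i", so earlier
    # occurrences inside the file name are left untouched.
    prefix = index_file.rsplit(".i", 1)[0]
    if not prefix.endswith(".cm"):
        raise ValueError('profile file should end with ".cm"')
    return prefix


print(cm_prefix("test-covariance-model.cm.i1i"))
print(cm_prefix("rfam.input.cm.i1m"))
```

Splitting from the right matters for names like `rfam.input.cm.i1m`, where a left-to-right split on `.i` would cut the name at `.input`.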
JANNOVAR¶
Annotate predicted effect of nucleotide changes with Jannovar.
URL: https://doc-openbio.readthedocs.io/projects/jannovar/en/master/
Example¶
This wrapper can be used in the following way:
rule jannovar:
input:
vcf="{sample}.vcf",
pedigree="pedigree_ar.ped" # optional, contains familial relationships
output:
"jannovar/{sample}.vcf.gz"
log:
"logs/jannovar/{sample}.log"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb = 1024,
params:
database="hg19_small.ser", # path to jannovar reference dataset
extra="--show-all" # optional parameters
wrapper:
"v2.2.1/bio/jannovar"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
jannovar-cli=0.36
snakemake-wrapper-utils=0.5.2
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
shell.executable("bash")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
pedigree = snakemake.input.get("pedigree", "")
if pedigree:
    pedigree = '--pedigree-file "%s"' % pedigree
shell(
"jannovar annotate-vcf --database {snakemake.params.database}"
" --input-vcf {snakemake.input.vcf} --output-vcf {snakemake.output}"
" {pedigree} {extra} {java_opts} {log}"
)
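The example rule declares `mem_mb` as a resource so that the JVM's memory can be bounded. As a rough illustration of that idea only, the hypothetical helper below turns a resources mapping into a `-Xmx` flag. It is a simplified stand-in and not the implementation of `get_java_opts` in snakemake-wrapper-utils, which may apply further adjustments.

```python
def jvm_memory_flag(resources):
    """Map a mem_mb resource to a JVM heap-size flag (hypothetical, simplified)."""
    mem_mb = resources.get("mem_mb")
    if mem_mb is None:
        # No memory resource declared: leave the JVM default untouched.
        return ""
    return "-Xmx{}M".format(int(mem_mb))


print(jvm_memory_flag({"mem_mb": 1024}))
```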
JELLYFISH¶
For jellyfish, the following wrappers are available:
JELLYFISH_COUNT¶
Count k-mers in a FASTA file using jellyfish.
URL: https://github.com/gmarcais/Jellyfish
Example¶
This wrapper can be used in the following way:
rule jellyfish_count:
input:
"{prefix}.fasta",
output:
"{prefix}.jf",
log:
"{prefix}.jf.log",
params:
kmer_length=21,
size="1G",
extra="--canonical",
threads: 2
wrapper:
"v2.2.1/bio/jellyfish/count"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
kmer-jellyfish=2.3.0
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"""
(jellyfish count \
{extra} \
--mer-len={snakemake.params.kmer_length} \
--size={snakemake.params.size} \
--threads={snakemake.threads} \
--output={snakemake.output} \
{snakemake.input}) {log}
"""
)
JELLYFISH_DUMP¶
Dump k-mers from a jellyfish database.
URL: https://github.com/gmarcais/Jellyfish
Example¶
This wrapper can be used in the following way:
rule jellyfish_dump:
input:
"{prefix}.jf",
output:
"{prefix}.dump",
log:
"{prefix}.log",
params:
extra="-c -t",
wrapper:
"v2.2.1/bio/jellyfish/dump"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
kmer-jellyfish=2.3.0
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("(jellyfish dump {extra} -o {snakemake.output} {snakemake.input}) {log}")
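With `extra="-c -t"` as in the example rule, the dump is expected to hold one k-mer per line in column format, with a tab between k-mer and count. Under that assumption, the output can be loaded with a few lines of Python; `read_kmer_counts` is a hypothetical helper and the example data are made up.

```python
import io


def read_kmer_counts(handle):
    """Read "KMER<TAB>COUNT" lines into a dict mapping k-mer to count."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        kmer, count = line.split("\t")
        counts[kmer] = int(count)
    return counts


# Made-up data standing in for an open "{prefix}.dump" file.
example = io.StringIO("AAAC\t3\nACGT\t12\n")
print(read_kmer_counts(example))
```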
JELLYFISH_HISTO¶
Export histogram of k-mer counts.
URL: https://github.com/gmarcais/Jellyfish
Example¶
This wrapper can be used in the following way:
rule jellyfish_histo:
input:
"{prefix}.jf",
output:
"{prefix}.histo",
log:
"{prefix}.log",
threads: 2
wrapper:
"v2.2.1/bio/jellyfish/histo"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
kmer-jellyfish=2.3.0
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"""
(jellyfish histo \
{extra} \
--threads={snakemake.threads} \
{snakemake.input} > {snakemake.output}) {log}
"""
)
JELLYFISH_MERGE¶
Merge jellyfish databases.
URL: https://github.com/gmarcais/Jellyfish
Example¶
This wrapper can be used in the following way:
rule jellyfish_merge:
input:
"a.jf",
"b.jf",
output:
"ab.jf",
log:
"ab.jf.log",
wrapper:
"v2.2.1/bio/jellyfish/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
kmer-jellyfish=2.3.0
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("(jellyfish merge {extra} -o {snakemake.output} {snakemake.input}) {log}")
KALLISTO¶
For kallisto, the following wrappers are available:
KALLISTO INDEX¶
Index a transcriptome using kallisto.
Example¶
This wrapper can be used in the following way:
rule kallisto_index:
input:
fasta="{transcriptome}.fasta",
output:
index="{transcriptome}.idx",
params:
extra="", # optional parameters
log:
"logs/kallisto_index_{transcriptome}.log",
threads: 1
wrapper:
"v2.2.1/bio/kallisto/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
kallisto=0.48.0
Authors¶
- Joël Simoneau
Code¶
"""Snakemake wrapper for Kallisto index"""
__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Allowing for multiple FASTA files
fasta = snakemake.input.get("fasta")
assert fasta is not None, "input-> a FASTA-file is required"
fasta = " ".join(fasta) if isinstance(fasta, list) else fasta
shell(
"kallisto index " # Tool
"{extra} " # Optional parameters
"--index={snakemake.output.index} " # Output file
"{fasta} " # Input FASTA files
"{log}" # Logging
)
KALLISTO QUANT¶
Pseudoalign reads and quantify transcripts using kallisto.
Example¶
This wrapper can be used in the following way:
rule kallisto_quant:
input:
fastq=["reads/{exp}_R1.fastq", "reads/{exp}_R2.fastq"],
index="index/transcriptome.idx",
output:
directory("quant_results_{exp}"),
params:
extra="",
log:
"logs/kallisto_quant_{exp}.log",
threads: 1
wrapper:
"v2.2.1/bio/kallisto/quant"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
kallisto=0.48.0
Authors¶
- Joël Simoneau
Code¶
"""Snakemake wrapper for Kallisto quant"""
__author__ = "Joël Simoneau"
__copyright__ = "Copyright 2019, Joël Simoneau"
__email__ = "simoneaujoel@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Allowing for multiple FASTQ files
fastq = snakemake.input.get("fastq")
assert fastq is not None, "input-> a FASTQ-file is required"
fastq = " ".join(fastq) if isinstance(fastq, list) else fastq
shell(
    "kallisto quant "  # Tool
    "{extra} "  # Optional parameters
    "--threads={snakemake.threads} "  # Number of threads
    "--index={snakemake.input.index} "  # Input file
    "--output-dir={snakemake.output} "  # Output directory
    "{fastq} "  # Input FASTQ files
    "{log}"  # Logging
)
LAST¶
For last, the following wrappers are available:
LASTAL¶
LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads).
Example¶
This wrapper can be used in the following way:
rule lastal_nucl_x_nucl:
    input:
        data="test-transcript.fa",
        lastdb="test-transcript.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-transcript.maf",
        #tab="test-transcript.tab",
        #blasttab="test-transcript.blasttab",
        #blasttabplus="test-transcript.blasttabplus",
    params:
        # Report alignments that are expected by chance at most once per LENGTH query letters.
        # By default, LAST reports alignments that are expected by chance at most once per
        # million query letters (for a given database). http://last.cbrc.jp/doc/last-evalues.html
        D_length=1000000,
        extra=""
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "v2.2.1/bio/last/lastal"

rule lastal_nucl_x_prot:
    input:
        data="test-transcript.fa",
        lastdb="test-protein.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-tr-x-prot.maf"
        #tab="test-tr-x-prot.tab",
        #blasttab="test-tr-x-prot.blasttab",
        #blasttabplus="test-tr-x-prot.blasttabplus",
    params:
        # Align DNA queries to protein reference sequences using the specified frameshift cost.
        # 15 is reasonable. Special case: -F0 means DNA-versus-protein alignment without
        # frameshifts, which is faster.
        frameshift_cost=15,
        extra="",
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "v2.2.1/bio/last/lastal"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
last=1453
Authors¶
- Tessa Pierce
Code¶
""" Snakemake wrapper for lastal """
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# http://last.cbrc.jp/doc/last-evalues.html
d_len = float(snakemake.params.get("D_length", 1000000)) # last default
# set output file formats
maf_out = snakemake.output.get("maf", "")
tab_out = snakemake.output.get("tab", "")
btab_out = snakemake.output.get("blasttab", "")
btabplus_out = snakemake.output.get("blasttabplus", "")
outfiles = [maf_out, tab_out, btab_out, btabplus_out]
# TAB, MAF, BlastTab, BlastTab+ (default=MAF)
assert (
    list(map(bool, outfiles)).count(True) == 1
), "please specify ONE output file using one of: 'maf', 'tab', 'blasttab', or 'blasttabplus' keywords in the output field)"
out_cmd = ""
if maf_out:
    out_cmd = "-f {}".format("MAF")
    outF = maf_out
elif tab_out:
    out_cmd = "-f {}".format("TAB")
    outF = tab_out
if btab_out:
    out_cmd = "-f {}".format("BlastTab")
    outF = btab_out
if btabplus_out:
    out_cmd = "-f {}".format("BlastTab+")
    outF = btabplus_out
frameshift_cost = snakemake.params.get("frameshift_cost", "")
if frameshift_cost:
    f_cmd = f"-F {frameshift_cost}"
lastdb_name = str(snakemake.input["lastdb"]).rsplit(".", 1)[0]
shell(
    "lastal -D {d_len} -P {snakemake.threads} {extra} {lastdb_name} {snakemake.input.data} > {outF} {log}"
)
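Two pieces of logic in the code above are easy to miss: exactly one output keyword must be given, and the database name passed to lastal is the lastdb input with its final extension removed. The following is a minimal sketch of both steps, not the wrapper itself; the names `FORMATS`, `select_output`, and `database_name` are hypothetical.

```python
# Output keyword -> format name, as used in the wrapper code above.
FORMATS = {
    "maf": "MAF",
    "tab": "TAB",
    "blasttab": "BlastTab",
    "blasttabplus": "BlastTab+",
}


def select_output(outputs):
    """Return (format_name, path) for the single requested output keyword."""
    chosen = [(key, path) for key, path in outputs.items() if key in FORMATS and path]
    if len(chosen) != 1:
        raise ValueError("specify exactly one of: " + ", ".join(FORMATS))
    key, path = chosen[0]
    return FORMATS[key], path


def database_name(prj_path):
    """Strip the final extension (e.g. '.prj') to get the lastdb base name."""
    return str(prj_path).rsplit(".", 1)[0]


fmt, path = select_output({"maf": "test-transcript.maf"})
db = database_name("test-transcript.fa.prj")
```

With the example rule, `db` is the prefix that lastdb wrote its files under, which is why the rule lists the `.prj` file as input rather than the prefix.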
LASTDB¶
LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads).
Example¶
This wrapper can be used in the following way:
rule lastdb_transcript:
    input:
        "test-transcript.fa"
    output:
        "test-transcript.fa.prj",
    params:
        protein_input=False,
        extra=""
    log:
        "logs/lastdb/test-transcript.log"
    wrapper:
        "v2.2.1/bio/last/lastdb"

rule lastdb_protein:
    input:
        "test-protein.fa"
    output:
        "test-protein.fa.prj",
    params:
        protein_input=True,
        extra=""
    log:
        "logs/lastdb/test-protein.log"
    wrapper:
        "v2.2.1/bio/last/lastdb"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
last=1454
Authors¶
- Tessa Pierce
Code¶
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
protein_cmd = ""
protein = snakemake.params.get("protein_input", False)
if protein:
    protein_cmd = " -p "
shell("lastdb {extra} {protein_cmd} -P {snakemake.threads} {snakemake.input} {log}")
LIFTOFF¶
Lift features from one genome assembly to another
URL: https://github.com/agshumate/Liftoff
Example¶
This wrapper can be used in the following way:
rule liftoff:
    input:
        ref="{ref}.fasta.gz",
        tgt="{tgt}.fasta.gz",
        ann="{ann}.gff.gz",
    output:
        main="{ref}_{ann}_{tgt}.gff3",
        unmapped="{ref}_{ann}_{tgt}.unmapped.txt",
    log:
        "logs/liftoff_{ref}_{ann}_{tgt}.log",
    params:
        extra="",
    threads: 1
    wrapper:
        "v2.2.1/bio/liftoff"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
liftoff=1.6.3
Input/Output¶
Input:
- A fasta formatted reference genome file
- A fasta formatted target genome file
- A GFF/GTF formatted annotations file
Output:
- A GFF formatted file containing the mapped annotations
- A GFF formatted file containing the unmapped annotations
Authors¶
- Tomás Di Domenico
Code¶
"""Snakemake wrapper for liftoff"""
__author__ = "Tomás Di Domenico"
__copyright__ = "Copyright 2021, Tomás Di Domenico"
__email__ = "tdido@tdido.ar"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
    "liftoff "  # tool
    "-g {snakemake.input.ann} "  # annotation file to lift over in GFF or GTF format
    "-o {snakemake.output.main} "  # main output
    "-u {snakemake.output.unmapped} "  # unmapped output
    "{extra} "  # optional parameters
    "{snakemake.input.tgt} "  # target fasta genome to lift genes to
    "{snakemake.input.ref} "  # reference fasta genome to lift genes from
    "{log}"  # Logging
)
LOFREQ¶
For lofreq, the following wrappers are available:
LOFREQ CALL¶
Call variants with lofreq.
Example¶
This wrapper can be used in the following way:
rule lofreq:
    input:
        bam="data/{sample}.bam",
        bai="data/{sample}.bai"
    output:
        "calls/{sample}.vcf"
    log:
        "logs/lofreq_call/{sample}.log"
    params:
        ref="data/genome.fasta",
        extra=""
    threads: 8
    wrapper:
        "v2.2.1/bio/lofreq/call"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
samtools=1.17
lofreq=2.1.5
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
ref = snakemake.params.get("ref", None)
if ref is None:
    raise ValueError("A reference must be provided")
bam_input = snakemake.input.bam
bai_input = snakemake.input.bai
if bam_input is None:
    raise ValueError("Missing bam input file!")
if bai_input is None:
    raise ValueError("Missing bai input file!")
output_file = snakemake.output[0]
if output_file is None:
    raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
    raise ValueError("Only expecting one output file: " + str(output_file) + "!")
shell(
    "lofreq call-parallel "
    " --pp-threads {snakemake.threads}"
    " -f {ref}"
    " {bam_input}"
    " -o {output_file}"
    " {extra}"
    " {log}"
)
LOFREQ INDELQUAL¶
Insert indel qualities into BAM file (required for indel predictions)
URL: https://csb5.github.io/lofreq/
Example¶
This wrapper can be used in the following way:
rule lofreq_uniform_indelqual:
    input:
        bam="data/{sample}.bam",
    output:
        "out/indelqual/{sample}.uindel.bam"
    log:
        "logs/{sample}.uindel.log"
    params:
        extra="-u 15"
    threads: 8
    wrapper:
        "v2.2.1/bio/lofreq/indelqual"

rule lofreq_dindel_indelqual:
    input:
        bam="data/{sample}.bam",
        ref="data/hg38_chr21.fa"
    output:
        "out/indelqual/{sample}.dindel.bam"
    log:
        "logs/{sample}.dindel.log"
    params:
        extra="--dindel"
    threads: 8
    wrapper:
        "v2.2.1/bio/lofreq/indelqual"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
samtools=1.17
lofreq=2.1.5
Input/Output¶
Input:
- bam file that will have variants called
- reference genome of bam (optional unless --dindel specified)
Output:
- bam file with indel qualities added
Authors¶
- Tobin Groth
Code¶
__author__ = "Tobin Groth"
__copyright__ = "Copyright 2023, Tobin Groth"
__email__ = "tobingroth1@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
if len(snakemake.output) != 1:
    raise ValueError("Expecting only one output file!")
ref = snakemake.input.get("ref", "")
if "--dindel" in extra and not ref:
    raise ValueError("Reference required if --dindel option specified")
if ref:
    ref = f"--ref {ref}"
shell(
    "lofreq indelqual "
    " {snakemake.input.bam}"
    " {ref}"
    " -o {snakemake.output[0]}"
    " {extra}"
    " {log}"
)
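The two example rules differ only in whether a reference is needed: uniform qualities (`-u 15`) work without one, while `--dindel` requires it. A minimal sketch of that rule as a standalone function, assuming the same check as the wrapper code above; the function name `reference_argument` is hypothetical.

```python
def reference_argument(extra, ref=""):
    """Return the '--ref' argument, or '' when no reference is given.

    Raises if '--dindel' is requested without a reference, as in the wrapper.
    """
    if "--dindel" in extra and not ref:
        raise ValueError("Reference required if --dindel option specified")
    return f"--ref {ref}" if ref else ""


uniform = reference_argument("-u 15")
dindel = reference_argument("--dindel", "data/hg38_chr21.fa")
```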
MACS2¶
For macs2, the following wrappers are available:
MACS2 CALLPEAK¶
MACS2 callpeak is a model-based analysis tool for ChIP-sequencing that calls peaks from alignment results. For usage information about MACS2 callpeak, please see the documentation and the command line help. For more information about MACS2, also see the source code and published article. Depending on the selected extension(s), the option(s) will be set automatically (please see the table below). Please note that some extensions are incompatible with each other, because they require the --broad option to be either enabled or disabled.
| Extension for the output files | Description | Format | Option |
|---|---|---|---|
| NAME_peaks.xls | a table with information about called peaks | excel | |
| NAME_control_lambda.bdg | local biases estimated for each genomic location from the control sample | bedGraph | --bdg or -B |
| NAME_treat_pileup.bdg | pileup signals from treatment sample | bedGraph | --bdg or -B |
| NAME_peaks.broadPeak | similar to _peaks.narrowPeak file, except for missing the annotating peak summits | BED 6+3 | --broad |
| NAME_peaks.gappedPeak | contains the broad region and narrow peaks | BED 12+3 | --broad |
| NAME_peaks.narrowPeak | contains the peak locations, peak summit, p-value and q-value | BED 6+4 | if not set --broad |
| NAME_summits.bed | peak summits locations for every peak | BED | if not set --broad |
Example¶
This wrapper can be used in the following way:
rule callpeak:
    input:
        treatment="samples/a.bam",  # required: treatment sample(s)
        control="samples/b.bam"  # optional: control sample(s)
    output:
        # all output files must share the same basename and only differ by their extension
        # Usable extensions (and which tools they implicitly call) are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
        multiext("callpeak/basename",
                 "_peaks.xls",  ### required
                 ### optional output files
                 "_peaks.narrowPeak",
                 "_summits.bed"
                 )
    log:
        "logs/macs2/callpeak.log"
    params:
        "-f BAM -g hs --nomodel"
    wrapper:
        "v2.2.1/bio/macs2/callpeak"

rule callpeak_options:
    input:
        treatment="samples/a.bam",  # required: treatment sample(s)
        control="samples/b.bam"  # optional: control sample(s)
    output:
        # all output files must share the same basename and only differ by their extension
        # Usable extensions (and which tools they implicitly call) are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.
        multiext("callpeak_options/basename",
                 "_peaks.xls",  ### required
                 ### optional output files
                 # these output extensions internally set the --bdg or -B option:
                 "_treat_pileup.bdg",
                 "_control_lambda.bdg",
                 # these output extensions internally set the --broad option:
                 "_peaks.broadPeak",
                 "_peaks.gappedPeak"
                 )
    log:
        "logs/macs2/callpeak.log"
    params:
        "-f BAM -g hs --broad-cutoff 0.1 --nomodel"
    wrapper:
        "v2.2.1/bio/macs2/callpeak"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
macs2=2.2.7.1
Input/Output¶
Input:
- SAM, BAM, BED, ELAND, ELANDMULTI, ELANDEXPORT, BOWTIE, BAMPE or BEDPE files
Output:
- tabular file in excel format (.xls) AND
- different optional metrics in bedGraph or BED formats
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import os
import sys
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
in_contr = snakemake.input.get("control")
params = "{}".format(snakemake.params)
opt_input = ""
out_dir = ""
ext = "_peaks.xls"
out_file = [o for o in snakemake.output if o.endswith(ext)][0]
out_name = os.path.basename(out_file[: -len(ext)])
out_dir = os.path.dirname(out_file)
if in_contr:
    opt_input = "-c {contr}".format(contr=in_contr)
if out_dir:
    out_dir = "--outdir {dir}".format(dir=out_dir)

if any(out.endswith(("_peaks.narrowPeak", "_summits.bed")) for out in snakemake.output):
    if any(
        out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak"))
        for out in snakemake.output
    ):
        sys.exit(
            "Output files with _peaks.narrowPeak and/or _summits.bed extensions cannot be created together with _peaks.broadPeak and/or _peaks.gappedPeak extended output files.\n"
            "For usable extensions please see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/macs2/callpeak.html.\n"
        )
    else:
        if " --broad" in params:
            sys.exit(
                "If --broad option in params is given, the _peaks.narrowPeak and _summits.bed files will not be created. \n"
                "Remove --broad option from params if these files are needed.\n"
            )

if any(
    out.endswith(("_peaks.broadPeak", "_peaks.gappedPeak")) for out in snakemake.output
):
    if "--broad " not in params and not params.endswith("--broad"):
        params += " --broad "

if any(
    out.endswith(("_treat_pileup.bdg", "_control_lambda.bdg"))
    for out in snakemake.output
):
    if all(p not in params for p in ["--bdg", "-B"]):
        params += " --bdg "
else:
    if any(p in params for p in ["--bdg", "-B"]):
        sys.exit(
            "If --bdg or -B option in params is given, the _control_lambda.bdg and _treat_pileup.bdg extended files must be specified in output. \n"
        )

shell(
    "(macs2 callpeak "
    "-t {snakemake.input.treatment} "
    "{opt_input} "
    "{out_dir} "
    "-n {out_name} "
    "{params}) {log}"
)
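The extension-to-option mapping used by the wrapper can be summarized in a few lines. The sketch below is a simplified restatement for illustration, not the wrapper's code: it ignores options already present in params, and the names `NARROW`, `BROAD`, `BEDGRAPH`, and `implied_options` are hypothetical.

```python
# Extension groups taken from the wrapper code above.
NARROW = ("_peaks.narrowPeak", "_summits.bed")
BROAD = ("_peaks.broadPeak", "_peaks.gappedPeak")
BEDGRAPH = ("_treat_pileup.bdg", "_control_lambda.bdg")


def implied_options(outputs):
    """Return the MACS2 options implied by the requested output extensions."""
    wants_narrow = any(out.endswith(NARROW) for out in outputs)
    wants_broad = any(out.endswith(BROAD) for out in outputs)
    if wants_narrow and wants_broad:
        # Narrow outputs need --broad disabled, broad outputs need it enabled.
        raise ValueError("narrow and broad output extensions are incompatible")
    options = []
    if wants_broad:
        options.append("--broad")
    if any(out.endswith(BEDGRAPH) for out in outputs):
        options.append("--bdg")
    return options


narrow_opts = implied_options(
    ["callpeak/basename_peaks.xls", "callpeak/basename_peaks.narrowPeak"]
)
broad_opts = implied_options(
    [
        "callpeak_options/basename_peaks.xls",
        "callpeak_options/basename_treat_pileup.bdg",
        "callpeak_options/basename_peaks.broadPeak",
    ]
)
```

This is why the second example rule does not need to list --broad or --bdg in params: the requested extensions are enough.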
MANTA¶
Call structural variants with manta.
Example¶
This wrapper can be used in the following way:
rule manta:
    input:
        ref="human_g1k_v37_decoy.small.fasta",
        samples=["mapped/a.bam"],
        index=["mapped/a.bam.bai"],
        bed="test.bed.gz",  # optional
    output:
        vcf="results/out.bcf",
        idx="results/out.bcf.csi",
        cand_indel_vcf="results/small_indels.vcf.gz",
        cand_indel_idx="results/small_indels.vcf.gz.tbi",
        cand_sv_vcf="results/cand_sv.vcf.gz",
        cand_sv_idx="results/cand_sv.vcf.gz.tbi",
    params:
        extra_cfg="",  # optional
        extra_run="",  # optional
    log:
        "logs/manta.log",
    threads: 2
    resources:
        mem_mb=4096,
    wrapper:
        "v2.2.1/bio/manta"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra_cfg param allows for additional program arguments to configManta.py.
- The extra_run param allows for additional program arguments to runWorkflow.py.
- The runDir is created using Python's tempfile module, meaning that all intermediate files are deleted on job completion.
- For more information, see https://github.com/Illumina/manta
Software dependencies¶
manta=1.6.0
bcftools=1.17
Input/Output¶
Input:
- BAM/CRAM file(s)
- reference genome
- BED file (optional)
Output:
- SVs and indels scored and genotyped under a diploid model (diploidSV.vcf.gz).
- Unfiltered SV and indel candidates (candidateSV.vcf.gz).
- Subset of the previous file containing only simple insertion and deletion variants less than the minimum scored variant size (candidateSmallIndels.vcf.gz).
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
import math
from snakemake.shell import shell
from pathlib import Path
from tempfile import TemporaryDirectory
extra_cfg = snakemake.params.get("extra_cfg", "")
extra_run = snakemake.params.get("extra_run", "")
bed = snakemake.input.get("bed", "")
if bed:
    bed = f"--callRegions {bed}"
mem_gb = snakemake.resources.get("mem_gb", "")
if not mem_gb:
    # 20 Gb of mem by default
    mem_gb = math.ceil(snakemake.resources.get("mem_mb", 20480) / 1024)
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)

with TemporaryDirectory() as tempdir:
    tempdir = Path(tempdir)
    run_dir = tempdir / "runDir"
    bams = []

    # Symlink BAM/CRAM files to avoid problems with filenames
    for aln, idx in zip(snakemake.input.samples, snakemake.input.index):
        aln = Path(aln)
        idx = Path(idx)
        (tempdir / aln.name).symlink_to(aln.resolve())
        bams.append(tempdir / aln.name)
        if idx.name.endswith(".bam.bai") or idx.name.endswith(".cram.crai"):
            (tempdir / idx.name).symlink_to(idx.resolve())
        if idx.name.endswith(".bai"):
            (tempdir / idx.name).with_suffix(".bam.bai").symlink_to(idx.resolve())
        elif idx.name.endswith(".crai"):
            (tempdir / idx.name).with_suffix(".cram.crai").symlink_to(idx.resolve())
        else:
            raise ValueError(f"invalid index file name provided: {idx}")

    bams = list(map("--normalBam {}".format, bams))

    shell(
        # Configure Manta
        "configManta.py {extra_cfg} {bams} --referenceFasta {snakemake.input.ref} {bed} --runDir {run_dir} {log}; "
        # Run Manta
        "python2 {run_dir}/runWorkflow.py {extra_run} --jobs {snakemake.threads} --memGb {mem_gb} {log}; "
    )

    # Copy outputs into proper position.
    def infer_vcf_ext(vcf):
        if vcf.endswith(".vcf.gz"):
            return "z"
        elif vcf.endswith(".bcf"):
            return "b"
        else:
            raise ValueError(
                "invalid VCF extension. Only '.vcf.gz' and '.bcf' are supported."
            )

    def copy_vcf(origin_vcf, dest_vcf, dest_idx):
        if dest_vcf and dest_vcf != origin_vcf:
            dest_vcf_format = infer_vcf_ext(dest_vcf)
            shell(
                "bcftools view --threads {snakemake.threads} --output {dest_vcf:q} --output-type {dest_vcf_format} {origin_vcf:q} {log}"
            )
            origin_idx = str(origin_vcf) + ".tbi"
            if dest_idx and dest_idx != origin_idx:
                shell(
                    "bcftools index --threads {snakemake.threads} --output {dest_idx:q} {dest_vcf:q} {log}"
                )

    results_base = run_dir / "results" / "variants"

    # Copy main VCF output
    vcf_temp = results_base / "diploidSV.vcf.gz"
    vcf_final = snakemake.output.get("vcf")
    idx_final = snakemake.output.get("idx")
    copy_vcf(vcf_temp, vcf_final, idx_final)

    # Copy candidate small indels VCF
    cand_indel_vcf_temp = results_base / "candidateSmallIndels.vcf.gz"
    cand_indel_vcf_final = snakemake.output.get("cand_indel_vcf")
    cand_indel_idx_final = snakemake.output.get("cand_indel_idx")
    copy_vcf(cand_indel_vcf_temp, cand_indel_vcf_final, cand_indel_idx_final)

    # Copy candidates structural variants VCF
    cand_sv_vcf_temp = results_base / "candidateSV.vcf.gz"
    cand_sv_vcf_final = snakemake.output.get("cand_sv_vcf")
    cand_sv_idx_final = snakemake.output.get("cand_sv_idx")
    copy_vcf(cand_sv_vcf_temp, cand_sv_vcf_final, cand_sv_idx_final)
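The memory passed to --memGb is derived from the rule's resources: an explicit mem_gb wins, otherwise mem_mb is rounded up to whole gigabytes, with 20480 MB as the fallback. A minimal sketch of that calculation follows, using a plain dict in place of `snakemake.resources`; the function name `manta_mem_gb` is hypothetical.

```python
import math


def manta_mem_gb(resources):
    """Return the memory in GB as computed in the wrapper code above.

    `resources` is a plain dict standing in for snakemake.resources.
    """
    mem_gb = resources.get("mem_gb", "")
    if not mem_gb:
        # Fall back to mem_mb (default 20480 MB), rounded up to whole GB.
        mem_gb = math.ceil(resources.get("mem_mb", 20480) / 1024)
    return mem_gb


from_example_rule = manta_mem_gb({"mem_mb": 4096})
default_value = manta_mem_gb({})
rounded_up = manta_mem_gb({"mem_mb": 5000})
```

With the example rule (mem_mb=4096), Manta is therefore run with 4 GB.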
MAPDAMAGE2¶
mapDamage2 is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
Example¶
This wrapper can be used in the following way:
rule mapdamage2:
    input:
        ref="genome.fasta",
        bam="mapped/{sample}.bam",
    output:
        log="results/{sample}/Runtime_log.txt",  # output folder is inferred from this file, so it needs to be the same folder for all output files
        GtoA3p="results/{sample}/3pGtoA_freq.txt",
        CtoT5p="results/{sample}/5pCtoT_freq.txt",
        dnacomp="results/{sample}/dnacomp.txt",
        frag_misincorp="results/{sample}/Fragmisincorporation_plot.pdf",
        len="results/{sample}/Length_plot.pdf",
        lg_dist="results/{sample}/lgdistribution.txt",
        misincorp="results/{sample}/misincorporation.txt",
    #   rescaled_bam="results/{sample}.rescaled.bam", # uncomment if you want the rescaled BAM file
    params:
        extra="--no-stats",  # optional parameters for mapdamage2 (except -i, -r, -d, --rescale)
    log:
        "logs/{sample}/mapdamage2.log",
    threads: 1  # MapDamage2 is not threaded
    wrapper:
        "v2.2.1/bio/mapdamage2"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- For more information, see https://ginolhac.github.io/mapDamage/
Software dependencies¶
mapdamage2=2.2.1
python=3.10.8
pysam=0.20.0
Input/Output¶
Input:
- reference genome
- SAM/BAM/CRAM alignment
Output:
- Runtime_log.txt, log file with a summary of command lines used and timestamps.
- Fragmisincorporation_plot.pdf, a pdf file that displays both fragmentation and misincorporation patterns.
- Length_plot.pdf, a pdf file that displays the length distribution of singleton reads per strand, along with the cumulative frequencies of C->T at the 5’-end and G->A at the 3’-end, also per strand.
- misincorporation.txt, contains a table with occurrences for each type of mutations and relative positions from the reads ends.
- 5pCtoT_freq.txt, contains frequencies of Cytosine to Thymine mutations per position from the 5’-ends.
- 3pGtoA_freq.txt, contains frequencies of Guanine to Adenine mutations per position from the 3’-ends.
- dnacomp.txt, contains a table of the reference genome base composition per position, inside reads and adjacent regions.
- lgdistribution.txt, contains a table with read length distributions per strand.
- Stats_out_MCMC_hist.pdf, MCMC histogram for the damage parameters and log likelihood.
- Stats_out_MCMC_iter.csv, values for the damage parameters and log likelihood in each MCMC iteration.
- Stats_out_MCMC_trace.pdf, an MCMC trace plot for the damage parameters and log likelihood.
- Stats_out_MCMC_iter_summ_stat.csv, summary statistics for the damage parameters estimated posterior distributions.
- Stats_out_post_pred.pdf, empirical misincorporation frequency and posterior predictive intervals from the fitted model.
- Stats_out_MCMC_correct_prob.csv, position-specific probability that a C->T or G->A misincorporation is due to damage.
- dnacomp_genome.txt, contains the global reference genome base composition (computed by seqtk).
- Rescaled BAM file, where likely post-mortem damaged bases have downscaled quality scores.
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"
import os.path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
in_bam = snakemake.input.get("bam", "")
if in_bam:
    in_bam = "--input " + in_bam
output_folder = os.path.dirname(snakemake.output.get("log", ""))
if not output_folder:
    raise ValueError("mapDamage2 rule needs output 'log'.")
rescaled_bam = snakemake.output.get("rescaled_bam", "")
if rescaled_bam:
    rescaled_bam = "--rescale-out " + rescaled_bam
shell(
    "mapDamage "
    "{in_bam} "
    "--reference {snakemake.input.ref} "
    "--folder {output_folder} "
    "{rescaled_bam} "
    "{extra} "
    "{log}"
)
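Because the output folder is taken from the path of the log output, every other output (except the rescaled BAM, which is passed separately via --rescale-out) has to live in that same folder. The sketch below shows the inference and adds an explicit consistency check; the check is an illustration and is not performed by the wrapper, and the function name `output_folder` is hypothetical.

```python
import os.path


def output_folder(outputs, skip=("rescaled_bam",)):
    """Return the folder of outputs['log'] and verify the other outputs share it.

    `outputs` is a plain dict standing in for snakemake.output.
    """
    folder = os.path.dirname(outputs.get("log", ""))
    if not folder:
        raise ValueError("mapDamage2 rule needs output 'log'.")
    for name, path in outputs.items():
        # The rescaled BAM may be written elsewhere, so it is not checked.
        if name not in skip and os.path.dirname(path) != folder:
            raise ValueError(f"output '{name}' is not in {folder}")
    return folder


folder = output_folder(
    {
        "log": "results/sampleA/Runtime_log.txt",
        "misincorp": "results/sampleA/misincorporation.txt",
        "rescaled_bam": "results/sampleA.rescaled.bam",
    }
)
```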
MASHMAP¶
Compute local alignment boundaries between long DNA sequences with MashMap
URL: https://github.com/marbl/MashMap
Example¶
This wrapper can be used in the following way:
rule test_mashmap:
    input:
        ref="reference.fasta.gz",  # This can be a txt file with a path to a fasta-file per line
        query="read.fasta.gz",
    output:
        "mashmap.out",
    threads: 2
    params:
        extra="-s 1000 --pi 99",
    log:
        "logs/mashmap.log",
    wrapper:
        "v2.2.1/bio/mashmap"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- input.ref may be either a path to a fasta file or a text file containing a list of paths to several fasta files.
- input.query may be either a path to a fastq file or a text file containing a list of paths to several fastq files.
Software dependencies¶
mashmap=3.0.1
gsl=2.7
gzip=1.12
Input/Output¶
Input:
- ref: Path to reference file
- query: Path to query file (fasta, fastq)
Output:
- Path to the alignment file
Params¶
- extra: Optional parameters for MashMap
Authors¶
- Thibault Dayris
Code¶
#!/usr/bin/python3.8
# coding: utf-8
""" Snakemake wrapper for MashMap """
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
max_threads = snakemake.threads
# Handling input file types (either a fasta file, or a text file with a list of paths to fasta files)
ref = snakemake.input["ref"]
if ref.endswith(".txt"):
    ref = f"--refList {ref}"
elif ref.endswith(".gz"):
    ref = f"--ref <( gzip --decompress --stdout {ref} )"
    max_threads -= 1
else:
    ref = f"--ref {ref}"

if max_threads < 1:
    raise ValueError(
        "Reference fasta on-the-fly g-unzipping consumed one thread."
        f" Please increase the number of available threads by {1 - max_threads}."
    )

# Handling query file format (either a fastq file or a text file with a list of fastq files)
query = snakemake.input["query"]
if query.endswith(".txt"):
    query = f"--queryList {query}"
else:
    query = f"--query {query}"

shell(
    "mashmap "
    "{ref} "
    "{query} "
    "--output {snakemake.output} "
    "--threads {snakemake.threads} "
    "{extra} "
    "{log}"
)
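The reference handling above has three cases (list file, gzipped FASTA, plain FASTA), and the gzipped case reserves one thread for decompression, which is why the example rule requests two threads. A minimal sketch of that decision, assuming the same suffix tests as the wrapper; the function name `reference_argument` is hypothetical.

```python
def reference_argument(ref, threads):
    """Return (mashmap reference argument, threads left after decompression)."""
    if ref.endswith(".txt"):
        # A text file listing one FASTA path per line.
        return f"--refList {ref}", threads
    if ref.endswith(".gz"):
        # On-the-fly decompression consumes one thread.
        if threads - 1 < 1:
            raise ValueError("gzipped reference needs at least 2 threads")
        return f"--ref <( gzip --decompress --stdout {ref} )", threads - 1
    return f"--ref {ref}", threads


gz_arg, gz_threads = reference_argument("reference.fasta.gz", 2)
list_arg, list_threads = reference_argument("references.txt", 1)
```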
MERQURY¶
Evaluate genome assemblies with k-mers and more.
URL: https://github.com/marbl/merqury
Example¶
This wrapper can be used in the following way:
rule run_merqury_haploid:
    input:
        fasta="hap1.fasta",
        meryldb="meryldb",
    output:
        # meryldb output
        filt="results/haploid/meryldb.filt",
        hist="results/haploid/meryldb.hist",
        hist_ploidy="results/haploid/meryldb.hist.ploidy",
        # general output
        completeness_stats="results/haploid/out.completeness.stats",
        dist_only_hist="results/haploid/out.dist.only.hist",
        qv="results/haploid/out.qv",
        spectra_asm_hist="results/haploid/out.spectra_asm.hist",
        spectra_asm_ln_png="results/haploid/out.spectra_asm.png",
        # haplotype-specific output
        fas1_only_bed="results/haploid/hap1.bed",
        fas1_only_wig="results/haploid/hap1.wig",
        fas1_only_hist="results/haploid/hap1.hist",
        fas1_qv="results/haploid/hap1.qv",
        fas1_spectra_cn_hist="results/haploid/hap1.spectra.hist",
        fas1_spectra_cn_ln_png="results/haploid/hap1.spectra.png",
    log:
        std="logs/haploid.log",
        spectra_cn="logs/haploid.spectra-cn.log",
    threads: 1
    wrapper:
        "v2.2.1/bio/merqury"

rule run_merqury_diploid:
    input:
        fasta=["hap1.fasta", "hap2.fasta"],
        meryldb="meryldb",
    output:
        # meryldb output
        filt="results/diploid/meryldb.filt",
        hist="results/diploid/meryldb.hist",
        hist_ploidy="results/diploid/meryldb.hist.ploidy",
        # general output
        completeness_stats="results/diploid/out.completeness.stats",
        dist_only_hist="results/diploid/out.dist.only.hist",
        only_hist="results/diploid/out.only.hist",
        qv="results/diploid/out.qv",
        spectra_asm_hist="results/diploid/out.spectra_asm.hist",
        spectra_asm_ln_png="results/diploid/out.spectra_asm.png",
        spectra_cn_hist="results/diploid/out.spectra_cn.hist",
        spectra_cn_ln_png="results/diploid/out.spectra_cn.png",
        # haplotype-specific output
        fas1_only_bed="results/diploid/hap1.bed",
        fas1_only_wig="results/diploid/hap1.wig",
        fas1_only_hist="results/diploid/hap1.hist",
        fas1_qv="results/diploid/hap1.qv",
        fas1_spectra_cn_hist="results/diploid/hap1.spectra.hist",
        fas1_spectra_cn_ln_png="results/diploid/hap1.spectra.png",
        fas2_only_bed="results/diploid/hap2.bed",
        fas2_only_wig="results/diploid/hap2.wig",
        fas2_only_hist="results/diploid/hap2.hist",
        fas2_qv="results/diploid/hap2.qv",
        fas2_spectra_cn_hist="results/diploid/hap2.spectra.hist",
        fas2_spectra_cn_ln_png="results/diploid/hap2.spectra.png",
    log:
        std="logs/diploid.log",
        spectra_cn="logs/diploid.spectra-cn.log",
    threads: 1
    wrapper:
        "v2.2.1/bio/merqury"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Merqury does not return a non-zero exit code when it fails, so always include (at least) one PNG file in your rule’s output.
- Merqury does not allow for extra params.
Software dependencies¶
merqury=1.3
Input/Output¶
Input:
- one or two assembly fasta file(s)
- meryl database
Output:
- annotation quality files
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import os
import tempfile
from pathlib import Path
from snakemake.shell import shell
meryldb_parents = snakemake.input.get("meryldb_parents", "")
out_prefix = "out"
log_tmp = "__LOG__.tmp"
def save_output(out_prefix, ext, cwd, file):
if file is None:
return 0
src = f"{out_prefix}{ext}"
dest = cwd / file
shell("cat {src} > {dest}")
with tempfile.TemporaryDirectory() as tmpdir:
cwd = Path.cwd()
# Create symlinks for input files
for input in snakemake.input:
src = Path(input)
dst = Path(tmpdir) / input
src = Path(os.path.relpath(src.resolve(), dst.resolve().parent))
dst.symlink_to(src)
os.chdir(tmpdir)
shell(
"merqury.sh"
" {snakemake.input.meryldb}"
" {meryldb_parents}"
" {snakemake.input.fasta}"
" {out_prefix}"
" > {log_tmp} 2>&1"
)
### Saving LOG files
save_output(log_tmp, "", cwd, snakemake.log.get("std"))
for type in ["spectra_cn"]:
save_output(
f"logs/{out_prefix}",
"." + type.replace("_", "-") + ".log",
cwd,
snakemake.log.get(type),
)
### Saving OUTPUT files
# EXT: replace all "_" with "."
meryldb = Path(snakemake.input.meryldb.rstrip("/")).stem
for type in ["filt", "hist", "hist_ploidy"]:
save_output(
meryldb, "." + type.replace("_", "."), cwd, snakemake.output.get(type)
)
# EXT: replace last "_" with "."
for type in [
"completeness_stats",
"dist_only_hist",
"only_hist",
"qv",
"hapmers_count",
"hapmers_blob_png",
]:
save_output(
out_prefix,
"." + type[::-1].replace("_", ".", 1)[::-1],
cwd,
snakemake.output.get(type),
)
# EXT: replace first "_" with "-", and remaining with "."
for type in [
"spectra_asm_hist",
"spectra_asm_ln_png",
"spectra_asm_fl_png",
"spectra_asm_st_png",
"spectra_cn_hist",
"spectra_cn_ln_png",
"spectra_cn_fl_png",
"spectra_cn_st_png",
]:
save_output(
out_prefix,
"." + type.replace("_", ".").replace(".", "-", 1),
cwd,
snakemake.output.get(type),
)
input_fas = snakemake.input.fasta
if isinstance(input_fas, str):
input_fas = [input_fas]
for fas in range(1, len(input_fas) + 1):
prefix = Path(input_fas[fas - 1]).name.removesuffix(".fasta")
# EXT: remove everything until first "_" and replace last "_" with "."
for type in [f"fas{fas}_only_bed", f"fas{fas}_only_wig"]:
save_output(
prefix,
type[type.find("_") :][::-1].replace("_", ".", 1)[::-1],
cwd,
snakemake.output.get(type),
)
# EXT: remove everything until first "_" and replace all "_" with "."
for type in [f"fas{fas}_only_hist", f"fas{fas}_qv"]:
save_output(
f"{out_prefix}.{prefix}",
type[type.find("_") :].replace("_", "."),
cwd,
snakemake.output.get(type),
)
# EXT: remove everything until first "_", replace first "_" with "-", and remaining with "."
for type in [
f"fas{fas}_spectra_cn_hist",
f"fas{fas}_spectra_cn_ln_png",
f"fas{fas}_spectra_cn_fl_png",
f"fas{fas}_spectra_cn_st_png",
]:
save_output(
f"{out_prefix}.{prefix}",
"." + type[type.find("_") + 1 :].replace("_", ".").replace(".", "-", 1),
cwd,
snakemake.output.get(type),
)
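The suffix rules described in the "EXT" comments above can be tried in isolation. The following is a minimal sketch; the helper names `replace_last` and `dash_then_dots` are hypothetical and not part of the wrapper.

```python
def replace_last(name: str) -> str:
    """Replace only the last "_" with "." and prepend the leading dot."""
    return "." + name[::-1].replace("_", ".", 1)[::-1]


def dash_then_dots(name: str) -> str:
    """Replace the first "_" with "-" and every remaining "_" with "."."""
    return "." + name.replace("_", ".").replace(".", "-", 1)


print(replace_last("completeness_stats"))   # .completeness.stats
print(replace_last("hapmers_blob_png"))     # .hapmers_blob.png
print(dash_then_dots("spectra_cn_ln_png"))  # .spectra-cn.ln.png
```

Reversing the string before calling `replace(..., 1)` is what restricts the substitution to the last underscore.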
MERYL¶
For meryl, the following wrappers are available:
MERYL COUNT¶
A genomic k-mer counter (and sequence utility) with nice features.
URL: https://github.com/marbl/meryl
Example¶
This wrapper can be used in the following way:
rule meryl_count:
input:
fasta="{genome}.fasta",
output:
directory("{genome}/"),
log:
"logs/meryl_count/{genome}.log",
params:
command="count",
extra="k=32",
threads: 2
resources:
mem_mb=2048,
wrapper:
"v2.2.1/bio/meryl/count"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The command param specifies how the k-mers are counted: count (canonical k-mers) [default], count-forward (only forward k-mers), or count-reverse (only reverse k-mers).
- The extra param allows for additional program arguments (the k-mer size k is mandatory).
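As an illustration of the two notes above, the sketch below assembles a command line in the same shape as the wrapper's shell call. The function `build_meryl_count` is hypothetical, and the MiB to GiB conversion is an assumption (plain floor division); the real wrapper delegates that step to `get_mem` from snakemake-wrapper-utils.

```python
VALID_COMMANDS = ("count", "count-forward", "count-reverse")


def build_meryl_count(command, threads, mem_mb, extra, fasta, outdir):
    """Assemble a meryl count command line (illustrative only)."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"invalid command specified: {command}")
    if "k=" not in extra:
        # The notes state that the k-mer size is mandatory.
        raise ValueError("extra must set the k-mer size, e.g. 'k=32'")
    mem_gb = mem_mb // 1024  # assumption: simple MiB -> GiB floor division
    return (
        f"meryl {command} threads={threads} memory={mem_gb} "
        f"{extra} {fasta} output {outdir}"
    )


print(build_meryl_count("count", 2, 2048, "k=32", "genome.fasta", "genome/"))
```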
Software dependencies¶
meryl=2013
snakemake-wrapper-utils=0.5.2
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
command = snakemake.params.get("command", "count")
assert command in [
"count",
"count-forward",
"count-reverse",
], "invalid command specified."
mem_gb = get_mem(snakemake, out_unit="GiB")
shell(
"meryl"
" {command}"
" threads={snakemake.threads}"
" memory={mem_gb}"
" {extra}"
" {snakemake.input}"
" output {snakemake.output}"
" {log}"
)
MERYL SETS¶
A genomic k-mer counter (and sequence utility) with nice features.
URL: https://github.com/marbl/meryl
Example¶
This wrapper can be used in the following way:
rule meryl_union:
input:
"{genome}",
"{genome}",
output:
directory("{genome}_union/"),
log:
"logs/{genome}.union.log",
params:
command="union-sum",
wrapper:
"v2.2.1/bio/meryl/sets"
rule meryl_intersect:
input:
"{genome}",
"{genome}",
output:
directory("{genome}_intersect/"),
log:
"logs/{genome}.intersect.log",
params:
command="intersect-max",
wrapper:
"v2.2.1/bio/meryl/sets"
rule meryl_subtract:
input:
"{genome}",
"{genome}",
output:
directory("{genome}_subtract/"),
log:
"logs/{genome}.subtract.log",
params:
command="subtract",
wrapper:
"v2.2.1/bio/meryl/sets"
rule meryl_difference:
input:
"{genome}",
"{genome}",
output:
directory("{genome}_difference/"),
log:
"logs/{genome}.difference.log",
params:
command="difference",
wrapper:
"v2.2.1/bio/meryl/sets"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The command param specifies how the k-mer sets are combined: union (number of inputs) [default], union-min (union with minimum count), union-max (union with maximum count), union-sum (union with sum of the counts), intersect (intersect with counts in the first input), intersect-min (intersect with minimum count), intersect-max (intersect with maximum count), intersect-sum (intersect with sum of counts), subtract (counts from first input, subtracting counts from the other inputs), difference (counts from first input, but none of the other inputs), or symmetric-difference (exactly one input).
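The four operations used in the example rules can be pictured with a toy model on plain dictionaries that map a k-mer to its count. This is only a sketch for two inputs, written from the descriptions in the note above; meryl operates on databases, and its handling of edge cases (for example, counts that drop to zero in subtract) is assumed here rather than confirmed.

```python
def union_sum(x, y):
    """Every k-mer found in either input, with counts added."""
    return {k: x.get(k, 0) + y.get(k, 0) for k in x.keys() | y.keys()}


def intersect_max(x, y):
    """K-mers found in both inputs, keeping the larger count."""
    return {k: max(x[k], y[k]) for k in x.keys() & y.keys()}


def subtract(x, y):
    """Counts from the first input minus counts from the second.

    Assumption: k-mers whose count drops to zero or below are discarded.
    """
    return {k: x[k] - y.get(k, 0) for k in x if x[k] - y.get(k, 0) > 0}


def difference(x, y):
    """K-mers from the first input that do not occur in the second."""
    return {k: x[k] for k in x if k not in y}


a = {"ACG": 3, "CGT": 1, "GTA": 2}
b = {"ACG": 1, "GTA": 5, "TAC": 4}
print(union_sum(a, b))
print(intersect_max(a, b))
print(subtract(a, b))
print(difference(a, b))
```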
Software dependencies¶
meryl=2013
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
command = snakemake.params.get("command", "union")
assert command in [
"union",
"union-min",
"union-max",
"union-sum",
"intersect",
"intersect-min",
"intersect-max",
"intersect-sum",
"subtract",
"difference",
"symmetric-difference",
], "invalid command specified."
shell("meryl {command} {snakemake.input} output {snakemake.output} {log}")
MERYL STATS¶
A genomic k-mer counter (and sequence utility) with nice features.
URL: https://github.com/marbl/meryl
Example¶
This wrapper can be used in the following way:
rule meryl_stats:
input:
"{genome}",
output:
"{genome}.stats",
log:
"logs/meryl_stats/{genome}.log",
params:
command="statistics",
wrapper:
"v2.2.1/bio/meryl/stats"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The command param specifies which stats to print: statistics (display total, unique, and distinct k-mers) [default], histogram (display k-mer frequency), or print (display k-mers).
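To make the terms in the note concrete, here is a small sketch that computes comparable numbers from a dictionary of k-mer counts. It assumes the usual meanings (unique: seen exactly once; distinct: number of different k-mers; total: sum of all counts) and does not reproduce meryl's output format.

```python
from collections import Counter


def kmer_statistics(counts):
    """Summarise a {kmer: count} mapping."""
    return {
        "unique": sum(1 for c in counts.values() if c == 1),
        "distinct": len(counts),
        "total": sum(counts.values()),
    }


def kmer_histogram(counts):
    """Map each count value to the number of k-mers having that count."""
    return dict(sorted(Counter(counts.values()).items()))


counts = {"ACG": 3, "CGT": 1, "GTA": 1}
print(kmer_statistics(counts))
print(kmer_histogram(counts))
```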
Software dependencies¶
meryl=2013
Input/Output¶
Input:
- meryl database(s)
Output:
- meryl stats (either the kmers, statistics, or histogram)
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
command = snakemake.params.get("command", "statistics")
assert command in [
"statistics",
"histogram",
"print",
], "invalid command specified."
shell("meryl {command} {snakemake.input} > {snakemake.output} {log}")
MICROPHASER¶
For microphaser, the following wrappers are available:
MICROPHASER BUILD_REFERENCE¶
Create a reference of all normal peptides in a sample
Example¶
This wrapper can be used in the following way:
rule microphaser_build:
input:
# all normal peptides from the complete proteome as nucleotide sequences
ref_peptides="germline/peptides.fasta",
output:
# a binary of the normal peptides amino acid sequences
bin="out/peptides.bin",
# the amino acid sequences in FASTA format
peptides="out/peptides.fasta",
log:
"logs/microphaser/build_reference.log"
params:
extra="--peptide-length 9", # optional, desired peptide length in amino acids.
wrapper:
"v2.2.1/bio/microphaser/build_reference"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/koesterlab/microphaser.
Software dependencies¶
microphaser=0.8.0
Input/Output¶
Input:
- peptide reference (nucleotide sequences from microphaser germline)
Output:
- peptide reference in amino acid FASTA format
- binary peptide reference for filtering
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"microphaser build_reference "
"{extra} "
"--reference {snakemake.input.ref_peptides} "
"--output {snakemake.output.bin} "
"> {snakemake.output.peptides} "
"{log}"
)
MICROPHASER FILTER¶
Translate and filter neopeptides from microphaser output
Example¶
This wrapper can be used in the following way:
rule microphaser_filter:
input:
# the info file of the tumor sample to filter
tsv="somatic/info.tsv",
# All normal peptides to filter against
ref_peptides="germline/peptides.bin",
output:
# the filtered neopeptides
tumor="out/peptides.mt.fasta",
# the normal peptides matching the filtered neopeptides
normal="out/peptides.wt.fasta",
# the info data of the filtered neopeptides
tsv="out/peptides.info.tsv",
# the info data of the removed neopeptides
removed_tsv="out/peptides.removed.tsv",
# the removed neopeptides
removed_fasta="out/peptides.removed.fasta",
log:
"logs/microphaser/filter.log",
params:
extra="--peptide-length 9", # optional, desired peptide length in amino acids.
wrapper:
"v2.2.1/bio/microphaser/filter"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/koesterlab/microphaser.
Software dependencies¶
microphaser=0.8.0
Input/Output¶
Input:
- neopeptides fasta (nucleotide sequences from microphaser somatic)
- information tsv (from microphaser somatic)
- sample-specific normal/wildtype peptides (binary created using microphaser build)
Output:
- filtered neopeptides (removed self-identical peptides) in amino acid FASTA format
- corresponding normal peptides in amino acid FASTA format
- filtered information tsv
- self-identical peptides removed from the neopeptide set (tsv)
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"microphaser filter "
"{extra} "
"--tsv {snakemake.input.tsv} "
"--reference {snakemake.input.ref_peptides} "
"--normal-output {snakemake.output.normal} "
"--tsv-output {snakemake.output.tsv} "
"--similar-removed {snakemake.output.removed_tsv} "
"--removed-peptides {snakemake.output.removed_fasta} "
" > {snakemake.output.tumor} "
"{log}"
)
MICROPHASER NORMAL¶
Predict sample-specific normal peptides with integrated germline variants from NGS (whole exome/genome) data
Example¶
This wrapper can be used in the following way:
rule microphaser_normal:
input:
bam="mapped/{sample}.sorted.bam",
index="mapped/{sample}.sorted.bam.bai",
ref="genome.fasta",
annotation="genome.gtf",
variants="calls/{sample}.bcf",
output:
# all peptides from the healthy proteome
peptides="out/{sample}.fasta",
tsv="out/{sample}.tsv",
log:
"logs/microphaser/normal/{sample}.log",
params:
extra="--window-len 9", # optional, desired peptide length in nucleotide bases, e.g. 27 (9 AA) for MHC-I ligands.
wrapper:
"v2.2.1/bio/microphaser/normal"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/koesterlab/microphaser.
Software dependencies¶
microphaser=0.8.0
Input/Output¶
Input:
- bam file
- bcf file
- fasta reference
- gtf annotation file
Output:
- sample-specific peptide fasta (nucleotide sequences)
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"microphaser normal {snakemake.input.bam} "
"{extra} "
"--ref {snakemake.input.ref} "
"--variants {snakemake.input.variants} "
"--tsv {snakemake.output.tsv} "
"> {snakemake.output.peptides} "
"< {snakemake.input.annotation} "
"{log}"
)
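The shell call above feeds the annotation through standard input and captures the peptides from standard output. The same redirection pattern can be sketched with the standard library alone. The helper `run_with_redirects` is hypothetical, and the child process is a stand-in that upper-cases its input, not microphaser.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_redirects(cmd, stdin_path, stdout_path, log_path):
    """Run cmd with stdin, stdout and stderr bound to files (`< in > out 2> log`)."""
    with open(stdin_path, "rb") as fin:
        with open(stdout_path, "wb") as fout, open(log_path, "wb") as flog:
            subprocess.run(cmd, stdin=fin, stdout=fout, stderr=flog, check=True)


# Stand-in child process: upper-cases whatever arrives on stdin.
child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "in.txt").write_text("chr1 gene\n")
    run_with_redirects(child, tmp / "in.txt", tmp / "out.txt", tmp / "err.log")
    print((tmp / "out.txt").read_text())
```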
MICROPHASER SOMATIC¶
Predict mutated neopeptides and their wildtype counterparts from NGS (whole exome/genome) data
Example¶
This wrapper can be used in the following way:
rule microphaser_somatic:
input:
bam="mapped/{sample}.sorted.bam",
index="mapped/{sample}.sorted.bam.bai",
ref="genome.fasta",
annotation="genome.gtf",
variants="calls/{sample}.bcf",
output:
# sequences of neopeptides arising from somatic variants
tumor="out/{sample}.mt.fasta",
# sequences of the normal, unmutated counterpart to every neopeptide
normal="out/{sample}.wt.fasta",
# info data of the somatic neopeptides
tsv="out/{sample}.info.tsv",
log:
"logs/microphaser/somatic/{sample}.log",
params:
extra="--window-len 9", # optional, desired peptide length in nucleotide bases, e.g. 27 (9 AA) for MHC-I ligands.
wrapper:
"v2.2.1/bio/microphaser/somatic"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/koesterlab/microphaser.
Software dependencies¶
microphaser=0.8.0
Input/Output¶
Input:
- bam file
- bcf file
- fasta reference
- gtf annotation file
Output:
- mutated peptide fasta (nucleotide sequences)
- wildtype peptide fasta (nucleotide sequences)
- information tsv
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"microphaser somatic {snakemake.input.bam} "
"{extra} "
"--ref {snakemake.input.ref} "
"--variants {snakemake.input.variants} "
"--normal-output {snakemake.output.normal} "
"--tsv {snakemake.output.tsv} "
"> {snakemake.output.tumor} "
"< {snakemake.input.annotation} "
"{log}"
)
MINIMAP2¶
For minimap2, the following wrappers are available:
MINIMAP2¶
A versatile pairwise aligner for genomic and spliced nucleotide sequences.
URL: https://lh3.github.io/minimap2
Example¶
This wrapper can be used in the following way:
rule minimap2_paf:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query=["query/reads1.fasta", "query/reads2.fasta"],
output:
"aligned/{input1}_aln.paf",
log:
"logs/minimap2/{input1}.log",
params:
extra="-x map-pb", # optional
sorting="coordinate", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"v2.2.1/bio/minimap2/aligner"
rule minimap2_sam:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query=["query/reads1.fasta", "query/reads2.fasta"],
output:
"aligned/{input1}_aln.sam",
log:
"logs/minimap2/{input1}.log",
params:
extra="-x map-pb", # optional
sorting="none", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"v2.2.1/bio/minimap2/aligner"
rule minimap2_sam_sorted:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query=["query/reads1.fasta", "query/reads2.fasta"],
output:
"aligned/{input1}_aln.sorted.sam",
log:
"logs/minimap2/{input1}.log",
params:
extra="-x map-pb", # optional
sorting="queryname", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"v2.2.1/bio/minimap2/aligner"
rule minimap2_bam_sorted:
input:
target="target/{input1}.mmi", # can be either genome index or genome fasta
query=["query/reads1.fasta", "query/reads2.fasta"],
output:
"aligned/{input1}_aln.sorted.bam",
log:
"logs/minimap2/{input1}.log",
params:
extra="-x map-pb", # optional
sorting="coordinate", # optional: Enable sorting. Possible values: 'none', 'queryname' or 'coordinate'
sort_extra="", # optional: extra arguments for samtools/picard
threads: 3
wrapper:
"v2.2.1/bio/minimap2/aligner"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional arguments for minimap2.
- The sorting param enables sorting (if the output is not PAF), and can be either 'none', 'queryname' or 'coordinate'.
- The sort_extra param allows for extra arguments for samtools.
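How the output format and the sorting param interact can be summarised in a small pure function. This is a sketch that mirrors the branching in the wrapper code shown in the Code section; `choose_pipe_cmd` is a hypothetical name, and the output format is passed in directly instead of being inferred from the file name.

```python
def choose_pipe_cmd(out_ext, sorting="none", sort_extra="", samtools_opts=""):
    """Return (extra_flag, pipe_cmd) for a given output format and sorting mode."""
    if out_ext == "PAF":
        # PAF is written by minimap2 directly; no SAM flag, no pipe.
        return "", ""
    extra_flag = "-a"  # ask minimap2 for SAM output
    if sorting == "none":
        # Plain SAM needs no conversion; anything else goes through samtools view.
        pipe = "" if out_ext == "SAM" else f"| samtools view -h {samtools_opts}".rstrip()
    elif sorting in ("coordinate", "queryname"):
        if sorting == "queryname":
            sort_extra = f"{sort_extra} -n".strip()
        parts = ("| samtools sort", sort_extra, samtools_opts)
        pipe = " ".join(part for part in parts if part)
    else:
        raise ValueError(f"Unexpected value for params.sorting: {sorting}")
    return extra_flag, pipe


print(choose_pipe_cmd("BAM", "coordinate", samtools_opts="-O BAM"))
```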
Software dependencies¶
minimap2=2.26
samtools
snakemake-wrapper-utils=0.6.1
Authors¶
- Tom Poorten
- Michael Hall
- Filipe G. Vieira
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import infer_out_format
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_output=False, param_name="sort_extra"
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
sort = snakemake.params.get("sorting", "none")
sort_extra = snakemake.params.get("sort_extra", "")
out_ext = infer_out_format(snakemake.output[0])
pipe_cmd = ""
if out_ext != "PAF":
    # Add option for SAM output
    extra += " -a"
    # Determine which pipe command to use for converting to bam or sorting.
    if sort == "none":
        if out_ext != "SAM":
            # Simply convert to output format using samtools view.
            pipe_cmd = f"| samtools view -h {samtools_opts}"
    elif sort in ["coordinate", "queryname"]:
        # Add name flag if needed.
        if sort == "queryname":
            sort_extra += " -n"
        # Sort alignments.
        pipe_cmd = f"| samtools sort {sort_extra} {samtools_opts}"
    else:
        raise ValueError(f"Unexpected value for params.sorting: {sort}")
shell(
"(minimap2"
" -t {snakemake.threads}"
" {extra} "
" {snakemake.input.target}"
" {snakemake.input.query}"
" {pipe_cmd}"
" > {snakemake.output[0]}"
") {log}"
)
MINIMAP2 INDEX¶
creates a minimap2 index
URL: https://lh3.github.io/minimap2
Example¶
This wrapper can be used in the following way:
rule minimap2_index:
input:
target="target/{input1}.fasta"
output:
"{input1}.mmi"
log:
"logs/minimap2_index/{input1}.log"
params:
extra="" # optional additional args
threads: 3
wrapper:
"v2.2.1/bio/minimap2/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
minimap2=2.26
Authors¶
- Tom Poorten
Code¶
__author__ = "Tom Poorten"
__copyright__ = "Copyright 2017, Tom Poorten"
__email__ = "tom.poorten@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(minimap2 -t {snakemake.threads} {extra} "
"-d {snakemake.output[0]} {snakemake.input.target}) {log}"
)
MLST¶
Scan contig files against traditional PubMLST typing schemes
Example¶
This wrapper can be used in the following way:
rule run_mlst:
input:
#Input assembly
assembly="{sample}.fasta",
output:
#Tab delimited mlst designation
mlst="{sample}_mlst.txt",
params:
#extra parameters should be space delimited
# SYNOPSIS
# Automatic MLST calling from assembled contigs
# USAGE
# % mlst --list # list known schemes
# % mlst [options] <contigs.{fasta,gbk,embl}[.gz]> # auto-detect scheme
# % mlst --scheme <scheme> <contigs.{fasta,gbk,embl}[.gz]> # force a scheme
# GENERAL
# --help This help
# --version Print version and exit(default ON)
# --check Just check dependencies and exit (default OFF)
# --quiet Quiet - no stderr output (default OFF)
# --threads [N] Number of BLAST threads (suggest GNU Parallel instead) (default '1')
# --debug Verbose debug output to stderr (default OFF)
# SCHEME
# --scheme [X] Don't autodetect, force this scheme on all inputs (default '')
# --list List available MLST scheme names (default OFF)
# --longlist List alleles for all MLST schemes (default OFF)
# --exclude [X] Ignore these schemes (comma sep. list) (default 'ecoli_2,abaumannii')
# OUTPUT
# --csv Output CSV instead of TSV (default OFF)
# --json [X] Also write results to this file in JSON format (default '')
# --label [X] Replace FILE with this name instead (default '')
# --nopath Strip filename paths from FILE column (default OFF)
# --novel [X] Save novel alleles to this FASTA file (default '')
# --legacy Use old legacy output with allele header row (requires --scheme) (default OFF)
# SCORING
# --minid [n.n] DNA %identity of full allele to consider 'similar' [~] (default '95')
# --mincov [n.n] DNA %cov to report partial allele at all [?] (default '10')
# --minscore [n.n] Minimum score out of 100 to match a scheme (when auto --scheme) (default '50')
# PATHS
# --blastdb [X] BLAST database
# --datadir [X] PubMLST data
# HOMEPAGE
# https://github.com/tseemann/mlst - Torsten Seemann
extra="--nopath",
log:
"logs/{sample}.mlst.log",
threads: 1
wrapper:
"v2.2.1/bio/mlst"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- For more information, see https://github.com/tseemann/mlst
Software dependencies¶
mlst=2.23.0
Input/Output¶
Input:
- Genomic assembly (fasta format)
Output:
- A tab-separated line containing the filename, matching PubMLST scheme name, ST (sequence type) and the allele IDs. Other output formats are also available (e.g. CSV, JSON).
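A downstream rule will often need to read this line back. The sketch below splits it into the fields named above; `parse_mlst_line` is a hypothetical helper, the example values are made up, and the ST is kept as a string on the assumption that it is not always numeric.

```python
def parse_mlst_line(line):
    """Split one mlst TSV line into filename, scheme, ST and allele calls."""
    filename, scheme, st, *alleles = line.rstrip("\n").split("\t")
    return {"file": filename, "scheme": scheme, "st": st, "alleles": alleles}


# Made-up example line in the documented column order.
example = "sample.fasta\tsaureus\t8\tarcC(3)\taroE(3)\tglpF(1)\n"
print(parse_mlst_line(example))
```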
Authors¶
- Torsten Seemann (mlst tool) - https://github.com/tseemann/mlst
- Max Cummins (Snakemake wrapper [unaffiliated with Torsten Seemann])
Code¶
__author__ = "Max Cummins"
__copyright__ = "Copyright 2021, Max Cummins"
__email__ = "max.l.cummins@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"mlst"
" {snakemake.params.extra}"
" {snakemake.input.assembly}"
" > {snakemake.output.mlst}"
" {log}"
)
MOSDEPTH¶
fast BAM/CRAM depth calculation
Example¶
This wrapper can be used in the following way:
rule mosdepth:
input:
bam="aligned/{dataset}.bam",
bai="aligned/{dataset}.bam.bai",
output:
"mosdepth/{dataset}.mosdepth.global.dist.txt",
"mosdepth/{dataset}.per-base.bed.gz", # produced unless --no-per-base specified
summary="mosdepth/{dataset}.mosdepth.summary.txt", # this named output is required for prefix parsing
log:
"logs/mosdepth/{dataset}.log",
params:
extra="--fast-mode", # optional
# additional decompression threads through `--threads`
threads: 4 # This value - 1 will be sent to `--threads`
wrapper:
"v2.2.1/bio/mosdepth"
rule mosdepth_bed:
input:
bam="aligned/{dataset}.bam",
bai="aligned/{dataset}.bam.bai",
bed="test.bed",
output:
"mosdepth_bed/{dataset}.mosdepth.global.dist.txt",
"mosdepth_bed/{dataset}.mosdepth.region.dist.txt",
"mosdepth_bed/{dataset}.regions.bed.gz",
summary="mosdepth_bed/{dataset}.mosdepth.summary.txt", # this named output is required for prefix parsing
log:
"logs/mosdepth_bed/{dataset}.log",
params:
extra="--no-per-base --use-median", # optional
# additional decompression threads through `--threads`
threads: 4 # This value - 1 will be sent to `--threads`
wrapper:
"v2.2.1/bio/mosdepth"
rule mosdepth_by_threshold:
input:
bam="aligned/{dataset}.bam",
bai="aligned/{dataset}.bam.bai",
output:
"mosdepth_by_threshold/{dataset}.mosdepth.global.dist.txt",
"mosdepth_by_threshold/{dataset}.mosdepth.region.dist.txt",
"mosdepth_by_threshold/{dataset}.regions.bed.gz",
"mosdepth_by_threshold/{dataset}.thresholds.bed.gz", # needs to go with params.thresholds spec
summary="mosdepth_by_threshold/{dataset}.mosdepth.summary.txt", # this named output is required for prefix parsing
log:
"logs/mosdepth_by/{dataset}.log",
params:
by="500", # optional, window size, specifies --by for mosdepth.region.dist.txt and regions.bed.gz
thresholds="1,5,10,30", # optional, specifies --thresholds for thresholds.bed.gz
# additional decompression threads through `--threads`
threads: 4 # This value - 1 will be sent to `--threads`
wrapper:
"v2.2.1/bio/mosdepth"
rule mosdepth_quantize_precision:
input:
bam="aligned/{dataset}.bam",
bai="aligned/{dataset}.bam.bai",
output:
"mosdepth_quantize_precision/{dataset}.mosdepth.global.dist.txt",
"mosdepth_quantize_precision/{dataset}.quantized.bed.gz", # optional, needs to go with params.quantize spec
summary="mosdepth_quantize_precision/{dataset}.mosdepth.summary.txt", # this named output is required for prefix parsing
log:
"logs/mosdepth_quantize_precision/{dataset}.log",
params:
extra="--no-per-base", # optional
quantize="0:1:5:150", # optional, specifies --quantize for quantized.bed.gz
precision="5", # optional, set decimals of precision
# additional decompression threads through `--threads`
threads: 4 # This value - 1 will be sent to `--threads`
wrapper:
"v2.2.1/bio/mosdepth"
rule mosdepth_cram:
input:
bam="aligned/{dataset}.cram",
bai="aligned/{dataset}.cram.crai",
bed="test.bed",
fasta="genome.fasta",
output:
"mosdepth_cram/{dataset}.mosdepth.global.dist.txt",
"mosdepth_cram/{dataset}.mosdepth.region.dist.txt",
"mosdepth_cram/{dataset}.regions.bed.gz",
summary="mosdepth_cram/{dataset}.mosdepth.summary.txt", # this named output is required for prefix parsing
log:
"logs/mosdepth_cram/{dataset}.log",
params:
extra="--no-per-base --use-median", # optional
# additional decompression threads through `--threads`
threads: 4 # This value - 1 will be sent to `--threads`
wrapper:
"v2.2.1/bio/mosdepth"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The by param specifies an (integer) window size (incompatible with an input BED file).
- The thresholds param writes, for each interval in --by, the number of bases covered by at least each threshold. Specify multiple integer values separated by ','.
- The precision param allows to specify output floating point precision.
- The extra param allows for additional program arguments.
- For more information, see https://github.com/brentp/mosdepth
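Two details from the example rules are easy to miss: the rule's thread count minus one is passed to `--threads`, and the output prefix is derived from the named `summary` output. A minimal sketch of both steps follows; the function names are hypothetical. Unlike the wrapper, which uses `str.replace`, this version checks the suffix explicitly.

```python
def mosdepth_threads_arg(rule_threads):
    """One thread runs mosdepth itself; the rest are extra decompression threads."""
    return "" if rule_threads <= 1 else f"--threads {rule_threads - 1}"


def mosdepth_prefix(summary_path):
    """Derive the mosdepth output prefix from the summary file path."""
    suffix = ".mosdepth.summary.txt"
    if not summary_path.endswith(suffix):
        raise ValueError(f"summary output must end with {suffix}")
    return summary_path[: -len(suffix)]


print(mosdepth_threads_arg(4))
print(mosdepth_prefix("mosdepth/sampleA.mosdepth.summary.txt"))
```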
Software dependencies¶
mosdepth=0.3.3
Input/Output¶
Input:
- BAM/CRAM files
- reference genome (optional)
- BED file (optional)
Output:
- Several coverage summary files.
Authors¶
- William Rowell
- David Lähnemann
- Filipe Vieira
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
import sys
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
bed = snakemake.input.get("bed", "")
by = snakemake.params.get("by", "")
if by:
    if bed:
        sys.exit(
            "Either provide a bed input file OR a window size via params.by, not both."
        )
    else:
        by = f"--by {by}"
if bed:
    by = f"--by {bed}"
quantize_out = False
thresholds_out = False
regions_bed_out = False
region_dist_out = False
for file in snakemake.output:
if ".per-base." in file and "--no-per-base" in extra:
sys.exit(
"You asked not to generate per-base output (--no-per-base), but your rule specifies a '.per-base.' output file. Remove one of the two."
)
if ".quantized.bed.gz" in file:
quantize_out = True
if ".thresholds.bed.gz" in file:
thresholds_out = True
if ".mosdepth.region.dist.txt" in file:
region_dist_out = True
if ".regions.bed.gz" in file:
regions_bed_out = True
if by and not regions_bed_out:
sys.exit(
"You ask for by-region output. Please also specify *.regions.bed.gz as a rule output."
)
if by and not region_dist_out:
sys.exit(
"You ask for by-region output. Please also specify *.mosdepth.region.dist.txt as a rule output."
)
if (region_dist_out or regions_bed_out) and not by:
sys.exit(
"You specify *.regions.bed.gz and/or *.mosdepth.region.dist.txt as a rule output. You also need to ask for by-region output via 'input.bed' or 'params.by'."
)
quantize = snakemake.params.get("quantize", "")
if quantize:
if not quantize_out:
sys.exit(
"You ask for quantized output via params.quantize. Please also specify *.quantized.bed.gz as a rule output."
)
quantize = f"--quantize {quantize}"
if not quantize and quantize_out:
sys.exit(
"The rule has output *.quantized.bed.gz specified. Please also specify params.quantize to actually generate it."
)
thresholds = snakemake.params.get("thresholds", "")
if thresholds:
if not thresholds_out:
sys.exit(
"You ask for --thresholds output via params.thresholds. Please also specify *.thresholds.bed.gz as a rule output."
)
thresholds = f"--thresholds {thresholds}"
if not thresholds and thresholds_out:
sys.exit(
"The rule has output *.thresholds.bed.gz specified. Please also specify params.thresholds to actually generate it."
)
precision = snakemake.params.get("precision", "")
if precision:
precision = f"MOSDEPTH_PRECISION={precision}"
fasta = snakemake.input.get("fasta", "")
if fasta:
fasta = f"--fasta {fasta}"
# mosdepth takes additional threads through its option --threads
# One thread for mosdepth
# Other threads are *additional* decompression threads passed to the '--threads' argument
threads = "" if snakemake.threads <= 1 else "--threads {}".format(snakemake.threads - 1)
# named output summary = "*.mosdepth.summary.txt" is required
prefix = snakemake.output.summary.replace(".mosdepth.summary.txt", "")
shell(
"({precision} mosdepth {threads} {fasta} {by} {quantize} {thresholds} {extra} {prefix} {snakemake.input.bam}) {log}"
)
MSISENSOR¶
For msisensor, the following wrappers are available:
MSISENSOR MSI¶
Score your MSI with MSIsensor
Example¶
This wrapper can be used in the following way:
rule test_msisensor_msi:
input:
normal = "example.normal.bam",
tumor = "example.tumor.bam",
microsat = "example.microsate.sites"
output:
"example.msi",
"example.msi_dis",
"example.msi_germline",
"example.msi_somatic"
message:
"Testing MSIsensor msi"
threads:
1
log:
"example.log"
params:
out_prefix = "example.msi"
wrapper:
"v2.2.1/bio/msisensor/msi"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
msisensor=0.5
Input/Output¶
Input:
- A microsatellite and homopolymer list from MSIsensor Scan
- A pair of normal/tumoral bams
Output:
- A text file containing MSI scores
- A TSV formatted file containing read count distribution
- A TSV formatted file containing somatic sites
- A TSV formatted file containing germline sites
Authors¶
- Thibault Dayris
Code¶
"""Snakemake script for MSISensor msi"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from os.path import commonprefix
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")
# Determining common prefix in output files
# to fill the requested parameter '-o'
prefix = commonprefix(snakemake.output)
shell(
"msisensor msi" # Tool and its sub-command
" -d {snakemake.input.microsat}" # Path to homopolymer/microsat file
" -n {snakemake.input.normal}" # Path to normal bam
" -t {snakemake.input.tumor}" # Path to tumor bam
" -o {prefix}" # Path to output distribution file
" -b {snakemake.threads}" # Maximum number of threads used
" {extra}" # Optional extra parameters
" {log}" # Logging behavior
)
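The output prefix passed to `-o` comes from `os.path.commonprefix`, which compares strings character by character. The sketch below shows the result for the output names from the example rule, and a case (with made-up file names) where the common prefix is shorter than one might expect.

```python
from os.path import commonprefix

outputs = [
    "example.msi",
    "example.msi_dis",
    "example.msi_germline",
    "example.msi_somatic",
]
prefix = commonprefix(outputs)
print(prefix)  # example.msi

# Character-wise comparison: unrelated names can still share a partial prefix.
print(commonprefix(["results/tumorA.msi", "results/tumorB.msi"]))  # results/tumor
```

For the prefix to match the files the tool writes, all declared outputs should therefore share one base name, as in the example rule.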
MSISENSOR SCAN¶
Scan homopolymers and microsatellites with MSIsensor
Example¶
This wrapper can be used in the following way:
rule test_msisensor_scan:
input:
"genome.fasta"
output:
"microsat.list"
message:
"Testing MSISensor scan"
threads:
1
params:
extra = ""
log:
"logs/msisensor_scan.log"
wrapper:
"v2.2.1/bio/msisensor/scan"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
msisensor=0.5
Input/Output¶
Input:
- A (multi)fasta formatted file
Output:
- A text file containing homopolymers and microsatellites
Authors¶
- Thibault Dayris
Code¶
"""Snakemake script for MSISensor Scan"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")
shell(
"msisensor scan " # Tool and its sub-command
"-d {snakemake.input} " # Path to fasta file
"-o {snakemake.output} " # Path to output file
"{extra} " # Optional extra parameters
"{log}" # Logging behavior
)
MULTIQC¶
Generate qc report using multiqc.
Example¶
This wrapper can be used in the following way:
rule multiqc_dir:
    input:
        expand("samtools_stats/{sample}.txt", sample=["a", "b"])
    output:
        "qc/multiqc.html"
    params:
        extra=""  # Optional: extra parameters for multiqc.
    log:
        "logs/multiqc.log"
    wrapper:
        "v2.2.1/bio/multiqc"

rule multiqc_file:
    input:
        expand("samtools_stats/{sample}.txt", sample=["a"])
    output:
        "qc/multiqc_a.html"
    params:
        extra="",  # Optional: extra parameters for multiqc.
        use_input_files_only=True,  # Optional, use only a.txt and don't search folder samtools_stats for files
    log:
        "logs/multiqc.log"
    wrapper:
        "v2.2.1/bio/multiqc"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
multiqc=1.14
Input/Output¶
Input:
- QC files or folders to summarize; by default, the wrapper passes MultiQC the folder containing each provided file (or the parent folder, if a folder is provided).
Output:
- qc report (html)
Params¶
use_input_files_only
: if this variable is set to True, the input is used as is, i.e. no folder is extracted from the provided file names
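The effect of `use_input_files_only` can be illustrated in isolation. This is a minimal sketch, not the wrapper itself: the helper name `resolve_multiqc_inputs` is hypothetical, and it only reproduces the path handling described above.

```python
from os import path


def resolve_multiqc_inputs(files, use_input_files_only=False):
    """Return the set of paths that would be handed to multiqc."""
    if use_input_files_only:
        # Pass the files through unchanged.
        return set(files)
    # Default: collapse each file to the folder that contains it.
    return {path.dirname(fp) for fp in files}


files = ["samtools_stats/a.txt", "samtools_stats/b.txt"]
as_folders = resolve_multiqc_inputs(files)
as_files = resolve_multiqc_inputs(files, use_input_files_only=True)
print(as_folders)        # {'samtools_stats'}
print(sorted(as_files))  # ['samtools_stats/a.txt', 'samtools_stats/b.txt']
```

With the default, MultiQC scans the whole folder, so files that are not listed as rule inputs but live in the same folder can end up in the report.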
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for generating a QC report with MultiQC."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
# Set this to True if multiqc should use the actual input directly
# instead of parsing the folders where the provided files are located
use_input_files_only = snakemake.params.get("use_input_files_only", False)
if not use_input_files_only:
    input_data = set(path.dirname(fp) for fp in snakemake.input)
else:
    input_data = set(snakemake.input)
output_dir = path.dirname(snakemake.output[0])
output_name = path.basename(snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "multiqc"
    " {extra}"
    " --force"
    " -o {output_dir}"
    " -n {output_name}"
    " {input_data}"
    " {log}"
)
MUSCLE¶
build multiple sequence alignments using MUSCLE.
URL: https://drive5.com/muscle5/manual/
Example¶
This wrapper can be used in the following way:
rule muscle_fasta:
    input:
        fasta="{sample}.fa",  # Input fasta file
    output:
        alignment="{sample}.fas",  # Output alignment file
    log:
        "logs/muscle/{sample}.log",
    params:
        extra="-refineiters 50",  # Additional arguments
    threads: 2
    wrapper:
        "v2.2.1/bio/muscle"

rule muscle_super5:
    input:
        fasta="{sample}.fa",
    output:
        alignment="{sample}.super5.fas",
    log:
        "logs/muscle/{sample}.super5.log",
    params:
        super5 = True,
        extra="-refineiters 50",
    threads: 2
    wrapper:
        "v2.2.1/bio/muscle"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
muscle=5.1
Params¶
super5
: specifies whether to use the Super5 algorithm to align sequences
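How `super5` changes the resulting command line can be sketched as follows. The helper `build_muscle_command` is hypothetical and only assembles a string; the flags (`-align`, `-super5`, `-threads`, `-output`) are the ones used by the wrapper code in this section.

```python
def build_muscle_command(fasta, alignment, threads=1, super5=False, extra=""):
    """Assemble the muscle call; super5 switches the alignment mode flag."""
    mode = "-super5" if super5 else "-align"
    parts = [
        "muscle",
        f"-threads {threads}",
        f"{mode} {fasta}",
        extra,
        f"-output {alignment}",
    ]
    # Drop empty pieces (e.g. no extra arguments) before joining.
    return " ".join(p for p in parts if p)


default_cmd = build_muscle_command("a.fa", "a.fas", threads=2, extra="-refineiters 50")
super5_cmd = build_muscle_command("a.fa", "a.super5.fas", threads=2, super5=True)
print(default_cmd)  # muscle -threads 2 -align a.fa -refineiters 50 -output a.fas
print(super5_cmd)   # muscle -threads 2 -super5 a.fa -output a.super5.fas
```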
Authors¶
- Nikos Tsardakas Renhuldt
Code¶
__author__ = "Nikos Tsardakas Renhuldt"
__copyright__ = "Copyright 2021, Nikos Tsardakas Renhuldt"
__email__ = "nikos.tsardakas_renhuldt@tbiokem.lth.se"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mode = "-align"
if snakemake.params.get("super5"):
    mode = "-super5"
shell(
    "muscle"
    " -threads {snakemake.threads}"
    " {mode} {snakemake.input.fasta}"
    " {extra}"
    " -output {snakemake.output.alignment}"
    " {log}"
)
NANOSIM-H¶
NanoSim-H is a simulator of Oxford Nanopore reads that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of Nanopore sequencing technology.
Example¶
This wrapper can be used in the following way:
rule nanosimh:
    input:
        "{sample}.fa"
    output:
        reads = "{sample}.simulated.fa",
        log = "{sample}.simulated.log",
        errors = "{sample}.simulated.errors.txt"
    params:
        extra = "",
        num_reads = 10,
        perfect_reads = True,
        min_read_len = 10,
    log:
        "logs/nanosim-h/test/{sample}.log"
    wrapper:
        "v2.2.1/bio/nanosim-h"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
nanosim-h=1.1.0.4
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for NanoSim-H."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
def is_header(query):
    return query.startswith(">")
def get_length_of_longest_sequence(fh):
    current_length = 0
    all_lengths = []
    for line in fh:
        if not is_header(line):
            current_length += len(line.rstrip())
        else:
            all_lengths.append(current_length)
            current_length = 0
    all_lengths.append(current_length)
    return max(all_lengths)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
prefix = snakemake.params.get("prefix", snakemake.output.reads.rpartition(".")[0])
num_reads = snakemake.params.get("num_reads", 10000)
profile = snakemake.params.get("profile", "ecoli_R9_2D")
perfect_reads = snakemake.params.get("perfect_reads", False)
min_read_len = snakemake.params.get("min_read_len", 50)
max_read_len = snakemake.params.get("max_read_len", 0)
# need to do this as the default read length of infinity can cause nanosim-h to
# hang if the reference is short
if max_read_len == 0:
    with open(snakemake.input[0]) as fh:
        max_read_len = get_length_of_longest_sequence(fh)
perfect_reads_flag = "--perfect " if perfect_reads else ""
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Executed shell command
shell(
    "nanosim-h {extra} "
    "{perfect_reads_flag} "
    "--max-len {max_read_len} "
    "--min-len {min_read_len} "
    "--profile {profile} "
    "--number {num_reads} "
    "--out-pref {prefix} "
    "{snakemake.input} {log}"
)
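The fallback for `max_read_len` relies on scanning the reference for its longest record. The sketch below re-states that scan so it can be run on its own; the in-memory FASTA content is made up for illustration.

```python
import io


def get_length_of_longest_sequence(fh):
    """Return the length of the longest record in a FASTA file handle."""
    current_length = 0
    all_lengths = []
    for line in fh:
        if line.startswith(">"):
            # A header closes the previous record.
            all_lengths.append(current_length)
            current_length = 0
        else:
            current_length += len(line.rstrip())
    # Close the final record, which has no header after it.
    all_lengths.append(current_length)
    return max(all_lengths)


# Two records: seq1 spans two lines (8 bases), seq2 has 2 bases.
fasta = io.StringIO(">seq1\nACGT\nACGT\n>seq2\nAC\n")
longest = get_length_of_longest_sequence(fasta)
print(longest)  # 8
```

Sequence lines are summed per record, so multi-line FASTA records are measured in full rather than line by line.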
NEXTFLOW¶
Run nextflow pipeline
Example¶
This wrapper can be used in the following way:
rule chipseq_pipeline:
    input:
        input="design.csv",
        fasta="data/genome.fasta",
        gtf="data/genome.gtf",
        # any --<argname> pipeline file arguments can be given here as <argname>=<path>
    output:
        "results/multiqc/broadPeak/multiqc_report.html",
    params:
        pipeline="nf-core/chipseq",
        revision="1.2.1",
        profile=["test", "docker"],
        # any --<argname> pipeline arguments can be given here as <argname>=<value>
    handover: True
    wrapper:
        "v2.2.1/utils/nextflow"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
This wrapper can e.g. be used to run nf-core pipelines. In each of the nf-core pipeline descriptions, you will find available parameters and the output file structure (under “aws results”). The latter can be used to set the desired output files for this wrapper.
Software dependencies¶
nextflow
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2021, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
revision = snakemake.params.get("revision")
profile = snakemake.params.get("profile", [])
extra = snakemake.params.get("extra", "")
if isinstance(profile, str):
    profile = [profile]
args = []
if revision:
    args += ["-revision", revision]
if profile:
    args += ["-profile", ",".join(profile)]
print(args)
# TODO pass threads in case of single job
# TODO limit parallelism in case of pipeline
# TODO handle other resources
add_parameter = lambda name, value: args.append("--{} {}".format(name, value))
for name, files in snakemake.input.items():
    if isinstance(files, list):
        # TODO check how multiple input files under a single arg are usually passed to nextflow
        files = ",".join(files)
    add_parameter(name, files)
for name, value in snakemake.params.items():
    if (
        name != "pipeline"
        and name != "revision"
        and name != "profile"
        and name != "extra"
    ):
        add_parameter(name, value)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
args = " ".join(args)
pipeline = snakemake.params.pipeline
shell("nextflow run {pipeline} {args} {extra} {log}")
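To see how named inputs and params turn into Nextflow arguments, the translation can be reproduced with plain dictionaries. This is a sketch under the assumption that `snakemake.input` and `snakemake.params` behave like the dicts below; `build_nextflow_args` is a hypothetical helper, not part of the wrapper.

```python
RESERVED = {"pipeline", "revision", "profile", "extra"}


def build_nextflow_args(inputs, params):
    """Mirror the wrapper's argument assembly using plain dicts."""
    profile = params.get("profile", [])
    if isinstance(profile, str):
        profile = [profile]
    args = []
    if params.get("revision"):
        args += ["-revision", params["revision"]]
    if profile:
        args += ["-profile", ",".join(profile)]
    # Every named input becomes a --<name> <path> pipeline argument.
    for name, files in inputs.items():
        if isinstance(files, list):
            files = ",".join(files)
        args.append(f"--{name} {files}")
    # Every non-reserved param becomes a --<name> <value> pipeline argument.
    for name, value in params.items():
        if name not in RESERVED:
            args.append(f"--{name} {value}")
    return " ".join(args)


inputs = {"input": "design.csv", "fasta": "data/genome.fasta", "gtf": "data/genome.gtf"}
params = {"pipeline": "nf-core/chipseq", "revision": "1.2.1", "profile": ["test", "docker"]}
nf_args = build_nextflow_args(inputs, params)
print(nf_args)
# -revision 1.2.1 -profile test,docker --input design.csv --fasta data/genome.fasta --gtf data/genome.gtf
```

Options with a single dash (`-revision`, `-profile`) are addressed to Nextflow itself, while double-dash options are forwarded to the pipeline.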
NGS-DISAMBIGUATE¶
Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.
Example¶
This wrapper can be used in the following way:
rule disambiguate:
    input:
        a="mapped/{sample}.a.bam",
        b="mapped/{sample}.b.bam"
    output:
        a_ambiguous='disambiguate/{sample}.graft.ambiguous.bam',
        b_ambiguous='disambiguate/{sample}.host.ambiguous.bam',
        a_disambiguated='disambiguate/{sample}.graft.bam',
        b_disambiguated='disambiguate/{sample}.host.bam',
        summary='qc/disambiguate/{sample}.txt'
    params:
        algorithm="bwa",
        # optional: Prefix to use for output. If omitted, a
        # suitable value is guessed from the output paths. Prefix
        # is used for the intermediate output paths, as well as
        # sample name in summary file.
        prefix="{sample}",
        extra=""
    wrapper:
        "v2.2.1/bio/ngs-disambiguate"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
ngs-disambiguate=2018.05.03
bamtools=2.5.1
Input/Output¶
Input:
- species a bam file (name sorted)
- species b bam file (name sorted)
Output:
- bam file with ambiguous alignments for species a
- bam file with ambiguous alignments for species b
- bam file with unambiguous alignments for species a
- bam file with unambiguous alignments for species b
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for ngs-disambiguate (from Astrazeneca)."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
prefix = snakemake.params.get("prefix", None)
extra = snakemake.params.get("extra", "")
output_dir = path.dirname(snakemake.output.a_ambiguous)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# If prefix is not given, we use the summary path to derive the most
# probable sample name (as the summary path is least likely to contain
# additional suffixes). This is better than using a random id as prefix,
# since the prefix is also used as the sample name in the summary file.
if prefix is None:
    prefix = path.splitext(path.basename(snakemake.output.summary))[0]
# Run command.
shell(
    "ngs_disambiguate"
    " {extra}"
    " -o {output_dir}"
    " -s {prefix}"
    " -a {snakemake.params.algorithm}"
    " {snakemake.input.a}"
    " {snakemake.input.b}"
    " {log}"  # Logging behavior (the log string was previously built but unused)
)
# Move outputs into expected positions.
output_base = path.join(output_dir, prefix)
output_map = {
    output_base + ".ambiguousSpeciesA.bam": snakemake.output.a_ambiguous,
    output_base + ".ambiguousSpeciesB.bam": snakemake.output.b_ambiguous,
    output_base + ".disambiguatedSpeciesA.bam": snakemake.output.a_disambiguated,
    output_base + ".disambiguatedSpeciesB.bam": snakemake.output.b_disambiguated,
    output_base + "_summary.txt": snakemake.output.summary,
}
for src, dest in output_map.items():
    if src != dest:
        shell("mv {src} {dest}")
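The prefix fallback and the intermediate file names can be checked without running the tool. The following sketch uses hypothetical helper names and `posixpath` (so the separators are the same on every platform); the file-name suffixes are the ones listed in the wrapper code above.

```python
import posixpath as path


def derive_prefix(summary_path):
    """Guess the sample name from the summary output path."""
    return path.splitext(path.basename(summary_path))[0]


def intermediate_outputs(output_dir, prefix):
    """Paths the tool is expected to write before they are moved."""
    base = path.join(output_dir, prefix)
    return {
        "a_ambiguous": base + ".ambiguousSpeciesA.bam",
        "b_ambiguous": base + ".ambiguousSpeciesB.bam",
        "a_disambiguated": base + ".disambiguatedSpeciesA.bam",
        "b_disambiguated": base + ".disambiguatedSpeciesB.bam",
        "summary": base + "_summary.txt",
    }


prefix = derive_prefix("qc/disambiguate/sampleA.txt")
produced = intermediate_outputs("disambiguate", prefix)
print(prefix)               # sampleA
print(produced["summary"])  # disambiguate/sampleA_summary.txt
```

Because the summary is written next to the BAM files and then moved, the rule can place it in a different folder (here `qc/disambiguate/`) than the alignments.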
NONPAREIL¶
For nonpareil, the following wrappers are available:
NONPAREIL INFER¶
Nonpareil uses the redundancy of the reads in metagenomic datasets to estimate the average coverage and predict the amount of sequences that will be required to achieve “nearly complete coverage”.
URL: https://nonpareil.readthedocs.io/en/latest/
Example¶
This wrapper can be used in the following way:
rule nonpareil:
    input:
        "reads/{sample}",
    output:
        redund_sum="results/{sample}.npo",
        redund_val="results/{sample}.npa",
        mate_distr="results/{sample}.npc",
        log="results/{sample}.log",
    log:
        "logs/{sample}.log",
    params:
        alg="kmer",
        extra="-X 1 -k 3 -F",
    threads: 2
    resources:
        mem_mb=50,
    wrapper:
        "v2.2.1/bio/nonpareil/infer"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For a PDF version of the manual, see https://nonpareil.readthedocs.io/_/downloads/en/latest/pdf/
Software dependencies¶
nonpareil=3.4.1
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- reads in FASTA/Q format (can be gzipped or bzipped)
Output:
redund_sum
: redundancy summary TSV file with six columns, representing sequencing effort, summary of the distribution of redundancy (average redundancy, standard deviation, quartile 1, median, and quartile 3).
redund_val
: redundancy values TSV file with three columns (similar to redundancy summary, but provides ALL results), representing sequencing effort, ID of the replicate and estimated redundancy value.
mate_distr
: mate distribution file, with the number of reads in the dataset matching a query read.
log
: log of internal Nonpareil processing.
Params¶
alg
: nonpareil algorithm, either kmer or alignment (mandatory).
extra
: additional program arguments
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"
from os import path
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mem_mb = get_mem(snakemake, out_unit="MiB")
uncomp = ""
in_name, in_ext = path.splitext(snakemake.input[0])
if in_ext in [".gz", ".bz2"]:
    uncomp = "zcat" if in_ext == ".gz" else "bzcat"
    in_name, in_ext = path.splitext(in_name)
# Infer input format
if in_ext in [".fa", ".fas", ".fasta"]:
    in_format = "fasta"
elif in_ext in [".fq", ".fastq"]:
    in_format = "fastq"
else:
    raise ValueError("invalid input format")
# Redundancy summary
redund_sum = snakemake.output.get("redund_sum", "")
if redund_sum:
    redund_sum = f"-o {redund_sum}"
# Redundancy values
redund_val = snakemake.output.get("redund_val", "")
if redund_val:
    redund_val = f"-a {redund_val}"
# Mate distribution
mate_distr = snakemake.output.get("mate_distr", "")
if mate_distr:
    mate_distr = f"-C {mate_distr}"
# Log
out_log = snakemake.output.get("log", "")
if out_log:
    out_log = f"-l {out_log}"
with tempfile.NamedTemporaryFile() as tmp:
    in_uncomp = snakemake.input[0]
    if uncomp:
        in_uncomp = tmp.name
        shell("{uncomp} {snakemake.input[0]} > {in_uncomp}")
    shell(
        "nonpareil"
        " -t {snakemake.threads}"
        " -R {mem_mb}"
        " -T {snakemake.params.alg}"
        " -s {in_uncomp}"
        " -f {in_format}"
        " {extra}"
        " {redund_sum}"
        " {redund_val}"
        " {mate_distr}"
        " {out_log}"
        " {log}"
    )
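The wrapper decides on decompression and on the `-f` format purely from the file extension. A small sketch of that decision, using a hypothetical helper name and made-up file names:

```python
from os import path


def detect_input(filename):
    """Return (decompression command, format) inferred from the file name."""
    decompressor = ""
    stem, ext = path.splitext(filename)
    if ext in (".gz", ".bz2"):
        decompressor = "zcat" if ext == ".gz" else "bzcat"
        # Look at the extension underneath the compression suffix.
        stem, ext = path.splitext(stem)
    if ext in (".fa", ".fas", ".fasta"):
        return decompressor, "fasta"
    if ext in (".fq", ".fastq"):
        return decompressor, "fastq"
    raise ValueError(f"invalid input format: {filename}")


print(detect_input("reads/a.fastq.gz"))  # ('zcat', 'fastq')
print(detect_input("reads/a.fas.bz2"))   # ('bzcat', 'fasta')
print(detect_input("reads/a.fa"))        # ('', 'fasta')
```

An input without one of the recognized extensions raises an error, so in the example rule the `{sample}` wildcard has to include the extension (for instance `a.fa`).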
NONPAREIL PLOT¶
Plot Nonpareil results.
URL: https://nonpareil.readthedocs.io/en/latest/
Example¶
This wrapper can be used in the following way:
rule test_nonpareil_plot:
    input:
        npo="{sample}.npo",
    output:
        pdf="results/{sample}.pdf",
        model="results/{sample}.RData",
    threads: 1
    log:
        "logs/{sample}.log",
    params:
        label="Plot",
        col="blue",
        enforce_consistency=True,
        star=95,
        correction_factor=True,
        weights_exp="-1.1,-1.2,-0.9,-1.3,-1",
        skip_model=False,
    wrapper:
        "v2.2.1/bio/nonpareil/plot"

use rule test_nonpareil_plot as test_nonpareil_plot_multiple with:
    input:
        npo=["a.npo", "b.npo"],
    output:
        pdf="results/samples.pdf",
        model="results/samples.RData",
    log:
        "logs/samples.log",
    params:
        label="Plot of 2 samples",
        labels="Model A,Model B",
        col="blue,red",
        enforce_consistency=True,
        star=95,
        correction_factor=True,

use rule test_nonpareil_plot as test_nonpareil_plot_nomodel with:
    output:
        pdf="results/{sample}.nomodel.pdf",
        model="results/{sample}.RData",
    log:
        "logs/{sample}.nomodel.log",
    params:
        label="Plot",
        col="blue",
        enforce_consistency=True,
        star=95,
        correction_factor=True,
        weights_exp="-1.1,-1.2,-0.9,-1.3,-1",
        skip_model=True,
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
nonpareil=3.4.1
Params¶
label
: Plot title.
labels
: Curve labels.
col
: Curve colors.
enforce_consistency
: Fails verbosely on insufficient data, otherwise it warns about the inconsistencies and attempts the estimations.
star
: Objective coverage in percentage (i.e., coverage value considered near-complete).
correction_factor
: Apply overlap-dependent correction factor, otherwise redundancy is assumed to equal coverage.
weights_exp
: Vector of values to be tested as exponent of the weights distribution.
skip_model
: Skip model estimation.
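In the example rules, `labels`, `col` and `weights_exp` are given as comma-separated strings, which the wrapper splits into vectors. The wrapper itself is written in R; the Python sketch below only mirrors that splitting step, and the helper names are hypothetical.

```python
def split_list(value):
    """Split a comma-separated parameter such as labels or col."""
    return value.split(",")


def parse_weights(value):
    """Split weights_exp and convert each entry to a number."""
    return [float(item) for item in value.split(",")]


colors = split_list("blue,red")
labels = split_list("Model A,Model B")
weights = parse_weights("-1.1,-1.2,-0.9,-1.3,-1")
print(colors)   # ['blue', 'red']
print(labels)   # ['Model A', 'Model B']
print(weights)  # [-1.1, -1.2, -0.9, -1.3, -1.0]
```

Entries are not trimmed, so a space after a comma would become part of the label or color name.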
Authors¶
- Filipe G. Vieira
Code¶
# __author__ = "Filipe G. Vieira"
# __copyright__ = "Copyright 2023, Filipe G. Vieira"
# __license__ = "MIT"
# This script plots results (NPO file) from NonPareil
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file <- file(snakemake@log[[1]], open = "wt")
base::sink(log.file)
base::sink(log.file, type = "message")
# Loading libraries (order matters)
base::library(package = "Nonpareil", character.only = TRUE)
base::message("Libraries loaded")
# Set input and output files
in_files = snakemake@input[["npo"]]
out_pdf = snakemake@output[[1]]
base::message("Input files: ")
base::print(in_files)
base::message("Saving plot to file: ")
base::print(out_pdf)
# Set parameters
params <- list("label" = ifelse("label" %in% base::names(snakemake@params), snakemake@params[["label"]], NA),
"labels" = NA,
"col" = NA,
"enforce_consistency" = ifelse("enforce_consistency" %in% base::names(snakemake@params), as.logical(snakemake@params[["enforce_consistency"]]), FALSE),
"star" = ifelse("star" %in% base::names(snakemake@params), snakemake@params[["star"]], 95),
"correction_factor" = ifelse("correction_factor" %in% base::names(snakemake@params), as.logical(snakemake@params[["correction_factor"]]), FALSE),
"weights_exp" = NA,
"skip_model" = ifelse("skip_model" %in% base::names(snakemake@params), as.logical(snakemake@params[["skip_model"]]), FALSE)
)
# "ifelse" returns a value shaped like its (scalar) test, so it would keep only the first element of a vector; vector-valued params are therefore set separately
if ("labels" %in% base::names(snakemake@params)) {
params[["labels"]] = unlist(strsplit(snakemake@params[["labels"]], ","))
}
if ("col" %in% base::names(snakemake@params)) {
params[["col"]] = unlist(strsplit(snakemake@params[["col"]], ","))
}
if ("weights_exp" %in% base::names(snakemake@params)) {
params[["weights_exp"]] = as.numeric(unlist(strsplit(snakemake@params[["weights_exp"]], ",")))
}
base::message("Options provided:")
utils::str(params)
# Infer model
pdf(out_pdf)
curves <- Nonpareil.curve.batch(in_files,
label = params[["label"]],
labels = params[["labels"]],
col = params[["col"]],
enforce.consistency = params[["enforce_consistency"]],
star = params[["star"]],
correction.factor = params[["correction_factor"]],
weights.exp = params[["weights_exp"]],
skip.model = params[["skip_model"]],
plot = FALSE
)
# Get stats
stats <- summary(curves)
# Fix names
colnames(stats) <- c("Redundancy", "Avg. coverage", "Seq. effort", "Model correlation", "Required seq. effort", "Diversity")
# If model not inferred, set its values to NA
stats[,4] <- sapply(stats[,4], function(x){if(length(x) == 0){NA} else {x}})
stats[,5] <- sapply(stats[,5], function(x){if(length(x) == 0){NA} else {x}})
# Convert to Gb
stats[,3] <- stats[,3] / 1e9
stats[,5] <- stats[,5] / 1e9
# Round
stats <- round(stats, digits = 2)
# Print stats to log
base::print(stats)
# Save plot
plot(curves, legend.opts = FALSE)
# Add legend
legend("bottomright", legend = paste0(paste(colnames(stats), t(stats), sep=": "), c("",""," Gb",""," Gb","")), cex = 0.5)
if (length(in_files) > 1) {
Nonpareil.legend(curves, "topleft", cex = 0.5)
}
# Save model
if ("model" %in% base::names(snakemake@output)) {
save(curves, file=snakemake@output[["model"]])
}
base::message("Nonpareil plot saved")
dev.off()
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
base::sink(type = "message")
base::sink()
OPEN-CRAVAT¶
For open-cravat, the following wrappers are available:
OPENCRAVAT MODULE¶
Install OpenCRAVAT modules. Annotate variant calls with OpenCRAVAT. For more details, see https://github.com/KarchinLab/open-cravat/wiki.
Example¶
This wrapper can be used in the following way:
rule opencravat_module:
    output:
        # add any other desired modules as separate directory outputs
        directory("modules/annotators/biogrid"),
    log:
        "logs/open-cravat/module.log"
    wrapper:
        "v2.2.1/bio/open-cravat/module"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
open-cravat=2.3.1
Authors¶
- Rick Kim
Code¶
__author__ = "Rick Kim"
__copyright__ = "Copyright 2020, Rick Kim"
__license__ = "GPLv3"
from snakemake.shell import shell
import cravat
import re
import pathlib
import os
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# snakemake.output is always iterable, so collect its paths into a plain list
onames = list(snakemake.output)
for oname in onames:
    if os.path.exists(oname):
        continue
    [o2, module_name] = os.path.split(oname)
    [modules_dir, module_type] = os.path.split(o2)
    # Strip the plural "s" from the group folder name (e.g. annotators -> annotator)
    module_type = module_type[:-1]
    modules_dir_cur = cravat.admin_util.get_modules_dir()
    if modules_dir_cur != modules_dir:
        cravat.admin_util.set_modules_dir(modules_dir)
    cmd = ["oc", "module", "install", module_name, "-y"]
    cmd = " ".join(cmd)
    shell("{cmd} {log}")
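The module to install is read off the output path: its last component is the module name, the one before it the module group, and the rest the modules directory. A minimal sketch of that decomposition, with a hypothetical helper name:

```python
import os


def split_module_path(output_path):
    """Split <modules_dir>/<group>/<module> into its three parts."""
    parent, module_name = os.path.split(output_path)
    modules_dir, module_group = os.path.split(parent)
    # The group folder is plural; dropping the final "s" gives the type.
    module_type = module_group[:-1]
    return modules_dir, module_type, module_name


parts = split_module_path("modules/annotators/biogrid")
print(parts)  # ('modules', 'annotator', 'biogrid')
```

Output paths therefore need at least the `<group>/<module>` structure for the wrapper to pick the right module name.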
OPENCRAVAT RUN¶
Runs OpenCRAVAT. Annotate variant calls with OpenCRAVAT. For more details, see https://github.com/KarchinLab/open-cravat/wiki.
Example¶
This wrapper can be used in the following way:
rule opencravat:
    input:
        'example_input.tsv',
        'modules/commons/hg38wgs',
        'modules/converters/cravat-converter',
        'modules/mappers/hg38',
        'modules/annotators/biogrid',
        'modules/annotators/clinvar',
        'modules/postaggregators/tagsampler',
        'modules/postaggregators/varmeta',
        'modules/postaggregators/vcfinfo',
        'modules/reporters/excelreporter',
        'modules/reporters/tsvreporter',
        'modules/reporters/csvreporter',
    output:
        'example_input.tsv.xlsx',
        'example_input.tsv.variant.tsv',
        'example_input.tsv.variant.csv'
    log:
        "logs/open-cravat/run.log"
    threads: 1  # set number of threads for parallel processing
    wrapper:
        "v2.2.1/bio/open-cravat/run"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
open-cravat=2.3.1
Authors¶
- Rick Kim
Code¶
__author__ = "Rick Kim"
__copyright__ = "Copyright 2020, Rick Kim"
__license__ = "GPLv3"
from snakemake.shell import shell
import os
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
inputfiles = []
annotators = []
reporters = []
modules_dir = set()
for v in snakemake.input:
    if os.path.isfile(v):
        inputfiles.append(v)
    elif os.path.isdir(v):
        (module_group_dir, module_name) = os.path.split(v)
        (in_modules_dir, module_group) = os.path.split(module_group_dir)
        modules_dir.add(in_modules_dir)
        if module_group == "annotators":
            annotators.append(module_name)
        elif module_group == "reporters" and module_name.endswith("reporter"):
            reporters.append(module_name[:-8])
if len(modules_dir) > 1:
    # Fail with an error instead of exiting silently with status 0
    raise ValueError(
        f'Multiple modules directories detected: {",".join(modules_dir)}'
    )
cmd = ["oc", "run"]
cmd.extend(inputfiles)
genome = snakemake.params.get("genome", "hg38")
mp = snakemake.threads
cmd.extend(["-l", genome])
cmd.extend(["--mp", str(mp)])
if len(annotators) > 0:
    cmd.append("-a")
    cmd.extend(annotators)
if len(reporters) > 0:
    cmd.append("-t")
    cmd.extend(reporters)
extra = snakemake.params.get("extra", "")
if isinstance(extra, str) and len(extra) > 0:
    cmd.extend(extra.split(" "))
shell("{cmd} {log}")
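Which annotators and report types end up on the command line depends only on the folder names of the module inputs. The sketch below reproduces that classification on path strings alone; unlike the wrapper, it does not check the filesystem, and `classify_modules` is a hypothetical helper.

```python
import os


def classify_modules(module_dirs):
    """Sort module folders into annotator names and reporter types."""
    annotators, reporters, roots = [], [], set()
    for module_dir in module_dirs:
        group_dir, name = os.path.split(module_dir)
        root, group = os.path.split(group_dir)
        roots.add(root)
        if group == "annotators":
            annotators.append(name)
        elif group == "reporters" and name.endswith("reporter"):
            # "excelreporter" -> "excel", the value passed after -t
            reporters.append(name[: -len("reporter")])
    return annotators, reporters, roots


dirs = [
    "modules/mappers/hg38",
    "modules/annotators/biogrid",
    "modules/annotators/clinvar",
    "modules/reporters/excelreporter",
    "modules/reporters/tsvreporter",
]
annotators, reporters, roots = classify_modules(dirs)
print(annotators)  # ['biogrid', 'clinvar']
print(reporters)   # ['excel', 'tsv']
print(roots)       # {'modules'}
```

Folders from other groups (mappers, converters, postaggregators) are listed as inputs only so that Snakemake installs them first; they do not add command-line options.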
OPTITYPE¶
Precision 4-digit HLA-I-typing from NGS data based on integer linear programming. Use razers3 beforehand to generate input fastq files only mapping to HLA-regions. Please see https://github.com/FRED-2/OptiType
Example¶
This wrapper can be used in the following way:
rule optitype:
    input:
        # list of input reads
        reads=["reads/{sample}_1.fished.fastq", "reads/{sample}_2.fished.fastq"]
    output:
        pdf="optitype/{sample}_coverage_plot.pdf",
        tsv="optitype/{sample}_result.tsv",
    log:
        "logs/optitype/{sample}.log"
    params:
        # Type of sequencing data. Can be 'dna' or 'rna'. Default is 'dna'.
        sequencing_type="dna",
        # optitype config file, optional
        config="",
        # additional parameters
        extra=""
    wrapper:
        "v2.2.1/bio/optitype"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
optitype=1.3.5
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster, David Lähnemann"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from tempfile import TemporaryDirectory
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# get sequencing type
seq_type = snakemake.params.get("sequencing_type", "dna")
seq_type = "--{}".format(seq_type)
# check if non-default config.ini is used
config = snakemake.params.get("config", "")
if config:
    config = "--config {}".format(config)
with TemporaryDirectory() as tempdir:
    shell(
        "(OptiTypePipeline.py"
        " --input {snakemake.input.reads}"
        " --outdir {tempdir}"
        " --prefix tmp_prefix"
        " {seq_type}"
        " {config}"
        " {extra}; "
        " mv {tempdir}/tmp_prefix_coverage_plot.pdf {snakemake.output.pdf:q} ;"
        " mv {tempdir}/tmp_prefix_result.tsv {snakemake.output.tsv:q} )"
        " {log}"
    )
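The wrapper lets the tool write into a temporary folder under a fixed prefix and then moves the results to the paths requested by the rule. The same pattern in pure Python, as a sketch: `fake_tool` stands in for the real program, and its file content is dummy data.

```python
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory


def run_with_fixed_prefix(produce, outputs):
    """Run `produce(tmpdir)`, then move each temp file to its final path.

    `outputs` maps a file name inside the temp folder to the final path.
    """
    with TemporaryDirectory() as tempdir:
        produce(Path(tempdir))
        for name, dest in outputs.items():
            Path(dest).parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(Path(tempdir) / name), dest)


def fake_tool(outdir):
    # Stand-in for the real tool: writes one result file with dummy content.
    (outdir / "tmp_prefix_result.tsv").write_text("A1\tA2\n")


with TemporaryDirectory() as workdir:
    final = Path(workdir) / "optitype" / "sampleA_result.tsv"
    run_with_fixed_prefix(fake_tool, {"tmp_prefix_result.tsv": str(final)})
    moved_text = final.read_text()

print(repr(moved_text))  # 'A1\tA2\n'
```

Using a fixed prefix inside a throwaway folder keeps the tool's own naming scheme from leaking into the workflow's output paths.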
PALADIN¶
For paladin, the following wrappers are available:
PALADIN ALIGN¶
Align nucleotide reads to a protein fasta file (that has been indexed with paladin index). PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.
Example¶
This wrapper can be used in the following way:
rule paladin_align:
    input:
        reads=["reads/reads.left.fq.gz"],
        index="index/prot.fasta.bwt",
    output:
        "paladin_mapped/{sample}.bam"  # will output BAM format if output file ends with ".bam", otherwise SAM format
    log:
        "logs/paladin/{sample}.log"
    threads: 4
    wrapper:
        "v2.2.1/bio/paladin/align"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
paladin=1.4.6
samtools=1.17
Input/Output¶
Input:
- nucleotide reads (fastq)
- indexed protein fasta file (output of paladin index or prepare)
Output:
- mapped reads (SAM or BAM format)
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for PALADIN alignment"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
r = snakemake.input.get("reads")
assert (
r is not None
), "reads are required as input. If you have paired end reads, please merge them first (e.g. with PEAR)"
index = snakemake.input.get("index")
assert (
index is not None
), "please index your assembly and provide the basename (with '.bwt' extension) via the 'index' input param"
index_base = str(index).rsplit(".bwt")[0]
outfile = snakemake.output
# if bam output, pipe to bam!
output_cmd = " | samtools view -Sb - > " if str(outfile).endswith(".bam") else " -o "
min_orf_len = snakemake.params.get("f", "250")
shell(
"paladin align -f {min_orf_len} -t {snakemake.threads} {extra} {index_base} {r} {output_cmd} {outfile}"
)
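Two small string operations in the wrapper determine the index basename and whether the alignment is piped through samtools. They can be tried on their own; the helper names below are hypothetical.

```python
def index_basename(index_path):
    """Drop the '.bwt' suffix to get the basename paladin expects."""
    return str(index_path).rsplit(".bwt")[0]


def output_redirect(outfile):
    """Pipe to samtools for BAM output, otherwise let paladin write SAM."""
    if str(outfile).endswith(".bam"):
        return " | samtools view -Sb - > "
    return " -o "


base = index_basename("index/prot.fasta.bwt")
print(base)                                  # index/prot.fasta
print(repr(output_redirect("mapped/a.bam")))  # ' | samtools view -Sb - > '
print(repr(output_redirect("mapped/a.sam")))  # ' -o '
```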
PALADIN INDEX¶
Index a protein fasta file for mapping with paladin. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.
Example¶
This wrapper can be used in the following way:
rule paladin_index:
    input:
        "prot.fasta",
    output:
        "index/prot.fasta.bwt"
    log:
        "logs/paladin/prot_index.log"
    params:
        reference_type=3
    wrapper:
        "v2.2.1/bio/paladin/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
paladin=1.4.6
samtools=1.17
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Paladin Index."""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
# this wrapper temporarily copies your assembly into the output dir
# so that all the paladin output files end up in the desired spot
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
input_assembly = snakemake.input
annotation = snakemake.input.get("gff", "")
paladin_index = str(snakemake.output)
reference_type = snakemake.params.get("reference_type", "3")
assert int(reference_type) in [1, 2, 3, 4]
ref_type_cmd = "-r" + str(reference_type)
output_base = paladin_index.rsplit(".bwt")[0]
shell("cp {input_assembly} {output_base}")
shell("paladin index {ref_type_cmd} {output_base} {annotation} {extra} {log}")
shell("rm -f {output_base}")
PALADIN PREPARE¶
Download and prepare uniprot refs for paladin mapping. PALADIN is a protein sequence alignment tool designed for the accurate functional characterization of metagenomes.
Example¶
This wrapper can be used in the following way:
rule paladin_prepare:
    output:
        "uniprot_sprot.fasta.gz",
        "uniprot_sprot.fasta.gz.pro"
    log:
        "logs/paladin/prepare_sprot.log"
    params:
        reference_type=1,  # 1=swiss-prot, 2=uniref90
    wrapper:
        "v2.2.1/bio/paladin/prepare"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
paladin=1.4.6
samtools=1.17
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Paladin Prepare"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
reference_type = snakemake.params.get(
"reference_type", "1"
) # download swissprot as default
assert int(reference_type) in [1, 2]
ref_type_cmd = "-r" + str(reference_type)
shell("paladin prepare {ref_type_cmd} {extra} {log}")
PANDORA¶
For pandora, the following wrappers are available:
PANDORA INDEX¶
Index population reference graph (PRG) sequences.
URL: https://github.com/rmcolq/pandora/wiki/Usage#build-index
Example¶
This wrapper can be used in the following way:
rule pandora_index:
    input:
        "{gene}/prg.fa",
    output:
        index="{gene}/prg.fa.k15.w14.idx",
        kmer_prgs=directory("{gene}/kmer_prgs"),
    log:
        "pandora_index/{gene}.log",
    params:
        options="-v -k 15 -w 14",
    threads: 1
    wrapper:
        "v2.2.1/bio/pandora/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pandora=0.9.1
Input/Output¶
Input:
- A PRG file (made by make_prg, https://github.com/iqbal-lab-org/make_prg) to index
Output:
index
: A pandora index file
kmer_prgs
: A directory of the index kmer PRGs in GFA format
Params¶
options
: Any options other than threads (see docs)
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for indexing population reference graph (PRG) sequences with
pandora
"""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2021, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=False)
options = snakemake.params.get("options", "")
shell("pandora index -t {snakemake.threads} {options} {snakemake.input} {log}")
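In the example rule, the output name `prg.fa.k15.w14.idx` matches the `-k 15 -w 14` values passed in `options`. The sketch below derives such a name from an options string. It assumes the naming pattern seen in the example (`<prg>.k<K>.w<W>.idx`) and assumes 15 and 14 as fallback values; both helper names are hypothetical, so check the pandora documentation before relying on them.

```python
import shlex


def parse_k_w(options, default_k=15, default_w=14):
    """Extract -k and -w values from an options string (defaults are assumptions)."""
    tokens = shlex.split(options)
    k, w = default_k, default_w
    # Walk over (flag, value) pairs of neighbouring tokens.
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-k":
            k = int(value)
        elif flag == "-w":
            w = int(value)
    return k, w


def expected_index_path(prg, k, w):
    """Build the index file name following the pattern used in the example rule."""
    return f"{prg}.k{k}.w{w}.idx"


k, w = parse_k_w("-v -k 15 -w 14")
print(expected_index_path("geneA/prg.fa", k, w))
```

Keeping the `output` file name and the `-k`/`-w` values in sync avoids a rule whose declared output is never produced.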
PBMM2¶
For pbmm2, the following wrappers are available:
PBMM2 ALIGN¶
Align reads using pbmm2, a minimap2 SMRT wrapper for PacBio data (https://github.com/PacificBiosciences/pbmm2/).
Example¶
This wrapper can be used in the following way:
rule pbmm2_align:
input:
reference="target/{reference}.fasta", # can be either genome index or genome fasta
query="{query}.bam", # can be either unaligned bam, fastq, or fasta
output:
bam="aligned/{query}.{reference}.bam",
index="aligned/{query}.{reference}.bam.bai",
log:
"logs/pbmm2_align/{query}.{reference}.log",
params:
preset="CCS", # SUBREAD, CCS, HIFI, ISOSEQ, UNROLLED
sample="", # sample name for @RG header
extra="--sort", # optional additional args
loglevel="INFO",
threads: 12
wrapper:
"v2.2.1/bio/pbmm2/align"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pbmm2=1.10.0
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
tmp_root = snakemake.params.get("tmp_root", None)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory(dir=tmp_root) as tmp_dir:
shell(
"""
(TMPDIR={tmp_dir}; \
pbmm2 align --num-threads {snakemake.threads} \
--preset {snakemake.params.preset} \
--sample {snakemake.params.sample} \
--log-level {snakemake.params.loglevel} \
{extra} \
{snakemake.input.reference} \
{snakemake.input.query} \
{snakemake.output.bam}) {log}
"""
)
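The wrapper above runs pbmm2 inside a `tempfile.TemporaryDirectory` whose parent can be set with the optional `tmp_root` param. The sketch below shows only the lifecycle of such a directory, using the standard library; `scratch_lifecycle` is a hypothetical name and no pbmm2 command is run.

```python
import os
import tempfile


def scratch_lifecycle(tmp_root=None):
    """Create a scratch directory under tmp_root and report its lifecycle.

    Returns (path, existed_inside_block, exists_after_block).
    """
    with tempfile.TemporaryDirectory(dir=tmp_root) as tmp_dir:
        # A real wrapper would run its shell command here.
        existed_inside = os.path.isdir(tmp_dir)
    # Leaving the `with` block removes the directory and its contents.
    return tmp_dir, existed_inside, os.path.exists(tmp_dir)


path, inside, after = scratch_lifecycle()  # None means the system default location
print(inside, after)
```

Passing `tmp_root=None` lets Python pick the default temporary location, which is why the param is optional in the rule.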
PBMM2 INDEX¶
Index a reference using pbmm2, a minimap2 SMRT wrapper for PacBio data (https://github.com/PacificBiosciences/pbmm2/).
Example¶
This wrapper can be used in the following way:
rule pbmm2_index:
input:
reference="target/{reference}.fasta",
output:
"target/{reference}.mmi",
log:
"logs/pbmm2_index/{reference}.log",
params:
preset="CCS", # SUBREAD, CCS, HIFI, ISOSEQ, UNROLLED
extra="", # optional additional args
threads: 8
wrapper:
"v2.2.1/bio/pbmm2/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pbmm2=1.12.0
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"""
(pbmm2 index \
--num-threads {snakemake.threads} \
--preset {snakemake.params.preset} \
--log-level DEBUG \
{extra} \
{snakemake.input.reference} {snakemake.output}) {log}
"""
)
PEAR¶
PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.
Example¶
This wrapper can be used in the following way:
rule pear_merge:
input:
read1="reads/reads.left.fq.gz",
read2="reads/reads.right.fq.gz"
output:
assembled="pear/reads_pear_assembled.fq.gz",
discarded="pear/reads_pear_discarded.fq.gz",
unassembled_read1="pear/reads_pear_unassembled_r1.fq.gz",
unassembled_read2="pear/reads_pear_unassembled_r2.fq.gz",
log:
'logs/pear.log'
params:
pval=".01",
extra=""
threads: 4
resources:
mem_mb=4000 # define amount of memory to be used by pear
wrapper:
"v2.2.1/bio/pear"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pear=0.9.6
Authors¶
- Tessa Pierce
Code¶
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
r1 = snakemake.input.get("read1")
r2 = snakemake.input.get("read2")
assert r1 is not None and r2 is not None, "r1 and r2 files are required as input"
assembled = snakemake.output.get("assembled")
assert assembled is not None, "require 'assembled' outfile"
gzip = True if assembled.endswith(".gz") else False
out_base, out_end = assembled.rsplit(".f")
out_end = ".f" + out_end
df_assembled = out_base + ".assembled.fastq"
df_discarded = out_base + ".discarded.fastq"
df_unassembled_r1 = out_base + ".unassembled.forward.fastq"
df_unassembled_r2 = out_base + ".unassembled.reverse.fastq"
df_outputs = [df_assembled, df_discarded, df_unassembled_r1, df_unassembled_r2]
discarded = snakemake.output.get("discarded", out_base + ".discarded" + out_end)
unassembled_r1 = snakemake.output.get(
"unassembled_read1", out_base + ".unassembled_r1" + out_end
)
unassembled_r2 = snakemake.output.get(
"unassembled_read2", out_base + ".unassembled_r2" + out_end
)
final_outputs = [assembled, discarded, unassembled_r1, unassembled_r2]
def move_files(in_list, out_list, gzip):
    for f, o in zip(in_list, out_list):
        if f != o:
            if gzip:
                shell("gzip -9 -c {f} > {o}")
                shell("rm -f {f}")
            else:
                shell("cp {f} {o}")
                shell("rm -f {f}")
        elif gzip:
            shell("gzip -9 {f}")
pval = float(snakemake.params.get("pval", ".01"))
max_mem = snakemake.resources.get("mem_mb", "4000")
extra = snakemake.params.get("extra", "")
shell(
"pear -f {r1} -r {r2} -p {pval} -j {snakemake.threads} -y {max_mem} {extra} -o {out_base} {log}"
)
move_files(df_outputs, final_outputs, gzip)
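The file-name handling in the wrapper above is easier to follow with concrete values. This sketch reproduces the split of the `assembled` path into a base and an extension, plus the four uncompressed files the wrapper expects pear to write. `pear_names` is a hypothetical helper, and unlike the wrapper it passes `maxsplit=1` to `rsplit` so that a path containing `.f` more than once still yields exactly two parts.

```python
def pear_names(assembled):
    """Derive the pear output prefix and the intermediate file names.

    Mirrors the wrapper's naming logic; maxsplit=1 is an addition of this sketch.
    """
    out_base, out_end = assembled.rsplit(".f", 1)
    out_end = ".f" + out_end
    return {
        "out_base": out_base,
        "out_end": out_end,
        "gzip": assembled.endswith(".gz"),
        "intermediate": [
            out_base + ".assembled.fastq",
            out_base + ".discarded.fastq",
            out_base + ".unassembled.forward.fastq",
            out_base + ".unassembled.reverse.fastq",
        ],
    }


names = pear_names("pear/reads_pear_assembled.fq.gz")
print(names["out_base"], names["out_end"], names["gzip"])
for path in names["intermediate"]:
    print(path)
```

The intermediate files are then compressed or copied to the requested output paths by `move_files`, which is why the `assembled` output name must contain a `.f...` extension such as `.fq` or `.fastq`.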
PICARD¶
For picard, the following wrappers are available:
PICARD ADDORREPLACEREADGROUPS¶
Add or replace read groups with picard tools.
Example¶
This wrapper can be used in the following way:
rule replace_rg:
input:
"mapped/{sample}.bam",
output:
"fixed-rg/{sample}.bam",
log:
"logs/picard/replace_rg/{sample}.log",
params:
extra="--RGLB lib1 --RGPL illumina --RGPU {sample} --RGSM {sample}",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/addorreplacereadgroups"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups
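The notes above say that `-Xmx` and `-Djava.io.tmpdir` are handled automatically and must not be placed in `java_opts`. The sketch below illustrates that idea only. It is not the implementation of `get_java_opts` from snakemake-wrapper-utils: the function name `sketch_java_opts`, the `M` unit suffix, and the error behaviour are all assumptions made for illustration.

```python
def sketch_java_opts(mem_mb=None, tmpdir=None, java_opts=""):
    """Combine user JVM options with automatically derived ones (illustrative only)."""
    # Reject options that the automatic handling is supposed to own.
    for reserved in ("-Xmx", "-Djava.io.tmpdir"):
        if reserved in java_opts:
            raise ValueError(f"{reserved} is set automatically; remove it from java_opts")
    parts = [java_opts] if java_opts else []
    if mem_mb is not None:
        parts.append(f"-Xmx{mem_mb}M")  # heap limit taken from resources.mem_mb
    if tmpdir:
        parts.append(f"-Djava.io.tmpdir={tmpdir}")  # taken from resources.tmpdir
    return " ".join(parts)


print(sketch_java_opts(mem_mb=1024, tmpdir="/tmp/job", java_opts="-XX:ParallelGCThreads=10"))
```

Deriving the heap limit from `resources.mem_mb` keeps the memory Snakemake reserves for the job and the memory the JVM may use in agreement.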
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard AddOrReplaceReadGroups"
" {java_opts} {extra}"
" --INPUT {snakemake.input}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output}"
" {log}"
)
PICARD BEDTOINTERVALLIST¶
picard BedToIntervalList converts a BED file to Picard Interval List format.
Example¶
This wrapper can be used in the following way:
rule bed_to_interval_list:
input:
bed="resources/a.bed",
dict="resources/genome.dict",
output:
"a.interval_list",
log:
"logs/picard/bedtointervallist/a.log",
params:
# optional parameters
extra="--SORT true", # sort output interval list before writing
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/bedtointervallist"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#BedToIntervalList
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
bed
: region file
dict
: genome dictionary file (from samtools dict or picard CreateSequenceDictionary)
Output:
- interval_list Picard format
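BED and Picard interval lists use different coordinate conventions: BED intervals are 0-based and half-open, while interval-list records are 1-based and closed. The sketch below converts one BED line into the five fields of an interval-list record (sequence, start, end, strand, name). It illustrates the coordinate shift only; `bed_to_interval` is a hypothetical helper, and the real tool additionally writes a header from the sequence dictionary and validates contig names.

```python
def bed_to_interval(line):
    """Convert one tab-separated BED line to (chrom, start, end, strand, name)."""
    chrom, start, end, *rest = line.rstrip("\n").split("\t")
    name = rest[0] if rest else "."
    # BED column 6 holds the strand; fall back to "+" when absent or invalid.
    strand = rest[2] if len(rest) > 2 and rest[2] in ("+", "-") else "+"
    # Shift the 0-based start to 1-based; the half-open end equals the closed end.
    return chrom, int(start) + 1, int(end), strand, name


print(bed_to_interval("chr1\t0\t100\texon1\t0\t-"))
```

A BED interval covering the first 100 bases (`0` to `100`) therefore becomes `1` to `100` in the interval list.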
Authors¶
- Fabian Kilpert
Code¶
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard BedToIntervalList"
" {java_opts} {extra}"
" --INPUT {snakemake.input.bed}"
" --SEQUENCE_DICTIONARY {snakemake.input.dict}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output}"
" {log}"
)
PICARD COLLECTALIGNMENTSUMMARYMETRICS¶
Collect metrics on aligned reads with picard tools.
Example¶
This wrapper can be used in the following way:
rule alignment_summary:
input:
ref="genome.fasta",
bam="mapped/{sample}.bam",
output:
"stats/{sample}.summary.txt",
log:
"logs/picard/alignment-summary/{sample}.log",
params:
# optional parameters (e.g. relax checks as below)
extra="--VALIDATION_STRINGENCY LENIENT --METRIC_ACCUMULATION_LEVEL null --METRIC_ACCUMULATION_LEVEL SAMPLE",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/collectalignmentsummarymetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#CollectAlignmentSummaryMetrics
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard CollectAlignmentSummaryMetrics"
" {java_opts} {extra}"
" --INPUT {snakemake.input.bam}"
" --REFERENCE_SEQUENCE {snakemake.input.ref}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" {log}"
)
PICARD COLLECTGCBIASMETRICS¶
Run picard CollectGcBiasMetrics to generate QC metrics pertaining to GC bias.
Example¶
This wrapper can be used in the following way:
rule alignment_summary:
input:
# BAM aligned to reference genome
bam="mapped/a.bam",
# reference genome FASTA from which GC-context is inferred
ref="genome.fasta",
output:
metrics="results/a.gcmetrics.txt",
chart="results/a.gc.pdf",
summary="results/a.summary.txt",
params:
# optional additional parameters, for example,
extra="--MINIMUM_GENOME_FRACTION 1E-5",
log:
"logs/picard/a.gcmetrics.log",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/collectgcbiasmetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#CollectGcBiasMetrics
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- BAM file of reads aligned to the reference genome
- reference genome FASTA file from which the GC context is inferred
Output:
- GC metrics text file
- GC metrics PDF figure
- GC summary metrics text file
Authors¶
- Brett Copeland
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard CollectGcBiasMetrics"
" {java_opts} {extra}"
" --INPUT {snakemake.input.bam}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output.metrics}"
" --CHART {snakemake.output.chart}"
" --SUMMARY_OUTPUT {snakemake.output.summary}"
" --REFERENCE_SEQUENCE {snakemake.input.ref}"
" {log}"
)
PICARD COLLECTHSMETRICS¶
Collects hybrid-selection (HS) metrics for a SAM or BAM file using picard.
Example¶
This wrapper can be used in the following way:
rule picard_collect_hs_metrics:
input:
bam="mapped/{sample}.bam",
reference="genome.fasta",
# Baits and targets should be given as interval lists. These can
# be generated from bed files using picard BedToIntervalList.
bait_intervals="regions.intervals",
target_intervals="regions.intervals",
output:
"stats/hs_metrics/{sample}.txt",
params:
# Optional extra arguments. Here we reduce sample size
# to reduce the runtime in our unit test.
extra="--SAMPLE_SIZE 1000",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
log:
"logs/picard_collect_hs_metrics/{sample}.log",
wrapper:
"v2.2.1/bio/picard/collecthsmetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#CollectHSMetrics
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard CollectHSMetrics."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard CollectHsMetrics"
" {java_opts} {extra}"
" --INPUT {snakemake.input.bam}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" --REFERENCE_SEQUENCE {snakemake.input.reference}"
" --BAIT_INTERVALS {snakemake.input.bait_intervals}"
" --TARGET_INTERVALS {snakemake.input.target_intervals}"
" {log}"
)
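This wrapper calls `snakemake.log_fmt_shell(stdout=False, stderr=True)`, so only the standard error stream is sent to the log file, whereas most other picard wrappers use the defaults. The sketch below shows the kind of shell redirection each combination corresponds to. `redirect_suffix` is a hypothetical helper built on standard POSIX redirection syntax; the exact string returned by Snakemake is an assumption here.

```python
def redirect_suffix(logfile, stdout=True, stderr=True, append=False):
    """Return a shell redirection for the chosen streams (illustrative only)."""
    if not logfile or not (stdout or stderr):
        return ""  # nothing to redirect
    op = ">>" if append else ">"
    if stdout and stderr:
        return f"{op} {logfile} 2>&1"  # both streams into the log
    if stdout:
        return f"{op} {logfile}"  # stdout only
    return f"2{op} {logfile}"  # stderr only


print(redirect_suffix("logs/a.log"))
print(redirect_suffix("logs/a.log", stdout=False, stderr=True))
print(redirect_suffix("logs/a.log", append=True))
```

Redirecting only stderr is useful for tools that may write results to stdout, since the log then cannot swallow them.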
PICARD COLLECTINSERTSIZEMETRICS¶
Collect metrics on insert size of paired-end reads with picard tools.
Example¶
This wrapper can be used in the following way:
rule insert_size:
input:
"mapped/{sample}.bam",
output:
txt="stats/{sample}.isize.txt",
pdf="stats/{sample}.isize.pdf",
log:
"logs/picard/insert_size/{sample}.log",
params:
# optional parameters (e.g. relax checks as below)
extra="--VALIDATION_STRINGENCY LENIENT --METRIC_ACCUMULATION_LEVEL null --METRIC_ACCUMULATION_LEVEL SAMPLE",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/collectinsertsizemetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#CollectInsertSizeMetrics
Software dependencies¶
picard=3.0.0
r-base=4.3.0
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
- bam file
Output:
txt
: textual representation of metrics
pdf
: insert size histogram
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard CollectInsertSizeMetrics"
" {java_opts} {extra}"
" --INPUT {snakemake.input}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output.txt}"
" --Histogram_FILE {snakemake.output.pdf}"
" {log}"
)
PICARD COLLECTMULTIPLEMETRICS¶
A picard meta-metrics tool that collects multiple classes of metrics.
You can select which tool(s) to run by adding the respective extension(s) (see table below) to the requested output of the wrapper invocation (see example Snakemake rule below).
Tool | Extension(s) for the output files
CollectAlignmentSummaryMetrics | .alignment_summary_metrics
CollectInsertSizeMetrics | .insert_size_metrics, .insert_size_histogram.pdf
QualityScoreDistribution | .quality_distribution_metrics, .quality_distribution.pdf
MeanQualityByCycle | .quality_by_cycle_metrics, .quality_by_cycle.pdf
CollectBaseDistributionByCycle | .base_distribution_by_cycle_metrics, .base_distribution_by_cycle.pdf
CollectGcBiasMetrics | .gc_bias.detail_metrics, .gc_bias.summary_metrics, .gc_bias.pdf
RnaSeqMetrics | .rna_metrics
CollectSequencingArtifactMetrics | .bait_bias_detail_metrics, .bait_bias_summary_metrics, .error_summary_metrics, .pre_adapter_detail_metrics, .pre_adapter_summary_metrics
CollectQualityYieldMetrics | .quality_yield_metrics
URL: https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics
Example¶
This wrapper can be used in the following way:
rule collect_multiple_metrics:
input:
bam="mapped/{sample}.bam",
ref="genome.fasta",
output:
# Through the output file extensions the different tools for the metrics can be selected
# so that it is not necessary to specify them under params with the "PROGRAM" option.
# Usable extensions (and which tools they implicitly call) are listed here:
# https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html.
multiext(
"stats/{sample}",
".alignment_summary_metrics",
".insert_size_metrics",
".insert_size_histogram.pdf",
".quality_distribution_metrics",
".quality_distribution.pdf",
".quality_by_cycle_metrics",
".quality_by_cycle.pdf",
".base_distribution_by_cycle_metrics",
".base_distribution_by_cycle.pdf",
".gc_bias.detail_metrics",
".gc_bias.summary_metrics",
".gc_bias.pdf",
".rna_metrics",
".bait_bias_detail_metrics",
".bait_bias_summary_metrics",
".error_summary_metrics",
".pre_adapter_detail_metrics",
".pre_adapter_summary_metrics",
".quality_yield_metrics",
),
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=4096,
log:
"logs/picard/multiple_metrics/{sample}.log",
params:
# optional parameters
# REF_FLAT is required if RnaSeqMetrics are used
extra="--VALIDATION_STRINGENCY LENIENT --METRIC_ACCUMULATION_LEVEL null --METRIC_ACCUMULATION_LEVEL SAMPLE --REF_FLAT ref_flat.txt",
wrapper:
"v2.2.1/bio/picard/collectmultiplemetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
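The example rule uses `multiext(...)` to declare many outputs that share the prefix `stats/{sample}`. Conceptually, this pairs one prefix with each extension. The plain-Python stand-in below shows the resulting paths; `expand_extensions` is a hypothetical name, and Snakemake's own `multiext` may return richer objects or apply extra checks.

```python
def expand_extensions(prefix, *extensions):
    """Return one path per extension, all sharing the same prefix."""
    return [prefix + ext for ext in extensions]


paths = expand_extensions(
    "stats/sampleA",
    ".alignment_summary_metrics",
    ".insert_size_metrics",
    ".insert_size_histogram.pdf",
)
for path in paths:
    print(path)
```

Because every output shares one prefix, the wrapper can pass that prefix to picard once and let each metrics program append its own extension.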
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. -XX:ParallelGCThreads=10 (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- BAM file (.bam)
- FASTA reference sequence file (.fasta or .fa)
Output:
- multiple metrics text files (_metrics) AND
- multiple metrics pdf files (.pdf)
- the appropriate extensions for the output files must be used depending on the desired tools
Authors¶
- David Laehnemann
- Antonie Vietor
- Filipe G. Vieira
Code¶
__author__ = "David Laehnemann, Antonie Vietor"
__copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
exts_to_prog = {
".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
".insert_size_metrics": "CollectInsertSizeMetrics",
".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
".quality_distribution_metrics": "QualityScoreDistribution",
".quality_distribution.pdf": "QualityScoreDistribution",
".quality_by_cycle_metrics": "MeanQualityByCycle",
".quality_by_cycle.pdf": "MeanQualityByCycle",
".base_distribution_by_cycle_metrics": "CollectBaseDistributionByCycle",
".base_distribution_by_cycle.pdf": "CollectBaseDistributionByCycle",
".gc_bias.detail_metrics": "CollectGcBiasMetrics",
".gc_bias.summary_metrics": "CollectGcBiasMetrics",
".gc_bias.pdf": "CollectGcBiasMetrics",
".rna_metrics": "RnaSeqMetrics",
".bait_bias_detail_metrics": "CollectSequencingArtifactMetrics",
".bait_bias_summary_metrics": "CollectSequencingArtifactMetrics",
".error_summary_metrics": "CollectSequencingArtifactMetrics",
".pre_adapter_detail_metrics": "CollectSequencingArtifactMetrics",
".pre_adapter_summary_metrics": "CollectSequencingArtifactMetrics",
".quality_yield_metrics": "CollectQualityYieldMetrics",
}
# Select programs to run from output files
progs = set()
for file in snakemake.output:
    matched = False
    for ext in exts_to_prog:
        if file.endswith(ext):
            progs.add(exts_to_prog[ext])
            matched = True
    if not matched:
        raise ValueError(
            "Unknown type of metrics file requested, for possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
        )
programs = "--PROGRAM null --PROGRAM " + " --PROGRAM ".join(progs)
# Infer common output prefix
output_file = str(snakemake.output[0])
for ext in exts_to_prog:
    if output_file.endswith(ext):
        out = output_file[: -len(ext)]
        break
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "picard CollectMultipleMetrics"
        " {java_opts} {extra}"
        " --INPUT {snakemake.input.bam}"
        " --TMP_DIR {tmpdir}"
        " --OUTPUT {out}"
        " --REFERENCE_SEQUENCE {snakemake.input.ref}"
        " {programs}"
        " {log}"
    )
# Under some circumstances, some picard programs might not produce an output (https://github.com/snakemake/snakemake-wrappers/issues/357). To avoid snakemake errors, the output files of those programs are created empty (if they do not exist).
for ext in [
    ext for ext, prog in exts_to_prog.items() if prog in ["CollectInsertSizeMetrics"]
]:
    for file in snakemake.output:
        if file.endswith(ext) and not Path(file).is_file():
            Path(file).touch()
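The selection logic above can be exercised without Snakemake or picard. The sketch below uses a reduced extension table and three hypothetical helper functions to show which programs a set of requested outputs selects, which common prefix is passed to `--OUTPUT`, and what the `--PROGRAM` flags look like. Unlike the wrapper, it sorts the program names so that the result is deterministic.

```python
# Subset of the wrapper's extension table; the wrapper itself maps more extensions.
EXTS_TO_PROG = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
    ".gc_bias.pdf": "CollectGcBiasMetrics",
}


def select_programs(outputs, exts_to_prog=EXTS_TO_PROG):
    """Map requested output files to the picard programs that produce them."""
    progs = set()
    for path in outputs:
        matches = [ext for ext in exts_to_prog if path.endswith(ext)]
        if not matches:
            raise ValueError(f"Unknown type of metrics file requested: {path}")
        progs.update(exts_to_prog[ext] for ext in matches)
    return sorted(progs)  # sorted for reproducible output (sketch-only choice)


def common_prefix(first_output, exts_to_prog=EXTS_TO_PROG):
    """Strip the known extension from the first output to get the shared prefix."""
    for ext in exts_to_prog:
        if first_output.endswith(ext):
            return first_output[: -len(ext)]
    raise ValueError(f"No known extension in: {first_output}")


def program_flags(progs):
    """Clear picard's default program list, then add the selected programs."""
    return "--PROGRAM null --PROGRAM " + " --PROGRAM ".join(progs)


outputs = [
    "stats/a.alignment_summary_metrics",
    "stats/a.insert_size_metrics",
    "stats/a.insert_size_histogram.pdf",
]
print(select_programs(outputs))
print(common_prefix(outputs[0]))
print(program_flags(select_programs(outputs)))
```

Two extensions that belong to the same tool, such as the insert size metrics file and its histogram, select that tool only once because the programs are collected in a set.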
PICARD COLLECTRNASEQMETRICS¶
Run picard CollectRnaSeqMetrics to generate QC metrics for RNA-seq data.
URL: https://broadinstitute.github.io/picard/command-line-overview.html#CollectRnaSeqMetrics
Example¶
This wrapper can be used in the following way:
rule alignment_summary:
input:
# BAM aligned, splicing-aware, to reference genome
bam="mapped/a.bam",
# Reference genome
#ref="ref.fasta",
# Annotation file containing transcript, gene, and exon data
refflat="annotation.refFlat",
output:
"results/a.rnaseq_metrics.txt",
params:
# strand is optional (defaults to NONE) and pertains to the library preparation
# options are FIRST_READ_TRANSCRIPTION_STRAND, SECOND_READ_TRANSCRIPTION_STRAND, and NONE
strand="NONE",
# optional additional parameters, for example,
extra="--VALIDATION_STRINGENCY STRICT",
log:
"logs/picard/rnaseq-metrics/a.log",
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/collectrnaseqmetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.2
Input/Output¶
Input:
- BAM file of RNA-seq data aligned to genome
- REF_FLAT formatted file of transcriptome annotations
- reference FASTA (optional)
Output:
- RNA-Seq metrics text file
Authors¶
- Brett Copeland
- Filipe G. Vieira
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell()
strand = snakemake.params.get("strand", "NONE")
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
ref = snakemake.input.get("ref", "")
if ref:
ref = f"--REFERENCE_SEQUENCE {ref}"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard CollectRnaSeqMetrics"
" {java_opts} {extra}"
" --INPUT {snakemake.input.bam}"
" {ref}"
" --REF_FLAT {snakemake.input.refflat}"
" --STRAND_SPECIFICITY {strand}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output}"
" {log}"
)
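In the wrapper above, the reference FASTA is optional: `--REFERENCE_SEQUENCE` is added only when a `ref` input is given. The sketch below shows that pattern in isolation. `optional_flag` and `build_command` are hypothetical helpers, and the command string omits the JVM options, extra arguments, temporary directory, and log redirection that the wrapper adds.

```python
def optional_flag(flag, value):
    """Return 'flag value' when value is set, otherwise an empty string."""
    return f"{flag} {value}" if value else ""


def build_command(bam, refflat, output, ref="", strand="NONE"):
    """Assemble a simplified CollectRnaSeqMetrics command line (illustrative only)."""
    parts = [
        "picard CollectRnaSeqMetrics",
        f"--INPUT {bam}",
        optional_flag("--REFERENCE_SEQUENCE", ref),
        f"--REF_FLAT {refflat}",
        f"--STRAND_SPECIFICITY {strand}",
        f"--OUTPUT {output}",
    ]
    # Drop empty parts so a missing optional input leaves no stray spaces.
    return " ".join(part for part in parts if part)


print(build_command("mapped/a.bam", "annotation.refFlat", "results/a.rnaseq_metrics.txt"))
print(build_command("mapped/a.bam", "annotation.refFlat", "results/a.rnaseq_metrics.txt", ref="ref.fasta"))
```

This is why the `ref` input is commented out in the example rule: the rule works without it, and uncommenting it adds the reference to the picard call.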
PICARD COLLECTTARGETEDPCRMETRICS¶
Collect metrics for targeted PCR (amplicon) sequencing runs with picard tools.
Example¶
This wrapper can be used in the following way:
rule CollectTargetedPcrMetrics:
input:
bam="mapped/{sample}.bam",
amplicon_intervals="amplicon.interval_list",
target_intervals="target.interval_list",
output:
"stats/{sample}.pcr.txt",
log:
"logs/picard/collecttargetedpcrmetrics/{sample}.log",
params:
extra="",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/collecttargetedpcrmetrics"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#CollectTargetedPcrMetrics
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@mail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard CollectTargetedPcrMetrics"
" {java_opts} {extra}"
" --INPUT {snakemake.input.bam}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" --AMPLICON_INTERVALS {snakemake.input.amplicon_intervals}"
" --TARGET_INTERVALS {snakemake.input.target_intervals}"
" {log}"
)
PICARD CREATESEQUENCEDICTIONARY¶
Create a .dict file for a given FASTA file
Example¶
This wrapper can be used in the following way:
rule create_dict:
input:
"genome.fasta",
output:
"genome.dict",
log:
"logs/picard/create_dict.log",
params:
extra="", # optional: extra arguments for picard.
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/createsequencedictionary"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#CreateSequenceDictionary
Software dependencies¶
picard=2.27.4
snakemake-wrapper-utils=0.5.2
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard CreateSequenceDictionary"
" {java_opts} {extra}"
" --REFERENCE {snakemake.input[0]}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" {log}"
)
PICARD MARKDUPLICATES¶
Mark PCR and optical duplicates with picard tools.
URL: https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates
Example¶
This wrapper can be used in the following way:
rule markduplicates_bam:
input:
bams="mapped/{sample}.bam",
# optional to specify a list of BAMs; this has the same effect
# of marking duplicates on separate read groups for a sample
# and then merging
output:
bam="dedup_bam/{sample}.bam",
metrics="dedup_bam/{sample}.metrics.txt",
log:
"logs/dedup_bam/{sample}.log",
params:
extra="--REMOVE_DUPLICATES true",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/markduplicates"
use rule markduplicates_bam as markduplicateswithmatecigar_bam with:
output:
bam="dedup_bam/{sample}.matecigar.bam",
idx="dedup_bam/{sample}.mcigar.bai",
metrics="dedup_bam/{sample}.matecigar.metrics.txt",
log:
"logs/dedup_bam/{sample}.matecigar.log",
params:
withmatecigar=True,
extra="--REMOVE_DUPLICATES true",
use rule markduplicates_bam as markduplicates_sam with:
output:
bam="dedup_sam/{sample}.sam",
metrics="dedup_sam/{sample}.metrics.txt",
log:
"logs/dedup_sam/{sample}.log",
params:
extra="--REMOVE_DUPLICATES true",
use rule markduplicates_bam as markduplicates_cram with:
input:
bams="mapped/{sample}.bam",
ref="ref/genome.fasta",
output:
bam="dedup_cram/{sample}.cram",
idx="dedup_cram/{sample}.cram.crai",
metrics="dedup_cram/{sample}.metrics.txt",
log:
"logs/dedup_cram/{sample}.log",
params:
extra="--REMOVE_DUPLICATES true",
embed_ref=True, # set true if the fasta reference should be embedded into the cram
withmatecigar=False,
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- --TMP_DIR is automatically set by resources.tmpdir
Software dependencies¶
picard=3.0.0
samtools=1.17
snakemake-wrapper-utils=0.6.1
Params¶
- java_opts: allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- extra: allows for additional program arguments.
- embed_ref: embeds the FASTA reference into the CRAM output.
- withmatecigar: runs MarkDuplicatesWithMateCigar instead of MarkDuplicates.
Authors¶
- Johannes Köster
- Christopher Schröder
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster, Christopher Schröder"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
from snakemake_wrapper_utils.samtools import get_samtools_opts, infer_out_format
log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
# the --SORTING_COLLECTION_SIZE_RATIO default of 0.25 might
# indicate a JVM memory overhead of that proportion
java_opts = get_java_opts(snakemake, java_mem_overhead_factor=0.3)
samtools_opts = get_samtools_opts(snakemake)
tool = "MarkDuplicates"
if snakemake.params.get("withmatecigar", False):
tool = "MarkDuplicatesWithMateCigar"
bams = snakemake.input.bams
if isinstance(bams, str):
bams = [bams]
bams = list(map("--INPUT {}".format, bams))
output = snakemake.output.bam
output_fmt = infer_out_format(output)
convert = ""
if output_fmt == "CRAM":
output = "/dev/stdout"
# NOTE: output format inference should be done by snakemake-wrapper-utils. Keeping it here for backwards compatibility.
if snakemake.params.get("embed_ref", False):
samtools_opts += ",embed_ref"
convert = f" | samtools view {samtools_opts}"
elif output_fmt == "BAM" and snakemake.output.get("idx"):
extra += " --CREATE_INDEX"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(picard {tool}" # Tool and its subcommand
" {java_opts}" # Automatic java option
" {extra}" # User defined parameters
" {bams}" # Input bam(s)
" --TMP_DIR {tmpdir}"
" --OUTPUT {output}" # Output bam
" --METRICS_FILE {snakemake.output.metrics}" # Output metrics
" {convert}) {log}" # Logging
)
output_prefix = Path(snakemake.output.bam).with_suffix("")
if snakemake.output.get("idx"):
if output_fmt == "BAM" and snakemake.output.idx != str(output_prefix) + ".bai":
shell("mv {output_prefix}.bai {snakemake.output.idx}")
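The last lines of the wrapper move the BAM index when the requested `idx` path differs from the name the wrapper expects Picard to write (the output path with its final suffix dropped, plus `.bai`). The following sketch isolates that path logic; `index_needs_move` is a hypothetical helper name used only for illustration.

```python
from pathlib import Path


def index_needs_move(bam_output, idx_output):
    """Return (default_index_path, needs_move) for a BAM written with --CREATE_INDEX."""
    # The wrapper assumes the index is named after the output minus its last suffix.
    default_idx = str(Path(bam_output).with_suffix("")) + ".bai"
    return default_idx, idx_output != default_idx


# Paths taken from the markduplicateswithmatecigar_bam example rule.
print(index_needs_move("dedup_bam/a.matecigar.bam", "dedup_bam/a.mcigar.bai"))
```

In the `markduplicateswithmatecigar_bam` example the requested index name (`.mcigar.bai`) differs from the expected default (`.matecigar.bai`), so the rename step would run.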
PICARD MERGESAMFILES¶
Merge sam/bam files using picard tools.
Example¶
This wrapper can be used in the following way:
rule merge_bams:
input:
expand("mapped/{sample}.bam", sample=["a", "b"]),
output:
"merged.bam",
log:
"logs/picard_mergesamfiles.log",
params:
extra="--VALIDATION_STRINGENCY LENIENT",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/mergesamfiles"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#MergeSamFiles
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard MergeSamFiles."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
inputs = " ".join("--INPUT {}".format(in_) for in_ in snakemake.input)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard MergeSamFiles"
" {java_opts} {extra}"
" {inputs}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" {log}"
)
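The wrapper passes every input file to Picard with its own `--INPUT` flag. A minimal, standalone sketch of that pattern follows; `repeat_flag` is a hypothetical helper name, and accepting a single string as well as a list is an addition made for this example.

```python
def repeat_flag(flag, paths):
    """Repeat one command-line flag for every path, keeping the input order."""
    if isinstance(paths, str):
        # A single path is treated like a one-element list.
        paths = [paths]
    return " ".join(f"{flag} {path}" for path in paths)


print(repeat_flag("--INPUT", ["mapped/a.bam", "mapped/b.bam"]))
```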
PICARD MERGEVCFS¶
Merge vcf files using picard tools.
Example¶
This wrapper can be used in the following way:
rule merge_vcfs:
input:
vcfs=["snvs.chr1.vcf", "snvs.chr2.vcf"],
output:
"snvs.vcf",
log:
"logs/picard/mergevcfs.log",
params:
extra="",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/mergevcfs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#MergeVcfs
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.6.1
Authors¶
- Johannes Köster
Code¶
"""Snakemake wrapper for picard MergeVcfs."""
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2018, Johannes Köster"
__email__ = "johannes.koester@protonmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
inputs = " ".join("--INPUT {}".format(f) for f in snakemake.input.vcfs)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard MergeVcfs"
" {java_opts} {extra}"
" {inputs}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" {log}"
)
PICARD REVERTSAM¶
Reverts SAM or BAM files to a previous state.
Example¶
This wrapper can be used in the following way:
rule revert_bam:
input:
"mapped/{sample}.bam",
output:
"revert/{sample}.bam",
log:
"logs/picard/revert_sam/{sample}.log",
params:
extra="--SANITIZE true", # optional: Extra arguments for picard.
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/revertsam"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#RevertSam
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for picard RevertSam."""
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard RevertSam"
" {java_opts} {extra}"
" --INPUT {snakemake.input[0]}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" {log}"
)
PICARD SAMTOFASTQ¶
Converts a SAM or BAM file to FASTQ.
Example¶
This wrapper can be used in the following way:
rule bam_to_fastq:
input:
"mapped/{sample}.bam",
output:
fastq1="reads/{sample}.R1.fastq",
fastq2="reads/{sample}.R2.fastq",
log:
"logs/picard/sam_to_fastq/{sample}.log",
params:
extra="", # optional: Extra arguments for picard.
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/samtofastq"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#SamToFastq
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for picard SamToFastq."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
fastq1 = snakemake.output.fastq1
fastq2 = snakemake.output.get("fastq2", None)
fastq_unpaired = snakemake.output.get("unpaired_fastq", None)
if not isinstance(fastq1, str):
raise ValueError("fastq1 needs to be provided")
output = f"--FASTQ {fastq1}"
if isinstance(fastq2, str):
output += f" --SECOND_END_FASTQ {fastq2}"
if isinstance(fastq_unpaired, str):
if not isinstance(fastq2, str):
raise ValueError("fastq2 is required if unpaired_fastq is set")
output += f" --UNPAIRED_FASTQ {fastq_unpaired}"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard SamToFastq"
" {java_opts} {extra}"
" --INPUT {snakemake.input[0]}"
" --TMP_DIR {tmpdir}"
" {output}"
" {log}"
)
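The output handling above can be read as a small decision table: `fastq1` is mandatory, `fastq2` is optional, and `unpaired_fastq` is only accepted together with `fastq2`. The sketch below restates that logic as a standalone function so the rules are easier to test; the function name `fastq_output_args` is hypothetical.

```python
def fastq_output_args(fastq1, fastq2=None, unpaired_fastq=None):
    """Assemble the SamToFastq output flags from the named rule outputs."""
    if not isinstance(fastq1, str):
        raise ValueError("fastq1 needs to be provided")
    args = f"--FASTQ {fastq1}"
    if isinstance(fastq2, str):
        args += f" --SECOND_END_FASTQ {fastq2}"
    if isinstance(unpaired_fastq, str):
        # Unpaired reads are only written alongside a second-end file.
        if not isinstance(fastq2, str):
            raise ValueError("fastq2 is required if unpaired_fastq is set")
        args += f" --UNPAIRED_FASTQ {unpaired_fastq}"
    return args


print(fastq_output_args("reads/a.R1.fastq", "reads/a.R2.fastq"))
```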
PICARD SORTSAM¶
Sort sam/bam files using picard tools.
Example¶
This wrapper can be used in the following way:
rule sort_bam:
input:
"mapped/{sample}.bam",
output:
"sorted/{sample}.bam",
log:
"logs/picard/sort_sam/{sample}.log",
params:
sort_order="coordinate",
extra="--VALIDATION_STRINGENCY LENIENT", # optional: Extra arguments for picard.
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/sortsam"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The java_opts param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. “-XX:ParallelGCThreads=10” (not for -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- The extra param allows for additional program arguments.
- --TMP_DIR is automatically set by resources.tmpdir
- For more information, see https://broadinstitute.github.io/picard/command-line-overview.html#SortSam
Software dependencies¶
picard=3.0.0
snakemake-wrapper-utils=0.5.3
Authors¶
- Julian de Ruiter
Code¶
"""Snakemake wrapper for picard SortSam."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
sort_order = snakemake.params.get("sort_order", "coordinate")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"picard SortSam"
" {java_opts} {extra}"
" --INPUT {snakemake.input[0]}"
" --TMP_DIR {tmpdir}"
" --OUTPUT {snakemake.output[0]}"
" --SORT_ORDER {sort_order}"
" {log}"
)
PINDEL¶
For pindel, the following wrappers are available:
PINDEL¶
Call variants with pindel.
URL: https://gmt.genome.wustl.edu/packages/pindel
Example¶
This wrapper can be used in the following way:
pindel_types = ["D", "BP", "INV", "TD", "LI", "SI", "RP"]
rule pindel:
input:
ref="genome.fasta",
# samples to call
samples=["mapped/a.bam"],
# bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
config="pindel_config.txt",
output:
expand("pindel/all_{type}", type=pindel_types),
params:
extra="", # optional parameters (except -i, -f, -o, -j, -J)
log:
"logs/pindel.log",
threads: 4
wrapper:
"v2.2.1/bio/pindel/call"
rule pindel_include_regions:
input:
ref="genome.fasta",
samples=["mapped/a.bam"],
config="pindel_config.txt",
include_bed="regions.bed",
output:
expand("pindel/all_included_{type}", type=pindel_types),
log:
"logs/pindel_j.log",
threads: 4
wrapper:
"v2.2.1/bio/pindel/call"
rule pindel_exclude_regions:
input:
ref="genome.fasta",
samples=["mapped/a.bam"],
config="pindel_config.txt",
exclude_bed="regions.bed",
output:
expand("pindel/all_excluded_{type}", type=pindel_types),
log:
"logs/pindel_exclude.log",
threads: 4
wrapper:
"v2.2.1/bio/pindel/call"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
The include and exclude BED file arguments are incompatible with each other. Either supply one of them or none of them.
Software dependencies¶
pindel=0.2.5b9
Input/Output¶
Input:
- reference genome fasta file
- one or more bam files
- bam configuration file, see http://gmt.genome.wustl.edu/packages/pindel/quick-start.html
- bed file of regions to include (optional)
- bed file of regions to exclude (optional)
Output:
- One file for each variant type. For a more detailed description of the output format, see https://gmt.genome.wustl.edu/packages/pindel/user-manual.html#example-output-record.
Authors¶
- Johannes Köster, Niklas Mähler
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
include_bed = snakemake.input.get("include_bed", "")
exclude_bed = snakemake.input.get("exclude_bed", "")
if include_bed and exclude_bed:
raise Exception("supply either include_bed or exclude_bed, not both")
if include_bed:
include_bed = f"-j {include_bed}"
if exclude_bed:
exclude_bed = f"-J {exclude_bed}"
output_prefix = snakemake.output[0].rsplit("_", 1)[0]
shell(
"pindel "
"-T {snakemake.threads} "
"{extra} "
"{include_bed} "
"{exclude_bed} "
"-i {snakemake.input.config} "
"-f {snakemake.input.ref} "
"-o {output_prefix} {log}"
)
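Two details of the wrapper are easy to miss: the output prefix handed to `-o` is derived from the first output path by cutting at the last underscore, and the include (`-j`) and exclude (`-J`) BED options are mutually exclusive. A minimal standalone sketch of both rules, using the hypothetical helper name `pindel_prefix_and_region`:

```python
def pindel_prefix_and_region(first_output, include_bed="", exclude_bed=""):
    """Return (output_prefix, region_flag) following the wrapper's conventions."""
    if include_bed and exclude_bed:
        raise ValueError("supply either include_bed or exclude_bed, not both")
    if include_bed:
        region = f"-j {include_bed}"
    elif exclude_bed:
        region = f"-J {exclude_bed}"
    else:
        region = ""
    # "pindel/all_D" -> "pindel/all": everything before the variant-type suffix.
    prefix = first_output.rsplit("_", 1)[0]
    return prefix, region


print(pindel_prefix_and_region("pindel/all_included_D", include_bed="regions.bed"))
```

Because the prefix is taken from the first output only, all outputs of one rule are expected to share the same prefix and differ only in the variant-type suffix, as in the `expand(...)` calls of the example rules.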
PINDEL2VCF¶
Convert pindel output to vcf.
Example¶
This wrapper can be used in the following way:
rule pindel2vcf:
input:
ref="genome.fasta",
pindel="pindel/all_{type}"
output:
"pindel/all_{type}.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.{type}.log"
wrapper:
"v2.2.1/bio/pindel/pindel2vcf"
rule pindel2vcf_multi_input:
input:
ref="genome.fasta",
pindel=["pindel/all_D", "pindel/all_INV"]
output:
"pindel/all.vcf"
params:
refname="hg38", # mandatory, see pindel manual
refdate="20170110", # mandatory, see pindel manual
extra="" # extra params (except -r, -p, -R, -d, -v)
log:
"logs/pindel/pindel2vcf.log"
wrapper:
"v2.2.1/bio/pindel/pindel2vcf"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pindel=0.2.5b9
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster, Patrik Smeds"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
expected_endings = [
"INT",
"D",
"SI",
"INV",
"INV_final",
"TD",
"LI",
"BP",
"CloseEndMapped",
"RP",
]
def split_file_name(file_parts, file_ending_index):
return (
"_".join(file_parts[:file_ending_index]),
"_".join(file_parts[file_ending_index:]),
)
def process_input_path(input_file):
"""
:params input_file: Input file from rule, ex /path/to/file/all_D or /path/to/file/all_INV_final
:return: "/path/to/file", "all"
"""
file_path, file_name = os.path.split(input_file)
file_parts = file_name.split("_")
# separate name and ending, e.g. name: all, ending: D or name: all, ending: INV_final
file_name, file_ending = split_file_name(
file_parts, -2 if file_name.endswith("_final") else -1
)
if file_ending not in expected_endings:
raise Exception("Unexpected variant type: " + file_ending)
return file_path, file_name
with tempfile.TemporaryDirectory() as tmpdirname:
input_flag = "-p"
input_file = snakemake.input.get("pindel")
if isinstance(input_file, list) and len(input_file) > 1:
input_flag = "-P"
input_path, input_name = process_input_path(input_file[0])
input_file = os.path.join(input_path, input_name)
for variant_input in snakemake.input.pindel:
if not variant_input.startswith(input_file):
raise Exception(
"Unable to extract common path from multi file input, expect path is: "
+ input_file
)
if not os.path.isfile(variant_input):
raise Exception('Input "' + variant_input + '" is not a file!')
os.symlink(
os.path.abspath(variant_input),
os.path.join(tmpdirname, os.path.basename(variant_input)),
)
input_file = os.path.join(tmpdirname, input_name)
shell(
"pindel2vcf {snakemake.params.extra} {input_flag} {input_file} -r {snakemake.input.ref} -R {snakemake.params.refname} -d {snakemake.params.refdate} -v {snakemake.output[0]} {log}"
)
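The name handling in `process_input_path` is the part of this wrapper most likely to surprise users, because the variant type is parsed out of the file name. The sketch below restates that parsing as a standalone function that also returns the detected ending; `split_pindel_path` is a hypothetical name and the function is an illustration, not a replacement for the wrapper code.

```python
import os

# Variant-type suffixes accepted by the wrapper above.
EXPECTED_ENDINGS = {
    "INT", "D", "SI", "INV", "INV_final", "TD", "LI", "BP", "CloseEndMapped", "RP",
}


def split_pindel_path(input_file):
    """Split e.g. 'dir/all_INV_final' into ('dir', 'all', 'INV_final')."""
    directory, file_name = os.path.split(input_file)
    parts = file_name.split("_")
    # "_final" belongs to the ending, so two trailing parts are cut in that case.
    cut = -2 if file_name.endswith("_final") else -1
    prefix = "_".join(parts[:cut])
    ending = "_".join(parts[cut:])
    if ending not in EXPECTED_ENDINGS:
        raise ValueError("Unexpected variant type: " + ending)
    return directory, prefix, ending


print(split_pindel_path("pindel/all_INV_final"))
```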
PLASS¶
Plass (Protein-Level ASSembler) is software to assemble short read sequencing data on a protein level. The main purpose of Plass is the assembly of complex metagenomic datasets.
Example¶
This wrapper can be used in the following way:
rule plass_paired:
input:
left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"]
output:
"plass/prot.fasta"
log:
"logs/plass.log"
params:
extra=""
threads: 4
wrapper:
"v2.2.1/bio/plass"
rule plass_single:
input:
single=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
output:
"plass/prot_single.fasta"
log:
"logs/plass_single.log"
params:
extra=""
threads: 4
wrapper:
"v2.2.1/bio/plass"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
plass=4.687d7
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for PLASS Protein-Level Assembler."""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2018, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
# allow multiple input files for single assembly
left = snakemake.input.get("left")
single = snakemake.input.get("single")
assert (
left is not None or single is not None
), "please check read inputs: either left/right or single read file inputs are required"
if left:
left = (
[snakemake.input.left]
if isinstance(snakemake.input.left, str)
else snakemake.input.left
)
right = snakemake.input.get("right")
assert (
right is not None
), "please input 'right' reads or specify that the reads are 'single'"
right = (
[snakemake.input.right]
if isinstance(snakemake.input.right, str)
else snakemake.input.right
)
assert len(left) == len(
right
), "left input needs to contain the same number of files as the right input"
input_str_left = " " + " ".join(left)
input_str_right = " " + " ".join(right)
input_cmd = input_str_left + " " + input_str_right
else:
single = (
[snakemake.input.single]
if isinstance(snakemake.input.single, str)
else snakemake.input.single
)
input_cmd = " " + " ".join(single)
outdir = path.dirname(snakemake.output[0])
tmpdir = path.join(outdir, "tmp")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"plass assemble {input_cmd} {snakemake.output} {tmpdir} --threads {snakemake.threads} {snakemake.params.extra} {log}"
)
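The input checks in the wrapper boil down to three rules: paired input needs both `left` and `right`, the two lists must have equal length, and otherwise `single` must be given. A compact sketch of the same rules follows. The helper names `as_list` and `plass_input_args` are hypothetical, and the sketch raises `ValueError` where the wrapper uses `assert`.

```python
def as_list(value):
    """Wrap a single path in a list; copy any other sequence into a list."""
    return [value] if isinstance(value, str) else list(value)


def plass_input_args(left=None, right=None, single=None):
    """Return the read files in the order used on the command line."""
    if left:
        if right is None:
            raise ValueError("please input 'right' reads or specify that the reads are 'single'")
        left, right = as_list(left), as_list(right)
        if len(left) != len(right):
            raise ValueError("left and right inputs need the same number of files")
        # All left files first, then all right files, as in the wrapper.
        return " ".join(left + right)
    if single:
        return " ".join(as_list(single))
    raise ValueError("either left/right or single read file inputs are required")


print(plass_input_args(left=["a.left.fq.gz"], right=["a.right.fq.gz"]))
```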
PRESEQ¶
For preseq, the following wrappers are available:
PRESEQ LC_EXTRAP¶
preseq estimates the library complexity of existing sequencing data to then estimate the yield of future experiments based on their design.
URL: https://github.com/smithlabcode/preseq
Example¶
This wrapper can be used in the following way:
rule preseq_lc_extrap_bam:
input:
"samples/{sample}.sorted.bam"
output:
"test_bam/{sample}.lc_extrap"
params:
"-v" #optional parameters
log:
"logs/test_bam/{sample}.log"
wrapper:
"v2.2.1/bio/preseq/lc_extrap"
rule preseq_lc_extrap_bed:
input:
"samples/{sample}.sorted.bed"
output:
"test_bed/{sample}.lc_extrap"
params:
"-v" #optional parameters
log:
"logs/test_bed/{sample}.log"
wrapper:
"v2.2.1/bio/preseq/lc_extrap"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
preseq=3.2.0
Input/Output¶
Input:
- bed files containing duplicates and sorted by chromosome, start position, end position and finally strand OR
- bam files containing duplicates and sorted by using bamtools or samtools sort.
Output:
- lc_extrap (.lc_extrap)
Authors¶
- Antonie Vietor
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
params = ""
if (os.path.splitext(snakemake.input[0])[-1]) == ".bam":
# only add -bam if the user has not already passed it via params
if "-bam" not in str(snakemake.params):
params = "-bam "
shell(
"(preseq lc_extrap {params} {snakemake.params} {snakemake.input[0]} -output {snakemake.output[0]}) {log}"
)
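The wrapper adds the `-bam` flag when the input file has a `.bam` extension. A minimal sketch of that decision as a testable function is shown below. The name `preseq_format_flag` is hypothetical, and checking the user-supplied parameters for an existing `-bam` is this sketch's reading of the intent, not necessarily what the wrapper does.

```python
import os


def preseq_format_flag(input_path, user_params=""):
    """Return '-bam' for BAM input unless the user already passed it."""
    is_bam = os.path.splitext(input_path)[-1] == ".bam"
    if is_bam and "-bam" not in user_params.split():
        return "-bam"
    return ""


print(preseq_format_flag("samples/a.sorted.bam", "-v"))
```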
PRETEXT¶
For pretext, the following wrappers are available:
PRETEXT GRAPH¶
Embeds bedgraph data into Pretext contact maps.
URL: https://github.com/wtsi-hpag/PretextGraph
Example¶
This wrapper can be used in the following way:
rule pretext_graph:
input:
bedgraph="{a}.bedgraph",
map="map.pretext",
output:
"{a}.pretext",
log:
"logs/{a}.pretext_graph.log",
params:
graph_name="graph_name",
extra="",
wrapper:
"v2.2.1/bio/pretext/graph"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional arguments.
Software dependencies¶
pretextgraph=0.0.6
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
if snakemake.input[0].endswith(".gz"):
pipe = "gunzip -c"
elif snakemake.input[0].endswith(".bz2"):
pipe = "bunzip2 -c"
else:
pipe = "cat"
shell(
"({pipe}"
" {snakemake.input.bedgraph} | "
"PretextGraph"
" -i {snakemake.input.map}"
" -n {snakemake.params.graph_name}"
" {extra}"
" -o {snakemake.output}"
") {log}"
)
PRETEXT MAP¶
Paired REad TEXTure Mapper. Converts SAM formatted read pairs into genome contact maps.
URL: https://github.com/wtsi-hpag/PretextMap
Example¶
This wrapper can be used in the following way:
rule pretext_map:
input:
"a.bam",
output:
"map.pretext",
log:
"logs/pretext_map.log",
params:
extra="--sortby length --sortorder descend --mapq 10",
wrapper:
"v2.2.1/bio/pretext/map"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional arguments.
Software dependencies¶
pretextmap=0.1.9
samtools=1.17
snakemake-wrapper-utils=0.6.1
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import infer_out_format
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
if snakemake.input[0].endswith(".gz"):
pipe = "gunzip -c"
elif snakemake.input[0].endswith(".bz2"):
pipe = "bunzip2 -c"
elif infer_out_format(snakemake.input[0]) in ["SAM", "BAM", "CRAM"]:
pipe = "samtools view -h"
else:
pipe = "cat"
shell(
"({pipe}"
" {snakemake.input} | "
"PretextMap"
" {extra}"
" -o {snakemake.output}"
") {log}"
)
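Both Pretext wrappers choose a command to stream the input into the tool based on the input file name. The sketch below shows the same dispatch as a plain function. It is a simplification: `reader_command` is a hypothetical name, and the suffix check for SAM/BAM/CRAM stands in for the `infer_out_format` call used by the wrapper.

```python
def reader_command(path):
    """Pick a shell command that writes the file's content to stdout."""
    if path.endswith(".gz"):
        return "gunzip -c"
    if path.endswith(".bz2"):
        return "bunzip2 -c"
    # Assumption: alignment formats are recognised by suffix only in this sketch.
    if path.endswith((".sam", ".bam", ".cram")):
        return "samtools view -h"
    return "cat"


for name in ("a.bam", "pairs.txt.gz", "pairs.txt"):
    print(name, "->", reader_command(name))
```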
PRETEXT SNAPSHOT¶
Commandline image generator for Pretext contact maps.
URL: https://github.com/wtsi-hpag/PretextSnapshot
Example¶
This wrapper can be used in the following way:
rule pretext_snapshot_png:
input:
"map.pretext",
output:
all=directory("all_maps/"),
full="full_map.png",
log:
"logs/pretext_snapshot_png.log",
params:
extra="--resolution 1080",
wrapper:
"v2.2.1/bio/pretext/snapshot"
rule pretext_snapshot_jpg:
input:
"map.pretext",
output:
all=directory("all_maps/"),
full="full_map.jpg",
log:
"logs/pretext_snapshot_jpg.log",
params:
extra="--resolution 1080",
wrapper:
"v2.2.1/bio/pretext/snapshot"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Outputs must all have the same format (PNG/BMP/JPG).
- The extra param allows for additional arguments.
Software dependencies¶
pretextsnapshot=0.0.4
Input/Output¶
Input:
- pretext contact map
Output:
- full image (mandatory)
- all images (optional)
- specific sequences (optional)
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
out_maps = snakemake.output.keys()
sequences = "=" + ", =".join(snakemake.output.keys())
format = Path(snakemake.output.full).suffix.removeprefix(".")
if format == "jpg":
format = "jpeg"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"PretextSnapshot"
" --map {snakemake.input[0]}"
" {extra}"
" --sequences {sequences:q}"
" --format {format}"
" --folder {tmpdir}"
" --prefix out_"
" {log}"
)
if snakemake.output.get("full"):
shell("mv {tmpdir}/out_FullMap.{format} {snakemake.output.full}")
if snakemake.output.get("all"):
Path(snakemake.output.all).mkdir(parents=True, exist_ok=True)
shell("mv {tmpdir}/out_*.{format} {snakemake.output.all}/.")
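How the wrapper derives the image format and the `--sequences` argument can be sketched in isolation. The function names `image_format` and `sequences_argument` are hypothetical; the logic follows the code above and assumes Python 3.9 or later for `str.removeprefix`.

```python
from pathlib import Path


def image_format(full_output: str) -> str:
    """Derive the format name from the output suffix, mapping jpg to jpeg."""
    fmt = Path(full_output).suffix.removeprefix(".")  # requires Python >= 3.9
    return "jpeg" if fmt == "jpg" else fmt


def sequences_argument(output_names) -> str:
    """Join output names the way the wrapper does: '=name1, =name2, ...'."""
    return "=" + ", =".join(output_names)


if __name__ == "__main__":
    print(image_format("full_map.jpg"))
    print(sequences_argument(["all", "full"]))
```

Because the format is taken from the `full` output only, all outputs of one rule need to share that format, as stated in the notes.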
PRIMERCLIP¶
Primer trimming on sam file, https://github.com/swiftbiosciences/primerclip
Example¶
This wrapper can be used in the following way:
rule primerclip:
input:
master_file="master_file",
alignment_file="mapped/{sample}.bam"
output:
alignment_file="mapped/{sample}.trimmed.bam"
log:
"logs/primerclip/{sample}.log"
params:
extra=""
wrapper:
"v2.2.1/bio/primerclip"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
samtools==1.9
primerclip==0.3.8
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
master_file = snakemake.input.master_file
in_alignment_file = snakemake.input.alignment_file
out_alignment_file = snakemake.output.alignment_file
# Check inputs/arguments.
if not isinstance(master_file, str):
raise ValueError("master_file, path to the master file")
if not isinstance(in_alignment_file, str):
raise ValueError("in_alignment_file, path to the input alignment file")
if not isinstance(out_alignment_file, str):
raise ValueError("out_alignment_file, path to the output file")
samtools_input_command = "samtools view -h " + in_alignment_file
samtools_output_command = " | head -n -3 | samtools view -Sh"
if out_alignment_file.endswith(".cram"):
samtools_output_command += "C -o " + out_alignment_file
elif out_alignment_file.endswith(".sam"):
samtools_output_command += " -o " + out_alignment_file
else:
samtools_output_command += "b -o " + out_alignment_file
shell(
"{samtools_input_command} |"
" primerclip"
" {master_file}"
" /dev/stdin"
" /dev/stdout"
" {samtools_output_command}"
" {log}"
)
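The choice of output format from the file extension can be sketched as follows. `samtools_output_command` is a hypothetical helper name; the strings it returns are those the wrapper code above assembles.

```python
def samtools_output_command(out_path: str) -> str:
    """Build the tail of the pipeline that writes the trimmed alignments."""
    # The wrapper pipes primerclip output through `head -n -3` before samtools.
    base = " | head -n -3 | samtools view -Sh"
    if out_path.endswith(".cram"):
        return base + "C -o " + out_path  # CRAM output
    if out_path.endswith(".sam"):
        return base + " -o " + out_path   # plain SAM output
    return base + "b -o " + out_path      # default: BAM output


if __name__ == "__main__":
    for path in ("a.trimmed.bam", "a.trimmed.sam", "a.trimmed.cram"):
        print(samtools_output_command(path))
```

Any extension other than `.cram` or `.sam` falls through to BAM output.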
PROSOLO¶
For prosolo, the following wrappers are available:
PROSOLO FDR CONTROL¶
ProSolo can control the false discovery rate of any combination of its defined single cell events (like the presence of an alternative allele or the dropout of an allele).
Example¶
This wrapper can be used in the following way:
rule prosolo_fdr_control:
input:
"variant_calling/{sc}.{bulk}.prosolo.bcf"
output:
"fdr_control/{sc}.{bulk}.prosolo.fdr.bcf"
threads:
1
params:
# comma-separated set of events for whose (joint)
# false discovery rate you want to control
events = "ADO_TO_REF,HET",
# false discovery rate to control for
fdr = 0.05
log:
"logs/prosolo_{sc}_{bulk}.fdr.log"
wrapper:
"v2.2.1/bio/prosolo/control-fdr"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
prosolo=0.6.1
Input/Output¶
Input:
- Variants called with prosolo in vcf or bcf format, including the fine-grained posterior probabilities for single cell events.
Output:
- bcf file with all variants that satisfy the chosen false discovery rate threshold with regard to the specified events.
Authors¶
- David Lähnemann
Code¶
"""Snakemake wrapper for ProSolo FDR control"""
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"( prosolo control-fdr"
" {snakemake.input}"
" --events {snakemake.params.events}"
" --var SNV"
" --fdr {snakemake.params.fdr}"
" --output {snakemake.output} )"
"{log} "
)
PROSOLO¶
ProSolo calls variants or other events (like allele dropout) in a single cell sample against a bulk background sample. The single cell should stem from the same population of cells as the bulk background sample. The single cell sample should be amplified using multiple displacement amplification to match ProSolo’s statistical model.
Example¶
This wrapper can be used in the following way:
rule prosolo_calling:
input:
single_cell = "data/mapped/{sc}.sorted.bam",
single_cell_index = "data/mapped/{sc}.sorted.bam.bai",
bulk = "data/mapped/{bulk}.sorted.bam",
bulk_index = "data/mapped/{bulk}.sorted.bam.bai",
ref = "data/genome.fa",
ref_idx = "data/genome.fa.fai",
candidates = "data/{sc}.{bulk}.prosolo_candidates.bcf",
output:
"variant_calling/{sc}.{bulk}.prosolo.bcf"
params:
extra = ""
threads:
1
log:
"logs/prosolo_{sc}_{bulk}.log"
wrapper:
"v2.2.1/bio/prosolo/single-cell-bulk"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
prosolo=0.6.1
Input/Output¶
Input:
- A position-sorted single cell bam file, with its index.
- A position-sorted bulk bam file, with its index.
- A reference genome sequence in fasta format, with its index.
- A vcf or bcf file specifying candidate sites to perform calling on.
Output:
- Variants called in bcf format, with fine-grained posterior probabilities for single cell events.
Authors¶
- David Lähnemann
Code¶
"""Snakemake wrapper for ProSolo single-cell-bulk calling"""
__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"( prosolo single-cell-bulk "
"--omit-indels "
" {snakemake.params.extra} "
"--candidates {snakemake.input.candidates} "
"--output {snakemake.output} "
"{snakemake.input.single_cell} "
"{snakemake.input.bulk} "
"{snakemake.input.ref} ) "
"{log} "
)
PTRIMMER¶
Tool to trim off the primer sequence from multiplex amplicon sequencing
Example¶
This wrapper can be used in the following way:
rule ptrimmer_pe:
input:
r1="resources/a.lane1_R1.fastq.gz",
r2="resources/a.lane1_R2.fastq.gz",
primers="resources/primers.txt"
output:
r1="results/a.lane1_R1.fq.gz",
r2="results/a.lane1_R2.fq.gz"
log:
"logs/ptrimmer/a.lane.log"
wrapper:
"v2.2.1/bio/ptrimmer"
rule ptrimmer_se:
input:
r1="resources/a.lane1_R1.fastq.gz",
primers="resources/primers.txt"
output:
r1="results/a.lane1_R1.fq",
log:
"logs/ptrimmer/a.lane1.log"
wrapper:
"v2.2.1/bio/ptrimmer"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
ptrimmer=1.3.3
Authors¶
- Felix Mölder
Code¶
__author__ = "Felix Mölder"
__copyright__ = "Copyright 2020, Felix Mölder"
__email__ = "felix.moelder@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
import ntpath
input_reads = "-f {r1}".format(r1=snakemake.input.r1)
out_r1 = ntpath.basename(snakemake.output.r1)
output_reads = "-d {o1}".format(o1=out_r1)
if snakemake.input.get("r2", ""):
seqmode = "pair"
input_reads = "{reads} -r {r2}".format(reads=input_reads, r2=snakemake.input.r2)
out_r2 = ntpath.basename(snakemake.output.r2)
output_reads = "{reads} -e {o2}".format(reads=output_reads, o2=out_r2)
else:
seqmode = "single"
primers = snakemake.input.primers
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
ptrimmer_params = "-t {mode} {in_reads} -a {primers} {out_reads}".format(
mode=seqmode, in_reads=input_reads, primers=primers, out_reads=output_reads
)
process_r1 = "mv {out_read} {final_output_path}".format(
out_read=out_r1, final_output_path=snakemake.output.r1
)
process_r2 = ""
if snakemake.input.get("r2", ""):
process_r2 = "&& mv {out_read} {final_output_path}".format(
out_read=out_r2, final_output_path=snakemake.output.r2
)
shell("(ptrimmer {ptrimmer_params} && {process_r1} {process_r2}) {log}")
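The single-end versus paired-end argument assembly can be sketched as a standalone function. `ptrimmer_params` is a hypothetical name; the flags are the ones used in the wrapper code above, and output names are reduced to their basename just as the wrapper does before moving the files to their final paths.

```python
import ntpath
from typing import Optional


def ptrimmer_params(r1: str, out_r1: str, primers: str,
                    r2: Optional[str] = None,
                    out_r2: Optional[str] = None) -> str:
    """Assemble the ptrimmer argument string for single or paired reads."""
    in_reads = f"-f {r1}"
    out_reads = f"-d {ntpath.basename(out_r1)}"
    mode = "single"
    if r2:
        if not out_r2:
            raise ValueError("paired-end input needs a second output file")
        mode = "pair"
        in_reads += f" -r {r2}"
        out_reads += f" -e {ntpath.basename(out_r2)}"
    return f"-t {mode} {in_reads} -a {primers} {out_reads}"


if __name__ == "__main__":
    print(ptrimmer_params("a_R1.fastq.gz", "results/a_R1.fq", "primers.txt"))
```

The mode switches to `pair` only when a second read file is supplied, which mirrors the `snakemake.input.get("r2", "")` check.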
PURGE_DUPS¶
For purge_dups, the following wrappers are available:
PURGE_DUPS CALCUTS¶
Purge haplotigs and overlaps in an assembly based on read depth
URL: https://github.com/dfguan/purge_dups
Example¶
This wrapper can be used in the following way:
rule purge_dups_calcuts:
input:
"pbcstat.stat",
output:
"out/calcuts.cutoffs",
log:
"logs/calcuts.log",
params:
extra="-l 2 -m 4 -u 8",
threads: 1
wrapper:
"v2.2.1/bio/purge_dups/calcuts"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
purge_dups=1.2.6
Authors¶
- Filipe Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("calcuts {extra} {snakemake.input[0]} > {snakemake.output[0]} {log}")
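Because `calcuts` writes its result to standard output, the wrapper redirects only standard error to the log. A rough stand-in for the redirect suffix, assuming ordinary shell redirection operators, might look like this. `redirect_suffix` is a hypothetical function and not Snakemake's implementation of `log_fmt_shell`.

```python
def redirect_suffix(log_path: str, stdout: bool = True, stderr: bool = True,
                    append: bool = False) -> str:
    """Return a shell redirection suffix for the selected streams."""
    if not log_path or not (stdout or stderr):
        return ""
    op = ">>" if append else ">"
    if stdout and stderr:
        return f"{op} {log_path} 2>&1"   # both streams into the log
    if stdout:
        return f"{op} {log_path}"        # standard output only
    return f"2{op} {log_path}"           # standard error only


if __name__ == "__main__":
    log = redirect_suffix("logs/calcuts.log", stdout=False)
    print(f"calcuts -l 2 -m 4 -u 8 pbcstat.stat > out/calcuts.cutoffs {log}")
```

With `stdout=False` the result file and the log never receive the same stream, which is why the wrappers that write to standard output use this setting.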
PURGE_DUPS GET_SEQS¶
Purge haplotigs and overlaps in an assembly based on read depth
URL: https://github.com/dfguan/purge_dups
Example¶
This wrapper can be used in the following way:
rule purge_dups_get_seqs:
input:
fas="genome.fasta",
bed="purge_dups.bed",
output:
hap="out/get_seqs.hap.fasta",
purged="out/get_seqs.purged.fasta",
log:
"logs/get_seqs.log",
params:
extra="",
threads: 1
wrapper:
"v2.2.1/bio/purge_dups/get_seqs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
purge_dups=1.2.6
Authors¶
- Filipe Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
if Path(snakemake.input.bed).stat().st_size:
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"get_seqs {extra} -p {tmpdir}/out {snakemake.input.bed} {snakemake.input.fas} {log}"
)
if snakemake.output.get("hap"):
shell("cat {tmpdir}/out.hap.fa > {snakemake.output.hap}")
if snakemake.output.get("purged"):
shell("cat {tmpdir}/out.purged.fa > {snakemake.output.purged}")
else:
# If BED file empty, copy input to output since `get_seqs` will segfault
log = Path(snakemake.log[0])
log.write_text(
"WARN: Input BED file is empty. Input FASTA file will be copied to output."
)
shell("cp {snakemake.input.fas} {snakemake.output.hap}")
Path(snakemake.output.purged).touch()
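The empty-BED fallback can be tried out without `get_seqs` installed. The sketch below uses hypothetical names (`handle_empty_bed`, `demo`) and temporary files; it only reproduces the copy-and-touch branch, not the call to the tool itself.

```python
import shutil
import tempfile
from pathlib import Path


def handle_empty_bed(bed: Path, fasta: Path, hap: Path, purged: Path) -> bool:
    """Apply the fallback for an empty BED file; return True if it was used."""
    if bed.stat().st_size:
        return False  # non-empty BED: the real tool would run here instead
    shutil.copyfile(fasta, hap)  # input FASTA becomes the haplotig output
    purged.touch()               # purged output is created empty
    return True


def demo() -> tuple:
    """Run the fallback on throwaway files and report what was written."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        bed, fasta = tmp / "purge_dups.bed", tmp / "genome.fasta"
        hap, purged = tmp / "hap.fasta", tmp / "purged.fasta"
        bed.touch()
        fasta.write_text(">chr1\nACGT\n")
        applied = handle_empty_bed(bed, fasta, hap, purged)
        return applied, hap.read_text(), purged.stat().st_size


if __name__ == "__main__":
    print(demo())
```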
PURGE_DUPS NGSCSTAT¶
Purge haplotigs and overlaps in an assembly based on read depth
URL: https://github.com/dfguan/purge_dups
Example¶
This wrapper can be used in the following way:
rule purge_dups_ngscstat:
input:
bam="reads.bam",
output:
cov="out/ngscstat.cov",
stat="out/ngscstat.stat",
log:
"logs/ngscstat.log",
params:
extra="",
threads: 1
wrapper:
"v2.2.1/bio/purge_dups/ngscstat"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
purge_dups=1.2.6
Authors¶
- Filipe Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell("ngscstat {extra} -O {tmpdir} {snakemake.input} {log}")
if snakemake.output.get("cov"):
shell("cat {tmpdir}/TX.base.cov > {snakemake.output.cov}")
if snakemake.output.get("stat"):
shell("cat {tmpdir}/TX.stat > {snakemake.output.stat}")
PURGE_DUPS PBCSTAT¶
Purge haplotigs and overlaps in an assembly based on read depth
URL: https://github.com/dfguan/purge_dups
Example¶
This wrapper can be used in the following way:
rule purge_dups_pbcstat:
input:
paf="HiFi_dataset_01.paf.gz",
output:
cov="out/pbcstat.cov",
stat="out/pbcstat.stat",
log:
"logs/pbcstat.log",
params:
extra="",
threads: 1
wrapper:
"v2.2.1/bio/purge_dups/pbcstat"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
purge_dups=1.2.6
Authors¶
- Filipe Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with tempfile.TemporaryDirectory() as tmpdir:
shell("pbcstat {extra} -O {tmpdir} {snakemake.input} {log}")
if snakemake.output.get("cov"):
shell("cat {tmpdir}/PB.base.cov > {snakemake.output.cov}")
if snakemake.output.get("stat"):
shell("cat {tmpdir}/PB.stat > {snakemake.output.stat}")
PURGE_DUPS¶
Purge haplotigs and overlaps in an assembly based on read depth
URL: https://github.com/dfguan/purge_dups
Example¶
This wrapper can be used in the following way:
rule purge_dups:
input:
paf="split.self.paf.gz",
#cov="pbcstat.cov",
#cutoff="calcuts.cutoffs",
output:
"out/purge_dups.bed",
log:
"logs/purge_dups.log",
params:
extra="-2",
threads: 1
wrapper:
"v2.2.1/bio/purge_dups/purge_dups"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
purge_dups=1.2.6
Authors¶
- Filipe Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
cov = snakemake.input.get("cov", "")
if cov:
cov = f"-c {cov}"
cutoff = snakemake.input.get("cutoff", "")
if cutoff:
cutoff = f"-T {cutoff}"
shell(
"purge_dups {cov} {cutoff} {extra} {snakemake.input.paf} > {snakemake.output[0]} {log}"
)
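The optional `cov` and `cutoff` inputs turn into command-line flags only when they are provided. A minimal sketch of that pattern, with hypothetical helper names, is shown below. Unlike the wrapper, it drops empty parts instead of leaving extra spaces, which makes no difference to the shell.

```python
def optional_flag(flag: str, value: str) -> str:
    """Return 'flag value' when a value is given, otherwise an empty string."""
    return f"{flag} {value}" if value else ""


def purge_dups_command(paf: str, output: str, cov: str = "",
                       cutoff: str = "", extra: str = "") -> str:
    """Assemble the purge_dups command line from optional inputs."""
    parts = [
        "purge_dups",
        optional_flag("-c", cov),
        optional_flag("-T", cutoff),
        extra,
        paf,
        ">",
        output,
    ]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(purge_dups_command("split.self.paf.gz", "out/purge_dups.bed", extra="-2"))
```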
PURGE_DUPS SPLIT_FA¶
Purge haplotigs and overlaps in an assembly based on read depth
URL: https://github.com/dfguan/purge_dups
Example¶
This wrapper can be used in the following way:
rule purge_dups_split_fa:
input:
"{a}.fasta",
output:
"out/{a}.split",
log:
"logs/{a}.split_fa.log",
params:
extra="",
threads: 1
wrapper:
"v2.2.1/bio/purge_dups/split_fa"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
purge_dups=1.2.6
Authors¶
- Filipe Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("split_fa {extra} {snakemake.input[0]} > {snakemake.output[0]} {log}")
PYFASTAQ¶
For pyfastaq, the following wrappers are available:
PYFASTAQ REPLACE_BASES¶
Replaces all occurrences of one letter with another.
Example¶
This wrapper can be used in the following way:
rule replace_bases:
input:
"{sample}.rna.fa"
output:
"{sample}.dna.fa",
params:
old_base = "U",
new_base = "T",
log:
"logs/fastaq/replace_bases/test/{sample}.log"
wrapper:
"v2.2.1/bio/pyfastaq/replace_bases"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pyfastaq=3.17.0
Authors¶
- Michael Hall
Code¶
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"fastaq replace_bases"
" {snakemake.input[0]}"
" {snakemake.output[0]}"
" {snakemake.params.old_base}"
" {snakemake.params.new_base}"
" {log}"
)
PYROE¶
For pyroe, the following wrappers are available:
PYROE ID-TO-NAME¶
Create a 2-column tab-separated file mapping IDs to names
URL: https://pyroe.readthedocs.io/en/latest/geneid_to_name.html
Example¶
This wrapper can be used in the following way:
rule test_pyroe_idtoname:
input:
"annotation.{format}",
output:
"id2name.{format}.tsv",
threads: 1
log:
"logs/{format}.log",
params:
extra="",
wrapper:
"v2.2.1/bio/pyroe/idtoname"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Format is automatically inferred from input files.
Software dependencies¶
pyroe=0.9.3
Input/Output¶
Input:
- Path to genome annotation (GTF or GFF3)
Output:
- Path to gene id <-> gene names mapping
Params¶
extra
: Optional parameters to be passed to pyroe
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
extra = snakemake.params.get("extra", "")
if str(snakemake.input).endswith(("gtf", "gtf.gz")):
extra += " --format GTF "
elif str(snakemake.input).endswith(("gff", "gff.gz", "gff3", "gff3.gz")):
extra += " --format GFF3 "
shell("pyroe id-to-name {extra} {snakemake.input} {snakemake.output} {log}")
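The format inference mentioned in the notes is a plain suffix test. It can be sketched as follows; `infer_format_flag` is a hypothetical name, and the suffix lists are those used in the wrapper code above.

```python
def infer_format_flag(path: str) -> str:
    """Return the pyroe --format flag matching the annotation file suffix."""
    if path.endswith(("gtf", "gtf.gz")):
        return "--format GTF"
    if path.endswith(("gff", "gff.gz", "gff3", "gff3.gz")):
        return "--format GFF3"
    # Unknown suffix: no flag is added and pyroe decides on its own.
    return ""


if __name__ == "__main__":
    for name in ("annotation.gtf", "annotation.gff3.gz", "annotation.bed"):
        print(name, "->", repr(infer_format_flag(name)))
```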
PYROE MAKE-SPLICED+INTRONIC¶
Build splici reference files for Alevin-fry. The splici index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species.
Example¶
This wrapper can be used in the following way:
rule test_pyroe_makesplicedintronic:
input:
fasta="genome.fasta",
gtf="annotation.gtf",
spliced="extra_spliced.fasta", # Optional path to additional spliced sequences (FASTA)
unspliced="extra_unspliced.fasta", # Optional path to additional unspliced sequences (FASTA)
output:
fasta="splici_full/spliced_intronic_sequences.fasta",
gene_id_to_name="splici_full/gene_id_to_name.tsv",
t2g="splici_full/t2g.tsv",
threads: 1
log:
"logs/pyroe.log",
params:
read_length=91, # Required
flank_trim_length=5, # Optional
extra="", # Optional parameters
wrapper:
"v2.2.1/bio/pyroe/makesplicedintronic"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pyroe=0.9.2
bedtools=2.31.0
Input/Output¶
Input:
gtf
: Path to the genome annotation (GTF formatted)
fasta
: Path to the genome sequence (Fasta formatted)
spliced
: Optional path to additional spliced sequences (Fasta formatted)
unspliced
: Optional path to unspliced sequences (Fasta formatted)
Output:
fasta
: Path to spliced+intronic sequences (Fasta formatted)
gene_id_to_name
: Path to a TSV formatted text file containing gene_id <-> gene_name correspondence
t2g
: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence
Params¶
read_length
: The read length of the single-cell experiment being processed (determines flank size). Default is 100.
extra
: Optional parameters to be passed to pyroe
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from tempfile import TemporaryDirectory
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
extra = snakemake.params.get("extra", "")
# pyroe uses the flank-length value to name its output files
# in the result directory. We need this value to acquire the output
# files and let snakemake-wrapper choose its output file names.
read_length = snakemake.params.get("read_length", 101)
flank_trim_length = snakemake.params.get("flank_trim_length", 5)
flank_length = read_length - flank_trim_length
spliced = snakemake.input.get("spliced", "")
if spliced:
spliced = "--extra-spliced " + spliced
unspliced = snakemake.input.get("unspliced", "")
if unspliced:
unspliced = "--extra-unspliced " + unspliced
with TemporaryDirectory() as tempdir:
shell(
"pyroe make-spliced+intronic "
"{extra} {spliced} "
"{unspliced} "
"{snakemake.input.fasta} "
"{snakemake.input.gtf} "
"{read_length} "
"{tempdir} "
"{log}"
)
if snakemake.output.get("fasta", False):
shell(
"mv --verbose "
"{tempdir}/splici_fl{flank_length}.fa "
"{snakemake.output.fasta} {log}"
)
if snakemake.output.get("gene_id_to_name", False):
shell(
"mv --verbose "
"{tempdir}/gene_id_to_name.tsv "
"{snakemake.output.gene_id_to_name} {log}"
)
if snakemake.output.get("t2g", False):
shell(
"mv --verbose "
"{tempdir}/splici_fl{flank_length}_t2g_3col.tsv "
"{snakemake.output.t2g} {log} "
)
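The file names that the wrapper looks for in the temporary directory depend on the flank length. A small sketch of that calculation follows; `splici_output_names` is a hypothetical helper, and the name patterns are copied from the `mv` commands above.

```python
def splici_output_names(read_length: int, flank_trim_length: int = 5) -> dict:
    """Map each wrapper output key to the file name pyroe is expected to write."""
    flank_length = read_length - flank_trim_length
    return {
        "fasta": f"splici_fl{flank_length}.fa",
        "gene_id_to_name": "gene_id_to_name.tsv",
        "t2g": f"splici_fl{flank_length}_t2g_3col.tsv",
    }


if __name__ == "__main__":
    # Values taken from the example rule: read_length=91, flank_trim_length=5.
    print(splici_output_names(91, 5))
```

The command shown above passes only the read length to pyroe, so a non-default trim length presumably also has to reach pyroe through `extra` for these names to match.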
PYROE MAKE-SPLICED+UNSPLICED¶
Build spliceu reference files for Alevin-fry. The spliceu reference is the spliced + unspliced transcriptome, in which the unspliced transcript of each gene represents the entire genomic interval of that gene.
Example¶
This wrapper can be used in the following way:
rule test_pyroe_makesplicedunspliced:
input:
fasta="genome.fasta",
gtf="annotation.gtf",
spliced="extra_spliced.fasta", # Optional path to additional spliced sequences (FASTA)
unspliced="extra_unspliced.fasta", # Optional path to additional unspliced sequences (FASTA)
output:
gene_id_to_name="gene_id_to_name.tsv",
fasta="spliceu.fa",
g2g="spliceu_g2g.tsv",
t2g_3col="spliceu_t2g_3col.tsv",
t2g="spliceu_t2g.tsv",
threads: 1
log:
"logs/pyroe.log",
params:
extra="", # Optional parameters
wrapper:
"v2.2.1/bio/pyroe/makesplicedunspliced"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
pyroe=0.9.2
bedtools=2.31.0
Input/Output¶
Input:
gtf
: Path to the genome annotation (GTF formatted)
fasta
: Path to the genome sequence (Fasta formatted)
spliced
: Optional path to additional spliced sequences (Fasta formatted)
unspliced
: Optional path to unspliced sequences (Fasta formatted)
Output:
fasta
: Path to spliced+unspliced sequences (Fasta formatted)
gene_id_to_name
: Path to a TSV formatted text file containing gene_id <-> gene_name correspondence
t2g_3col
: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence
t2g
: Path to a TSV formatted text file containing the transcript_id <-> gene_name
g2g
: Path to a TSV formatted text file containing the gene_id <-> gene_name
Params¶
extra
: Optional parameters to be passed to pyroe
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from tempfile import TemporaryDirectory
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
extra = snakemake.params.get("extra", "")
spliced = snakemake.input.get("spliced", "")
if spliced:
spliced = "--extra-spliced " + spliced
unspliced = snakemake.input.get("unspliced", "")
if unspliced:
unspliced = "--extra-unspliced " + unspliced
with TemporaryDirectory() as tempdir:
shell(
"pyroe make-spliced+unspliced "
"{extra} {spliced} "
"{unspliced} "
"{snakemake.input.fasta} "
"{snakemake.input.gtf} "
"{tempdir} "
"{log}"
)
if snakemake.output.get("fasta", False):
shell("mv --verbose {tempdir}/spliceu.fa {snakemake.output.fasta} {log}")
if snakemake.output.get("gene_id_to_name", False):
shell(
"mv --verbose "
"{tempdir}/gene_id_to_name.tsv "
"{snakemake.output.gene_id_to_name} {log}"
)
if snakemake.output.get("t2g_3col", False):
shell(
"mv --verbose "
"{tempdir}/spliceu_t2g_3col.tsv "
"{snakemake.output.t2g_3col} {log} "
)
if snakemake.output.get("t2g", False):
shell("mv --verbose {tempdir}/spliceu_t2g.tsv {snakemake.output.t2g} {log} ")
if snakemake.output.get("g2g", False):
shell("mv --verbose {tempdir}/spliceu_g2g.tsv {snakemake.output.g2g} {log} ")
QUALIMAP¶
For qualimap, the following wrappers are available:
QUALIMAP BAMQC¶
Run qualimap bamqc to create a QC report for aligned NGS data. It can be used for WGS, WES, RNA, ChIP-Seq, etc.
URL: http://qualimap.conesalab.org/doc_html/analysis.html#bam-qc
Example¶
This wrapper can be used in the following way:
rule qualimap:
input:
# BAM aligned, splicing-aware, to reference genome
bam="mapped/a.bam",
output:
directory("qc/a"),
log:
"logs/qualimap/bamqc/a.log",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=4096,
wrapper:
"v2.2.1/bio/qualimap/bamqc"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
qualimap=2.2.2d
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
- BAM file of data aligned to genome
Output:
- QC report in TXT format (genome_results.txt)
Authors¶
- Fritjof Lammers
- Brett Copeland
Code¶
__author__ = "Fritjof Lammers"
__copyright__ = "Copyright 2022, Fritjof Lammers"
__email__ = "f.lammers@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
if java_opts:
java_opts_str = f'JAVA_OPTS="{java_opts}"'
else:
java_opts_str = ""
# unset DISPLAY environment variable to avoid X11 error message issued by qualimap
if os.environ.get("DISPLAY"):
del os.environ["DISPLAY"]
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"{java_opts_str} qualimap bamqc {extra} "
"-bam {snakemake.input.bam} "
"-outdir {snakemake.output} "
"{log}"
)
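The way the Java options end up in front of the qualimap call can be sketched without Snakemake. The helper names are hypothetical, and `-Xmx4096M` is only an assumed example of a memory setting; the actual string is produced by `get_java_opts` and is not reproduced here.

```python
def java_opts_prefix(java_opts: str) -> str:
    """Return an environment assignment for JAVA_OPTS, or '' when unset."""
    return f'JAVA_OPTS="{java_opts}"' if java_opts else ""


def qualimap_bamqc_command(bam: str, outdir: str, java_opts: str = "",
                           extra: str = "") -> str:
    """Assemble the qualimap bamqc command line, skipping empty parts."""
    parts = [
        java_opts_prefix(java_opts),
        "qualimap bamqc",
        extra,
        f"-bam {bam}",
        f"-outdir {outdir}",
    ]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    # Assumed option string for illustration only.
    print(qualimap_bamqc_command("mapped/a.bam", "qc/a", java_opts="-Xmx4096M"))
```

Setting the variable as a prefix limits it to this one command instead of changing the environment of the whole job.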
QUALIMAP RNASEQ¶
Run qualimap rnaseq to create a QC report for RNA-seq data.
Example¶
This wrapper can be used in the following way:
rule qualimap:
input:
# BAM aligned, splicing-aware, to reference genome
bam="mapped/a.bam",
# GTF containing transcript, gene, and exon data
gtf="annotation.gtf"
output:
directory("qc/a")
log:
"logs/qualimap/rna-seq/a.log"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
wrapper:
"v2.2.1/bio/qualimap/rnaseq"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see http://qualimap.conesalab.org/doc_html/analysis.html#rnaseqqc.
Software dependencies¶
qualimap=2.2.2d
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- BAM file of RNA-seq data aligned to genome
- GTF file containing genome annotations
Output:
- QC report in html/pdf format
Authors¶
- Brett Copeland
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
java_opts = snakemake.params.get("java_opts", "")
if java_opts:
java_opts_str = f'JAVA_OPTS="{java_opts}"'
else:
java_opts_str = ""
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"{java_opts_str} qualimap rnaseq {extra} "
"-bam {snakemake.input.bam} -gtf {snakemake.input.gtf} "
"-outdir {snakemake.output} "
"{log}"
)
QUAST¶
Quality Assessment Tool for Genome Assemblies
URL: https://github.com/ablab/quast
Example¶
This wrapper can be used in the following way:
rule quast:
input:
fasta="genome.fasta",
ref="genome.fasta",
#gff="annotations.gff",
#pe1="reads_R1.fastq",
#pe2="reads_R2.fastq",
#pe12="reads.fastq",
#mp1="matereads_R1.fastq",
#mp2="matereads_R2.fastq",
#mp12="matereads.fastq",
#single="single.fastq",
#pacbio="pacbio.fas",
#nanopore="nanopore.fastq",
#ref_bam="ref.bam",
#ref_sam="ref.sam",
#bam=["s1.bam","s2.bam"],
#sam=["s1.sam","s2.sam"],
#sv_bedpe="sv.bed",
output:
multiext("{sample}/report.", "html", "tex", "txt", "pdf", "tsv"),
multiext("{sample}/transposed_report.", "tex", "txt", "tsv"),
multiext(
"{sample}/basic_stats/",
"cumulative_plot.pdf",
"GC_content_plot.pdf",
"gc.icarus.txt",
"genome_GC_content_plot.pdf",
"NGx_plot.pdf",
"Nx_plot.pdf",
),
multiext(
"{sample}/contigs_reports/",
"all_alignments_genome.tsv",
"contigs_report_genome.mis_contigs.info",
"contigs_report_genome.stderr",
"contigs_report_genome.stdout",
),
"{sample}/contigs_reports/minimap_output/genome.coords_tmp",
"{sample}/icarus.html",
"{sample}/icarus_viewers/contig_size_viewer.html",
"{sample}/quast.log",
log:
"logs/{sample}.quast.log",
params:
extra="--min-contig 5 --min-identity 95.0",
wrapper:
"v2.2.1/bio/quast"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
quast=5.2.0
Input/Output¶
Input:
- Sequences in FASTA format
- Reference genome (optional)
- GFF (optional)
- Paired-end reads (optional)
- Mate-pair reads (optional)
- Unpaired reads (optional)
- PacBio SMRT reads (optional)
- Oxford Nanopore reads (optional)
- Mapped reads against the reference in SAM/BAM (optional)
- Mapped reads against each of the assemblies in SAM/BAM (same order; optional)
- Structural variants in BEDPE (optional)
Output:
- Assessment summary in plain text format
- Tab-separated version of the summary
- LaTeX version of the summary
- Icarus main menu with links to interactive viewers
- PDF report of all plots combined with all tables
- HTML version of the report with interactive plots inside
- Report on misassemblies
- Report on unaligned and partially unaligned contigs
- Report on k-mer-based metrics
- Report on mapped reads statistics
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
ref = snakemake.input.get("ref", "")
if ref:
ref = f"-r {ref}"
gff = snakemake.input.get("gff", "")
if gff:
gff = f"--features {gff}"
pe1 = snakemake.input.get("pe1", "")
if pe1:
pe1 = f"--pe1 {pe1}"
pe2 = snakemake.input.get("pe2", "")
if pe2:
pe2 = f"--pe2 {pe2}"
pe12 = snakemake.input.get("pe12", "")
if pe12:
pe12 = f"--pe12 {pe12}"
mp1 = snakemake.input.get("mp1", "")
if mp1:
mp1 = f"--mp1 {mp1}"
mp2 = snakemake.input.get("mp2", "")
if mp2:
mp2 = f"--mp2 {mp2}"
mp12 = snakemake.input.get("mp12", "")
if mp12:
mp12 = f"--mp12 {mp12}"
single = snakemake.input.get("single", "")
if single:
single = f"--single {single}"
pacbio = snakemake.input.get("pacbio", "")
if pacbio:
pacbio = f"--pacbio {pacbio}"
nanopore = snakemake.input.get("nanopore", "")
if nanopore:
nanopore = f"--nanopore {nanopore}"
ref_bam = snakemake.input.get("ref_bam", "")
if ref_bam:
ref_bam = f"--ref-bam {ref_bam}"
ref_sam = snakemake.input.get("ref_sam", "")
if ref_sam:
ref_sam = f"--ref-sam {ref_sam}"
bam = snakemake.input.get("bam", "")
if bam:
if isinstance(bam, list):
bam = ",".join(bam)
bam = f"--bam {bam}"
sam = snakemake.input.get("sam", "")
if sam:
if isinstance(sam, list):
sam = ",".join(sam)
sam = f"--sam {sam}"
sv_bedpe = snakemake.input.get("sv_bedpe", "")
if sv_bedpe:
sv_bedpe = f"--sv-bedpe {sv_bedpe}"
output_dir = os.path.commonpath(snakemake.output)
shell(
"quast --threads {snakemake.threads} {ref} {gff} {pe1} {pe2} {pe12} {mp1} {mp2} {mp12} {single} {pacbio} {nanopore} {ref_bam} {ref_sam} {bam} {sam} {sv_bedpe} {extra} -o {output_dir} {snakemake.input.fasta} {log}"
)
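The wrapper above derives the QUAST output directory from the declared output files via os.path.commonpath. A minimal sketch of that step, using hypothetical paths modelled on the example rule (the sample name is an assumption):

```python
import os

# Hypothetical output paths, modelled on the example rule above
outputs = [
    "sampleA/report.html",
    "sampleA/basic_stats/Nx_plot.pdf",
    "sampleA/contigs_reports/all_alignments_genome.tsv",
    "sampleA/icarus.html",
]

# commonpath returns the longest leading path shared by all entries
output_dir = os.path.commonpath(outputs)
print(output_dir)

# With a single entry, commonpath returns that entry unchanged
single = os.path.commonpath(["sampleA/report.html"])
print(single)
```

Because a single path is returned unchanged, a rule that declares only one output file would hand QUAST a file path rather than a directory; declaring several outputs that share one parent directory avoids this.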
RAGTAG¶
For ragtag, the following wrappers are available:
RAGTAG-CORRECTION¶
Homology-based misassembly correction.
URL: https://github.com/malonge/RagTag/wiki/correct
Example¶
This wrapper can be used in the following way:
rule correction:
input:
query="fasta/{query}.fasta",
ref="fasta/{reference}.fasta",
output:
fasta="{query}_corrected_{reference}/ragtag.correct.fasta",
agp="{query}_corrected_{reference}/ragtag.correct.agp",
params:
extra="",
threads: 16
log:
"logs/ragtag/{query}_{reference}.log",
wrapper:
"v2.2.1/bio/ragtag/correction"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Multiple threads can be used during Minimap/Unimap alignment.
Software dependencies¶
ragtag=2.1.0
Input/Output¶
Input:
- ref: reference FASTA file (uncompressed or bgzipped)
- query: query FASTA file (uncompressed or bgzipped)
Output:
- fasta: The corrected query assembly in FASTA format.
- agp: The AGP file defining the exact coordinates of query sequence breaks.
Params¶
- extra: additional parameters
Authors¶
- Curro Campuzano Jiménez
Code¶
"""Snakemake wrapper for ragtag-correction."""
__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import tempfile
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n
assert snakemake.output.keys(), "Output must contain at least one named file."
valid_keys = ["agp", "fasta"]
for key in snakemake.output.keys():
assert (
key in valid_keys
), "Invalid key in output. Valid keys are: %r. Given: %r." % (valid_keys, key)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"ragtag.py correct"
" {snakemake.input.ref}"
" {snakemake.input.query}"
" {extra}"
" -o {tmpdir} -t {snakemake.threads}"
" {log}"
)
for key in valid_keys:
outfile = snakemake.output.get(key)
if outfile:
shell("mv {tmpdir}/ragtag.correct.{key} {outfile}")
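The RagTag wrappers share one pattern: the tool writes into a temporary directory, and only the outputs named in the rule are moved to their final paths. The sketch below imitates that pattern without Snakemake or RagTag; the helper name collect_outputs and all paths are assumptions made for illustration.

```python
import os
import shutil
import tempfile


def collect_outputs(tmpdir, prefix, requested):
    """Move <tmpdir>/<prefix>.<key> to the destination requested for each key."""
    for key, dest in requested.items():
        src = os.path.join(tmpdir, f"{prefix}.{key}")
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        shutil.move(src, dest)


with tempfile.TemporaryDirectory() as workdir, tempfile.TemporaryDirectory() as tmpdir:
    # Stand-in for the tool: create the files it would have written
    for key in ("agp", "fasta"):
        with open(os.path.join(tmpdir, f"ragtag.correct.{key}"), "w") as fh:
            fh.write(key)

    # Only the FASTA output is requested, as if the rule declared just `fasta`
    requested = {"fasta": os.path.join(workdir, "out", "corrected.fasta")}
    collect_outputs(tmpdir, "ragtag.correct", requested)

    moved = os.path.exists(requested["fasta"])
    leftover = sorted(os.listdir(tmpdir))
```

Files that were not requested stay in the temporary directory and are deleted with it, which is why the output paths can be chosen freely.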
RAGTAG-MERGE¶
Scaffold merging.
URL: https://github.com/malonge/RagTag/wiki/merge
Example¶
This wrapper can be used in the following way:
rule merge:
input:
fasta="input/{assembly}.fasta",
agps=expand("input/{scaffold}.agp", scaffold=["scf1", "scf2"]),
#bam = "input/Hi-C.bam",
output:
fasta="{assembly}_merged.fasta",
agp="{assembly}_merged.agp",
#links = "{assembly}_merged.links",
params:
extra="",
log:
"logs/ragtag/{assembly}_merged.log",
wrapper:
"v2.2.1/bio/ragtag/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
ragtag=2.1.0
Input/Output¶
Input:
- fasta: assembly FASTA file (uncompressed or bgzipped).
- agps: scaffolding AGP files (at least two).
- bam: Optional. Hi-C alignments in BAM format.
Output:
- fasta: The merged scaffolds in FASTA format.
- agp: The merged scaffold results in AGP format.
- links: Optional. Requires Hi-C alignments in BAM format as input.
Params¶
- extra: additional parameters. Do not pass '-b' here; add the BAM file to the input instead.
Authors¶
- Curro Campuzano Jiménez
Code¶
"""Snakemake wrapper for ragtag-merge."""
__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import tempfile
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
fasta_file = snakemake.input.get("fasta")
# Check that a fasta file was supplied
assert fasta_file, "Input must contain only one fasta file."
agp_files = snakemake.input.get("agps")
assert len(agp_files) >= 2, "Input must contain at least 2 agp files. Given: %r." % len(
    agp_files
)
bam_file = snakemake.input.get("bam")
# Add Hi-C BAM file to params if present
if bam_file:
    extra += f" -b {bam_file}"
# Raise an error if a links file is expected but no Hi-C BAM file is given
if snakemake.output.get("links") and not bam_file:
    raise ValueError("Links file is present but no Hi-C BAM file is given.")
# Check that all keys in snakemake.output are valid: agp, fasta or links
assert snakemake.output.keys(), "Output must contain at least one named file."
valid_keys = ["agp", "fasta", "links"]
for key in snakemake.output.keys():
    assert (
        key in valid_keys
    ), "Invalid key in output. Valid keys are: %r. Given: %r." % (valid_keys, key)
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "ragtag.py merge"
        " {fasta_file}"
        " {agp_files}"
        " {extra}"
        " -o {tmpdir}"
        " {log}"
    )
    # Move only the requested outputs out of the temporary directory
    for key in valid_keys:
        outfile = snakemake.output.get(key)
        if outfile:
            shell("mv {tmpdir}/ragtag.merge.{key} {outfile}")
RAGTAG-PATCH¶
Homology-based assembly patching.
URL: https://github.com/malonge/RagTag/wiki/patch
Example¶
This wrapper can be used in the following way:
rule patch:
input:
query="fasta/{query}.fasta",
ref="fasta/{reference}.fasta",
output:
agp="{query}_{reference}.agp",
fasta="{query}_{reference}.fasta",
rename_agp="{query}_{reference}.rename.agp",
rename_fasta="{query}_{reference}.rename.fasta",
ctg_agp="{query}_{reference}.ctg.agp",
ctg_fasta="{query}_{reference}.ctg.fasta",
comps_fasta="{query}_{reference}.comps.fasta",
asm_dir=directory("{query}_{reference}_asm"), # Assembly alignment files
params:
extra="",
threads: 16
log:
"logs/ragtag/{query}_patch_{reference}.log",
wrapper:
"v2.2.1/bio/ragtag/patch"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Multiple threads can be used during Minimap/Unimap alignment.
Software dependencies¶
ragtag=2.1.0
Input/Output¶
Input:
- ref: reference FASTA file (uncompressed or bgzipped)
- query: query FASTA file (uncompressed or bgzipped)
Output:
- fasta: The final FASTA file containing the patched assembly.
- agp: The final AGP file defining how ragtag.patch.fasta is built.
- rename_agp: Optional. An AGP file defining the new names for query sequences.
- rename_fasta: Optional. A FASTA file with the original query sequence, but with new names.
- comps_fasta: Optional. The split target assembly and the renamed query assembly combined into one FASTA file.
- ctg_agp: Optional. An AGP file defining how the target assembly was split at gaps.
- ctg_fasta: Optional. The target assembly split at gaps.
- asm_dir: Optional. A directory containing assembly alignment files.
Params¶
- extra: additional parameters
Authors¶
- Curro Campuzano Jiménez
Code¶
"""Snakemake wrapper for ragtag-patch."""
__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
assert snakemake.output.keys(), "Output must contain at least one named file."
valid_keys = [
"agp",
"fasta",
"rename_agp",
"rename_fasta",
"comps_fasta",
"ctg_agp",
"ctg_fasta",
"asm_dir",
]
for key in snakemake.output.keys():
assert (
key in valid_keys
), "Invalid key in output. Valid keys are: %r. Given: %r." % (valid_keys, key)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"ragtag.py patch"
" {snakemake.input.ref}"
" {snakemake.input.query}"
" {extra}"
" -o {tmpdir} -t {snakemake.threads}"
" {log}"
)
for key in valid_keys[:-1]:
outfile = snakemake.output.get(key)
if outfile:
extension = key.replace("_", ".")
shell("mv {tmpdir}/ragtag.patch.{extension} {outfile}")
outdir = snakemake.output.get("asm_dir")
if outdir:
# Move files into directory outdir
shell("mkdir -p {outdir} && mv {tmpdir}/ragtag.patch.asm.* {outdir}")
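In the patch wrapper, each output key is turned into a file name by replacing underscores with dots. A small sketch of that mapping (the helper name patch_filename is an assumption; the key list is copied from the wrapper, minus asm_dir, which is handled separately):

```python
# File-based output keys accepted by the wrapper (asm_dir is a directory)
file_keys = [
    "agp",
    "fasta",
    "rename_agp",
    "rename_fasta",
    "comps_fasta",
    "ctg_agp",
    "ctg_fasta",
]


def patch_filename(key):
    """Return the file name that `ragtag.py patch` output is expected under."""
    return "ragtag.patch." + key.replace("_", ".")


names = {key: patch_filename(key) for key in file_keys}
for key, name in names.items():
    print(f"{key:>12} -> {name}")
```

This shows why the output keys must use exactly these names: the key itself determines which file is picked up from the temporary directory.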
RAGTAG-SCAFFOLD¶
Homology-based assembly scaffolding.
URL: https://github.com/malonge/RagTag/wiki/scaffold
Example¶
This wrapper can be used in the following way:
rule scaffold:
input:
query="fasta/{query}.fasta",
ref="fasta/{reference}.fasta",
output:
fasta="{query}_scaffold_{reference}/ragtag.scaffold.fasta",
agp="{query}_scaffold_{reference}/ragtag.scaffold.agp",
stats="{query}_scaffold_{reference}/ragtag.scaffold.stats",
params:
extra="",
threads: 16
log:
"logs/ragtag/{query}_scaffold_{reference}.log",
wrapper:
"v2.2.1/bio/ragtag/scaffold"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Multiple threads can be used during Minimap/Unimap alignment.
Software dependencies¶
ragtag=2.1.0
Input/Output¶
Input:
- ref: reference FASTA file (uncompressed or bgzipped)
- query: query FASTA file (uncompressed or bgzipped)
Output:
- fasta: The scaffolds in FASTA format, defined by the ordering and orientations of ragtag.scaffold.agp.
- agp: The ordering and orientations of query sequences in AGP format.
- stats: Summary statistics for the scaffolding process.
Params¶
- extra: additional parameters
Authors¶
- Curro Campuzano Jiménez
Code¶
"""Snakemake wrapper for ragtag-scaffold."""
__author__ = "Curro Campuzano Jiménez"
__copyright__ = "Copyright 2023, Curro Campuzano Jiménez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import tempfile
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
assert snakemake.output.keys(), "Output must contain at least one named file."
valid_keys = ["agp", "fasta", "stats"]
for key in snakemake.output.keys():
assert (
key in valid_keys
), "Invalid key in output. Valid keys are: %r. Given: %r." % (valid_keys, key)
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"ragtag.py scaffold"
" {snakemake.input.ref}"
" {snakemake.input.query}"
" {extra}"
" -o {tmpdir} -t {snakemake.threads}"
" {log}"
)
for key in valid_keys:
outfile = snakemake.output.get(key)
if outfile:
shell("mv {tmpdir}/ragtag.scaffold.{key} {outfile}")
RASUSA¶
Randomly subsample sequencing reads to a specified coverage.
URL: https://github.com/mbhall88/rasusa
Example¶
This wrapper can be used in the following way:
rule subsample:
input:
r1="{sample}.r1.fq",
r2="{sample}.r2.fq",
output:
r1="{sample}.subsampled.r1.fq",
r2="{sample}.subsampled.r2.fq",
params:
options="--seed 15",
genome_size="3mb", # required, unless `bases` is given
coverage=20, # required, unless `bases` is given
#bases="2gb"
log:
"logs/subsample/{sample}.log",
wrapper:
"v2.2.1/bio/rasusa"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
rasusa=0.7.1
Input/Output¶
Input:
- Reads to subsample in FASTA/Q format. Input files can be named or unnamed.
Output:
- File paths to write subsampled reads to. If using paired-end data, make sure there are two output files in the same order as the input.
Params¶
- bases: Explicitly set the number of bases required, e.g. 4.3kb, 7Tb, 9000, 4.1MB. If this option is given, coverage and genome_size are ignored.
- coverage: The desired coverage to sub-sample the reads to. If bases is not provided, this option and genome_size are required.
- genome_size: Genome size to calculate coverage with respect to, e.g. 4.3kb, 7Tb, 9000, 4.1MB. Alternatively, a FASTA/Q index file can be provided and the genome size will be set to the sum of all reference sequences. If bases is not provided, this option and coverage are required.
- options: Any other options as listed in the docs.
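The precedence between these parameters can be illustrated with a small stand-alone function. This is a sketch that takes a plain dict in place of snakemake.params; the function name build_rasusa_options is an assumption, while the -b, -g and -c flags are the ones used in the wrapper code.

```python
def build_rasusa_options(params):
    """Assemble the rasusa option string from a dict of rule params."""
    options = params.get("options", "")
    bases = params.get("bases")
    if bases is not None:
        # `bases` wins: coverage and genome_size are ignored
        return f"{options} -b {bases}"
    covg = params.get("coverage")
    gsize = params.get("genome_size")
    if covg is None or gsize is None:
        raise ValueError(
            "If `bases` is not given, then `coverage` and `genome_size` must be given"
        )
    return f"{options} -g {gsize} -c {covg}"


by_coverage = build_rasusa_options(
    {"options": "--seed 15", "genome_size": "3mb", "coverage": 20}
)
by_bases = build_rasusa_options({"options": "--seed 15", "bases": "2gb", "coverage": 20})
print(by_coverage)
print(by_bases)
```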
Authors¶
- Michael Hall
Code¶
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
options = snakemake.params.get("options", "")
bases = snakemake.params.get("bases")
if bases is not None:
    options += " -b {}".format(bases)
else:
    covg = snakemake.params.get("coverage")
    gsize = snakemake.params.get("genome_size")
    if covg is None or gsize is None:
        raise ValueError(
            "If `bases` is not given, then `coverage` and `genome_size` must be given"
        )
    options += " -g {gsize} -c {covg}".format(gsize=gsize, covg=covg)
shell("rasusa {options} -i {snakemake.input} -o {snakemake.output} 2> {snakemake.log}")
RAZERS3¶
Map (short) reads against a reference sequence. Several output formats are supported; see https://github.com/seqan/seqan/tree/master/apps/razers3
Example¶
This wrapper can be used in the following way:
rule razers3:
input:
# list of input reads
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
output:
# output format is automatically inferred from file extension. Can be bam/sam or other formats.
"mapped/{sample}.bam"
log:
"logs/razers3/{sample}.log"
params:
# the reference genome
genome="genome.fasta",
# additional parameters
extra=""
threads: 8
wrapper:
"v2.2.1/bio/razers3"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
razers3=3.5.8
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"(razers3"
" -tc {snakemake.threads}"
" {extra}"
" -o {snakemake.output[0]}"
" {snakemake.params.genome}"
" {snakemake.input.reads})"
" {log}"
)
RBT¶
For rbt, the following wrappers are available:
RBT COLLAPSE-READS-TO-FRAGMENTS BAM¶
Calculate consensus reads from read groups marked by PicardTools MarkDuplicates or UmiAwareMarkDuplicatesWithMateCigar.
URL: https://github.com/rust-bio/rust-bio-tools
Example¶
This wrapper can be used in the following way:
rule calc_consensus_reads:
input:
"mapped/{sample}.marked.bam",
output:
consensus_r1="results/consensus/{sample}.1.fq",
consensus_r2="results/consensus/{sample}.2.fq",
consensus_se="results/consensus/{sample}.se.fq",
skipped="results/consensus/{sample}.skipped.bam",
params:
extra="--annotate-record-ids",
log:
"logs/consensus/{sample}.log",
wrapper:
"v2.2.1/bio/rbt/collapse_reads_to_fragments-bam"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
rust-bio-tools=0.42.0
Authors¶
- Felix Mölder
Code¶
__author__ = "Felix Mölder"
__copyright__ = "Copyright 2022, Felix Mölder"
__email__ = "felix.moelder@uk-essen.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"rbt collapse-reads-to-fragments bam {extra} {snakemake.input[0]} {snakemake.output} {log}"
)
RBT CSV-REPORT¶
Creates an HTML report of QC data stored in a CSV file. For more details, visit https://github.com/rust-bio/rust-bio-tools
Example¶
This wrapper can be used in the following way:
rule csv_report:
input:
# a csv formatted file containing the data for the report
"report.csv",
output:
# path to the resulting report directory
directory("qc_data"),
params:
extra="--sort-column 'contig length'",
log:
"logs/rbt-csv-report",
wrapper:
"v2.2.1/bio/rbt/csvreport"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
rust-bio-tools=0.42.0
Input/Output¶
Input:
- A csv file containing the qc report
Output:
- QC report folder including html document and .xlsx file
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("rbt csv-report {snakemake.input} {snakemake.output} {extra} {log}")
REBALER¶
Reference-based long read assemblies of bacterial genomes
Example¶
This wrapper can be used in the following way:
rule rebaler:
input:
reference="ref.fa",
reads="{sample}.fq",
output:
assembly="{sample}.asm.fa",
log:
"logs/rebaler/{sample}.log",
params:
extra="",
wrapper:
"v2.2.1/bio/rebaler"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
rebaler=0.2.0
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for Rebaler - https://github.com/rrwick/Rebaler"""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2020, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
def get_named_input(name):
value = snakemake.input.get(name)
if value is None:
raise NameError("Missing input named '{}'".format(name))
return value
def get_named_output(name):
return snakemake.output.get(name, snakemake.output[0])
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
reference = get_named_input("reference")
reads = get_named_input("reads")
output = get_named_output("assembly")
shell("rebaler {extra} -t {snakemake.threads} {reference} {reads} > {output} {log}")
REFERENCE¶
For reference, the following wrappers are available:
ENSEMBL-ANNOTATION¶
Download annotation of genomic sites (e.g. transcripts) from ENSEMBL FTP servers, and store them in a single .gtf or .gff3 file.
Example¶
This wrapper can be used in the following way:
rule get_annotation:
output:
"refs/annotation.gtf",
params:
species="homo_sapiens",
release="105",
build="GRCh37",
flavor="", # optional, e.g. chr_patch_hapl_scaff, see Ensembl FTP.
# branch="plants", # optional: specify branch
log:
"logs/get_annotation.log",
cache: "omit-software" # save space and time with between-workflow caching (see docs)
wrapper:
"v2.2.1/bio/reference/ensembl-annotation"
rule get_annotation_gz:
output:
"refs/annotation.gtf.gz",
params:
species="homo_sapiens",
release="105",
build="GRCh37",
flavor="", # optional, e.g. chr_patch_hapl_scaff, see Ensembl FTP.
# branch="plants", # optional: specify branch
log:
"logs/get_annotation.log",
cache: "omit-software" # save space and time with between-workflow caching (see docs)
wrapper:
"v2.2.1/bio/reference/ensembl-annotation"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
curl
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import subprocess
import sys
from pathlib import Path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
species = snakemake.params.species.lower()
build = snakemake.params.build
release = int(snakemake.params.release)
gtf_release = release
out_fmt = Path(snakemake.output[0]).suffixes
out_gz = (out_fmt.pop() and True) if out_fmt[-1] == ".gz" else False
out_fmt = out_fmt.pop().lstrip(".")
branch = ""
if build == "GRCh37":
if release >= 81:
# use the special grch37 branch for new releases
branch = "grch37/"
if release > 87:
gtf_release = 87
elif snakemake.params.get("branch"):
branch = snakemake.params.branch + "/"
flavor = snakemake.params.get("flavor", "")
if flavor:
flavor += "."
suffix = ""
if out_fmt == "gtf":
suffix = "gtf.gz"
elif out_fmt == "gff3":
suffix = "gff3.gz"
else:
raise ValueError(
"invalid format specified. Only 'gtf[.gz]' and 'gff3[.gz]' are currently supported."
)
url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{out_fmt}/{species}/{species_cap}.{build}.{gtf_release}.{flavor}{suffix}".format(
release=release,
gtf_release=gtf_release,
build=build,
species=species,
out_fmt=out_fmt,
species_cap=species.capitalize(),
suffix=suffix,
flavor=flavor,
branch=branch,
)
try:
if out_gz:
shell("curl -L {url} > {snakemake.output[0]} {log}")
else:
shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
except subprocess.CalledProcessError as e:
if snakemake.log:
sys.stderr = open(snakemake.log[0], "a")
print(
"Unable to download annotation data from Ensembl. "
"Did you check that this combination of species, build, and release is actually provided?",
file=sys.stderr,
)
exit(1)
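The URL that the wrapper requests depends on the output file name, the build and the release. The following sketch reproduces that logic as a pure function so the result can be inspected without downloading anything; the function name annotation_url and its signature are assumptions, while the URL layout is taken from the wrapper code above.

```python
from pathlib import Path


def annotation_url(output, species, build, release, flavor="", branch=""):
    """Return the Ensembl FTP URL the wrapper logic would request."""
    suffixes = Path(output).suffixes
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()  # compressed output: the format is the suffix before .gz
    out_fmt = suffixes[-1].lstrip(".") if suffixes else ""
    if out_fmt not in ("gtf", "gff3"):
        raise ValueError("Only 'gtf[.gz]' and 'gff3[.gz]' are supported.")

    release = int(release)
    file_release = release
    prefix = ""
    if build == "GRCh37":
        if release >= 81:
            prefix = "grch37/"  # dedicated GRCh37 branch for newer releases
        if release > 87:
            file_release = 87  # file names stay at release 87
    elif branch:
        prefix = branch + "/"
    if flavor:
        flavor += "."

    species = species.lower()
    return (
        f"ftp://ftp.ensembl.org/pub/{prefix}release-{release}/{out_fmt}/{species}/"
        f"{species.capitalize()}.{build}.{file_release}.{flavor}{out_fmt}.gz"
    )


url_plain = annotation_url("refs/annotation.gtf", "homo_sapiens", "GRCh37", "105")
url_gz = annotation_url("refs/annotation.gtf.gz", "homo_sapiens", "GRCh37", "105")
print(url_plain)
```

Both example rules therefore fetch the same remote file; the output extension only decides whether the download is decompressed.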
ENSEMBL-SEQUENCE¶
Download sequences (e.g. genome) from ENSEMBL FTP servers, and store them in a single .fasta file.
Example¶
This wrapper can be used in the following way:
rule get_genome:
output:
"refs/genome.fasta",
params:
species="saccharomyces_cerevisiae",
datatype="dna",
build="R64-1-1",
release="98",
log:
"logs/get_genome.log",
cache: "omit-software" # save space and time with between-workflow caching (see docs)
wrapper:
"v2.2.1/bio/reference/ensembl-sequence"
rule get_chromosome:
output:
"refs/chr1.fasta",
params:
species="saccharomyces_cerevisiae",
datatype="dna",
build="R64-1-1",
release="101",
chromosome="I", # optional: restrict to chromosome
# branch="plants", # optional: specify branch
log:
"logs/get_genome.log",
cache: "omit-software" # save space and time with between-workflow caching (see docs)
wrapper:
"v2.2.1/bio/reference/ensembl-sequence"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
curl
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import subprocess as sp
import sys
from itertools import product
from snakemake.shell import shell
species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build
branch = ""
if release >= 81 and build == "GRCh37":
# use the special grch37 branch for new releases
branch = "grch37/"
elif snakemake.params.get("branch"):
branch = snakemake.params.branch + "/"
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
spec = ("{build}" if int(release) > 75 else "{build}.{release}").format(
build=build, release=release
)
suffixes = ""
datatype = snakemake.params.get("datatype", "")
chromosome = snakemake.params.get("chromosome", "")
if datatype == "dna":
if chromosome:
suffixes = ["dna.chromosome.{}.fa.gz".format(chromosome)]
else:
suffixes = ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"]
elif datatype == "cdna":
suffixes = ["cdna.all.fa.gz"]
elif datatype == "cds":
suffixes = ["cds.all.fa.gz"]
elif datatype == "ncrna":
suffixes = ["ncrna.fa.gz"]
elif datatype == "pep":
suffixes = ["pep.all.fa.gz"]
else:
raise ValueError("invalid datatype, must be one of dna, cdna, cds, ncrna, pep")
if chromosome:
if not datatype == "dna":
raise ValueError(
"invalid datatype, to select a single chromosome the datatype must be dna"
)
spec = spec.format(build=build, release=release)
url_prefix = f"ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/{species}/{datatype}/{species.capitalize()}.{spec}"
success = False
for suffix in suffixes:
url = f"{url_prefix}.{suffix}"
try:
shell("curl -sSf {url} > /dev/null 2> /dev/null")
except sp.CalledProcessError:
continue
shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
success = True
break
if not success:
if len(suffixes) > 1:
url = f"{url_prefix}.[{'|'.join(suffixes)}]"
else:
url = f"{url_prefix}.{suffixes[0]}"
print(
f"Unable to download requested sequence data from Ensembl ({url}). "
"Please check whether the above URL is currently available (might be a temporary server issue). "
"Apart from that, did you check that this combination of species, build, and release is actually provided?",
file=sys.stderr,
)
exit(1)
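For DNA downloads without a chromosome, the wrapper tries more than one candidate file and takes the first that exists. The sketch below only builds the candidate list, in the order the wrapper would probe it; the function name sequence_urls is an assumption, and no network access is performed.

```python
def sequence_urls(species, build, release, datatype, chromosome="", branch=""):
    """Return the candidate Ensembl FTP URLs in the order they would be tried."""
    release = int(release)
    species = species.lower()
    prefix = ""
    if release >= 81 and build == "GRCh37":
        prefix = "grch37/"
    elif branch:
        prefix = branch + "/"
    # Older releases carry the release number in the file name
    spec = build if release > 75 else f"{build}.{release}"

    if datatype == "dna":
        if chromosome:
            suffixes = [f"dna.chromosome.{chromosome}.fa.gz"]
        else:
            suffixes = ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"]
    else:
        table = {
            "cdna": "cdna.all.fa.gz",
            "cds": "cds.all.fa.gz",
            "ncrna": "ncrna.fa.gz",
            "pep": "pep.all.fa.gz",
        }
        if datatype not in table:
            raise ValueError("datatype must be one of dna, cdna, cds, ncrna, pep")
        if chromosome:
            raise ValueError("a single chromosome requires datatype dna")
        suffixes = [table[datatype]]

    base = (
        f"ftp://ftp.ensembl.org/pub/{prefix}release-{release}/fasta/"
        f"{species}/{datatype}/{species.capitalize()}.{spec}"
    )
    return [f"{base}.{suffix}" for suffix in suffixes]


genome = sequence_urls("saccharomyces_cerevisiae", "R64-1-1", "98", "dna")
chrom = sequence_urls("saccharomyces_cerevisiae", "R64-1-1", "101", "dna", chromosome="I")
for url in genome + chrom:
    print(url)
```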
ENSEMBL-VARIATION¶
Download known genomic variants from ENSEMBL FTP servers, and store them in a single .vcf.gz file.
Example¶
This wrapper can be used in the following way:
rule get_variation:
# Optional: add fai as input to get VCF with annotated contig lengths (as required by GATK)
# and properly sorted VCFs.
# input:
# fai="refs/genome.fasta.fai"
output:
vcf="refs/variation.vcf.gz",
params:
species="saccharomyces_cerevisiae",
release="98", # releases <98 are unsupported
build="R64-1-1",
type="all", # one of "all", "somatic", "structural_variations"
# chromosome="21", # optionally constrain to chromosome, only supported for homo_sapiens
# branch="plants", # optional: specify branch
log:
"logs/get_variation.log",
cache: "omit-software" # save space and time with between-workflow caching (see docs)
wrapper:
"v2.2.1/bio/reference/ensembl-variation"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bcftools=1.17
curl
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import tempfile
import subprocess
import sys
import os
from snakemake.shell import shell
from snakemake.exceptions import WorkflowError
species = snakemake.params.species.lower()
release = int(snakemake.params.release)
build = snakemake.params.build
type = snakemake.params.type
chromosome = snakemake.params.get("chromosome", "")
branch = ""
if release >= 81 and build == "GRCh37":
# use the special grch37 branch for new releases
branch = "grch37/"
elif snakemake.params.get("branch"):
branch = snakemake.params.branch + "/"
if release < 98 and not branch:
print("Ensembl releases <98 are unsupported.", file=open(snakemake.log[0], "w"))
exit(1)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
if chromosome and type != "all":
    raise ValueError(
        "Parameter chromosome given but chromosome-wise download "
        "is only implemented for type='all'."
    )
if type == "all":
    if species == "homo_sapiens" and release >= 93:
        # human variation data is split into one file per chromosome
        chroms = (
            list(range(1, 23)) + ["X", "Y", "MT"] if not chromosome else [chromosome]
        )
        suffixes = ["-chr{}".format(chrom) for chrom in chroms]
    else:
        if chromosome:
            raise ValueError(
                "Parameter chromosome given but chromosome-wise download "
                "is only implemented for homo_sapiens in releases >=93."
            )
        suffixes = [""]
elif type == "somatic":
    suffixes = ["_somatic"]
elif type == "structural_variations":
    suffixes = ["_structural_variations"]
else:
    raise ValueError(
        "Unsupported type {} (only all, somatic, structural_variations are allowed)".format(
            type
        )
    )
species_filename = species if release >= 91 else species.capitalize()
urls = [
"ftp://ftp.ensembl.org/pub/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.{ext}".format(
release=release,
species=species,
suffix=suffix,
species_filename=species_filename,
branch=branch,
ext=ext,
)
for suffix in suffixes
for ext in ["vcf.gz", "vcf.gz.csi"]
]
names = [os.path.basename(url) for url in urls if url.endswith(".gz")]
try:
gather = "curl {urls}".format(urls=" ".join(map("-O {}".format, urls)))
workdir = os.getcwd()
with tempfile.TemporaryDirectory() as tmpdir:
if snakemake.input.get("fai"):
shell(
"(cd {tmpdir}; {gather} && "
"bcftools concat -Oz --naive {names} > concat.vcf.gz && "
"bcftools reheader --fai {workdir}/{snakemake.input.fai} concat.vcf.gz "
"> {workdir}/{snakemake.output}) {log}"
)
else:
shell(
"(cd {tmpdir}; {gather} && "
"bcftools concat -Oz --naive {names} "
"> {workdir}/{snakemake.output}) {log}"
)
except subprocess.CalledProcessError as e:
if snakemake.log:
sys.stderr = open(snakemake.log[0], "a")
print(
"Unable to download variation data from Ensembl. "
"Did you check that this combination of species, build, and release is actually provided? ",
file=sys.stderr,
)
exit(1)
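How many files the wrapper downloads depends on species, release and type. The sketch below lists the URLs for a given combination without fetching them; the function name variation_urls and the parameter name kind (used instead of type to avoid shadowing the builtin) are assumptions, and the release check and the bcftools steps are left out.

```python
def variation_urls(species, release, kind="all", chromosome="", branch=""):
    """Return the VCF and index URLs the wrapper logic would download."""
    species = species.lower()
    release = int(release)
    if chromosome and kind != "all":
        raise ValueError("chromosome-wise download requires kind='all'")
    if kind == "all":
        if species == "homo_sapiens" and release >= 93:
            # One file per chromosome for human data
            chroms = [chromosome] if chromosome else list(range(1, 23)) + ["X", "Y", "MT"]
            suffixes = [f"-chr{chrom}" for chrom in chroms]
        else:
            if chromosome:
                raise ValueError("chromosome-wise download is human-only (release >= 93)")
            suffixes = [""]
    elif kind in ("somatic", "structural_variations"):
        suffixes = [f"_{kind}"]
    else:
        raise ValueError(f"Unsupported type {kind}")

    prefix = branch + "/" if branch else ""
    # File names are lower-case from release 91 onwards
    filename = species if release >= 91 else species.capitalize()
    return [
        f"ftp://ftp.ensembl.org/pub/{prefix}release-{release}/variation/vcf/"
        f"{species}/{filename}{suffix}.{ext}"
        for suffix in suffixes
        for ext in ("vcf.gz", "vcf.gz.csi")
    ]


yeast = variation_urls("saccharomyces_cerevisiae", 98)
human = variation_urls("homo_sapiens", 98)
print(len(yeast), len(human))
```

For human data this yields one VCF and one index per chromosome, which the wrapper then concatenates into the single output file.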
REFGENIE¶
Deploy biomedical reference datasets via refgenie.
Example¶
This wrapper can be used in the following way:
rule obtain_asset:
output:
# the name refers to the refgenie seek key (see attributes on http://refgenomes.databio.org)
fai="refs/genome.fasta"
# Multiple outputs/seek keys are possible here.
params:
genome="human_alu",
asset="fasta",
tag="default"
wrapper:
"v2.2.1/bio/refgenie"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
refgenie=0.12.1
refgenconf=0.12.2
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
import refgenconf
genome = snakemake.params.genome
asset = snakemake.params.asset
tag = snakemake.params.tag
conf_path = os.environ["REFGENIE"]
rgc = refgenconf.RefGenConf(conf_path, writable=True)
# pull asset if necessary
gat, archive_data, server_url = rgc.pull(genome, asset, tag, force=False)
for seek_key, out in snakemake.output.items():
path = rgc.seek(genome, asset, tag_name=tag, seek_key=seek_key, strict_exists=True)
os.symlink(path, out)
RSEM¶
For rsem, the following wrappers are available:
RSEM CALCULATE EXPRESSION¶
Run rsem-calculate-expression to estimate gene and isoform expression from RNA-Seq data.
URL: http://deweylab.github.io/RSEM/rsem-calculate-expression.html
Example¶
This wrapper can be used in the following way:
rule calculate_expression:
input:
# input.bam or input.fq_one must be specified (and if input.fq_one, optionally input.fq_two if paired-end)
# a BAM file of reads aligned to the transcriptome
bam="mapped/a.bam",
# Index files created by rsem-prepare-reference
reference=multiext("index/reference", ".grp", ".ti", ".transcripts.fa", ".seq", ".idx.fa", ".n2g.idx.fa"),
# reference_bowtie: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome
# reference_bowtie=multiext("index/reference", ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"),
output:
# genes_results must end in .genes.results; this suffix is stripped and passed to rsem as an output name prefix
# this file contains per-gene quantification data for the sample
genes_results="output/a.genes.results",
# isoforms_results must end in .isoforms.results and otherwise have the same prefix as genes_results
# this file contains per-transcript quantification data for the sample
isoforms_results="output/a.isoforms.results",
params:
# optional, specify if sequencing is paired-end
paired_end=True,
# additional optional parameters to pass to rsem, for example,
extra="--seed 42",
log:
"logs/rsem/calculate_expression/a.log",
threads: 2
wrapper:
"v2.2.1/bio/rsem/calculate-expression"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/deweylab/RSEM.
Software dependencies¶
rsem=1.3.3
bowtie=1.3.1
Input/Output¶
Input:
bam
: BAM file with reads aligned to transcriptome
fq_one
: FASTQ file of reads (read_1 for paired-end sequencing)
fq_two
: Optional second FASTQ file of reads (read_2 for paired-end sequencing)
reference
: Index files created by rsem-prepare-reference
reference_bowtie
: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome
Output:
genes_results
: This file contains per-gene quantification data for the sample
isoforms_results
: This file contains per-transcript quantification data for the sample
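The naming constraint on the two outputs can be checked ahead of time. The following is a minimal sketch rather than the wrapper's own code: `rsem_output_prefix` is a hypothetical helper name, and unlike the wrapper it also verifies that both files share the same prefix.

```python
def rsem_output_prefix(genes_results: str, isoforms_results: str) -> str:
    """Return the output prefix passed to rsem, or raise ValueError."""
    genes_suffix = ".genes.results"
    isoforms_suffix = ".isoforms.results"
    if not genes_results.endswith(genes_suffix):
        raise ValueError("genes_results must end in .genes.results")
    if not isoforms_results.endswith(isoforms_suffix):
        raise ValueError("isoforms_results must end in .isoforms.results")
    # Strip the suffix: rsem appends it again to the prefix it is given.
    prefix = genes_results[: -len(genes_suffix)]
    # Extra check (assumption, not done by the wrapper): same prefix for both.
    if isoforms_results[: -len(isoforms_suffix)] != prefix:
        raise ValueError("genes_results and isoforms_results must share a prefix")
    return prefix


print(rsem_output_prefix("output/a.genes.results", "output/a.isoforms.results"))
```

For the example rule above, the prefix handed to rsem would be `output/a`.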
Authors¶
- Brett Copeland
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
bam = snakemake.input.get("bam", "")
fq_one = snakemake.input.get("fq_one", "")
fq_two = snakemake.input.get("fq_two", "")
if bam:
if fq_one:
raise Exception("Only input.bam or input.fq_one expected, got both.")
input_bam = "--alignments"
input_string = bam
paired_end = snakemake.params.get("paired_end", False)
else:
input_bam = ""
if fq_one:
if isinstance(fq_one, list):
num_fq_one = len(fq_one)
input_string = ",".join(fq_one)
else:
num_fq_one = 1
input_string = fq_one
if fq_two:
paired_end = True
if isinstance(fq_two, list):
num_fq_two = len(fq_two)
if num_fq_one != num_fq_two:
raise Exception(
"Got {} R1 FASTQs, {} R2 FASTQs.".format(num_fq_one, num_fq_two)
)
else:
fq_two = [fq_two]
input_string += " " + ",".join(fq_two)
else:
paired_end = False
else:
raise Exception("Expected input.bam or input.fq_one, got neither.")
if paired_end:
paired_end_string = "--paired-end"
else:
paired_end_string = ""
genes_results = snakemake.output.genes_results
if genes_results.endswith(".genes.results"):
output_prefix = genes_results[: -len(".genes.results")]
else:
raise Exception(
"output.genes_results file name malformed "
"(rsem will append .genes.results suffix)"
)
if not snakemake.output.isoforms_results.endswith(".isoforms.results"):
raise Exception(
"output.isoforms_results file name malformed "
"(rsem will append .isoforms.results suffix)"
)
reference_prefix = os.path.splitext(snakemake.input.reference[0])[0]
extra = snakemake.params.get("extra", "")
threads = snakemake.threads
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"rsem-calculate-expression --num-threads {snakemake.threads} {extra} "
"{paired_end_string} {input_bam} {input_string} "
"{reference_prefix} {output_prefix} "
"{log}"
)
RSEM GENERATE DATA MATRIX¶
Run rsem-generate-data-matrix to combine a set of single-sample rsem results into a single matrix.
Example¶
This wrapper can be used in the following way:
rule rsem_generate_data_matrix:
input:
# one or more expression files created by rsem-calculate-expression
["a.genes.results", "b.genes.results"],
output:
# a tsv containing each sample in the input as a column
"genes.results",
params:
# optional additional parameters
extra="",
log:
"logs/rsem/generate_data_matrix.log",
wrapper:
"v2.2.1/bio/rsem/generate-data-matrix"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/deweylab/RSEM.
Software dependencies¶
rsem=1.3.3
Input/Output¶
Input:
- a list of rsem results files
Output:
- Quantification results summarized by allele/gene/isoform per sample
Authors¶
- Brett Copeland
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"rsem-generate-data-matrix {extra} "
"{snakemake.input} > {snakemake.output} "
"{log}"
)
RSEM PREPARE REFERENCE¶
Run rsem-prepare-reference to create index files for downstream analysis with rsem.
Example¶
This wrapper can be used in the following way:
rule prepare_reference:
input:
# reference FASTA with either the entire genome or transcript sequences
reference_genome="genome.fasta",
output:
# one of the index files created and used by RSEM (required)
seq="index/reference.seq",
# RSEM produces a number of other files which may optionally be specified as output; these may be provided so that snakemake is aware of them, but the wrapper doesn't do anything with this information other than to verify that the file path prefixes match that of output.seq.
# for example,
grp="index/reference.grp",
ti="index/reference.ti",
params:
# optional additional parameters, for example,
#extra="--gtf annotations.gtf",
# if building the index against a reference transcript set
extra="",
log:
"logs/rsem/prepare-reference.log",
wrapper:
"v2.2.1/bio/rsem/prepare-reference"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/deweylab/RSEM.
Software dependencies¶
rsem=1.3.3
Input/Output¶
Input:
- reference genome
- additional optional arguments
Output:
- index files for downstream use with rsem
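The reference name given to rsem-prepare-reference is derived from `output.seq`, and every other declared output must start with that name. A minimal sketch of this rule, assuming a hypothetical helper `rsem_reference_name` that takes the remaining outputs as a plain dictionary:

```python
import os


def rsem_reference_name(seq_path: str, other_outputs: dict) -> str:
    """Derive the RSEM reference name from the .seq output path."""
    seq_abs = os.path.abspath(seq_path)
    if not seq_abs.endswith(".seq"):
        raise ValueError("output.seq has an invalid file suffix (must be .seq)")
    reference_name = seq_abs[: -len(".seq")]
    # All other index files must share the same path prefix.
    for name, path in other_outputs.items():
        if not os.path.abspath(path).startswith(reference_name):
            raise ValueError(
                "the path for {} is inconsistent with that of output.seq".format(name)
            )
    return reference_name


print(rsem_reference_name("index/reference.seq", {"grp": "index/reference.grp"}))
```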
Authors¶
- Brett Copeland
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
# the reference_name argument is inferred by stripping the .seq suffix from
# the output.seq value
output_directory = os.path.dirname(os.path.abspath(snakemake.output.seq))
seq_file = os.path.basename(snakemake.output.seq)
if seq_file.endswith(".seq"):
reference_name = os.path.join(output_directory, seq_file[:-4])
else:
raise Exception("output.seq has an invalid file suffix (must be .seq)")
for output_variable, output_path in snakemake.output.items():
if not os.path.abspath(output_path).startswith(reference_name):
raise Exception(
"the path for {} is inconsistent with that of output.seq".format(
output_variable
)
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"rsem-prepare-reference --num-threads {snakemake.threads} {extra} "
"{snakemake.input.reference_genome} {reference_name} "
"{log}"
)
RUBIC¶
RUBIC detects recurrent copy number alterations using copy number breaks.
Example¶
This wrapper can be used in the following way:
rule rubic:
input:
seg="{samples}/segments.txt",
markers="{samples}/markers.txt"
output:
out_gains="{samples}/gains.txt",
out_losses="{samples}/losses.txt",
out_plots=directory("{samples}/plots") #only possible to provide output directory for plots
params:
fdr="",
genefile=""
wrapper:
"v2.2.1/bio/rubic"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
r-base=3.4.1
r-rubic=1.0.3
r-data.table=1.10.4
r-pracma=2.0.4
r-ggplot2=2.2.1
r-gtable=0.2.0
r-codetools=0.2_15
r-digest=0.6.12
Params¶
fdr
: false discovery rate (optional, leave empty to use default value of 0.25)genefile
: file path to use custom gene file (optional, leave empty to use default file)
Authors¶
- Beatrice F. Tan
Code¶
# __author__ = "Beatrice F. Tan"
# __copyright__ = "Copyright 2018, Beatrice F. Tan"
# __email__ = "beatrice.ftan@gmail.com"
# __license__ = "LUMC"
library(RUBIC)
all_genes <- if (snakemake@params[["genefile"]] == "") system.file("extdata", "genes.tsv", package="RUBIC") else snakemake@params[["genefile"]]
fdr <- if (snakemake@params[["fdr"]] == "") 0.25 else snakemake@params[["fdr"]]
rbc <- rubic(fdr, snakemake@input[["seg"]], snakemake@input[["markers"]], genes=all_genes)
rbc$save.focal.gains(snakemake@output[["out_gains"]])
rbc$save.focal.losses(snakemake@output[["out_losses"]])
rbc$save.plots(snakemake@output[["out_plots"]])
SALMON¶
For salmon, the following wrappers are available:
DECOYS¶
Generate gentrome sequences and gather decoy sequence names.
URL: https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/
Example¶
This wrapper can be used in the following way:
rule test_salmon_decoy:
input:
transcriptome="transcriptome.fasta.gz",
genome="genome.fasta.gz",
output:
gentrome="gentrome.fasta.gz",
decoys="decoys.txt",
threads: 2
log:
"decoys.log"
wrapper:
"v2.2.1/bio/salmon/decoys"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Provide the transcriptome and genome in the same format (raw FASTA, gzipped, or bzip2-compressed). With compressed input, this wrapper requires 2 threads: one for on-the-fly decompression and one for gathering the decoy sequence names.
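The compression rule in this note can be checked before a job is submitted. The following is a minimal sketch, not the wrapper's own code: `compression_kind` and `required_threads` are hypothetical helper names, and any path that does not end in `.gz` or `.bz2` is treated as raw FASTA, which is looser than the wrapper's own suffix check.

```python
def compression_kind(path: str) -> str:
    """Classify a path by its compression suffix."""
    if path.endswith(".gz"):
        return "gz"
    if path.endswith(".bz2"):
        return "bz2"
    return "raw"  # simplification: everything else counts as raw FASTA


def required_threads(transcriptome: str, genome: str, gentrome: str) -> int:
    """Return the minimum thread count, or raise on mixed compression."""
    kinds = {compression_kind(p) for p in (transcriptome, genome, gentrome)}
    if len(kinds) != 1:
        raise ValueError(
            "Mixed compression status: use the same compression for all files."
        )
    # One extra thread is needed for on-the-fly decompression.
    return 1 if kinds == {"raw"} else 2


print(required_threads("transcriptome.fasta.gz", "genome.fasta.gz", "gentrome.fasta.gz"))
```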
Software dependencies¶
bzip2=1.0.8
gzip=1.12
Input/Output¶
Input:
transcriptome
: Path to transcriptome sequences, fasta (gz/bz2) formatted.
genome
: Path to genome sequences, fasta (gz/bz2) formatted.
Output:
gentrome
: Path to gentrome, fasta (gz/bz2) formatted.
decoys
: Path to text file containing decoy sequence names.
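The decoys file holds one genome sequence name per line, taken from the FASTA headers with anything after the first space removed. The wrapper does this with sed; the same idea in pure Python might look like the sketch below, where `decoy_names` is a hypothetical helper name and the input is assumed to be already decompressed text.

```python
from typing import Iterable, List


def decoy_names(fasta_lines: Iterable[str]) -> List[str]:
    """Collect sequence names from FASTA header lines."""
    names = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Drop the leading '>' and everything after the first space.
            names.append(line[1:].split(" ", 1)[0])
    return names


example = [">chr1 primary assembly\n", "ACGT\n", ">chr2\n", "GGCC\n"]
print(decoy_names(example))  # ['chr1', 'chr2']
```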
Authors¶
- Thibault Dayris
Code¶
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Snakemake wrapper for gentrome and decoy sequences acquisition"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2022, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=True)
required_thread_nb = 1
genome = snakemake.input["genome"]
if genome.endswith(".gz"):
genome = f"<( gzip --stdout --decompress {genome} )"
required_thread_nb += 1 # Add a thread for gzip uncompression
elif genome.endswith(".bz2"):
genome = f"<( bzip2 --stdout --decompress {genome} )"
required_thread_nb += 1 # Add a thread for bzip2 uncompression
if snakemake.threads < required_thread_nb:
raise ValueError(
f"Salmon decoy wrapper requires at least {required_thread_nb} threads, "
f"but only {snakemake.threads} were provided"
)
sequences = [
snakemake.input["transcriptome"],
snakemake.input["genome"],
snakemake.output["gentrome"],
]
if all(fasta.endswith(".gz") for fasta in sequences):
# Then all input sequences are gzipped. The output will also be gzipped.
pass
elif all(fasta.endswith(".bz2") for fasta in sequences):
# Then all input sequences are bzip2-compressed. The output will also be bzip2-compressed.
pass
elif all(fasta.endswith((".fa", ".fna", ".fasta")) for fasta in sequences):
# Then all input sequences are raw fasta. The output will also be raw fasta.
pass
else:
raise ValueError(
"Mixed compression status: Either all fasta sequences are compressed "
"with the *same* compression algorithm, or none of them are compressed."
)
# Gathering decoy sequences names
# The sed command works as follows:
# -n = do not print all lines
# s/ .*//g = Remove anything after spaces. (remove comments)
# s/>//p = Remove '>' character at the beginning of sequence names. Print names.
shell("( sed -n 's/ .*//g;s/>//p' {genome} ) > {snakemake.output.decoys} {log}")
# Building big gentrome file
shell(
"cat {snakemake.input.transcriptome} {snakemake.input.genome} "
"> {snakemake.output.gentrome} {log}"
)
SALMON_INDEX¶
Index a transcriptome assembly with salmon
Example¶
This wrapper can be used in the following way:
rule salmon_index:
input:
sequences="assembly/transcriptome.fasta",
output:
multiext(
"salmon/transcriptome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
"duplicate_clusters.tsv",
"info.json",
"mphf.bin",
"pos.bin",
"pre_indexing.log",
"rank.bin",
"refAccumLengths.bin",
"ref_indexing.log",
"reflengths.bin",
"refseq.bin",
"seq.bin",
"versionInfo.json",
),
log:
"logs/salmon/transcriptome_index.log",
threads: 2
params:
# optional parameters
extra="",
wrapper:
"v2.2.1/bio/salmon/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
salmon=1.10.1
Input/Output¶
Input:
sequences
: Path to sequences to index with Salmon. This can be transcriptome sequences or gentrome.
decoys
: Optional path to decoy sequences name, in case the above sequence was a gentrome.
Output:
- indexed assembly
Params¶
extra
: Optional parameters besides --tmpdir, --threads, and IO.
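The example rule lists the individual index files, while `salmon index` expects a directory for `--index`. The wrapper resolves this by taking the parent directory of the first output when several outputs are declared. A minimal sketch of that rule, assuming a hypothetical helper `salmon_index_dir` that receives the outputs as a list of paths:

```python
from os.path import dirname
from typing import List


def salmon_index_dir(outputs: List[str]) -> str:
    """Return the directory to pass to `salmon index --index`."""
    if len(outputs) > 1:
        # Several index files were declared: use their common directory.
        return dirname(outputs[0])
    # A single output is taken to be the index directory itself.
    return outputs[0]


print(
    salmon_index_dir(
        ["salmon/transcriptome_index/info.json", "salmon/transcriptome_index/seq.bin"]
    )
)
```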
Authors¶
- Tessa Pierce
- Thibault Dayris
Code¶
"""Snakemake wrapper for Salmon Index."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os.path import dirname
from snakemake.shell import shell
from tempfile import TemporaryDirectory
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
decoys = snakemake.input.get("decoys", "")
if decoys:
decoys = f"--decoys {decoys}"
output = snakemake.output
if len(output) > 1:
output = dirname(snakemake.output[0])
with TemporaryDirectory() as tempdir:
shell(
"salmon index "
"--transcripts {snakemake.input.sequences} "
"--index {output} "
"--threads {snakemake.threads} "
"--tmpdir {tempdir} "
"{decoys} "
"{extra} "
"{log}"
)
SALMON QUANT¶
Quantify transcripts with salmon
URL: https://salmon.readthedocs.io/en/latest/salmon.html#quantifying-in-mapping-based-mode
Example¶
This wrapper can be used in the following way:
rule salmon_quant_reads:
input:
# If you have multiple fastq files for a single sample (e.g. technical replicates)
# use a list for r1 and r2.
r1="reads/{sample}_1.fq.gz",
r2="reads/{sample}_2.fq.gz",
index="salmon/transcriptome_index",
output:
quant="salmon/{sample}/quant.sf",
lib="salmon/{sample}/lib_format_counts.json",
log:
"logs/salmon/{sample}.log",
params:
# optional parameters
libtype="A",
extra="",
threads: 2
wrapper:
"v2.2.1/bio/salmon/quant"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Salmon accepts either a list of unpaired reads (r parameter) or two lists of the same length containing paired reads (r1 and r2 parameters), but not both.
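The input rule in this note can be sketched as a small validation step that builds the read arguments for `salmon quant`. This is an illustration under assumptions, not the wrapper's code: `as_list` and `salmon_read_args` are hypothetical names, and on-the-fly bzip2 decompression is left out.

```python
from typing import List, Optional, Sequence, Union

Reads = Union[str, Sequence[str]]


def as_list(reads: Reads) -> List[str]:
    """Accept a single path or a sequence of paths."""
    return [reads] if isinstance(reads, str) else list(reads)


def salmon_read_args(
    r1: Optional[Reads] = None,
    r2: Optional[Reads] = None,
    r: Optional[Reads] = None,
) -> str:
    """Build the read arguments, rejecting mixed or incomplete input."""
    paired = r1 is not None or r2 is not None
    if paired and r is not None:
        raise ValueError("Provide either r1 and r2 (paired) or r (unpaired), not both")
    if paired:
        if r1 is None or r2 is None:
            raise ValueError("Paired input needs both r1 and r2")
        mates1, mates2 = as_list(r1), as_list(r2)
        if len(mates1) != len(mates2):
            raise ValueError("r1 and r2 must contain the same number of files")
        return "--mates1 {} --mates2 {}".format(" ".join(mates1), " ".join(mates2))
    if r is not None:
        return "--unmatedReads {}".format(" ".join(as_list(r)))
    raise ValueError("No reads provided")


print(salmon_read_args(r1="reads/a_1.fq.gz", r2="reads/a_2.fq.gz"))
```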
Software dependencies¶
salmon=1.10.2
gzip=1.12
bzip2=1.0.8
Input/Output¶
Input:
index
: Path to Salmon indexed sequences, see bio/salmon/index
gtf
: Optional path to a GTF formatted genome annotation
r
: Path to unpaired reads
r1
: Path to upstream reads file.
r2
: Path to downstream reads file.
Output:
- Path to quantification file
bam
: Path to pseudo-bam file
Params¶
libtype
: Format string describing the library type, see official documentation on Library Types for list of accepted values.
extra
: Optional command line parameters, besides IO parameters and threads.
Authors¶
- Tessa Pierce
- Thibault Dayris
Code¶
"""Snakemake wrapper for Salmon Quant"""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os.path import dirname
from snakemake.shell import shell
class MixedPairedUnpairedInput(Exception):
def __init__(self):
super().__init__(
"Salmon cannot quantify mixed paired/unpaired input files. "
"Please input either `r1`, `r2` (paired) or `r` (unpaired)"
)
class MissingMateError(Exception):
def __init__(self):
super().__init__(
"Salmon requires an equal number of paired reads in `r1` and `r2`,"
" or a list of unpaired reads `r`"
)
def uncompress_bz2(snake_io, salmon_threads):
"""
Provide bzip2 on-the-fly decompression
One thread is used for each file that is decompressed on the fly with bzip2. Therefore, the number of threads given to Salmon
is reduced by one per such file, so that the job is not killed on a cluster.
"""
# Asking forgiveness instead of permission
try:
# If no error is raised, then we have a string.
if snake_io.endswith("bz2"):
return [f"<( bzip2 --decompress --stdout {snake_io} )"], salmon_threads - 1
return [snake_io], salmon_threads
except AttributeError:
# As an error has been raised, we have a list of fastq files.
fq_files = []
for fastq in snake_io:
if fastq.endswith("bz2"):
fq_files.append(f"<( bzip2 --decompress --stdout {fastq} )")
salmon_threads -= 1
else:
fq_files.append(fastq)
return fq_files, salmon_threads
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
libtype = snakemake.params.get("libtype", "A")
max_threads = snakemake.threads
extra = snakemake.params.get("extra", "")
if "--validateMappings" in extra:
raise DeprecationWarning("`--validateMappings` is deprecated and has no effect")
r1 = snakemake.input.get("r1")
r2 = snakemake.input.get("r2")
r = snakemake.input.get("r")
if all(mate is not None for mate in [r1, r2]):
r1, max_threads = uncompress_bz2(r1, max_threads)
r2, max_threads = uncompress_bz2(r2, max_threads)
if len(r1) != len(r2):
raise MissingMateError()
if r is not None:
raise MixedPairedUnpairedInput()
r1_cmd = " --mates1 {}".format(" ".join(r1))
r2_cmd = " --mates2 {}".format(" ".join(r2))
read_cmd = " ".join([r1_cmd, r2_cmd])
elif r is not None:
if any(mate is not None for mate in [r1, r2]):
raise MixedPairedUnpairedInput()
r, max_threads = uncompress_bz2(r, max_threads)
read_cmd = " --unmatedReads {}".format(" ".join(r))
else:
raise MissingMateError()
gene_map = snakemake.input.get("gtf", "")
if gene_map:
gene_map = f"--geneMap {gene_map}"
bam = snakemake.output.get("bam", "")
if bam:
bam = f"--writeMappings {bam}"
outdir = dirname(snakemake.output.get("quant"))
index = snakemake.input["index"]
if isinstance(index, list):
index = dirname(index[0])
if max_threads < 1:
raise ValueError(
"On-the-fly bzip2 decompression has raised the required number of threads. "
f"Please request at least {1 - max_threads} more threads."
)
shell(
"salmon quant --index {index} "
" --libType {libtype} {read_cmd} --output {outdir} {gene_map} "
" --threads {max_threads} {extra} {bam} {log}"
)
SALSA2¶
A tool to scaffold long read assemblies with Hi-C data
URL: https://github.com/marbl/SALSA
Example¶
This wrapper can be used in the following way:
rule salsa2:
input:
fas="{sample}.fasta",
fai="{sample}.fasta.fai",
bed="{sample}.bed",
output:
agp="out/{sample}.agp",
fas="out/{sample}.fas",
log:
"logs/salsa2/{sample}.log",
params:
enzyme="CTTAAG", # optional
extra="--clean yes", # optional
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/salsa2"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
Software dependencies¶
salsa2=2.3
Input/Output¶
Input:
- BED file
- FASTA file
- FASTA index file
Output:
- polished assembly (FASTA format)
- polished assembly (AGP format)
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2022, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
enzyme = snakemake.params.get("enzyme", "")
if enzyme:
enzyme = f"--enzyme {enzyme}"
gfa = snakemake.input.get("gfa", "")
if gfa:
gfa = f"--gfa {gfa}"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"run_pipeline.py"
" --assembly {snakemake.input.fas}"
" --length {snakemake.input.fai}"
" --bed {snakemake.input.bed}"
" {enzyme}"
" {gfa}"
" {extra}"
" --output {tmpdir}"
" {log}"
)
if snakemake.output.get("agp"):
shell("cat {tmpdir}/scaffolds_FINAL.agp > {snakemake.output.agp}")
if snakemake.output.get("fas"):
shell("cat {tmpdir}/scaffolds_FINAL.fasta > {snakemake.output.fas}")
SAMBAMBA¶
For sambamba, the following wrappers are available:
SAMBAMBA FLAGSTAT¶
Outputs some statistics drawn from read flags.
URL: https://lomereiter.github.io/sambamba/docs/sambamba-flagstat.html
Example¶
This wrapper can be used in the following way:
rule sambamba_flagstat:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.stats.txt"
params:
extra="" # optional parameters
log:
"logs/sambamba-flagstat/{sample}.log"
threads: 1
wrapper:
"v2.2.1/bio/sambamba/flagstat"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sambamba=1.0
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"sambamba flagstat {snakemake.params.extra} -t {snakemake.threads} "
"{snakemake.input[0]} > {snakemake.output[0]} "
"{log}"
)
SAMBAMBA INDEX¶
Index a BAM file with sambamba.
URL: https://lomereiter.github.io/sambamba/docs/sambamba-index.html
Example¶
This wrapper can be used in the following way:
rule sambamba_index:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.bam.bai"
params:
extra="" # optional parameters
log:
"logs/sambamba-index/{sample}.log"
threads: 8
wrapper:
"v2.2.1/bio/sambamba/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sambamba=1.0
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"sambamba index {snakemake.params.extra} -t {snakemake.threads} "
"{snakemake.input[0]} {snakemake.output[0]} "
"{log}"
)
SAMBAMBA MARKDUP¶
Marks (default) or removes duplicate reads in a BAM file.
URL: https://lomereiter.github.io/sambamba/docs/sambamba-markdup.html
Example¶
This wrapper can be used in the following way:
rule sambamba_markdup:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.rmdup.bam"
params:
extra="-r" # optional parameters
log:
"logs/sambamba-markdup/{sample}.log"
threads: 8
wrapper:
"v2.2.1/bio/sambamba/markdup"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sambamba=1.0
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
from tempfile import TemporaryDirectory
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with TemporaryDirectory() as tempdir:
shell(
"sambamba markdup {snakemake.params.extra} --nthreads {snakemake.threads} "
"--tmpdir {tempdir} {snakemake.input[0]} {snakemake.output[0]} "
"{log}"
)
SAMBAMBA MERGE¶
Merge multiple BAM files into one using sambamba.
URL: https://lomereiter.github.io/sambamba/docs/sambamba-merge.html
Example¶
This wrapper can be used in the following way:
rule sambamba_merge:
input:
["mapped/{sample}_1.sorted.bam", "mapped/{sample}_2.sorted.bam"]
output:
"mapped/{sample}.merged.bam"
params:
extra="" # optional parameters
log:
"logs/sambamba-merge/{sample}.log"
threads: 1
wrapper:
"v2.2.1/bio/sambamba/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sambamba=1.0
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"sambamba merge {snakemake.params.extra} -t {snakemake.threads} "
"{snakemake.output[0]} {snakemake.input} "
"{log}"
)
SAMBAMBA SLICE¶
Fast tool for copying a slice of a BAM file.
URL: https://lomereiter.github.io/sambamba/docs/sambamba-slice.html
Example¶
This wrapper can be used in the following way:
rule sambamba_slice:
input:
bam="mapped/{sample}.bam",
bai="mapped/{sample}.bam.bai"
output:
"mapped/{sample}.region.bam"
params:
region="xx:1-10" # region to catch (contig:start-end)
log:
"logs/sambamba-slice/{sample}.log"
wrapper:
"v2.2.1/bio/sambamba/slice"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sambamba=1.0
Input/Output¶
Input:
- coordinate-sorted and indexed bam file
Output:
- new bam file with specific region
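The `region` parameter in the example uses the form `contig:start-end`. A malformed region is easier to catch before the job starts, so a small check such as the following may help. It is a sketch under assumptions: `Region` and `parse_region` are hypothetical names, and only the `contig:start-end` form shown in the example rule is accepted here, regardless of what other forms sambamba itself may allow.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    contig: str
    start: int
    end: int


# Greedy contig match, so contig names containing ':' still parse.
_REGION = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse 'contig:start-end' or raise ValueError."""
    match = _REGION.match(text)
    if match is None:
        raise ValueError("region must look like contig:start-end, got " + repr(text))
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("region start must not exceed region end")
    return Region(match["contig"], start, end)


print(parse_region("xx:1-10"))
```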
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"sambamba slice "
"{snakemake.input[0]} {snakemake.params.region} > {snakemake.output[0]} "
"{log}"
)
SAMBAMBA SORT¶
Sort bam file with sambamba
Example¶
This wrapper can be used in the following way:
rule sambamba_sort:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.sorted.bam"
params:
"" # optional parameters
log:
"logs/sambamba-sort/{sample}.log"
threads: 8
wrapper:
"v2.2.1/bio/sambamba/sort"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sambamba=1.0
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
from tempfile import TemporaryDirectory
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with TemporaryDirectory() as tempdir:
shell(
"sambamba sort {snakemake.params} --nthreads {snakemake.threads} "
"--tmpdir {tempdir} --out {snakemake.output[0]} {snakemake.input[0]} "
"{log}"
)
SAMBAMBA VIEW¶
Filter and/or view BAM files.
URL: https://lomereiter.github.io/sambamba/docs/sambamba-view.html
Example¶
This wrapper can be used in the following way:
rule sambamba_view:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.filtered.bam"
params:
extra="-f bam -F 'mapping_quality >= 50'" # optional parameters
log:
"logs/sambamba-view/{sample}.log"
threads: 8
wrapper:
"v2.2.1/bio/sambamba/view"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sambamba=1.0
Authors¶
- Jan Forster
Code¶
__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
in_file = snakemake.input[0]
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
if in_file.endswith(".sam") and ("-S" not in extra and "--sam-input" not in extra):
extra += " --sam-input"
shell(
"sambamba view {extra} -t {snakemake.threads} "
"{snakemake.input[0]} > {snakemake.output[0]} "
"{log}"
)
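The intent of the SAM handling in this wrapper is to add `--sam-input` when the input file ends in `.sam` and neither `-S` nor `--sam-input` was already passed in `extra`. A stricter variant of that check, matching whole command-line tokens instead of substrings, might look like the sketch below; `needs_sam_input` is a hypothetical helper name.

```python
import shlex


def needs_sam_input(in_file: str, extra: str) -> bool:
    """Return True if --sam-input should be appended to the options."""
    # shlex keeps quoted filter expressions together as one token.
    tokens = shlex.split(extra)
    already_set = "-S" in tokens or "--sam-input" in tokens
    return in_file.endswith(".sam") and not already_set


print(needs_sam_input("mapped/a.sam", "-f bam -F 'mapping_quality >= 50'"))
```

Token matching avoids false positives such as a filter expression that merely contains the characters `-S`.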
SAMTOOLS¶
For samtools, the following wrappers are available:
SAMTOOLS CALMD¶
Calculates MD and NM tags.
Example¶
This wrapper can be used in the following way:
rule samtools_calmd:
input:
aln="{sample}.bam", # Can be 'sam', 'bam', or 'cram'
ref="genome.fasta",
output:
"{sample}.calmd.bam",
log:
"{sample}.calmd.log",
params:
extra="-E", # optional params string
threads: 2
wrapper:
"v2.2.1/bio/samtools/calmd"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads or -O/--output-fmt).
- For more information, see http://www.htslib.org/doc/samtools-calmd.html
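Since threads and output format are set by the wrapper, passing them again in `extra` is best avoided. A workflow could guard against this with a check like the following sketch. `RESERVED` and `reserved_flags_in` are hypothetical names, and the check only recognises flags written as separate tokens or in `--flag=value` form, not short flags with an attached value such as `-@4`.

```python
import shlex
from typing import List

# Flags the notes above list as handled by the wrapper itself.
RESERVED = ("-@", "--threads", "-O", "--output-fmt")


def reserved_flags_in(extra: str) -> List[str]:
    """Return the reserved flags found in an extra string, in order."""
    found = []
    for token in shlex.split(extra):
        flag = token.split("=", 1)[0]  # handle --flag=value
        if flag in RESERVED and flag not in found:
            found.append(flag)
    return found


print(reserved_flags_in("-E --threads 4 -O bam"))  # ['--threads', '-O']
```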
Software dependencies¶
samtools=1.16.1
snakemake-wrapper-utils=0.5.2
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"samtools calmd {samtools_opts} {extra} {snakemake.input.aln} {snakemake.input.ref} > {snakemake.output[0]} {log}"
)
SAMTOOLS DEPTH¶
Compute the read depth at each position or region using samtools.
Example¶
This wrapper can be used in the following way:
rule samtools_depth:
input:
bams=["mapped/A.bam", "mapped/B.bam"],
bed="regionToCalcDepth.bed", # optional
output:
"depth.txt",
log:
"depth.log",
params:
# optional bed file passed to -b
extra="", # optional additional parameters as string
wrapper:
"v2.2.1/bio/samtools/depth"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads or -o).
- For more information, see http://www.htslib.org/doc/samtools-depth.html
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.6.1
Authors¶
- Dayne Filer
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for running samtools depth."""
__author__ = "Dayne L Filer"
__copyright__ = "Copyright 2020, Dayne L Filer"
__email__ = "dayne.filer@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# check for optional bed file
bed = snakemake.input.get("bed", "")
if bed:
bed = "-b " + bed
shell("samtools depth {samtools_opts} {extra} {bed} {snakemake.input.bams} {log}")
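As a minimal sketch of the optional BED handling above, the flag logic can be reproduced as plain functions and checked without Snakemake. The names `bed_flag` and `depth_command` are hypothetical and not part of the wrapper, and the sketch leaves out the options that `get_samtools_opts` contributes (such as threads and the output file).

```python
def bed_flag(bed: str = "") -> str:
    """Return the `-b <file>` argument for samtools depth, or "" without a BED file."""
    return f"-b {bed}" if bed else ""


def depth_command(bams, bed: str = "") -> str:
    """Assemble a simplified `samtools depth` command line (hypothetical helper)."""
    parts = ["samtools", "depth", bed_flag(bed), *bams]
    # Drop empty fragments so a missing BED file leaves no stray spaces.
    return " ".join(part for part in parts if part)


print(depth_command(["mapped/A.bam", "mapped/B.bam"], "regionToCalcDepth.bed"))
print(depth_command(["mapped/A.bam"]))
```

When the `bed` input is omitted from the rule, the `-b` argument simply disappears from the command.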
SAMTOOLS FAIDX¶
Index a reference sequence in FASTA format.
Example¶
This wrapper can be used in the following way:
rule samtools_index:
input:
"{sample}.fa",
output:
"{sample}.fa.fai",
log:
"{sample}.log",
params:
extra="", # optional params string
wrapper:
"v2.2.1/bio/samtools/faidx"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -o).
- For more information, see http://www.htslib.org/doc/samtools-faidx.html
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Michael Chambers
- Filipe G. Vieira
Code¶
__author__ = "Michael Chambers"
__copyright__ = "Copyright 2019, Michael Chambers"
__email__ = "greenkidneybean@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_threads=False, parse_write_index=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("samtools faidx {samtools_opts} {extra} {snakemake.input[0]} {log}")
SAMTOOLS FASTQ INTERLEAVED¶
Convert a bam file back to unaligned reads in a single fastq file with samtools. For paired end reads, this results in an unsorted interleaved file.
Example¶
This wrapper can be used in the following way:
rule samtools_fastq_interleaved:
input:
"mapped/{sample}.bam",
output:
"reads/{sample}.fq",
log:
"{sample}.interleaved.log",
params:
" ",
threads: 3
wrapper:
"v2.2.1/bio/samtools/fastq/interleaved"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads or -o).
- For more information, see http://www.htslib.org/doc/samtools-fasta.html
Software dependencies¶
samtools=1.14
snakemake-wrapper-utils=0.5.2
Authors¶
- David Laehnemann
- Victoria Sack
- Filipe G. Vieira
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
import os
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("samtools fastq {samtools_opts} {extra} {snakemake.input[0]} {log}")
SAMTOOLS FASTQ SEPARATE¶
Convert a bam file with paired-end reads back to unaligned reads in two separate fastq files with samtools. Reads that are not properly paired are discarded (READ_OTHER and singleton reads in the samtools fastq documentation), as are secondary (0x100) and supplementary (0x800) reads.
Example¶
This wrapper can be used in the following way:
rule samtools_fastq_separate:
input:
"mapped/{sample}.bam",
output:
"reads/{sample}.1.fq",
"reads/{sample}.2.fq",
log:
"{sample}.separate.log",
params:
sort="-m 4G",
fastq="-n",
# Two threads are reserved (one for samtools sort, one for samtools fastq), so request at least 2 threads on cluster submission. This value - 2 is passed to the -@ argument of samtools sort.
threads: 3
wrapper:
"v2.2.1/bio/samtools/fastq/separate"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- For more information, see http://www.htslib.org/doc/samtools-fasta.html
Software dependencies¶
samtools=1.14
snakemake-wrapper-utils=0.5.2
Authors¶
- David Laehnemann
- Victoria Sack
- Filipe G. Vieira
Code¶
__author__ = "David Laehnemann, Victoria Sack"
__copyright__ = "Copyright 2018, David Laehnemann, Victoria Sack"
__email__ = "david.laehnemann@hhu.de"
__license__ = "MIT"
import os
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem
params_sort = snakemake.params.get("sort", "")
params_fastq = snakemake.params.get("fastq", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Samtools takes additional threads through its option -@
# One thread is used by Samtools sort
# One thread is used by Samtools fastq
# So snakemake.threads has to take them into account
# before allowing additional threads through samtools sort -@
threads = 0 if snakemake.threads <= 2 else snakemake.threads - 2
mem = get_mem(snakemake, "MiB")
mem = "-m {0:.0f}M".format(mem / threads) if mem and threads else ""
with tempfile.TemporaryDirectory() as tmpdir:
tmp_prefix = Path(tmpdir) / "samtools_fastq.sort"
shell(
"(samtools sort -n"
" --threads {threads}"
" {mem}"
" -T {tmp_prefix}"
" {params_sort}"
" {snakemake.input[0]} | "
"samtools fastq"
" {params_fastq}"
" -1 {snakemake.output[0]}"
" -2 {snakemake.output[1]}"
" -0 /dev/null"
" -s /dev/null"
" -F 0x900"
" - "
") {log}"
)
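To make the thread and memory arithmetic above concrete, here is a small sketch that mirrors it as a pure function. The name `sort_resources` is hypothetical, and the memory value is assumed to be given in MiB, as requested from `get_mem` in the wrapper.

```python
from typing import Optional, Tuple


def sort_resources(threads: int, mem_mib: Optional[float] = None) -> Tuple[int, str]:
    """Return (additional threads, `-m` flag) for the `samtools sort` step."""
    # Two threads are reserved: one for `samtools sort`, one for `samtools fastq`.
    extra_threads = 0 if threads <= 2 else threads - 2
    # Memory is divided among the additional threads; the flag is omitted
    # when no memory is declared or no additional thread is available.
    if mem_mib and extra_threads:
        mem_flag = "-m {0:.0f}M".format(mem_mib / extra_threads)
    else:
        mem_flag = ""
    return extra_threads, mem_flag


for threads in (1, 3, 6):
    print(threads, sort_resources(threads, mem_mib=4096))
```

With `threads: 3` as in the example rule, a single additional thread is passed to `samtools sort`.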
SAMTOOLS FASTX¶
Converts a SAM, BAM or CRAM into FASTQ or FASTA format.
Example¶
This wrapper can be used in the following way:
rule samtools_fastq:
input:
"{prefix}.sam",
output:
"{prefix}.fasta",
log:
"{prefix}.log",
message:
""
# Samtools takes additional threads through its option -@
threads: 2 # This value - 1 will be sent to -@
params:
outputtype="fasta",
extra="",
wrapper:
"v2.2.1/bio/samtools/fastx/"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads or -o).
- For more information, see http://www.htslib.org/doc/samtools-fasta.html
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- bam or sam file (.bam, .sam)
Output:
- fastq file (.fastq) or fasta file (.fasta)
Authors¶
- William Rowell
- Filipe G. Vieira
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"samtools {snakemake.params.outputtype} {samtools_opts} {extra} {snakemake.input} {log}"
)
SAMTOOLS FIXMATE¶
Use samtools to correct mate information after BWA mapping.
Example¶
This wrapper can be used in the following way:
rule samtools_fixmate:
input:
"mapped/{input}",
output:
"fixed/{input}",
log:
"{input}.log",
message:
"Fixing mate information in {wildcards.input}"
threads: 1
params:
extra="",
wrapper:
"v2.2.1/bio/samtools/fixmate/"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads or -O/--output-fmt).
- For more information, see http://www.htslib.org/doc/samtools-fixmate.html
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Thibault Dayris
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for samtools fixmate"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"samtools fixmate {samtools_opts} {extra} {snakemake.input[0]} {snakemake.output[0]} {log}"
)
SAMTOOLS FLAGSTAT¶
Use samtools to create a flagstat file from a bam or sam file.
Example¶
This wrapper can be used in the following way:
rule samtools_flagstat:
input:
"mapped/{sample}.bam",
output:
"mapped/{sample}.bam.flagstat",
log:
"{sample}.log",
params:
extra="", # optional params string
wrapper:
"v2.2.1/bio/samtools/flagstat"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads).
- For more information, see http://www.htslib.org/doc/samtools-flagstat.html
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Christopher Preusch
- Filipe G. Vieira
Code¶
__author__ = "Christopher Preusch"
__copyright__ = "Copyright 2017, Christopher Preusch"
__email__ = "cpreusch[at]ust.hk"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"samtools flagstat {samtools_opts} {extra} {snakemake.input[0]} > {snakemake.output[0]} {log}"
)
SAMTOOLS IDXSTATS¶
Use samtools to retrieve and print stats from indexed BAM, SAM or CRAM files.
Example¶
This wrapper can be used in the following way:
rule samtools_idxstats:
input:
bam="mapped/{sample}.bam",
idx="mapped/{sample}.bam.bai",
output:
"mapped/{sample}.bam.idxstats",
log:
"logs/samtools/idxstats/{sample}.log",
params:
extra="", # optional params string
wrapper:
"v2.2.1/bio/samtools/idxstats"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads).
- For more information, see http://www.htslib.org/doc/samtools-idxstats.html
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- indexed SAM, BAM or CRAM file (.SAM, .BAM, .CRAM)
- corresponding index files
Output:
- idxstat file (.idxstats)
Authors¶
- Antonie Vietor
- Filipe G. Vieira
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"samtools idxstats {samtools_opts} {extra} {snakemake.input.bam} > {snakemake.output[0]} {log}"
)
SAMTOOLS INDEX¶
Index bam file with samtools.
Example¶
This wrapper can be used in the following way:
rule samtools_index:
input:
"mapped/{sample}.sorted.bam",
output:
"mapped/{sample}.sorted.bam.bai",
log:
"logs/samtools_index/{sample}.log",
params:
extra="", # optional params string
threads: 4 # This value - 1 will be sent to -@
wrapper:
"v2.2.1/bio/samtools/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- For more information, see http://www.htslib.org/doc/samtools-index.html
Software dependencies¶
samtools=1.17
Authors¶
- Johannes Köster
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Samtools takes additional threads through its option -@
# One thread for samtools index
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)
shell(
"samtools index {threads} {extra} {snakemake.input[0]} {snakemake.output[0]} {log}"
)
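The mapping from the rule's `threads` value to the `-@` argument can be checked in isolation. This is only a sketch of the expression used above; the function name `additional_threads_flag` is hypothetical, and the surrounding spaces of the wrapper's string are left out.

```python
def additional_threads_flag(threads: int) -> str:
    """Return the `-@ N` argument, where N counts threads beyond the main one."""
    # With a single thread there is nothing additional to request.
    return "" if threads <= 1 else f"-@ {threads - 1}"


for threads in (1, 2, 4):
    print(threads, repr(additional_threads_flag(threads)))
```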
SAMTOOLS MERGE¶
Merge two bam files with samtools.
Example¶
This wrapper can be used in the following way:
rule samtools_merge:
input:
["mapped/A.bam", "mapped/B.bam"],
output:
"merged.bam",
log:
"merged.log",
params:
extra="", # optional additional parameters as string
threads: 8
wrapper:
"v2.2.1/bio/samtools/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads, --write-index, -o or -O/--output-fmt).
- For more information, see http://www.htslib.org/doc/samtools-merge.html
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Johannes Köster
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(snakemake)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell("samtools merge {samtools_opts} {extra} {snakemake.input} {log}")
SAMTOOLS MPILEUP¶
Generate pileup using samtools.
Example¶
This wrapper can be used in the following way:
rule mpileup:
input:
# single or list of bam files
bam="mapped/{sample}.bam",
reference_genome="genome.fasta",
output:
"mpileup/{sample}.mpileup.gz",
log:
"logs/samtools/mpileup/{sample}.log",
params:
extra="-d 10000", # optional
wrapper:
"v2.2.1/bio/samtools/mpileup"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- For more information, see http://www.htslib.org/doc/samtools-mpileup.html
Software dependencies¶
samtools=1.17
pigz=2.6
Authors¶
- Patrik Smeds
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for running mpileup."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
if not snakemake.output[0].endswith(".gz"):
raise Exception(
'output file will be compressed and therefore filename should end with ".gz"'
)
shell(
"(samtools mpileup {extra} -f {snakemake.input.reference_genome} {snakemake.input.bam} | pigz > {snakemake.output}) {log}"
)
SAMTOOLS SORT¶
Sort bam file with samtools.
URL: http://www.htslib.org/doc/samtools-sort.html
Example¶
This wrapper can be used in the following way:
rule samtools_sort:
input:
"mapped/{sample}.bam",
output:
"mapped/{sample}.sorted.bam",
log:
"{sample}.log",
params:
extra="-m 4G",
threads: 8
wrapper:
"v2.2.1/bio/samtools/sort"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.5.3
Params¶
extra
: additional program arguments (not -@/--threads, --write-index, -m, -o or -O/--output-fmt).
Authors¶
- Johannes Köster
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(snakemake)
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
mem_per_thread_mb = int(get_mem(snakemake) / snakemake.threads)
with tempfile.TemporaryDirectory() as tmpdir:
tmp_prefix = Path(tmpdir) / "samtools_sort"
shell(
"samtools sort {samtools_opts} -m {mem_per_thread_mb}M {extra} -T {tmp_prefix} {snakemake.input[0]} {log}"
)
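The per-thread memory computation above can be illustrated with a short sketch. It assumes that the value returned by `get_mem` is in MiB, which the `M` suffix in the command suggests; the function name `memory_per_thread_flag` is hypothetical.

```python
def memory_per_thread_flag(total_mib: float, threads: int) -> str:
    """Return the `-m` argument for samtools sort, given total memory and threads."""
    # int() truncates toward zero, as in the wrapper, so the sum over all
    # threads never exceeds the declared total.
    return f"-m {int(total_mib / threads)}M"


print(memory_per_thread_flag(8192, 8))
print(memory_per_thread_flag(1000, 3))
```

Because the wrapper derives `-m` from the declared resources, the Params section lists `-m` among the arguments that should not be passed through `extra`.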
SAMTOOLS STATS¶
Generate stats using samtools.
URL: http://www.htslib.org/doc/samtools-stats.html
Example¶
This wrapper can be used in the following way:
rule samtools_stats:
input:
bam="mapped/{sample}.bam",
bed="design.bed", #Optional input, specify target regions
output:
"samtools_stats/{sample}.txt",
params:
extra="", # Optional: extra arguments.
region="xx:1000000-2000000", # Optional: region string.
log:
"logs/samtools_stats/{sample}.log",
wrapper:
"v2.2.1/bio/samtools/stats"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
samtools=1.17
snakemake-wrapper-utils=0.6.1
Params¶
extra
: additional program arguments (not -@/--threads).
Authors¶
- Julian de Ruiter
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for generating stats using samtools."""
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "julianderuiter@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
bed = snakemake.input.get("bed", "")
if bed:
bed = f"-t {bed}"
samtools_opts = get_samtools_opts(
snakemake, parse_write_index=False, parse_output=False, parse_output_format=False
)
extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"samtools stats {samtools_opts} {extra} {snakemake.input[0]} {bed} {region} > {snakemake.output[0]} {log}"
)
SAMTOOLS VIEW¶
Convert or filter SAM/BAM.
URL: http://www.htslib.org/doc/samtools-view.html
Example¶
This wrapper can be used in the following way:
rule samtools_view:
input:
"{sample}.sam",
output:
bam="{sample}.bam",
idx="{sample}.bai",
log:
"{sample}.log",
params:
extra="", # optional params string
region="", # optional region string
threads: 2
wrapper:
"v2.2.1/bio/samtools/view"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments (not -@/--threads, --write-index, -o or -O/--output-fmt).
- The region param allows one to specify region to extract as RNAME[:STARTPOS[-ENDPOS]] (e.g. chr1, chr2:10000000, chr3:1000-2000, ‘*’).
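The region syntax in the last note can be illustrated with a small parser. This is a simplified sketch rather than the parser samtools uses: the name `parse_region` is hypothetical, and the pattern assumes reference names without colons and plain digits for positions.

```python
import re

# RNAME[:STARTPOS[-ENDPOS]], simplified: no colons in RNAME, digits only.
REGION_RE = re.compile(r"^(?P<rname>[^:]+)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def parse_region(region: str):
    """Split a region string into (rname, start, end); missing parts are None."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"not a valid region string: {region!r}")
    start, end = match["start"], match["end"]
    return (
        match["rname"],
        int(start) if start else None,
        int(end) if end else None,
    )


for example in ("chr1", "chr2:10000000", "chr3:1000-2000", "*"):
    print(example, parse_region(example))
```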
Software dependencies¶
samtools=1.16.1
snakemake-wrapper-utils=0.5.2
Authors¶
- Johannes Köster
- Filipe G. Vieira
- Lance Parsons
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts
samtools_opts = get_samtools_opts(snakemake)
extra = snakemake.params.get("extra", "")
region = snakemake.params.get("region", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell("samtools view {samtools_opts} {extra} {snakemake.input[0]} {region} {log}")
SEQKIT GENERIC WRAPPER¶
Run SeqKit.
URL: https://bioinf.shenwei.me/seqkit/usage/
Example¶
This wrapper can be used in the following way:
rule seqkit_seq:
input:
fasta="data/{sample}.fa",
output:
fasta="out/seq/{sample}.fa.gz",
log:
"logs/seq/{sample}.log",
params:
command="seq",
extra="--min-len 10",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_subseq_bed:
input:
fasta="data/{sample}.fa",
bed="data/{sample}.bed",
output:
fasta="out/subseq/bed/{sample}.fa.gz",
log:
"logs/subseq/bed/{sample}.log",
params:
command="subseq",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_subseq_gtf:
input:
fasta="data/{sample}.fa",
gtf="data/{sample}.gtf",
output:
fasta="out/subseq/gtf/{sample}.fa.gz",
log:
"logs/subseq/gtf/{sample}.log",
params:
command="subseq",
extra="--feature CDS",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_subseq_region:
input:
fasta="data/{sample}.fa",
output:
fasta="out/subseq/region/{sample}.fa.gz",
log:
"logs/subseq/region/{sample}.log",
params:
command="subseq",
extra="--region 1:12",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_fx2tab:
input:
fastx="data/{sample}.fastq",
output:
tsv="out/fx2tab/{sample}.tsv",
log:
"logs/fx2tab/{sample}.log",
params:
command="fx2tab",
extra="--name",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_grep_name:
input:
fastx="data/{sample}.fastq",
pattern="data/name.txt",
output:
fastx="out/grep/name/{sample}.fastq.gz",
log:
"logs/grep/name/{sample}.log",
params:
command="grep",
extra="--by-name",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_grep_seq:
input:
fastx="data/{sample}.fastq",
pattern="data/seq.txt",
output:
fastx="out/grep/seq/{sample}.fastq.gz",
log:
"logs/grep/seq/{sample}.log",
params:
command="grep",
extra="--by-seq",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_rmdup_name:
input:
fastx="data/{sample}.fastq",
output:
fastx="out/rmdup/name/{sample}.fastq.gz",
dup_num="out/rmdup/name/{sample}.num.txt",
dup_seqs="out/rmdup/name/{sample}.seq.txt",
log:
"logs/rmdup/name/{sample}.log",
params:
command="rmdup",
extra="--by-name",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_rmdup_seq:
input:
fastx="data/{sample}.fastq",
output:
fastx="out/rmdup/seq/{sample}.fastq.gz",
dup_num="out/rmdup/seq/{sample}.num.txt",
dup_seqs="out/rmdup/seq/{sample}.seq.txt",
log:
"logs/rmdup/seq/{sample}.log",
params:
command="rmdup",
extra="--by-seq",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
rule seqkit_stats:
input:
fastx="data/{sample}.fastq",
output:
stats="out/stats/{sample}.tsv",
log:
"logs/stats/{sample}.log",
params:
command="stats",
extra="--all --tabular",
threads: 2
wrapper:
"v2.2.1/bio/seqkit"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The first input file and the first output file are treated as the main ones.
- Keys for extra input and output files need to match seqkit arguments without the -file suffix (if present).
Software dependencies¶
seqkit=2.4.0
Params¶
command
: SeqKit command to use.
extra
: Optional parameters.
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra_input = " ".join(
[
f"--{key.replace('_','-')} {value}"
if key in ["bed", "gtf"]
else f"--{key.replace('_','-')}-file {value}"
for key, value in snakemake.input.items()
][1:]
)
extra_output = " ".join(
[
f"--{key.replace('_','-')} {value}"
if key in ["read1", "read2"]
else f"--{key.replace('_','-')}-file {value}"
for key, value in snakemake.output.items()
][1:]
)
shell(
"seqkit {snakemake.params.command}"
" --threads {snakemake.threads}"
" {extra_input}"
" {extra_output}"
" {extra}"
" --out-file {snakemake.output[0]}"
" {snakemake.input[0]}"
" {log}"
)
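The way named inputs and outputs are turned into SeqKit arguments can be sketched with plain dictionaries standing in for `snakemake.input` and `snakemake.output`. The function name `extra_flags` and its `plain_keys` parameter are hypothetical; the logic follows the two list comprehensions above.

```python
def extra_flags(named_files: dict, plain_keys: set) -> str:
    """Build SeqKit arguments from every named file except the first (main) one."""
    flags = []
    for key, value in list(named_files.items())[1:]:
        option = key.replace("_", "-")
        # Keys listed in plain_keys map directly to an option name;
        # all other keys get the "-file" suffix appended.
        if key not in plain_keys:
            option += "-file"
        flags.append(f"--{option} {value}")
    return " ".join(flags)


inputs = {"fastx": "data/a.fastq", "pattern": "data/name.txt"}
outputs = {
    "fastx": "out/a.fastq.gz",
    "dup_num": "out/a.num.txt",
    "dup_seqs": "out/a.seq.txt",
}
print(extra_flags(inputs, {"bed", "gtf"}))
print(extra_flags(outputs, {"read1", "read2"}))
```

This is why the `pattern` input of the `seqkit_grep_name` rule ends up as `--pattern-file`, while a `bed` input is passed as `--bed`.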
SEQTK¶
For seqtk, the following wrappers are available:
SEQTK MERGEPE¶
Interleave two paired-end FASTA/Q files
URL: https://github.com/lh3/seqtk
Example¶
This wrapper can be used in the following way:
rule seqtk_mergepe:
input:
r1="{sample}.1.fastq.gz",
r2="{sample}.2.fastq.gz",
output:
merged="{sample}.merged.fastq.gz",
params:
compress_lvl=9,
log:
"logs/seqtk_mergepe/{sample}.log",
threads: 2
wrapper:
"v2.2.1/bio/seqtk/mergepe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Multiple threads can be used during compression of the output file with pigz.
Software dependencies¶
seqtk=1.3
pigz=2.6
Input/Output¶
Input:
- paired fastq files - can be compressed in gzip format (*.gz).
Output:
- a single, interleaved FASTA/Q file. By default, the output is compressed; use the compress_lvl param to change this.
Params¶
compress_lvl
: Regulate the speed of compression using the specified digit, where 1 indicates the fastest compression method (less compression) and 9 indicates the slowest compression method (best compression). 0 is no compression. 11 gives a few percent better compression at a severe cost in execution time, using the zopfli algorithm. The default is 6.
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for interleaving reads from paired FASTA/Q files using seqtk."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2021, Michael Hall"
__email__ = "michael@mbh.sh"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=False)
compress_lvl = int(snakemake.params.get("compress_lvl", 6))
shell(
"(seqtk mergepe {snakemake.input} "
"| pigz -{compress_lvl} -c -p {snakemake.threads}) > {snakemake.output} {log}"
)
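As a rough sketch of how `compress_lvl` ends up in the pigz call, the fragment below builds the same command string. The function name `pigz_command` is hypothetical, and the range check is an addition for illustration based on the levels listed under Params; the wrapper itself only converts the value with `int()`.

```python
def pigz_command(compress_lvl=6, threads: int = 1) -> str:
    """Return the pigz part of the pipeline for a given compression level."""
    level = int(compress_lvl)
    # Levels described under Params: 0 (none) through 9 (best), plus 11 (zopfli).
    if level not in (*range(10), 11):
        raise ValueError(f"unsupported compression level: {level}")
    return f"pigz -{level} -c -p {threads}"


print(pigz_command())
print(pigz_command(compress_lvl=9, threads=2))
```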
SEQTK-SEQ¶
Common transformations of FASTA/Q using seqtk
URL: https://github.com/lh3/seqtk
Example¶
This wrapper can be used in the following way:
rule seqtk_seq_fastq_to_fasta:
input:
"{prefix}.fastq",
output:
"{prefix}.fasta",
log:
"{prefix}.log",
params:
extra="-A",
wrapper:
"v2.2.1/bio/seqtk/seq"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
seqtk=1.4
Authors¶
- William Rowell
Code¶
"""Snakemake wrapper seqtk seq subcommand"""
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("(seqtk seq {extra} {snakemake.input} > {snakemake.output}) {log}")
SEQTK-SUBSAMPLE-PE¶
Subsample reads from paired FASTQ files
Example¶
This wrapper can be used in the following way:
rule seqtk_subsample_pe:
input:
f1="{sample}.1.fastq.gz",
f2="{sample}.2.fastq.gz"
output:
f1="{sample}.1.subsampled.fastq.gz",
f2="{sample}.2.subsampled.fastq.gz"
params:
n=3,
seed=12345
log:
"logs/seqtk_subsample/{sample}.log"
threads:
1
wrapper:
"v2.2.1/bio/seqtk/subsample/pe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
seqtk==1.3
pigz=2.3
Input/Output¶
Input:
- paired fastq files (can be gzip compressed)
Output:
- subsampled paired fastq files (gzip compressed)
Params¶
n
: number of reads after subsampling
seed
: seed to initialize a pseudorandom number generator
Authors¶
- Fabian Kilpert
Code¶
"""Snakemake wrapper for subsampling reads from paired FASTQ files using seqtk."""
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"( "
"seqtk sample "
"-s {snakemake.params.seed} "
"{snakemake.input.f1} "
"{snakemake.params.n} "
"| pigz -9 -p {snakemake.threads} "
"> {snakemake.output.f1} "
"&& "
"seqtk sample "
"-s {snakemake.params.seed} "
"{snakemake.input.f2} "
"{snakemake.params.n} "
"| pigz -9 -p {snakemake.threads} "
"> {snakemake.output.f2} "
") {log} "
)
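The wrapper passes the same seed to both `seqtk sample` calls, which is what keeps the two output files paired. The sketch below illustrates that principle with Python's `random` module; it is not the sampling algorithm seqtk implements, and the function name `subsample` and the toy read names are made up for the example.

```python
import random


def subsample(records, n: int, seed: int):
    """Pick n records with a seeded generator, keeping the original order."""
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(records)), min(n, len(records))))
    return [records[i] for i in keep]


reads_1 = [f"read{i}/1" for i in range(10)]
reads_2 = [f"read{i}/2" for i in range(10)]

# Same seed and same number of records: the same positions are selected.
sub_1 = subsample(reads_1, 3, seed=12345)
sub_2 = subsample(reads_2, 3, seed=12345)
print(list(zip(sub_1, sub_2)))
```

With different seeds for the two files, the selected positions would generally differ and the mates would no longer line up.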
SEQTK-SUBSAMPLE-SE¶
Subsample reads from FASTQ file
Example¶
This wrapper can be used in the following way:
rule seqtk_subsample_se:
input:
"{sample}.fastq.gz"
output:
"{sample}.subsampled.fastq.gz"
params:
n=3,
seed=12345
log:
"logs/seqtk_subsample/{sample}.log"
threads:
1
wrapper:
"v2.2.1/bio/seqtk/subsample/se"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
seqtk==1.3
pigz=2.3
Input/Output¶
Input:
- fastq file (can be gzip compressed)
Output:
- subsampled fastq file (gzip compressed)
Params¶
n
: number of reads after subsampling
seed
: seed to initialize a pseudorandom number generator
Authors¶
- Fabian Kilpert
Code¶
"""Snakemake wrapper for subsampling reads from FASTQ file using seqtk."""
__author__ = "Fabian Kilpert"
__copyright__ = "Copyright 2020, Fabian Kilpert"
__email__ = "fkilpert@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"( "
"seqtk sample "
"-s {snakemake.params.seed} "
"{snakemake.input} "
"{snakemake.params.n} "
"| pigz -9 -p {snakemake.threads} "
"> {snakemake.output} "
") {log} "
)
SHOVILL¶
Assemble bacterial isolate genomes from Illumina paired-end reads.
Example¶
This wrapper can be used in the following way:
rule shovill:
input:
r1="reads/{sample}_R1.fq.gz",
r2="reads/{sample}_R2.fq.gz"
output:
raw_assembly="assembly/{sample}.{assembler}.assembly.fa",
contigs="assembly/{sample}.{assembler}.contigs.fa"
params:
extra=""
log:
"logs/shovill/{sample}.{assembler}.log"
threads: 1
wrapper:
"v2.2.1/bio/shovill"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
shovill=1.1.0
Authors¶
- Sangram Keshari Sahu
Code¶
"""Snakemake wrapper for shovill."""
__author__ = "Sangram Keshari Sahu"
__copyright__ = "Copyright 2020, Sangram Keshari Sahu"
__email__ = "sangramsahu15@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from tempfile import TemporaryDirectory
# Placeholder for optional parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
params = snakemake.params.get("extra", "")
with TemporaryDirectory() as tempdir:
shell(
"(shovill"
" --assembler {snakemake.wildcards.assembler}"
" --outdir {tempdir} --force"
" --R1 {snakemake.input.r1}"
" --R2 {snakemake.input.r2}"
" --cpus {snakemake.threads}"
" {params}) {log}"
)
shell(
"mv {tempdir}/{snakemake.wildcards.assembler}.fasta {snakemake.output.raw_assembly}"
" && mv {tempdir}/contigs.fa {snakemake.output.contigs}"
)
SICKLE¶
For sickle, the following wrappers are available:
SICKLE PE¶
Trim paired-end reads with sickle.
Example¶
This wrapper can be used in the following way:
rule sickle_pe:
input:
r1="reads/{sample}.1.fastq",
r2="reads/{sample}.2.fastq"
output:
r1="{sample}.1.fastq",
r2="{sample}.2.fastq",
rs="{sample}.single.fastq",
log:
"logs/sickle/{sample}.log"
params:
qual_type="sanger",
# optional extra parameters
extra=""
wrapper:
"v2.2.1/bio/sickle/pe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sickle-trim=1.33
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle pe -f {snakemake.input.r1} -r {snakemake.input.r2} "
"-o {snakemake.output.r1} -p {snakemake.output.r2} "
"-s {snakemake.output.rs} -t {snakemake.params.qual_type} "
"{extra}) {log}"
)
SICKLE SE¶
Trim single-end reads with sickle.
Example¶
This wrapper can be used in the following way:
rule sickle_se:
input:
"reads/{sample}.1.fastq"
output:
"{sample}.1.fastq"
log:
"logs/sickle/{sample}.log"
params:
qual_type="sanger",
# optional extra parameters
extra=""
wrapper:
"v2.2.1/bio/sickle/se"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sickle-trim=1.33
Authors¶
- Wibowo Arindrarto
Code¶
__author__ = "Wibowo Arindrarto"
__copyright__ = "Copyright 2016, Wibowo Arindrarto"
__email__ = "bow@bow.web.id"
__license__ = "BSD"
from snakemake.shell import shell
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell()
shell(
"(sickle se -f {snakemake.input[0]} -o {snakemake.output[0]} "
"-t {snakemake.params.qual_type} {extra}) {log}"
)
SNP-MUTATOR¶
Generate mutated sequence files from a reference genome.
Example¶
This wrapper can be used in the following way:
NUM_SIMULATIONS = 2
rule snpmutator:
input:
"{sample}.fa"
output:
vcf = "{sample}.mutated.vcf",
sequences = expand(
"{{sample}}_mutated_{simulation_number}.fasta",
simulation_number=range(1, NUM_SIMULATIONS + 1)
)
params:
num_simulations = NUM_SIMULATIONS,
extra = " ".join([
"--num-substitutions 2",
"--num-insertions 2",
"--num-deletions 0"
]),
log:
"logs/snp-mutator/test/{sample}.log"
wrapper:
"v2.2.1/bio/snp-mutator"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snp-mutator=1.2.0
Authors¶
- Michael Hall
Code¶
"""Snakemake wrapper for SNP Mutator."""
__author__ = "Michael Hall"
__copyright__ = "Copyright 2019, Michael Hall"
__email__ = "mbhall88@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
num_simulations = snakemake.params.get("num_simulations", 100)
fasta_outdir = Path(snakemake.output.sequences[0]).absolute().parent
# Formats the log redirection string
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Executed shell command
shell(
"snpmutator {extra} "
"--num-simulations {num_simulations} "
"--vcf {snakemake.output.vcf} "
"-F {fasta_outdir} "
"{snakemake.input} {log} "
)
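The link between the rule's sequences output and the -F directory handed to snpmutator can be sketched in plain Python. This is only an illustration, not part of the wrapper: mutated_fasta_names is a hypothetical helper that mimics what expand() yields for a single sample, and the sample name "genome" is made up.

```python
from pathlib import Path


def mutated_fasta_names(sample, num_simulations):
    """Mimic the expand() call from the example rule for one sample (hypothetical helper)."""
    return [
        f"{sample}_mutated_{number}.fasta"
        for number in range(1, num_simulations + 1)
    ]


sequences = mutated_fasta_names("genome", 2)
# The wrapper derives the -F directory from the parent of the first FASTA output.
fasta_outdir = Path(sequences[0]).absolute().parent
print(sequences)
print(fasta_outdir)
```

Because only the first entry is inspected, all FASTA outputs are presumably expected to live in the same directory.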
SNPEFF¶
For snpeff, the following wrappers are available:
SNPEFF¶
Annotate predicted effect of nucleotide changes with SnpEff.
URL: https://pcingola.github.io/SnpEff/se_introduction/
Example¶
This wrapper can be used in the following way:
rule snpeff:
input:
calls="{sample}.vcf", # (vcf, bcf, or vcf.gz)
db="resources/snpeff/ebola_zaire" # path to reference db downloaded with the snpeff download wrapper
output:
calls="snpeff/{sample}.vcf", # annotated calls (vcf, bcf, or vcf.gz)
stats="snpeff/{sample}.html", # summary statistics (in HTML), optional
csvstats="snpeff/{sample}.csv" # summary statistics in CSV, optional
log:
"logs/snpeff/{sample}.log"
resources:
java_opts="-XX:ParallelGCThreads=10",
mem_mb=4096
wrapper:
"v2.2.1/bio/snpeff/annotate"
rule snpeff_nostats:
input:
calls="{sample}.vcf",
db="resources/snpeff/ebola_zaire"
output:
calls="snpeff_nostats/{sample}.vcf", # the main output file
# if either "genes" or "stats" outputs are provided, both are created
log:
"logs/snpeff_nostats/{sample}.log"
params:
java_opts="-XX:ParallelGCThreads=10",
extra="" # optional parameters
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/snpeff/annotate"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snpeff=5.1
bcftools=1.17
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
- calls: input VCF/BCF file
- db: SnpEff database
Output:
- calls: annotated calls (vcf, bcf, or vcf.gz)
- genes: genes output file (optional)
- stats: stats file (optional)
- csvstats: stats CSV file (optional)
Params¶
- java_opts: additional arguments to be passed to the Java virtual machine, e.g. “-XX:ParallelGCThreads=10” (not -Xmx or -Djava.io.tmpdir, since they are handled automatically).
- extra: additional program arguments.
Authors¶
- Bradford Powell
Code¶
__author__ = "Bradford Powell"
__copyright__ = "Copyright 2018, Bradford Powell"
__email__ = "bpow@unc.edu"
__license__ = "BSD"
from snakemake.shell import shell
from os import path
import shutil
import tempfile
from pathlib import Path
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
outcalls = snakemake.output.calls
if outcalls.endswith(".vcf.gz"):
outprefix = "| bcftools view -Oz"
elif outcalls.endswith(".bcf"):
outprefix = "| bcftools view -Ob"
else:
outprefix = ""
incalls = snakemake.input[0]
if incalls.endswith(".bcf"):
incalls = "< <(bcftools view {})".format(incalls)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
data_dir = Path(snakemake.input.db).parent.resolve()
stats = snakemake.output.get("stats", "")
csvstats = snakemake.output.get("csvstats", "")
csvstats_opt = "" if not csvstats else "-csvStats {}".format(csvstats)
stats_opt = "-noStats" if not stats else "-stats {}".format(stats)
reference = path.basename(snakemake.input.db)
shell(
"snpEff {java_opts} -dataDir {data_dir} "
"{stats_opt} {csvstats_opt} {extra} "
"{reference} {incalls} "
"{outprefix} > {outcalls} {log}"
)
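The extension-based output handling in the code above can be looked at in isolation. The following is a minimal sketch that repeats the same branching in a standalone function; the name output_pipe and the example paths are assumptions made for illustration.

```python
def output_pipe(outcalls):
    """Return the shell fragment appended after snpEff for a given output path."""
    if outcalls.endswith(".vcf.gz"):
        return "| bcftools view -Oz"  # compressed VCF
    if outcalls.endswith(".bcf"):
        return "| bcftools view -Ob"  # binary BCF
    return ""  # plain VCF needs no conversion


for name in ("snpeff/a.vcf", "snpeff/a.vcf.gz", "snpeff/a.bcf"):
    print(name, repr(output_pipe(name)))
```

Any other extension falls through to the plain VCF case, so the file name alone decides the output format.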
SNPEFF DOWNLOAD¶
Download snpeff DB for a given species.
Example¶
This wrapper can be used in the following way:
rule snpeff_download:
output:
# wildcard {reference} may be anything listed in `snpeff databases`
directory("resources/snpeff/{reference}")
log:
"logs/snpeff/download/{reference}.log"
params:
reference="{reference}"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/snpeff/download"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snpeff=5.1
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
reference = snakemake.params.reference
outdir = Path(snakemake.output[0]).parent.resolve()
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("snpEff download {java_opts} -dataDir {outdir} {reference} {log}")
SNPSIFT¶
For snpsift, the following wrappers are available:
SNPSIFT ANNOTATE¶
Annotate using fields from another VCF file with SnpSift
Example¶
This wrapper can be used in the following way:
rule test_snpsift_annotate:
input:
call="in.vcf",
database="annotation.vcf"
output:
call="annotated/out.vcf"
log:
"annotate.log"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/snpsift/annotate"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snpsift=5.1
bcftools=1.17
pbgzip=2016.08.04
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
- A VCF-formatted file that is to be annotated
- A VCF-formatted annotation file
Output:
- A VCF-formatted file
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for SnpSift annotate"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1
incall = snakemake.input["call"]
if snakemake.input["call"].endswith("bcf"):
min_threads += 1
incall = "< <(bcftools view {})".format(incall)
elif snakemake.input["call"].endswith("gz"):
min_threads += 1
incall = "< <(gunzip -c {})".format(incall)
outcall = snakemake.output["call"]
if snakemake.output["call"].endswith("gz"):
min_threads += 1
outcall = "| bcftools view -Oz > {}".format(outcall)
elif snakemake.output["call"].endswith("bcf"):
min_threads += 1
outcall = "| bcftools view -Ob > {}".format(outcall)
else:
outcall = "> {}".format(outcall)
if snakemake.threads < min_threads:
raise ValueError(
"At least {} threads required, {} provided".format(
min_threads, snakemake.threads
)
)
shell(
"SnpSift annotate" # Tool and its subcommand
" {java_opts} {extra}" # Extra parameters
" {snakemake.input.database}" # Path to annotation vcf file
" {incall} " # Path to input vcf file
" {outcall} " # Path to output vcf file
" {log}" # Logging behaviour
)
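The thread check in the code above follows a simple rule: one thread for SnpSift itself, plus one for each on-the-fly conversion of the input or the output. A minimal sketch of that rule, assuming a hypothetical helper name required_threads and made-up file names:

```python
def required_threads(incall, outcall):
    """Count the threads needed for SnpSift plus its (de)compression helpers."""
    threads = 1  # SnpSift itself
    if incall.endswith(("bcf", "gz")):
        threads += 1  # bcftools view or gunzip feeding the input
    if outcall.endswith(("gz", "bcf")):
        threads += 1  # bcftools view writing the output
    return threads


print(required_threads("in.vcf", "annotated/out.vcf"))
print(required_threads("in.bcf", "annotated/out.vcf.gz"))
```

Under this rule, a rule that reads BCF and writes compressed VCF would need at least threads: 3, otherwise the wrapper raises a ValueError.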
SNPSIFT DBNSFP¶
Annotate using integrated annotation from dbNSFP with SnpSift
Example¶
This wrapper can be used in the following way:
rule test_snpsift_dbnsfp:
input:
call = "in.vcf",
dbNSFP = "dbNSFP.txt.gz"
output:
call = "out.vcf"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/snpsift/dbnsfp"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snpsift=5.1
bcftools=1.17
snakemake-wrapper-utils=0.6.1
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for SnpSift dbNSFP"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Using user-defined file if requested
db = snakemake.input.get("dbNSFP", "")
if db != "":
db = "-db {}".format(db)
min_threads = 1
# Uncompression shall be done on user request
incall = snakemake.input["call"]
if incall.endswith("bcf"):
min_threads += 1
incall = "< <(bcftools view {})".format(incall)
elif incall.endswith("gz"):
min_threads += 1
incall = "< <(gunzip -c {})".format(incall)
# Compression shall be done according to user-defined output
outcall = snakemake.output["call"]
if outcall.endswith("gz"):
min_threads += 1
outcall = "| gzip -c > {}".format(outcall)
elif outcall.endswith("bcf"):
min_threads += 1
outcall = "| bcftools view -Ob > {}".format(outcall)
else:
outcall = "> {}".format(outcall)
# Each (un)compression raises the thread number
if snakemake.threads < min_threads:
raise ValueError(
"At least {} threads required, {} provided".format(
min_threads, snakemake.threads
)
)
shell(
"SnpSift dbnsfp" # Tool and its subcommand
" {java_opts} {extra}" # Extra parameters
" {db}" # Path to annotation vcf file
" {incall}" # Path to input vcf file
" {outcall}" # Path to output vcf file
" {log}" # Logging behaviour
)
SNPSIFT GENES SETS¶
Annotate using GMT genes sets with SnpSift
Example¶
This wrapper can be used in the following way:
rule test_snpsift_gmt:
input:
call = "in.vcf",
gmt = "fake_set.gmt"
output:
call = "annotated/out.vcf"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/snpsift/genesets"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snpsift=5.1
bcftools=1.17
snakemake-wrapper-utils=0.6.1
Input/Output¶
Input:
- Calls that are to be annotated
- A GMT-formatted annotation file
Output:
- Annotated calls
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for SnpSift geneSets"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1
# Uncompression shall be done according to user-defined input
incall = snakemake.input["call"]
if snakemake.input["call"].endswith("bcf"):
min_threads += 1
incall = "< <(bcftools view {})".format(incall)
elif snakemake.input["call"].endswith("gz"):
min_threads += 1
incall = "< <(gunzip -c {})".format(incall)
# Compression shall be done according to user-defined output
outcall = snakemake.output["call"]
if snakemake.output["call"].endswith("gz"):
min_threads += 1
outcall = "| gzip -c > {}".format(outcall)
elif snakemake.output["call"].endswith("bcf"):
min_threads += 1
outcall = "| bcftools view -Ob > {}".format(outcall)
else:
outcall = "> {}".format(outcall)
# Each (un)compression step raises the threads requirements
if snakemake.threads < min_threads:
raise ValueError(
"At least {} threads required, {} provided".format(
min_threads, snakemake.threads
)
)
shell(
"SnpSift geneSets" # Tool and its subcommand
" {java_opts} {extra}" # Extra parameters
" {snakemake.input.gmt}" # Path to annotation vcf file
" {incall}" # Path to input vcf file
" {outcall}" # Path to output vcf file
" {log}" # Logging behaviour
)
SNPSIFT GWAS CATALOG¶
Annotate using GWAS catalog with SnpSift
Example¶
This wrapper can be used in the following way:
rule test_snpsift_gwascat:
input:
call = "in.vcf",
gwascat = "gwascatalog.txt"
output:
call = "annotated/out.vcf"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/snpsift/gwascat"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snpsift=5.1
bcftools=1.17
snakemake-wrapper-utils=0.5.3
Input/Output¶
Input:
- Calls that are to be annotated (vcf, bcf, vcf.gz)
- A GWAS Catalog TSV-formatted file
Output:
- Annotated calls (vcf, bcf, vcf.gz)
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for SnpSift gwasCat"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
min_threads = 1
# Uncompression shall be done based on user input
incall = snakemake.input["call"]
if incall.endswith("bcf"):
min_threads += 1
incall = "< <(bcftools view {})".format(incall)
elif incall.endswith("gz"):
min_threads += 1
incall = "< <(gunzip -c {})".format(incall)
# Compression shall be done based on user-defined output
outcall = snakemake.output["call"]
if outcall.endswith("bcf"):
min_threads += 1
outcall = "| bcftools view -Ob > {}".format(outcall)
elif outcall.endswith("gz"):
min_threads += 1
outcall = "| gzip -c > {}".format(outcall)
else:
outcall = "> {}".format(outcall)
# Each additional (un)compression step requires more threads
if snakemake.threads < min_threads:
raise ValueError(
"At least {} threads required, {} provided".format(
min_threads, snakemake.threads
)
)
shell(
"SnpSift gwasCat " # Tool and its subcommand
" {java_opts} {extra} " # Extra parameters
" -db {snakemake.input.gwascat} " # Path to gwasCat file
" {incall} " # Path to input vcf file
" {outcall} " # Path to output vcf file
" {log} " # Logging behaviour
)
SNPSIFT VARTYPE¶
Add an INFO field denoting variant type with SnpSift
Example¶
This wrapper can be used in the following way:
rule test_snpsift_vartype:
input:
vcf="in.vcf"
output:
vcf="annotated/out.vcf"
message:
"Testing SnpSift varType"
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
log:
"varType.log"
wrapper:
"v2.2.1/bio/snpsift/varType"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
snpsift=5.1
snakemake-wrapper-utils=0.5.3
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for SnpSift varType"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"SnpSift varType" # Tool and its subcommand
" {java_opts} {extra}" # Extra parameters
" {snakemake.input.vcf}" # Path to input vcf file
" > {snakemake.output.vcf}" # Path to output vcf file
" {log}" # Logging behaviour
)
SOURMASH¶
For sourmash, the following wrappers are available:
SOURMASH_COMPUTE¶
Build a MinHash signature for a transcriptome, genome, or reads
Example¶
This wrapper can be used in the following way:
rule sourmash_reads:
input:
"reads/a.fastq"
output:
"reads.sig"
log:
"logs/sourmash/sourmash_compute_reads.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"v2.2.1/bio/sourmash/compute"
rule sourmash_transcriptome:
input:
"assembly/transcriptome.fasta"
output:
"transcriptome.sig"
log:
"logs/sourmash/sourmash_compute_transcriptome.log"
threads: 2
params:
# optional parameters
k = "31",
scaled = "1000",
extra = ""
wrapper:
"v2.2.1/bio/sourmash/compute"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
sourmash=4.8.2
Authors¶
- Lisa K. Johnson
Code¶
"""Snakemake wrapper for sourmash compute."""
__author__ = "Lisa K. Johnson"
__copyright__ = "Copyright 2018, Lisa K. Johnson"
__email__ = "ljcohen@ucdavis.edu"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
scaled = snakemake.params.get("scaled", "1000")
k = snakemake.params.get("k", "31")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"sourmash compute --scaled {scaled} -k {k} {snakemake.input} -o {snakemake.output}"
" {extra} {log}"
)
SPADES¶
For spades, the following wrappers are available:
METASPADES¶
Assemble metagenome with metaspades. For more information see the Spades documentation.
Metagenome assembly uses a lot of computational resources. Spades is told to restart from a previous checkpoint if the file params.txt exists in the output directory. In this way one can use snakemake with --restart-times to automatically restart the assembly.
The input of metaspades should be at least one paired-end library (= 2 fastq files). Optionally, merged reads can be supplied as a third fastq file and singleton reads as a fourth. Long reads can also be supplied via the pacbio or nanopore input arguments. To distinguish short from long reads, use reads as the name for the short reads.
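How the order of the short-read files maps onto spades.py options can be sketched as follows. This is a simplified illustration of the behaviour described above, not the wrapper itself; the helper name short_read_args and the file names are assumptions.

```python
def short_read_args(reads):
    """Map an ordered list of fastq files onto the --pe1-* options of spades.py."""
    if len(reads) < 2:
        raise ValueError(
            "metaspades needs at least two fastq files (one paired-end library)"
        )
    # Position in the list decides the role: forward, reverse, merged, singletons.
    flags = ["--pe1-1", "--pe1-2", "--pe1-m", "--pe1-s"]
    args = []
    for flag, path in zip(flags, reads):
        args += [flag, path]
    return " ".join(args)


print(short_read_args(["s1_R1.fastq.gz", "s1_R2.fastq.gz"]))
print(short_read_args(["s1_R1.fq", "s1_R2.fq", "s1_merged.fq", "s1_single.fq"]))
```

Since the role of each file is positional, merged reads have to be listed before singleton reads.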
Example¶
This wrapper can be used in the following way:
container: "docker://continuumio/miniconda3:4.4.10"
rule run_metaspades:
input:
reads=["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
output:
contigs="assembly/contigs.fasta",
scaffolds="assembly/scaffolds.fasta",
dir=directory("assembly/intermediate_files"),
benchmark:
"logs/benchmarks/assembly/spades.txt"
params:
# all parameters are optional
k="auto",
extra="--only-assembler",
log:
"log/spades.log",
threads: 8
resources:
mem_mb=250000,
time=60 * 24,
wrapper:
"v2.2.1/bio/spades/metaspades"
rule download_test_reads:
output:
["test_reads/sample1_R1.fastq.gz", "test_reads/sample1_R2.fastq.gz"],
log:
"log/download.log",
shell:
" wget https://zenodo.org/record/3992790/files/test_reads.tar.gz >> {log} 2>&1 ; "
" tar -xzf test_reads.tar.gz >> {log} 2>&1"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
spades=3.15.5
python=3.11.4
Authors¶
- Silas Kieser
- Anton Korobeynikov
Code¶
"""Snakemake wrapper for metaspades."""
__author__ = "Silas Kieser @silask"
__copyright__ = "Copyright 2021, Silas Kieser"
__email__ = "silas.kieser@gmail.com"
__license__ = "MIT"
import os, shutil
from snakemake.shell import shell
# infer output directory
if hasattr(snakemake.output, "dir"):
output_dir = snakemake.output.dir
else:
# get output_dir file from output
if hasattr(snakemake.output, "contigs"):
output_file = snakemake.output.contigs
elif hasattr(snakemake.output, "scaffolds"):
output_file = snakemake.output.scaffolds
else:
output_file = snakemake.output[0]
output_dir = os.path.split(output_file)[0]
# parse params
extra = snakemake.params.get("extra", "")
kmers = snakemake.params.get("k", "'auto'")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
if hasattr(snakemake.resources, "mem_mb"):
mem_gb = snakemake.resources.mem_mb // 1000
memory_requirements = f" --memory {mem_gb}"
else:
memory_requirements = ""
if not os.path.exists(os.path.join(output_dir, "params.txt")):
# parse short reads
if hasattr(snakemake.input, "reads"):
reads = snakemake.input.reads
else:
reads = snakemake.input
assert (
len(reads) > 1
), "Metaspades needs a paired end library. This means you should supply at least 2 fastq files in the rule input."
assert (
type(reads[0]) == str
), f"Metaspades allows only 1 library. Therefore reads need to be strings got {reads}"
input_arg = " --pe1-1 {0} --pe1-2 {1} ".format(*reads)
if len(reads) >= 3:
input_arg += " --pe1-m {2}".format(*reads)
if len(reads) >= 4:
input_arg += " --pe1-s {3}".format(*reads)
# parse long reads
for longread_name in ["pacbio", "nanopore"]:
if hasattr(snakemake.input, longread_name):
input_arg += " --{name} {path}".format(name=longread_name, path=getattr(snakemake.input, longread_name))
shell(
"spades.py --meta "
" --threads {snakemake.threads} "
" {memory_requirements} "
" -o {output_dir} "
" -k {kmers} "
" {input_arg} "
" {extra} "
" > {snakemake.log[0]} 2>&1 "
)
else:
# params.txt already exists, so restart from the previous run
shell(
"echo '\n\nRestart Spades \n Remove pipeline_state file copy files to force copy files if necessary.' >> {snakemake.log[0]}"
)
shell("rm -f {output_dir}/pipeline_state/stage_*_copy_files 2>> {snakemake.log[0]}")
shell(
"spades.py --meta "
" --restart-from last "
" --threads {snakemake.threads} "
" {memory_requirements} "
" -o {output_dir} "
" >> {snakemake.log[0]} 2>&1 "
)
)
# Rename/ move output files
Output_key_mapping = {
"contigs": "contigs.fasta",
"scaffolds": "scaffolds.fasta",
"graph": "assembly_graph_with_scaffolds.gfa",
}
has_named_output = False
for key in Output_key_mapping:
if hasattr(snakemake.output, key):
has_named_output = True
file_produced = os.path.join(output_dir, Output_key_mapping[key])
file_renamed = getattr(snakemake.output, key)
if file_produced != file_renamed:
shutil.move(file_produced, file_renamed)
if not has_named_output:
file_produced = os.path.join(output_dir, "contigs.fasta")
file_renamed = snakemake.output[0]
if file_produced != file_renamed:
shutil.move(file_produced, file_renamed)
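Two decisions in the code above are easy to check on their own: whether a run starts fresh or restarts, and how mem_mb becomes the --memory value. The sketch below reproduces both under stated assumptions; spades_mode and memory_flag are hypothetical helper names, and the temporary directory stands in for a real assembly output directory.

```python
import os
import tempfile


def spades_mode(output_dir):
    """Return "restart" if a previous run left params.txt behind, else "fresh"."""
    if os.path.exists(os.path.join(output_dir, "params.txt")):
        return "restart"
    return "fresh"


def memory_flag(mem_mb):
    """Convert a mem_mb resource into the --memory option (whole gigabytes)."""
    return f"--memory {mem_mb // 1000}" if mem_mb else ""


with tempfile.TemporaryDirectory() as outdir:
    first = spades_mode(outdir)
    # Simulate the marker file that an interrupted run would have written.
    open(os.path.join(outdir, "params.txt"), "w").close()
    second = spades_mode(outdir)

print(first, second, memory_flag(250000))
```

The integer division means that a mem_mb below 1000 would yield --memory 0, so small values are probably best avoided.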
SRA-TOOLS¶
For sra-tools, the following wrappers are available:
SRA-TOOLS FASTERQ-DUMP¶
Download FASTQ files from SRA.
Example¶
This wrapper can be used in the following way:
rule get_fastq_pe:
output:
# the wildcard name must be accession, pointing to an SRA number
"data/pe/{accession}_1.fastq",
"data/pe/{accession}_2.fastq",
log:
"logs/pe/{accession}.log"
params:
extra="--skip-technical"
threads: 6 # defaults to 6
wrapper:
"v2.2.1/bio/sra-tools/fasterq-dump"
rule get_fastq_pe_gz:
output:
# the wildcard name must be accession, pointing to an SRA number
"data/pe/{accession}_1.fastq.gz",
"data/pe/{accession}_2.fastq.gz",
log:
"logs/pe/{accession}.gz.log"
params:
extra="--skip-technical"
threads: 6 # defaults to 6
wrapper:
"v2.2.1/bio/sra-tools/fasterq-dump"
rule get_fastq_pe_bz2:
output:
# the wildcard name must be accession, pointing to an SRA number
"data/pe/{accession}_1.fastq.bz2",
"data/pe/{accession}_2.fastq.bz2",
log:
"logs/pe/{accession}.bz2.log"
params:
extra="--skip-technical"
threads: 6 # defaults to 6
wrapper:
"v2.2.1/bio/sra-tools/fasterq-dump"
rule get_fastq_se:
output:
"data/se/{accession}.fastq"
log:
"logs/se/{accession}.log"
params:
extra="--skip-technical"
threads: 6
wrapper:
"v2.2.1/bio/sra-tools/fasterq-dump"
rule get_fastq_se_gz:
output:
"data/se/{accession}.fastq.gz"
log:
"logs/se/{accession}.gz.log"
params:
extra="--skip-technical"
threads: 6
wrapper:
"v2.2.1/bio/sra-tools/fasterq-dump"
rule get_fastq_se_bz2:
output:
"data/se/{accession}.fastq.bz2"
log:
"logs/se/{accession}.bz2.log"
params:
extra="--skip-technical"
threads: 6
wrapper:
"v2.2.1/bio/sra-tools/fasterq-dump"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The output format is automatically detected and, if needed, files compressed with either gzip or bzip2.
- Currently only supports PE samples
- The extra param allows for additional program arguments.
- More information: https://github.com/ncbi/sra-tools
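The automatic format detection mentioned in the first note amounts to looking at the extension of each output file. A minimal sketch, assuming a hypothetical helper compression_command and a made-up accession; the memory option that the wrapper can additionally pass to pbzip2 is left out for brevity.

```python
import os


def compression_command(output, threads):
    """Pick the compressor for one output file based on its extension."""
    out_name, out_ext = os.path.splitext(output)
    if out_ext == ".gz":
        return f"pigz -p {threads} {out_name}"
    if out_ext == ".bz2":
        return f"pbzip2 -p{threads} {out_name}"
    return ""  # uncompressed FASTQ: nothing to do


for path in (
    "data/pe/SRR000001_1.fastq",
    "data/pe/SRR000001_1.fastq.gz",
    "data/pe/SRR000001_1.fastq.bz2",
):
    print(path, "->", repr(compression_command(path, 6)))
```

Both tools compress in place and append their own suffix, which is why the command is given the name without the final extension.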
Software dependencies¶
sra-tools=3.0.5
pigz=2.6
pbzip2=1.1.13
snakemake-wrapper-utils=0.5.3
Authors¶
- Johannes Köster
- Derek Croote
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Parse memory
mem_mb = get_mem(snakemake, "MiB")
# Outdir
outdir = os.path.dirname(snakemake.output[0])
if outdir:
outdir = f"--outdir {outdir}"
# Output compression
compress = ""
mem = f"-m{mem_mb}" if mem_mb else ""
for output in snakemake.output:
out_name, out_ext = os.path.splitext(output)
if out_ext == ".gz":
compress += f"pigz -p {snakemake.threads} {out_name}; "
elif out_ext == ".bz2":
compress += f"pbzip2 -p{snakemake.threads} {mem} {out_name}; "
with tempfile.TemporaryDirectory() as tmpdir:
mem = f"--mem {mem_mb}M" if mem_mb else ""
shell(
"(fasterq-dump --temp {tmpdir} --threads {snakemake.threads} {mem} "
"{extra} {outdir} {snakemake.wildcards.accession}; "
"{compress}"
") {log}"
)
STAR¶
For star, the following wrappers are available:
STAR¶
Map reads with STAR.
URL: https://github.com/alexdobin/STAR
Example¶
This wrapper can be used in the following way:
rule star_pe_multi:
input:
# use a list for multiple fastq files for one sample
# usually technical replicates across lanes/flowcells
fq1=["reads/{sample}_R1.1.fastq", "reads/{sample}_R1.2.fastq"],
# paired end reads needs to be ordered so each item in the two lists match
fq2=["reads/{sample}_R2.1.fastq", "reads/{sample}_R2.2.fastq"], #optional
# path to STAR reference genome index
idx="index",
output:
# see STAR manual for additional output files
aln="star/pe/{sample}/pe_aligned.sam",
log="logs/pe/{sample}/Log.out",
sj="star/pe/{sample}/SJ.out.tab",
log:
"logs/pe/{sample}.log",
params:
# optional parameters
extra="",
threads: 8
wrapper:
"v2.2.1/bio/star/align"
rule star_se:
input:
fq1="reads/{sample}_R1.1.fastq",
# path to STAR reference genome index
idx="index",
output:
# see STAR manual for additional output files
aln="star/se/{sample}/se_aligned.bam",
log="logs/se/{sample}/Log.out",
log_final="logs/se/{sample}/Log.final.out",
log:
"logs/se/{sample}.log",
params:
# optional parameters
extra="--outSAMtype BAM Unsorted",
threads: 8
wrapper:
"v2.2.1/bio/star/align"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- It is advisable to consider updating the limits setting before running STAR, such as executing ulimit -n 10000, to avoid an issue like this: https://github.com/alexdobin/STAR/issues/1344
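Whether the current open-file limit already satisfies the value suggested in the note above can be checked from Python before launching a workflow. This is a sketch for Unix-like systems only (the resource module is not available on Windows); the helper name open_file_limit_ok and the default threshold of 10000 are taken from the note, not from STAR or the wrapper.

```python
import resource


def open_file_limit_ok(minimum=10000):
    """Report whether the soft limit on open files is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= minimum


if not open_file_limit_ok():
    print("Consider running `ulimit -n 10000` before starting STAR.")
```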
Software dependencies¶
star=2.7.10b
Authors¶
- Johannes Köster
- Tomás Di Domenico
- Filipe G. Vieira
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
import os
import tempfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
fq1 = snakemake.input.get("fq1")
assert fq1 is not None, "input-> fq1 is a required input parameter"
fq1 = (
    [snakemake.input.fq1]
    if isinstance(snakemake.input.fq1, str)
    else snakemake.input.fq1
)
fq2 = snakemake.input.get("fq2")
if fq2:
    fq2 = (
        [snakemake.input.fq2]
        if isinstance(snakemake.input.fq2, str)
        else snakemake.input.fq2
    )
    assert len(fq1) == len(
        fq2
    ), "input-> equal number of files required for fq1 and fq2"
input_str_fq1 = ",".join(fq1)
input_str_fq2 = ",".join(fq2) if fq2 is not None else ""
input_str = " ".join([input_str_fq1, input_str_fq2])
if fq1[0].endswith(".gz"):
    readcmd = "--readFilesCommand gunzip -c"
elif fq1[0].endswith(".bz2"):
    readcmd = "--readFilesCommand bunzip2 -c"
else:
    readcmd = ""
index = snakemake.input.get("idx")
if not index:
    index = snakemake.params.get("idx", "")
if "--outSAMtype BAM SortedByCoordinate" in extra:
    stdout = "BAM_SortedByCoordinate"
elif "BAM Unsorted" in extra:
    stdout = "BAM_Unsorted"
else:
    stdout = "SAM"
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "STAR "
        " --runThreadN {snakemake.threads}"
        " --genomeDir {index}"
        " --readFilesIn {input_str}"
        " {readcmd}"
        " {extra}"
        " --outTmpDir {tmpdir}/STARtmp"
        " --outFileNamePrefix {tmpdir}/"
        " --outStd {stdout}"
        " > {snakemake.output.aln}"
        " {log}"
    )
    if snakemake.output.get("reads_per_gene"):
        shell("cat {tmpdir}/ReadsPerGene.out.tab > {snakemake.output.reads_per_gene:q}")
    if snakemake.output.get("chim_junc"):
        shell("cat {tmpdir}/Chimeric.out.junction > {snakemake.output.chim_junc:q}")
    if snakemake.output.get("sj"):
        shell("cat {tmpdir}/SJ.out.tab > {snakemake.output.sj:q}")
    if snakemake.output.get("log"):
        shell("cat {tmpdir}/Log.out > {snakemake.output.log:q}")
    if snakemake.output.get("log_progress"):
        shell("cat {tmpdir}/Log.progress.out > {snakemake.output.log_progress:q}")
    if snakemake.output.get("log_final"):
        shell("cat {tmpdir}/Log.final.out > {snakemake.output.log_final:q}")
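The way the wrapper code above assembles the --readFilesIn argument can be tried out in isolation. The following is a minimal sketch of that logic as a standalone function: files of the same mate are joined with commas, and the two mates are separated by a space. The function name build_read_files_in is hypothetical and not part of the wrapper; unlike the wrapper, this sketch raises a ValueError instead of using assert and does not leave a trailing space for single-end input.

```python
from typing import List, Optional, Union


def build_read_files_in(
    fq1: Union[str, List[str]],
    fq2: Optional[Union[str, List[str]]] = None,
) -> str:
    """Build the argument string passed to STAR's --readFilesIn."""
    # A single path is treated as a one-element list.
    fq1 = [fq1] if isinstance(fq1, str) else list(fq1)
    if fq2:
        fq2 = [fq2] if isinstance(fq2, str) else list(fq2)
        # Mates are matched by position, so both lists need the same length.
        if len(fq1) != len(fq2):
            raise ValueError("fq1 and fq2 must contain the same number of files")
        return ",".join(fq1) + " " + ",".join(fq2)
    return ",".join(fq1)


if __name__ == "__main__":
    print(build_read_files_in(["a_R1.fq", "b_R1.fq"], ["a_R2.fq", "b_R2.fq"]))
```

Because mates are paired by list position, the order of fq1 and fq2 in the rule's input determines which files STAR treats as belonging together.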
STAR INDEX¶
Index fasta sequences with STAR
URL: https://github.com/alexdobin/STAR
Example¶
This wrapper can be used in the following way:
rule star_index:
input:
fasta="{genome}.fasta",
output:
directory("{genome}"),
message:
"Testing STAR index"
threads: 1
params:
extra="",
log:
"logs/star_index_{genome}.log",
wrapper:
"v2.2.1/bio/star/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
star=2.7.10b
Input/Output¶
Input:
- A (multi)fasta formatted file
Output:
- A directory containing the indexed sequence for downstream STAR mapping
Params¶
- sjdbOverhang: length of the donor/acceptor sequence on each side of the junctions (optional)
- extra: additional program arguments.
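The wrapper code comments describe sjdbOverhang as the read length minus one; for reads of varying length, the STAR manual suggests using the maximum read length minus one. The following is a minimal sketch of that calculation. The function name sjdb_overhang is hypothetical; the wrapper itself only forwards whatever value is given in params.

```python
from typing import Iterable


def sjdb_overhang(read_lengths: Iterable[int]) -> int:
    """Return max(read length) - 1, the commonly recommended --sjdbOverhang."""
    lengths = list(read_lengths)
    if not lengths:
        raise ValueError("at least one read length is required")
    longest = max(lengths)
    if longest < 2:
        raise ValueError("read length must be at least 2")
    return longest - 1


if __name__ == "__main__":
    # Reads of 101 bp give an overhang of 100.
    print(sjdb_overhang([101]))
```

The resulting number could then be passed to the rule as params: sjdbOverhang=100, since the wrapper prepends the --sjdbOverhang flag itself.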
Authors¶
- Thibault Dayris
- Tomás Di Domenico
- Filipe G. Vieira
Code¶
"""Snakemake wrapper for STAR index"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
from snakemake.utils import makedirs
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
sjdb_overhang = snakemake.params.get("sjdbOverhang", "")
if sjdb_overhang:
    sjdb_overhang = f"--sjdbOverhang {sjdb_overhang}"
gtf = snakemake.input.get("gtf", "")
if gtf:
    gtf = f"--sjdbGTFfile {gtf}"
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "STAR"
        " --runThreadN {snakemake.threads}"  # Number of threads
        " --runMode genomeGenerate"  # Indexation mode
        " --genomeFastaFiles {snakemake.input.fasta}"  # Path to fasta files
        " {sjdb_overhang}"  # Read-len - 1
        " {gtf}"  # Highly recommended GTF
        " {extra}"  # Optional parameters
        " --outTmpDir {tmpdir}/STARtmp"  # Temp dir
        " --genomeDir {snakemake.output}"  # Path to output
        " {log}"  # Logging
    )
STRELKA¶
For strelka, the following wrappers are available:
STRELKA GERMLINE¶
Call germline variants with Strelka.
Example¶
This wrapper can be used in the following way:
rule strelka_germline:
input:
# the required bam file
bam="mapped/{sample}.bam",
# path to reference genome fasta and index
fasta="genome.fasta",
fasta_index="genome.fasta.fai",
output:
# Strelka results - either use directory or complete file path
variants="strelka/{sample}.vcf.gz",
variants_index="strelka/{sample}.vcf.gz.tbi",
sample_genomes=["strelka/{sample}.genome.vcf.gz"],
sample_genomes_indices=["strelka/{sample}.genome.vcf.gz.tbi"],
log:
"logs/strelka/germline/{sample}.log",
params:
# optional parameters
config_extra="",
run_extra="",
threads: 8
wrapper:
"v2.2.1/bio/strelka/germline"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
strelka=2.9.10
Authors¶
- Jan Forster
- Christopher Schröder
Code¶
__author__ = "Jan Forster, Christopher Schröder"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "jan.forster@uk-essen.de"
__license__ = "MIT"
import tempfile
import glob
from pathlib import Path
from snakemake.shell import shell
config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
bam = snakemake.input.get("bam") # input bam file, required
assert bam is not None, "input-> bam is a required input parameter"
if isinstance(bam, str):
    bam = [bam]
if snakemake.output.get("sample_genomes"):
    assert len(bam) == len(
        snakemake.output.get("sample_genomes")
    ), "number of input bams and sample_genomes must be equal "
if snakemake.output.get("sample_genomes_indices"):
    assert len(bam) == len(
        snakemake.output.get("sample_genomes_indices")
    ), "number of input bams and sample_genomes_indices must be equal "
bam_input = " ".join(f"--bam {b}" for b in bam)
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "(configureStrelkaGermlineWorkflow.py "  # configure the strelka run
        "{bam_input} "  # input bam
        "--referenceFasta {snakemake.input.fasta} "  # reference genome
        "--runDir {tmpdir} "  # output directory
        "{config_extra} "  # additional parameters for the configuration
        "&& {tmpdir}/runWorkflow.py "  # run the strelka workflow
        "-m local "  # run in local mode
        "-j {snakemake.threads} "  # number of threads
        "{run_extra}) "  # additional parameters for the run
        "{log}"
    )  # logging
    if snakemake.output.get("variants"):
        shell(
            "cat {tmpdir}/results/variants/variants.vcf.gz > {snakemake.output.variants:q}"
        )
    if snakemake.output.get("variants_index"):
        shell(
            "cat {tmpdir}/results/variants/variants.vcf.gz.tbi > {snakemake.output.variants_index:q}"
        )
    if targets := snakemake.output.get("sample_genomes"):
        origins = glob.glob(f"{tmpdir}/results/variants/genome.S*.vcf.gz")
        assert len(origins) == len(targets)
        for origin, target in zip(origins, targets):
            shell(f"cat {origin} > {target}")
    if targets := snakemake.output.get("sample_genomes_indices"):
        origins = glob.glob(f"{tmpdir}/results/variants/genome.S*.vcf.gz.tbi")
        assert len(origins) == len(targets)
        for origin, target in zip(origins, targets):
            shell(f"cat {origin} > {target}")
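The wrapper code above pairs the per-sample genome.S*.vcf.gz files with the requested targets by position. Python's glob.glob returns matches in arbitrary order, so with several input BAMs the pairing depends on that order. The following is a minimal sketch of a deterministic pairing that sorts by the sample number in the file name. It is not the wrapper's code: the helper names sample_number and pair_outputs are hypothetical, and it assumes that the numbering in genome.S<N> follows the order of the --bam arguments.

```python
import glob
import os
import re
from typing import List, Tuple


def sample_number(path: str) -> int:
    """Extract N from a file name of the form genome.S<N>.vcf.gz[.tbi]."""
    match = re.search(r"genome\.S(\d+)\.", os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))


def pair_outputs(pattern: str, targets: List[str]) -> List[Tuple[str, str]]:
    """Pair globbed per-sample files with targets in numeric sample order."""
    # Numeric sorting keeps S2 before S10, which plain string sorting would not.
    origins = sorted(glob.glob(pattern), key=sample_number)
    if len(origins) != len(targets):
        raise ValueError(
            f"found {len(origins)} files but {len(targets)} targets were requested"
        )
    return list(zip(origins, targets))
```

With a single input BAM, as in the example rule, there is only one file per pattern and the ordering question does not arise.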
STRELKA¶
Strelka calls somatic and germline small variants from mapped sequencing reads
Example¶
This wrapper can be used in the following way:
rule strelka:
input:
# The normal bam and its index
# are optional input
# normal = "data/b.bam",
# normal_index = "data/b.bam.bai"
tumor="data/{tumor}.bam",
tumor_index="data/{tumor}.bam.bai",
fasta="data/genome.fasta",
fasta_index="data/genome.fasta.fai",
output:
# Strelka output - can be directory or full file path
directory("{tumor}_vcf"),
threads: 1
params:
run_extra="",
config_extra="",
log:
"logs/strelka_{tumor}.log",
wrapper:
"v2.2.1/bio/strelka/somatic"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
strelka=2.9.10
Input/Output¶
Input:
- A tumor bam file, with its index.
- A reference genome sequence in fasta format, with its index.
- An optional normal bam file for somatic calling, with its index.
Output:
- Statistics about calling results
- Variants called
Authors¶
- Thibault Dayris
- Christopher Schröder
Code¶
"""Snakemake wrapper for Strelka"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from pathlib import Path
from snakemake.shell import shell
from snakemake.utils import makedirs
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
config_extra = snakemake.params.get("config_extra", "")
run_extra = snakemake.params.get("run_extra", "")
# If a normal bam is used, it should be
# listed in the input block, so that
# Snakemake performs its additional
# checks on file existence.
normal = (
    "--normalBam {}".format(snakemake.input["normal"])
    if "normal" in snakemake.input.keys()
    else ""
)
if snakemake.output[0].endswith("vcf.gz"):
    run_dir = Path(snakemake.output[0]).parents[2]
else:
    run_dir = snakemake.output
shell(
    "(configureStrelkaSomaticWorkflow.py "  # Configuration script
    "{normal} "  # Path to normal bam (if any)
    "--tumorBam {snakemake.input.tumor} "  # Path to tumor bam
    "--referenceFasta {snakemake.input.fasta} "  # Path to fasta file
    "--runDir {run_dir} "  # Path to output directory
    "{config_extra} "  # Extra parameters for configuration
    " && "
    "{run_dir}/runWorkflow.py "  # Run the pipeline
    "--mode local "  # Stop internal job submission
    "--jobs {snakemake.threads} "  # Number of threads
    "{run_extra}) "  # Extra parameters for runWorkflow
    "{log}"  # Logging behaviour
)
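The wrapper code above derives the Strelka run directory from the first output: if that output is a vcf.gz file, the run directory is taken to be three levels up, which matches a file located under results/variants/ inside the run directory. The following is a minimal sketch of that rule as a standalone function. The name strelka_run_dir and the example paths are hypothetical.

```python
from pathlib import Path


def strelka_run_dir(first_output: str) -> Path:
    """Return the run directory implied by the first declared output."""
    if first_output.endswith("vcf.gz"):
        # <run_dir>/results/variants/<file>.vcf.gz -> parents[2] is <run_dir>
        return Path(first_output).parents[2]
    # Otherwise the output itself is treated as the run directory.
    return Path(first_output)


if __name__ == "__main__":
    print(strelka_run_dir("sample_vcf/results/variants/somatic.snvs.vcf.gz"))
    print(strelka_run_dir("sample_vcf"))
```

A vcf.gz path with fewer than three parent levels would raise an IndexError here, so a file output is best declared with its full results/variants/ location.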
STRLING¶
For strling, the following wrappers are available:
STRLING CALL¶
STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The call command calls genotypes and estimates allele sizes for all loci in each sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html
Example¶
This wrapper can be used in the following way:
rule strling_call:
input:
bam="mapped/{sample}.bam",
bai="mapped/{sample}.bam.bai",
bin="extract/{sample}.bin",
reference="reference/genome.fasta",
fai="reference/genome.fasta.fai",
bounds="merged/group-bounds.txt" # optional, produced by strling merge
output:
"call/{sample}-bounds.txt", # must end with -bounds.txt
"call/{sample}-genotype.txt", # must end with -genotype.txt
"call/{sample}-unplaced.txt" # must end with -unplaced.txt
params:
extra="" # optional extra command line arguments
log:
"log/strling/call/{sample}.log"
wrapper:
"v2.2.1/bio/strling/call"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
strling=0.5.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling call"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
bin = snakemake.input.get("bin", None)
reference = snakemake.input.get("reference", None)
bounds = snakemake.input.get("bounds", None)
if not bam or (isinstance(bam, list) and len(bam) != 1):
    raise ValueError("Please provide exactly one 'bam' as input.")
if not path.exists(bam + ".bai"):
    raise ValueError(
        "Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
    )
if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")
if not bounds:  # optional
    bounds_string = ""
else:
    bounds_string = "-b {}".format(bounds)
if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )
if not any(o.endswith("-bounds.txt") for o in snakemake.output):
    raise ValueError("Please provide a file that ends with -bounds.txt in the output.")
for filename in snakemake.output:
    if filename.endswith("-bounds.txt"):
        prefix = filename[: -len("-bounds.txt")]
        break
if not any(o == "{}-genotype.txt".format(prefix) for o in snakemake.output):
    raise ValueError(
        "Please provide an output file that ends with -genotype.txt and has the same prefix as -bounds.txt"
    )
if not any(o == "{}-unplaced.txt".format(prefix) for o in snakemake.output):
    raise ValueError(
        "Please provide an output file that ends with -unplaced.txt and has the same prefix as -bounds.txt"
    )
shell(
    "(strling call "
    "{bam} "
    "{bin} "
    "{bounds_string} "
    "-o {prefix} "
    "{extra}) {log}"
)
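The output checks in the wrapper code above boil down to one rule: the three outputs must share a common prefix and end in -bounds.txt, -genotype.txt and -unplaced.txt, and that prefix is what gets passed to strling call via -o. The following is a minimal sketch of the same validation as a standalone function that can be tested without Snakemake. The function name strling_call_prefix is hypothetical.

```python
from typing import Sequence


def strling_call_prefix(outputs: Sequence[str]) -> str:
    """Return the shared output prefix, or raise if the naming rule is violated."""
    suffix = "-bounds.txt"
    bounds = [o for o in outputs if o.endswith(suffix)]
    if not bounds:
        raise ValueError("no output ends with -bounds.txt")
    # The prefix is everything before the -bounds.txt suffix.
    prefix = bounds[0][: -len(suffix)]
    for required in ("-genotype.txt", "-unplaced.txt"):
        if prefix + required not in outputs:
            raise ValueError(f"missing output {prefix + required}")
    return prefix


if __name__ == "__main__":
    print(
        strling_call_prefix(
            ["call/A-bounds.txt", "call/A-genotype.txt", "call/A-unplaced.txt"]
        )
    )
```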
STRLING EXTRACT¶
STRling (pronounced “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The extract command retrieves informative read pairs of a single sample and stores them in a binary format; the same bin files can be used for single-sample and for joint calling. Documentation at: https://strling.readthedocs.io/en/latest/run.html
Example¶
This wrapper can be used in the following way:
rule strling_extract:
input:
bam="mapped/{sample}.bam",
bai="mapped/{sample}.bam.bai",
reference="reference/genome.fasta",
fai="reference/genome.fasta.fai",
index="reference/genome.fasta.str" # optional
output:
"extract/{sample}.bin"
log:
"log/strling/extract/{sample}.log"
params:
extra="" # optionally add further command line arguments
wrapper:
"v2.2.1/bio/strling/extract"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
strling=0.5.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling extract"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
bam = snakemake.input.get("bam", None)
reference = snakemake.input.get("reference", None)
index = snakemake.input.get("index", None)
if not bam or (isinstance(bam, list) and len(bam) != 1):
    raise ValueError("Please provide exactly one 'bam' input.")
if not path.exists(bam + ".bai"):
    raise ValueError(
        "Please index the bam file. The index file must have same file name as the bam file, with '.bai' appended."
    )
if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")
if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )
if not index:  # optional
    index_string = ""
else:
    index_string = "-g {}".format(index)
if len(snakemake.output) != 1:
    raise ValueError("Please provide exactly one output file (.bin).")
shell(
    "(strling extract "
    "{bam} "
    "{snakemake.output[0]} "
    "-f {reference} "
    "{index_string} "
    "{extra}) {log}"
)
STRLING INDEX¶
STRling (pronounced like “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The index command creates a bed file of large STR regions in the reference genome. This step is performed automatically as part of strling extract. However, when running multiple samples, it is more efficient to do it once and then pass the file to strling extract using the -g option. Documentation at: https://strling.readthedocs.io/en/latest/run.html
Example¶
This wrapper can be used in the following way:
rule strling_index:
input:
"reference/genome.fasta"
output:
index="reference/genome.fasta.str",
fai="reference/genome.fasta.fai"
params:
extra="" # optionally add further command line arguments
log:
"log/strling/index.log"
wrapper:
"v2.2.1/bio/strling/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
strling=0.5.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling index"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
if len(snakemake.input) != 1:
    raise ValueError("Please provide exactly one reference genome.")
shell(
    "(strling index {snakemake.input[0]} "
    "-g {snakemake.output.index} "
    "{extra}) {log}"
)
STRLING MERGE¶
STRling (pronounced “sterling”) is a method to detect large short tandem repeat (STR) expansions from short-read sequencing data. The merge command prepares joint calling of STR loci across all given samples. It requires minimum read evidence from at least one sample. Documentation at: https://strling.readthedocs.io/en/latest/run.html
Example¶
This wrapper can be used in the following way:
rule strling_merge:
input:
bins=["extract/A.bin", "extract/B.bin"],
reference="reference/genome.fasta",
fai="reference/genome.fasta.fai",
output:
"merged/group-bounds.txt" # must end with "-bounds.txt"
params:
extra="" # optionally add further command line arguments
log:
"log/strling/merge/group.log"
wrapper:
"v2.2.1/bio/strling/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
strling=0.5.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for strling merge"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroede@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
from os import path
# Creating log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Placeholder for optional parameters
extra = snakemake.params.get("extra", "")
# Check inputs/arguments.
bins = snakemake.input.get("bins", None)
reference = snakemake.input.get("reference", None)
fai = snakemake.input.get("fai", None)
if not bins or len(bins) < 2:
    raise ValueError("Please provide at least two 'bins' as input.")
if not reference:
    raise ValueError("Please provide a fasta 'reference' input.")
if not path.exists(reference + ".fai"):
    raise ValueError(
        "Please index the reference. The index file must have same file name as the reference file, with '.fai' appended."
    )
if len(snakemake.output) != 1:
    raise ValueError("Please provide exactly one output file (-bounds.txt).")
if not snakemake.output[0].endswith("-bounds.txt"):
    raise ValueError(
        "Output file must end with '-bounds.txt'. Please change the output file name."
    )
prefix = snakemake.output[0][: -len("-bounds.txt")]
shell("(strling merge " "{bins} " "-o {prefix} " "{extra}) {log}")
SUBREAD¶
For subread, the following wrappers are available:
SUBREAD FEATURECOUNTS¶
FeatureCounts assigns mapped reads or fragments (paired-end data) to genomic features such as genes, exons and promoters.
URL: http://subread.sourceforge.net/
Example¶
This wrapper can be used in the following way:
rule feature_counts:
input:
# list of sam or bam files
samples="{sample}.bam",
annotation="annotation.gtf",
# optional input
#chr_names="", # implicitly sets the -A flag
#fasta="genome.fasta" # implicitly sets the -G flag
output:
multiext(
"results/{sample}",
".featureCounts",
".featureCounts.summary",
".featureCounts.jcounts",
),
threads: 2
params:
strand=0, # optional; strandness of the library (0: unstranded [default], 1: stranded, and 2: reversely stranded)
r_path="", # implicitly sets the --Rpath flag
extra="-O --fracOverlap 0.2 -J -p",
log:
"logs/{sample}.log",
wrapper:
"v2.2.1/bio/subread/featurecounts"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The strand param specifies the strandedness of the library (0: unstranded, 1: stranded, and 2: reversely stranded)
- The extra param allows for additional program arguments.
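Since the strand param is a bare number, workflows that describe library types with words may want to translate them before passing them to the wrapper. The following is a minimal sketch of such a translation. Only the numeric codes 0, 1 and 2 come from the notes above; the labels and the function name strand_code are hypothetical and would need to be adapted to the naming used in a given workflow configuration.

```python
# Hypothetical labels; only the numeric codes come from the wrapper notes.
STRAND_CODES = {"unstranded": 0, "stranded": 1, "reverse": 2}


def strand_code(label: str) -> int:
    """Translate a library-type label into the value for the strand param."""
    try:
        return STRAND_CODES[label.lower()]
    except KeyError:
        raise ValueError(
            f"unknown library type {label!r}; expected one of {sorted(STRAND_CODES)}"
        ) from None


if __name__ == "__main__":
    print(strand_code("reverse"))
```

In a rule, the result could be used as params: strand=strand_code(...), with the label read from the workflow's own configuration.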
Software dependencies¶
subread=2.0.6
Input/Output¶
Input:
- a list of .sam or .bam files
- GTF, GFF or SAF annotation file
- optionally, a tab-separated file that determines the sorting order and contains the chromosome names in the first column
- optionally, a fasta index file
Output:
- Feature counts file including read counts (tab separated)
- Summary file including summary statistics (tab separated)
- Junction counts file including count number of reads supporting each exon-exon junction (tab separated)
Authors¶
Code¶
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2020, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import tempfile
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# optional input files and directories
strand = snakemake.params.get("strand", 0)
fasta = snakemake.input.get("fasta", "")
if fasta:
    fasta = f"-G {fasta}"
chr_names = snakemake.input.get("chr_names", "")
if chr_names:
    chr_names = f"-A {chr_names}"
r_path = snakemake.params.get("r_path", "")
if r_path:
    r_path = f"--Rpath {r_path}"
with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "featureCounts"
        " -T {snakemake.threads}"
        " -s {strand}"
        " -a {snakemake.input.annotation}"
        " {fasta}"
        " {chr_names}"
        " {r_path}"
        " {extra}"
        " --tmpDir {tmpdir}"
        " -o {snakemake.output[0]}"
        " {snakemake.input.samples}"
        " {log}"
    )
TABIX¶
For tabix, the following wrappers are available:
TABIX INDEX¶
Process given file with tabix (e.g., create index).
URL: https://www.htslib.org/doc/tabix.html#INDEXING_OPTIONS
Example¶
This wrapper can be used in the following way:
rule tabix:
input:
"{prefix}.vcf.gz",
output:
"{prefix}.vcf.gz.tbi",
log:
"logs/tabix/{prefix}.log",
params:
# pass arguments to tabix (e.g. index a vcf)
"-p vcf",
wrapper:
"v2.2.1/bio/tabix/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Specify tabix index params (e.g. -p vcf) through params.
Software dependencies¶
htslib=1.17
Input/Output¶
Input:
- Bgzip compressed file (e.g. BED.gz, GFF.gz, or VCF.gz)
Output:
- Tabix index file
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell("tabix {snakemake.params} {snakemake.input[0]} {log}")
TABIX QUERY¶
Query given file with tabix.
URL: https://www.htslib.org/doc/tabix.html#QUERYING_AND_OTHER_OPTIONS
Example¶
This wrapper can be used in the following way:
rule tabix:
input:
## list the bgzip-compressed file as the first input
## and the index as the second input
"{prefix}.bed.gz",
"{prefix}.bed.gz.tbi",
output:
"{prefix}.output.bed",
log:
"logs/tabix/query/{prefix}.log",
params:
region="1",
extra="",
wrapper:
"v2.2.1/bio/tabix/query"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The region param (required) specifies the region of interest to retrieve.
- The extra param allows for additional program arguments.
Software dependencies¶
htslib=1.17
Input/Output¶
Input:
- Bgzip compressed file (e.g. BED.gz, GFF.gz, or VCF.gz)
- Tabix index file
Output:
- Uncompressed subset of the input file from the given region
Authors¶
- William Rowell
Code¶
__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
shell(
"tabix {extra} {snakemake.input[0]} {snakemake.params.region} > {snakemake.output} {log}"
)
TRANSDECODER¶
For transdecoder, the following wrappers are available:
TRANSDECODER LONGORFS¶
TransDecoder.LongOrfs will identify coding regions within transcript sequences (ORFs) that are at least 100 amino acids long. You can lower this via the ‘-m’ parameter, but know that the rate of false positive ORF predictions increases drastically with shorter minimum length criteria.
Example¶
This wrapper can be used in the following way:
rule transdecoder_longorfs:
input:
fasta="test.fa.gz", # required
gene_trans_map="test.gtm" # optional gene-to-transcript identifier mapping file (tab-delimited, gene_id<tab>trans_id<return> )
output:
"test.fa.transdecoder_dir/longest_orfs.pep"
log:
"logs/transdecoder/test-longorfs.log"
params:
extra=""
wrapper:
"v2.2.1/bio/transdecoder/longorfs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
transdecoder=5.7.0
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Transdecoder LongOrfs"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
gtm_cmd = ""
gtm = snakemake.input.get("gene_trans_map", "")
if gtm:
    gtm_cmd = " --gene_trans_map " + gtm
output_dir = path.dirname(str(snakemake.output))
# transdecoder fails if output already exists. No force option available
shell("rm -rf {output_dir}")
input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith("gz"):
    input_fa = input_fasta.rsplit(".gz")[0]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta
shell("TransDecoder.LongOrfs -t {input_fa} {gtm_cmd} {log}")
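The wrapper code above decompresses a gzipped fasta next to the original file and derives the target name by cutting at ".gz". The following is a minimal sketch of that naming step with a slightly stricter variant that removes only a trailing ".gz". The function name decompressed_name is hypothetical, and the stricter behaviour is an alternative for illustration, not what the wrapper does.

```python
def decompressed_name(fasta: str) -> str:
    """Return the file name with a single trailing '.gz' removed, if present."""
    if fasta.endswith(".gz"):
        return fasta[: -len(".gz")]
    return fasta


if __name__ == "__main__":
    # For typical names both approaches agree.
    print(decompressed_name("test.fa.gz"), "test.fa.gz".rsplit(".gz")[0])
    # They differ when '.gz' also occurs in the middle of the name.
    print(decompressed_name("a.gz.fa.gz"), "a.gz.fa.gz".rsplit(".gz")[0])
```

For the example input test.fa.gz both give test.fa, which is why the example rule declares its output under test.fa.transdecoder_dir/.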
TRANSDECODER PREDICT¶
Predict the likely coding regions from the ORFs identified by Transdecoder.LongOrfs. Optionally include results from homology searches (blast/hmmer results) as ORF retention criteria.
Example¶
This wrapper can be used in the following way:
rule transdecoder_predict:
input:
fasta="test.fa.gz", # required input; optionally gzipped
pfam_hits="pfam_hits.txt", # optionally retain ORFs with hits by inputting pfam results here (run separately)
blastp_hits="blastp_hits.txt", # optionally retain ORFs with hits by inputting blastp results here (run separately)
# you may also want to add your transdecoder longorfs result here - predict will fail if you haven't first run longorfs
#longorfs="test.fa.transdecoder_dir/longest_orfs.pep"
output:
"test.fa.transdecoder.bed",
"test.fa.transdecoder.cds",
"test.fa.transdecoder.pep",
"test.fa.transdecoder.gff3"
log:
"logs/transdecoder/test-predict.log"
params:
extra=""
wrapper:
"v2.2.1/bio/transdecoder/predict"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
transdecoder=5.7.0
Input/Output¶
Input:
- fasta assembly
Output:
- candidate coding regions (pep, cds, gff3, bed output formats)
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Transdecoder Predict"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
addl_outputs = ""
pfam = snakemake.input.get("pfam_hits", "")
if pfam:
    addl_outputs += " --retain_pfam_hits " + pfam
blast = snakemake.input.get("blastp_hits", "")
if blast:
    addl_outputs += " --retain_blastp_hits " + blast
input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith("gz"):
    input_fa = input_fasta.rsplit(".gz")[0]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta
shell("TransDecoder.Predict -t {input_fa} {addl_outputs} {extra} {log}")
TRIM_GALORE¶
For trim_galore, the following wrappers are available:
TRIM_GALORE-PE¶
Trim paired-end reads using trim_galore.
Example¶
This wrapper can be used in the following way:
rule trim_galore_pe:
input:
["reads/{sample}.1.fastq.gz", "reads/{sample}.2.fastq.gz"],
output:
fasta_fwd="trimmed/{sample}_R1.fq.gz",
report_fwd="trimmed/reports/{sample}_R1_trimming_report.txt",
fasta_rev="trimmed/{sample}_R2.fq.gz",
report_rev="trimmed/reports/{sample}_R2_trimming_report.txt",
threads: 1
params:
extra="--illumina -q 20",
log:
"logs/trim_galore/{sample}.log",
wrapper:
"v2.2.1/bio/trim_galore/pe"
rule trim_galore_pe_uncompressed:
input:
["reads/{sample}_R1.fastq", "reads/{sample}_R2.fastq"],
output:
fasta_fwd="trimmed/{sample}_R1.fastq",
report_fwd="trimmed/reports/{sample}_R1_trimming_report.txt",
fasta_rev="trimmed/{sample}_R2.fastq",
report_rev="trimmed/reports/{sample}_R2_trimming_report.txt",
threads: 1
params:
extra="--illumina -q 20",
log:
"logs/trim_galore/{sample}.log",
wrapper:
"v2.2.1/bio/trim_galore/pe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- It is expected that the fastqc Snakemake wrapper be used in place of the --fastqc option.
- All output files must be placed in the same directory.
Software dependencies¶
trim-galore=0.6.10
Input/Output¶
Input:
- two (paired-end) fastq files (can be gzip compressed)
Output:
- two trimmed (paired-end) fastq files
- two trimming reports
Params¶
- extra: additional parameters
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming paired-end reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import tempfile
import re
import os
def report_filename(infile: str) -> str:
"""Infer report output file name from input
>>> report_filename('reads/sample.1.fastq.gz')
'sample.1.fastq.gz_trimming_report.txt
>>> report_filename('reads/sample_R2.fastq.gz')
'sample_R2.fastq.gz_trimming_report.txt
"""
return os.path.basename(infile) + "_trimming_report.txt"
def fasta_filename(infile: str, infix: str, out_gzip: bool) -> str:
"""Infer fasta output file name from input
>>> fasta_filename('reads/sample.1.fq.gz', infix = '_val_1', out_gzip = False)
'sample.1_val_1.fq.gz'
>>> fasta_filename('reads/sample_R2.fastq', infix = '_val_2', out_gzip = True)
'sample_R2_val_2.fq.gz'
"""
base_input = os.path.basename(infile)
suffix = ".gz" if out_gzip or infile.endswith(".gz") else ""
REGEX_RULES = [r"\.fastq$", "\.fastq\.gz$", r"\.fq$", r"\.fq\.gz$"]
for regex in REGEX_RULES:
if re.search(regex, base_input):
return re.sub(regex, f"{infix}.fq", base_input) + suffix
return base_input + infix + suffix
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Check that two input files were supplied
n = len(snakemake.input)
assert n == 2, "Input must contain 2 files. Given: %r." % n
infile_fwd, infile_rev = snakemake.input[0:2]
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
raise ValueError(
"The trim_galore Snakemake wrapper cannot "
"be run with the `--fastqc` flag. Please "
"remove the flag from extra params. "
"You can use the fastqc Snakemake wrapper on "
"the input and output files instead."
)
# Check that four output files were supplied
m = len(snakemake.output)
assert m == 4, "Output must contain 4 files. Given: %r." % m
fasta_fwd, fasta_rev, report_fwd, report_rev = (
snakemake.output.get(key)
for key in ["fasta_fwd", "fasta_rev", "report_fwd", "report_rev"]
)
out_gzip = any((fasta_fwd.endswith("gz"), fasta_rev.endswith("gz")))
if out_gzip:
extra += " --gzip"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(trim_galore"
" {extra}"
" --cores {snakemake.threads}"
" --paired"
" -o {tmpdir}"
" {infile_fwd} {infile_rev})"
" {log}"
)
if report_fwd:
shell(f"mv {tmpdir}/{report_filename(infile_fwd)} {report_fwd}")
if report_rev:
shell(f"mv {tmpdir}/{report_filename(infile_rev)} {report_rev}")
if fasta_fwd:
shell(
f"mv {tmpdir}/{fasta_filename(infile_fwd, '_val_1', out_gzip)} {fasta_fwd}"
)
if fasta_rev:
shell(
f"mv {tmpdir}/{fasta_filename(infile_rev, '_val_2', out_gzip)} {fasta_rev}"
)
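The file-name inference used by this wrapper can be tried outside Snakemake. The following is a minimal standalone sketch that copies the two helper functions from the code above and prints the names the wrapper expects to find in the temporary output directory. The input paths are hypothetical and only serve as an illustration.

```python
import os
import re


def report_filename(infile: str) -> str:
    """Name of the trimming report expected for `infile`."""
    return os.path.basename(infile) + "_trimming_report.txt"


def fasta_filename(infile: str, infix: str, out_gzip: bool) -> str:
    """Name of the trimmed read file expected for `infile`."""
    base_input = os.path.basename(infile)
    suffix = ".gz" if out_gzip or infile.endswith(".gz") else ""
    for regex in (r"\.fastq$", r"\.fastq\.gz$", r"\.fq$", r"\.fq\.gz$"):
        if re.search(regex, base_input):
            return re.sub(regex, f"{infix}.fq", base_input) + suffix
    # Unknown extension: append infix and suffix to the full base name.
    return base_input + infix + suffix


# Hypothetical input paths, for illustration only.
fwd, rev = "reads/sample.1.fastq.gz", "reads/sample.2.fastq.gz"
names = [
    fasta_filename(fwd, "_val_1", out_gzip=True),
    fasta_filename(rev, "_val_2", out_gzip=True),
    report_filename(fwd),
    report_filename(rev),
]
print(names)
```

The wrapper then moves each of these files from the temporary directory to the path given in the corresponding output key.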
TRIM_GALORE-SE¶
Trim unpaired reads using trim_galore.
Example¶
This wrapper can be used in the following way:
rule trim_galore_se:
input:
"reads/{sample}.fastq.gz",
output:
fasta="trimmed/{sample}_trimmed.fq.gz",
report="trimmed/report/{sample}.fastq.gz_trimming_report.txt",
params:
extra="--illumina -q 20",
log:
"logs/trim_galore/{sample}.log",
wrapper:
"v2.2.1/bio/trim_galore/se"
rule trim_galore_se_uncompressed:
input:
"reads/{sample}.fastq",
output:
fasta="trimmed/{sample}_trimmed.fastq",
report="trimmed/report/{sample}.fastq_trimming_report.txt",
params:
extra="--illumina -q 20",
threads: 1
log:
"logs/trim_galore/{sample}.log",
wrapper:
"v2.2.1/bio/trim_galore/se"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Use the fastqc Snakemake wrapper instead of the --fastqc option; the wrapper raises an error if --fastqc is passed in extra.
- All output files must be placed in the same directory.
Software dependencies¶
trim-galore=0.6.10
Input/Output¶
Input:
- fastq file with untrimmed reads (can be gzip compressed)
Output:
- trimmed fastq file
- trimming report
Params¶
extra
: additional parameters
Authors¶
- Kerrin Mendler
Code¶
"""Snakemake wrapper for trimming unpaired reads using trim_galore."""
__author__ = "Kerrin Mendler"
__copyright__ = "Copyright 2018, Kerrin Mendler"
__email__ = "mendlerke@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
import os
import re
import tempfile
def report_filename(infile: str) -> str:
    """Infer report output file name from input
    >>> report_filename('reads/sample.fastq.gz')
    'sample.fastq.gz_trimming_report.txt'
    """
    return os.path.basename(infile) + "_trimming_report.txt"
def fasta_filename(infile: str, out_gzip: bool) -> str:
    """Infer fasta output file name from input
    >>> fasta_filename('reads/sample.fq.gz', out_gzip = False)
    'sample_trimmed.fq.gz'
    >>> fasta_filename('reads/sample.fastq.gz', out_gzip = False)
    'sample_trimmed.fq.gz'
    >>> fasta_filename('reads/sample.fastq', out_gzip = False)
    'sample_trimmed.fq'
    >>> fasta_filename('reads/sample.fastq', out_gzip = True)
    'sample_trimmed.fq.gz'
    """
    base_input = os.path.basename(infile)
    # Output is gzipped if requested or if the input was already gzipped
    suffix = ".gz" if out_gzip or infile.endswith(".gz") else ""
    REGEX_RULES = [r"\.fastq$", r"\.fastq\.gz$", r"\.fq$", r"\.fq\.gz$"]
    for regex in REGEX_RULES:
        if re.search(regex, base_input):
            return re.sub(regex, "_trimmed.fq", base_input) + suffix
    return base_input + "_trimmed.fq" + suffix
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
# Don't run with `--fastqc` flag
if "--fastqc" in snakemake.params.get("extra", ""):
raise ValueError(
"The trim_galore Snakemake wrapper cannot "
"be run with the `--fastqc` flag. Please "
"remove the flag from extra params. "
"You can use the fastqc Snakemake wrapper on "
"the input and output files instead."
)
# Check that input files were supplied
n = len(snakemake.input)
assert n == 1, "Input must contain 1 file. Given: %r." % n
infile = snakemake.input[0]
# Check that two output files were supplied
m = len(snakemake.output)
assert m == 2, "Output must contain 2 files. Given: %r." % m
fasta, report = (snakemake.output.get(key) for key in ["fasta", "report"])
out_gzip = fasta.endswith("gz")
if out_gzip:
extra += " --gzip"
with tempfile.TemporaryDirectory() as tmpdir:
shell(
"(trim_galore"
" {extra}"
" --cores {snakemake.threads}"
" -o {tmpdir}"
" {infile})"
" {log}"
)
if report:
shell(f"mv {tmpdir}/{report_filename(infile)} {report}")
if fasta:
shell(f"mv {tmpdir}/{fasta_filename(infile, out_gzip)} {fasta}")
TRIMMOMATIC¶
For trimmomatic, the following wrappers are available:
TRIMMOMATIC PE¶
Trim paired-end reads with trimmomatic. (De)compress with pigz.
Example¶
This wrapper can be used in the following way:
rule trimmomatic_pe:
input:
r1="reads/{sample}.1.fastq.gz",
r2="reads/{sample}.2.fastq.gz"
output:
r1="trimmed/{sample}.1.fastq.gz",
r2="trimmed/{sample}.2.fastq.gz",
# reads where trimming entirely removed the mate
r1_unpaired="trimmed/{sample}.1.unpaired.fastq.gz",
r2_unpaired="trimmed/{sample}.2.unpaired.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra="",
compression_level="-9"
threads:
32
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/trimmomatic/pe"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
trimmomatic==0.36
pigz==2.3.4
snakemake-wrapper-utils==0.1.3
Authors¶
- Johannes Köster
- Jorge Langa
Code¶
"""
bio/trimmomatic/pe
Snakemake wrapper to trim reads with trimmomatic in PE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least 1 extra thread for each gzipped
input or output file.
"""
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_files, output_files, available_threads):
gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
potential_threads_per_process = available_threads // (
1 + gzipped_input_files + gzipped_output_files
)
if potential_threads_per_process > 0:
# decompressing pigz creates at most 4 threads
pigz_input_threads = (
min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
)
pigz_output_threads = (
(available_threads - pigz_input_threads * gzipped_input_files)
// (1 + gzipped_output_files)
if gzipped_output_files != 0
else 0
)
trimmomatic_threads = (
available_threads
- pigz_input_threads * gzipped_input_files
- pigz_output_threads * gzipped_output_files
)
else:
# not enough threads for pigz
pigz_input_threads = 0
pigz_output_threads = 0
trimmomatic_threads = available_threads
return trimmomatic_threads, pigz_input_threads, pigz_output_threads
def compose_input_gz(filename, threads):
if filename.endswith(".gz") and threads > 0:
return "<(pigz -p {threads} --decompress --stdout {filename})".format(
threads=threads, filename=filename
)
return filename
def compose_output_gz(filename, threads, compression_level):
if filename.endswith(".gz") and threads > 0:
return ">(pigz -p {threads} {compression_level} > {filename})".format(
threads=threads, compression_level=compression_level, filename=filename
)
return filename
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)
# Distribute threads
input_files = [snakemake.input.r1, snakemake.input.r2]
output_files = [
snakemake.output.r1,
snakemake.output.r1_unpaired,
snakemake.output.r2,
snakemake.output.r2_unpaired,
]
trimmomatic_threads, input_threads, output_threads = distribute_threads(
input_files, output_files, snakemake.threads
)
input_r1, input_r2 = [
compose_input_gz(filename, input_threads) for filename in input_files
]
output_r1, output_r1_unp, output_r2, output_r2_unp = [
compose_output_gz(filename, output_threads, compression_level)
for filename in output_files
]
shell(
"trimmomatic PE -threads {trimmomatic_threads} {java_opts} {extra} "
"{input_r1} {input_r2} "
"{output_r1} {output_r1_unp} "
"{output_r2} {output_r2_unp} "
"{trimmer} "
"{log}"
)
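To see how the available threads are split, the arithmetic of distribute_threads can be run on its own. The sketch below restates the function from the code above in a compact form (same integer arithmetic, no Snakemake objects) and applies it to the example rule, which has two gzipped inputs, four gzipped outputs and 32 threads. The file names are illustrative assumptions.

```python
def distribute_threads(input_files, output_files, available_threads):
    """Split threads between trimmomatic and pigz (same arithmetic as the wrapper)."""
    gz_in = sum(1 for f in input_files if f.endswith(".gz"))
    gz_out = sum(1 for f in output_files if f.endswith(".gz"))
    per_process = available_threads // (1 + gz_in + gz_out)
    if per_process == 0:
        # Not enough threads for pigz: file names are passed to trimmomatic unchanged.
        return available_threads, 0, 0
    # A decompressing pigz is given at most 4 threads.
    pigz_in = min(4, per_process) if gz_in else 0
    pigz_out = (available_threads - pigz_in * gz_in) // (1 + gz_out) if gz_out else 0
    trimmomatic = available_threads - pigz_in * gz_in - pigz_out * gz_out
    return trimmomatic, pigz_in, pigz_out


# Hypothetical file names matching the shape of the example rule.
inputs = ["reads/a.1.fastq.gz", "reads/a.2.fastq.gz"]
outputs = [
    "trimmed/a.1.fastq.gz",
    "trimmed/a.1.unpaired.fastq.gz",
    "trimmed/a.2.fastq.gz",
    "trimmed/a.2.unpaired.fastq.gz",
]

for threads in (32, 7, 4):
    print(threads, distribute_threads(inputs, outputs, threads))
```

With 32 threads this yields 8 threads for trimmomatic and 4 for each pigz process. With fewer threads than processes (here 4 threads for 7 processes), pigz is not used at all.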
TRIMMOMATIC SE¶
Trim single-end reads with trimmomatic. (De)compress with pigz.
Example¶
This wrapper can be used in the following way:
rule trimmomatic:
input:
"reads/{sample}.fastq.gz" # input and output can be uncompressed or compressed
output:
"trimmed/{sample}.fastq.gz"
log:
"logs/trimmomatic/{sample}.log"
params:
# list of trimmers (see manual)
trimmer=["TRAILING:3"],
# optional parameters
extra="",
# optional compression levels from -0 to -9 and -11
compression_level="-9"
threads:
32
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
wrapper:
"v2.2.1/bio/trimmomatic/se"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
trimmomatic==0.36
pigz==2.3.4
snakemake-wrapper-utils==0.1.3
Authors¶
- Johannes Köster
- Jorge Langa
Code¶
"""
bio/trimmomatic/se
Snakemake wrapper to trim reads with trimmomatic in SE mode with help of pigz.
pigz is a parallel implementation of gzip. Trimmomatic spends most of the time
compressing and decompressing instead of trimming sequences. By using process
substitution (<(command), >(command)), we can accelerate trimmomatic a lot.
Consider providing this wrapper with at least 1 extra thread for each gzipped
input or output file.
"""
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_file, output_file, available_threads):
gzipped_input_files = 1 if input_file.endswith(".gz") else 0
gzipped_output_files = 1 if output_file.endswith(".gz") else 0
potential_threads_per_process = available_threads // (
1 + gzipped_input_files + gzipped_output_files
)
if potential_threads_per_process > 0:
# decompressing pigz creates at most 4 threads
pigz_input_threads = (
min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
)
pigz_output_threads = (
(available_threads - pigz_input_threads * gzipped_input_files)
// (1 + gzipped_output_files)
if gzipped_output_files != 0
else 0
)
trimmomatic_threads = (
available_threads
- pigz_input_threads * gzipped_input_files
- pigz_output_threads * gzipped_output_files
)
else:
# not enough threads for pigz
pigz_input_threads = 0
pigz_output_threads = 0
trimmomatic_threads = available_threads
return trimmomatic_threads, pigz_input_threads, pigz_output_threads
def compose_input_gz(filename, threads):
if filename.endswith(".gz") and threads > 0:
return "<(pigz -p {threads} --decompress --stdout {filename})".format(
threads=threads, filename=filename
)
return filename
def compose_output_gz(filename, threads, compression_level):
if filename.endswith(".gz") and threads > 0:
return ">(pigz -p {threads} {compression_level} > {filename})".format(
threads=threads, compression_level=compression_level, filename=filename
)
return filename
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)
# Distribute threads
trimmomatic_threads, input_threads, output_threads = distribute_threads(
snakemake.input[0], snakemake.output[0], snakemake.threads
)
# Collect files
input = compose_input_gz(snakemake.input[0], input_threads)
output = compose_output_gz(snakemake.output[0], output_threads, compression_level)
shell(
"trimmomatic SE -threads {trimmomatic_threads} "
"{java_opts} {extra} {input} {output} {trimmer} {log}"
)
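The process substitution mentioned in the docstring may be easier to follow with concrete values. This sketch copies compose_input_gz and compose_output_gz from the code above and prints the resulting command for a gzipped input and output. The thread counts (14, 4, 14) are what distribute_threads would return for 32 threads in this case; the file names are hypothetical, and java_opts, extra and the log redirect are left out for brevity.

```python
def compose_input_gz(filename, threads):
    """Wrap a gzipped input in a pigz process substitution."""
    if filename.endswith(".gz") and threads > 0:
        return f"<(pigz -p {threads} --decompress --stdout {filename})"
    return filename


def compose_output_gz(filename, threads, compression_level):
    """Wrap a gzipped output in a pigz process substitution."""
    if filename.endswith(".gz") and threads > 0:
        return f">(pigz -p {threads} {compression_level} > {filename})"
    return filename


# Hypothetical paths; 32 threads split as 14 (trimmomatic), 4 (input), 14 (output).
input_arg = compose_input_gz("reads/a.fastq.gz", 4)
output_arg = compose_output_gz("trimmed/a.fastq.gz", 14, "-9")
command = f"trimmomatic SE -threads 14 {input_arg} {output_arg} TRAILING:3"
print(command)
```

Uncompressed files, or a thread count of 0, leave the file name untouched, so trimmomatic reads or writes the file directly.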
TRINITY¶
Generate transcriptome assembly with Trinity
URL: https://github.com/trinityrnaseq/trinityrnaseq/
Example¶
This wrapper can be used in the following way:
rule trinity:
input:
left=["reads/reads.left.fq.gz", "reads/reads2.left.fq.gz"],
right=["reads/reads.right.fq.gz", "reads/reads2.right.fq.gz"],
output:
dir=temp(directory("trinity_out_dir/")),
fas="trinity_out_dir.Trinity.fasta",
map="trinity_out_dir.Trinity.fasta.gene_trans_map",
log:
'logs/trinity/trinity.log',
params:
extra="",
threads: 4
resources:
mem_gb=10,
wrapper:
"v2.2.1/bio/trinity"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
trinity=2.15.1
Input/Output¶
Input:
- fastq files
Output:
fas
: fasta containing assembly
map
: gene transcripts map
dir
: folder for intermediate results
Authors¶
- Tessa Pierce
Code¶
"""Snakemake wrapper for Trinity."""
__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Previous wrapper reserved 10 Gigabytes by default. This behaviour is
# preserved below:
max_memory = "10G"
# Getting memory in megabytes, if java opts is not filled with -Xmx parameter
# By doing so, backward compatibility is preserved
if "mem_mb" in snakemake.resources.keys():
# max_memory from trinity expects a value in gigabytes.
rounded_mb_to_gb = int(snakemake.resources["mem_mb"] / 1024)
max_memory = "{}G".format(rounded_mb_to_gb)
# Getting memory in gigabytes, for user convenience. Please prefer the use
# of mem_mb over mem_gb as advised in documentation.
elif "mem_gb" in snakemake.resources.keys():
max_memory = "{}G".format(snakemake.resources["mem_gb"])
# allow multiple input files for single assembly
left = snakemake.input.get("left")
assert left is not None, "input-> left is a required input parameter"
left = (
[snakemake.input.left]
if isinstance(snakemake.input.left, str)
else snakemake.input.left
)
right = snakemake.input.get("right")
if right:
right = (
[snakemake.input.right]
if isinstance(snakemake.input.right, str)
else snakemake.input.right
)
assert len(left) >= len(
right
), "left input needs to contain at least the same number of files as the right input (can contain additional, single-end files)"
input_str_left = " --left " + ",".join(left)
input_str_right = " --right " + ",".join(right)
else:
input_str_left = " --single " + ",".join(left)
input_str_right = ""
input_cmd = " ".join([input_str_left, input_str_right])
# infer seqtype from input files:
seqtype = snakemake.params.get("seqtype")
if not seqtype:
if "fq" in left[0] or "fastq" in left[0]:
seqtype = "fq"
elif "fa" in left[0] or "fas" in left[0] or "fasta" in left[0]:
seqtype = "fa"
else: # assertion is redundant - warning or error instead?
assert (
seqtype is not None
), "cannot infer 'fq' or 'fa' seqtype from input files. Please specify 'fq' or 'fa' in 'seqtype' parameter"
shell(
"Trinity {input_cmd} --CPU {snakemake.threads} "
" --max_memory {max_memory} --seqType {seqtype} "
" --output {snakemake.output.dir} {extra} "
" {log}"
)
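The way the wrapper derives the --max_memory value from the rule's resources can be summarized in a few lines. The sketch below uses a plain dict in place of snakemake.resources and a hypothetical helper name, trinity_max_memory; the logic follows the code above.

```python
def trinity_max_memory(resources: dict) -> str:
    """Return the value passed to --max_memory (hypothetical helper name)."""
    if "mem_mb" in resources:
        # Megabytes are converted to whole gigabytes, truncating the remainder.
        return "{}G".format(int(resources["mem_mb"] / 1024))
    if "mem_gb" in resources:
        return "{}G".format(resources["mem_gb"])
    # Default kept for backward compatibility with earlier wrapper versions.
    return "10G"


for res in ({"mem_mb": 20480}, {"mem_mb": 1500}, {"mem_gb": 10}, {}):
    print(res, "->", trinity_max_memory(res))
```

Note that mem_mb takes precedence over mem_gb, and that a mem_mb value below 1024 truncates to 0G, so it is probably safer to request at least 1024 MB.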
TXIMPORT¶
Import and summarize transcript-level estimates for both transcript-level and gene-level analysis.
Example¶
This wrapper can be used in the following way:
rule tximport:
input:
quant = expand("quant/A/quant.sf")
# Optional transcript/gene links as described in tximport
# tx_to_gene = "/path/to/tx2gene"
output:
txi = "txi.RDS"
params:
extra = "type='salmon', txOut=TRUE"
wrapper:
"v2.2.1/bio/tximport"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Pass any tximport options through the extra parameter; the R wrapper forwards them to the tximport call. Options that tximport does not recognize cause an unknown-parameter error.
Software dependencies¶
bioconductor-tximport=1.26.0
r-readr=2.1.4
r-jsonlite=1.8.7
Authors¶
- Thibault Dayris
Code¶
#!/bin/R
# Loading library
base::library("tximport"); # Perform actual count importation in R
base::library("readr"); # Read faster!
base::library("jsonlite"); # Importing inferential replicates
# Cast input paths as character to avoid errors
samples_paths <- sapply( # Sequentially apply
snakemake@input[["quant"]], # ... to all quantification paths
function(quant) as.character(quant) # ... a cast as character
);
# Collapse path into a character vector
samples_paths <- base::paste0(samples_paths, collapse = '", "');
# Building function arguments
extra <- base::paste0('files = c("', samples_paths, '")');
# Check if user provided optional transcript to gene table
if ("tx_to_gene" %in% names(snakemake@input)) {
tx2gene <- readr::read_tsv(snakemake@input[["tx_to_gene"]]);
extra <- base::paste(
extra, # Forward existing arguments
", tx2gene = ", # Argument name
"tx2gene" # Add tx2gene to parameters
);
}
# Add user defined arguments
if ("extra" %in% names(snakemake@params)) {
if (snakemake@params[["extra"]] != "") {
extra <- base::paste(
extra, # Forward existing parameters
snakemake@params[["extra"]], # Add user parameters
sep = ", " # Field separator
);
}
}
print(extra);
# Perform tximport work
txi <- base::eval( # Evaluate the following
base::parse( # ... parsed expression
text = base::paste0(
"tximport::tximport(", extra, ");" # ... of tximport and its arguments
)
)
);
# Save results
base::saveRDS( # Save R object
object = txi, # The txi object
file = snakemake@output[["txi"]] # Output path is provided by Snakemake
);
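Because the R code above builds the tximport call as text and then evaluates it, the extra parameter has to be valid R argument syntax. The following Python sketch mirrors the string assembly to show what ends up being evaluated; the function name build_tximport_call is hypothetical, and the sketch only reproduces the text handling, not tximport itself.

```python
def build_tximport_call(quant_paths, has_tx_to_gene=False, extra=""):
    """Mirror the string assembly of the R wrapper (hypothetical helper)."""
    # Quantification paths become an R character vector: c("a", "b")
    args = 'files = c("' + '", "'.join(quant_paths) + '")'
    if has_tx_to_gene:
        # R's paste() joins its arguments with a single space by default.
        args = " ".join([args, ", tx2gene = ", "tx2gene"])
    if extra != "":
        # User options are appended after a comma separator.
        args = ", ".join([args, extra])
    return "tximport::tximport(" + args + ");"


call = build_tximport_call(
    ["quant/A/quant.sf"], extra="type='salmon', txOut=TRUE"
)
print(call)
```

For the example rule this produces a call with files, type and txOut arguments; any typo in extra would surface as an R parse or argument error at evaluation time.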
UCSC¶
For ucsc, the following wrappers are available:
BEDGRAPHTOBIGWIG¶
Convert *.bedGraph file to *.bw file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Example¶
This wrapper can be used in the following way:
rule bedGraphToBigWig:
input:
bedGraph="{sample}.bedGraph",
chromsizes="genome.chrom.sizes"
output:
"{sample}.bw"
log:
"logs/{sample}.bed-graph_to_big-wig.log"
params:
extra="" # optional params string
wrapper:
"v2.2.1/bio/ucsc/bedGraphToBigWig"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
ucsc-bedgraphtobigwig=445
Input/Output¶
Input:
bedGraph
: Path to *.bedGraph file
chromsizes
: Chrom sizes file, could be generated by twoBitInfo or downloaded from UCSC
Output:
- Path to output ‘*.bw’ file
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for *.bedGraph to *.bw conversion using UCSC bedGraphToBigWig tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"bedGraphToBigWig {extra}"
" {snakemake.input.bedGraph} {snakemake.input.chromsizes}"
" {snakemake.output} {log}"
)
FATOTWOBIT¶
Convert *.fa file to *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Example¶
This wrapper can be used in the following way:
# Example: from *.fa file
rule faToTwoBit_fa:
input:
"{sample}.fa"
output:
"{sample}.2bit"
log:
"logs/{sample}.fa_to_2bit.log"
params:
extra="" # optional params string
wrapper:
"v2.2.1/bio/ucsc/faToTwoBit"
# Example: from *.fa.gz file
rule faToTwoBit_fa_gz:
input:
"{sample}.fa.gz"
output:
"{sample}.2bit"
log:
"logs/{sample}.fa-gz_to_2bit.log"
params:
extra="" # optional params string
wrapper:
"v2.2.1/bio/ucsc/faToTwoBit"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
ucsc-fatotwobit=447
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for *.fa to *.2bit conversion using UCSC faToTwoBit tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("faToTwoBit {extra} {snakemake.input} {snakemake.output} {log}")
GTFTOGENEPRED¶
Convert a GTF file to genePred format (see https://genome.ucsc.edu/FAQ/FAQformat.html#format9)
URL: https://hgdownload.cse.ucsc.edu/admin/exe/
Example¶
This wrapper can be used in the following way:
rule gtfToGenePred:
input:
# annotations containing gene, transcript, exon, etc. data in GTF format
"annotation.gtf",
output:
"annotation.genePred",
log:
"logs/gtfToGenePred.log",
params:
extra="-genePredExt", # optional parameters to pass to gtfToGenePred
wrapper:
"v2.2.1/bio/ucsc/gtfToGenePred"
rule gtfToGenePred_CollectRnaSeqMetrics:
input:
# annotations containing gene, transcript, exon, etc. data in GTF format
"annotation.gtf",
output:
"annotation.PicardCollectRnaSeqMetrics.genePred",
log:
"logs/gtfToGenePred.PicardCollectRnaSeqMetrics.log",
params:
convert_out="PicardCollectRnaSeqMetrics",
extra="-genePredExt -geneNameAsName2", # optional parameters to pass to gtfToGenePred
wrapper:
"v2.2.1/bio/ucsc/gtfToGenePred"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- The convert_out param applies a conversion to the gtfToGenePred output. For example, setting it to PicardCollectRnaSeqMetrics produces a refFlat file compatible with Picard CollectRnaSeqMetrics (this also requires extra to be set to -genePredExt -geneNameAsName2).
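For the PicardCollectRnaSeqMetrics mode, the wrapper pipes the output through csvcut -t -c 12,1-10, which moves column 12 to the front and keeps columns 1 to 10. The sketch below performs the same reordering on one tab-separated line, assuming the -genePredExt output has the usual 15 columns with the gene name (name2) in column 12. The record is made up, and the sketch ignores the CSV quoting that csvkit would handle.

```python
def to_refflat(genepred_ext_line: str) -> str:
    """Reorder one genePredExt line into refFlat column order (sketch)."""
    fields = genepred_ext_line.rstrip("\n").split("\t")
    # csvcut columns are 1-based: column 12 first, then columns 1-10.
    reordered = [fields[11]] + fields[0:10]
    return "\t".join(reordered)


# Made-up genePredExt record with 15 tab-separated columns.
record = "\t".join([
    "ENST0001", "chr1", "+", "100", "900", "150", "850", "2",
    "100,500,", "300,900,", "0", "GENE1", "cmpl", "cmpl", "0,1,",
])
print(to_refflat(record))
```

The result has 11 columns, starting with the gene name followed by the transcript name and coordinates.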
Software dependencies¶
ucsc-gtftogenepred=447
csvkit=1.1.1
Authors¶
- Brett Copeland
- Filipe G. Vieira
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
convert_out = snakemake.params.get("convert_out", "raw")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
pipes = ""
if convert_out == "raw":
pipes = ""
elif convert_out == "PicardCollectRnaSeqMetrics":
pipes += " | csvcut -t -c 12,1-10 | csvformat -T"
else:
raise ValueError(
f"Unsupported conversion mode {convert_out}. Please check wrapper documentation."
)
shell(
"(gtfToGenePred {extra} {snakemake.input} /dev/stdout {pipes} > {snakemake.output}) {log}"
)
TWOBITINFO¶
Generate *.chrom.sizes file from *.2bit file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Example¶
This wrapper can be used in the following way:
rule twoBitInfo:
input:
"{sample}.2bit"
output:
"{sample}.chrom.sizes"
log:
"logs/{sample}.chrom.sizes.log"
params:
extra="" # optional params string
wrapper:
"v2.2.1/bio/ucsc/twoBitInfo"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
ucsc-twobitinfo=447
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for generating a *.chrom.sizes file from a *.2bit file using UCSC twoBitInfo tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("twoBitInfo {extra} {snakemake.input} {snakemake.output} {log}")
TWOBITTOFA¶
Convert *.2bit file to *.fa file (see http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt)
Example¶
This wrapper can be used in the following way:
rule twoBitToFa:
input:
"{sample}.2bit"
output:
"{sample}.fa"
log:
"logs/{sample}.2bit_to_fa.log"
params:
extra="" # optional params string
wrapper:
"v2.2.1/bio/ucsc/twoBitToFa"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
ucsc-twobittofa=447
Authors¶
- Roman Cherniatchik
Code¶
"""Snakemake wrapper for *.2bit to *.fa conversion using UCSC twoBitToFa tool."""
# http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/FOOTER.txt
__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
shell("twoBitToFa {extra} {snakemake.input} {snakemake.output} {log}")
UMIS¶
For umis, the following wrappers are available:
UMIS BAMTAG¶
Convert a BAM/SAM with fastqtransformed read names to have UMI and
Example¶
This wrapper can be used in the following way:
rule umis_bamtag:
input:
"data/{sample}.bam"
output:
"data/{sample}.annotated.bam"
log:
"logs/umis/bamtag/{sample}.log"
params:
extra=""
threads: 1
wrapper:
"v2.2.1/bio/umis/bamtag"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
umis=1.0.9
samtools=1.17
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2019, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
bam_input = snakemake.input[0]
if bam_input is None:
raise ValueError("Missing bam input file!")
elif not len(snakemake.input) == 1:
raise ValueError("Only expecting one input file: " + str(snakemake.input) + "!")
output_file = snakemake.output[0]
if output_file is None:
raise ValueError("Missing output file")
elif not len(snakemake.output) == 1:
raise ValueError("Only expecting one output file: " + str(output_file) + "!")
in_pipe = ""
if bam_input.endswith(".sam"):
in_pipe = "cat "
else:
in_pipe = "samtools view -h "
out_pipe = ""
if not output_file.endswith(".sam"):
out_pipe = " | samtools view -S -b - "
shell(
" {in_pipe} {bam_input} | " " umis bamtag -" " {out_pipe} > {output_file}" " {log}"
)
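The wrapper chooses its input and output pipes from the file extensions. As a rough illustration, the sketch below rebuilds the command string for given paths with a hypothetical helper, build_bamtag_command. Whitespace is normalized compared to the wrapper, and the log redirect is omitted.

```python
def build_bamtag_command(bam_input: str, output_file: str) -> str:
    """Assemble the umis bamtag pipeline for the given paths (sketch)."""
    # SAM input is streamed as-is; anything else is converted to SAM with a header.
    in_pipe = "cat" if bam_input.endswith(".sam") else "samtools view -h"
    # Non-SAM output is converted back to BAM.
    out_pipe = "" if output_file.endswith(".sam") else " | samtools view -S -b -"
    return f"{in_pipe} {bam_input} | umis bamtag -{out_pipe} > {output_file}"


print(build_bamtag_command("data/a.bam", "data/a.annotated.bam"))
print(build_bamtag_command("data/a.sam", "data/a.annotated.sam"))
```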
UNICYCLER¶
Assemble bacterial genomes with Unicycler.
You may find additional information on Unicycler’s GitHub page.
Example¶
This wrapper can be used in the following way:
rule test_unicycler:
input:
# R1 and R2 short reads:
paired = expand(
"reads/{sample}.{read}.fq.gz",
read=["R1", "R2"],
allow_missing=True
)
# Long reads:
# long = long_reads/{sample}.fq.gz
# Unpaired reads:
# unpaired = reads/{sample}.fq.gz
output:
"result/{sample}/assembly.fasta"
log:
"logs/{sample}.log"
params:
extra=""
wrapper:
"v2.2.1/bio/unicycler"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bowtie2=2.5.1
bcftools=1.17
spades=3.15.5
samtools=1.17
pilon=1.24
racon=1.5.0
blast=2.14.0
unicycler=0.5.0
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for Unicycler"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2020, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from os.path import dirname
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
input_reads = ""
if "paired" in snakemake.input.keys():
input_reads += " --short1 {} --short2 {}".format(*snakemake.input.paired)
if "unpaired" in snakemake.input.keys():
input_reads += " --unpaired {} ".format(snakemake.input["unpaired"])
if "long" in snakemake.input.keys():
input_reads += " --long {} ".format(snakemake.input["long"])
output_dir = " --out {} ".format(dirname(snakemake.output[0]))
shell(
" unicycler "
" {input_reads} "
" --threads {snakemake.threads} "
" {output_dir} "
" {extra} "
" {log} "
)
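The read arguments and the output directory are derived from the named inputs and from the first output path. The sketch below reproduces that assembly with a plain dict in place of snakemake.input; the helper name unicycler_args and the paths are assumptions for illustration.

```python
from os.path import dirname


def unicycler_args(inputs: dict, output: str) -> str:
    """Assemble read and output arguments as the wrapper does (sketch)."""
    parts = []
    if "paired" in inputs:
        # The paired input is expected to hold exactly two files: R1 and R2.
        parts.append("--short1 {} --short2 {}".format(*inputs["paired"]))
    if "unpaired" in inputs:
        parts.append("--unpaired {}".format(inputs["unpaired"]))
    if "long" in inputs:
        parts.append("--long {}".format(inputs["long"]))
    # Unicycler receives the directory of the first output file, not the file itself.
    parts.append("--out {}".format(dirname(output)))
    return " ".join(parts)


args = unicycler_args(
    {"paired": ["reads/a.R1.fq.gz", "reads/a.R2.fq.gz"], "long": "long_reads/a.fq.gz"},
    "result/a/assembly.fasta",
)
print(args)
```

Since only the directory is passed on, the output file name in the rule presumably has to match the name Unicycler writes into that directory, as in the example rule's assembly.fasta.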
VARDICT¶
Run Vardict to call genomic variants
Example¶
This wrapper can be used in the following way:
rule vardict_single_mode:
input:
reference="data/genome.fasta",
regions="regions.bed",
bam="mapped/{sample}.bam",
output:
vcf="vcf/{sample}.s.vcf",
params:
extra="",
bed_columns="-c 1 -S 2 -E 3 -g 4", # Optional, default is -c 1 -S 2 -E 3 -g 4
allele_frequency_threshold="0.01", # Optional, default is 0.01
threads: 1
log:
"logs/varscan_{sample}_s_.log",
wrapper:
"v2.2.1/bio/vardict"
rule vardict_paired_mode:
input:
reference="data/genome.fasta",
regions="regions.bed",
bam="mapped/{sample}.bam",
normal="mapped/b.bam",
output:
vcf="vcf/{sample}.tn.vcf",
params:
extra="",
threads: 1
log:
"logs/varscan_{sample}_tn.log",
wrapper:
"v2.2.1/bio/vardict"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vardict-java=1.8.3
Input/Output¶
Input:
- reference file
- bam file
- normal file, optional (must be set for tumor/normal mode)
- region file
Output:
- A VCF file
Params¶
extra
: optional
bed_columns
: optional, default is -c 1 -S 2 -E 3 -g 4
allele_frequency_threshold
: optional, default is 0.01
Authors¶
- Patrik Smeds
Code¶
"""Snakemake wrapper for VarDict Single sample mode"""
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2021, Patrik Smeds"
__email__ = "patrik.smeds@scilifelab.uu.se"
__license__ = "MIT"
from pathlib import Path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
reference = snakemake.input.reference
regions = snakemake.input.regions
bam = snakemake.input.bam
normal = snakemake.input.get("normal", None)
vcf = snakemake.output.vcf
extra = snakemake.params.get("extra", "")
bed_columns = snakemake.params.get("bed_columns", "-c 1 -S 2 -E 3 -g 4")
af_th = snakemake.params.get("allele_frequency_threshold", "0.01")
if normal is None:
input_bams = bam
name = snakemake.params.get("sample_name", Path(bam).stem)
post_scripts = (
"teststrandbias.R | var2vcf_valid.pl -A -N '" + name + "' -E -f " + af_th
)
else:
input_bams = "'" + bam + "|" + normal + "'"
name = snakemake.params.get("sample_name", Path(bam).stem + "|" + Path(normal).stem)
post_scripts = 'testsomatic.R | var2vcf_paired.pl -N "' + name + '" -f ' + af_th
shell(
"vardict-java -G {reference} "
"-f {af_th} "
" {extra} "
"-th {snakemake.threads} "
"{bed_columns} "
"-N '{name}' "
"-b {input_bams} "
"{regions} |"
"{post_scripts} "
"> {vcf}"
"{log}"
)
VARSCAN¶
For varscan, the following wrappers are available:
VARSCAN MPILEUP2INDEL¶
Detect indels in NGS data from mpileup files with VarScan
Example¶
This wrapper can be used in the following way:
rule mpileup_to_vcf:
input:
"mpileup/{sample}.mpileup.gz"
output:
"vcf/{sample}.vcf"
message:
"Calling Indel with Varscan2"
threads: # Varscan does not take any threading information
1 # However, mpileup might have to be unzipped.
# Keep threading value to one for unzipped mpileup input
# Set it to two for zipped mpileup files
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
log:
"logs/varscan_{sample}.log"
wrapper:
"v2.2.1/bio/varscan/mpileup2indel"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.
If so, it’s recommended to use two threads:
- 1 for varscan itself
- 1 for zcat
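The thread recommendation above follows from how the input is streamed. A minimal sketch, assuming the same extension check as the wrapper code; `pileup_reader` and `recommended_threads` are hypothetical helper names, not part of the wrapper.

```python
def pileup_reader(path):
    """Return the shell command used to stream a (possibly gzipped) mpileup file."""
    # Gzipped input is decompressed on the fly by a separate zcat process
    tool = "zcat" if path.endswith("gz") else "cat"
    return f"{tool} {path}"


def recommended_threads(path):
    """One thread for VarScan itself, plus one for zcat when the input is gzipped."""
    return 2 if path.endswith("gz") else 1


print(pileup_reader("mpileup/a.mpileup.gz"), recommended_threads("mpileup/a.mpileup.gz"))
print(pileup_reader("mpileup/a.mpileup"), recommended_threads("mpileup/a.mpileup"))
```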
Software dependencies¶
varscan=2.4.4
snakemake-wrapper-utils=0.6.1
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for Varscan2 mpileup2indel"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts
# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
" cat {} ".format(snakemake.input[0])
if not snakemake.input[0].endswith("gz")
else " zcat {} ".format(snakemake.input[0])
)
# Building output directories
makedirs(op.dirname(snakemake.output[0]))
shell(
"varscan mpileup2indel " # Tool and its subcommand
"<( {pileup} ) "
"{java_opts} {extra} " # Extra parameters
"> {snakemake.output[0]} " # Path to vcf file
"{log}" # Logging behaviour
)
VARSCAN MPILEUP2SNP¶
Detect variants in NGS data from Samtools mpileup with VarScan
Example¶
This wrapper can be used in the following way:
rule mpileup_to_vcf:
input:
"mpileup/{sample}.mpileup.gz"
output:
"vcf/{sample}.vcf"
message:
"Calling SNP with Varscan2"
threads: # Varscan does not take any threading information
1 # However, mpileup might have to be unzipped.
# Keep threading value to one for unzipped mpileup input
# Set it to two for zipped mpileup files
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
log:
"logs/varscan_{sample}.log"
wrapper:
"v2.2.1/bio/varscan/mpileup2snp"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Varscan does not take any threading information by itself. However, mpileup files given as input might be gzipped.
If so, it’s recommended to use two threads:
- 1 for varscan itself
- 1 for zcat
Software dependencies¶
varscan=2.4.4
snakemake-wrapper-utils=0.6.1
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for Varscan2 mpileup2snp"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts
# Gathering extra parameters and logging behaviour
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
# In case input files are gzipped mpileup files,
# they are being unzipped and piped
# In that case, it is recommended to use at least 2 threads:
# - One for unzipping with zcat
# - One for running varscan
pileup = (
" cat {} ".format(snakemake.input[0])
if not snakemake.input[0].endswith("gz")
else " zcat {} ".format(snakemake.input[0])
)
# Building output directories
makedirs(op.dirname(snakemake.output[0]))
shell(
"varscan mpileup2snp " # Tool and its subcommand
"<( {pileup} ) "
"{java_opts} {extra} " # Extra parameters
"> {snakemake.output[0]} " # Path to vcf file
"{log}" # Logging behaviour
)
VARSCAN SOMATIC¶
Varscan Somatic calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.
Example¶
This wrapper can be used in the following way:
rule varscan_somatic:
input:
# A pair of pileup files can be used *instead* of the mpileup
# normal_pileup = ""
# tumor_pileup = ""
mpileup = "mpileup/{sample}.mpileup.gz"
output:
snp = "vcf/{sample}.snp.vcf",
indel = "vcf/{sample}.indel.vcf"
message:
"Calling somatic variants {wildcards.sample}"
threads:
1
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024
params:
extra = ""
wrapper:
"v2.2.1/bio/varscan/somatic"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
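The rule accepts either a single mpileup or a pair of pileup files. The sketch below illustrates how that choice can be resolved; a plain dict stands in for `snakemake.input`, and the function name `select_pileup_inputs` is hypothetical.

```python
def select_pileup_inputs(inputs):
    """Return (input paths, extra flag) for `varscan somatic`.

    `inputs` is a plain dict standing in for snakemake.input.
    """
    if "mpileup" in inputs:
        # One mpileup holding both normal and tumor: VarScan needs --mpileup 1
        return inputs["mpileup"], "--mpileup 1"
    if all(key in inputs for key in ("normal_pileup", "tumor_pileup")):
        # Two separate pileup files: normal first, then tumor
        return f"{inputs['normal_pileup']} {inputs['tumor_pileup']}", ""
    raise KeyError("Could not find either a mpileup, or a pair of pileup files")


print(select_pileup_inputs({"mpileup": "mpileup/a.mpileup.gz"}))
print(select_pileup_inputs({"normal_pileup": "n.pileup", "tumor_pileup": "t.pileup"}))
```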
Software dependencies¶
varscan=2.4.4
snakemake-wrapper-utils=0.6.1
Authors¶
- Thibault Dayris
Code¶
"""Snakemake wrapper for varscan somatic"""
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2019, Dayris Thibault"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os.path as op
from snakemake.shell import shell
from snakemake.utils import makedirs
from snakemake_wrapper_utils.java import get_java_opts
# Defining logging and gathering extra parameters
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
# Building output dirs
makedirs(op.dirname(snakemake.output.snp))
makedirs(op.dirname(snakemake.output.indel))
# Output prefix
prefix = op.splitext(snakemake.output.snp)[0]
# Searching for input files
pileup_pair = ["normal_pileup", "tumor_pileup"]
in_pileup = ""
mpileup = ""
if "mpileup" in snakemake.input.keys():
# Case there is a mpileup with both normal and tumor
in_pileup = snakemake.input.mpileup
mpileup = "--mpileup 1"
elif all(pileup in snakemake.input.keys() for pileup in pileup_pair):
# Case there are two separate pileup files
in_pileup = f" {snakemake.input.normal_pileup} {snakemake.input.tumor_pileup} "
else:
raise KeyError("Could not find either a mpileup, or a pair of pileup files")
shell(
"varscan somatic" # Tool and its subcommand
" {in_pileup}" # Path to input file(s)
" {prefix}" # Path to output
" {java_opts} {extra}" # Extra parameters
" {mpileup}"
" --output-snp {snakemake.output.snp}" # Path to snp output file
" --output-indel {snakemake.output.indel}" # Path to indel output file
" {log}" # Logging behaviour
)
VCFTOOLS¶
For vcftools, the following wrappers are available:
VCFTOOLS FILTER¶
Filter vcf files using vcftools
Example¶
This wrapper can be used in the following way:
rule filter_vcf:
input:
"{sample}.vcf"
output:
"{sample}.filtered.vcf"
params:
extra="--chr 1 --recode-INFO-all"
wrapper:
"v2.2.1/bio/vcftools/filter"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
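Input and output may be plain or gzipped VCF; the wrapper decides by file extension. A minimal sketch of that decision follows, with `vcftools_io` as a hypothetical helper name.

```python
def vcftools_io(in_path, out_path):
    """Return (input flag, output redirection) based on the file extensions."""
    input_flag = "--gzvcf" if in_path.endswith(".gz") else "--vcf"
    redirect = f" > {out_path}"
    if out_path.endswith(".gz"):
        # vcftools writes plain VCF to stdout; compress it on the way out
        redirect = " | gzip" + redirect
    return input_flag, redirect


print(vcftools_io("a.vcf", "a.filtered.vcf"))
print(vcftools_io("a.vcf.gz", "a.filtered.vcf.gz"))
```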
Software dependencies¶
vcftools=0.1.16
Authors¶
- Patrik Smeds
Code¶
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2018, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from snakemake.shell import shell
input_flag = "--vcf"
if snakemake.input[0].endswith(".gz"):
input_flag = "--gzvcf"
output = " > " + snakemake.output[0]
if output.endswith(".gz"):
output = " | gzip" + output
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"vcftools "
"{input_flag} "
"{snakemake.input} "
"{extra} "
"--recode "
"--stdout "
"{output} "
"{log}"
)
VEMBRANE¶
For vembrane, the following wrappers are available:
VEMBRANE FILTER¶
Vembrane filter allows simultaneous filtering of variants based on any INFO field, CHROM, POS, REF, ALT, QUAL, and the annotation field ANN. When filtering based on ANN, annotation entries are filtered first. If no annotation entry remains, the entire variant is deleted.
URL: https://github.com/vembrane/vembrane
Example¶
This wrapper can be used in the following way:
rule vembrane_filter:
input:
vcf="in.vcf",
output:
vcf="filtered/out.vcf"
params:
expression="POS > 4000",
extra=""
log:
"logs/vembrane.log"
wrapper:
"v2.2.1/bio/vembrane/filter"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
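The filter expression usually contains characters that the shell would interpret, such as `>`. The wrapper code quotes it with Snakemake's `:q` format specifier. The sketch below uses the standard library's `shlex.quote` as a stand-in, on the assumption that it behaves comparably for this purpose.

```python
import shlex

expression = "POS > 4000"

# Without quoting, the shell would treat ">" as an output redirection.
command = f"vembrane filter {shlex.quote(expression)} in.vcf > filtered/out.vcf"
print(command)
```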
Software dependencies¶
vembrane=1.0.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for vembrane"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"vembrane filter" # Tool and its subcommand
" {extra}" # Extra parameters
" {snakemake.params.expression:q}"
" {snakemake.input}" # Path to input file
" > {snakemake.output}" # Path to output file
" {log}" # Logging behaviour
)
VEMBRANE TABLE¶
Vembrane table generates table-like text files from VCFs based on any INFO field, CHROM, POS, REF, ALT, QUAL, and the annotation field ANN. When filtering based on ANN, annotation entries are filtered first.
URL: https://github.com/vembrane/vembrane
Example¶
This wrapper can be used in the following way:
rule vembrane_table:
input:
vcf="in.vcf",
output:
vcf="table/out.tsv"
params:
expression="CHROM, POS, ALT, REF",
extra=""
log:
"logs/vembrane.log"
wrapper:
"v2.2.1/bio/vembrane/table"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vembrane=1.0.2
Authors¶
- Christopher Schröder
Code¶
"""Snakemake wrapper for vembrane"""
__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2020, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
shell(
"vembrane table" # Tool and its subcommand
" {extra}" # Extra parameters
" {snakemake.params.expression:q}"
" {snakemake.input}" # Path to input file
" > {snakemake.output}" # Path to output file
" {log}" # Logging behaviour
)
VEP¶
For vep, the following wrappers are available:
VEP ANNOTATE¶
Annotate variant calls with VEP.
Example¶
This wrapper can be used in the following way:
rule annotate_variants:
input:
calls="variants.bcf", # .vcf, .vcf.gz or .bcf
cache="resources/vep/cache", # can be omitted if fasta and gff are specified
plugins="resources/vep/plugins",
# optionally add reference genome fasta
# fasta="genome.fasta",
# fai="genome.fasta.fai", # fasta index
# gff="annotation.gff",
# csi="annotation.gff.csi", # tabix index
# add mandatory aux-files required by some plugins if not present in the VEP plugin directory specified above.
# aux files must be defined as follows: "<plugin> = /path/to/file" where plugin must be in lowercase
# revel = path/to/revel_scores.tsv.gz
output:
calls="variants.annotated.bcf", # .vcf, .vcf.gz or .bcf
stats="variants.html",
params:
# Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
# Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
plugins=["LoFtool"],
extra="--everything", # optional: extra arguments
log:
"logs/vep/annotate.log",
threads: 4
wrapper:
"v2.2.1/bio/vep/annotate"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
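The wrapper infers species, release, and assembly from the directory names inside the cache, and picks the output format from the file extension. A minimal sketch of both steps follows; the function names and the example directory names (`homo_sapiens`, `109_GRCh38`) are illustrative assumptions.

```python
def parse_cache_entry(species_dir, entry_dir):
    """Split VEP cache directory names into species, release and build."""
    # A RefSeq cache uses a "<species>_refseq" directory; strip that suffix
    species = species_dir[:-7] if species_dir.endswith("_refseq") else species_dir
    release, build = entry_dir.split("_")
    return species, release, build


def bcftools_output_flag(path):
    """Pick the letter passed to `bcftools view -O` from the output extension."""
    if path.endswith(".vcf.gz"):
        return "z"
    if path.endswith(".bcf"):
        return "b"
    return "v"


print(parse_cache_entry("homo_sapiens_refseq", "109_GRCh38"))
print(bcftools_output_flag("variants.annotated.bcf"))
```

Because the release and build are split on a single underscore, a cache directory with a different naming scheme would not parse; this is one reason the cache should be created with the VEP cache wrapper.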
Software dependencies¶
ensembl-vep=109.3
bcftools=1.17
perl-encode-locale=1.05
perl=5.32.1
Authors¶
- Johannes Köster
- Felix Mölder
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import os
from pathlib import Path
from snakemake.shell import shell
def get_only_child_dir(path):
children = [child for child in path.iterdir() if child.is_dir()]
assert (
len(children) == 1
), "Invalid VEP cache directory, only a single entry is allowed, make sure that cache was created with the snakemake VEP cache wrapper"
return children[0]
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
fork = "--fork {}".format(snakemake.threads) if snakemake.threads > 1 else ""
stats = snakemake.output.stats
cache = snakemake.input.get("cache", "")
plugins = snakemake.input.plugins
plugin_aux_files = {"LoFtool": "LoFtool_scores.txt", "ExACpLI": "ExACpLI_values.txt"}
load_plugins = []
for plugin in snakemake.params.plugins:
if plugin in plugin_aux_files.keys():
aux_path = os.path.join(plugins, plugin_aux_files[plugin])
load_plugins.append(",".join([plugin, aux_path]))
else:
load_plugins.append(",".join([plugin, snakemake.input.get(plugin.lower(), "")]))
load_plugins = " ".join(map("--plugin {}".format, load_plugins))
if snakemake.output.calls.endswith(".vcf.gz"):
fmt = "z"
elif snakemake.output.calls.endswith(".bcf"):
fmt = "b"
else:
fmt = "v"
fasta = snakemake.input.get("fasta", "")
if fasta:
fasta = "--fasta {}".format(fasta)
gff = snakemake.input.get("gff", "")
if gff:
gff = "--gff {}".format(gff)
if cache:
entrypath = get_only_child_dir(get_only_child_dir(Path(cache)))
species = (
entrypath.parent.name[:-7]
if entrypath.parent.name.endswith("_refseq")
else entrypath.parent.name
)
release, build = entrypath.name.split("_")
cache = (
"--offline --cache --dir_cache {cache} --cache_version {release} --species {species} --assembly {build}"
).format(cache=cache, release=release, build=build, species=species)
shell(
"(bcftools view '{snakemake.input.calls}' | "
"vep {extra} {fork} "
"--format vcf "
"--vcf "
"{cache} "
"{gff} "
"{fasta} "
"--dir_plugins {plugins} "
"{load_plugins} "
"--output_file STDOUT "
"--stats_file {stats} | "
"bcftools view -O{fmt} > {snakemake.output.calls}) {log}"
)
VEP DOWNLOAD CACHE¶
Download VEP cache for given species, build and release.
URL: http://www.ensembl.org/info/docs/tools/vep/index.html
Example¶
This wrapper can be used in the following way:
rule get_vep_cache:
output:
directory("resources/vep/cache"),
params:
species="saccharomyces_cerevisiae",
build="R64-1-1",
release="98",
log:
"logs/vep/cache.log",
cache: "omit-software" # save space and time with between-workflow caching (see docs)
wrapper:
"v2.2.1/bio/vep/cache"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
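The three params determine which tarball is fetched. The sketch below reconstructs the download URL the same way the wrapper code does; `cache_tarball_url` is a hypothetical helper name.

```python
def cache_tarball_url(species, build, release):
    """Build the Ensembl FTP URL of a VEP cache tarball."""
    release = int(release)  # the wrapper requires an integer release
    # The directory name is lowercase from release 97 onwards
    vep_dir = "vep" if release >= 97 else "VEP"
    tarball = f"{species}_vep_{release}_{build}.tar.gz"
    return f"ftp://ftp.ensembl.org/pub/release-{release}/variation/{vep_dir}/{tarball}"


print(cache_tarball_url("saccharomyces_cerevisiae", "R64-1-1", "98"))
```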
Software dependencies¶
ensembl-vep=109.3
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2023, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
try:
release = int(snakemake.params.release)
except ValueError:
raise ValueError("The parameter release is supposed to be an integer.")
with tempfile.TemporaryDirectory() as tmpdir:
# We download the cache tarball manually because vep_install does not consider proxy settings (in contrast to curl).
# See https://github.com/bcbio/bcbio-nextgen/issues/1080
vep_dir = "vep" if release >= 97 else "VEP"
cache_tarball = (
f"{snakemake.params.species}_vep_{release}_{snakemake.params.build}.tar.gz"
)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"curl -L ftp://ftp.ensembl.org/pub/release-{snakemake.params.release}/"
"variation/{vep_dir}/{cache_tarball} "
"-o {tmpdir}/{cache_tarball} {log}"
)
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
shell(
"vep_install --AUTO c "
"--SPECIES {snakemake.params.species} "
"--ASSEMBLY {snakemake.params.build} "
"--CACHE_VERSION {release} "
"--CACHEURL {tmpdir} "
"--CACHEDIR {snakemake.output} "
"--CONVERT "
"--NO_UPDATE "
"{extra} {log}"
)
VEP DOWNLOAD PLUGINS¶
Download VEP plugins.
Example¶
This wrapper can be used in the following way:
rule download_vep_plugins:
output:
directory("resources/vep/plugins")
params:
release=100
wrapper:
"v2.2.1/bio/vep/plugins"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
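The wrapper downloads a release archive and unpacks it without its top-level directory. A self-contained sketch of that unpacking step follows, run on a small in-memory archive; the archive contents and the root directory name are made up for illustration.

```python
import io
import tempfile
from pathlib import Path
from zipfile import ZipFile


def extract_without_root(archive, outdir):
    """Extract a zip archive into outdir, dropping the top-level directory."""
    outdir = Path(outdir)
    with ZipFile(archive) as zf:
        for member in zf.infolist():
            memberpath = Path(member.filename)
            if len(memberpath.parts) == 1:
                continue  # skip entries that sit directly at the root
            target = outdir / memberpath.relative_to(memberpath.parts[0])
            if member.is_dir():
                target.mkdir(parents=True, exist_ok=True)
            else:
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(zf.read(member.filename))


# Build a tiny archive that mimics the layout of a release zip (made-up content)
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    zf.writestr("VEP_plugins-release-100/LoFtool.pm", "# plugin code\n")
    zf.writestr("VEP_plugins-release-100/config/example.txt", "data\n")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    extract_without_root(buffer, tmp)
    extracted = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
    print(extracted)
```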
Software dependencies¶
python=3.11.4
Authors¶
- Johannes Köster
Code¶
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import sys
from pathlib import Path
from urllib.request import urlretrieve
from zipfile import ZipFile
from tempfile import NamedTemporaryFile
if snakemake.log:
sys.stderr = open(snakemake.log[0], "w")
outdir = Path(snakemake.output[0])
outdir.mkdir()
with NamedTemporaryFile() as tmp:
urlretrieve(
"https://github.com/Ensembl/VEP_plugins/archive/release/{release}.zip".format(
release=snakemake.params.release
),
tmp.name,
)
with ZipFile(tmp.name) as f:
for member in f.infolist():
memberpath = Path(member.filename)
if len(memberpath.parts) == 1:
# skip root dir
continue
targetpath = outdir / memberpath.relative_to(memberpath.parts[0])
if member.is_dir():
targetpath.mkdir()
else:
with open(targetpath, "wb") as out:
out.write(f.read(member.filename))
VERIFYBAMID¶
For verifybamid, the following wrappers are available:
VERIFYBAMID2¶
Run verifybamid2.
Example¶
This wrapper can be used in the following way:
rule verify_bam_id:
input:
bam="a.bam",
ref="genome.fasta",
# optional - this can be used to specify custom resource files if
# necessary (if using GRCh37 or GRCh38 instead simply specify
# params.genome_build="38", for example)
# N.B. if svd_mu={prefix}.mu, then {prefix}.bed, {prefix}.UD, and
# {prefix}.V must also exist
svd_mu="ref.vcf.mu",
output:
selfsm="a.selfSM",
ancestry="a.ancestry",
params:
# optional - see note for input.svd_mu
# current choices are {37,38}
# genome_build="38",
log:
"logs/verifybamid2/a.log",
wrapper:
"v2.2.1/bio/verifybamid/verifybamid2"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- The extra param allows for additional program arguments.
- For more information, see https://github.com/Griffan/VerifyBamID
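When `svd_mu` is given, the companion resource files must sit next to it and share its prefix. A minimal sketch of how those paths are derived follows; `required_svd_files` is a hypothetical helper name.

```python
import os


def required_svd_files(svd_mu):
    """List the files that must exist alongside the given .mu file."""
    # "ref.vcf.mu" -> prefix "ref.vcf"
    prefix = os.path.splitext(svd_mu)[0]
    return [f"{prefix}.{suffix}" for suffix in ("bed", "UD", "V")]


print(required_svd_files("ref.vcf.mu"))
```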
Software dependencies¶
verifybamid2=2.0.1
Authors¶
- Brett Copeland
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
from tempfile import TemporaryDirectory
from shutil import copyfile
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
svd_mu = snakemake.input.get("svd_mu", "")
if svd_mu:
svd_prefix = os.path.splitext(svd_mu)[0]
for suffix in ("bed", "UD", "V"):
fn = f"{svd_prefix}.{suffix}"
if not os.path.isfile(fn):
raise Exception(f"Failed to find required input {fn}.")
else:
genome_build = snakemake.params.get("genome_build", "38")
if genome_build not in ("37", "38"):
raise Exception(
f"No svd_prefix given and improper {genome_build=} "
f"given. Valid choices are 37,38."
)
verifybamid2_found = False
for path in os.getenv("PATH").split(os.path.pathsep):
path_to_verifybamid2 = os.path.join(path, "verifybamid2")
if os.path.isfile(path_to_verifybamid2):
verifybamid2_found = True
resources_directory = os.path.join(
os.path.dirname(os.path.realpath(path_to_verifybamid2)), "resource"
)
svd_prefix = os.path.join(
resources_directory, f"1000g.phase3.100k.b{genome_build}.vcf.gz.dat"
)
break
if not verifybamid2_found:
raise Exception("Failed to find verifybamid2 location.")
def move_file(src, dst):
"Move `src` to `dst` by copying and deleting, which respects ACLs in the target directory."
copyfile(src, dst)
os.remove(src)
# verifybamid2 outputs results to result.selfSM and result.Ancestry in the working directory,
# so to avoid collisions we have to run it from a temporary directory and fix the paths
# to inputs, outputs, and the log file
ref_path = os.path.abspath(snakemake.input.ref)
svd_prefix = os.path.abspath(svd_prefix)
bam_path = os.path.abspath(snakemake.input.bam)
selfsm_path = os.path.abspath(snakemake.output.selfsm)
ancestry_path = os.path.abspath(snakemake.output.ancestry)
if snakemake.log:
snakemake.log[0] = os.path.abspath(snakemake.log[0])
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
with TemporaryDirectory() as tmp_dir:
os.chdir(tmp_dir)
shell(
"verifybamid2 --SVDPrefix {svd_prefix} "
"--Reference {ref_path} --BamFile {bam_path} {extra} "
"--NumThread {snakemake.threads} {log}"
)
move_file("result.selfSM", selfsm_path)
move_file("result.Ancestry", ancestry_path)
VG¶
For vg, the following wrappers are available:
VG CONSTRUCT¶
Construct variation graphs from a reference and variant calls.
Example¶
This wrapper can be used in the following way:
rule construct:
input:
ref="c.fa",
vcfgz="c.vcf.gz"
output:
vg="graph/c.vg"
params:
"--node-max 10"
log:
"logs/vg/construct/c.log"
threads:
4
wrapper:
"v2.2.1/bio/vg/construct"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg=1.48.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False)
shell(
"(vg construct {snakemake.params} --reference {snakemake.input.ref}"
" --vcf {snakemake.input.vcfgz} --threads {snakemake.threads}"
" > {snakemake.output.vg}) {log}"
)
VG IDS¶
Manipulate the id space of input graphs. NOTE: use bio/vg/merge to make a joint id space for graphs.
Example¶
This wrapper can be used in the following way:
rule ids:
input:
vgs="c.vg"
output:
mod="graph/c_mod.vg"
log:
"logs/vg/ids/c.log"
wrapper:
"v2.2.1/bio/vg/ids"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg=1.49.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False)
shell(
"(vg ids {snakemake.params} {snakemake.input.vgs}"
" > {snakemake.output.mod}) {log}"
)
VG INDEX GCSA¶
Build GCSA index for variation graphs.
Example¶
This wrapper can be used in the following way:
rule gcsa:
input:
vgs=["x.vg", "c.vg"]
output:
gcsa="index/wg.gcsa"
params:
"-Z 3000 -X 3"
log:
"logs/vg/index/gcsa/wg.log"
threads:
4
wrapper:
"v2.2.1/bio/vg/index/gcsa"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg==1.27.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"(vg index -g {snakemake.output.gcsa} --threads {snakemake.threads}"
" {snakemake.params} {snakemake.input.vgs}) {log}"
)
VG INDEX XG¶
Create an xg index on variation graphs.
Example¶
This wrapper can be used in the following way:
rule xg:
input:
vgs="x.vg"
output:
xg="index/x.xg"
log:
"logs/vg/index/xg/x.log"
threads:
4
wrapper:
"v2.2.1/bio/vg/index/xg"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg==1.27.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"(vg index --xg-name {snakemake.output.xg} --threads {snakemake.threads}"
" {snakemake.params} {snakemake.input.vgs}) {log}"
)
VG KMERS¶
Generates kmers from both strands of variation graphs.
Example¶
This wrapper can be used in the following way:
rule kmers:
input:
vgs="c.vg"
output:
kmers="kmers/c.kmers"
params:
"-gBk 16 -H 1000000000 -T 1000000001"
log:
"logs/vg/kmers/c.log"
threads:
4
wrapper:
"v2.2.1/bio/vg/kmers"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg=1.49.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False)
shell(
"(vg kmers {snakemake.params} --threads {snakemake.threads}"
" {snakemake.input.vgs} > {snakemake.output.kmers}) {log}"
)
VG MERGE¶
Generate a joint id space across each graph and merge them all.
Example¶
This wrapper can be used in the following way:
rule merge:
input:
vgs=["c.vg", "x.vg"]
output:
merged="graph/wg.vg"
log:
"logs/vg/merge/wg.log"
wrapper:
"v2.2.1/bio/vg/merge"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg=1.49.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False)
shell(
"(vg ids --join {snakemake.input.vgs} &&"
" for VGFILE in {snakemake.input.vgs};"
" do cat $VGFILE >> {snakemake.output.merged};"
" done) {log}"
)
VG PRUNE¶
Prunes the complex regions of the graph for GCSA2 indexing.
Example¶
This wrapper can be used in the following way:
rule prune:
input:
vg="c.vg"
output:
pruned="graph/c.pruned.vg"
params:
"-r"
log:
"logs/vg/prune/c.log"
threads:
4
wrapper:
"v2.2.1/bio/vg/prune"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg=1.49.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2017, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False)
shell(
"(vg prune --threads {snakemake.threads} {snakemake.params}"
" {snakemake.input.vg} > {snakemake.output.pruned}) {log}"
)
VG SIM¶
Samples sequences from the xg-indexed graph.
Example¶
This wrapper can be used in the following way:
rule sim:
input:
xg="x.xg"
output:
reads="reads/x.seq"
params:
"--read-length 100 --num-reads 100 -f"
log:
"logs/vg/sim/x.log"
threads:
4
wrapper:
"v2.2.1/bio/vg/sim"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
vg=1.48.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2018, Ali Ghaffaari"
__email__ = "ghaffari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False)
shell(
"(vg sim {snakemake.params} --xg-name {snakemake.input.xg}"
" --threads {snakemake.threads} > {snakemake.output.reads}) {log}"
)
VSEARCH¶
Versatile open-source tool for microbiome analysis.
URL: https://github.com/torognes/vsearch
Example¶
This wrapper can be used in the following way:
rule vsearch_cluster_fast:
input:
cluster_fast="reads/{sample}.fasta",
output:
profile="out/cluster_fast/{sample}.profile",
log:
"logs/vsearch/cluster_fast/{sample}.log",
params:
extra="--id 0.2 --sizeout --minseqlength 5",
threads: 1
wrapper:
"v2.2.1/bio/vsearch"
rule vsearch_maskfasta:
input:
maskfasta="reads/{sample}.fasta",
output:
output="out/maskfasta/{sample}.fasta",
log:
"logs/vsearch/maskfasta/{sample}.log",
params:
extra="--hardmask",
threads: 1
wrapper:
"v2.2.1/bio/vsearch"
rule vsearch_fastx_uniques:
input:
fastx_uniques="reads/{sample}.fastq",
output:
fastqout="out/fastx_uniques/{sample}.fastq",
log:
"logs/vsearch/fastx_uniques/{sample}.log",
params:
extra="--strand both --minseqlength 5",
threads: 2
wrapper:
"v2.2.1/bio/vsearch"
rule vsearch_fastx_uniques_gzip:
input:
fastx_uniques="reads/{sample}.fastq",
output:
fastqout="out/fastx_uniques/{sample}.fastq.gz",
log:
"logs/vsearch/fastx_uniques/{sample}.log",
params:
extra="--strand both --minseqlength 5",
threads: 2
wrapper:
"v2.2.1/bio/vsearch"
rule vsearch_fastx_uniques_bzip2:
input:
fastx_uniques="reads/{sample}.fastq",
output:
fastqout="out/fastx_uniques/{sample}.fastq.bz2",
log:
"logs/vsearch/fastx_uniques/{sample}.log",
params:
extra="--strand both --minseqlength 5",
threads: 2
wrapper:
"v2.2.1/bio/vsearch"
rule vsearch_fastq_convert:
input:
fastq_convert="reads/{sample}.fastq",
output:
fastqout="out/fastq_convert/{sample}.fastq",
log:
"logs/vsearch/fastq_convert/{sample}.log",
params:
extra="--fastq_ascii 33 --fastq_asciiout 64",
threads: 2
wrapper:
"v2.2.1/bio/vsearch"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- Keys for input and output files need to match vsearch arguments: for input, e.g. uchime_denovo, cluster_fast, fastx_uniques, maskfasta, fastq_convert, or fastq_mergepairs; for output, e.g. chimeras, fastaout, fastqout, or output.
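As a minimal sketch of this key-to-argument mapping (plain Python, runnable outside Snakemake; the helper name `to_vsearch_flags` is hypothetical and not part of the wrapper), each named file becomes a `--<key> <path>` pair:

```python
def to_vsearch_flags(named_files):
    """Turn {"cluster_fast": "reads/a.fasta"} into "--cluster_fast reads/a.fasta"."""
    return " ".join(f"--{key} {value}" for key, value in named_files.items())


# Keys mirror the first example rule above; the sample name "a" is made up.
inputs = {"cluster_fast": "reads/a.fasta"}
outputs = {"profile": "out/cluster_fast/a.profile"}

command = f"vsearch {to_vsearch_flags(inputs)} {to_vsearch_flags(outputs)}"
print(command)
```

This is why a key that is not a valid vsearch option would end up as an unknown flag on the command line.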
Software dependencies¶
vsearch=2.22.1
gzip
bzip2
Params¶
extra
: additional program arguments
Authors¶
- Filipe G. Vieira
Code¶
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
# Only pass --log to vsearch when a log file is defined
log = f"--log {snakemake.log}" if snakemake.log else ""
input = " ".join([f"--{key} {value}" for key, value in snakemake.input.items()])
out_list = list()
for key, value in snakemake.output.items():
    if value.endswith(".gz"):
        out_list.append(f"--{key} /dev/stdout | gzip > {value}")
    elif value.endswith(".bz2"):
        out_list.append(f"--{key} /dev/stdout | bzip2 > {value}")
    else:
        out_list.append(f"--{key} {value}")
# Check which output files are to be compressed
out_gz = [out.endswith(".gz") for out in out_list]
out_bz2 = [out.endswith(".bz2") for out in out_list]
assert sum(out_gz + out_bz2) <= 1, "only one output can be compressed"
# Move compressed file (if any) to last, whether it is gzip or bzip2
is_compressed = [gz or bz2 for gz, bz2 in zip(out_gz, out_bz2)]
output = [out for _, out in sorted(zip(is_compressed, out_list))]
shell("vsearch --threads {snakemake.threads} {input} {extra} {log} {output}")
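The handling of compressed outputs can be illustrated in isolation. The following is a sketch under assumptions, not the wrapper itself: `build_output_args` is a hypothetical helper, and the file names are invented. It shows why at most one output may be compressed and why that output has to come last: everything after the `|` belongs to the compressor, so any vsearch argument placed after it would be lost.

```python
def build_output_args(outputs):
    """Return vsearch output arguments, with a compressed output (if any) last."""
    plain, piped = [], []
    for key, path in outputs.items():
        if path.endswith(".gz"):
            piped.append(f"--{key} /dev/stdout | gzip > {path}")
        elif path.endswith(".bz2"):
            piped.append(f"--{key} /dev/stdout | bzip2 > {path}")
        else:
            plain.append(f"--{key} {path}")
    # Only one stream can be written to stdout and piped into a compressor.
    if len(piped) > 1:
        raise ValueError("only one output can be compressed")
    return plain + piped


args = build_output_args(
    {"fastqout": "out/a.fastq.bz2", "fastaout": "out/a.fasta"}
)
print(" ".join(args))
```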
WGSIM¶
Short read simulator.
Example¶
This wrapper can be used in the following way:
rule wgsim:
input:
ref="genome.fa"
output:
read1="reads/1.fq",
read2="reads/2.fq"
log:
"logs/wgsim/sim.log"
params:
"-X 0 -R 0 -r 0.1 -h"
wrapper:
"v2.2.1/bio/wgsim"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
wgsim=1.0
Authors¶
- Ali Ghaffaari
Code¶
__author__ = "Ali Ghaffaari"
__copyright__ = "Copyright 2018, Ali Ghaffaari"
__email__ = "ali.ghaffaari@mpi-inf.mpg.de"
__license__ = "MIT"
from snakemake.shell import shell
log = snakemake.log_fmt_shell()
shell(
"(wgsim {snakemake.params} {snakemake.input.ref}"
" {snakemake.output.read1} {snakemake.output.read2}) {log}"
)
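The `{log}` placeholder in the wrapper code expands to a shell redirection. The block below is a simplified stand-in for `snakemake.log_fmt_shell`, written only for illustration (the function name `log_redirect` is hypothetical and this is not Snakemake's own code). It shows the kind of string that ends up appended to the command, and why a wrapper whose tool writes its result to standard output would pass `stdout=False`.

```python
def log_redirect(log_path, stdout=True, stderr=True, append=False):
    """Build a shell redirection string for the selected streams."""
    if not log_path or not (stdout or stderr):
        # Assumption of this sketch: nothing to redirect yields an empty string.
        return ""
    op = ">>" if append else ">"
    if stdout and stderr:
        return f"{op} {log_path} 2>&1"
    if stdout:
        return f"{op} {log_path}"
    return f"2{op} {log_path}"


# Both streams go to the log file:
print(log_redirect("logs/wgsim/sim.log"))
# Only stderr is logged, so stdout stays free for the tool's actual output:
print(log_redirect("logs/wgsim/sim.log", stdout=False))
```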
XSV¶
Perform various operations over CSV/TSV tables.
URL: https://github.com/BurntSushi/xsv
Example¶
This wrapper can be used in the following way:
### Concatenation subcommand ###
rule test_xsv_cat_rows:
input:
table=["table.csv", "right.csv"],
output:
"xsv_catrows.csv",
threads: 1
log:
"xsv/catrow.log",
params:
subcommand="cat rows",
extra="",
wrapper:
"v2.2.1/bio/xsv"
rule test_xsv_cat_cols:
input:
table=["table.csv", "right.csv"],
output:
"xsv_catcols.csv",
threads: 1
log:
"xsv/catcol.log",
params:
subcommand="cat columns",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Count subcommand ###
rule test_xsv_count:
input:
table="table.csv",
output:
"xsv_count.csv",
threads: 1
log:
"xsv/count.log",
params:
subcommand="count",
extra="",
wrapper:
"v2.2.1/bio/xsv"
rule test_xsv_count_tsv_input:
input:
table="table.tsv",
output:
"xsv_count.tsv_as_input.csv",
threads: 1
log:
"xsv/count.log",
params:
subcommand="count",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Fix lengths subcommand ###
rule test_xsv_fixlength:
input:
table="table.csv",
output:
"xsv_fixlength.csv",
threads: 1
log:
"xsv/fixlength.log",
params:
subcommand="fixlengths",
extra="--length 20",
wrapper:
"v2.2.1/bio/xsv"
### Flatten subcommand ###
rule test_xsv_flatten:
input:
table="table.csv",
output:
"xsv_flatten.csv",
threads: 1
log:
"xsv/flatten.log",
params:
subcommand="flatten",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Format subcommand ###
rule test_xsv_fmt:
input:
table="table.csv",
output:
"xsv_fmt.tsv",
threads: 1
log:
"xsv/fmt.log",
params:
subcommand="fmt",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Frequency subcommand ###
rule test_xsv_frequency:
input:
table="table.csv",
output:
"xsv_frequency.csv",
threads: 1
log:
"xsv/frequency.log",
params:
subcommand="frequency",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Headers subcommand ###
rule test_xsv_headers:
input:
table="table.csv",
output:
"xsv_headers.csv",
threads: 1
log:
"xsv/headers.log",
params:
subcommand="headers",
extra="",
wrapper:
"v2.2.1/bio/xsv"
rule test_xsv_headers_list:
input:
table=["table.csv", "right.csv"],
output:
"xsv_headers_all.csv",
threads: 1
log:
"xsv/headers_all.log",
params:
subcommand="headers",
extra="--intersect",
wrapper:
"v2.2.1/bio/xsv"
### Index subcommand ###
rule test_xsv_index:
input:
table="table.csv",
output:
"table.csv.idx",
threads: 1
log:
"xsv/index.log",
params:
subcommand="index",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Input subcommand ###
rule test_xsv_input:
input:
table="table.csv",
output:
"xsv_input.csv",
threads: 1
log:
"xsv/input.log",
params:
subcommand="input",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Join subcommand ###
rule test_xsv_join:
input:
table=["table.csv", "right.csv"],
output:
"xsv_join.csv",
threads: 1
log:
"xsv/join.log",
params:
subcommand="join",
col1="gene_id",
col2="gene_id",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Sample subcommand ###
rule test_xsv_sample:
input:
table="table.csv",
output:
"xsv_sample.csv",
threads: 1
log:
"xsv/sample.log",
params:
subcommand="sample",
extra="1",
wrapper:
"v2.2.1/bio/xsv"
### Search subcommand ###
rule test_xsv_search:
input:
table="table.csv",
output:
"xsv_search.csv",
threads: 1
log:
"xsv/search.log",
params:
subcommand="search",
extra="--select gene_id ENSG[0-9]+",
wrapper:
"v2.2.1/bio/xsv"
### Select subcommand ###
rule test_xsv_select:
input:
table="table.csv",
output:
"xsv_select.csv",
threads: 1
log:
"xsv/select.log",
params:
subcommand="select",
extra="3-",
wrapper:
"v2.2.1/bio/xsv"
### Slice subcommand ###
rule test_xsv_slice:
input:
table="table.csv",
output:
"xsv_slice.csv",
threads: 1
log:
"xsv/slice.log",
params:
subcommand="slice",
extra="-i 2",
wrapper:
"v2.2.1/bio/xsv"
### Sort subcommand ###
rule test_xsv_sort:
input:
table="table.csv",
output:
"xsv_sort.csv",
threads: 1
log:
"xsv/sort.log",
params:
subcommand="sort",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Split subcommand ###
rule test_xsv_split:
input:
table="table.csv",
output:
directory("xsv_split"),
threads: 1
log:
"xsv/split.log",
params:
subcommand="split",
extra="-s 2",
wrapper:
"v2.2.1/bio/xsv"
rule test_xsv_split_list:
input:
table="table.csv",
output:
expand("xsv_split/{nb}.csv", nb=["0", "1"]),
threads: 1
log:
"xsv/split.log",
params:
subcommand="split",
extra="-s 1",
wrapper:
"v2.2.1/bio/xsv"
### Stat subcommand ###
rule test_xsv_stats:
input:
table="table.csv",
output:
"xsv_stats.txt",
threads: 1
log:
"xsv/stats.log",
params:
subcommand="stats",
extra="",
wrapper:
"v2.2.1/bio/xsv"
### Table subcommand ###
rule test_xsv_table:
input:
table="right.csv",
output:
"xsv_table.txt",
threads: 1
log:
"xsv/table.log",
params:
subcommand="table",
extra="",
wrapper:
"v2.2.1/bio/xsv"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
Adding the index file(s) of the input table(s) to the input file list makes many subcommands faster.
Software dependencies¶
xsv=0.13.0
Params¶
extra
: Optional arguments for xsv. For TSV files, --delimiter is automatically set to a tab character.
subcommand
: xsv subcommand among cat, count, fixlengths, flatten, fmt, frequency, headers, index, input, join, sample, search, select, slice, sort, split, stats, or table
Authors¶
- Thibault Dayris
Code¶
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
import os
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
subcommand = snakemake.params["subcommand"]
extra = snakemake.params.get("extra", "")
# TSV delimiter: a single table is a plain string, several tables are a list
tables = snakemake.input["table"]
if isinstance(tables, str):
    tables = [tables]
if all(str(table).endswith(".tsv") for table in tables):
    extra += " --delimiter $'\t' "
# Automatic multithreading when possible
if subcommand in ["frequency", "split", "stats"]:
    extra += f" --jobs {snakemake.threads} "
elif snakemake.threads > 1:
    raise Warning("Only one thread is required")
# Command line building
if subcommand == "join":
    shell(
        "xsv {subcommand} {extra} "
        "{snakemake.params.col1} {snakemake.input.table[0]} "
        "{snakemake.params.col2} {snakemake.input.table[1]} "
        "> {snakemake.output} {log}"
    )
elif subcommand == "index":
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
    shell("xsv {subcommand} {extra} {snakemake.input.table} {log}")
elif subcommand == "split":
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)
    outdir = snakemake.output
    # With several output files, split into their common parent directory
    if len(outdir) > 1:
        outdir = os.path.dirname(outdir[0])
    shell("xsv {subcommand} {extra} {outdir} {snakemake.input.table} {log}")
else:
    shell(
        "xsv {subcommand} {extra} {snakemake.input.table} "
        " > {snakemake.output} {log}"
    )
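The option handling described under Params can be sketched as a standalone function. This is an illustration under assumptions rather than the wrapper's code: `build_xsv_extra` is a hypothetical name, and the sketch treats a single input table as a plain string and several tables as a list.

```python
def build_xsv_extra(tables, subcommand, threads=1, extra=""):
    """Append the automatic --delimiter and --jobs options to `extra`."""
    # Normalise a single table path to a one-element list.
    if isinstance(tables, str):
        tables = [tables]
    # Only switch to a tab delimiter when every input table is a TSV file.
    if all(str(table).endswith(".tsv") for table in tables):
        # bash expands $'\t' to a tab character.
        extra += " --delimiter $'\\t' "
    # These subcommands accept --jobs; the others run single-threaded.
    if subcommand in ("frequency", "split", "stats"):
        extra += f" --jobs {threads} "
    return extra


print(build_xsv_extra("table.tsv", "count"))
print(build_xsv_extra(["table.csv", "right.csv"], "cat rows"))
print(build_xsv_extra("table.csv", "stats", threads=4))
```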
Meta-Wrappers¶
Meta-wrappers offer curated and tested combinations of Wrappers that fulfil common tasks with popular tools, in a best-practice way. To use them, simply copy-paste the offered snippets into your Snakemake workflow.
The menu on the left (expand by clicking (+) if necessary), lists all available meta-wrappers.
BWA_MAPPING¶
Map reads with bwa-mem and index with samtools index - this is just a test for subworkflows
Example¶
This meta-wrapper can be used by integrating the following into your workflow:
rule bwa_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
idx=multiext("genome", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
"mapped/{sample}.bam"
log:
"logs/bwa_mem/{sample}.log"
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort="samtools", # Can be 'none', 'samtools' or 'picard'.
sort_order="coordinate", # Can be 'queryname' or 'coordinate'.
sort_extra="" # Extra args for samtools/picard.
threads: 8
wrapper:
"v2.2.1/bio/bwa/mem"
rule samtools_index:
input:
"mapped/{sample}.bam"
output:
"mapped/{sample}.bam.bai"
log:
"logs/samtools_index/{sample}.log"
params:
"" # optional params string
wrapper:
"v2.2.1/bio/samtools/index"
Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
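The `extra` parameter of `bwa_mem` is written as a raw string so that the `\t` separators of the read group reach bwa as a literal backslash followed by `t` instead of being turned into tab characters by Python. The snippet below illustrates this with `str.format` as a stand-in for Snakemake's wildcard substitution; the sample name `A` is invented.

```python
# Raw string: the backslash and the "t" stay two separate characters.
extra_template = r"-R '@RG\tID:{sample}\tSM:{sample}'"

# str.format stands in for the wildcard substitution done by Snakemake.
extra = extra_template.format(sample="A")
print(extra)

# Without the r prefix, Python would insert real tab characters instead.
not_raw = "-R '@RG\tID:{sample}\tSM:{sample}'".format(sample="A")
print("\t" in extra, "\t" in not_raw)
```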
Used wrappers¶
The following individual wrappers are used in this meta-wrapper:
- BWA MEM
- SAMTOOLS INDEX
Please refer to each wrapper in the above list for additional configuration parameters and information about the executed code.
Authors¶
- Jan Forster
CALC_CONSENSUS_READS¶
Performs consensus read calculation on a marked bam and merges single, paired and skipped-consensus reads into a single sorted bam file.
Example¶
This meta-wrapper can be used by integrating the following into your workflow:
rule calc_consensus_reads:
input:
# sorted bam file
"mapped/{sample}.marked.bam",
output:
# non-overlapping consensus read pairs will be written into consensus_r1 and consensus_r2
consensus_r1=temp("results/consensus_reads/{sample}.1.fq"),
consensus_r2=temp("results/consensus_reads/{sample}.2.fq"),
# consensus reads from single end records or overlapping read pairs will be merged into a single end record
consensus_se=temp("results/consensus_reads/{sample}.se.fq"),
# skipped reads (soft-clipped or improperly mapped reads) will be skipped and unmarked
skipped=temp("results/consensus_reads/{sample}.skipped.bam"),
params:
extra="",
log:
"logs/consensus/{sample}.log",
wrapper:
"v2.2.1/bio/rbt/collapse_reads_to_fragments-bam"
rule map_consensus_reads:
input:
reads=lambda wc: expand(
"results/consensus_reads/{sample}.{read}.fq",
sample=wc.sample,
read="se" if wc.read_type == "se" else (1, 2),
),
idx=multiext("resources/genome.fa", ".amb", ".ann", ".bwt", ".pac", ".sa"),
output:
temp("results/consensus_mapped/{sample}.{read_type}.bam"),
params:
extra=r"-C -R '@RG\tID:{sample}\tSM:{sample}'",
index=lambda w, input: os.path.splitext(input.idx[0])[0],
sort="samtools",
sort_order="coordinate",
wildcard_constraints:
read_type="pe|se",
log:
"logs/bwa_mem/{sample}.{read_type}.consensus.log",
threads: 8
wrapper:
"v2.2.1/bio/bwa/mem"
rule sort_skipped_reads:
input:
"results/consensus_reads/{sample}.skipped.bam",
output:
temp("results/consensus_reads/{sample}.skipped.sorted.bam"),
params:
extra="-m 4G",
tmp_dir="/tmp/",
log:
"logs/sort_consensus/{sample}.log",
# Samtools takes additional threads through its option -@
threads: 8 # This value - 1 will be sent to -@.
wrapper:
"v2.2.1/bio/samtools/sort"
rule mark_duplicates_skipped:
input:
bams=["results/consensus_reads/{sample}.skipped.sorted.bam"],
output:
bam=temp("results/consensus_dupmarked/{sample}.skipped.marked.bam"),
metrics="results/consensus_dupmarked/{sample}.skipped.metrics.txt",
log:
"logs/picard/marked/{sample}.log",
params:
extra="--VALIDATION_STRINGENCY LENIENT --TAG_DUPLICATE_SET_MEMBERS true",
# optional specification of memory usage of the JVM that snakemake will respect with global
# resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
# and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
# https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
resources:
mem_mb=1024,
wrapper:
"v2.2.1/bio/picard/markduplicates"
rule merge_consensus_reads:
input:
"results/consensus_dupmarked/{sample}.skipped.marked.bam",
"results/consensus_mapped/{sample}.se.bam",
"results/consensus_mapped/{sample}.pe.bam",
output:
"results/consensus/{sample}.bam",
log:
"logs/samtools_merge/{sample}.log",
threads: 8
wrapper:
"v2.2.1/bio/samtools/merge"
Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
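The two lambda expressions in `map_consensus_reads` decide which FASTQ files are mapped and which index prefix is derived from the index files. Their logic can be reproduced in plain Python as follows; the function names are hypothetical, and the list comprehension stands in for Snakemake's `expand`.

```python
import os


def consensus_fastqs(sample, read_type):
    """FASTQ inputs for one sample: one file for "se", two files for "pe"."""
    reads = ["se"] if read_type == "se" else [1, 2]
    return [f"results/consensus_reads/{sample}.{read}.fq" for read in reads]


def index_prefix(idx_files):
    """Strip the last extension of the first index file to get the prefix."""
    return os.path.splitext(idx_files[0])[0]


print(consensus_fastqs("A", "se"))
print(consensus_fastqs("A", "pe"))
print(index_prefix(["resources/genome.fa.amb", "resources/genome.fa.ann"]))
```

The `wildcard_constraints` entry restricts `read_type` to `pe` or `se`, so the `else` branch is only ever reached for paired-end reads.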
Used wrappers¶
The following individual wrappers are used in this meta-wrapper:
- RBT COLLAPSE-READS-TO-FRAGMENTS BAM
- BWA MEM
- BWA INDEX
- SAMTOOLS MERGE
- SAMTOOLS SORT
- PICARD MARKDUPLICATES
Please refer to each wrapper in the above list for additional configuration parameters and information about the executed code.
Authors¶
DADA2-PE¶
A subworkflow for processing paired-end sequences from metabarcoding projects in order to construct ASV tables using DADA2. The example is based on the data provided by the R package. For more details, see the official website and the tutorial.
Example¶
This meta-wrapper can be used by integrating the following into your workflow:
# Make sure that you set the `truncLen=` option in the rule `dada2_filter_trim_pe` according
# to the results of the quality profile checks (after rule `dada2_quality_profile_pe` has finished on all samples).
# If in doubt, check https://benjjneb.github.io/dada2/tutorial.html#inspect-read-quality-profiles
rule all:
input:
# In a first run of this meta-wrapper, comment out all other inputs and only keep this one.
# Looking at the resulting plot, adjust the `truncLen` in rule `dada2_filter_trim_pe` and then
# rerun with all inputs uncommented.
expand(
"reports/dada2/quality-profile/{sample}-quality-profile.png",
sample=["a","b"]
),
"results/dada2/taxa.RDS"
rule dada2_quality_profile_pe:
input:
# FASTQ file without primer sequences
expand("trimmed/{{sample}}.{orientation}.fastq.gz",orientation=[1,2])
output:
"reports/dada2/quality-profile/{sample}-quality-profile.png"
log:
"logs/dada2/quality-profile/{sample}-quality-profile-pe.log"
wrapper:
"v2.2.1/bio/dada2/quality-profile"
rule dada2_filter_trim_pe:
input:
# Paired-end files without primer sequences
fwd="trimmed/{sample}.1.fastq.gz",
rev="trimmed/{sample}.2.fastq.gz"
output:
filt="filtered-pe/{sample}.1.fastq.gz",
filt_rev="filtered-pe/{sample}.2.fastq.gz",
stats="reports/dada2/filter-trim-pe/{sample}.tsv"
params:
# Set the maximum expected errors tolerated in filtered reads
maxEE=1,
# Set the number of kept bases in forward and reverse reads
truncLen=[240,200]
log:
"logs/dada2/filter-trim-pe/{sample}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/filter-trim"
rule dada2_learn_errors:
input:
# Quality filtered and trimmed forward FASTQ files (potentially compressed)
expand("filtered-pe/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
output:
err="results/dada2/model_{orientation}.RDS",# save the error model
plot="reports/dada2/errors_{orientation}.png",# plot observed and estimated rates
params:
randomize=True
log:
"logs/dada2/learn-errors/learn-errors_{orientation}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/learn-errors"
rule dada2_dereplicate_fastq:
input:
# Quality filtered FASTQ file
"filtered-pe/{fastq}.fastq.gz"
output:
# Dereplicated sequences stored as `derep-class` object in a RDS file
"uniques/{fastq}.RDS"
log:
"logs/dada2/dereplicate-fastq/{fastq}.log"
wrapper:
"v2.2.1/bio/dada2/dereplicate-fastq"
rule dada2_sample_inference:
input:
# Dereplicated (aka unique) sequences of the sample
derep="uniques/{sample}.{orientation}.RDS",
err="results/dada2/model_{orientation}.RDS" # Error model
output:
"denoised/{sample}.{orientation}.RDS" # Inferred sample composition
log:
"logs/dada2/sample-inference/{sample}.{orientation}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/sample-inference"
rule dada2_merge_pairs:
input:
dadaF="denoised/{sample}.1.RDS",# Inferred composition
dadaR="denoised/{sample}.2.RDS",
derepF="uniques/{sample}.1.RDS",# Dereplicated sequences
derepR="uniques/{sample}.2.RDS"
output:
"merged/{sample}.RDS"
log:
"logs/dada2/merge-pairs/{sample}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/merge-pairs"
rule dada2_make_table_pe:
input:
# Merged composition
expand("merged/{sample}.RDS", sample=['a','b'])
output:
"results/dada2/seqTab-pe.RDS"
params:
names=['a','b'], # Sample names instead of paths
orderBy="nsamples" # Change the ordering of samples
log:
"logs/dada2/make-table/make-table-pe.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/make-table"
rule dada2_remove_chimeras:
input:
"results/dada2/seqTab-pe.RDS" # Sequence table
output:
"results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
log:
"logs/dada2/remove-chimeras/remove-chimeras.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/remove-chimeras"
rule dada2_collapse_nomismatch:
input:
"results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
output:
"results/dada2/seqTab.collapsed.RDS"
log:
"logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/collapse-nomismatch"
rule dada2_assign_taxonomy:
input:
seqs="results/dada2/seqTab.collapsed.RDS", # Chimera-free sequence table
refFasta="resources/example_train_set.fa.gz" # Reference FASTA for taxonomy
output:
"results/dada2/taxa.RDS" # Taxonomic assignments
log:
"logs/dada2/assign-taxonomy/assign-taxonomy.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/assign-taxonomy"
Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
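Several rules above combine `expand` with doubled braces, for example `{{sample}}` in `dada2_quality_profile_pe`. Doubled braces are kept as a single-brace wildcard, while the single-brace field is filled in. The effect can be reproduced with `str.format`, used here as a stand-in for `expand`:

```python
pattern = "trimmed/{{sample}}.{orientation}.fastq.gz"

# Fill in the orientation, keep {sample} as a wildcard for Snakemake to resolve.
inputs = [pattern.format(orientation=orientation) for orientation in (1, 2)]
print(inputs)
```

In `dada2_learn_errors` the roles are swapped: the sample names are filled in and `{{orientation}}` remains a wildcard, so one error model is learned per read orientation across all samples.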
Used wrappers¶
The following individual wrappers are used in this meta-wrapper:
- DADA2_QUALITY_PROFILES
- DADA2_FILTER_TRIM
- DADA2_LEARN_ERRORS
- DADA2_DEREPLICATE_FASTQ
- DADA2_SAMPLE_INFERENCE
- DADA2_MERGE_PAIRS
- DADA2_MAKE_TABLE
- DADA2_REMOVE_CHIMERAS
- DADA2_COLLAPSE_NOMISMATCH
- DADA2_ASSIGN_TAXONOMY
Please refer to each wrapper in the above list for additional configuration parameters and information about the executed code.
Authors¶
- Charlie Pauvert
DADA2-SE¶
A subworkflow for processing single-end sequences from metabarcoding projects in order to construct ASV tables using DADA2. The example is based on the data provided in the R package. For more details, see the official website. While the tutorial is tailored to paired-end sequences, it also contains useful information on functions that are common to single-end sequence processing.
Example¶
This meta-wrapper can be used by integrating the following into your workflow:
# Make sure that you set the `truncLen=` option in the rule `dada2_filter_trim_se` according
# to the results of the quality profile checks (after rule `dada2_quality_profile_se` has finished on all samples).
# If in doubt, check https://benjjneb.github.io/dada2/tutorial.html#inspect-read-quality-profiles
rule all:
input:
# In a first run of this meta-wrapper, comment out all other inputs and only keep this one.
# Looking at the resulting plot, adjust the `truncLen` in rule `dada2_filter_trim_se` and then
# rerun with all inputs uncommented.
expand(
"reports/dada2/quality-profile/{sample}.{orientation}-quality-profile.png",
sample=["a","b"], orientation=1
),
"results/dada2/taxa.RDS"
rule dada2_quality_profile_se:
input:
# FASTQ file without primer sequences
"trimmed/{sample}.{orientation}.fastq.gz"
output:
"reports/dada2/quality-profile/{sample}.{orientation}-quality-profile.png"
log:
"logs/dada2/quality-profile/{sample}.{orientation}-quality-profile-se.log"
wrapper:
"v2.2.1/bio/dada2/quality-profile"
rule dada2_filter_trim_se:
input:
# Single-end files without primer sequences
fwd="trimmed/{sample}.1.fastq.gz"
output:
filt="filtered-se/{sample}.1.fastq.gz",
stats="reports/dada2/filter-trim-se/{sample}.tsv"
params:
# Set the maximum expected errors tolerated in filtered reads
maxEE=1,
# Set the number of kept bases
truncLen=240
log:
"logs/dada2/filter-trim-se/{sample}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/filter-trim"
rule dada2_learn_errors:
input:
# Quality filtered and trimmed forward FASTQ files (potentially compressed)
expand("filtered-se/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
output:
err="results/dada2/model_{orientation}.RDS",# save the error model
plot="reports/dada2/errors_{orientation}.png",# plot observed and estimated rates
params:
randomize=True
log:
"logs/dada2/learn-errors/learn-errors_{orientation}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/learn-errors"
rule dada2_dereplicate_fastq:
input:
# Quality filtered FASTQ file
"filtered-se/{fastq}.fastq.gz"
output:
# Dereplicated sequences stored as `derep-class` object in a RDS file
"uniques/{fastq}.RDS"
log:
"logs/dada2/dereplicate-fastq/{fastq}.log"
wrapper:
"v2.2.1/bio/dada2/dereplicate-fastq"
rule dada2_sample_inference:
input:
# Dereplicated (aka unique) sequences of the sample
derep="uniques/{sample}.{orientation}.RDS",
err="results/dada2/model_{orientation}.RDS" # Error model
output:
"denoised/{sample}.{orientation}.RDS" # Inferred sample composition
log:
"logs/dada2/sample-inference/{sample}.{orientation}.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/sample-inference"
rule dada2_make_table_se:
input:
# Inferred composition
expand("denoised/{sample}.1.RDS", sample=['a','b'])
output:
"results/dada2/seqTab-se.RDS"
params:
names=['a','b'] # Sample names instead of paths
log:
"logs/dada2/make-table/make-table-se.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/make-table"
rule dada2_remove_chimeras:
input:
"results/dada2/seqTab-se.RDS" # Sequence table
output:
"results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
log:
"logs/dada2/remove-chimeras/remove-chimeras.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/remove-chimeras"
rule dada2_collapse_nomismatch:
input:
"results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
output:
"results/dada2/seqTab.collapsed.RDS"
log:
"logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/collapse-nomismatch"
rule dada2_assign_taxonomy:
input:
seqs="results/dada2/seqTab.collapsed.RDS", # Chimera-free sequence table
refFasta="resources/example_train_set.fa.gz" # Reference FASTA for taxonomy
output:
"results/dada2/taxa.RDS" # Taxonomic assignments
log:
"logs/dada2/assign-taxonomy/assign-taxonomy.log"
threads: 1 # set desired number of threads here
wrapper:
"v2.2.1/bio/dada2/assign-taxonomy"
Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Used wrappers¶
The following individual wrappers are used in this meta-wrapper:
- DADA2_QUALITY_PROFILES
- DADA2_FILTER_TRIM
- DADA2_LEARN_ERRORS
- DADA2_DEREPLICATE_FASTQ
- DADA2_SAMPLE_INFERENCE
- DADA2_MAKE_TABLE
- DADA2_REMOVE_CHIMERAS
- DADA2_COLLAPSE_NOMISMATCH
- DADA2_ASSIGN_TAXONOMY
Please refer to each wrapper in the above list for additional configuration parameters and information about the executed code.
Authors¶
- Charlie Pauvert
SALMON TXIMPORT¶
This meta-wrapper includes the following steps:
Step | Tool | Reason
Indexation | Bash | Identify decoy sequences
Indexation | Salmon | Create decoy-aware gentrome (genome + transcriptome) index
Quantification | Salmon | Quantify sequenced reads
Quantification | Tximport | Import counts and inferential replicates in R as a ready-to-use SummarizedExperiment object.
Example¶
This meta-wrapper can be used by integrating the following into your workflow:
rule salmon_decoy_sequences:
input:
transcriptome="resources/transcriptome.fasta",
genome="resources/genome.fasta",
output:
gentrome=temp("resources/gentrome.fasta"),
decoys=temp("resources/decoys.txt"),
threads: 1
log:
"decoys.log",
wrapper:
"v2.2.1/bio/salmon/decoys"
rule salmon_index_gentrome:
input:
sequences="resources/gentrome.fasta",
decoys="resources/decoys.txt",
output:
multiext(
"salmon/transcriptome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
"duplicate_clusters.tsv",
"info.json",
"mphf.bin",
"pos.bin",
"pre_indexing.log",
"rank.bin",
"refAccumLengths.bin",
"ref_indexing.log",
"reflengths.bin",
"refseq.bin",
"seq.bin",
"versionInfo.json",
),
cache: True
log:
"logs/salmon/transcriptome_index.log",
threads: 2
params:
# optional parameters
extra="",
wrapper:
"v2.2.1/bio/salmon/index"
rule salmon_quant_reads:
input:
r="reads/{sample}.fastq.gz",
index=multiext(
"salmon/transcriptome_index/",
"complete_ref_lens.bin",
"ctable.bin",
"ctg_offsets.bin",
"duplicate_clusters.tsv",
"info.json",
"mphf.bin",
"pos.bin",
"pre_indexing.log",
"rank.bin",
"refAccumLengths.bin",
"ref_indexing.log",
"reflengths.bin",
"refseq.bin",
"seq.bin",
"versionInfo.json",
),
gtf="resources/annotation.gtf",
output:
quant=temp("pseudo_mapping/{sample}/quant.sf"),
quant_gene=temp("pseudo_mapping/{sample}/quant.genes.sf"),
lib=temp("pseudo_mapping/{sample}/lib_format_counts.json"),
aux_info=temp(directory("pseudo_mapping/{sample}/aux_info")),
cmd_info=temp("pseudo_mapping/{sample}/cmd_info.json"),
libparams=temp(directory("pseudo_mapping/{sample}/libParams")),
logs=temp(directory("pseudo_mapping/{sample}/logs")),
log:
"logs/salmon/{sample}.log",
params:
# optional parameters
libtype="A",
extra="--numBootstraps 32",
threads: 2
wrapper:
"v2.2.1/bio/salmon/quant"
rule tximport:
input:
quant=expand(
"pseudo_mapping/{sample}/quant.sf", sample=["S1", "S2", "S3", "S4"]
),
lib=expand(
"pseudo_mapping/{sample}/lib_format_counts.json",
sample=["S1", "S2", "S3", "S4"],
),
aux_info=expand(
"pseudo_mapping/{sample}/aux_info", sample=["S1", "S2", "S3", "S4"]
),
cmd_info=expand(
"pseudo_mapping/{sample}/cmd_info.json", sample=["S1", "S2", "S3", "S4"]
),
libparams=expand(
"pseudo_mapping/{sample}/libParams", sample=["S1", "S2", "S3", "S4"]
),
logs=expand("pseudo_mapping/{sample}/logs", sample=["S1", "S2", "S3", "S4"]),
tx_to_gene="resources/tx2gene.tsv",
output:
txi="tximport/SummarizedExperimentObject.RDS",
params:
extra="type='salmon'",
log:
"logs/tximport.log"
wrapper:
"v2.2.1/bio/tximport"
Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
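The list of Salmon index files appears twice in the example, once as output of `salmon_index_gentrome` and once as input of `salmon_quant_reads`. One possible way to keep both in sync is to define the list once. The sketch below uses a hypothetical helper `with_prefix` as a plain-Python stand-in for `multiext`, and the constant name `SALMON_INDEX_FILES` is likewise an assumption of this example.

```python
# File names copied from the example rules above.
SALMON_INDEX_FILES = [
    "complete_ref_lens.bin",
    "ctable.bin",
    "ctg_offsets.bin",
    "duplicate_clusters.tsv",
    "info.json",
    "mphf.bin",
    "pos.bin",
    "pre_indexing.log",
    "rank.bin",
    "refAccumLengths.bin",
    "ref_indexing.log",
    "reflengths.bin",
    "refseq.bin",
    "seq.bin",
    "versionInfo.json",
]


def with_prefix(prefix, *suffixes):
    """Prepend one common prefix to every suffix (stand-in for multiext)."""
    return [prefix + suffix for suffix in suffixes]


index_files = with_prefix("salmon/transcriptome_index/", *SALMON_INDEX_FILES)
print(len(index_files), index_files[0])
```

In a Snakefile, the same list could be unpacked into `multiext("salmon/transcriptome_index/", *SALMON_INDEX_FILES)` in both rules, assuming the Snakemake version in use accepts these suffixes as the example above does.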
Used wrappers¶
The following individual wrappers are used in this meta-wrapper:
- SALMON DECOYS
- SALMON INDEX
- SALMON QUANT
- TXIMPORT
Please refer to each wrapper in the above list for additional configuration parameters and information about the executed code.
Authors¶
- Thibault Dayris
STAR-ARRIBA¶
A subworkflow for fusion detection from RNA-seq data with arriba. The fusion calling is based on splice-aware, chimeric alignments done with STAR. STAR is used with specific parameters to ensure optimal functionality of the arriba fusion detection; for details, see the documentation.
Example¶
This meta-wrapper can be used by integrating the following into your workflow:
rule star_index:
input:
fasta="resources/genome.fasta",
gtf="resources/genome.gtf",
output:
directory("resources/star_genome"),
threads: 4
params:
sjdbOverhang=100,
extra="--genomeSAindexNbases 2",
log:
"logs/star_index_genome.log",
cache: True # mark as eligible for between workflow caching
wrapper:
"v2.2.1/bio/star/index"
rule star_align:
input:
# use a list for multiple fastq files for one sample
# usually technical replicates across lanes/flowcells
fq1="reads/{sample}_R1.1.fastq",
fq2="reads/{sample}_R2.1.fastq", #optional
idx="resources/star_genome",
annotation="resources/genome.gtf",
output:
# see STAR manual for additional output files
aln="star/{sample}/Aligned.out.bam",
reads_per_gene="star/{sample}/ReadsPerGene.out.tab",
log:
"logs/star/{sample}.log",
params:
# specific parameters to work well with arriba
extra=lambda wc, input: f"--quantMode GeneCounts --sjdbGTFfile {input.annotation}"
" --outSAMtype BAM Unsorted --chimSegmentMin 10 --chimOutType WithinBAM SoftClip"
" --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0"
" --chimScoreSeparation 1 --alignSJstitchMismatchNmax 5 -1 5 5 --chimSegmentReadGapMax 3",
threads: 12
wrapper:
"v2.2.1/bio/star/align"
rule arriba:
input:
bam=rules.star_align.output.aln,
genome="resources/genome.fasta",
annotation="resources/genome.gtf",
# optional: a custom TSV containing identified artifacts, such as read-through fusions of neighbouring genes.
# default blacklists are selected via blacklist parameter
# see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
custom_blacklist=[],
output:
fusions="results/arriba/{sample}.fusions.tsv",
discarded="results/arriba/{sample}.fusions.discarded.tsv",
params:
# required if blacklist or known_fusions is set
genome_build="GRCh38",
default_blacklist=False,
default_known_fusions=True,
extra="",
log:
"logs/arriba/{sample}.log",
threads: 1
wrapper:
"v2.2.1/bio/arriba"
Note that input, output and log file paths can be chosen freely, as long as the dependencies between the rules remain as listed here. For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).
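The extra parameter of star_align is a callable rather than a fixed string, so it may help to see how it resolves. The following is a minimal Python sketch, not part of the meta-wrapper: the argument string is shortened, and the wildcards and inputs objects are stand-ins (assumptions for illustration) for what Snakemake passes to the callable.

```python
from types import SimpleNamespace

# Same callable shape as the `extra` parameter of `star_align`, shortened.
# The f-string and the plain literal on the next line are adjacent string
# literals, so Python joins them into a single string.
extra = lambda wc, input: (
    f"--quantMode GeneCounts --sjdbGTFfile {input.annotation}"
    " --outSAMtype BAM Unsorted --chimSegmentMin 10"
)

# Stand-ins for the objects Snakemake would supply (assumption for illustration).
wildcards = SimpleNamespace(sample="a")
inputs = SimpleNamespace(annotation="resources/genome.gtf")

resolved = extra(wildcards, inputs)
print(resolved)
```

Because the annotation path is read from input.annotation, changing the GTF path in the input section is enough; the STAR arguments follow automatically.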
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Used wrappers¶
The following individual wrappers are used in this meta-wrapper:
- bio/star/index
- bio/star/align
- bio/arriba
Please refer to each wrapper in the above list for additional configuration parameters and information about the executed code.
Authors¶
- Jan Forster
Contributing¶
We invite anybody to contribute to the Snakemake Wrapper Repository. If you want to contribute, we suggest the following procedure:
- Fork the repository: https://github.com/snakemake/snakemake-wrappers
- Clone your fork locally.
- Locally, create a new branch:
git checkout -b my-new-snakemake-wrapper
- Commit your contributions to that branch and push them to your fork:
git push -u origin my-new-snakemake-wrapper
- Create a pull request.
The pull request will be reviewed and included as quickly as possible.
If your pull request does not get a review quickly, you can @mention (https://github.blog/2011-03-23-mention-somebody-they-re-notified/) previous contributors to a particular wrapper (found via git blame) or regular contributors who you think might be able to give a review.
Contributions should follow the coding style of the already present examples, i.e.:
- provide a meta.yaml that describes the wrapper (see the meta.yaml documentation below)
- provide an environment.yaml which lists all required software packages and follows the respective best practices. The packages should be available for installation via the default anaconda channels or via the conda channels bioconda or conda-forge. Other sustainable community-maintained channels are possible as well.
- add a wrapper.py or wrapper.R file that can deal with arbitrary input: and output: paths.
- provide a minimal test case in a subfolder called test, with an example Snakefile that shows how to use the wrapper (rule names should be descriptive and written in snake_case), some minimal testing data (also check existing wrappers for suitable data) and add an invocation of the test in test.py.
- ensure consistent formatting of Python files and linting of Snakefiles.
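As an illustration of the requirement that a wrapper.py deal with arbitrary input: and output: paths, here is a minimal sketch of the command-building part of a wrapper. The tool name mytool, its --threads flag and the build_command helper are hypothetical; the fake object at the bottom only stands in for the snakemake object that Snakemake provides to a real wrapper script, which would typically pass the resulting string to shell from snakemake.shell.

```python
import shlex
from types import SimpleNamespace


def build_command(snakemake):
    """Assemble a shell command from whatever paths the rule declares."""
    # Optional extra arguments; empty string if the rule sets none.
    extra = snakemake.params.get("extra", "")
    # Quote every path so that spaces or special characters survive the shell.
    inputs = " ".join(shlex.quote(str(path)) for path in snakemake.input)
    output = shlex.quote(str(snakemake.output[0]))
    # `mytool` and its `--threads` flag are hypothetical placeholders.
    return f"mytool --threads {snakemake.threads} {extra} {inputs} > {output}"


# Stand-in for the object Snakemake injects into wrapper.py (assumption).
fake = SimpleNamespace(
    input=["reads/a b.fastq", "reads/c.fastq"],
    output=["out/merged.txt"],
    params={"extra": "--fast"},
    threads=2,
)
print(build_command(fake))
```

No path is hard-coded: the command is derived entirely from the rule's input, output, params and threads, so users can choose file locations freely.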
meta.yaml file¶
The following fields are available to use in the wrapper meta.yaml file. All, except those marked optional, should be provided.
- name: The name of the wrapper.
- description: A description of what the wrapper does.
- url: URL to the wrapper tool webpage.
- authors: A sequence of names of the people who have contributed to the wrapper.
- input: A mapping or sequence of required inputs for the wrapper.
- output: A mapping or sequence of output(s) from the wrapper.
- params (optional): A mapping of parameters that can be used in the wrapper’s params directive. If no parameters are used for the wrapper, this field can be omitted.
- notes (optional): Anything of note that does not fit into the scope of the other fields.
You can add a newline to the rendered text in these fields with the addition of |nl|.
Example¶
name: seqtk mergepe
description: Interleave two paired-end FASTA/Q files
url: https://github.com/lh3/seqtk
authors:
  - Michael Hall
input:
  - paired fastq files - can be compressed.
output:
  - >
    a single, interleaved FASTA/Q file. By default, the output will be compressed,
    use the param ``compress_lvl`` to change this.
params:
  compress_lvl: >
    Regulate the speed of compression using the specified digit,
    where 1 indicates the fastest compression method (less compression)
    and 9 indicates the slowest compression method (best compression).
    0 is no compression. 11 gives a few percent better compression at a severe cost
    in execution time, using the zopfli algorithm. The default is 6.
notes: Multiple threads can be used during compression of the output file with ``pigz``.
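Before opening a pull request, it can be handy to check a meta.yaml against the field list above. The following sketch is not part of the repository; check_meta is a hypothetical helper that works on an already parsed mapping (for example the result of loading the file with a YAML parser) and reports required fields that are missing as well as fields that are not in the list above.

```python
# Field names taken from the list in this section.
REQUIRED_FIELDS = ("name", "description", "url", "authors", "input", "output")
OPTIONAL_FIELDS = ("params", "notes")


def check_meta(meta):
    """Return (missing required fields, fields not in the documented list)."""
    missing = [field for field in REQUIRED_FIELDS if field not in meta]
    known = REQUIRED_FIELDS + OPTIONAL_FIELDS
    unknown = [field for field in meta if field not in known]
    return missing, unknown


# Parsed form of the seqtk example above, with long texts shortened.
example = {
    "name": "seqtk mergepe",
    "description": "Interleave two paired-end FASTA/Q files",
    "url": "https://github.com/lh3/seqtk",
    "authors": ["Michael Hall"],
    "input": ["paired fastq files - can be compressed."],
    "output": ["a single, interleaved FASTA/Q file."],
    "params": {"compress_lvl": "Regulate the speed of compression."},
    "notes": "Multiple threads can be used during compression.",
}

print(check_meta(example))
print(check_meta({"name": "incomplete", "author": ["typo in key"]}))
```

The second call shows a misspelled key (author instead of authors) surfacing both as a missing required field and as an unlisted one.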
Formatting¶
Please ensure Python files such as test.py and wrapper.py are formatted with black. Additionally, please format your test Snakefile with snakefmt.
Testing locally¶
If you want to debug your contribution locally (before creating a pull request), you can install all dependencies with mamba (or conda). Install miniconda with the channels as described for bioconda and set up an environment with the necessary dependencies and activate it:
mamba create -n test-snakemake-wrappers snakemake pytest conda snakefmt black
conda activate test-snakemake-wrappers
Afterwards, from the main directory of the repo, you can run the test(s) for your contribution by specifying an expression that matches the name(s) of your test(s) via the -k option of pytest:
pytest test.py -v -k your_test
If you also want to test the docs generation locally, create another environment and activate it:
mamba create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml sphinx-copybutton
conda activate test-snakemake-wrapper-docs
Then, enter the respective directory and build the docs:
cd docs
make html
If it runs through, you can open the main page at docs/_build/html/index.html in a web browser. If you want to start fresh, you can clean up the build with make clean.