ADAPTERREMOVAL
Rapid adapter trimming, identification, and read merging.
URL: https://adapterremoval.readthedocs.io/en/latest/
Example
This wrapper can be used in the following way:
rule adapterremoval_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        fq="trimmed/se/{sample}.fastq.gz",  # trimmed reads
        discarded="trimmed/se/{sample}.discarded.fastq.gz",  # reads that did not pass filters
        settings="stats/se/{sample}.settings"  # parameters as well as overall statistics
    log:
        "logs/adapterremoval/se/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA",
        extra="",
    threads: 1
    wrapper:
        "v2.6.0-35-g755343f/bio/adapterremoval"
rule adapterremoval_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        fq1="trimmed/pe/{sample}_R1.fastq.gz",  # trimmed mate1 reads
        fq2="trimmed/pe/{sample}_R2.fastq.gz",  # trimmed mate2 reads
        collapsed="trimmed/pe/{sample}.collapsed.fastq.gz",  # overlapping mate-pairs which have been merged into a single read
        collapsed_trunc="trimmed/pe/{sample}.collapsed_trunc.fastq.gz",  # collapsed reads that were quality trimmed
        singleton="trimmed/pe/{sample}.singleton.fastq.gz",  # mate-pairs for which the mate has been discarded
        discarded="trimmed/pe/{sample}.discarded.fastq.gz",  # reads that did not pass filters
        settings="stats/pe/{sample}.settings"  # parameters as well as overall statistics
    log:
        "logs/adapterremoval/pe/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA --adapter2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="--collapse --collapse-deterministic",
    threads: 2
    wrapper:
        "v2.6.0-35-g755343f/bio/adapterremoval"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
- All output files, except for ‘settings’, must be compressed the same way (either gz or bz2).
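The compression rule can be checked ahead of a run with a small standalone helper. The sketch below follows the same suffix test as the wrapper code. The function name `detect_compression` and the plain dict standing in for `snakemake.output` are illustrative assumptions, not part of the wrapper.

```python
from pathlib import Path


def detect_compression(outputs):
    """Return the compression option implied by the output paths.

    `outputs` is a plain dict of output name -> path (a hypothetical
    stand-in for snakemake.output). The 'settings' entry is ignored.
    """
    suffixes = {
        Path(path).suffix for name, path in outputs.items() if name != "settings"
    }
    if suffixes == {".gz"}:
        return "--gzip"
    if suffixes == {".bz2"}:
        return "--bzip2"
    # Mixed, missing, or unsupported suffixes are rejected.
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )
```

Unlike the wrapper, this helper also raises when no compressed outputs are given at all. That is a design choice of the sketch.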
Software dependencies
adapterremoval=2.3.3
Input/Output
Input:
sample
: [‘raw fastq file with R1 reads’, ‘raw fastq file with R2 reads (PE only)’]
Output:
fq
: path to single fastq file (SE only)
fq1
: path to fastq R1 (PE only)
fq2
: path to fastq R2 (PE only)
singleton
: fastq file with singleton reads (PE only; PE reads for which the mate has been discarded)
collapsed
: fastq file with collapsed reads (PE only; overlapping mate-pairs which have been merged into a single read)
collapsed_trunc
: fastq file with collapsed truncated reads (PE only; collapsed reads that were quality trimmed)
discarded
: fastq file with discarded reads (reads that did not pass filters)
settings
: settings and stats file
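To make the relationship between these named outputs and the command line concrete, here is a minimal sketch of how they could be translated into AdapterRemoval options. The option names are the ones used in the wrapper code in the Code section. The function `build_output_args`, its `paired` argument, and the plain dict standing in for `snakemake.output` are hypothetical.

```python
def build_output_args(outputs, paired):
    """Translate named outputs into a list of AdapterRemoval output options.

    `outputs` is a plain dict (hypothetical stand-in for snakemake.output);
    `paired` selects the fq1/fq2 layout instead of the single fq file.
    """
    if paired:
        args = ["--output1", outputs["fq1"], "--output2", outputs["fq2"]]
    else:
        args = ["--output1", outputs["fq"]]

    # Optional outputs are only passed when they are declared.
    optional_flags = {
        "singleton": "--singleton",
        "collapsed": "--outputcollapsed",
        "collapsed_trunc": "--outputcollapsedtruncated",
    }
    for name, flag in optional_flags.items():
        if outputs.get(name):
            args += [flag, outputs[name]]

    # 'discarded' and 'settings' are always passed, as in the wrapper.
    args += ["--discarded", outputs["discarded"], "--settings", outputs["settings"]]
    return args
```

Calling `build_output_args` with only `fq`, `discarded`, and `settings` and `paired=False` yields the single-end layout. Adding `fq1`, `fq2`, and any of the optional names with `paired=True` yields the paired-end layout.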
Authors
- Filipe G. Vieira
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
import re
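# The trailing space lets the substring checks below (e.g. "--interleaved ")
# match an option that appears at the very end of params.extra.
extra = snakemake.params.get("extra", "") + " "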
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Check input files
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
# Input files
if n == 1 or "--interleaved " in extra or "--interleaved-input " in extra:
    reads = "--file1 {}".format(snakemake.input.sample)
else:
    reads = "--file1 {} --file2 {}".format(*snakemake.input.sample)
# Gzip or Bzip compressed output?
compress_out = ""
if all(
    [
        Path(value).suffix == ".gz"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--gzip"
elif all(
    [
        Path(value).suffix == ".bz2"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--bzip2"
else:
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )
# Output files
if n == 1 or "--interleaved " in extra or "--interleaved-output " in extra:
    trimmed = f"--output1 {snakemake.output.fq}"
else:
    trimmed = f"--output1 {snakemake.output.fq1} --output2 {snakemake.output.fq2}"
# Output singleton files
singleton = snakemake.output.get("singleton", None)
if singleton:
    trimmed += f" --singleton {singleton}"
# Output collapsed PE reads
collapsed = snakemake.output.get("collapsed", None)
if collapsed:
    if not re.search(r"--collapse\b", extra):
        raise ValueError(
            "output.collapsed specified but '--collapse' option missing from params.extra"
        )
    trimmed += f" --outputcollapsed {collapsed}"
# Output collapsed and truncated PE reads
collapsed_trunc = snakemake.output.get("collapsed_trunc", None)
if collapsed_trunc:
    if not re.search(r"--collapse\b", extra):
        raise ValueError(
            "output.collapsed_trunc specified but '--collapse' option missing from params.extra"
        )
    trimmed += f" --outputcollapsedtruncated {collapsed_trunc}"
shell(
    "(AdapterRemoval --threads {snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{compress_out} "
    "{trimmed} "
    "--discarded {snakemake.output.discarded} "
    "--settings {snakemake.output.settings}"
    ") {log}"
)
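
The `--collapse` check in the code above relies on a regular expression with a word boundary, and it may help to see which `extra` strings it accepts. The sketch below reuses the same pattern. The helper name `mentions_collapse` is hypothetical, and the `--collapsed` string is a made-up typo used only to show a rejected case.

```python
import re

# Same pattern as in the wrapper code above.
COLLAPSE_PATTERN = r"--collapse\b"


def mentions_collapse(extra):
    """Return True if the wrapper's check would accept this `extra` string."""
    return re.search(COLLAPSE_PATTERN, extra) is not None


accepted = mentions_collapse("--collapse ")
also_accepted = mentions_collapse("--collapse-deterministic ")
rejected_typo = mentions_collapse("--collapsed ")
rejected_other = mentions_collapse("--interleaved ")
```

Because `\b` matches between `e` and `-`, an `extra` that contains only `--collapse-deterministic` also passes the check, while a string such as `--collapsed` does not.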