ADAPTERREMOVAL
Rapid adapter trimming, identification, and read merging.
URL: https://adapterremoval.readthedocs.io/en/latest/
Example
This wrapper can be used in the following way:
rule adapterremoval_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        fq="trimmed/se/{sample}.fastq.gz",  # trimmed reads
        discarded="trimmed/se/{sample}.discarded.fastq.gz",  # reads that did not pass filters
        settings="stats/se/{sample}.settings"  # parameters as well as overall statistics
    log:
        "logs/adapterremoval/se/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA",
        extra="",
    threads: 1
    wrapper:
        "v5.0.0/bio/adapterremoval"
rule adapterremoval_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        fq1="trimmed/pe/{sample}_R1.fastq.gz",  # trimmed mate1 reads
        fq2="trimmed/pe/{sample}_R2.fastq.gz",  # trimmed mate2 reads
        collapsed="trimmed/pe/{sample}.collapsed.fastq.gz",  # overlapping mate-pairs which have been merged into a single read
        collapsed_trunc="trimmed/pe/{sample}.collapsed_trunc.fastq.gz",  # collapsed reads that were quality trimmed
        singleton="trimmed/pe/{sample}.singleton.fastq.gz",  # mate-pairs for which the mate has been discarded
        discarded="trimmed/pe/{sample}.discarded.fastq.gz",  # reads that did not pass filters
        settings="stats/pe/{sample}.settings"  # parameters as well as overall statistics
    log:
        "logs/adapterremoval/pe/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA --adapter2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="--collapse --collapse-deterministic",
    threads: 2
    wrapper:
        "v5.0.0/bio/adapterremoval"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
All output files, except for 'settings', must use the same compression, either gz or bz2. The wrapper raises an error for mixed or uncompressed outputs.
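The compression rule in this note can be sketched as a small standalone check. The helper name `detect_compression` and the plain-dict argument are assumptions made for illustration; this is not the wrapper's own code, which runs an equivalent test on `snakemake.output`.

```python
from pathlib import Path


def detect_compression(outputs: dict) -> str:
    """Return the AdapterRemoval compression flag implied by the output paths.

    `outputs` maps output names (e.g. "fq1", "discarded", "settings") to paths.
    The "settings" entry is ignored because it is not a read file.
    """
    # Collect the final suffix of every output except "settings".
    suffixes = {
        Path(path).suffix for name, path in outputs.items() if name != "settings"
    }
    if suffixes == {".gz"}:
        return "--gzip"
    if suffixes == {".bz2"}:
        return "--bzip2"
    # Mixed suffixes or uncompressed files end up here.
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )
```

With the single-end example outputs the helper returns `--gzip`; renaming only one of the read files to `.bz2` would raise the `ValueError`.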
Software dependencies
adapterremoval=2.3.4
Input/Output
Input:
    sample: [‘raw fastq file with R1 reads’, ‘raw fastq file with R2 reads (PE only)’]
Output:
    fq: path to single fastq file (SE only)
    fq1: path to fastq R1 (PE only)
    fq2: path to fastq R2 (PE only)
    singleton: fastq file with singleton reads (PE only; PE reads for which the mate has been discarded)
    collapsed: fastq file with collapsed reads (PE only; overlapping mate-pairs which have been merged into a single read)
    collapsed_trunc: fastq file with collapsed truncated reads (PE only; collapsed reads that were quality trimmed)
    discarded: fastq file with discarded reads (reads that did not pass filters)
    settings: settings and stats file
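The `collapsed` and `collapsed_trunc` outputs are only accepted when `params.extra` contains `--collapse`. The following sketch reproduces that check outside Snakemake; the function name `check_collapse` and the plain-dict argument are hypothetical, while the regular expression is the one used in the wrapper code.

```python
import re


def check_collapse(outputs: dict, extra: str) -> None:
    """Raise ValueError if a collapsed output is requested without '--collapse'."""
    for key in ("collapsed", "collapsed_trunc"):
        # \b is a word boundary, so '--collapse' must not be followed by a
        # letter, digit or underscore; a following '-' or space is accepted.
        if outputs.get(key) and not re.search(r"--collapse\b", extra):
            raise ValueError(
                f"output.{key} specified but '--collapse' option missing from params.extra"
            )
```

Because `-` is not a word character, the pattern also matches the start of `--collapse-deterministic`, so an `extra` string containing only that option passes this check as well.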
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"
from snakemake.shell import shell
from pathlib import Path
import re
extra = snakemake.params.get("extra", "") + " "
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Check input files
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."
# Input files
if n == 1 or "--interleaved " in extra or "--interleaved-input " in extra:
    reads = "--file1 {}".format(snakemake.input.sample)
else:
    reads = "--file1 {} --file2 {}".format(*snakemake.input.sample)
# Gzip or Bzip compressed output?
compress_out = ""
if all(
    [
        Path(value).suffix == ".gz"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--gzip"
elif all(
    [
        Path(value).suffix == ".bz2"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--bzip2"
else:
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )
# Output files
if n == 1 or "--interleaved " in extra or "--interleaved-output " in extra:
    trimmed = f"--output1 {snakemake.output.fq}"
else:
    trimmed = f"--output1 {snakemake.output.fq1} --output2 {snakemake.output.fq2}"
    # Output singleton files
    singleton = snakemake.output.get("singleton", None)
    if singleton:
        trimmed += f" --singleton {singleton}"
    # Output collapsed PE reads
    collapsed = snakemake.output.get("collapsed", None)
    if collapsed:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsed {collapsed}"
    # Output collapsed and truncated PE reads
    collapsed_trunc = snakemake.output.get("collapsed_trunc", None)
    if collapsed_trunc:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed_trunc specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsedtruncated {collapsed_trunc}"
shell(
    "(AdapterRemoval --threads {snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{compress_out} "
    "{trimmed} "
    "--discarded {snakemake.output.discarded} "
    "--settings {snakemake.output.settings}"
    ") {log}"
)
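To preview the command line that the code above assembles without running Snakemake, the same steps can be sketched as a pure function. This is a minimal sketch under stated assumptions: `build_command` and its arguments are hypothetical stand-ins for the `snakemake` object, the validation steps and the log redirection are left out, and the compression flag is inferred from a single output path as a simplification.

```python
def build_command(sample, outputs, adapters="", extra="", threads=1):
    """Assemble an AdapterRemoval command line from plain Python values.

    Hypothetical helper: it follows the option order of the wrapper but skips
    its validation steps and the log redirection.
    """
    extra = extra + " "  # trailing space so that "--interleaved " can match
    single = len(sample) == 1

    # Input reads: one --file1 for SE or interleaved input, else a pair.
    if single or "--interleaved " in extra or "--interleaved-input " in extra:
        reads = "--file1 " + " ".join(sample)
    else:
        reads = "--file1 {} --file2 {}".format(*sample)

    # Simplification: infer the compression flag from one output only.
    compress = "--bzip2" if outputs["discarded"].endswith(".bz2") else "--gzip"

    # Trimmed reads: one --output1 for SE or interleaved output, else a pair.
    if single or "--interleaved " in extra or "--interleaved-output " in extra:
        trimmed = f"--output1 {outputs['fq']}"
    else:
        trimmed = f"--output1 {outputs['fq1']} --output2 {outputs['fq2']}"

    # Optional outputs are appended only when they were requested.
    optional = (
        ("singleton", "--singleton"),
        ("collapsed", "--outputcollapsed"),
        ("collapsed_trunc", "--outputcollapsedtruncated"),
    )
    for key, flag in optional:
        if outputs.get(key):
            trimmed += f" {flag} {outputs[key]}"

    parts = [
        "AdapterRemoval",
        f"--threads {threads}",
        reads,
        adapters,
        extra.strip(),
        compress,
        trimmed,
        f"--discarded {outputs['discarded']}",
        f"--settings {outputs['settings']}",
    ]
    # Drop empty pieces (e.g. no adapters or no extra options).
    return " ".join(part for part in parts if part)


# Example call with the single-end paths from the rule above ({sample} = "a").
print(
    build_command(
        sample=["reads/se/a.fastq"],
        outputs={
            "fq": "trimmed/se/a.fastq.gz",
            "discarded": "trimmed/se/a.discarded.fastq.gz",
            "settings": "stats/se/a.settings",
        },
        adapters="--adapter1 ACGGCTAGCTA",
    )
)
```

Returning the string instead of executing it keeps the sketch side-effect free, which makes it convenient for checking how a given combination of outputs and `extra` options would be passed to AdapterRemoval.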