ADAPTERREMOVAL

Rapid adapter trimming, identification, and read merging.

Example

This wrapper can be used in the following way:

rule adapterremoval_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        fq="trimmed/se/{sample}.fastq.gz",                               # trimmed reads
        discarded="trimmed/se/{sample}.discarded.fastq.gz",              # reads that did not pass filters
        settings="stats/se/{sample}.settings"                            # parameters as well as overall statistics
    log:
        "logs/adapterremoval/se/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA",
        extra="",
    threads: 1
    wrapper:
        "0.75.0-7-g74e079c/bio/adapterremoval"


rule adapterremoval_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        fq1="trimmed/pe/{sample}_R1.fastq.gz",                           # trimmed mate1 reads
        fq2="trimmed/pe/{sample}_R2.fastq.gz",                           # trimmed mate2 reads
        collapsed="trimmed/pe/{sample}.collapsed.fastq.gz",              # overlapping mate-pairs which have been merged into a single read
        collapsed_trunc="trimmed/pe/{sample}.collapsed_trunc.fastq.gz",  # collapsed reads that were quality trimmed
        singleton="trimmed/pe/{sample}.singleton.fastq.gz",              # mate-pairs for which the mate has been discarded
        discarded="trimmed/pe/{sample}.discarded.fastq.gz",              # reads that did not pass filters
        settings="stats/pe/{sample}.settings"                            # parameters as well as overall statistics
    log:
        "logs/adapterremoval/pe/{sample}.log"
    params:
        adapters="--adapter1 ACGGCTAGCTA --adapter2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="--collapse --collapse-deterministic",
    threads: 2
    wrapper:
        "0.75.0-7-g74e079c/bio/adapterremoval"

Note that input, output and log file paths can be chosen freely, as long as all output files other than settings share one compression suffix (.gz or .bz2). When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • adapterremoval=2.3

Input/Output

Input:

  • raw fastq file with R1 reads
  • raw fastq file with R2 reads (PE only)

Output:

  • trimmed fastq file with R1 reads
  • trimmed fastq file with R2 reads (PE only)
  • fastq file with singleton reads (PE only; PE reads for which the mate has been discarded)
  • fastq file with collapsed reads (PE only; overlapping mate-pairs which have been merged into a single read)
  • fastq file with collapsed truncated reads (PE only; collapsed reads that were quality trimmed)
  • fastq file with discarded reads (reads that did not pass filters)
  • settings and stats file
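
The mapping from these outputs to AdapterRemoval options can be sketched outside Snakemake as follows. This is a minimal illustration based on the wrapper code on this page, not the wrapper itself. The function name `output_flags` and the plain `dict` standing in for `snakemake.output` are assumptions made for the example. As a simplification, it selects single-end mode by the presence of an `fq` key, whereas the wrapper looks at the number of input files and at the interleaved options in `extra`.

```python
import re


def output_flags(outputs, extra=""):
    """Build the output part of an AdapterRemoval command line.

    Hypothetical helper: `outputs` is a plain dict standing in for
    snakemake.output, used here for illustration only.
    """
    # Single-end: only one trimmed file is written.
    if "fq" in outputs:
        return f"--output1 {outputs['fq']}"

    # Paired-end: both mates are mandatory, the remaining files are optional.
    flags = f"--output1 {outputs['fq1']} --output2 {outputs['fq2']}"
    if outputs.get("singleton"):
        flags += f" --singleton {outputs['singleton']}"

    optional = [
        ("collapsed", "--outputcollapsed"),
        ("collapsed_trunc", "--outputcollapsedtruncated"),
    ]
    for key, flag in optional:
        path = outputs.get(key)
        if not path:
            continue
        # Same check as the wrapper: merged reads require --collapse in extra.
        if not re.search(r"--collapse\b", extra):
            raise ValueError(f"output.{key} specified but '--collapse' is missing")
        flags += f" {flag} {path}"
    return flags


pe_outputs = {
    "fq1": "trimmed/pe/a_R1.fastq.gz",
    "fq2": "trimmed/pe/a_R2.fastq.gz",
    "collapsed": "trimmed/pe/a.collapsed.fastq.gz",
}
print(output_flags(pe_outputs, extra="--collapse"))
```

Calling `output_flags(pe_outputs)` without `--collapse` in `extra` raises a `ValueError`, which matches the wrapper's behaviour when a collapsed output is requested but merging is not enabled.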

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"

from snakemake.shell import shell
from pathlib import Path
import re

extra = snakemake.params.get("extra", "") + " "
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Check input files
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."


# Input files
if n == 1 or "--interleaved " in extra or "--interleaved-input " in extra:
    reads = "--file1 {}".format(snakemake.input.sample)
else:
    reads = "--file1 {} --file2 {}".format(*snakemake.input.sample)


# Gzip- or bzip2-compressed output? Uncompressed or mixed output is rejected below.
compress_out = ""
if all(
    [
        Path(value).suffix == ".gz"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--gzip"
elif all(
    [
        Path(value).suffix == ".bz2"
        for key, value in snakemake.output.items()
        if key != "settings"
    ]
):
    compress_out = "--bzip2"
else:
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )


# Output files
if n == 1 or "--interleaved " in extra or "--interleaved-output " in extra:
    trimmed = f"--output1 {snakemake.output.fq}"
else:
    trimmed = f"--output1 {snakemake.output.fq1} --output2 {snakemake.output.fq2}"

    # Output singleton files
    singleton = snakemake.output.get("singleton", None)
    if singleton:
        trimmed += f" --singleton {singleton}"

    # Output collapsed PE reads (the \b in the pattern also accepts --collapse-deterministic)
    collapsed = snakemake.output.get("collapsed", None)
    if collapsed:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsed {collapsed}"

    # Output collapsed and truncated PE reads
    collapsed_trunc = snakemake.output.get("collapsed_trunc", None)
    if collapsed_trunc:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed_trunc specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsedtruncated {collapsed_trunc}"


shell(
    "(AdapterRemoval --threads {snakemake.threads} "
    "{reads} "
    "{adapters} "
    "{extra} "
    "{compress_out} "
    "{trimmed} "
    "--discarded {snakemake.output.discarded} "
    "--settings {snakemake.output.settings}"
    ") {log}"
)
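
The compression check in the code above can be tried in isolation. The following is a minimal sketch, assuming a plain `dict` in place of `snakemake.output`; the helper name `compression_flag` is hypothetical and does not exist in the wrapper.

```python
from pathlib import Path


def compression_flag(outputs):
    """Return '--gzip' or '--bzip2' for a dict of output paths.

    Hypothetical helper mirroring the wrapper's check; `outputs` is a plain
    dict standing in for snakemake.output.
    """
    # The settings file is excluded from the check, as in the wrapper.
    paths = [path for key, path in outputs.items() if key != "settings"]
    if all(Path(path).suffix == ".gz" for path in paths):
        return "--gzip"
    if all(Path(path).suffix == ".bz2" for path in paths):
        return "--bzip2"
    raise ValueError(
        "all output files (except for 'settings') must be compressed the same way"
    )


se_outputs = {
    "fq": "trimmed/se/a.fastq.gz",
    "discarded": "trimmed/se/a.discarded.fastq.gz",
    "settings": "stats/se/a.settings",
}
print(compression_flag(se_outputs))
```

Because only the last suffix of each path is inspected, plain `.fastq` outputs and mixed `.gz`/`.bz2` outputs both end in the `ValueError` branch.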