.. _`bio/adapterremoval`:

ADAPTERREMOVAL
==============

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/adapterremoval?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/adapterremoval

rapid adapter trimming, identification, and read merging.

**URL**: https://adapterremoval.readthedocs.io/en/latest/

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule adapterremoval_se:
        input:
            sample=["reads/se/{sample}.fastq"]
        output:
            fq="trimmed/se/{sample}.fastq.gz",                               # trimmed reads
            discarded="trimmed/se/{sample}.discarded.fastq.gz",              # reads that did not pass filters
            settings="stats/se/{sample}.settings"                            # parameters as well as overall statistics
        log:
            "logs/adapterremoval/se/{sample}.log"
        params:
            adapters="--adapter1 ACGGCTAGCTA",
            extra="",
        threads: 1
        wrapper:
            "v3.0.1/bio/adapterremoval"


    rule adapterremoval_pe:
        input:
            sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
        output:
            fq1="trimmed/pe/{sample}_R1.fastq.gz",                           # trimmed mate1 reads
            fq2="trimmed/pe/{sample}_R2.fastq.gz",                           # trimmed mate2 reads
            collapsed="trimmed/pe/{sample}.collapsed.fastq.gz",              # overlapping mate-pairs which have been merged into a single read
            collapsed_trunc="trimmed/pe/{sample}.collapsed_trunc.fastq.gz",  # collapsed reads that were quality trimmed
            singleton="trimmed/pe/{sample}.singleton.fastq.gz",              # mate-pairs for which the mate has been discarded
            discarded="trimmed/pe/{sample}.discarded.fastq.gz",              # reads that did not pass filters
            settings="stats/pe/{sample}.settings"                            # parameters as well as overall statistics
        log:
            "logs/adapterremoval/pe/{sample}.log"
        params:
            adapters="--adapter1 ACGGCTAGCTA --adapter2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
            extra="--collapse --collapse-deterministic",
        threads: 2
        wrapper:
            "v3.0.1/bio/adapterremoval"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

* All output files, except for 'settings', must be compressed the same way (gz, or bz2).

Software dependencies
---------------------

* ``adapterremoval=2.3.3``

Input/Output
------------

**Input:**

* ``sample``: ['raw fastq file with R1 reads', 'raw fastq file with R2 reads (PE only)']

**Output:**

* ``fq``: path to single fastq file (SE only)
* ``fq1``: path to fastq R1 (PE only)
* ``fq2``: path to fastq R2 (PE only)
* ``singleton``: fastq file with singleton reads (PE only; PE reads for which the mate has been discarded)
* ``collapsed``: fastq file with collapsed reads (PE only; overlapping mate-pairs which have been merged into a single read)
* ``collapsed_trunc``: fastq file with collapsed truncated reads (PE only; collapsed reads that were quality trimmed)
* ``discarded``: fastq file with discarded reads (reads that did not pass filters)
* ``settings``: settings and stats file

Authors
-------

* Filipe G. Vieira

Code
----

.. code-block:: python

    __author__ = "Filipe G. Vieira"
    __copyright__ = "Copyright 2020, Filipe G. Vieira"
    __license__ = "MIT"

    from snakemake.shell import shell
    from pathlib import Path
    import re

    extra = snakemake.params.get("extra", "") + " "
    adapters = snakemake.params.get("adapters", "")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)


    # Check input files
    n = len(snakemake.input.sample)
    assert (
        n == 1 or n == 2
    ), "input->sample must have 1 (single-end) or 2 (paired-end) elements."


    # Input files
    if n == 1 or "--interleaved " in extra or "--interleaved-input " in extra:
        reads = "--file1 {}".format(snakemake.input.sample)
    else:
        reads = "--file1 {} --file2 {}".format(*snakemake.input.sample)


    # Gzip or Bzip compressed output?
    compress_out = ""
    if all(
        [
            Path(value).suffix == ".gz"
            for key, value in snakemake.output.items()
            if key != "settings"
        ]
    ):
        compress_out = "--gzip"
    elif all(
        [
            Path(value).suffix == ".bz2"
            for key, value in snakemake.output.items()
            if key != "settings"
        ]
    ):
        compress_out = "--bzip2"
    else:
        raise ValueError(
            "all output files (except for 'settings') must be compressed the same way"
        )


    # Output files
    if n == 1 or "--interleaved " in extra or "--interleaved-output " in extra:
        trimmed = f"--output1 {snakemake.output.fq}"
    else:
        trimmed = f"--output1 {snakemake.output.fq1} --output2 {snakemake.output.fq2}"

    # Output singleton files
    singleton = snakemake.output.get("singleton", None)
    if singleton:
        trimmed += f" --singleton {singleton}"

    # Output collapsed PE reads
    collapsed = snakemake.output.get("collapsed", None)
    if collapsed:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsed {collapsed}"

    # Output collapsed and truncated PE reads
    collapsed_trunc = snakemake.output.get("collapsed_trunc", None)
    if collapsed_trunc:
        if not re.search(r"--collapse\b", extra):
            raise ValueError(
                "output.collapsed_trunc specified but '--collapse' option missing from params.extra"
            )
        trimmed += f" --outputcollapsedtruncated {collapsed_trunc}"


    shell(
        "(AdapterRemoval --threads {snakemake.threads} "
        "{reads} "
        "{adapters} "
        "{extra} "
        "{compress_out} "
        "{trimmed} "
        "--discarded {snakemake.output.discarded} "
        "--settings {snakemake.output.settings}"
        ") {log}"
    )


.. |nl| raw:: html