FASTP

Trim and quality-control FASTQ reads with fastp.

Example

This wrapper can be used in the following way:

rule fastp_se:
    input:
        sample=["reads/se/{sample}.fastq"]
    output:
        trimmed="trimmed/se/{sample}.fastq",
        failed="trimmed/se/{sample}.failed.fastq",
        html="report/se/{sample}.html",
        json="report/se/{sample}.json"
    log:
        "logs/fastp/se/{sample}.log"
    params:
        adapters="--adapter_sequence ACGGCTAGCTA",
        extra=""
    threads: 1
    wrapper:
        "v3.9.0-14-g476823b/bio/fastp"


rule fastp_pe:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        trimmed=["trimmed/pe/{sample}.1.fastq", "trimmed/pe/{sample}.2.fastq"],
        # Unpaired reads separately
        unpaired1="trimmed/pe/{sample}.u1.fastq",
        unpaired2="trimmed/pe/{sample}.u2.fastq",
        # or in a single file
        # unpaired="trimmed/pe/{sample}.singletons.fastq",
        merged="trimmed/pe/{sample}.merged.fastq",
        failed="trimmed/pe/{sample}.failed.fastq",
        html="report/pe/{sample}.html",
        json="report/pe/{sample}.json"
    log:
        "logs/fastp/pe/{sample}.log"
    params:
        adapters="--adapter_sequence ACGGCTAGCTA --adapter_sequence_r2 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        extra="--merge"
    threads: 2
    wrapper:
        "v3.9.0-14-g476823b/bio/fastp"

rule fastp_pe_wo_trimming:
    input:
        sample=["reads/pe/{sample}.1.fastq", "reads/pe/{sample}.2.fastq"]
    output:
        html="report/pe_wo_trimming/{sample}.html",
        json="report/pe_wo_trimming/{sample}.json"
    log:
        "logs/fastp/pe_wo_trimming/{sample}.log"
    params:
        extra=""
    threads: 2
    wrapper:
        "v3.9.0-14-g476823b/bio/fastp"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

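The adapters param specifies adapter sequences (for example, --adapter_sequence and, for paired-end data, --adapter_sequence_r2).
The extra param passes additional program arguments to fastp. If a merged output is specified, extra must contain --merge; otherwise the wrapper raises an error.
For more information, see https://github.com/OpenGene/fastp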
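
The wrapper code checks params.extra for the --merge flag whenever output.merged is set. A minimal sketch of that check, using the same regular expression as the wrapper, is shown here; the function name merge_requested is hypothetical and only serves the illustration.

```python
import re


def merge_requested(extra: str) -> bool:
    """Return True if `extra` contains the standalone --merge flag."""
    # Same pattern as the wrapper: the word boundary \b rejects longer
    # options that merely start with "--merge", such as --merged_out.
    return re.search(r"--merge\b", extra) is not None


print(merge_requested("--merge"))
print(merge_requested("--length_required 50 --merge"))
print(merge_requested("--merged_out x.fastq"))
print(merge_requested(""))
```

The first two calls report that merging was requested; the last two do not, so a rule with output.merged and such an extra string would fail before fastp is started.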

Software dependencies

fastp=0.23.4

Input/Output

Input:

fastq file(s)

Output:

trimmed fastq file(s)
unpaired reads (optional; either in a single file or in separate files)
merged reads (optional)
failed reads (optional)
json file containing trimming statistics
html file containing trimming statistics
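
The JSON report can be consumed by downstream rules or scripts. The sketch below computes the fraction of reads retained after filtering. It assumes the report contains summary.before_filtering.total_reads and summary.after_filtering.total_reads, which should be verified against a report produced by the installed fastp version. The function name and the inline example report are hypothetical.

```python
import json


def retained_fraction(report: dict) -> float:
    """Fraction of reads kept by fastp, based on the assumed report layout."""
    # Assumed keys: summary -> before_filtering / after_filtering -> total_reads
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    return after / before if before else 0.0


# Hypothetical, heavily shortened report; a real one would be loaded with
# json.load(open("report/se/sample.json")).
example = json.loads(
    '{"summary": {"before_filtering": {"total_reads": 1000},'
    ' "after_filtering": {"total_reads": 950}}}'
)
print(f"{retained_fraction(example):.1%}")
```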

Authors

Sebastian Kurscheid (sebastian.kurscheid@unibas.ch)
Filipe G. Vieira

Code

__author__ = "Sebastian Kurscheid"
__copyright__ = "Copyright 2019, Sebastian Kurscheid"
__email__ = "sebastian.kurscheid@anu.edu.au"
__license__ = "MIT"

from snakemake.shell import shell
import re

extra = snakemake.params.get("extra", "")
adapters = snakemake.params.get("adapters", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Assert input
n = len(snakemake.input.sample)
assert (
    n == 1 or n == 2
), "input->sample must have 1 (single-end) or 2 (paired-end) elements."


# Input files
if n == 1:
    reads = "--in1 {}".format(snakemake.input.sample)
else:
    reads = "--in1 {} --in2 {}".format(*snakemake.input.sample)


# Output files
trimmed_paths = snakemake.output.get("trimmed", None)
if trimmed_paths:
    if n == 1:
        trimmed = "--out1 {}".format(snakemake.output.trimmed)
    else:
        trimmed = "--out1 {} --out2 {}".format(*snakemake.output.trimmed)

        # Output unpaired files
        unpaired = snakemake.output.get("unpaired", None)
        if unpaired:
            trimmed += f" --unpaired1 {unpaired} --unpaired2 {unpaired}"
        else:
            unpaired1 = snakemake.output.get("unpaired1", None)
            if unpaired1:
                trimmed += f" --unpaired1 {unpaired1}"
            unpaired2 = snakemake.output.get("unpaired2", None)
            if unpaired2:
                trimmed += f" --unpaired2 {unpaired2}"

        # Output merged PE reads
        merged = snakemake.output.get("merged", None)
        if merged:
            if not re.search(r"--merge\b", extra):
                raise ValueError(
                    "output.merged specified but '--merge' option missing from params.extra"
                )
            trimmed += f" --merged_out {merged}"
else:
    trimmed = ""


# Output failed reads
failed = snakemake.output.get("failed", None)
if failed:
    trimmed += f" --failed_out {failed}"


# Stats
html = "--html {}".format(snakemake.output.html)
json = "--json {}".format(snakemake.output.json)


shell(
    "(fastp --thread {snakemake.threads} "
    "{extra} "
    "{adapters} "
    "{reads} "
    "{trimmed} "
    "{json} "
    "{html} ) {log}"
)
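
To try the argument assembly outside Snakemake, the logic above can be approximated with a stand-alone function. This is a sketch under stated assumptions, not the wrapper itself: build_fastp_args is a hypothetical name, a plain dict stands in for snakemake.output, and the result is an argument list rather than a shell string.

```python
import re
import shlex


def build_fastp_args(inputs, outputs, extra="", adapters="", threads=1):
    """Assemble a fastp argument list in the same order as the wrapper.

    `inputs` is a list of 1 or 2 FASTQ paths; `outputs` is a plain dict
    standing in for snakemake.output (a simplification for illustration).
    """
    n = len(inputs)
    if n not in (1, 2):
        raise ValueError("inputs must have 1 (single-end) or 2 (paired-end) elements")

    args = ["fastp", "--thread", str(threads)]
    args += shlex.split(extra) + shlex.split(adapters)

    # Input files
    args += ["--in1", inputs[0]]
    if n == 2:
        args += ["--in2", inputs[1]]

    # Trimmed output: a single path for single-end, two paths for paired-end
    trimmed = outputs.get("trimmed")
    if trimmed:
        if n == 1:
            args += ["--out1", trimmed]
        else:
            args += ["--out1", trimmed[0], "--out2", trimmed[1]]

            # Unpaired reads: one shared file, or one file per mate
            unpaired = outputs.get("unpaired")
            if unpaired:
                args += ["--unpaired1", unpaired, "--unpaired2", unpaired]
            else:
                for key in ("unpaired1", "unpaired2"):
                    if outputs.get(key):
                        args += [f"--{key}", outputs[key]]

            # Merged reads require --merge among the extra arguments
            merged = outputs.get("merged")
            if merged:
                if not re.search(r"--merge\b", extra):
                    raise ValueError("output 'merged' given but '--merge' missing from extra")
                args += ["--merged_out", merged]

    # Failed reads are added whether or not trimmed output was requested
    if outputs.get("failed"):
        args += ["--failed_out", outputs["failed"]]

    # Reports
    args += ["--json", outputs["json"], "--html", outputs["html"]]
    return args


cmd = build_fastp_args(
    ["reads/pe/a.1.fastq", "reads/pe/a.2.fastq"],
    {
        "trimmed": ["trimmed/pe/a.1.fastq", "trimmed/pe/a.2.fastq"],
        "merged": "trimmed/pe/a.merged.fastq",
        "html": "report/pe/a.html",
        "json": "report/pe/a.json",
    },
    extra="--merge",
    threads=2,
)
print(" ".join(cmd))
```

Returning a list instead of a formatted string makes the result easy to inspect in tests or to pass to subprocess.run without shell quoting concerns.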