DADA2_LEARN_ERRORS

DADA2 Learning error rates separately on paired-end data using dada2 learnErrors function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.

URL:

Example

This wrapper can be used in the following way:

rule learn_pe:
    # Run twice dada2_learn_errors: on forward and on reverse reads
    input: expand("results/dada2/model_{orientation}.RDS", orientation=[1,2])

rule dada2_learn_errors:
    input:
    # Quality filtered and trimmed forward FASTQ files (potentially compressed)
        expand("filtered/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
    output:
        err="results/dada2/model_{orientation}.RDS",# save the error model
        plot="reports/dada2/errors_{orientation}.png",# plot observed and estimated rates
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a python dict
    #   (`named_list={name1=arg1}`).
    #params:
    #    randomize=True
    log:
        "logs/dada2/learn-errors/learn-errors_{orientation}.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v1.2.0/bio/dada2/learn-errors"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • bioconductor-dada2==1.16

Input/Output

Input:

  • A list of quality filtered and trimmed forward FASTQ files (potentially compressed)

Output:

  • err: RDS file with the stored error model
  • plot: plot observed vs estimated errors rates

Params

  • optional arguments for ``learnErrors(), please provide them as python key=value pairs``:

Authors

  • Charlie Pauvert

Code

# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for learning error rates on sequence data using dada2 learnErrors function.

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           fls = snakemake@input,
           multithread=snakemake@threads
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using defaults parameters from dada2::learnErrors()")
}

# Learn errors rates for both read types
err<-do.call(learnErrors, args)

# Plot estimated versus observed error rates to validate models
perr<-plotErrors(err, nominalQ = TRUE)

# Save the plots
library(ggplot2)
ggsave(snakemake@output[["plot"]], perr, width = 8, height = 8, dpi = 300)

# Store the estimated errors as RDS files
saveRDS(err, snakemake@output[["err"]],compress = T)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()