DADA2_LEARN_ERRORS¶
DADA2
Learning error rates separately on paired-end data using dada2 learnErrors
function. Optional parameters are documented in the manual and the function is introduced in the dedicated tutorial section.
Example¶
This wrapper can be used in the following way:
rule learn_pe:
# Run twice dada2_learn_errors: on forward and on reverse reads
input: expand("results/dada2/model_{orientation}.RDS", orientation=[1,2])
rule dada2_learn_errors:
input:
# Quality filtered and trimmed forward FASTQ files (potentially compressed)
expand("filtered/{sample}.{{orientation}}.fastq.gz", sample=["a","b"])
output:
err="results/dada2/model_{orientation}.RDS",# save the error model
plot="reports/dada2/errors_{orientation}.png",# plot observed and estimated rates
# Even though this is an R wrapper, use named arguments in Python syntax
# here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
# and lists (`list_arg=[]`) are automatically converted to R.
# For a named list as an extra named argument, use a python dict
# (`named_list={name1=arg1}`).
#params:
# randomize=True
log:
"logs/dada2/learn-errors/learn-errors_{orientation}.log"
threads: 1 # set desired number of threads here
wrapper:
"v1.23.1/bio/dada2/learn-errors"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bioconductor-dada2=1.26.0
Input/Output¶
Input:
- A list of quality filtered and trimmed forward FASTQ files (potentially compressed)
Output:
err
: RDS file with the stored error modelplot
: plot observed vs estimated errors rates
Params¶
optional arguments for ``learnErrors()
, please provide them as pythonkey=value
pairs``:
Authors¶
- Charlie Pauvert
Code¶
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for learning error rates on sequence data using dada2 learnErrors function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args<-list(
fls = snakemake@input,
multithread=snakemake@threads
)
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
# Keeping only the named elements of the list for do.call()
extra<-snakemake@params[ names(snakemake@params) != "" ]
# Add them to the list of arguments
args<-c(args, extra)
} else{
message("No optional parameters. Using defaults parameters from dada2::learnErrors()")
}
# Learn errors rates for both read types
err<-do.call(learnErrors, args)
# Plot estimated versus observed error rates to validate models
perr<-plotErrors(err, nominalQ = TRUE)
# Save the plots
library(ggplot2)
ggsave(snakemake@output[["plot"]], perr, width = 8, height = 8, dpi = 300)
# Store the estimated errors as RDS files
saveRDS(err, snakemake@output[["err"]],compress = T)
# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()