.. _`bio/dada2/learn-errors`: DADA2_LEARN_ERRORS ================== .. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/dada2/learn-errors?label=version%20update%20pull%20requests :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/dada2/learn-errors `DADA2 `_ Learning error rates separately on paired-end data using dada2 ``learnErrors`` function. Optional parameters are documented in the `manual `_ and the function is introduced in the dedicated tutorial `section `_. Example ------- This wrapper can be used in the following way: .. code-block:: python rule learn_pe: # Run twice dada2_learn_errors: on forward and on reverse reads input: expand("results/dada2/model_{orientation}.RDS", orientation=[1,2]) rule dada2_learn_errors: input: # Quality filtered and trimmed forward FASTQ files (potentially compressed) expand("filtered/{sample}.{{orientation}}.fastq.gz", sample=["a","b"]) output: err="results/dada2/model_{orientation}.RDS",# save the error model plot="reports/dada2/errors_{orientation}.png",# plot observed and estimated rates # Even though this is an R wrapper, use named arguments in Python syntax # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`) # and lists (`list_arg=[]`) are automatically converted to R. # For a named list as an extra named argument, use a python dict # (`named_list={name1=arg1}`). #params: # randomize=True log: "logs/dada2/learn-errors/learn-errors_{orientation}.log" threads: 1 # set desired number of threads here wrapper: "v3.0.4/bio/dada2/learn-errors" Note that input, output and log file paths can be chosen freely. When running with .. code-block:: bash snakemake --use-conda the software dependencies will be automatically deployed into an isolated environment before execution. Software dependencies --------------------- * ``bioconductor-dada2=1.28.0`` Input/Output ------------ **Input:** * A list of quality filtered and trimmed forward FASTQ files (potentially compressed) **Output:** * ``err``: RDS file with the stored error model * ``plot``: plot observed vs estimated errors rates Params ------ * ``optional arguments for ``learnErrors()``, please provide them as python ``key=value`` pairs``: Authors ------- * Charlie Pauvert Code ---- .. code-block:: R # __author__ = "Charlie Pauvert" # __copyright__ = "Copyright 2020, Charlie Pauvert" # __email__ = "cpauvert@protonmail.com" # __license__ = "MIT" # Snakemake wrapper for learning error rates on sequence data using dada2 learnErrors function. # Sink the stderr and stdout to the snakemake log file # https://stackoverflow.com/a/48173272 log.file<-file(snakemake@log[[1]],open="wt") sink(log.file) sink(log.file,type="message") library(dada2) # Prepare arguments (no matter the order) args<-list( fls = snakemake@input, multithread=snakemake@threads ) # Check if extra params are passed if(length(snakemake@params) > 0 ){ # Keeping only the named elements of the list for do.call() extra<-snakemake@params[ names(snakemake@params) != "" ] # Add them to the list of arguments args<-c(args, extra) } else{ message("No optional parameters. Using defaults parameters from dada2::learnErrors()") } # Learn errors rates for both read types err<-do.call(learnErrors, args) # Plot estimated versus observed error rates to validate models perr<-plotErrors(err, nominalQ = TRUE) # Save the plots library(ggplot2) ggsave(snakemake@output[["plot"]], perr, width = 8, height = 8, dpi = 300) # Store the estimated errors as RDS files saveRDS(err, snakemake@output[["err"]],compress = T) # Proper syntax to close the connection for the log file # but could be optional for Snakemake wrapper sink(type="message") sink() .. |nl| raw:: html