DADA2_COLLAPSE_NOMISMATCH

Combine sequences that are identical up to shifts and/or length (that is, with no mismatches or internal indels when aligned) using the dada2 collapseNoMismatch function. Optional parameters are documented in the manual. The function is not covered in the dada2 tutorial, but the dada2 issues contain usage examples.

Example

This wrapper can be used in the following way:

rule dada2_collapse_nomismatch:
    input:
        "results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
    output:
        "results/dada2/seqTab.collapsed.RDS"
    # Even though this is an R wrapper, extra parameters are specified here as
    # named arguments in Python syntax. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a Python dict
    #   (`named_list={"name1": arg1}`).
    #params:
    #    verbose=True
    log:
        "logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "0.72.0/bio/dada2/collapse-nomismatch"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
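
The commented-out params block in the rule above can be enabled to forward optional arguments to the function. A minimal sketch, assuming the `minOverlap` and `verbose` arguments of collapseNoMismatch as described in the dada2 manual; the values shown are illustrative, so check the manual of the installed version for exact names and defaults:

```python
rule dada2_collapse_nomismatch:
    input:
        "results/dada2/seqTab.nochimeras.RDS" # Chimera-free sequence table
    output:
        "results/dada2/seqTab.collapsed.RDS"
    params:
        # Assumed collapseNoMismatch() arguments; only named params are forwarded
        minOverlap=20,
        verbose=True
    log:
        "logs/dada2/collapse-nomismatch/collapse-nomismatch.log"
    wrapper:
        "0.72.0/bio/dada2/collapse-nomismatch"
```

Because the wrapper keeps only the named elements of params, positional values placed there are silently ignored.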

Software dependencies

  • bioconductor-dada2==1.16

Input/Output

Input:

  • RDS file with the chimera-free sequence table

Output:

  • RDS file with the sequence table in which collapsible sequences have been merged
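
To give an intuition for what "collapsed" means in the output, here is a toy Python sketch. It is not the dada2 algorithm: dada2 aligns sequences and works on a samples-by-sequences table, whereas this hypothetical `collapse_contained` helper handles a single set of counts and only the simplest case, where one sequence is contained verbatim in another. It assumes, as the dada2 manual describes for the default ordering, that the most abundant sequence serves as the representative.

```python
def collapse_contained(counts):
    """Toy collapse: merge sequences when one contains the other exactly.

    `counts` maps sequence -> abundance. Hypothetical helper for illustration
    only; it does not reproduce dada2::collapseNoMismatch().
    """
    collapsed = {}
    # Visit sequences from most to least abundant (ties broken alphabetically)
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in collapsed:
            # No mismatch allowed: one sequence must be a substring of the other
            if seq in rep or rep in seq:
                collapsed[rep] += n  # add abundance to the representative
                break
        else:
            # No compatible representative found: keep as a new sequence
            collapsed[seq] = n
    return collapsed


counts = {"ACGTACGT": 10, "CGTACG": 3, "ACGTTCGT": 5}
result = collapse_contained(counts)
print(result)
```

Here "CGTACG" is folded into "ACGTACGT" because it matches without mismatches, while "ACGTTCGT" stays separate because it differs by one base.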

Authors

  • Charlie Pauvert

Code

# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"

# Snakemake wrapper for combining sequences that are identical
# up to shifts and/or length using the dada2 collapseNoMismatch function

# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")

library(dada2)

# Prepare arguments (no matter the order)
args<-list(
           seqtab = readRDS(snakemake@input[[1]])
           )
# Check if extra params are passed
if(length(snakemake@params) > 0 ){
       # Keeping only the named elements of the list for do.call()
       extra<-snakemake@params[ names(snakemake@params) != "" ]
       # Add them to the list of arguments
       args<-c(args, extra)
} else{
    message("No optional parameters. Using default parameters from dada2::collapseNoMismatch()")
}

# Collapse sequences; the result is a sequence table, not a taxonomy
seqtab.collapsed<-do.call(collapseNoMismatch, args)

# Store the resulting table as an RDS file
saveRDS(seqtab.collapsed, snakemake@output[[1]],compress = TRUE)

# Proper syntax to close the connection for the log file
# but could be optional for Snakemake wrapper
sink(type="message")
sink()