DADA2_ASSIGN_SPECIES
DADA2
Classify sequences by exact matching against a genus-species reference database, using the dada2 assignSpecies function. Optional parameters are documented in the dada2 manual, and an example of the function can be found in the dedicated section of the DADA2 website.
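To make the exact-matching idea concrete, here is a simplified Python sketch. It is not dada2's implementation, and the helper name `assign_species_sketch` is hypothetical. It assumes the reference is an in-memory list of (sequence, genus, species) triples, and that a hit means the query occurs verbatim inside a reference sequence. The `allow_multiple` argument loosely imitates the `allowMultiple` option of assignSpecies: False keeps only unambiguous species calls, True joins all matched species, and an integer caps how many may be joined.

```python
from __future__ import annotations

from typing import Optional, Union


def assign_species_sketch(
    query: str,
    refs: list[tuple[str, str, str]],
    allow_multiple: Union[bool, int] = False,
) -> tuple[Optional[str], Optional[str]]:
    """Hypothetical, simplified exact-match genus/species assignment.

    refs holds (sequence, genus, species) triples; a reference counts as a hit
    when the query occurs verbatim inside its sequence.
    """
    hits = [(genus, species) for seq, genus, species in refs if query in seq]
    genera = sorted({genus for genus, _ in hits})
    species_names = sorted({species for _, species in hits})

    # Report a genus only when every hit agrees on it.
    genus_call = genera[0] if len(genera) == 1 else None

    # How many distinct species may be joined before the call is dropped.
    # bool is checked first because bool is a subclass of int in Python.
    if allow_multiple is True:
        keep = len(species_names)
    elif allow_multiple is False:
        keep = 1
    else:
        keep = int(allow_multiple)

    if 0 < len(species_names) <= keep:
        species_call = "/".join(species_names)
    else:
        species_call = None
    return genus_call, species_call


if __name__ == "__main__":
    # Made-up toy reference, for illustration only.
    refs = [
        ("ACGTACGTAA", "Escherichia", "coli"),
        ("TTACGTACGG", "Escherichia", "fergusonii"),
        ("GGGGCCCC", "Bacillus", "subtilis"),
    ]
    print(assign_species_sketch("ACGTACG", refs))
    print(assign_species_sketch("ACGTACG", refs, allow_multiple=True))
    print(assign_species_sketch("GGCC", refs))
```

When run as a script, it prints `('Escherichia', None)` because the two species are ambiguous by default. It then prints `('Escherichia', 'coli/fergusonii')` with `allow_multiple=True`, followed by `('Bacillus', 'subtilis')`.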
Example
This wrapper can be used in the following way:
rule dada2_assign_species:
    input:
        seqs="results/dada2/seqTab.nochim.RDS", # Chimera-free sequence table
        refFasta="resources/species.fasta" # Reference FASTA for Genus-Species taxonomy
    output:
        "results/dada2/genus-species-taxa.RDS" # Genus-Species taxonomic assignments
    # Even though this is an R wrapper, use named arguments in Python syntax
    # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`)
    # and lists (`list_arg=[]`) are automatically converted to R.
    # For a named list as an extra named argument, use a Python dict
    # (`named_list={"name1": arg1}`).
    #params:
    #    allowMultiple=True
    log:
        "logs/dada2/assign-species/assign-species.log"
    threads: 1 # set desired number of threads here
    wrapper:
        "v5.0.1/bio/dada2/assign-species"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
bioconductor-dada2=1.30.0
Input/Output
Input:
seqs: RDS file with the chimera-free sequence table
refFasta: path to the genus-species reference database in FASTA format
Output:
RDS file containing the genus and species taxonomic assignments
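According to the dada2 documentation, the reference FASTA for assignSpecies should have header lines of the form `>SeqID Genus species`. The Python sketch below reads such a file so you can sanity-check a reference before running the rule. The function names are hypothetical, and the demo records are made up.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO


def _split_header(header: str) -> tuple[str, str, str]:
    """Split 'SeqID Genus species ...' into its first three fields."""
    fields = header.split()
    if len(fields) < 3:
        raise ValueError(f"expected '>SeqID Genus species', got '>{header}'")
    return fields[0], fields[1], fields[2]


def read_species_fasta(handle: TextIO) -> Iterator[tuple[str, str, str, str]]:
    """Yield (seq_id, genus, species, sequence) records from a FASTA stream.

    Multi-line sequences are concatenated; blank lines are skipped.
    """
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield (*_split_header(header), "".join(chunks))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the last record once the stream is exhausted.
    if header is not None:
        yield (*_split_header(header), "".join(chunks))


if __name__ == "__main__":
    demo = io.StringIO(
        ">AB001 Escherichia coli\nACGT\nACGT\n"
        ">AB002 Bacillus subtilis\nGGCC\n"
    )
    for record in read_species_fasta(demo):
        print(record)
```

Headers with fewer than three fields raise a `ValueError`. Rejecting them early is an assumption of this sketch, not documented dada2 behavior.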
Params
Optional arguments for ``assignSpecies()``; provide them as Python ``key=value`` pairs.
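The comments in the example rule state that Python values in `params` are converted to R. The helper below, `to_r_literal`, is hypothetical and is not Snakemake's actual encoder. It sketches what such a conversion might look like:

- booleans become `TRUE`/`FALSE`
- `None` becomes `NULL`
- a dict becomes a named R `list(...)`

Rendering plain lists as `c(...)` vectors is an assumption made for illustration.

```python
def to_r_literal(value) -> str:
    """Hypothetical illustration: render a Python value as R source text."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Escape backslashes first, then double quotes, for an R string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, dict):
        # A dict becomes a named R list: list(name1=value1, ...)
        items = ", ".join(f"{key}={to_r_literal(val)}" for key, val in value.items())
        return f"list({items})"
    if isinstance(value, (list, tuple)):
        # Assumption: plain lists become R vectors built with c(...)
        return "c(" + ", ".join(to_r_literal(v) for v in value) + ")"
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    # allowMultiple and tryRC are assignSpecies() arguments.
    print(to_r_literal({"allowMultiple": True, "tryRC": False}))
    print(to_r_literal([1, 2, 3]))
```

When run as a script, it prints `list(allowMultiple=TRUE, tryRC=FALSE)` and then `c(1, 2, 3)`.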
Code
# __author__ = "Charlie Pauvert"
# __copyright__ = "Copyright 2020, Charlie Pauvert"
# __email__ = "cpauvert@protonmail.com"
# __license__ = "MIT"
# Snakemake wrapper for exact matching of sequences against
# a genus-species reference database using dada2 assignSpecies function.
# Sink the stderr and stdout to the snakemake log file
# https://stackoverflow.com/a/48173272
log.file<-file(snakemake@log[[1]],open="wt")
sink(log.file)
sink(log.file,type="message")
library(dada2)
# Prepare arguments (no matter the order)
args <- list(
  seqs = readRDS(snakemake@input[["seqs"]]),
  refFasta = snakemake@input[["refFasta"]]
)
# Check if extra params are passed
if (length(snakemake@params) > 0) {
  # Keep only the named elements of the list for do.call()
  extra <- snakemake@params[names(snakemake@params) != ""]
  # Add them to the list of arguments
  args <- c(args, extra)
} else {
  message("No optional parameters. Using default parameters from dada2::assignSpecies()")
}
# Perform Genus-Species taxonomic assignments
taxa<-do.call(assignSpecies, args)
# Store the taxonomic assignments as a RDS file
saveRDS(taxa, snakemake@output[[1]], compress = TRUE)
# Restore the default output and message streams; arguably optional
# in a Snakemake wrapper, since the R process ends with the script
sink(type="message")
sink()
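The argument handling in the R script can be mirrored in Python to show which params actually reach assignSpecies(). This is a hypothetical port for illustration only; `build_call_args` is not part of the wrapper. It models `snakemake@params` as (name, value) pairs, with an empty name standing in for an unnamed param.

```python
from __future__ import annotations

import sys


def build_call_args(required: dict, params: list[tuple[str, object]]) -> dict:
    """Hypothetical Python port of the wrapper's argument assembly.

    params imitates snakemake@params as (name, value) pairs, where an empty
    name stands for an unnamed entry (names(...) == "" in R).
    """
    args = dict(required)
    if params:
        # Keep only the named entries, as the R script does before do.call().
        extra = {name: value for name, value in params if name != ""}
        clash = args.keys() & extra.keys()
        if clash:
            # R's do.call() would reject a formal argument supplied twice.
            raise ValueError(f"parameter(s) already set by the wrapper: {sorted(clash)}")
        args.update(extra)
    else:
        # The R script uses message(), which writes to stderr.
        print("No optional parameters. Using default parameters from dada2::assignSpecies()",
              file=sys.stderr)
    return args


if __name__ == "__main__":
    required = {"seqs": "<sequence table>", "refFasta": "resources/species.fasta"}
    print(build_call_args(required, [("allowMultiple", True), ("", "unnamed, dropped")]))
```

The R script combines the lists with `c()`, so a duplicated name would only surface as an error inside `do.call()`. This sketch rejects the collision up front, which is a design choice of the sketch and not of the wrapper.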