.. _`bio/dada2/filter-trim`: DADA2_FILTER_TRIM ================= .. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/dada2/filter-trim?label=version%20update%20pull%20requests :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/dada2/filter-trim `DADA2 `_ Quality filtering of single or paired-end reads using dada2 ``filterAndTrim`` function. Optional parameters are documented in the `manual `_ and the function is introduced in the dedicated tutorial `section `_. Example ------- This wrapper can be used in the following way: .. code-block:: python rule dada2_filter_trim_se: input: # Single-end files without primers sequences fwd="trimmed/{sample}.1.fastq.gz" output: filt="filtered-se/{sample}.1.fastq.gz", stats="reports/dada2/filter-trim-se/{sample}.tsv" # Even though this is an R wrapper, use named arguments in Python syntax # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`) # and lists (`list_arg=[]`) are automatically converted to R. # For a named list as an extra named argument, use a python dict # (`named_list={name1=arg1}`). params: # Set the maximum expected errors tolerated in filtered reads maxEE=1, # Set the number of kept bases to 7 for the toy example truncLen=7, # Set minLen to 1 for the toy example but default is 20 minLen=1 log: "logs/dada2/filter-trim-se/{sample}.log" threads: 1 # set desired number of threads here wrapper: "v3.0.4/bio/dada2/filter-trim" rule dada2_filter_trim_pe: input: # Paired-end files without primers sequences fwd="trimmed/{sample}.1.fastq", rev="trimmed/{sample}.2.fastq" output: filt="filtered-pe/{sample}.1.fastq.gz", filt_rev="filtered-pe/{sample}.2.fastq.gz", stats="reports/dada2/filter-trim-pe/{sample}.tsv" # Even though this is an R wrapper, use named arguments in Python syntax # here, to specify extra parameters. Python booleans (`arg1=True`, `arg2=False`) # and lists (`list_arg=[]`) are automatically converted to R. # For a named list as an extra named argument, use a python dict # (`named_list={name1=arg1}`). params: # Set the maximum expected errors tolerated in filtered reads maxEE=1, # Set the number of kept bases in forward and reverse reads # respectively to 7 for the toy example truncLen=[7,6], # Set minLen to 1 for the toy example but default is 20 minLen=1 log: "logs/dada2/filter-trim-pe/{sample}.log" threads: 1 # set desired number of threads here wrapper: "v3.0.4/bio/dada2/filter-trim" Note that input, output and log file paths can be chosen freely. When running with .. code-block:: bash snakemake --use-conda the software dependencies will be automatically deployed into an isolated environment before execution. Software dependencies --------------------- * ``bioconductor-dada2=1.28.0`` Input/Output ------------ **Input:** * ``fwd``: a forward FASTQ file (potentially compressed) without primer sequences * ``rev``: an (optional) reverse FASTQ file (potentially compressed) without primer sequences **Output:** * ``filt``: a compressed filtered forward FASTQ file * ``filt_rev``: an (optional) compressed filtered reverse FASTQ file * ``stats``: a .tsv file with the number of processed and filtered reads per sample Params ------ * ``optional arguments for ``filterAndTrim()``, please provide them as python ``key=value`` pairs``: Authors ------- * Charlie Pauvert Code ---- .. code-block:: R # __author__ = "Charlie Pauvert" # __copyright__ = "Copyright 2020, Charlie Pauvert" # __email__ = "cpauvert@protonmail.com" # __license__ = "MIT" # Snakemake wrapper for filtering single or paired-end reads using dada2 filterAndTrim function. # Sink the stderr and stdout to the snakemake log file # https://stackoverflow.com/a/48173272 log.file<-file(snakemake@log[[1]],open="wt") sink(log.file) sink(log.file,type="message") library(dada2) # Prepare arguments (no matter the order) args<-list( fwd = snakemake@input[["fwd"]], filt = snakemake@output[["filt"]], multithread=snakemake@threads ) # Test if paired end input is passed if(!is.null(snakemake@input[["rev"]]) & !is.null(snakemake@output[["filt_rev"]])){ args<-c(args, rev = snakemake@input[["rev"]], filt.rev = snakemake@output[["filt_rev"]] ) } # Check if extra params are passed if(length(snakemake@params) > 0 ){ # Keeping only the named elements of the list for do.call() extra<-snakemake@params[ names(snakemake@params) != "" ] # Check if 'compress=' option is passed if(!is.null(extra[["compress"]])){ stop("Remove the `compress=` option from `params`.\n", "The `compress` option is implicitly set here from the file extension.") } else { # Check if output files are given as compressed files # ex: in se version, all(TRUE, NULL) gives TRUE compressed <- c( endsWith(args[["filt"]], '.gz'), if(is.null(args[["filt.rev"]])) NULL else {endsWith(args[["filt.rev"]], 'gz')} ) if ( all(compressed) ) { extra[["compress"]] <- TRUE } else if ( any(compressed) ) { stop("Either all or no fastq output should be compressed. Please check `output.filt` and `output.filt_rev` for consistency.") } else { extra[["compress"]] <- FALSE } } # Add them to the list of arguments args<-c(args, extra) } else { message("No optional parameters. Using default parameters from dada2::filterAndTrim()") } # Call the function with arguments filt.stats<-do.call(filterAndTrim, args) # Write processed reads report write.table(filt.stats, snakemake@output[["stats"]], sep="\t", quote=F) # Proper syntax to close the connection for the log file # but could be optional for Snakemake wrapper sink(type="message") sink() .. |nl| raw:: html