NONPAREIL INFER
Nonpareil uses the redundancy of the reads in metagenomic datasets to estimate the average coverage and predict the amount of sequences that will be required to achieve “nearly complete coverage”.
URL: https://nonpareil.readthedocs.io/en/latest/
Example
This wrapper can be used in the following way:
rule nonpareil:
input:
"reads/{sample}",
output:
redund_sum="results/{sample}.npo",
redund_val="results/{sample}.npa",
mate_distr="results/{sample}.npc",
log="results/{sample}.log",
log:
"logs/{sample}.log",
params:
alg="kmer",
infer_X=True,
extra="-k 3 -F",
threads: 2
resources:
mem_mb=50,
wrapper:
"v9.4.2/bio/nonpareil/infer"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
For a PDF version of the manual, see https://nonpareil.readthedocs.io/_/downloads/en/latest/pdf/
Software dependencies
nonpareil=3.5.5pbzip2=1.1.13snakemake-wrapper-utils=0.8.0
Input/Output
Input:
reads in FASTA/Q format (can be gziped or bziped)
Output:
redund_sum: redundancy summary TSV file with six columns, representing sequencing effort, summary of the distribution of redundancy (average redundancy, standard deviation, quartile 1, median, and quartile 3).redund_val: redundancy values TSV file with three columns (similar to redundancy summary, but provides ALL results), representing sequencing effort, ID of the replicate and estimated redundancy value.mate_distr: mate distribution file, with the number of reads in the dataset matching a query read.log: log of internal Nonpareil processing.
Params
alg: nonpareil algorithm, either kmer or alignment (mandatory).infer_X: automatically infer value of -X (couple of minutes slower to count number of reads)extra: additional program arguments (not -X if infer_X == True)
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"
import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem, get_format
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")
mem_mb = get_mem(snakemake, out_unit="MiB")
# Infer output format
in_format = get_format(snakemake.input[0])
# Redundancy summary
redund_sum = snakemake.output.get("redund_sum", "")
if redund_sum:
extra += f" -o {redund_sum}"
# Redundancy values
redund_val = snakemake.output.get("redund_val", "")
if redund_val:
extra += f" -a {redund_val}"
# Mate distribution
mate_distr = snakemake.output.get("mate_distr", "")
if mate_distr:
extra += f" -C {mate_distr}"
# Log
out_log = snakemake.output.get("log", "")
if out_log:
extra += f" -l {out_log}"
with tempfile.NamedTemporaryFile() as tmp:
# Uncompress bzip2 input files
if Path(snakemake.input[0]).suffix == ".bz2":
in_uncomp = tmp.name
shell(
"pbzip2 -p{snakemake.threads} --decompress --stdout {snakemake.input[0]} > {tmp.name}"
)
else:
in_uncomp = snakemake.input[0]
# Check if input file is empty
empty = Path(in_uncomp).stat().st_size == 0
if Path(in_uncomp).suffix == ".gz" and Path(in_uncomp).stat().st_size < 50:
import gzip
with gzip.open(in_uncomp, "rt") as f:
empty = not bool(f.readlines())
# If empty, provide dummy input file so that program does not crash
# https://github.com/lmrodriguezr/nonpareil/issues/71
if empty:
in_uncomp = tmp.name
with open(in_uncomp, "wt") as f:
if in_format == "fasta":
f.write("\n".join([">dummy1", "N" * 100]))
elif in_format == "fastq":
f.write("\n".join(["@dummy1", "N" * 100, "+", "!" * 100]))
else:
raise ValueError(f"invalid format: {in_format}")
f.write("\n")
shell(
"nonpareil"
" -t {snakemake.threads}"
" -R {mem_mb}"
" -T {snakemake.params.alg}"
" -s {in_uncomp}"
" -f {in_format}"
" {extra}"
" {log}"
)
if empty and redund_sum:
with open(redund_sum, "at") as f:
f.write("\t".join(["0.000000"] + ["0.00000"] * 5) + "\n")