NONPAREIL

Nonpareil uses the redundancy among the reads of a metagenomic dataset to estimate its average coverage and to predict the amount of sequencing required to achieve “nearly complete coverage”.

URL: https://nonpareil.readthedocs.io/en/latest/

Example

This wrapper can be used in the following way:

rule nonpareil:
    input:
        "reads/{sample}",
    output:
        redund_sum="results/{sample}.npo",
        redund_val="results/{sample}.npa",
        mate_distr="results/{sample}.npc",
        log="results/{sample}.log",
    log:
        "logs/{sample}.log",
    params:
        alg="kmer",
        extra="-X 1 -k 3 -F",
    wrapper:
        "v1.31.0/bio/nonpareil"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

nonpareil=3.4.1

Input/Output

Input:

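reads in FASTA/Q format (can be gzip- or bzip2-compressed); the format is inferred from the file extension (.fa, .fas, .fasta, .fq or .fastq, optionally followed by .gz or .bz2)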
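
As a rough illustration of which file names are accepted, the extension checks from the wrapper code can be isolated into a small function. The name `infer_input` and the constants are hypothetical and not part of the wrapper; the recognised extensions and the decompression commands are taken from the wrapper code shown in the Code section.

```python
from os import path

# Extensions and decompression commands as used in the wrapper code.
FASTA_EXTS = {".fa", ".fas", ".fasta"}
FASTQ_EXTS = {".fq", ".fastq"}
DECOMPRESSORS = {".gz": "zcat", ".bz2": "bzcat"}


def infer_input(filename):
    """Return (decompressor, format) for a reads file.

    Hypothetical helper for illustration; it mirrors the wrapper's checks.
    """
    name, ext = path.splitext(filename)
    decompressor = DECOMPRESSORS.get(ext, "")
    if decompressor:
        # Strip the compression suffix and inspect the inner extension.
        name, ext = path.splitext(name)
    if ext in FASTA_EXTS:
        return decompressor, "fasta"
    if ext in FASTQ_EXTS:
        return decompressor, "fastq"
    raise ValueError(f"invalid input format: {filename}")


print(infer_input("reads/a.fastq.gz"))
print(infer_input("reads/b.fa"))
```

A path without one of these extensions raises `ValueError`, so with an input pattern such as `reads/{sample}` the `{sample}` wildcard presumably has to carry the extension itself.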

Output:

redund_sum: redundancy summary TSV file with six columns: sequencing effort, followed by a summary of the redundancy distribution (average redundancy, standard deviation, quartile 1, median, and quartile 3).
redund_val: redundancy values TSV file with three columns: sequencing effort, ID of the replicate, and estimated redundancy value. It is similar to the redundancy summary, but reports all results rather than a summary.
mate_distr: mate distribution file, with the number of reads in the dataset matching a query read.
log: log of internal Nonpareil processing.
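
A minimal sketch of reading the redundancy summary follows, assuming a tab-separated file with the six columns in the order listed above. The column names and the function `read_redund_sum` are hypothetical, the handling of `#` comment lines is an assumption about the file layout, and the sample rows are synthetic values made up for the example.

```python
import csv
import io

# Hypothetical column names, in the order given in the output description.
COLUMNS = ["effort", "avg_redundancy", "sd", "q1", "median", "q3"]


def read_redund_sum(handle):
    """Yield one dict of floats per data row.

    Blank lines and lines starting with '#' are skipped (assumed header style).
    """
    data_lines = (
        line for line in handle if line.strip() and not line.startswith("#")
    )
    for row in csv.reader(data_lines, delimiter="\t"):
        yield dict(zip(COLUMNS, map(float, row)))


# Synthetic example content, not real Nonpareil output.
sample = (
    "# comment line (assumed)\n"
    "0\t0\t0\t0\t0\t0\n"
    "1000\t0.25\t0.05\t0.2\t0.25\t0.3\n"
)
records = list(read_redund_sum(io.StringIO(sample)))
print(records[-1]["median"])
```

For a real run, `io.StringIO(sample)` would be replaced by an open file handle on the `redund_sum` output.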

Params

alg: Nonpareil algorithm, either kmer or alignment (mandatory).
extra: additional program arguments.
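
Since `alg` is mandatory and limited to two values, a workflow could check it before the rule runs. The sketch below is only an illustration: `algorithm_flag` is a hypothetical helper, and the wrapper itself passes `alg` to `-T` without any such check.

```python
# Allowed values as listed in the parameter description.
VALID_ALGS = ("kmer", "alignment")


def algorithm_flag(alg):
    """Return the `-T` argument string, rejecting unknown algorithm names.

    Hypothetical helper; not part of the wrapper.
    """
    if alg not in VALID_ALGS:
        raise ValueError(
            f"alg must be one of {VALID_ALGS}, got {alg!r}"
        )
    return f"-T {alg}"


print(algorithm_flag("kmer"))
```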

Authors

Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2023, Filipe G. Vieira"
__license__ = "MIT"

from os import path
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

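# Detect compression from the outermost extension
uncomp = ""
in_name, in_ext = path.splitext(snakemake.input[0])
if in_ext in [".gz", ".bz2"]:
    uncomp = "zcat" if in_ext == ".gz" else "bzcat"
    # Strip the compression suffix to expose the sequence-format extension
    in_name, in_ext = path.splitext(in_name)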

# Infer input format
if in_ext in [".fa", ".fas", ".fasta"]:
    in_format = "fasta"
elif in_ext in [".fq", ".fastq"]:
    in_format = "fastq"
else:
    raise ValueError("invalid input format")

# Redundancy summary
redund_sum = snakemake.output.get("redund_sum", "")
if redund_sum:
    redund_sum = f"-o {redund_sum}"

# Redundancy values
redund_val = snakemake.output.get("redund_val", "")
if redund_val:
    redund_val = f"-a {redund_val}"

# Mate distribution
mate_distr = snakemake.output.get("mate_distr", "")
if mate_distr:
    mate_distr = f"-C {mate_distr}"

# Log
out_log = snakemake.output.get("log", "")
if out_log:
    out_log = f"-l {out_log}"


with tempfile.NamedTemporaryFile() as tmp:
    in_uncomp = snakemake.input[0]
    if uncomp:
        # Decompress into a temporary file and point Nonpareil at it
        in_uncomp = tmp.name
        shell("{uncomp} {snakemake.input[0]} > {in_uncomp}")

    shell(
        "nonpareil"
        " -T {snakemake.params.alg}"
        " -s {in_uncomp}"
        " -f {in_format}"
        " {extra}"
        " {redund_sum}"
        " {redund_val}"
        " {mate_distr}"
        " {out_log}"
        " {log}"
    )
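
The four optional outputs in the wrapper code follow the same pattern: a named output, if present, becomes a flag plus a path. A compact way to express that mapping is sketched below. `OUTPUT_FLAGS` and `output_args` are hypothetical names, and a plain dict stands in for `snakemake.output`; the flags themselves are the ones used in the wrapper code.

```python
# Named output -> Nonpareil flag, as used in the wrapper code.
OUTPUT_FLAGS = {
    "redund_sum": "-o",
    "redund_val": "-a",
    "mate_distr": "-C",
    "log": "-l",
}


def output_args(outputs):
    """Build the optional output arguments, skipping outputs that are not set.

    `outputs` is any mapping with a .get() method (a plain dict here).
    """
    return " ".join(
        f"{flag} {outputs[name]}"
        for name, flag in OUTPUT_FLAGS.items()
        if outputs.get(name)
    )


print(output_args({"redund_sum": "results/a.npo", "log": "results/a.log"}))
```

Outputs that are omitted from the rule simply contribute nothing to the command line, which matches the behaviour of the empty-string defaults in the wrapper.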