EMU ABUNDANCE

Generate relative abundance estimates from ONT, PacBio, or short 16S reads using emu.

URL: https://github.com/treangenlab/emu

Example

This wrapper can be used in the following way:

rule abundance:
    input:
        reads="{sample}.fa",
        db="database",
    output:
        abundances="{sample}_rel-abundance.tsv",
        alignments="{sample}_emu_alignments.sam",
        unclassified="{sample}_unclassified.fas",
        unmapped="{sample}_unmapped.fas",
    log:
        "logs/emu/{sample}_abundance.log",
    params:
        extra="--type map-ont --keep-counts",
    threads: 3  # optional, defaults to 1
    wrapper:
        "v5.0.0/bio/emu/abundance"


rule abundance_paired:
    input:
        reads=["{sample}_R1.fq", "{sample}_R2.fq"],
        db="database",
    output:
        abundances="{sample}_rel-abundance_paired.tsv",
        alignments="{sample}_emu_alignments_paired.sam",
        unclassified="{sample}_unclassified_paired.fq",
        unmapped="{sample}_unmapped_paired.fq",
    log:
        "logs/emu/{sample}_abundance_paired.log",
    params:
        extra="--type sr --keep-counts",
    threads: 3  # optional, defaults to 1
    wrapper:
        "v5.0.0/bio/emu/abundance"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • emu=3.5.0

Input/Output

Input:

  • reads: a single FASTA file or a pair of FASTQ files.

  • db: emu database (optional; check documentation for pre-built databases and how to build them).

Output:

  • abundances: TSV with relative (and, optionally, absolute) abundances.

  • alignments: SAM file with the alignments (optional).

  • unclassified: FASTA/Q file with unclassified sequences (optional).

  • unmapped: FASTA/Q file with unmapped sequences (optional).
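The abundances TSV can be inspected with the standard library alone. The following is a minimal sketch, not part of the wrapper: `top_taxa` is a hypothetical helper, and the column names `abundance` and `species` are assumptions about the emu output header, so check them against your own file and pass different names if needed. The sample rows are made up for illustration.

```python
import csv
import io


def top_taxa(tsv_file, n=3, abundance_col="abundance", name_col="species"):
    """Return the n most abundant (name, abundance) pairs from a TSV file object.

    The default column names are assumptions about the emu header.
    """
    rows = list(csv.DictReader(tsv_file, delimiter="\t"))
    rows.sort(key=lambda r: float(r[abundance_col]), reverse=True)
    return [(r[name_col], float(r[abundance_col])) for r in rows[:n]]


# Made-up rows for illustration only; this is not real emu output.
sample = io.StringIO(
    "tax_id\tabundance\tspecies\n"
    "1\t0.25\tSpecies A\n"
    "2\t0.60\tSpecies B\n"
    "3\t0.15\tSpecies C\n"
)
print(top_taxa(sample, n=2))  # [('Species B', 0.6), ('Species A', 0.25)]
```

For a real run, open the file declared under `abundances` (for example `open("sample_rel-abundance.tsv", newline="")`) and pass the file object instead of the in-memory sample.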

Params

  • extra: Any optional parameter, such as --type (sequencer) or --min-abundance. Flags that control output (e.g. --output-dir, --output-basename …) are handled automatically by the wrapper and should not be set here.
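Because the wrapper sets several flags itself, it can help to check an `extra` string before using it. The sketch below is illustrative only: `find_managed_flags` is a hypothetical helper, and the flag set is copied from the wrapper code shown on this page rather than from any emu or Snakemake API.

```python
import shlex

# Flags that the wrapper script on this page passes to `emu abundance` itself.
WRAPPER_MANAGED_FLAGS = {
    "--db",
    "--keep-files",
    "--output-dir",
    "--output-basename",
    "--output-unclassified",
    "--threads",
}


def find_managed_flags(extra: str) -> list[str]:
    """Return the flags in `extra` that the wrapper already sets (hypothetical helper)."""
    found = []
    for token in shlex.split(extra):
        flag = token.split("=", 1)[0]  # also catch the --flag=value form
        if flag in WRAPPER_MANAGED_FLAGS and flag not in found:
            found.append(flag)
    return found


print(find_managed_flags("--type map-ont --keep-counts"))  # []
print(find_managed_flags("--type sr --output-dir results --threads=8"))
# ['--output-dir', '--threads']
```

An empty result means the string only contains flags the wrapper leaves to the user, as in both example rules above.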

Authors

  • Curro Campuzano

Code

__author__ = "Curro Campuzano Jimenez"
__copyright__ = "Copyright 2024, Curro Campuzano Jimenez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import tempfile
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

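# Infer the format of the input reads: two files are treated as paired FASTQ,
# anything else as a single FASTA file. The result sets the extension of the
# unclassified/unmapped files that are moved out of the temporary directory.
in_fmt = "fasta"
if isinstance(snakemake.input.reads, list) and len(snakemake.input.reads) == 2:
    in_fmt = "fastq"

# The database is optional; pass --db only when one is given.
if db := snakemake.input.get("db", ""):
    db = f"--db {db}"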

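# Run emu in a temporary directory, then move only the outputs that the rule
# declares to their final paths.
with tempfile.TemporaryDirectory() as tmpdir: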
    shell(
        "emu abundance {snakemake.input.reads} {db}"
        " --keep-files --output-dir {tmpdir}"
        " --output-basename output --output-unclassified"
        " --threads {snakemake.threads}"
        " {extra}"
        " {log}"
    )
    if out_tsv := snakemake.output.get("abundances"):
        shell("mv {tmpdir}/output_rel-abundance.tsv {out_tsv}")
    if out_sam := snakemake.output.get("alignments"):
        shell("mv {tmpdir}/output_emu_alignments.sam {out_sam}")
    if out_unclassified_fq := snakemake.output.get("unclassified"):
        shell("mv {tmpdir}/output_unclassified_mapped.{in_fmt} {out_unclassified_fq}")
    if out_unmapped_fq := snakemake.output.get("unmapped"):
        shell("mv {tmpdir}/output_unmapped.{in_fmt} {out_unmapped_fq}")
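The format inference and the file names the wrapper expects in the temporary directory can be restated as two small pure functions. This is a sketch that mirrors the code above for illustration; `infer_format` and `temp_outputs` are hypothetical names, not part of the wrapper. Note that, by this rule, a single FASTQ file would be treated as FASTA.

```python
def infer_format(reads) -> str:
    """Mirror the wrapper's rule: a list of two files means FASTQ, anything else FASTA."""
    if isinstance(reads, list) and len(reads) == 2:
        return "fastq"
    return "fasta"


def temp_outputs(tmpdir: str, in_fmt: str) -> dict:
    """Map each named wrapper output to the file the wrapper moves from `tmpdir`."""
    return {
        "abundances": f"{tmpdir}/output_rel-abundance.tsv",
        "alignments": f"{tmpdir}/output_emu_alignments.sam",
        "unclassified": f"{tmpdir}/output_unclassified_mapped.{in_fmt}",
        "unmapped": f"{tmpdir}/output_unmapped.{in_fmt}",
    }


fmt = infer_format(["sample_R1.fq", "sample_R2.fq"])
print(fmt)  # fastq
print(temp_outputs("/tmp/emu", fmt)["unmapped"])  # /tmp/emu/output_unmapped.fastq
```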