EMU ABUNDANCE
Generate relative abundance estimates from ONT, PacBio or short 16S reads using emu.
URL: https://github.com/treangenlab/emu
Example
This wrapper can be used in the following way:
rule abundance:
    input:
        reads="{sample}.fa",
        db="database",
    output:
        abundances="{sample}_rel-abundance.tsv",
        alignments="{sample}_emu_alignments.sam",
        unclassified="{sample}_unclassified.fas",
        unmapped="{sample}_unmapped.fas",
    log:
        "logs/emu/{sample}_abundance.log",
    params:
        extra="--type map-ont --keep-counts",
    threads: 3  # optional, defaults to 1
    wrapper:
        "v5.0.0/bio/emu/abundance"


rule abundance_paired:
    input:
        reads=["{sample}_R1.fq", "{sample}_R2.fq"],
        db="database",
    output:
        abundances="{sample}_rel-abundance_paired.tsv",
        alignments="{sample}_emu_alignments_paired.sam",
        unclassified="{sample}_unclassified_paired.fq",
        unmapped="{sample}_unmapped_paired.fq",
    log:
        "logs/emu/{sample}_abundance_paired.log",
    params:
        extra="--type sr --keep-counts",
    threads: 3  # optional, defaults to 1
    wrapper:
        "v5.0.0/bio/emu/abundance"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
emu=3.5.0
Input/Output
Input:
reads
: single FASTA file or paired FASTQ files.
db
: emu database (optional; check documentation for pre-built databases and how to build them).
Output:
abundances
: TSV with relative (and, optionally, absolute) abundances.
alignments
: SAM file with the alignments (optional).
unclassified
: FASTA/Q file with unclassified sequences (optional).
unmapped
: FASTA/Q file with unmapped sequences (optional).
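Once the rule has run, the abundances table can be inspected with a few lines of Python. The sketch below is only an illustration: it assumes the TSV has a header row with columns named `abundance` and `species`, which is not stated on this page, so check the header of your own file first. The helper `top_taxa` is hypothetical and not part of the wrapper or of emu.

```python
import csv


def top_taxa(lines, n=5, abundance_col="abundance", name_col="species"):
    """Return the n (name, abundance) pairs with the highest abundance.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The column names are assumptions; adjust them to match the TSV header.
    """
    scored = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            value = float(row[abundance_col])
        except (KeyError, TypeError, ValueError):
            # Skip rows with a missing or non-numeric abundance.
            continue
        scored.append((row.get(name_col) or "", value))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Tiny in-memory table standing in for a real "{sample}_rel-abundance.tsv".
    demo = [
        "tax_id\tabundance\tspecies\n",
        "1\t0.2\tSpecies A\n",
        "2\t0.7\tSpecies B\n",
        "3\t0.1\tSpecies C\n",
    ]
    for name, value in top_taxa(demo, n=2):
        print(f"{name}\t{value:.3f}")
```

With a real file, pass an open handle instead of the demo list, for example `with open("sample_rel-abundance.tsv", newline="") as fh: top_taxa(fh)`.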
Params
extra
: Any optional parameter, such as --type (sequencer type) or --min-abundance. Flags that control output (e.g. --output-dir, --output-basename …) are set automatically by the wrapper and should not be passed here.
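To make the role of `extra` concrete, the following sketch shows roughly how the final command line is put together: the reads, the optional database, the output flags the wrapper sets itself, the thread count, and then whatever is passed in `extra`. The function `build_emu_command` and the directory name `tmp_out` are hypothetical and exist only for illustration; the wrapper itself builds the command through Snakemake's `shell()` and writes to a temporary directory.

```python
import shlex


def build_emu_command(reads, db=None, threads=1, extra="", outdir="tmp_out"):
    """Assemble an `emu abundance` command line similar to the wrapper's."""
    if isinstance(reads, str):
        reads = [reads]  # a single file is passed as one positional argument
    parts = ["emu", "abundance", *reads]
    if db:
        parts += ["--db", db]
    # Output-related flags are fixed by the wrapper, not by the user.
    parts += [
        "--keep-files",
        "--output-dir", outdir,
        "--output-basename", "output",
        "--output-unclassified",
        "--threads", str(threads),
    ]
    if extra:
        # User-supplied options are appended last.
        parts += shlex.split(extra)
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    print(
        build_emu_command(
            "sample.fa",
            db="database",
            threads=3,
            extra="--type map-ont --keep-counts",
        )
    )
```

Because the output flags are already present in the command, repeating them in `extra` would pass them to emu twice, which is why they are best left out of `params`.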
Code
__author__ = "Curro Campuzano Jimenez"
__copyright__ = "Copyright 2024, Curro Campuzano Jimenez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
import tempfile
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

# Infer format of input file
in_fmt = "fasta"
if isinstance(snakemake.input.reads, list) and len(snakemake.input.reads) == 2:
    in_fmt = "fastq"

if db := snakemake.input.get("db", ""):
    db = f"--db {db}"

with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "emu abundance {snakemake.input.reads} {db}"
        " --keep-files --output-dir {tmpdir}"
        " --output-basename output --output-unclassified"
        " --threads {snakemake.threads}"
        " {extra}"
        " {log}"
    )

    if out_tsv := snakemake.output.get("abundances"):
        shell("mv {tmpdir}/output_rel-abundance.tsv {out_tsv}")
    if out_sam := snakemake.output.get("alignments"):
        shell("mv {tmpdir}/output_emu_alignments.sam {out_sam}")
    if out_unclassified_fq := snakemake.output.get("unclassified"):
        shell("mv {tmpdir}/output_unclassified_mapped.{in_fmt} {out_unclassified_fq}")
    if out_unmapped_fq := snakemake.output.get("unmapped"):
        shell("mv {tmpdir}/output_unmapped.{in_fmt} {out_unmapped_fq}")
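The format inference and the mapping from output keys to temporary file names can be isolated as two small functions, which makes the heuristic easier to see. This is a sketch that mirrors the file names used in the code above; `infer_format` and `expected_temp_files` are hypothetical helpers, not part of the wrapper. Note that, as in the wrapper, a single input file is always treated as FASTA and only a pair of files is treated as FASTQ.

```python
def infer_format(reads):
    """Mirror the wrapper's heuristic: exactly two read files means FASTQ."""
    if isinstance(reads, (list, tuple)) and len(reads) == 2:
        return "fastq"
    return "fasta"


def expected_temp_files(reads, basename="output"):
    """Map each output key to the file name looked up in the temp directory."""
    fmt = infer_format(reads)
    return {
        "abundances": f"{basename}_rel-abundance.tsv",
        "alignments": f"{basename}_emu_alignments.sam",
        "unclassified": f"{basename}_unclassified_mapped.{fmt}",
        "unmapped": f"{basename}_unmapped.{fmt}",
    }


if __name__ == "__main__":
    for key, name in expected_temp_files(["s_R1.fq", "s_R2.fq"]).items():
        print(f"{key}: {name}")
```

One consequence of this heuristic is that a single FASTQ file passed as `reads` would be looked up with a `.fasta` extension for the unclassified and unmapped outputs.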