EMU COMBINE-OUTPUTS

Collapse individual abundance tables (TSV files) into a single TSV at the desired taxonomic rank.

URL: https://github.com/treangenlab/emu

Example

This wrapper can be used in the following way:

rule combine_outputs:
    input:
        expand("{sample}_rel-abundance.tsv", sample=["sample1", "sample2"]),
    output:
        abundances=ensure("combined_abundances.tsv", non_empty=True),
    log:
        "logs/emu/combined_abundances.log",
    wrapper:
        "v3.9.0-14-g476823b/bio/emu/combine-outputs"


rule combine_outputs_split:
    input:
        expand("{sample}.txt", sample=["sample1", "sample2"]),
    output:
        abundances=ensure("counts.tsv", non_empty=True),
        taxonomy=ensure("taxonomy.tsv", non_empty=True),
    log:
        "logs/emu/combined_split.log",
    params:
        rank="genus",
        extra="--counts",
    wrapper:
        "v3.9.0-14-g476823b/bio/emu/combine-outputs"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

emu=3.4.5

Input/Output

Input:

A list of TSV files obtained with emu abundance.

Output:

abundances: TSV file containing the abundance of different taxa.
taxonomy: TSV file containing the taxonomy (optional; otherwise, taxonomy will be included in the abundance table).

Params

rank: Accepted ranks are ‘tax_id’, ‘species’, ‘genus’, ‘family’, ‘order’, ‘class’, ‘phylum’ and ‘superkingdom’. If omitted, no agglomeration is done (that is, the default is ‘tax_id’).
extra: Extra arguments (such as ‘--counts’).
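
As a rough illustration of how these parameters interact, the sketch below mirrors the branching in the wrapper code listed under Code: it builds the command line and the file names the wrapper looks for in its temporary directory. The function name plan, the VALID_RANKS tuple, the <tmpdir> placeholder and the up-front rank check are assumptions made for this example; the wrapper itself does not validate the rank.

```python
# Ranks listed in the Params section above.
VALID_RANKS = (
    "tax_id", "species", "genus", "family",
    "order", "class", "phylum", "superkingdom",
)


def plan(rank="tax_id", extra="", split=False):
    """Return (command, expected_files) for one hypothetical invocation.

    split stands for "both the taxonomy and abundances outputs were declared",
    which is the condition under which the wrapper appends --split-tables.
    """
    # Illustrative check only; not part of the wrapper.
    if rank not in VALID_RANKS:
        raise ValueError(f"unsupported rank: {rank!r}")
    if split:
        extra = f"{extra} --split-tables".strip()
    # The wrapper detects count mode by looking for the flag in extra.
    counts = "--counts" in extra
    suffix = "-counts" if counts else ""
    # "<tmpdir>" is a placeholder for the temporary staging directory.
    parts = ["emu", "combine-outputs", "<tmpdir>", rank, extra]
    command = " ".join(p for p in parts if p)
    if split:
        files = {
            "taxonomy": f"emu-combined-taxonomy-{rank}.tsv",
            "abundances": f"emu-combined-abundance-{rank}{suffix}.tsv",
        }
    else:
        files = {"abundances": f"emu-combined-{rank}{suffix}.tsv"}
    return command, files


if __name__ == "__main__":
    # Corresponds to the combine_outputs_split rule in the example.
    print(plan(rank="genus", extra="--counts", split=True))
```

Note that the taxonomy table never carries the -counts suffix; only the abundance table name changes when ‘--counts’ is passed.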

Authors

Curro Campuzano

Code

__author__ = "Curro Campuzano Jimenez"
__copyright__ = "Copyright 2024, Curro Campuzano Jimenez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import tempfile
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

taxonomy = snakemake.output.get("taxonomy", "")
abundances = snakemake.output.get("abundances", "")
if taxonomy and abundances:
    split = True
    extra += " --split-tables"
else:
    split = False

rank = snakemake.params.get("rank", "tax_id")
counts = "--counts" in extra


with tempfile.TemporaryDirectory() as tmpdir:
    for infile in snakemake.input:
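        # Input file names must end in .tsv and contain "rel-abundance"
        # for emu to pick them up, so link each file under such a name.
        temp_basename = os.path.basename(infile)
        if not temp_basename.endswith("_rel-abundance.tsv"):
            # Derive the new name from the basename, not the full input path;
            # otherwise the link target would point outside the flat tmpdir.
            temp_basename = os.path.splitext(temp_basename)[0] + "_rel-abundance.tsv"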
        temp = os.path.join(tmpdir, temp_basename)
        os.link(infile, temp)
    shell("emu combine-outputs {tmpdir} {rank} {extra} {log}")
    if split and counts:
        shell("mv {tmpdir}/emu-combined-taxonomy-{rank}.tsv {taxonomy}")
        shell("mv {tmpdir}/emu-combined-abundance-{rank}-counts.tsv {abundances}")
    elif split and not counts:
        shell("mv {tmpdir}/emu-combined-taxonomy-{rank}.tsv {taxonomy}")
        shell("mv {tmpdir}/emu-combined-abundance-{rank}.tsv {abundances}")
    elif not split and counts:
        shell("mv {tmpdir}/emu-combined-{rank}-counts.tsv {abundances}")
    elif not split and not counts:
        shell("mv {tmpdir}/emu-combined-{rank}.tsv {abundances}")
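
The renaming step in the loop above is what allows input paths to be chosen freely. A minimal sketch of that step in isolation follows, assuming the staged name is derived from the basename only. The helper names staged_name and staged_names are hypothetical, and the duplicate check is an addition for illustration that the wrapper does not perform.

```python
import os

SUFFIX = "_rel-abundance.tsv"


def staged_name(path):
    """Name under which an input would be linked into the staging directory."""
    base = os.path.basename(path)
    if base.endswith(SUFFIX):
        return base
    # Drop only the last extension, then append the suffix emu looks for.
    return os.path.splitext(base)[0] + SUFFIX


def staged_names(paths):
    """Map every input to its staged name, refusing name collisions."""
    mapping = {}
    seen = {}
    for path in paths:
        name = staged_name(path)
        if name in seen:
            # Two inputs in different directories can share a basename;
            # linking both into one flat directory would then fail.
            raise ValueError(f"{path!r} and {seen[name]!r} both map to {name!r}")
        seen[name] = path
        mapping[path] = name
    return mapping


if __name__ == "__main__":
    print(staged_names(["results/sample1.txt", "sample2_rel-abundance.tsv"]))
```

Because all inputs end up in one flat directory, inputs that share a basename (for example run1/sample1.txt and run2/sample1.txt) would need distinct file names before being passed to the rule.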