EMU COMBINE-OUTPUTS
Combine individual abundance tables (TSV) into a single TSV, collapsed at the desired taxonomic rank.
URL: https://github.com/treangenlab/emu
Example
This wrapper can be used in the following way:
rule combine_outputs:
    input:
        expand("{sample}_rel-abundance.tsv", sample=["sample1", "sample2"]),
    output:
        abundances=ensure("combined_abundances.tsv", non_empty=True),
    log:
        "logs/emu/combined_abundances.log",
    wrapper:
        "v4.6.0-24-g250dd3e/bio/emu/combine-outputs"


rule combine_outputs_split:
    input:
        expand("{sample}.txt", sample=["sample1", "sample2"]),
    output:
        abundances=ensure("counts.tsv", non_empty=True),
        taxonomy=ensure("taxonomy.tsv", non_empty=True),
    log:
        "logs/emu/combined_split.log",
    params:
        rank="genus",
        extra="--counts",
    wrapper:
        "v4.6.0-24-g250dd3e/bio/emu/combine-outputs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
emu=3.5.0
Input/Output
Input:
A list of TSV files obtained with emu abundance.
Output:
abundances
: TSV file containing the abundance of different taxa.
taxonomy
: TSV file containing the taxonomy (optional; otherwise, taxonomy will be included in the abundance table).
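On the input side, the wrapper code suggests that emu expects every file in the directory it combines to end in `_rel-abundance.tsv`, so inputs with other names are staged under an adjusted name. The following is a minimal sketch of that naming rule, not the wrapper's own code: `staged_name` and `SUFFIX` are hypothetical names introduced here, and the sketch assumes the staged name is derived from the base name of the input only.

```python
import os

# Suffix the staged files are expected to carry (taken from the wrapper code).
SUFFIX = "_rel-abundance.tsv"


def staged_name(infile: str) -> str:
    """Return the file name an input would get inside the staging directory.

    Hypothetical helper: it mirrors the renaming logic of the wrapper,
    assuming only the base name of the input is used.
    """
    base = os.path.basename(infile)
    if base.endswith(SUFFIX):
        return base
    # Drop the original extension and append the expected suffix.
    return os.path.splitext(base)[0] + SUFFIX


if __name__ == "__main__":
    for path in ["results/sample1_rel-abundance.tsv", "results/sample2.txt"]:
        print(path, "->", staged_name(path))
```

Under this assumption, an input such as `results/sample2.txt` would be staged as `sample2_rel-abundance.tsv`, while a file that already carries the suffix keeps its base name.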
Params
rank
: Accepted ranks are ‘tax_id’, ‘species’, ‘genus’, ‘family’, ‘order’, ‘class’, ‘phylum’ and ‘superkingdom’. If omitted, no agglomeration is done (that is, the default is ‘tax_id’).
extra
: Extra arguments (such as ‘--counts’).
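To make the role of the two parameters concrete, here is a small sketch of how they could end up on the `emu combine-outputs` command line, following the order used in the wrapper code (directory, rank, extra arguments, with `--split-tables` appended when both outputs are requested). `build_command` and `ACCEPTED_RANKS` are hypothetical names, and the rank check is an addition for illustration; the wrapper itself does not validate the rank.

```python
# Ranks listed in the parameter description above.
ACCEPTED_RANKS = (
    "tax_id", "species", "genus", "family",
    "order", "class", "phylum", "superkingdom",
)


def build_command(directory: str, rank: str = "tax_id",
                  extra: str = "", split: bool = False) -> str:
    """Assemble the command string (hypothetical helper, for illustration)."""
    if rank not in ACCEPTED_RANKS:
        raise ValueError(f"Unsupported rank: {rank!r}")
    if split:
        # Requested when both 'abundances' and 'taxonomy' outputs are given.
        extra = f"{extra} --split-tables"
    # str.split() drops empty pieces, so an empty 'extra' adds nothing.
    return " ".join(["emu", "combine-outputs", directory, rank, *extra.split()])


if __name__ == "__main__":
    print(build_command("/tmp/staged", rank="genus", extra="--counts", split=True))
    # prints: emu combine-outputs /tmp/staged genus --counts --split-tables
```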
Code
__author__ = "Curro Campuzano Jimenez"
__copyright__ = "Copyright 2024, Curro Campuzano Jimenez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
import tempfile
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

taxonomy = snakemake.output.get("taxonomy", "")
abundances = snakemake.output.get("abundances", "")

if taxonomy and abundances:
    split = True
    extra += " --split-tables"
else:
    split = False

rank = snakemake.params.get("rank", "tax_id")
counts = "--counts" in extra
with tempfile.TemporaryDirectory() as tmpdir:
    for infile in snakemake.input:
        # Staged file names have to end in "_rel-abundance.tsv"
        temp_basename = os.path.basename(infile)
        if not temp_basename.endswith("_rel-abundance.tsv"):
            # Build the new name from the base name only, so that the
            # staged file always lands directly inside tmpdir
            temp_basename = os.path.splitext(temp_basename)[0] + "_rel-abundance.tsv"
        temp = os.path.join(tmpdir, temp_basename)
        os.link(infile, temp)

    shell("emu combine-outputs {tmpdir} {rank} {extra} {log}")

    # Move the results out before the temporary directory is removed
    if split and counts:
        shell("mv {tmpdir}/emu-combined-taxonomy-{rank}.tsv {taxonomy}")
        shell("mv {tmpdir}/emu-combined-abundance-{rank}-counts.tsv {abundances}")
    elif split and not counts:
        shell("mv {tmpdir}/emu-combined-taxonomy-{rank}.tsv {taxonomy}")
        shell("mv {tmpdir}/emu-combined-abundance-{rank}.tsv {abundances}")
    elif not split and counts:
        shell("mv {tmpdir}/emu-combined-{rank}-counts.tsv {abundances}")
    elif not split and not counts:
        shell("mv {tmpdir}/emu-combined-{rank}.tsv {abundances}")
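The four branches at the end of the wrapper encode a naming scheme for the files that end up in the temporary directory. As a compact summary, the sketch below reproduces that scheme as a pure function; `expected_outputs` is a hypothetical helper written for this page, and the file names are taken from the `mv` commands above rather than from the emu documentation.

```python
def expected_outputs(rank: str = "tax_id", split: bool = False,
                     counts: bool = False) -> dict:
    """Map each wrapper output to the file name the wrapper moves it from.

    Hypothetical helper that mirrors the if/elif chain of the wrapper.
    """
    # The "-counts" marker is only attached to the abundance table.
    suffix = "-counts" if counts else ""
    if split:
        return {
            "taxonomy": f"emu-combined-taxonomy-{rank}.tsv",
            "abundances": f"emu-combined-abundance-{rank}{suffix}.tsv",
        }
    return {"abundances": f"emu-combined-{rank}{suffix}.tsv"}


if __name__ == "__main__":
    for split in (False, True):
        for counts in (False, True):
            print(split, counts, expected_outputs("genus", split, counts))
```

For the `combine_outputs_split` example rule (rank `genus`, `--counts`, both outputs declared), this gives `emu-combined-abundance-genus-counts.tsv` and `emu-combined-taxonomy-genus.tsv`.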