EMU COLLAPSE-TAXONOMY
Collapse a TSV output file generated with emu at the desired taxonomic rank.
URL: https://github.com/treangenlab/emu
Example
This wrapper can be used in the following way:
rule collapse_taxonomy:
    input:
        "full_length_rel-abundance.tsv",
    output:
        "full_length_rel-abundance_collapsed.tsv",
    log:
        "logs/emu/full_length_collapsed.log",
    params:
        rank="genus",
    wrapper:
        "v5.0.1/bio/emu/collapse-taxonomy"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
emu=3.5.0
Input/Output
Input:
A TSV output file generated with emu.
Output:
A TSV output file collapsed at the desired taxonomic rank.
Params
rank: Accepted ranks are ‘species’, ‘genus’, ‘family’, ‘order’, ‘class’, ‘phylum’ and ‘superkingdom’. If omitted, agglomeration is done at the species level.
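The accepted values and the species default can be checked before a rule is run. The following is a minimal sketch only: `ACCEPTED_RANKS` and `resolve_rank` are hypothetical names introduced for illustration and are not part of the wrapper or of emu.

```python
# Hypothetical helper for illustration; not part of the wrapper or of emu.
ACCEPTED_RANKS = (
    "species",
    "genus",
    "family",
    "order",
    "class",
    "phylum",
    "superkingdom",
)


def resolve_rank(params):
    """Return the rank to collapse at, falling back to 'species' if omitted."""
    rank = params.get("rank", "species")
    if rank not in ACCEPTED_RANKS:
        raise ValueError(
            f"Unsupported rank {rank!r}; expected one of: "
            + ", ".join(ACCEPTED_RANKS)
        )
    return rank
```

Calling `resolve_rank({})` falls back to the species level, matching the default described above, while an unknown value such as `"strain"` raises a `ValueError` instead of being passed on to emu.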
Code
__author__ = "Curro Campuzano Jimenez"
__copyright__ = "Copyright 2024, Curro Campuzano Jimenez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
import tempfile
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

input_file = snakemake.input[0]
output_file = snakemake.output[0]
rank = snakemake.params.get("rank", "species")

with tempfile.TemporaryDirectory() as tmpdir:
    # Resolve the symbolic link and get the actual path of the input file
    input_file_path = os.path.realpath(input_file)

    # Create a symlink of the input file in the temporary directory
    symlink_path = os.path.join(tmpdir, os.path.basename(input_file_path))
    os.symlink(input_file_path, symlink_path)

    shell("emu collapse-taxonomy {symlink_path} {rank} {log}")

    # Get the input file name without extension
    name = os.path.splitext(os.path.basename(input_file_path))[0]
    temp_out = f"{tmpdir}/{name}-{rank}.tsv"  # it is always a tsv
    shell("mv {temp_out} {output_file}")
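As the code above suggests, the wrapper expects emu to write its result next to the file it was given, named after the input's base name plus `-<rank>.tsv`; this appears to be why the input is symlinked into a temporary directory before the result is moved to the requested output path. The naming step can be sketched in isolation as follows. The function name `collapsed_output_path` is hypothetical and only mirrors the wrapper's own path logic.

```python
import os


def collapsed_output_path(input_file, rank, workdir):
    """Path at which the wrapper looks for the collapsed table.

    Mirrors the wrapper's logic: resolve symlinks, drop the last file
    extension from the base name, and append '-<rank>.tsv'.
    """
    real_path = os.path.realpath(input_file)
    name = os.path.splitext(os.path.basename(real_path))[0]
    return os.path.join(workdir, f"{name}-{rank}.tsv")
```

For the example rule, an input named `full_length_rel-abundance.tsv` collapsed at rank `genus` yields the file name `full_length_rel-abundance-genus.tsv` inside the working directory. Only the last extension is removed, so an input such as `sample.v2.tsv` would give `sample.v2-genus.tsv`.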