EMU COLLAPSE-TAXONOMY

Collapse a TSV output file generated with emu at the desired taxonomic rank.

URL: https://github.com/treangenlab/emu

Example

This wrapper can be used in the following way:

rule collapse_taxonomy:
    input:
        "full_length_rel-abundance.tsv",
    output:
        "full_length_rel-abundance_collapsed.tsv",
    log:
        "logs/emu/full_length_collapsed.log",
    params:
        rank="genus",
    wrapper:
        "v3.9.0-14-g476823b/bio/emu/collapse-taxonomy"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • emu=3.4.5

Input/Output

Input:

  • A TSV output file generated with emu.

Output:

  • A TSV output file collapsed at the desired taxonomic rank.

Params

  • rank: Accepted ranks are ‘species’, ‘genus’, ‘family’, ‘order’, ‘class’, ‘phylum’ and ‘superkingdom’. If omitted, agglomeration is done at the species level.
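
The default and the accepted values of rank can be sketched in plain Python. The helper name resolve_rank and the ACCEPTED_RANKS tuple are hypothetical and are not part of the wrapper, which forwards the value to emu without checking it; this is only an illustration of the behaviour described above.

```python
# Hypothetical helper, not part of the wrapper: it illustrates the default
# value and the accepted values of the `rank` param listed above.
ACCEPTED_RANKS = (
    "species",
    "genus",
    "family",
    "order",
    "class",
    "phylum",
    "superkingdom",
)


def resolve_rank(params: dict) -> str:
    """Return the requested rank, falling back to 'species' when omitted."""
    rank = params.get("rank", "species")
    if rank not in ACCEPTED_RANKS:
        raise ValueError(
            f"Unsupported rank {rank!r}; expected one of: "
            + ", ".join(ACCEPTED_RANKS)
        )
    return rank
```

Calling resolve_rank({}) returns the species default, while an unlisted value raises ValueError before any tool is invoked. A check of this kind could live in a rule's params function if early failure is preferred.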

Authors

  • Curro Campuzano

Code

__author__ = "Curro Campuzano Jimenez"
__copyright__ = "Copyright 2024, Curro Campuzano Jimenez"
__email__ = "campuzanocurro@gmail.com"
__license__ = "MIT"


from snakemake.shell import shell
import tempfile
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

input_file = snakemake.input[0]
output_file = snakemake.output[0]
rank = snakemake.params.get("rank", "species")

with tempfile.TemporaryDirectory() as tmpdir:
    # Resolve any symbolic link to get the actual path of the input file
    input_file_path = os.path.realpath(input_file)
    # Symlink the input into the temporary directory: the wrapper expects emu
    # to write its result next to the input, so it lands in tmpdir
    symlink_path = os.path.join(tmpdir, os.path.basename(input_file_path))
    os.symlink(input_file_path, symlink_path)
    # `:q` shell-quotes the paths so that spaces are handled safely
    shell("emu collapse-taxonomy {symlink_path:q} {rank} {log}")
    # Input file name without its extension
    name = os.path.splitext(os.path.basename(input_file_path))[0]
    # The result is expected at <name>-<rank>.tsv (it is always a tsv)
    temp_out = os.path.join(tmpdir, f"{name}-{rank}.tsv")
    shell("mv {temp_out:q} {output_file:q}")
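
The file naming step of the wrapper can be isolated as a small pure function. This is a minimal sketch that assumes, as the wrapper code above does, that the collapsed table is written next to the input as the input name without its extension, followed by a hyphen, the rank and .tsv. The function name expected_collapsed_path is hypothetical, and unlike the wrapper it does not resolve symbolic links.

```python
import os


def expected_collapsed_path(input_file: str, rank: str, workdir: str) -> str:
    """Build the path where the wrapper looks for the collapsed table.

    Assumption (taken from the wrapper code, not verified against emu here):
    the result is named <input name without extension>-<rank>.tsv and sits
    in the directory that holds the (symlinked) input.
    """
    # Drop the directory part and the last extension of the input file name
    name = os.path.splitext(os.path.basename(input_file))[0]
    return os.path.join(workdir, f"{name}-{rank}.tsv")
```

With the example rule above, the input full_length_rel-abundance.tsv and rank genus give full_length_rel-abundance-genus.tsv inside the working directory, which the wrapper then moves to the declared output path.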