MMSEQS2 DB

MMseqs2 is an ultra-fast and sensitive sequence search and clustering suite. This wrapper covers its database modules: databases, createdb and createtaxdb.

URL: https://github.com/soedinglab/mmseqs2

Example

This wrapper can be used in the following way:

rule mmseqs2_databases:
    output:
        db=multiext(
            "out/databases/{sample}",
            "",
            ".dbtype",
            ".index",
            ".lookup",
            ".source",
            ".version",
            "_h",
            "_h.dbtype",
            "_h.index",
            "_mapping",
            "_taxonomy",
        ),
    log:
        "logs/databases/{sample}.log",
    params:
        module="databases SILVA",
        extra="-v 3",
    threads: 1
    wrapper:
        "v9.3.0/bio/mmseqs2/db"


rule mmseqs2_createdb:
    input:
        fas="seqs/{sample}.fasta",
    output:
        db=multiext(
            "out/createdb/{sample}",
            "",
            ".dbtype",
            ".index",
            ".lookup",
            ".source",
            "_h",
            "_h.dbtype",
            "_h.index",
        ),
    log:
        "logs/createdb/{sample}.log",
    params:
        module="createdb",
        extra="-v 3 --dbtype 2",
    threads: 1
    wrapper:
        "v9.3.0/bio/mmseqs2/db"


rule mmseqs2_createtaxdb:
    input:
        db="out/createdb/{sample}",
        tax_dump=storage.http("http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"),
        tax_map="seqs/{sample}.map",
    output:
        # Touch empty file since createtaxdb edits DB in place.
        # Make sure to include this file as input in rules that need the annotated database.
        touch("out/createtaxdb/{sample}.done"),
    log:
        "logs/createtaxdb/{sample}.log",
    params:
        module="createtaxdb",
        extra="-v 3",
    threads: 1
    wrapper:
        "v9.3.0/bio/mmseqs2/db"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • mmseqs2=18.8cc5c

Input/Output

Input:

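  • fas: input FASTA file (createdb)
  • db: existing MMseqs2 database (createtaxdb)
  • tax_dump: NCBI taxonomy dump, optionally as a .tar.gz archive (createtaxdb)
  • tax_map: taxonomy mapping file (createtaxdb)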

Output:

  • db: MMseqs2 database files (databases, createdb)
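
The wrapper code below reduces the list of db output files to a single database prefix with os.path.commonprefix. The following is a minimal sketch of that step, not part of the wrapper itself: expand is a hypothetical plain-Python stand-in for Snakemake's multiext, and the paths are illustrative. It suggests why the example rules list the bare "" suffix first: commonprefix compares character by character, so without the bare entry the prefix can pick up a trailing character shared by all suffixes.

```python
import os


def expand(prefix, *suffixes):
    # Hypothetical stand-in for Snakemake's multiext(): prefix + each suffix.
    return [prefix + suffix for suffix in suffixes]


# Output list that includes the bare database path (suffix "").
with_bare = expand("out/createdb/a", "", ".dbtype", ".index", "_h")

# Output list without the bare path; every entry shares the leading ".".
without_bare = expand("out/createdb/a", ".dbtype", ".index")

db_prefix = os.path.commonprefix(with_bare)
dotted_prefix = os.path.commonprefix(without_bare)

print(db_prefix)      # out/createdb/a
print(dotted_prefix)  # out/createdb/a.
```

In the second case the resulting prefix ends in a dot, which would not be the database name MMseqs2 is expected to write to.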

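Params

  • module: MMseqs2 module to run; one of "databases <name>" (for example "databases SILVA"), "createdb" or "createtaxdb"
  • extra: additional command-line arguments passed to mmseqs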

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2024, Filipe G. Vieira"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


# Input
if snakemake.input.get("tax_map"):
    extra += f" --tax-mapping-file {snakemake.input.tax_map}"
# The example rule names this input "tax_dump"
taxdump = snakemake.input.get("tax_dump")


# Output
out = snakemake.output.get("db")
if isinstance(out, list):
    out = os.path.commonprefix(out)


with tempfile.TemporaryDirectory() as tmpdir:
    # Modules with threads
    if snakemake.params.module.startswith("databases "):
        input = ""
        extra += f" --threads {snakemake.threads}"
    # Modules with no temp folder
    elif snakemake.params.module == "createdb":
        input = snakemake.input.fas
        tmpdir = ""
    # Modules with no out folder
    elif snakemake.params.module == "createtaxdb":
        input = snakemake.input.db
        out = ""
    else:
        raise ValueError(
            f"The module specified under 'params' is invalid: '{snakemake.params.module}'."
        )

    # Auto-uncompress taxdump file
    if taxdump:
        if taxdump.endswith(".tar.gz"):
            import tarfile

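            # tmpdir is a plain string, so build the path with os.path.join
            taxdump_dir = os.path.join(tmpdir, "taxdump")
            with tarfile.open(taxdump, "r:gz") as tar:
                tar.extractall(taxdump_dir)
            taxdump = taxdump_dir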
        extra += f" --ncbi-tax-dump {taxdump}"

    shell("mmseqs {snakemake.params.module} {input} {out} {tmpdir} {extra} {log}")
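
To make the module dispatch above easier to follow, here is a minimal standalone sketch of how the command line appears to be assembled for each module. The function build_command, its parameter names and the default /tmp/mmseqs folder are hypothetical and exist only for illustration; the wrapper itself reads these values from the snakemake object, creates a real temporary directory, and appends log redirection.

```python
def build_command(module, fas="", db="", out="", threads=1,
                  tmpdir="/tmp/mmseqs", extra=""):
    """Return the mmseqs command string for one of the supported modules."""
    if module.startswith("databases "):
        # Download module: no input, output DB prefix, temp folder, threads.
        source = ""
        extra += f" --threads {threads}"
    elif module == "createdb":
        # createdb takes a FASTA input and an output prefix, no temp folder.
        source = fas
        tmpdir = ""
    elif module == "createtaxdb":
        # createtaxdb edits the input DB in place, so there is no output prefix.
        source = db
        out = ""
    else:
        raise ValueError(f"The module specified is invalid: '{module}'.")

    # Drop empty positions so the result has no doubled spaces.
    parts = ["mmseqs", module, source, out, tmpdir, extra.strip()]
    return " ".join(part for part in parts if part)


createdb_cmd = build_command(
    "createdb", fas="seqs/a.fasta", out="out/createdb/a",
    extra="-v 3 --dbtype 2",
)
databases_cmd = build_command(
    "databases SILVA", out="out/databases/a", threads=4, extra="-v 3",
)
createtaxdb_cmd = build_command(
    "createtaxdb", db="out/createdb/a", extra="-v 3",
)

print(createdb_cmd)
print(databases_cmd)
print(createtaxdb_cmd)
```

The positional slots differ per module, which is why the wrapper blanks out the temp folder for createdb and the output prefix for createtaxdb instead of using one fixed argument order.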