MMSEQS2 DB
ultra fast and sensitive sequence search and clustering suite
URL: https://github.com/soedinglab/mmseqs2
Example
This wrapper can be used in the following way:
rule mmseqs2_databases:
    output:
        db=multiext(
            "out/databases/{sample}",
            "",
            ".dbtype",
            ".index",
            ".lookup",
            ".source",
            ".version",
            "_h",
            "_h.dbtype",
            "_h.index",
            "_mapping",
            "_taxonomy",
        ),
    log:
        "logs/databases/{sample}.log",
    params:
        module="databases SILVA",
        extra="-v 3",
    threads: 1
    wrapper:
        "v9.3.0/bio/mmseqs2/db"

rule mmseqs2_createdb:
    input:
        fas="seqs/{sample}.fasta",
    output:
        db=multiext(
            "out/createdb/{sample}",
            "",
            ".dbtype",
            ".index",
            ".lookup",
            ".source",
            "_h",
            "_h.dbtype",
            "_h.index",
        ),
    log:
        "logs/createdb/{sample}.log",
    params:
        module="createdb",
        extra="-v 3 --dbtype 2",
    threads: 1
    wrapper:
        "v9.3.0/bio/mmseqs2/db"

rule mmseqs2_createtaxdb:
    input:
        db="out/createdb/{sample}/",
        tax_dump=storage.http("http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"),
        tax_map="seqs/{sample}.map",
    output:
        # Touch empty file since createtaxdb edits DB in place.
        # Make sure to include this file as input in rules that need the annotated database.
        touch("out/createtaxdb/{sample}.done"),
    log:
        "logs/createtaxdb/{sample}.log",
    params:
        module="createtaxdb",
        extra="-v 3",
    threads: 1
    wrapper:
        "v9.3.0/bio/mmseqs2/db"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
mmseqs2=18.8cc5c
Input/Output
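Input:
fas: input FASTA file (createdb)
db: existing DB to annotate (createtaxdb)
tax_dump: NCBI taxonomy dump, optionally as a .tar.gz archive (createtaxdb)
tax_map: taxonomy mapping file (createtaxdb)
Output:
db: DB files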
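The DB files in the example rules all share one path prefix, and that prefix is what gets passed to mmseqs as the database name. The sketch below shows how such a prefix can be recovered with os.path.commonprefix. The sample name "a" and the helper name db_prefix are hypothetical and chosen only for illustration.

```python
import os

# Suffixes taken from the createdb example rule above.
SUFFIXES = ["", ".dbtype", ".index", ".lookup", ".source", "_h", "_h.dbtype", "_h.index"]


def db_prefix(paths):
    """Return the longest common leading string of all paths (hypothetical helper)."""
    return os.path.commonprefix(list(paths))


# "a" is a made-up sample name.
outputs = ["out/createdb/a" + suffix for suffix in SUFFIXES]
prefix = db_prefix(outputs)
print(prefix)
```

Note that commonprefix compares character by character, not path component by path component. Listing the bare prefix (the "" suffix) among the outputs keeps the result from running into a fragment shared by the suffixes, such as a trailing dot.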
Params
module: workflow to use; one of "databases <name>" (e.g. "databases SILVA"), "createdb", or "createtaxdb"
extra: additional program arguments
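To make the role of module concrete, the following sketch mirrors how the three supported values could be turned into an mmseqs command line. It is an illustration under assumptions, not the wrapper itself: build_command and its arguments are hypothetical names, and empty positional slots are simply dropped.

```python
def build_command(module, out="", fas="", db="", tmpdir="", extra="", threads=1):
    """Assemble an mmseqs command string for one supported module.

    Hypothetical helper for illustration only.
    """
    if module.startswith("databases "):
        # Download workflow: no input, but an output prefix, a temp folder and threads.
        positional = [out, tmpdir]
        extra = f"{extra} --threads {threads}".strip()
    elif module == "createdb":
        # FASTA in, DB prefix out, no temp folder.
        positional = [fas, out]
    elif module == "createtaxdb":
        # Annotates an existing DB in place, so there is no output prefix.
        positional = [db, tmpdir]
    else:
        raise ValueError(f"Invalid module: '{module}'.")
    parts = ["mmseqs", module, *positional, extra]
    return " ".join(part for part in parts if part)


# Paths below are made up for the example.
cmd = build_command(
    "createdb", out="out/createdb/a", fas="seqs/a.fasta", extra="-v 3 --dbtype 2"
)
print(cmd)
```

Any other module value raises a ValueError, which matches the behaviour of the wrapper code.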
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2024, Filipe G. Vieira"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Input
if snakemake.input.get("tax_map"):
    extra += f" --tax-mapping-file {snakemake.input.tax_map}"
# The example rule names this input "tax_dump"
taxdump = snakemake.input.get("tax_dump")

# Output
out = snakemake.output.get("db")
if isinstance(out, list):
    # All DB files share one prefix, which mmseqs takes as the DB name
    out = os.path.commonprefix(out)

with tempfile.TemporaryDirectory() as tmpdir:
    # Modules with threads
    if snakemake.params.module.startswith("databases "):
        input = ""
        extra += f" --threads {snakemake.threads}"
    # Modules with no temp folder
    elif snakemake.params.module == "createdb":
        input = snakemake.input.fas
        tmpdir = ""
    # Modules with no out folder
    elif snakemake.params.module == "createtaxdb":
        input = snakemake.input.db
        out = ""
    else:
        raise ValueError(
            f"The module specified under 'params' is invalid: '{snakemake.params.module}'."
        )

    # Auto-uncompress taxdump file
    if taxdump:
        if taxdump.endswith(".tar.gz"):
            import tarfile

            tar = tarfile.open(taxdump, "r:gz")
            # tmpdir is a plain string, so build the path with os.path.join
            taxdump = os.path.join(tmpdir, "taxdump")
            tar.extractall(taxdump)
            tar.close()
        extra += f" --ncbi-tax-dump {taxdump}"

    shell("mmseqs {snakemake.params.module} {input} {out} {tmpdir} {extra} {log}")
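
The auto-uncompress step can be tried in isolation. The sketch below builds a tiny stand-in archive (the real NCBI taxdump is not downloaded) and unpacks it into a taxdump subfolder of a temporary directory, in the same spirit as the wrapper. The helper name extract_taxdump and the placeholder file content are assumptions made for the example.

```python
import os
import tarfile
import tempfile


def extract_taxdump(archive, tmpdir):
    """Unpack a .tar.gz archive into <tmpdir>/taxdump and return that folder.

    Hypothetical helper for illustration only.
    """
    target = os.path.join(tmpdir, "taxdump")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target


with tempfile.TemporaryDirectory() as tmpdir:
    # Build a stand-in archive holding one placeholder file.
    src = os.path.join(tmpdir, "names.dmp")
    with open(src, "w") as handle:
        handle.write("placeholder\n")
    archive = os.path.join(tmpdir, "taxdump.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="names.dmp")

    folder = extract_taxdump(archive, tmpdir)
    extracted = sorted(os.listdir(folder))
    print(extracted)
```

Because the extraction target lives inside the temporary directory, the unpacked files are removed together with it once the with block ends.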