MEHARI BUILD TRANSCRIPT DB
Build a transcript database for mehari.
URL: https://github.com/varfish-org/mehari
Example
This wrapper can be used in the following way:
rule mehari_build_transcript_database:
    input:
        annotation="resources/{prefix}.gff3.gz",
        sequences="resources/{prefix}.cdna.fasta",
    output:
        db="{prefix}.bin.zst",
    log:
        "logs/mehari/build_transcript_db/{prefix}.log",
    threads: 4
    params:
        assembly="GRCh38",
        assembly_version="GRCh38.p14",
        transcript_source="Ensembl",
        transcript_source_version="115",
        annotation_version="115",
    wrapper:
        "v9.8.0/bio/mehari/build-transcript-db"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
mehari=0.43.2
snakemake-wrapper-utils=0.8.0
Input/Output
Input:
annotation
sequences
Output:
db
Params
assembly: Assembly name, e.g., “GRCh38”.
assembly_version: Assembly version, e.g., “GRCh38.p14”.
annotation_version: Version of the annotation.
transcript_source: Source of the transcript sequences, e.g., “Ensembl” or “RefSeq”.
transcript_source_version: Version of the transcript sequences, e.g., “115”.
extra: Extra arguments for the mehari db create invocation.
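To illustrate how these parameters end up on the command line, here is a minimal sketch that maps a plain dictionary onto the flags used in the wrapper code in this document. The helper name build_param_flags and the use of a plain dict in place of snakemake.params are assumptions made for illustration. Unlike the wrapper, the sketch shell-quotes the optional values as well.

```python
import shlex

# Hypothetical helper, not part of the wrapper. The flag names are the ones
# the wrapper code passes to `mehari db create`.
REQUIRED_FLAGS = {
    "assembly": "--assembly",
    "transcript_source": "--transcript-source",
}
OPTIONAL_FLAGS = {
    "assembly_version": "--assembly-version",
    "annotation_version": "--annotation-version",
    "transcript_source_version": "--transcript-source-version",
}


def build_param_flags(params: dict) -> str:
    """Return the params-derived part of the command line as one string."""
    parts = []
    # Required params: fail early with the same kind of message as the wrapper.
    for name, flag in REQUIRED_FLAGS.items():
        value = params.get(name)
        if not value:
            raise ValueError(f"Parameter '{name}' is required but not specified")
        parts.append(f"{flag} {shlex.quote(str(value))}")
    # Optional params: only emit a flag when a non-empty value is given.
    for name, flag in OPTIONAL_FLAGS.items():
        value = params.get(name)
        if value:
            parts.append(f"{flag} {shlex.quote(str(value))}")
    # `extra` is appended verbatim so users can pass arbitrary arguments.
    extra = params.get("extra", "")
    if extra:
        parts.append(extra)
    return " ".join(parts)
```

Omitting an optional parameter drops its flag, while omitting assembly or transcript_source raises a ValueError before any command is run.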
Code
__author__ = "Till Hartmann"
__copyright__ = "Copyright 2025, Till Hartmann"
__email__ = "till.hartmann@bih-charite.de"
__license__ = "MIT"
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_format
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# required inputs and outputs
if not snakemake.input.get("annotation"):
    raise ValueError("Input 'annotation' is required but not specified")
if not snakemake.output.get("db"):
    raise ValueError("Output 'db' is required but not specified")
sequences = snakemake.input.get("sequences")
if not sequences:
    raise ValueError("Input 'sequences' is required but not specified")
if get_format(sequences) == "fasta":
    sequences = f"--transcript-sequences {sequences}"
else:
    sequences = f"--seqrepo {sequences}"
# required params
if not snakemake.params.get("assembly"):
    raise ValueError("Parameter 'assembly' is required but not specified")
if not snakemake.params.get("transcript_source"):
    raise ValueError("Parameter 'transcript_source' is required but not specified")
# optional params
assembly_version = snakemake.params.get("assembly_version", "")
if assembly_version:
    assembly_version = f"--assembly-version {assembly_version}"
annotation_version = snakemake.params.get("annotation_version", "")
if annotation_version:
    annotation_version = f"--annotation-version {annotation_version}"
transcript_source_version = snakemake.params.get("transcript_source_version", "")
if transcript_source_version:
    transcript_source_version = (
        f"--transcript-source-version {transcript_source_version}"
    )
shell(
    "mehari db create"
    " --threads {snakemake.threads}"
    " --annotation {snakemake.input.annotation:q}"
    " --assembly {snakemake.params.assembly:q}"
    " --transcript-source {snakemake.params.transcript_source:q}"
    " {sequences}"
    " {assembly_version}"
    " {annotation_version}"
    " {transcript_source_version}"
    " {extra}"
    " --output {snakemake.output.db:q}"
    " {log}"
)
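The wrapper code above chooses between --transcript-sequences and --seqrepo depending on whether the sequences input is detected as FASTA. The sketch below shows that decision in isolation. The function sequence_flag and its suffix-based detection are a hypothetical stand-in for get_format from snakemake_wrapper_utils, whose exact rules may differ, and the suffix sets are assumptions.

```python
from pathlib import Path

# Assumed suffix sets; the real get_format() may recognise a different list.
FASTA_SUFFIXES = {".fa", ".fas", ".fna", ".fasta"}
COMPRESSION_SUFFIXES = {".gz", ".bgz", ".bz2"}


def sequence_flag(path: str) -> str:
    """Pick the sequence-source flag for a path, as the wrapper does."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Ignore one trailing compression suffix, e.g. ".fa.gz" is treated as ".fa".
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes = suffixes[:-1]
    is_fasta = bool(suffixes) and suffixes[-1] in FASTA_SUFFIXES
    # FASTA files go to --transcript-sequences; anything else is treated
    # as a seqrepo location.
    flag = "--transcript-sequences" if is_fasta else "--seqrepo"
    return f"{flag} {path}"
```

Under these assumptions, a path such as "resources/GRCh38.cdna.fasta" selects --transcript-sequences, while a directory path without a FASTA suffix selects --seqrepo.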