MEHARI ANNOTATE SEQVARS

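Annotate sequence variant calls (VCF/BCF) with mehari. Depending on which databases are supplied, this adds transcript consequence, ClinVar and frequency annotations.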

URL: https://github.com/varfish-org/mehari

Example

This wrapper can be used in the following way:

rule mehari_annotate_seqvars_variants_MT:
    input:
        calls="{prefix}.vcf",  # .vcf, .vcf.gz or .bcf
        ref="resources/MT.fasta",  # has to be uncompressed
        fai="resources/MT.fasta.fai",
        transcript_db="resources/MT-ND2-GRCh38-ensembl-0.10.3.bin.zst",  # transcript database for SO term / consequence annotation
        # clinvar_db="resources/clinvar.bin.zst",  # ClinVar database for ClinVar VCV annotation
        # frequency_db="resources/frequencies.bin.zst",  # frequencies/gnomAD database for frequency annotation
    output:
        calls="{prefix}.annotated.bcf",  # .vcf, .vcf.gz or .bcf
    params:
        extra="",
    log:
        "logs/mehari/mehari_annotate_variants.{prefix}.log",
    wrapper:
        "v9.0.1/bio/mehari/annotate-seqvars"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • mehari=0.39.0

Input/Output

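Input:

  • calls: variant calls to annotate (.vcf, .vcf.gz or .bcf)

  • ref: reference FASTA, uncompressed (optional; without it, HGVS 3' shifting for genomic coordinates cannot be done correctly)

  • fai: FASTA index of the reference (required whenever ref is given)

  • transcript_db: transcript database for SO term / consequence annotation (optional)

  • clinvar_db: ClinVar database for ClinVar VCV annotation (optional)

  • frequency_db: frequencies/gnomAD database for frequency annotation (optional)

At least one of transcript_db, clinvar_db and frequency_db must be specified.

Output:

  • calls: annotated variant calls (.vcf, .vcf.gz or .bcf)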
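
The constraints that the wrapper script (see Code) places on these inputs can be checked ahead of time, for example when assembling inputs from a config file. The following is a minimal sketch and not part of the wrapper: check_inputs is a hypothetical helper that works on a plain dict instead of snakemake.input, and it returns the warning as a string rather than logging it.

```python
def check_inputs(inputs):
    """Mirror the wrapper's input checks for a plain dict of named inputs.

    Hypothetical helper for illustration. Returns a list of warning strings
    and raises ValueError for combinations the wrapper would reject.
    """
    warnings = []

    # At least one annotation database has to be present.
    databases = ("transcript_db", "clinvar_db", "frequency_db")
    if not any(inputs.get(name) for name in databases):
        raise ValueError(
            "At least one of inputs 'transcript_db', 'clinvar_db' and "
            "'frequency_db' must be specified"
        )

    # The reference is optional, but it needs its index when given.
    if inputs.get("ref"):
        if not inputs.get("fai"):
            raise ValueError("Reference FASTA index must be specified")
    else:
        warnings.append(
            "Without reference fasta, cannot do correct HGVS 3' shifting "
            "for genomic coordinates."
        )
    return warnings


# Usage with the paths from the example rule: no warnings are expected.
example_inputs = {
    "calls": "sample.vcf",
    "ref": "resources/MT.fasta",
    "fai": "resources/MT.fasta.fai",
    "transcript_db": "resources/MT-ND2-GRCh38-ensembl-0.10.3.bin.zst",
}
print(check_inputs(example_inputs))
```

Returning warnings instead of logging them keeps the sketch easy to test; the wrapper itself calls logging.warning.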

Params

  • extra: Extra arguments for the mehari annotate seqvars invocation.

Authors

  • Till Hartmann

Code

__author__ = "Till Hartmann"
__copyright__ = "Copyright 2025, Till Hartmann"
__email__ = "till.hartmann@bih-charite.de"
__license__ = "MIT"

import logging

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

transcript_db = snakemake.input.get("transcript_db", "")
if transcript_db:
    transcript_db = f"--transcripts {transcript_db}"

clinvar_db = snakemake.input.get("clinvar_db", "")
if clinvar_db:
    clinvar_db = f"--clinvar {clinvar_db}"

frequency_db = snakemake.input.get("frequency_db", "")
if frequency_db:
    frequency_db = f"--frequency {frequency_db}"

if not transcript_db and not clinvar_db and not frequency_db:
    raise ValueError(
        "At least one of inputs 'transcript_db', 'clinvar_db' and 'frequency_db' must be specified"
    )

ref = snakemake.input.get("ref", "")
if ref:
    ref = f"--reference {ref}"
    if not snakemake.input.get("fai"):
        raise ValueError("Reference FASTA index must be specified")
else:
    logging.warning(
        "Without reference fasta, cannot do correct HGVS 3' shifting for genomic coordinates."
    )


shell(
    "(mehari annotate seqvars "
    "--path-input-vcf {snakemake.input.calls:q} "
    "{transcript_db} "
    "{clinvar_db} "
    "{frequency_db} "
    "{ref} "
    "{extra} "
    "--path-output-vcf {snakemake.output.calls:q} "
    ") {log}"
)
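
To see which command line the script above produces for a given set of inputs, the assembly can be reproduced outside Snakemake. This is a sketch under stated assumptions, not the wrapper's implementation: build_command and FLAGS are hypothetical names, log redirection is left out, and every path is quoted with shlex.quote, whereas the wrapper applies :q only to the input and output calls paths. The flags are the ones used in the script above.

```python
import shlex

# Mapping from named input to mehari flag, in the order the wrapper emits them.
FLAGS = {
    "transcript_db": "--transcripts",
    "clinvar_db": "--clinvar",
    "frequency_db": "--frequency",
    "ref": "--reference",
}


def build_command(inputs, output_calls, extra=""):
    """Assemble a 'mehari annotate seqvars' command string (hypothetical helper)."""
    parts = [
        "mehari", "annotate", "seqvars",
        "--path-input-vcf", shlex.quote(inputs["calls"]),
    ]
    # Optional inputs only contribute a flag when they are present.
    for name, flag in FLAGS.items():
        path = inputs.get(name)
        if path:
            parts += [flag, shlex.quote(path)]
    # Extra arguments are passed through unchanged, as in the wrapper.
    if extra:
        parts.append(extra)
    parts += ["--path-output-vcf", shlex.quote(output_calls)]
    return " ".join(parts)


# Usage with the paths from the example rule.
command = build_command(
    {
        "calls": "sample.vcf",
        "ref": "resources/MT.fasta",
        "fai": "resources/MT.fasta.fai",
        "transcript_db": "resources/MT-ND2-GRCh38-ensembl-0.10.3.bin.zst",
    },
    "sample.annotated.bcf",
)
print(command)
```

Note that fai is never passed on the command line; the wrapper only checks that it is present when ref is given.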