VG AUTOINDEX GIRAFFE

https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/vg/autoindex?label=version%20update%20pull%20requests

Creates index files for mapping reads via vg giraffe.

URL: https://github.com/vgteam/vg

Example

This wrapper can be used in the following way:

rule vg_autoindex_giraffe:
    input:
        ref="{genome}.fasta",
        # Optional vcf file containing haplotypes for graph creation
        #vcf="",
    output:
        multiext("resources/{genome}", ".dist", ".min", ".giraffe.gbz"),
    log:
        "logs/autoindex/{genome}.log",
    params:
        extra="",
    threads: 8
    wrapper:
        "v5.5.2-17-g33d5b76/bio/vg/autoindex"


rule vg_autoindex_map:
    input:
        ref="{genome}.fasta",
        # Optional vcf file containing haplotypes for graph creation
        #vcf="",
    output:
        multiext("resources/{genome}", ".xg", ".gcsa", ".gcsa.lcp"),
    log:
        "logs/autoindex/{genome}.log",
    params:
        extra="",
    threads: 8
    wrapper:
        "v5.5.2-17-g33d5b76/bio/vg/autoindex"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • The extra param allows for additional arguments for vg autoindex.

Software dependencies

  • vg=1.61.0

Input/Output

Input:

  • ref: FASTA reference

  • vcf: haplotypes in vcf format (optional)

Output:

  • .giraffe.gbz, .dist and .min-file

Authors

  • Felix Mölder

Code

__author__ = "Felix Mölder"
__copyright__ = "Copyright 2024, Felix Mölder"
__email__ = "felix.moelder@uk-essen.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Extract arguments.
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

prefix_path = path.commonprefix(snakemake.output).rstrip(".")
vcf_cmd = f"-v {snakemake.input.vcf}" if snakemake.input.get("vcf") else ""


def contains_ext(extensions):
    return all(
        any(out_file.endswith(ext) for out_file in snakemake.output)
        for ext in extensions
    )


if any(map(lambda x: "giraffe" in x, snakemake.output)):
    workflow_cmd = "giraffe"
elif contains_ext([".xg", ".gcsa.lcp", ".gcsa"]):
    workflow_cmd = "map"
else:
    raise ValueError(
        "Output files do not match any supported indexing workflows. Currently only map and giraffe are supported."
    )

shell(
    "(vg autoindex"
    " -w {workflow_cmd}"
    " -p {prefix_path}"
    " -r {snakemake.input.ref}"
    " {vcf_cmd}"
    " -t {snakemake.threads}"
    " {extra}) {log}"
)