VEP DOWNLOAD CACHE

Download VEP cache for given species, build and release.

URL: http://www.ensembl.org/info/docs/tools/vep/index.html

Example

This wrapper can be used in the following way:

rule get_vep_cache:
    output:
        directory("resources/vep/cache"),
    params:
        species="saccharomyces_cerevisiae",
        build="R64-1-1",
        release="98",
    log:
        "logs/vep/cache.log",
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
        "v7.3.0/bio/vep/cache"


rule get_indexed_vep_cache:
    output:
        directory("resources/vep/indexed_cache"),
    params:
        species="saccharomyces_cerevisiae",
        build="R64-1-1",
        release="98",
        indexed=True,
    log:
        "logs/vep/indexed_cache.log",
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
        "v7.3.0/bio/vep/cache"


rule get_vep_cache_ebi:
    output:
        directory("resources/vep/cache_ebi"),
    params:
        url="ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/plants",
        species="cyanidioschyzon_merolae",
        build="ASM9120v1",
        release="58",
    log:
        "logs/vep/cache_ebi.log",
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
        "v7.3.0/bio/vep/cache"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • ensembl-vep=115.1

Params

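  • url: base URL from which to download cache data (optional; defaults to ftp.ensembl.org/pub; the download is tried over HTTPS first, with FTP as a fallback)

  • species: species for which to download cache data, e.g. saccharomyces_cerevisiae

  • build: genome build (assembly) for which to download cache data, e.g. R64-1-1

  • release: Ensembl release for which to download cache data; must be convertible to an integer, e.g. 98

  • indexed: whether to download an already indexed cache (optional; if unset, the regular cache is downloaded and vep_install is run with --CONVERT)

  • extra: additional arguments passed to vep_install (optional)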
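
To make the role of each parameter concrete, the following is a minimal sketch of how the download URLs appear to be assembled from them, based on the wrapper code further down. The helper name build_cache_urls and the constant DEFAULT_URL are hypothetical and only serve as illustration; the wrapper itself does not expose such a function.

```python
DEFAULT_URL = "ftp.ensembl.org/pub"  # default base URL used in the wrapper code


def build_cache_urls(species, build, release, url=None, indexed=False):
    """Return (https_url, ftp_url) for a cache tarball (hypothetical helper)."""
    release = int(release)  # raises ValueError for non-numeric releases
    base = url or DEFAULT_URL
    tarball = f"{species}_vep_{release}_{build}.tar.gz"
    if indexed:
        vep_dir = "indexed_vep_cache"
    else:
        # Lower-case "vep" for custom URLs and releases >= 97, "VEP" otherwise.
        vep_dir = "vep" if url or release >= 97 else "VEP"
    # Strip a leading scheme so both variants share the same host and path.
    for scheme in ("https://", "ftp://"):
        if base.startswith(scheme):
            base = base[len(scheme):]
            break
    path = f"{base}/release-{release}/variation/{vep_dir}/{tarball}"
    return f"https://{path}", f"ftp://{path}"


if __name__ == "__main__":
    https_url, ftp_url = build_cache_urls("saccharomyces_cerevisiae", "R64-1-1", "98")
    print(https_url)
    print(ftp_url)
```

Both URL variants are derived from the same base so that the FTP address can serve as a fallback when the HTTPS download fails.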

Authors

  • Johannes Köster

Code

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2023, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess as sp
import tempfile
from pathlib import Path
from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


try:
    release = int(snakemake.params.release)
except ValueError:
    raise ValueError("The parameter release is supposed to be an integer.")


with tempfile.TemporaryDirectory() as tmpdir:
    # We download the cache tarball manually because vep_install does not consider proxy settings (in contrast to curl).
    # See https://github.com/bcbio/bcbio-nextgen/issues/1080
    user_url = snakemake.params.get("url", "ftp.ensembl.org/pub")
    cache_tarball = (
        f"{snakemake.params.species}_vep_{release}_{snakemake.params.build}.tar.gz"
    )
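    if snakemake.params.get("indexed"):
        # Indexed caches are installed as downloaded, so no conversion is requested.
        vep_dir = "indexed_vep_cache"
        convert = ""
    else:
        # Use the lower-case "vep" directory for custom URLs and for releases >= 97, "VEP" otherwise.
        vep_dir = "vep" if snakemake.params.get("url") or release >= 97 else "VEP"
        convert = "--CONVERT "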

    if user_url.startswith("https://"):
        url_https = f"{user_url}/release-{release}/variation/{vep_dir}/{cache_tarball}"
        url_ftp = url_https.replace("https://", "ftp://")
    elif user_url.startswith("ftp://"):
        url_ftp = f"{user_url}/release-{release}/variation/{vep_dir}/{cache_tarball}"
        url_https = url_ftp.replace("ftp://", "https://")
    else:
        url_https = (
            f"https://{user_url}/release-{release}/variation/{vep_dir}/{cache_tarball}"
        )
        url_ftp = url_https.replace("https://", "ftp://")

    # Try HTTPS first and verify the archive; if the download or the integrity check fails, retry over FTP.
    try:
        shell(f"curl --fail -L {url_https} -o {tmpdir}/{cache_tarball} {log}")
        shell(f"gzip -t {tmpdir}/{cache_tarball}")
    except sp.CalledProcessError:
        shell(f"curl -L {url_ftp} -o {tmpdir}/{cache_tarball} {log}")

    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    shell(
        "vep_install --AUTO c "
        "--SPECIES {snakemake.params.species} "
        "--ASSEMBLY {snakemake.params.build} "
        "--CACHE_VERSION {release} "
        "--CACHEURL {tmpdir} "
        "--CACHEDIR {snakemake.output} "
        "{convert}"
        "--NO_UPDATE "
        "{extra} {log}"
    )
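
The wrapper checks the downloaded tarball with gzip -t before deciding whether to fall back to FTP. As an illustration only, a comparable integrity check can be sketched in pure Python with the standard library. The function name is_valid_gzip and its chunk_size parameter are hypothetical and are not part of the wrapper.

```python
import gzip
import zlib


def is_valid_gzip(path, chunk_size=1024 * 1024):
    """Return True if the file at `path` decompresses cleanly (hypothetical helper)."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so that large tarballs are not held in memory.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Not a gzip file, truncated stream, or corrupt compressed data.
        return False
    return True
```

Decompressing the whole stream is what makes the check meaningful: a truncated download usually has a valid header and only fails once the missing end of the stream is reached.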