VEP DOWNLOAD CACHE
Download VEP cache for given species, build and release.
URL: http://www.ensembl.org/info/docs/tools/vep/index.html
Example
This wrapper can be used in the following way:
rule get_vep_cache:
    output:
        directory("resources/vep/cache"),
    params:
        species="saccharomyces_cerevisiae",
        build="R64-1-1",
        release="98",
    log:
        "logs/vep/cache.log",
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
        "v7.6.0/bio/vep/cache"


rule get_indexed_vep_cache:
    output:
        directory("resources/vep/indexed_cache"),
    params:
        species="saccharomyces_cerevisiae",
        build="R64-1-1",
        release="98",
        indexed=True,
    log:
        "logs/vep/indexed_cache.log",
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
        "v7.6.0/bio/vep/cache"


rule get_vep_cache_ebi:
    output:
        directory("resources/vep/cache_ebi"),
    params:
        url="ftp://ftp.ebi.ac.uk/ensemblgenomes/pub/plants",
        species="cyanidioschyzon_merolae",
        build="ASM9120v1",
        release="58",
    log:
        "logs/vep/cache_ebi.log",
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
        "v7.6.0/bio/vep/cache"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
ensembl-vep=115.2
Params
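url: base URL from which to download the cache data (optional; default is ftp://ftp.ensembl.org/pub)
species: species for which to download the cache
build: genome build (assembly) for which to download the cache
release: release for which to download the cache; must be convertible to an integer
indexed: whether to download an already indexed cache (default is True); non-indexed caches are only supported for releases below 114
extra: additional arguments passed on to vep_install (optional)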
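To make the role of each parameter concrete, the following is a minimal sketch of how they combine into a download location. It mirrors the URL logic of the wrapper code on this page, but the function name `build_cache_urls` and the constant `DEFAULT_URL` are hypothetical helpers introduced for illustration only; they are not part of the wrapper or of VEP.

```python
from typing import Optional, Tuple

# Hypothetical constant: the fallback base location used when no url is given.
DEFAULT_URL = "ftp.ensembl.org/pub"


def build_cache_urls(
    species: str,
    build: str,
    release: int,
    url: Optional[str] = None,
    indexed: bool = True,
) -> Tuple[str, str]:
    """Return (https_url, ftp_url) of the cache tarball (illustrative helper)."""
    tarball = f"{species}_vep_{release}_{build}.tar.gz"

    if indexed:
        vep_dir = "indexed_vep_cache"
    else:
        if release >= 114:
            raise ValueError(
                "Releases >= 114 are only supported for indexed VEP caches."
            )
        # Older Ensembl releases used an upper-case directory name.
        vep_dir = "vep" if url or release >= 97 else "VEP"

    # Drop a leading scheme so that both variants can be derived from one path.
    base = url or DEFAULT_URL
    for scheme in ("https://", "ftp://"):
        if base.startswith(scheme):
            base = base[len(scheme):]
            break

    path = f"{base}/release-{release}/variation/{vep_dir}/{tarball}"
    return f"https://{path}", f"ftp://{path}"


if __name__ == "__main__":
    # Parameters taken from the first example rule on this page.
    for candidate in build_cache_urls("saccharomyces_cerevisiae", "R64-1-1", 98):
        print(candidate)
```

Returning both variants reflects that the wrapper attempts the HTTPS location first and only then the FTP one, regardless of which scheme the `url` parameter was written with.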
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2023, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import subprocess as sp
import tempfile
from pathlib import Path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

try:
    release = int(snakemake.params.release)
except ValueError:
    raise ValueError("The parameter release is supposed to be an integer.")

with tempfile.TemporaryDirectory() as tmpdir:
    # We download the cache tarball manually because vep_install does not consider proxy settings (in contrast to curl).
    # See https://github.com/bcbio/bcbio-nextgen/issues/1080
    user_url = snakemake.params.get("url", "ftp.ensembl.org/pub")
    cache_tarball = (
        f"{snakemake.params.species}_vep_{release}_{snakemake.params.build}.tar.gz"
    )
    if snakemake.params.get("indexed", True):
        vep_dir = "indexed_vep_cache"
        convert = ""
    else:
        if release >= 114:
            raise ValueError(
                "Releases >= 114 are only supported for indexed VEP caches."
            )
        vep_dir = "vep" if snakemake.params.get("url") or release >= 97 else "VEP"
        convert = "--CONVERT "

    if user_url.startswith("https://"):
        url_https = f"{user_url}/release-{release}/variation/{vep_dir}/{cache_tarball}"
        url_ftp = url_https.replace("https://", "ftp://")
    elif user_url.startswith("ftp://"):
        url_ftp = f"{user_url}/release-{release}/variation/{vep_dir}/{cache_tarball}"
        url_https = url_ftp.replace("ftp://", "https://")
    else:
        url_https = (
            f"https://{user_url}/release-{release}/variation/{vep_dir}/{cache_tarball}"
        )
        url_ftp = url_https.replace("https://", "ftp://")

    try:
        shell(f"curl --fail -L {url_https} -o {tmpdir}/{cache_tarball} {log}")
        shell(f"gzip -t {tmpdir}/{cache_tarball}")
    except sp.CalledProcessError:
        shell(f"curl -L {url_ftp} -o {tmpdir}/{cache_tarball} {log}")

    log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
    shell(
        "vep_install --AUTO c "
        "--SPECIES {snakemake.params.species} "
        "--ASSEMBLY {snakemake.params.build} "
        "--CACHE_VERSION {release} "
        "--CACHEURL {tmpdir} "
        "--CACHEDIR {snakemake.output} "
        "{convert}"
        "--NO_UPDATE "
        "{extra} {log}"
    )
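
The download step in the wrapper code above tries the HTTPS location first, checks the archive with gzip -t, and falls back to FTP if either step fails. The pattern can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the wrapper's implementation: `download_with_fallback`, `is_valid_gzip` and the injected `fetch` callable are hypothetical names, and unlike the wrapper this sketch validates every candidate, including the fallback.

```python
import gzip
import zlib
from typing import Callable, Sequence, Tuple


def is_valid_gzip(data: bytes) -> bool:
    """Rough in-memory counterpart of `gzip -t` (illustrative helper)."""
    if not data:
        return False
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        # Bad magic bytes, truncated stream, or corrupt payload.
        return False


def download_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Return (url, data) for the first URL yielding a valid gzip archive.

    `fetch` is a hypothetical callable that downloads one URL and raises
    OSError on failure; injecting it keeps the sketch independent of any
    particular HTTP or FTP client.
    """
    failures = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:
            failures.append(f"{url}: {exc}")
            continue
        if is_valid_gzip(data):
            return url, data
        failures.append(f"{url}: not a valid gzip archive")
    raise RuntimeError("All downloads failed: " + "; ".join(failures))


if __name__ == "__main__":
    # Simulated transport: HTTPS is unreachable, FTP serves a valid archive.
    def fake_fetch(url: str) -> bytes:
        if url.startswith("https://"):
            raise ConnectionError("simulated HTTPS failure")
        return gzip.compress(b"cache contents")

    used, _ = download_with_fallback(
        ["https://example.org/cache.tar.gz", "ftp://example.org/cache.tar.gz"],
        fake_fetch,
    )
    print(used)
```

Checking archive integrity before accepting a download matters because a server or proxy may return an error page under a success status, which would otherwise be handed to vep_install as if it were the cache tarball.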