ENSEMBL-ANNOTATION
Download annotations of genomic sites (e.g. transcripts) from the Ensembl FTP servers and store them, decompressed, in a single .gtf or .gff3 file.
URL:
Example
This wrapper can be used in the following way:
rule get_annotation:
    output:
        "refs/annotation.gtf"
    params:
        species="homo_sapiens",
        release="87",
        build="GRCh37",
        fmt="gtf",
        flavor=""  # optional, e.g. chr_patch_hapl_scaff, see Ensembl FTP.
    log:
        "logs/get_annotation.log"
    cache: True  # save space and time with between workflow caching (see docs)
    wrapper:
        "0.80.1/bio/reference/ensembl-annotation"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
curl
Authors
- Johannes Köster
Code
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"
import subprocess
import sys

from snakemake.shell import shell

species = snakemake.params.species.lower()
release = int(snakemake.params.release)
fmt = snakemake.params.fmt
build = snakemake.params.build
flavor = snakemake.params.get("flavor", "")

branch = ""
if release >= 81 and build == "GRCh37":
    # use the special grch37 branch for new releases
    branch = "grch37/"

if flavor:
    # the flavor is a dot-separated component in front of the file suffix
    flavor += "."

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

suffix = ""
if fmt == "gtf":
    suffix = "gtf.gz"
elif fmt == "gff3":
    suffix = "gff3.gz"
else:
    # fail early instead of requesting a URL with an empty suffix
    raise ValueError("fmt must be 'gtf' or 'gff3', got: {}".format(fmt))

url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/{species_cap}.{build}.{release}.{flavor}{suffix}".format(
    release=release,
    build=build,
    species=species,
    fmt=fmt,
    species_cap=species.capitalize(),
    suffix=suffix,
    flavor=flavor,
    branch=branch,
)

try:
    # download, decompress on the fly, and write to the first output file
    shell("(curl -L {url} | gzip -d > {snakemake.output[0]}) {log}")
except subprocess.CalledProcessError as e:
    if snakemake.log:
        # append the hint to the rule's log file
        sys.stderr = open(snakemake.log[0], "a")
    print(
        "Unable to download annotation data from Ensembl. "
        "Did you check that this combination of species, build, and release is actually provided?",
        file=sys.stderr,
    )
    sys.exit(1)
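To check which file a given parameter combination resolves to before running the workflow, the URL construction above can be reproduced outside Snakemake. The following is a minimal sketch, not part of the wrapper: the function name ensembl_annotation_url is hypothetical, and raising ValueError for an unsupported fmt is an assumption made here for illustration. The branch, flavor, and suffix rules mirror the wrapper code.

```python
def ensembl_annotation_url(species, release, build, fmt, flavor=""):
    """Return the Ensembl FTP URL that the wrapper code would request.

    Hypothetical helper for illustration; it is not part of the wrapper.
    """
    species = species.lower()
    release = int(release)

    # Map the requested format to the compressed file suffix.
    suffixes = {"gtf": "gtf.gz", "gff3": "gff3.gz"}
    if fmt not in suffixes:
        # Assumption: reject unknown formats explicitly.
        raise ValueError("fmt must be 'gtf' or 'gff3', got: {!r}".format(fmt))

    # GRCh37 is served from a dedicated branch from release 81 onwards.
    branch = "grch37/" if release >= 81 and build == "GRCh37" else ""

    # An optional flavor becomes an extra dot-separated component.
    flavor_part = flavor + "." if flavor else ""

    return (
        "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{fmt}/{species}/"
        "{species_cap}.{build}.{release}.{flavor}{suffix}"
    ).format(
        branch=branch,
        release=release,
        fmt=fmt,
        species=species,
        species_cap=species.capitalize(),
        build=build,
        flavor=flavor_part,
        suffix=suffixes[fmt],
    )


# Parameters from the example rule above.
print(ensembl_annotation_url("homo_sapiens", "87", "GRCh37", "gtf"))
# → ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz
```

If the printed URL does not exist on the Ensembl FTP server, the wrapper's download step fails with the error message shown in the code above.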