GENOMEPY

Download genomes the easy way.

URL: https://github.com/vanheeringen-lab/genomepy

Example

This wrapper can be used in the following way:

rule genomepy:
    output:
        multiext("{assembly}/{assembly}", ".fa", ".fa.fai", ".fa.sizes", ".gaps.bed",
                 ".annotation.gtf.gz", ".blacklist.bed")
    log:
        "logs/genomepy_{assembly}.log"
    params:
        provider="UCSC"  # optional, defaults to UCSC. Choose from UCSC, Ensembl, and NCBI
    cache: True  # mark as eligible for between-workflow caching
    wrapper:
        "v1.1.0/bio/genomepy"

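Note that the output paths must contain the {assembly} wildcard, because the wrapper passes it to genomepy install as the genome name. The log file path can be chosen freely.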
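To make the requested files concrete, here is a minimal sketch of how the output list in the rule above expands for one assembly. It assumes only that multiext appends each extension to the shared prefix; the helper name expand_outputs and the assembly name hg38 are illustrative and not part of the wrapper.

```python
def expand_outputs(prefix, *extensions):
    """Append each extension to the shared prefix (hypothetical stand-in for multiext)."""
    return [prefix + ext for ext in extensions]


patterns = expand_outputs(
    "{assembly}/{assembly}",
    ".fa", ".fa.fai", ".fa.sizes", ".gaps.bed",
    ".annotation.gtf.gz", ".blacklist.bed",
)

# Fill in the wildcard for one example assembly (hg38 is an illustrative choice).
files = [pattern.format(assembly="hg38") for pattern in patterns]

for path in files:
    print(path)
```

Requesting any one of these files as a target gives Snakemake the value of the assembly wildcard, which the wrapper then hands to genomepy.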

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • bioconda::genomepy==0.8.3

Params

  • provider: which provider to download from, defaults to UCSC (choose from UCSC, Ensembl, NCBI).

Authors

  • Maarten van der Sande

Code

__author__ = "Maarten van der Sande"
__copyright__ = "Copyright 2020, Maarten van der Sande"
__email__ = "M.vanderSande@science.ru.nl"
__license__ = "MIT"


from snakemake.shell import shell

# Optional parameters
provider = snakemake.params.get("provider", "UCSC")

# set options for plugins
all_plugins = "blacklist,bowtie2,bwa,gmap,hisat2,minimap2,star"
req_plugins = ","
if any("blacklist" in out for out in snakemake.output):
    req_plugins = "blacklist,"

annotation = ""
if any("annotation" in out for out in snakemake.output):
    annotation = "--annotation"

# parse the genome dir
genome_dir = "./"
if snakemake.output[0].count("/") > 1:
    genome_dir = "/".join(snakemake.output[0].split("/")[:-1])

log = snakemake.log

# Finally execute genomepy
shell(
    """
    # set a trap so the user's original plugin settings are restored on exit
    active_plugins=$(genomepy config show | grep -Po '(?<=- ).*' | paste -s -d, -) || echo ""
    trap "genomepy plugin disable {{{all_plugins}}} >> {log} 2>&1;\
          genomepy plugin enable {{$active_plugins,}} >> {log} 2>&1" EXIT

    # disable all, then enable the ones we need
    genomepy plugin disable {{{all_plugins}}} >  {log} 2>&1
    genomepy plugin enable  {{{req_plugins}}} >> {log} 2>&1

    # install the genome
    genomepy install {snakemake.wildcards.assembly} \
    {provider} {annotation} -g {genome_dir} >> {log} 2>&1
    """
)
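The option parsing above can be tried outside of Snakemake. The following is a minimal sketch that mirrors the wrapper's logic in a plain function; the name derive_options and the example paths are hypothetical, and the snakemake object is replaced by an ordinary list of output paths.

```python
def derive_options(outputs):
    """Mirror the wrapper's option parsing for a list of output paths (hypothetical helper)."""
    # The blacklist plugin is only enabled when a blacklist file is requested.
    req_plugins = "blacklist," if any("blacklist" in out for out in outputs) else ","
    # Gene annotation is only downloaded when an annotation file is requested.
    annotation = "--annotation" if any("annotation" in out for out in outputs) else ""
    # The genome directory is taken from the first output, as in the wrapper.
    genome_dir = "./"
    if outputs[0].count("/") > 1:
        genome_dir = "/".join(outputs[0].split("/")[:-1])
    return req_plugins, annotation, genome_dir


# Example paths for an illustrative assembly name.
outputs = ["hg38/hg38.fa", "hg38/hg38.annotation.gtf.gz", "hg38/hg38.blacklist.bed"]
req_plugins, annotation, genome_dir = derive_options(outputs)

# str.format turns the doubled braces into literal ones, so the shell
# receives a brace expression that bash expands into separate arguments.
enable_cmd = "genomepy plugin enable {{{req_plugins}}}".format(req_plugins=req_plugins)
print(enable_cmd)
```

The triple braces in the shell command are therefore two escaped literal braces around one substituted field, which is why the plugin lists are written as comma-separated strings.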