REFGENIE

Deploy biomedical reference datasets via refgenie.

Example

This wrapper can be used in the following way:

rule obtain_asset:
    output:
        # Each output name must be a seek key of the requested asset
        # (listed under the asset's attributes on http://refgenomes.databio.org).
        fai="refs/genome.fasta.fai"
        # Multiple outputs/seek keys are possible here.
    params:
        genome="human_alu",
        asset="fasta",
        tag="default"
    wrapper:
        "0.75.0-7-g74e079c/bio/refgenie"

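The comment in the rule notes that several outputs/seek keys can be requested at once. A minimal sketch of such a rule follows. It assumes that the fasta asset exposes the seek keys fasta, fai, and chrom_sizes; check the asset's attributes on the server before relying on these names. The rule name and output paths are arbitrary.

```snakemake
rule obtain_asset_multi:
    output:
        # each output name is a seek key of the asset (assumed names, see above)
        fasta="refs/genome.fasta",
        fai="refs/genome.fasta.fai",
        chrom_sizes="refs/genome.chrom.sizes",
    params:
        genome="human_alu",
        asset="fasta",
        tag="default",
    wrapper:
        "0.75.0-7-g74e079c/bio/refgenie"
```
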
Note that output file paths can be chosen freely; only the output names are fixed, because each name selects a refgenie seek key. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
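
The wrapper code reads the location of the refgenie configuration file from the REFGENIE environment variable, so that variable has to be set in the environment that launches snakemake. A minimal pre-flight check could look like the sketch below. The helper name refgenie_config_path is hypothetical and is not part of refgenie or Snakemake, and the check that the file exists is an added assumption about what a usable setup needs.

```python
import os


def refgenie_config_path(environ=None):
    """Return the config path stored in REFGENIE, or raise a descriptive error.

    Hypothetical helper for illustration; the wrapper itself simply reads
    os.environ["REFGENIE"].
    """
    environ = os.environ if environ is None else environ
    conf_path = environ.get("REFGENIE")
    if not conf_path:
        raise RuntimeError(
            "REFGENIE is not set; point it at a refgenie configuration file"
        )
    if not os.path.isfile(conf_path):
        raise FileNotFoundError(f"REFGENIE points to a missing file: {conf_path}")
    return conf_path
```

Calling refgenie_config_path() before starting the workflow reports a missing or wrong setting up front instead of as a KeyError inside the wrapper job.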

Software dependencies

  • refgenie=0.9.2
  • refgenconf=0.9.0

Authors

  • Johannes Köster

Code

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2019, Johannes Köster"
__email__ = "johannes.koester@uni-due.de"
__license__ = "MIT"

import os
import refgenconf

genome = snakemake.params.genome
asset = snakemake.params.asset
tag = snakemake.params.tag

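# Path to the refgenie configuration file. The REFGENIE environment
# variable must be set, otherwise this lookup raises a KeyError.
conf_path = os.environ["REFGENIE"]

# Open the configuration writable so that a pull can register new assets.
rgc = refgenconf.RefGenConf(conf_path, writable=True)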

# Pull the asset if necessary; force=False leaves an asset that already
# exists locally untouched.
gat, archive_data, server_url = rgc.pull(genome, asset, tag, force=False)

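# Each named output is a seek key: resolve it to the file inside the
# asset and symlink that file to the requested output path.
for seek_key, out in snakemake.output.items():
    path = rgc.seek(genome, asset, tag_name=tag, seek_key=seek_key, strict_exists=True)
    os.symlink(path, out)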
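
The linking step at the end of the wrapper can be tried out without a refgenie installation. The sketch below is not the wrapper's code: link_outputs and demo are hypothetical names, the resolve callable stands in for rgc.seek, and replacing an existing link and creating the parent directory are added assumptions about what a rerun outside Snakemake might need. It assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile


def link_outputs(resolve, outputs):
    """Symlink each resolved asset file to its requested output path.

    resolve: callable mapping a seek key to a file path
             (hypothetical stand-in for rgc.seek in the wrapper).
    outputs: mapping of seek key -> output path, like snakemake.output.items().
    """
    linked = {}
    for seek_key, out in outputs.items():
        # Use an absolute target so the link does not depend on the
        # directory that contains the link itself.
        target = os.path.abspath(resolve(seek_key))
        parent = os.path.dirname(out)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # os.symlink raises FileExistsError if the link path already exists,
        # so remove a stale link or file first.
        if os.path.lexists(out):
            os.remove(out)
        os.symlink(target, out)
        linked[seek_key] = out
    return linked


def demo():
    """Exercise link_outputs on throwaway files instead of a real asset store."""
    with tempfile.TemporaryDirectory() as tmp:
        asset_file = os.path.join(tmp, "store", "human_alu.fa.fai")
        os.makedirs(os.path.dirname(asset_file))
        with open(asset_file, "w") as handle:
            handle.write("chr1\t100\n")
        out = os.path.join(tmp, "refs", "genome.fasta.fai")
        fake_store = {"fai": asset_file}
        link_outputs(fake_store.__getitem__, {"fai": out})
        # A second call replaces the existing link instead of failing.
        link_outputs(fake_store.__getitem__, {"fai": out})
        with open(out) as handle:
            return os.path.islink(out), handle.read()
```

Passing the resolver in as a callable keeps the linking logic independent of refgenconf, so it can be checked in isolation.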