EGA FETCH


Fetch files from the European Genome-phenome Archive (EGA) with pyega3.

URL: https://github.com/EGA-archive/ega-download-client

Example

This wrapper can be used in the following way:

rule download_file:
    output:
        "data/{egafile}.cram",
    log:
        "logs/ega/fetch/{egafile}.log",
    params:
        fileid=lambda wildcards: wildcards.egafile,
        extra_pyega3="-t",  # optional extra args for pyega3
        extra_fetch="",  # optional extra args for the fetch subcommand
    wrapper:
        "v3.9.0/bio/ega/fetch"
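To make the mapping from params to the pyega3 command line concrete, here is a minimal sketch that mirrors the argument assembly in the wrapper code further down. The output suffix becomes `--format`, `extra_pyega3` is placed before the `fetch` subcommand, and `extra_fetch` is appended after the file ID. The helper name `build_pyega3_cmd` and the file ID `EGAF00000000001` are hypothetical placeholders, not part of the wrapper.

```python
import shlex
from pathlib import Path


def build_pyega3_cmd(output, fileid, outdir, extra_pyega3="", extra_fetch=""):
    """Hypothetical helper mirroring how the wrapper assembles the pyega3 call."""
    # The requested format is the output suffix without the dot, upper-cased.
    fmt = Path(output).suffix[1:].upper()
    return (
        ["pyega3"]
        + shlex.split(extra_pyega3)  # global pyega3 options, before the subcommand
        + ["fetch", "--output-dir", outdir, "--format", fmt, fileid]
        + shlex.split(extra_fetch)  # options for the fetch subcommand
    )


# Placeholder file ID and temp dir, using the extra_pyega3 value from the rule above.
print(build_pyega3_cmd("data/EGAF00000000001.cram", "EGAF00000000001", "/tmp/x", extra_pyega3="-t"))
# ['pyega3', '-t', 'fetch', '--output-dir', '/tmp/x', '--format', 'CRAM', 'EGAF00000000001']
```

Because both extra strings go through `shlex.split`, quoted arguments containing spaces survive as single tokens, and an empty string contributes nothing.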

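Note that the output and log file paths can be chosen freely. The output file extension, however, determines the format requested from pyega3 (for example, .cram requests CRAM), so it must match the desired file type.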

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • pyega3=5.2.0

Authors

  • Johannes Köster

Code

from pathlib import Path
import shlex
import shutil
import subprocess as sp
import sys
import tempfile

if snakemake.log:
    sys.stderr = open(snakemake.log[0], "w")

fileid = snakemake.params.fileid

# requested format (e.g. CRAM or BAM) is taken from the output file extension
fmt = Path(snakemake.output[0]).suffix[1:].upper()

extra_pyega3 = shlex.split(snakemake.params.get("extra_pyega3", ""))
extra_fetch = shlex.split(snakemake.params.get("extra_fetch", ""))

with tempfile.TemporaryDirectory() as tmpdir:
    cmd = (
        ["pyega3"]
        + extra_pyega3
        + ["fetch", "--output-dir", tmpdir, "--format", fmt, fileid]
        + extra_fetch
    )
    sp.run(
        cmd,
        stdout=sys.stderr,
        stderr=sp.STDOUT,
        check=True,
    )
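    # obtain path to the downloaded file (it should be the only file with that
    # extension in the temp dir); raise instead of assert so the check also
    # runs under python -O and reports both the zero- and multi-file cases
    glob_res = list((Path(tmpdir) / fileid).glob(f"*.{fmt.lower()}"))
    if len(glob_res) != 1:
        raise RuntimeError(
            f"expected exactly one *.{fmt.lower()} file downloaded by pyega3 "
            f"for {fileid}, found {len(glob_res)}"
        )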

    # Move the file to the output
    shutil.move(glob_res[0], snakemake.output[0])
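The file-selection step at the end of the wrapper can be exercised on its own, without contacting EGA. The sketch below assumes the directory layout implied by the wrapper's glob (`<tmpdir>/<fileid>/*.<ext>`); the function name `select_downloaded_file`, the file ID `EGAF00000000001`, and the file name `sample.cram` are hypothetical placeholders.

```python
import tempfile
from pathlib import Path


def select_downloaded_file(tmpdir, fileid, fmt):
    """Return the single downloaded file matching the requested format.

    Assumes the layout the wrapper globs for: <tmpdir>/<fileid>/*.<fmt>.
    """
    ext = fmt.lower()
    matches = list((Path(tmpdir) / fileid).glob(f"*.{ext}"))
    if len(matches) != 1:
        # Covers both failure modes: nothing downloaded, or an ambiguous result.
        raise RuntimeError(
            f"expected exactly one *.{ext} file for {fileid}, found {len(matches)}"
        )
    return matches[0]


# Simulate a download directory with one matching file.
with tempfile.TemporaryDirectory() as tmp:
    download_dir = Path(tmp) / "EGAF00000000001"
    download_dir.mkdir()
    (download_dir / "sample.cram").touch()
    print(select_downloaded_file(tmp, "EGAF00000000001", "CRAM").name)  # sample.cram
```

Files with other extensions (for example a hypothetical `sample.cram.md5`) do not match `*.cram`, so they do not cause the selection to fail.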