GDC API-BASED DATA DOWNLOAD OF BAM SLICES

Download slices of GDC BAM files using curl and the GDC API for BAM Slicing.

Example

This wrapper can be used in the following way:

rule gdc_api_bam_slice_download:
    output:
        bam="raw/{sample}.bam",
    log:
        "logs/gdc-api/bam-slicing/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="092c8a6d-aad5-41bf-b186-e68e613c0e89",
        # a gdc_token is required for controlled access and all BAM files
        # on GDC seem to be controlled access (adjust if this changes)
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # provide wanted `region=` or `gencode=` slices joined with `&`
        slices="region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2",
        # extra command line arguments passed to curl
        extra=""
    wrapper:
        "v3.8.0-49-g6f33607/bio/gdc-api/bam-slicing"
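
The comment on `uuid` in the rule above suggests mapping sample names to UUIDs via the `id` column of a GDC manifest. A minimal sketch of such a lookup follows. The function name `load_uuid_map`, the manifest path `gdc_manifest.txt`, and the convention of taking the sample name from the `filename` column up to the first dot are assumptions for illustration, not part of the wrapper.

```python
import csv


def load_uuid_map(manifest_path):
    """Map sample names to GDC file UUIDs using a tab-separated GDC manifest.

    Assumption: the sample name is the manifest `filename` up to the first dot.
    Adjust this to whatever naming scheme your sample sheet uses.
    """
    uuids = {}
    with open(manifest_path, newline="") as manifest:
        for row in csv.DictReader(manifest, delimiter="\t"):
            sample = row["filename"].split(".")[0]
            uuids[sample] = row["id"]
    return uuids


# In a Snakefile, the lookup could then replace the hard-coded UUID:
#   UUIDS = load_uuid_map("gdc_manifest.txt")  # hypothetical path
#   ...
#   params:
#       uuid=lambda wildcards: UUIDS[wildcards.sample],
```

Loading the manifest once at the top of the Snakefile keeps the per-job lookup to a plain dictionary access.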

Note that the output and log file paths can be chosen freely; this wrapper does not take any input files.
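
The `slices` parameter is a list of `region=` and `gencode=` entries joined with `&`, which the wrapper appends to the slicing URL as a query string. The sketch below shows one way to assemble that string from Python lists. The helper names `build_slices` and `slicing_url` are hypothetical; the URL pattern is the one used in the wrapper code.

```python
def build_slices(regions=(), genes=()):
    """Join `region=` and `gencode=` entries with `&`, as the wrapper expects."""
    parts = [f"region={region}" for region in regions]
    parts += [f"gencode={gene}" for gene in genes]
    return "&".join(parts)


def slicing_url(uuid, slices):
    """Build the request URL in the same shape the wrapper passes to curl."""
    return f"https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}"


slices = build_slices(
    regions=["chr22", "chr5:1000-2000", "unmapped"],
    genes=["BRCA2"],
)
print(slices)
print(slicing_url("092c8a6d-aad5-41bf-b186-e68e613c0e89", slices))
```

The resulting string can be passed directly as `slices=` in the rule's `params`.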

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Software dependencies

  • curl=8.7.1

Authors

  • David Lähnemann

Code

__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

uuid = snakemake.params.get("uuid", "")
if uuid == "":
    raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

token_file = snakemake.params.get("gdc_token", "")
if token_file == "":
    raise ValueError(
        "You need to provide a GDC data access token file via the 'gdc_token' in 'params'."
    )
with open(token_file) as tf:
    # strip the trailing newline so it does not end up in the HTTP header
    token = tf.read().strip()
# no quotes inside the value; the shell command quotes the variable instead
os.environ["CURL_HEADER_TOKEN"] = "X-Auth-Token: {}".format(token)

slices = snakemake.params.get("slices", "")
if slices == "":
    raise ValueError(
        "You need to provide 'region=chr1:1000-2000' or 'gencode=BRCA2' slice(s) via the 'slices' in 'params'."
    )

extra = snakemake.params.get("extra", "")

shell(
    "curl --silent"
    # double quotes keep the header as one argument despite the space in it
    ' --header "$CURL_HEADER_TOKEN"'
    " 'https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}'"
    " {extra}"
    " --output {snakemake.output.bam} {log}"
)

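# a very small output file may be an error message from the API instead of a BAM
if os.path.getsize(snakemake.output.bam) < 100000:
    # read as bytes: a small but valid BAM slice is binary and cannot be decoded as text
    with open(snakemake.output.bam, "rb") as f:
        if b"error" in f.read():
            shell("cat {snakemake.output.bam} {log}")
            raise RuntimeError(
                "Your GDC API request returned an error, check your log file for the error message."
            )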
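
The final check in the wrapper treats a small output file containing the word "error" as a failed request. As a rough standalone sketch of that heuristic, the function below applies the same size threshold and reads the file as bytes, so binary BAM content cannot cause a decoding failure. The name `looks_like_gdc_error` and the `max_size` parameter are hypothetical and not part of the wrapper.

```python
import os


def looks_like_gdc_error(path, max_size=100000):
    """Return True if a small download appears to hold an API error message.

    Mirrors the wrapper's heuristic: only files below `max_size` bytes are
    inspected, and they are flagged if they contain the bytes b"error".
    """
    if os.path.getsize(path) >= max_size:
        return False
    with open(path, "rb") as f:
        return b"error" in f.read()
```

This is only a heuristic: a genuine small BAM slice could in principle contain those bytes by chance, so the log file remains the place to confirm what the API returned.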