GDC API-BASED DATA DOWNLOAD OF BAM SLICES

Download slices of GDC BAM files using curl and the GDC API for BAM Slicing.

Example

This wrapper can be used in the following way:

rule gdc_api_bam_slice_download:
    output:
        bam="raw/{sample}.bam",
    log:
        "logs/gdc-api/bam-slicing/{sample}.log"
    params:
        # to use this rule flexibly, make uuid a function that maps your
        # sample names of choice to the UUIDs they correspond to (they are
        # the column `id` in the GDC manifest files, which can be used to
        # systematically construct sample sheets)
        uuid="092c8a6d-aad5-41bf-b186-e68e613c0e89",
        # a gdc_token is required for controlled access and all BAM files
        # on GDC seem to be controlled access (adjust if this changes)
        gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
        # provide wanted `region=` or `gencode=` slices joined with `&`
        slices="region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2",
        # extra command line arguments passed to curl
        extra=""
    wrapper:
        "v1.9.0/bio/gdc-api/bam-slicing"

Note that input, output and log file paths can be chosen freely.
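
The comment on `uuid` in the rule above suggests mapping sample names to UUIDs. A minimal sketch of such a lookup follows, assuming a tab-separated GDC manifest with an `id` column (as mentioned in that comment) and a `filename` column. The `filename` column, the use of the file name without `.bam` as the sample name, and the helper names `load_uuid_map` and `uuid_for_sample` are assumptions for illustration and not part of the wrapper.

```python
import csv
import io
from types import SimpleNamespace


def load_uuid_map(manifest_handle):
    """Map sample names to GDC UUIDs from a tab-separated manifest.

    Assumption: the sample name is the file name without its `.bam` suffix.
    """
    reader = csv.DictReader(manifest_handle, delimiter="\t")
    uuid_map = {}
    for row in reader:
        name = row["filename"]
        if name.endswith(".bam"):
            name = name[: -len(".bam")]
        uuid_map[name] = row["id"]
    return uuid_map


def uuid_for_sample(uuid_map):
    """Return a function of `wildcards` that could serve as `uuid=` in `params`."""

    def lookup(wildcards):
        return uuid_map[wildcards.sample]

    return lookup


# Hypothetical one-entry manifest; a real one would be opened from disk.
manifest = io.StringIO(
    "id\tfilename\n"
    "092c8a6d-aad5-41bf-b186-e68e613c0e89\tsample_a.bam\n"
)
lookup = uuid_for_sample(load_uuid_map(manifest))
# SimpleNamespace stands in for the wildcards object snakemake would pass.
resolved = lookup(SimpleNamespace(sample="sample_a"))
```

In a Snakefile, `uuid=uuid_for_sample(uuid_map)` could then take the place of the hard-coded UUID.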

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
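
The `slices` parameter in the example rule is a single string of `region=` and `gencode=` entries joined with `&`. As a small sketch, such a string could be assembled from lists in the Snakefile; the helper name `build_slices` is hypothetical and not provided by the wrapper.

```python
def build_slices(regions=(), genes=()):
    """Join region and gene requests into the query string the wrapper expects."""
    parts = ["region={}".format(region) for region in regions]
    parts += ["gencode={}".format(gene) for gene in genes]
    if not parts:
        raise ValueError("At least one region or gene is required.")
    return "&".join(parts)


# Reproduces the value used in the example rule.
slices = build_slices(
    regions=["chr22", "chr5:1000-2000", "unmapped"],
    genes=["BRCA2"],
)
```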

Notes

Software dependencies

  • curl==7.69.1

Authors

  • David Lähnemann

Code

__author__ = "David Lähnemann"
__copyright__ = "Copyright 2020, David Lähnemann"
__email__ = "david.laehnemann@uni-due.de"
__license__ = "MIT"

from snakemake.shell import shell
import os

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

uuid = snakemake.params.get("uuid", "")
if uuid == "":
    raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

token_file = snakemake.params.get("gdc_token", "")
if token_file == "":
    raise ValueError(
        "You need to provide a GDC data access token file via the 'gdc_token' in 'params'."
    )
# Read the token and strip surrounding whitespace (such as a trailing newline).
with open(token_file) as tf:
    token = tf.read().strip()
# The header is passed via the environment, which keeps the token itself out
# of the command string.
os.environ["CURL_HEADER_TOKEN"] = "X-Auth-Token: {}".format(token)

slices = snakemake.params.get("slices", "")
if slices == "":
    raise ValueError(
        "You need to provide 'region=chr1:1000-2000' or 'gencode=BRCA2' slice(s) via the 'slices' in 'params'."
    )

extra = snakemake.params.get("extra", "")

# The variable is double-quoted so the shell passes the header as one argument.
shell(
    "curl --silent"
    ' --header "$CURL_HEADER_TOKEN"'
    " 'https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}'"
    " {extra}"
    " --output {snakemake.output.bam} {log}"
)

# Only small files are inspected for an error message.
if os.path.getsize(snakemake.output.bam) < 100000:
    # Read as bytes: a valid (binary) BAM slice may not decode as text.
    with open(snakemake.output.bam, "rb") as f:
        if b"error" in f.read():
            shell("cat {snakemake.output.bam} {log}")
            raise RuntimeError(
                "Your GDC API request returned an error, check your log file for the error message."
            )
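
The check above relies on the file size and on the word "error" appearing in the output. As an alternative sketch (not what the wrapper does), one could test whether the download starts with the gzip magic bytes `1f 8b`, which begin every BGZF-compressed BAM file. The helper name `looks_like_bam` and the placeholder error body are hypothetical.

```python
import os
import tempfile

# BAM files are BGZF-compressed, and BGZF blocks start with the gzip magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_bam(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "ok.bam")
    err_path = os.path.join(tmp, "err.bam")
    with open(ok_path, "wb") as f:
        # Only the first two bytes matter for this check.
        f.write(GZIP_MAGIC + b"\x08\x04rest")
    with open(err_path, "wb") as f:
        # Placeholder text body, not an actual GDC response.
        f.write(b'{"error": "placeholder message"}')
    ok_result = looks_like_bam(ok_path)
    err_result = looks_like_bam(err_path)
```

This only confirms that the output is gzip-like; it does not validate the BAM content itself.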