.. _`bio/gdc-api/bam-slicing`:

GDC API-BASED DATA DOWNLOAD OF BAM SLICES
=========================================

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gdc-api/bam-slicing?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gdc-api/bam-slicing

Download slices of GDC BAM files using curl and the GDC API for BAM Slicing.

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule gdc_api_bam_slice_download:
        output:
            bam="raw/{sample}.bam",
        log:
            "logs/gdc-api/bam-slicing/{sample}.log"
        params:
            # to use this rule flexibly, make uuid a function that maps your
            # sample names of choice to the UUIDs they correspond to (they are
            # the column `id` in the GDC manifest files, which can be used to
            # systematically construct sample sheets)
            uuid="092c8a6d-aad5-41bf-b186-e68e613c0e89",
            # a gdc_token is required for controlled access and all BAM files
            # on GDC seem to be controlled access (adjust if this changes)
            gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
            # provide wanted `region=` or `gencode=` slices joined with `&`
            slices="region=chr22&region=chr5:1000-2000&region=unmapped&gencode=BRCA2",
            # extra command line arguments passed to curl
            extra=""
        wrapper:
            "v3.0.1/bio/gdc-api/bam-slicing"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

- BAM file UUIDs can be found via the GDC repository query, either by clicking on individual files or systematically by creating a cart and downloading a manifest file.
- Slicing can be performed using region syntax like ``region=chr20:3000-4000``, gene name syntax like ``gencode=BRCA2`` (this uses gene symbols of GENCODE v22) or ``region=unmapped`` to get unmapped reads.
  Multiple such entries can be joined with ampersands (e.g. ``region=chr5:200-300&region=unmapped&gencode=BRCA1``).
- All BAM data files in GDC are controlled access according to a GDC repository query, so a GDC access token file is always required and must be provided via ``params: gdc_token="path/to/access_token.txt"``. Should this change in the future, feel free to adjust this wrapper or contact the original author.

Software dependencies
---------------------

* ``curl=8.4.0``

Authors
-------

* David Lähnemann

Code
----

.. code-block:: python

    __author__ = "David Lähnemann"
    __copyright__ = "Copyright 2020, David Lähnemann"
    __email__ = "david.laehnemann@uni-due.de"
    __license__ = "MIT"

    from snakemake.shell import shell
    import os

    log = snakemake.log_fmt_shell(stdout=True, stderr=True)

    uuid = snakemake.params.get("uuid", "")
    if uuid == "":
        raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

    token_file = snakemake.params.get("gdc_token", "")
    if token_file == "":
        raise ValueError(
            "You need to provide a GDC data access token file via the 'gdc_token' in 'params'."
        )

    # read the access token and expose it to curl via the environment
    token = ""
    with open(token_file) as tf:
        token = tf.read()
    os.environ["CURL_HEADER_TOKEN"] = "'X-Auth-Token: {}'".format(token)

    slices = snakemake.params.get("slices", "")
    if slices == "":
        raise ValueError(
            "You need to provide 'region=chr1:1000-2000' or 'gencode=BRCA2' slice(s) via the 'slices' in 'params'."
        )

    extra = snakemake.params.get("extra", "")

    shell(
        "curl --silent"
        " --header $CURL_HEADER_TOKEN"
        " 'https://api.gdc.cancer.gov/slicing/view/{uuid}?{slices}'"
        " {extra}"
        " --output {snakemake.output.bam} {log}"
    )

    # small outputs may be an API error message instead of BAM data;
    # read as bytes, because a valid (compressed) BAM is not decodable text
    if os.path.getsize(snakemake.output.bam) < 100000:
        with open(snakemake.output.bam, "rb") as f:
            if b"error" in f.read():
                shell("cat {snakemake.output.bam} {log}")
                raise RuntimeError(
                    "Your GDC API request returned an error, check your log file for the error message."
                )
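
The comments in the example rule suggest turning ``uuid`` into a function that maps sample names to UUIDs, and the notes describe joining several slices with ampersands. The following is a minimal sketch of both ideas, not part of the wrapper itself. It assumes a hypothetical tab-separated sample sheet with a ``sample`` column and an ``id`` column (the latter copied from a GDC manifest); the helper names ``load_uuid_map``, ``make_uuid_lookup`` and ``join_slices`` are illustrative only.

.. code-block:: python

    import csv


    def load_uuid_map(sheet_file, sample_column="sample", uuid_column="id"):
        """Read a tab-separated sheet and map each sample name to its UUID.

        The column names are assumptions about a hypothetical sample sheet;
        only the `id` column name comes from the GDC manifest format.
        """
        reader = csv.DictReader(sheet_file, delimiter="\t")
        return {row[sample_column]: row[uuid_column] for row in reader}


    def make_uuid_lookup(uuid_map):
        """Return a function usable as a Snakemake `params` value.

        Snakemake calls such functions with the rule's wildcards object.
        """

        def lookup(wildcards):
            return uuid_map[wildcards.sample]

        return lookup


    def join_slices(regions=(), genes=()):
        """Build the `slices` string from regions and GENCODE gene symbols."""
        parts = [f"region={r}" for r in regions]
        parts += [f"gencode={g}" for g in genes]
        return "&".join(parts)

In a Snakefile, the sheet could be loaded once at the top (for example ``UUIDS = load_uuid_map(open("samples.tsv"))``, where ``samples.tsv`` is a placeholder path), with the rule then using ``uuid=make_uuid_lookup(UUIDS)`` and ``slices=join_slices(["chr22", "unmapped"], ["BRCA2"])`` under ``params``.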