.. _`bio/gdc-client/download`:

GDC DATA TRANSFER TOOL DATA DOWNLOAD
====================================

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/gdc-client/download?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/gdc-client/download

Download GDC data files with the gdc-client.

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule gdc_download:
        output:
            # the file extension (up to two components, here .maf.gz), has
            # to uniquely map to one of the files downloaded for that UUID
            "raw/{sample}.maf.gz"
        log:
            "logs/gdc-client/download/{sample}.log"
        params:
            # to use this rule flexibly, make uuid a function that maps your
            # sample names of choice to the UUIDs they correspond to (they are
            # the column `id` in the GDC manifest files, which can be used to
            # systematically construct sample sheets)
            uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
            # a gdc_token is only required for controlled access samples,
            # leave blank otherwise (`gdc_token=""`) or skip this param entirely
            gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
            # for valid extra command line arguments, check command line help or:
            # https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
            extra = ""
        threads: 4
        wrapper:
            "v3.0.1/bio/gdc-client/download"

    rule gdc_download_bam:
        output:
            # specify all the downloaded files you want to keep, as all other
            # downloaded files will be removed automatically e.g. for
            # BAM data this could be
            "raw/{sample}.bam",
            "raw/{sample}.bam.bai",
            "raw/{sample}.annotations.txt",
            directory("raw/{sample}/logs")
        log:
            "logs/gdc-client/download/{sample}.log"
        params:
            # to use this rule flexibly, make uuid a function that maps your
            # sample names of choice to the UUIDs they correspond to (they are
            # the column `id` in the GDC manifest files, which can be used to
            # systematically construct sample sheets)
            uuid="34b80c89-c41e-47be-84fb-0c0ea493b5bb",
            # a gdc_token is only required for controlled access samples,
            # leave blank otherwise (`gdc_token=""`) or skip this param entirely
            gdc_token="gdc/gdc-user-token.2020-05-07T10_00_00.555Z.txt",
            # for valid extra command line arguments, check command line help or:
            # https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/
            extra = ""
        threads: 4
        wrapper:
            "v3.0.1/bio/gdc-client/download"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
---------------------

* ``gdc-client=1.6.1``

Authors
-------

* David Lähnemann

Code
----

.. code-block:: python

    __author__ = "David Lähnemann"
    __copyright__ = "Copyright 2020, David Lähnemann"
    __email__ = "david.laehnemann@uni-due.de"
    __license__ = "MIT"

    from snakemake.shell import shell
    import os.path as path
    from tempfile import TemporaryDirectory
    import glob

    uuid = snakemake.params.get("uuid", "")
    if uuid == "":
        raise ValueError("You need to provide a GDC UUID via the 'uuid' in 'params'.")

    extra = snakemake.params.get("extra", "")
    token = snakemake.params.get("gdc_token", "")
    if token != "":
        token = "--token-file {}".format(token)

    with TemporaryDirectory() as tempdir:
        shell(
            "gdc-client download"
            " {token}"
            " {extra}"
            " -n {snakemake.threads} "
            " --log-file {snakemake.log} "
            " --dir {tempdir}"
            " {uuid}"
        )

        for out_path in snakemake.output:
            tmp_path = path.join(tempdir, uuid, path.basename(out_path))
            if not path.exists(tmp_path):
                (root, ext1) = path.splitext(out_path)
                paths = glob.glob(path.join(tempdir, uuid, "*" + ext1))
                if len(paths) > 1:
                    (root, ext2) = path.splitext(root)
                    paths = glob.glob(path.join(tempdir, uuid, "*" + ext2 + ext1))
                if len(paths) == 0:
                    raise ValueError(
                        "{} file extension {} does not match any downloaded file.\n"
                        "Are you sure that UUID {} provides a file of such format?\n".format(
                            out_path, ext1, uuid
                        )
                    )
                if len(paths) > 1:
                    raise ValueError(
                        "Found more than one downloaded file with extension '{}':\n"
                        "{}\n"
                        "Cannot match requested output file {} unambiguously.\n".format(
                            ext2 + ext1, paths, out_path
                        )
                    )
                tmp_path = paths[0]
            shell("mv {tmp_path} {out_path}")
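
The comments in the example rules suggest making ``uuid`` a function that maps sample names to GDC UUIDs. The following is a minimal sketch of one way to do that, not part of the wrapper itself. It assumes a tab-separated sample sheet with a ``sample`` column that you add yourself next to the ``id`` column taken from a GDC manifest. The helper names ``load_uuid_map`` and ``make_uuid_lookup``, the sample names, and the second UUID are hypothetical and only serve as illustration.

```python
import csv
import io
from types import SimpleNamespace


def load_uuid_map(handle, sample_col="sample", uuid_col="id"):
    """Read a tab-separated sample sheet and return {sample name: UUID}.

    Assumed layout: one header row, a hypothetical `sample` column added
    by the user, and the `id` column as found in GDC manifest files.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    uuid_map = {}
    for row in reader:
        sample = row[sample_col]
        if sample in uuid_map:
            raise ValueError(f"Duplicate sample name in sample sheet: {sample}")
        uuid_map[sample] = row[uuid_col]
    return uuid_map


def make_uuid_lookup(uuid_map):
    """Return a function that can be passed as `uuid=` in a rule's params."""

    def uuid_for_sample(wildcards):
        # Snakemake calls params functions with the rule's wildcards object
        return uuid_map[wildcards.sample]

    return uuid_for_sample


# Stand-in for a real sample sheet file; the second UUID is made up
EXAMPLE_SHEET = (
    "sample\tid\n"
    "tumor_A\t34b80c89-c41e-47be-84fb-0c0ea493b5bb\n"
    "tumor_B\t00000000-0000-0000-0000-000000000000\n"
)

uuid_map = load_uuid_map(io.StringIO(EXAMPLE_SHEET))
get_uuid = make_uuid_lookup(uuid_map)

# Outside Snakemake, a SimpleNamespace can stand in for the wildcards object
print(get_uuid(SimpleNamespace(sample="tumor_A")))
```

In a Snakefile, the lookup would then replace the hard-coded string, for example ``uuid=get_uuid`` in the ``params`` section of ``gdc_download``, with the sample sheet opened from a path of your choice instead of the in-memory example.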