.. _`bio/arriba`:

ARRIBA
======

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/arriba?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/arriba

Detect gene fusions from chimeric STAR output

**URL**: https://github.com/suhrig/arriba

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule arriba:
        input:
            # STAR bam containing chimeric alignments
            bam="{sample}.bam",
            # path to reference genome
            genome="genome.fasta",
            # path to annotation gtf
            annotation="annotation.gtf",
            # optional arriba blacklist file
            custom_blacklist=[],
        output:
            # approved gene fusions
            fusions="fusions/{sample}.tsv",
            # discarded gene fusions
            discarded="fusions/{sample}.discarded.tsv",  # optional
        log:
            "logs/arriba/{sample}.log",
        params:
            # required when blacklist or known_fusions is set
            genome_build="GRCh38",
            # strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
            # only set blacklist input-file or blacklist-param
            default_blacklist=False,  # optional
            default_known_fusions=True,  # optional
            # file containing information from structural variant analysis
            sv_file="",  # optional
            # optional parameters
            extra="-i 1,2",
        threads: 1
        wrapper:
            "v3.0.1/bio/arriba"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

This tool/wrapper does not handle multi threading.

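
The example rule asks for only one of the ``custom_blacklist`` input and the ``default_blacklist`` parameter to be set. The following is a minimal sketch of that decision as a standalone function, modelled on the wrapper code in the Code section. The function name ``blacklist_args``, its arguments, and the default values for ``database_dir`` and ``arriba_version`` are hypothetical and chosen for illustration only; they are not part of the wrapper's interface.

.. code-block:: python

    import os


    def blacklist_args(
        custom_blacklist,
        default_blacklist,
        genome_build=None,
        database_dir="var/lib/arriba",  # assumed location, for illustration
        arriba_version="2.4.0",  # assumed version, for illustration
    ):
        """Return the blacklist-related arriba arguments for one combination of settings."""
        # Both sources given: ambiguous, so refuse to guess.
        if custom_blacklist and default_blacklist:
            raise ValueError("Set either custom_blacklist or default_blacklist, not both.")
        # A user-supplied blacklist file is passed through unchanged.
        if custom_blacklist:
            return "-b " + custom_blacklist
        # The bundled blacklist is selected by genome build.
        if default_blacklist:
            if not genome_build:
                raise ValueError("genome_build is required when default_blacklist is set.")
            names = {
                "GRCh37": f"blacklist_hg19_hs37d5_GRCh37_v{arriba_version}.tsv.gz",
                "GRCh38": f"blacklist_hg38_GRCh38_v{arriba_version}.tsv.gz",
                "GRCm38": f"blacklist_mm10_GRCm38_v{arriba_version}.tsv.gz",
                "GRCm39": f"blacklist_mm39_GRCm39_v{arriba_version}.tsv.gz",
            }
            return "-b " + os.path.join(database_dir, names[genome_build])
        # Neither is set: the wrapper passes "-f blacklist" in this case.
        return "-f blacklist"


    print(blacklist_args("my_blacklist.tsv", False))
    print(blacklist_args([], False))

With the example rule's settings (``custom_blacklist=[]`` and ``default_blacklist=False``), this sketch takes the last branch, so no blacklist file is passed to Arriba.
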
Software dependencies
---------------------

* ``arriba=2.4.0``

Input/Output
------------

**Input:**

* ``bam``: Path to BAM-formatted alignment file from STAR
* ``genome``: Path to FASTA-formatted genome sequence
* ``annotation``: Path to GTF-formatted genome annotation
* ``custom_blacklist``: Optional path to a custom blacklist file. Do not combine it with ``default_blacklist=True``.

**Output:**

* ``fusions``: Path to output fusion file
* ``discarded``: Optional path to the file of discarded fusions

Params
------

* ``genome_build``: Genome build (``GRCh37``, ``GRCh38``, ``GRCm38`` or ``GRCm39``). Required when ``default_blacklist`` or ``default_known_fusions`` is set.
* ``default_known_fusions``: If true, use the known fusions file shipped with the Arriba package for the given genome build. See the official Arriba documentation on known fusions for more information.
* ``default_blacklist``: If true, use the blacklist file shipped with the Arriba package for the given genome build. See the `official documentation on blacklist <https://arriba.readthedocs.io/en/latest/input-files/#blacklist>`_ for more information.
* ``sv_file``: Path to structural variant calls from WGS. See the official Arriba documentation on structural variant calls for more information.
* ``extra``: Other optional parameters, passed to Arriba unchanged.

Authors
-------

* Jan Forster
* Felix Mölder

Code
----

.. code-block:: python

    __author__ = "Jan Forster"
    __copyright__ = "Copyright 2019, Jan Forster"
    __email__ = "j.forster@dkfz.de"
    __license__ = "MIT"


    import os
    import json
    from snakemake.shell import shell

    extra = snakemake.params.get("extra", "")
    log = snakemake.log_fmt_shell(stdout=True, stderr=True)

    # Optional output file for discarded fusions
    discarded_fusions = snakemake.output.get("discarded", "")
    if discarded_fusions:
        discarded_cmd = "-O " + discarded_fusions
    else:
        discarded_cmd = ""

    # Database files are looked up inside the active conda environment
    database_dir = os.path.join(os.environ["CONDA_PREFIX"], "var/lib/arriba")
    build = snakemake.params.get("genome_build", None)

    blacklist_input = snakemake.input.get("custom_blacklist")
    default_blacklist = snakemake.params.get("default_blacklist", False)
    default_known_fusions = snakemake.params.get("default_known_fusions", False)

    if default_blacklist or default_known_fusions:
        if not build:
            raise ValueError(
                "Please provide a genome build when using blacklist- or known_fusion-filtering"
            )
        # The database file names contain the installed arriba version
        arriba_vers = [
            entry["version"]
            for entry in json.load(os.popen("conda list --json"))
            if entry["name"] == "arriba"
        ][0]

    if blacklist_input and not default_blacklist:
        # Custom blacklist file given as input
        blacklist_cmd = "-b " + blacklist_input
    elif not blacklist_input and default_blacklist:
        # Default blacklist file for the given genome build
        blacklist_dict = {
            "GRCh37": f"blacklist_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
            "GRCh38": f"blacklist_hg38_GRCh38_v{arriba_vers}.tsv.gz",
            "GRCm38": f"blacklist_mm10_GRCm38_v{arriba_vers}.tsv.gz",
            "GRCm39": f"blacklist_mm39_GRCm39_v{arriba_vers}.tsv.gz",
        }
        blacklist_path = os.path.join(database_dir, blacklist_dict[build])
        blacklist_cmd = "-b " + blacklist_path
    elif not blacklist_input and not default_blacklist:
        # Neither a custom nor a default blacklist is set
        blacklist_cmd = "-f blacklist"
    else:
        raise ValueError(
            "custom_blacklist input file and default_blacklist parameter option defined. Please set only one of both."
        )

    if default_known_fusions:
        fusions_dict = {
            "GRCh37": f"known_fusions_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
            "GRCh38": f"known_fusions_hg38_GRCh38_v{arriba_vers}.tsv.gz",
            "GRCm38": f"known_fusions_mm10_GRCm38_v{arriba_vers}.tsv.gz",
            "GRCm39": f"known_fusions_mm39_GRCm39_v{arriba_vers}.tsv.gz",
        }
        known_fusions_path = os.path.join(database_dir, fusions_dict[build])
        known_cmd = "-k " + known_fusions_path
    else:
        known_cmd = ""

    # Optional structural variant calls
    sv_file = snakemake.params.get("sv_file")
    if sv_file:
        sv_cmd = "-d " + sv_file
    else:
        sv_cmd = ""

    shell(
        "arriba "
        "-x {snakemake.input.bam} "
        "-a {snakemake.input.genome} "
        "-g {snakemake.input.annotation} "
        "{blacklist_cmd} "
        "{known_cmd} "
        "{sv_cmd} "
        "-o {snakemake.output.fusions} "
        "{discarded_cmd} "
        "{extra} "
        "{log}"
    )


.. |nl| raw:: html