ARRIBA


Detect gene fusions from chimeric STAR output

URL: https://github.com/suhrig/arriba

Example

This wrapper can be used in the following way:

rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
        # optional arriba blacklist file
        custom_blacklist=[],
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv",  # optional
    log:
        "logs/arriba/{sample}.log",
    params:
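        # required when default_blacklist or default_known_fusions is True
        genome_build="GRCh38",
        # using a blacklist is strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
        # set either the custom_blacklist input file or default_blacklist=True, not both
        # if neither is set, the wrapper disables arriba's blacklist filter
        default_blacklist=False,  # optional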
        default_known_fusions=True,  # optional
        # file containing information from structural variant analysis
        sv_file="",  # optional
        # optional parameters
        extra="-i 1,2",
    threads: 1
    wrapper:
        "v3.6.0-3-gc8272d7/bio/arriba"

Note that input, output and log file paths can be chosen freely.
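
The two blacklist settings in the rule above are mutually exclusive, and leaving both unset turns the blacklist filter off. As a minimal sketch of how the combinations play out, the helper below mirrors the branching of the wrapper script listed under Code. The name blacklist_mode and its return labels are hypothetical and exist only for illustration; they are not part of the wrapper.

```python
def blacklist_mode(custom_blacklist, default_blacklist):
    """Classify a combination of the two blacklist settings.

    Hypothetical helper for illustration only; it follows the same
    branching as the wrapper script but is not part of it.
    """
    if custom_blacklist and default_blacklist:
        raise ValueError(
            "Set either custom_blacklist or default_blacklist, not both."
        )
    if custom_blacklist:
        return "custom"    # wrapper passes: -b <custom_blacklist>
    if default_blacklist:
        return "default"   # wrapper passes: -b <blacklist from the conda environment>
    return "disabled"      # wrapper passes: -f blacklist


print(blacklist_mode([], False))                  # disabled
print(blacklist_mode("my_blacklist.tsv", False))  # custom
print(blacklist_mode([], True))                   # default
```

With the settings shown in the example rule (custom_blacklist=[] and default_blacklist=False), this corresponds to the "disabled" case.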

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

This tool/wrapper does not support multithreading.

Software dependencies

  • arriba=2.4.0
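
The wrapper script looks up the installed arriba version at run time from the output of conda list --json and uses it to build the names of the bundled database files. The following is a rough sketch of that lookup, assuming the JSON is a list of objects with "name" and "version" keys (the only two fields the wrapper reads). The function name package_version and the sample string are hypothetical; the sample is hand-written, not real conda output.

```python
import json


def package_version(conda_list_json, name):
    """Return the version of package `name` from `conda list --json` text."""
    for entry in json.loads(conda_list_json):
        if entry["name"] == name:
            return entry["version"]
    # The wrapper itself would fail with an IndexError here; an explicit
    # error is an assumption made for this sketch.
    raise LookupError(f"package {name!r} not found in the environment")


# Hand-written sample with only the two fields used above.
sample = (
    '[{"name": "some-other-package", "version": "1.0"},'
    ' {"name": "arriba", "version": "2.4.0"}]'
)
print(package_version(sample, "arriba"))  # 2.4.0
```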

Input/Output

Input:

  • bam: Path to bam formatted alignment file from STAR

  • genome: Path to fasta formatted genome sequence

  • annotation: Path to GTF formatted genome annotation

  • custom_blacklist: Path to a custom arriba blacklist file (optional)

Output:

  • fusions: Path to output fusion file

  • discarded: Path to output file for discarded fusions (optional)

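Params

  • genome_build: Genome build (GRCh37, GRCh38, GRCm38 or GRCm39); required when default_blacklist or default_known_fusions is enabled

  • default_blacklist: Use the blacklist bundled with the arriba conda package (optional, defaults to False)

  • default_known_fusions: Use the known fusions file bundled with the arriba conda package (optional, defaults to False)

  • sv_file: Path to a file containing information from structural variant analysis (optional)

  • extra: Additional command line arguments passed to arriba (optional)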

Authors

  • Jan Forster

  • Felix Mölder

Code

__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
import json
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""

# Database files (blacklists, known fusions) installed with the arriba conda package
database_dir = os.path.join(os.environ["CONDA_PREFIX"], "var/lib/arriba")
build = snakemake.params.get("genome_build", None)

blacklist_input = snakemake.input.get("custom_blacklist")
default_blacklist = snakemake.params.get("default_blacklist", False)

default_known_fusions = snakemake.params.get("default_known_fusions", False)

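# The bundled database files are named after genome build and arriba version,
# so both must be known before a default file can be selected.
if default_blacklist or default_known_fusions:
    if not build:
        raise ValueError(
            "Please provide genome_build when default_blacklist or default_known_fusions is enabled."
        )
    # Look up the installed arriba version in the active conda environment
    arriba_vers = [
        entry["version"]
        for entry in json.load(os.popen("conda list --json"))
        if entry["name"] == "arriba"
    ][0]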


if blacklist_input and not default_blacklist:
    blacklist_cmd = "-b " + blacklist_input
elif not blacklist_input and default_blacklist:
    blacklist_dict = {
        "GRCh37": f"blacklist_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"blacklist_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"blacklist_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"blacklist_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    blacklist_path = os.path.join(database_dir, blacklist_dict[build])
    blacklist_cmd = "-b " + blacklist_path
elif not blacklist_input and not default_blacklist:
    # No blacklist given: disable arriba's blacklist filter
    blacklist_cmd = "-f blacklist"
else:
    raise ValueError(
        "Both the custom_blacklist input file and the default_blacklist parameter are set. Please set only one of them."
    )

if default_known_fusions:
    fusions_dict = {
        "GRCh37": f"known_fusions_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"known_fusions_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"known_fusions_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"known_fusions_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    known_fusions_path = os.path.join(database_dir, fusions_dict[build])
    known_cmd = "-k " + known_fusions_path
else:
    known_cmd = ""

sv_file = snakemake.params.get("sv_file")
if sv_file:
    sv_cmd = "-d " + sv_file
else:
    sv_cmd = ""

shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)
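
The script above selects the bundled blacklist and known fusions files through two dictionaries that differ only in their file name prefix, and an unsupported genome_build surfaces as a bare KeyError. As a hedged sketch of one way to express the same lookup, the function below composes the file name from a shared table and reports unsupported builds explicitly. The names BUILD_TAGS and database_file are hypothetical and not part of the wrapper, and the directory in the usage line is a placeholder.

```python
import os

# Build-specific name fragments, taken from the dictionaries in the wrapper script.
BUILD_TAGS = {
    "GRCh37": "hg19_hs37d5_GRCh37",
    "GRCh38": "hg38_GRCh38",
    "GRCm38": "mm10_GRCm38",
    "GRCm39": "mm39_GRCm39",
}


def database_file(kind, build, version, database_dir):
    """Return the path of a bundled database file.

    kind is "blacklist" or "known_fusions"; version is the arriba version
    string, for example "2.4.0". Hypothetical helper for illustration.
    """
    if kind not in ("blacklist", "known_fusions"):
        raise ValueError(f"Unknown database kind {kind!r}")
    if build not in BUILD_TAGS:
        raise ValueError(
            f"Unsupported genome build {build!r}; expected one of {sorted(BUILD_TAGS)}"
        )
    file_name = f"{kind}_{BUILD_TAGS[build]}_v{version}.tsv.gz"
    return os.path.join(database_dir, file_name)


# Placeholder directory; the wrapper uses $CONDA_PREFIX/var/lib/arriba.
print(database_file("blacklist", "GRCh38", "2.4.0", "var/lib/arriba"))
```

The explicit ValueError is a design choice of this sketch: it names the supported builds in the message, which the dictionary lookup in the script does not.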