ARRIBA

Detect gene fusions from chimeric STAR output

URL: https://github.com/suhrig/arriba

Example

This wrapper can be used in the following way:

rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
        # optional arriba blacklist file
        custom_blacklist="minimal_blacklist.tsv.gz",
        # optional file with known fusions
        custom_known_fusions="minimal_known_fusions.tsv.gz",
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv",  # optional
    log:
        "logs/arriba/{sample}.log",
    params:
        # strongly recommended, see https://github.com/suhrig/arriba/wiki/04-Input-files#blacklist
        # set either the custom_blacklist input file or default_blacklist=True, not both
        default_blacklist=False,  # optional
        default_known_fusions=False,  # optional
        # optional parameters
        extra="-i 1,2",
    threads: 1
    wrapper:
        "v9.0.1/bio/arriba"


rule arriba_with_sv:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fa.gz",
        # path to annotation gtf
        annotation="annotation.gtf",
        # optional arriba blacklist file
        custom_blacklist="blacklist.tsv",
        # optional file with known structural variants
        sv_file="sv_list_from_wgs.vcf",
    output:
        # approved gene fusions
        fusions="fusions/{sample}.with_sv.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.with_sv.discarded.tsv",  # optional
    log:
        "logs/arriba/{sample}.with_sv.log",
    params:
        # required when default_blacklist or default_known_fusions is set to True
        genome_build="GRCh38",
        # strongly recommended, see https://github.com/suhrig/arriba/wiki/04-Input-files#blacklist
        # set either the custom_blacklist input file or default_blacklist=True, not both
        default_blacklist=False,  # optional
        default_known_fusions=True,  # optional
    threads: 1
    wrapper:
        "v9.0.1/bio/arriba"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

This wrapper does not support multithreading, as the arriba documentation indicates that no significant speedup is expected from it. See also the description of the -@ THREADS option in the command-line arguments documentation.

Software dependencies

  • arriba=2.5.1

Input/Output

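Input:

  • bam: STAR BAM file containing chimeric alignments.

  • genome: Path to the reference genome.

  • annotation: Path to the annotation GTF file.

  • custom_blacklist: (optional) Path to an arriba blacklist file.

  • custom_known_fusions: (optional) Path to a file with known fusions.

  • sv_file: (optional) Path to a file with known structural variants.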

Output:

  • fusions: Path to output file for fusions after filtering.

  • discarded: (optional) Path to output file for fusions that were filtered out by arriba.

Params

  • genome_build: Required if either default_blacklist or default_known_fusions is set to True. The wrapper code recognizes GRCh37, GRCh38, GRCm38 and GRCm39.

  • default_blacklist: Set to True to use default blacklist. Must be False (or omitted) if a custom_blacklist file is specified under input:. See the documentation on blacklist.

  • default_known_fusions: Set to True to use default known fusions. Must be False (or omitted) if a custom_known_fusions file is specified under input:. See the documentation on known fusions.

  • extra: Additional command-line arguments that are passed to arriba unchanged (for example, -i 1,2).
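The interplay between the custom input files, the default_* parameters and genome_build can be sketched in plain Python. This is a minimal illustration of the rules described above, not the wrapper's actual code: resolve_filter, BUILD_LABELS and FLAGS are hypothetical names, the file name pattern is taken from the wrapper source shown in the Code section, and unlike the wrapper this sketch raises a ValueError for an unsupported genome build.

```python
import os

# Assembly labels as they appear in the database file names used by the wrapper.
BUILD_LABELS = {
    "GRCh37": "hg19_hs37d5_GRCh37",
    "GRCh38": "hg38_GRCh38",
    "GRCm38": "mm10_GRCm38",
    "GRCm39": "mm39_GRCm39",
}
# arriba options that take the blacklist and known-fusions files.
FLAGS = {"blacklist": "-b", "known_fusions": "-k"}


def resolve_filter(kind, custom_file=None, use_default=False,
                   build=None, version=None, database_dir=""):
    """Return the arriba option for one filter (hypothetical helper)."""
    if custom_file and use_default:
        raise ValueError(f"Set either custom_{kind} or default_{kind}, not both.")
    if custom_file:
        # A custom file was given under input:.
        return f"{FLAGS[kind]} {custom_file}"
    if use_default:
        # The default file name depends on genome build and arriba version.
        if not build:
            raise ValueError("genome_build is required when using default files.")
        if build not in BUILD_LABELS:
            raise ValueError(f"Unsupported genome_build: {build}")
        name = f"{kind}_{BUILD_LABELS[build]}_v{version}.tsv.gz"
        return f"{FLAGS[kind]} {os.path.join(database_dir, name)}"
    # Neither a custom file nor the default: disable this filter.
    return f"-f {kind}"


print(resolve_filter("blacklist", custom_file="minimal_blacklist.tsv.gz"))
print(resolve_filter("known_fusions"))
```

With neither a custom file nor the default requested, the sketch returns -f blacklist or -f known_fusions, which is how the wrapper switches off the corresponding arriba filter.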

Authors

  • Jan Forster

  • Felix Mölder

  • David Lähnemann

Code

__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
import subprocess as sp
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""

database_dir = os.path.join(os.environ["CONDA_PREFIX"], "var/lib/arriba")
build = snakemake.params.get("genome_build", None)

blacklist_input = snakemake.input.get("custom_blacklist")
default_blacklist = snakemake.params.get("default_blacklist", False)

known_fusions_input = snakemake.input.get("custom_known_fusions")
default_known_fusions = snakemake.params.get("default_known_fusions", False)

if default_blacklist or default_known_fusions:
    if not build:
        raise ValueError(
            "Please provide a genome build when using blacklist- or known_fusion-filtering"
        )
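    # The default database file names contain the arriba version,
    # so extract it from the help output of the installed arriba.
    command = "arriba -h | grep -e 'Arriba ' -e '^Version: ' | grep -om1 '[0-9.]\\+$'"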
    arriba_vers = sp.run(
        command, shell=True, capture_output=True, text=True
    ).stdout.strip()

if blacklist_input and not default_blacklist:
    blacklist_cmd = "-b " + blacklist_input
elif not blacklist_input and default_blacklist:
    blacklist_dict = {
        "GRCh37": f"blacklist_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"blacklist_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"blacklist_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"blacklist_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    blacklist_path = os.path.join(database_dir, blacklist_dict[build])
    blacklist_cmd = "-b " + blacklist_path
elif not blacklist_input and not default_blacklist:
    # No blacklist requested: disable arriba's blacklist filter.
    blacklist_cmd = "-f blacklist"
else:
    raise ValueError(
        "A custom_blacklist input file is given and the default_blacklist parameter is set to 'True'. Please set only one of both."
    )

if known_fusions_input and not default_known_fusions:
    known_cmd = "-k " + known_fusions_input
elif not known_fusions_input and default_known_fusions:
    fusions_dict = {
        "GRCh37": f"known_fusions_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"known_fusions_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"known_fusions_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"known_fusions_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    known_fusions_path = os.path.join(database_dir, fusions_dict[build])
    known_cmd = "-k " + known_fusions_path
elif not known_fusions_input and not default_known_fusions:
    known_cmd = "-f known_fusions"
else:
    raise ValueError(
        "A custom_known_fusions input file is given and the default_known_fusions parameter is set to 'True'. Please set only one of both."
    )

sv_file_input = snakemake.input.get("sv_file")
if sv_file_input:
    sv_cmd = "-d " + sv_file_input
else:
    sv_cmd = ""

shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)
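The wrapper assembles the command by string concatenation, so paths containing spaces or shell metacharacters are passed to the shell unquoted. As a hedged alternative, the following sketch builds the same call as an argument list and quotes it with shlex. The function arriba_argv and its parameters are hypothetical and not part of the wrapper; only the arriba options that already appear in the code above (-x, -a, -g, -b, -k, -f, -d, -o, -O) are used.

```python
import shlex


def arriba_argv(bam, genome, annotation, fusions, discarded=None,
                sv_file=None, filter_opts=(), extra=""):
    """Assemble an arriba call as an argument list (hypothetical helper)."""
    argv = ["arriba", "-x", bam, "-a", genome, "-g", annotation]
    # filter_opts holds (flag, value) pairs such as ("-b", "blacklist.tsv").
    for flag, value in filter_opts:
        argv += [flag, value]
    if sv_file:
        argv += ["-d", sv_file]
    argv += ["-o", fusions]
    if discarded:
        argv += ["-O", discarded]
    # Split user-supplied extra arguments the way a POSIX shell would.
    argv += shlex.split(extra)
    return argv


argv = arriba_argv(
    "s1.bam", "genome.fasta", "annotation.gtf", "fusions/s1.tsv",
    discarded="fusions/s1.discarded.tsv",
    filter_opts=[("-f", "blacklist"), ("-f", "known_fusions")],
    extra="-i 1,2",
)
# shlex.join quotes each argument where necessary.
print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run directly, or joined with shlex.join when a single shell string is required, for example to append the log redirection.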