ARRIBA
Detect gene fusions from chimeric STAR output
URL: https://github.com/suhrig/arriba
Example
This wrapper can be used in the following way:
rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
        # optional arriba blacklist file
        custom_blacklist=[],
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv",  # optional
    log:
        "logs/arriba/{sample}.log",
    params:
        # required when default_blacklist or default_known_fusions is set
        genome_build="GRCh38",
        # strongly recommended, see https://arriba.readthedocs.io/en/latest/input-files/#blacklist
        # set either the custom_blacklist input file or default_blacklist, not both
        default_blacklist=False,  # optional
        default_known_fusions=True,  # optional
        # file containing information from structural variant analysis
        sv_file="",  # optional
        # optional parameters
        extra="-i 1,2",
    threads: 1
    wrapper:
        "v5.8.0-3-g915ba34/bio/arriba"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
This tool/wrapper does not support multithreading.
Software dependencies
arriba=2.4.0
Input/Output
Input:
bam
: Path to bam formatted alignment file from STAR
genome
: Path to fasta formatted genome sequence
annotation
: Path to GTF formatted genome annotation
Output:
fusions
: Path to output fusion file
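As a rough illustration of consuming the fusions output downstream, the sketch below reads the tab-separated file and keeps calls at or above a chosen confidence level. It assumes the header line starts with `#gene1` and that a `confidence` column with the values `low`, `medium` or `high` is present, as described in Arriba's output-file documentation; the function name `load_fusions` is hypothetical and not part of this wrapper.

```python
import csv


def load_fusions(path, min_confidence="high"):
    """Return fusion rows whose confidence is at least min_confidence.

    Assumption: the file follows Arriba's TSV layout, i.e. a header line
    beginning with '#gene1' and a 'confidence' column.
    """
    rank = {"low": 0, "medium": 1, "high": 2}
    kept = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for raw in reader:
            # Drop the leading '#' that Arriba puts on the first column name.
            row = {
                key.lstrip("#"): value
                for key, value in raw.items()
                if key is not None
            }
            # Rows with a missing or unknown confidence value are skipped.
            if rank.get(row.get("confidence"), -1) >= rank[min_confidence]:
                kept.append(row)
    return kept
```

For the rule shown in the example, this could be called as `load_fusions("fusions/sampleA.tsv")`, where `sampleA` stands in for an actual sample name.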
Params
known_fusions
: Path to known fusions file, see official documentation on known fusions for more information.
blacklist
: Path to blacklist file, see official documentation on blacklist for more information.
sv_file
: Path to structural variant calls from WGS, see official documentation on SV for more information.
extra
: Other optional parameters
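The wrapper code combines the `custom_blacklist` input with the `default_blacklist` parameter in four cases. The sketch below restates that decision as a standalone function so the cases are easier to see. The helper name `resolve_blacklist_args` and the `BUILD_TOKENS` table are hypothetical; the file-name pattern is copied from the wrapper code. One deliberate difference: an unsupported genome build raises a `ValueError` here, whereas the wrapper would fail with a `KeyError`.

```python
import os

# Genome-build tokens as they appear in the database file names used by
# the wrapper code (the table itself is an illustrative construct).
BUILD_TOKENS = {
    "GRCh37": "hg19_hs37d5_GRCh37",
    "GRCh38": "hg38_GRCh38",
    "GRCm38": "mm10_GRCm38",
    "GRCm39": "mm39_GRCm39",
}


def resolve_blacklist_args(custom_blacklist, default_blacklist, build,
                           version, database_dir):
    """Return the Arriba command-line fragment for blacklist handling."""
    if custom_blacklist and default_blacklist:
        # Both sources given: ambiguous, so refuse.
        raise ValueError("Set either custom_blacklist or default_blacklist, not both.")
    if custom_blacklist:
        # User-supplied blacklist file.
        return "-b " + custom_blacklist
    if default_blacklist:
        # Blacklist shipped with the Arriba installation.
        if build not in BUILD_TOKENS:
            raise ValueError(f"Unsupported genome build: {build!r}")
        filename = f"blacklist_{BUILD_TOKENS[build]}_v{version}.tsv.gz"
        return "-b " + os.path.join(database_dir, filename)
    # No blacklist at all: disable the blacklist filter.
    return "-f blacklist"
```

With the example rule's settings (`custom_blacklist=[]`, `default_blacklist=False`), the function falls through to the last case and returns `-f blacklist`.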
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
import subprocess as sp
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""
database_dir = os.path.join(os.environ["CONDA_PREFIX"], "var/lib/arriba")
build = snakemake.params.get("genome_build", None)
blacklist_input = snakemake.input.get("custom_blacklist")
default_blacklist = snakemake.params.get("default_blacklist", False)
default_known_fusions = snakemake.params.get("default_known_fusions", False)
if default_blacklist or default_known_fusions:
    if not build:
        raise ValueError(
            "Please provide a genome build when using blacklist- or known_fusion-filtering"
        )
# Query the installed Arriba version; the database file names embed it.
command = "arriba -h | grep -e 'Arriba ' -e '^Version: ' | grep -om1 '[0-9.]\\+$'"
arriba_vers = sp.run(
    command, shell=True, capture_output=True, text=True
).stdout.strip()
if blacklist_input and not default_blacklist:
    # Custom blacklist file supplied as input
    blacklist_cmd = "-b " + blacklist_input
elif not blacklist_input and default_blacklist:
    # Blacklist shipped with the Arriba installation
    blacklist_dict = {
        "GRCh37": f"blacklist_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"blacklist_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"blacklist_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"blacklist_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    blacklist_path = os.path.join(database_dir, blacklist_dict[build])
    blacklist_cmd = "-b " + blacklist_path
elif not blacklist_input and not default_blacklist:
    # No blacklist: disable the blacklist filter
    blacklist_cmd = "-f blacklist"
else:
    raise ValueError(
        "Both the custom_blacklist input file and the default_blacklist parameter are set. Please set only one of them."
    )
if default_known_fusions:
    fusions_dict = {
        "GRCh37": f"known_fusions_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"known_fusions_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"known_fusions_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"known_fusions_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    known_fusions_path = os.path.join(database_dir, fusions_dict[build])
    known_cmd = "-k " + known_fusions_path
else:
    known_cmd = ""
sv_file = snakemake.params.get("sv_file")
if sv_file:
    sv_cmd = "-d " + sv_file
else:
    sv_cmd = ""
shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)
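The version lookup in the code above relies on a shell pipeline of two `grep` calls. As a minimal sketch of the same idea in pure Python, the function below scans help text for the first line that mentions `Arriba ` or starts with `Version: ` and ends in a run of digits and dots. The function name `parse_arriba_version` is hypothetical, and the sample help text in the usage note is an assumed shape, not verified output of `arriba -h`.

```python
import re


def parse_arriba_version(help_text):
    """Extract a trailing version string such as '2.4.0' from help text.

    Mirrors the wrapper's pipeline: keep lines containing 'Arriba ' or
    starting with 'Version: ', then take the first trailing run of
    digits and dots. Returns '' when nothing matches.
    """
    for line in help_text.splitlines():
        if "Arriba " in line or line.startswith("Version: "):
            match = re.search(r"[0-9.]+$", line)
            if match:
                return match.group(0)
    return ""
```

Under the assumed help text `"Arriba gene fusion detector\nVersion: 2.4.0\n"`, the first line is skipped because it does not end in digits, and the second line yields the version. In the wrapper this text would come from something like `sp.run(["arriba", "-h"], capture_output=True, text=True).stdout`.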