ARRIBA
Detect gene fusions from chimeric STAR output
URL: https://github.com/suhrig/arriba
Example
This wrapper can be used in the following way:
rule arriba:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fasta",
        # path to annotation gtf
        annotation="annotation.gtf",
        # optional arriba blacklist file
        custom_blacklist="minimal_blacklist.tsv.gz",
        # optional file with known fusions
        custom_known_fusions="minimal_known_fusions.tsv.gz",
    output:
        # approved gene fusions
        fusions="fusions/{sample}.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.discarded.tsv",  # optional
    log:
        "logs/arriba/{sample}.log",
    params:
        # a blacklist is strongly recommended, see https://github.com/suhrig/arriba/wiki/04-Input-files#blacklist
        # set either the custom_blacklist input file or default_blacklist=True, not both
        default_blacklist=False,  # optional
        default_known_fusions=False,  # optional
        # optional parameters
        extra="-i 1,2",
    threads: 1
    wrapper:
        "v9.0.1/bio/arriba"
rule arriba_with_sv:
    input:
        # STAR bam containing chimeric alignments
        bam="{sample}.bam",
        # path to reference genome
        genome="genome.fa.gz",
        # path to annotation gtf
        annotation="annotation.gtf",
        # optional arriba blacklist file
        custom_blacklist="blacklist.tsv",
        # optional file with known structural variants
        sv_file="sv_list_from_wgs.vcf",
    output:
        # approved gene fusions
        fusions="fusions/{sample}.with_sv.tsv",
        # discarded gene fusions
        discarded="fusions/{sample}.with_sv.discarded.tsv",  # optional
    log:
        "logs/arriba/{sample}.with_sv.log",
    params:
        # required when default_blacklist or default_known_fusions is set to True
        genome_build="GRCh38",
        # a blacklist is strongly recommended, see https://github.com/suhrig/arriba/wiki/04-Input-files#blacklist
        # set either the custom_blacklist input file or default_blacklist=True, not both
        default_blacklist=False,  # optional
        default_known_fusions=True,  # optional
    threads: 1
    wrapper:
        "v9.0.1/bio/arriba"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
This wrapper does not handle multithreading, because arriba indicates that no significant speedup is expected from it. See also the description of the -@ THREADS option in the command line arguments documentation.
Software dependencies
arriba=2.5.1
Input/Output
Input:
bam: Path to SAM, BAM or CRAM formatted alignment file from STAR. See the documentation on alignment file inputs.
genome: Path to FASTA formatted genome sequence file (may be gzipped). See the documentation on assembly file inputs.
annotation: Path to GTF formatted genome annotation file (may be gzipped). See the documentation on annotation file inputs.
custom_blacklist: (optional) Path to custom arriba blacklist file. See the documentation on blacklist.
custom_known_fusions: (optional) Path to known fusions file. See the documentation on known fusions.
sv_file: (optional) Path to structural variant calls from WGS. See the documentation on SV.
Output:
fusions: Path to output file for fusions after filtering.
discarded: (optional) Path to output file for fusions that were filtered out by arriba.
Params
genome_build: Required if any of default_blacklist or default_known_fusions is set to True.
default_blacklist: Set to True to use default blacklist. Must be False (or omitted) if a custom_blacklist file is specified under input:. See the documentation on blacklist.
default_known_fusions: Set to True to use default known fusions. Must be False (or omitted) if a custom_known_fusions file is specified under input:. See the documentation on known fusions.
extra: Other optional parameters.
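The interaction between custom_blacklist, default_blacklist and genome_build can be easier to follow as a small standalone function. The following is only a sketch that mirrors the branching in the wrapper code in the Code section; the function name resolve_blacklist_arg and its parameters are hypothetical and not part of the wrapper, and the explicit check for unsupported builds is an addition made for illustration.

```python
import os

# Filename templates as they appear in the wrapper code; "{v}" stands for the
# arriba version string.
BLACKLIST_TEMPLATES = {
    "GRCh37": "blacklist_hg19_hs37d5_GRCh37_v{v}.tsv.gz",
    "GRCh38": "blacklist_hg38_GRCh38_v{v}.tsv.gz",
    "GRCm38": "blacklist_mm10_GRCm38_v{v}.tsv.gz",
    "GRCm39": "blacklist_mm39_GRCm39_v{v}.tsv.gz",
}


def resolve_blacklist_arg(custom_path, use_default, build, version, database_dir):
    """Return the arriba command-line fragment for blacklist handling.

    Hypothetical helper: the wrapper itself implements this inline.
    """
    if custom_path and use_default:
        raise ValueError("Set either a custom blacklist or default_blacklist, not both.")
    if custom_path:
        # A user-supplied blacklist file is passed with -b.
        return "-b " + custom_path
    if use_default:
        if not build:
            raise ValueError("genome_build is required when default_blacklist is True.")
        if build not in BLACKLIST_TEMPLATES:
            # Assumption for illustration: fail with a clear message.
            raise ValueError(f"Unsupported genome build: {build}")
        filename = BLACKLIST_TEMPLATES[build].format(v=version)
        return "-b " + os.path.join(database_dir, filename)
    # Neither a custom nor a default blacklist: disable the blacklist filter.
    return "-f blacklist"
```

The same pattern applies to custom_known_fusions and default_known_fusions, with -k in place of -b and -f known_fusions as the fallback.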
Code
__author__ = "Jan Forster"
__copyright__ = "Copyright 2019, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"
import os
import subprocess as sp
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Optional output file for fusions that arriba filtered out
discarded_fusions = snakemake.output.get("discarded", "")
if discarded_fusions:
    discarded_cmd = "-O " + discarded_fusions
else:
    discarded_cmd = ""

# Default blacklist and known-fusions files are looked up in the conda environment
database_dir = os.path.join(os.environ["CONDA_PREFIX"], "var/lib/arriba")
build = snakemake.params.get("genome_build", None)

blacklist_input = snakemake.input.get("custom_blacklist")
default_blacklist = snakemake.params.get("default_blacklist", False)

known_fusions_input = snakemake.input.get("custom_known_fusions")
default_known_fusions = snakemake.params.get("default_known_fusions", False)

# The default database files are specific to a genome build
if default_blacklist or default_known_fusions:
    if not build:
        raise ValueError(
            "Please provide a genome build when using blacklist- or known_fusion-filtering"
        )

# Extract the arriba version from the help text; it is part of the database file names
command = "arriba -h | grep -e 'Arriba ' -e '^Version: ' | grep -om1 '[0-9.]\\+$'"
arriba_vers = sp.run(
    command, shell=True, capture_output=True, text=True
).stdout.strip()

# Blacklist: custom file, default file, or filter disabled
if blacklist_input and not default_blacklist:
    blacklist_cmd = "-b " + blacklist_input
elif not blacklist_input and default_blacklist:
    blacklist_dict = {
        "GRCh37": f"blacklist_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"blacklist_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"blacklist_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"blacklist_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    blacklist_path = os.path.join(database_dir, blacklist_dict[build])
    blacklist_cmd = "-b " + blacklist_path
elif not blacklist_input and not default_blacklist:
    blacklist_cmd = "-f blacklist"
else:
    raise ValueError(
        "A custom_blacklist input file is given and the default_blacklist parameter is set to 'True'. Please set only one of both."
    )

# Known fusions: custom file, default file, or filter disabled
if known_fusions_input and not default_known_fusions:
    known_cmd = "-k " + known_fusions_input
elif not known_fusions_input and default_known_fusions:
    fusions_dict = {
        "GRCh37": f"known_fusions_hg19_hs37d5_GRCh37_v{arriba_vers}.tsv.gz",
        "GRCh38": f"known_fusions_hg38_GRCh38_v{arriba_vers}.tsv.gz",
        "GRCm38": f"known_fusions_mm10_GRCm38_v{arriba_vers}.tsv.gz",
        "GRCm39": f"known_fusions_mm39_GRCm39_v{arriba_vers}.tsv.gz",
    }
    known_fusions_path = os.path.join(database_dir, fusions_dict[build])
    known_cmd = "-k " + known_fusions_path
elif not known_fusions_input and not default_known_fusions:
    known_cmd = "-f known_fusions"
else:
    raise ValueError(
        "A custom_known_fusions input file is given and the default_known_fusions parameter is set to 'True'. Please set only one of both."
    )

# Optional structural variant calls from WGS
sv_file_input = snakemake.input.get("sv_file")
if sv_file_input:
    sv_cmd = "-d " + sv_file_input
else:
    sv_cmd = ""

shell(
    "arriba "
    "-x {snakemake.input.bam} "
    "-a {snakemake.input.genome} "
    "-g {snakemake.input.annotation} "
    "{blacklist_cmd} "
    "{known_cmd} "
    "{sv_cmd} "
    "-o {snakemake.output.fusions} "
    "{discarded_cmd} "
    "{extra} "
    "{log}"
)
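The version lookup in the wrapper relies on a grep pipeline over the output of arriba -h. As a rough illustration of what that pipeline selects, the sketch below reproduces the same matching in plain Python. The function name parse_arriba_version is hypothetical, and the sample help text in the usage line is an assumed shape made up for the example, not output copied from arriba.

```python
import re


def parse_arriba_version(help_text):
    """Return the trailing version-like token from arriba help text, or "".

    Approximates: grep -e 'Arriba ' -e '^Version: ' | grep -om1 '[0-9.]\\+$'
    """
    for line in help_text.splitlines():
        # First grep: keep lines containing "Arriba " or starting with "Version: ".
        if "Arriba " in line or line.startswith("Version: "):
            # Second grep: first kept line that ends in digits and dots wins.
            match = re.search(r"[0-9.]+$", line)
            if match:
                return match.group(0)
    return ""


# Assumed, made-up help text for illustration only.
sample = "Arriba gene fusion detector\nVersion: 2.5.1\n\nUsage: arriba [OPTIONS]"
version = parse_arriba_version(sample)
```

An empty result would lead to database file names without a version, so a caller may want to check for it before building paths.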