STAR-ARRIBA

A subworkflow for fusion detection from RNA-seq data with arriba. Fusion calling is based on splice-aware, chimeric alignments produced by STAR. STAR is run with specific parameters that arriba's fusion detection relies on; for details, see the documentation.

Usage

Via module

This usage is recommended with Snakemake >=7.9. You can include this meta-wrapper in your workflow via the Snakemake module system:

module star_arriba:
    meta_wrapper: "v9.0.1/meta/bio/star_arriba"
    pathvars:
        results="...", # Path to results directory
        resources="...", # Path to resources directory
        logs="...", # Path to logs directory
        genome_sequence="...", # Path to FASTA file with genome sequence
        genome_annotation="...", # Path to GTF file with genome annotation
        reads_r1="...", # Path/pattern for FASTQ files with R1 reads
        reads_r2="...", # Path/pattern for FASTQ files with R2 reads
        per="...", # Pattern for sample identifiers, e.g. "{sample}"


use rule * from star_arriba as star_arriba_*

Upon using the rules, you can additionally modify input, output, log, and params as needed (see the definition of each rule below and the modules documentation). For additional parameters in each individual wrapper, please refer to their corresponding documentation (see links below).
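To make the role of the path variables concrete, the following Python sketch mimics how the <name> placeholders used in the rule definitions of this meta-wrapper could resolve to concrete paths. It is only an illustration, not Snakemake's implementation: the resolve helper and the values in pathvars ("results", "logs", "{sample}") are hypothetical.

```python
import re

# Hypothetical values, chosen only for illustration.
pathvars = {
    "results": "results",
    "logs": "logs",
    "per": "{sample}",
}


def resolve(template, values):
    """Replace each <name> placeholder with its value (KeyError if unknown)."""
    return re.sub(r"<(\w+)>", lambda m: values[m.group(1)], template)


# Path templates copied from the star_align rule of this meta-wrapper.
aligned_bam = resolve("<results>/star/<per>/Aligned.out.bam", pathvars)
star_log = resolve("<logs>/star/<per>.log", pathvars)

print(aligned_bam)  # results/star/{sample}/Aligned.out.bam
print(star_log)     # logs/star/{sample}.log
```

With per set to "{sample}", the resolved paths contain an ordinary Snakemake wildcard, so each sample gets its own output directory and log file.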

Via copy-paste

Alternatively, you can directly copy-paste and modify the full meta-wrapper code below into your workflow.

Execution

When running with

snakemake --sdm conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Used wrappers

The following individual wrappers are used in this meta-wrapper:

bio/star/index
bio/star/align
bio/arriba

Please refer to each wrapper in the list above for additional configuration parameters and information about the executed code.

Authors

Jan Forster

Code

rule star_index:
    input:
        fasta="<genome_sequence>",
        gtf="<genome_annotation>",
    output:
        directory("<resources>/star_genome"),
    threads: 4
    params:
        sjdbOverhang=100,
        extra="",
    log:
        "<logs>/star_index_genome.log",
    cache: True  # mark as eligible for between-workflow caching
    wrapper:
        "v3.3.7/bio/star/index"


rule star_align:
    input:
        # use a list for multiple fastq files for one sample
        # usually technical replicates across lanes/flowcells
        fq1="<reads_r1>",
        fq2="<reads_r2>",  # optional
        idx="<resources>/star_genome",
        annotation="<genome_annotation>",
    output:
        # see STAR manual for additional output files
        aln="<results>/star/<per>/Aligned.out.bam",
        reads_per_gene="<results>/star/<per>/ReadsPerGene.out.tab",
    log:
        "<logs>/star/<per>.log",
    params:
        # specific parameters to work well with arriba
        extra=lambda wc, input: (
            f"--quantMode GeneCounts --sjdbGTFfile {input.annotation}"
            " --outSAMtype BAM Unsorted --chimSegmentMin 10 --chimOutType WithinBAM SoftClip"
            " --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0"
            " --chimScoreSeparation 1 --alignSJstitchMismatchNmax 5 -1 5 5 --chimSegmentReadGapMax 3"
        ),
    threads: 12
    wrapper:
        "v3.3.7/bio/star/align"


rule arriba:
    input:
        bam=rules.star_align.output.aln,
        genome="<genome_sequence>",
        annotation="<genome_annotation>",
        # optional: a custom TSV file listing known artifacts,
        # such as read-through fusions of neighbouring genes.
        # default blacklists are selected via the default_blacklist parameter
        # see https://github.com/suhrig/arriba/wiki/04-Input-files#blacklist
        custom_blacklist=[],
    output:
        fusions="<results>/arriba/<per>.fusions.tsv",
        discarded="<results>/arriba/<per>.fusions.discarded.tsv",
    params:
        # required if default_blacklist or default_known_fusions is set
        genome_build="GRCh38",
        default_blacklist=False,
        default_known_fusions=True,
        extra="",
    log:
        "<logs>/arriba/<per>.log",
    threads: 1
    wrapper:
        "v7.3.0/bio/arriba"
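
The extra parameter of star_align is a function that Snakemake evaluates for each job with that job's wildcards and input files. The sketch below evaluates the same expression in plain Python to show the STAR argument string it produces. The SimpleNamespace stand-in for Snakemake's input object and the GTF path "resources/annotation.gtf" are assumptions made for illustration.

```python
from types import SimpleNamespace


def star_extra(wc, input):
    # Same expression as the `extra` lambda in the star_align rule.
    return (
        f"--quantMode GeneCounts --sjdbGTFfile {input.annotation}"
        " --outSAMtype BAM Unsorted --chimSegmentMin 10 --chimOutType WithinBAM SoftClip"
        " --chimJunctionOverhangMin 10 --chimScoreMin 1 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0"
        " --chimScoreSeparation 1 --alignSJstitchMismatchNmax 5 -1 5 5 --chimSegmentReadGapMax 3"
    )


# Hypothetical stand-in for the input object Snakemake would pass.
fake_input = SimpleNamespace(annotation="resources/annotation.gtf")

args = star_extra(None, fake_input)
tokens = args.split()

# The annotation path is interpolated right after --sjdbGTFfile.
print(tokens[tokens.index("--sjdbGTFfile") + 1])  # resources/annotation.gtf
```

Because the annotation path is taken from the rule's input, changing the genome_annotation path variable updates both the rule's dependency and the STAR command line.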