RSEM CALCULATE EXPRESSION

Run rsem-calculate-expression to estimate gene and isoform expression from RNA-Seq data.

URL: http://deweylab.github.io/RSEM/rsem-calculate-expression.html

Example

This wrapper can be used in the following way:

rule calculate_expression:
    input:
        # input.bam or input.fq_one must be specified (and if input.fq_one, optionally input.fq_two if paired-end)
        # an aligned to transcriptome BAM
        bam="mapped/a.bam",
        # Index files created by rsem-prepare-reference
        reference=multiext("index/reference", ".grp", ".ti", ".transcripts.fa", ".seq", ".idx.fa", ".n2g.idx.fa"),
        # reference_bowtie: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome
        # reference_bowtie=multiext("index/reference", ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"),
    output:
        # genes_results must end in .genes.results; this suffix is stripped and passed to rsem as an output name prefix
        # this file contains per-gene quantification data for the sample
        genes_results="output/a.genes.results",
        # isoforms_results must end in .isoforms.results and otherwise have the same prefix as genes_results
        # this file contains per-transcript quantification data for the sample
        isoforms_results="output/a.isoforms.results",
    params:
        # optional, specify if sequencing is paired-end
        paired_end=True,
        # additional optional parameters to pass to rsem, for example,
        extra="--seed 42",
    log:
        "logs/rsem/calculate_expression/a.log",
    threads: 2
    wrapper:
        "v3.8.0-49-g6f33607/bio/rsem/calculate-expression"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

For more information, see https://github.com/deweylab/RSEM.

Software dependencies

rsem=1.3.3
bowtie=1.3.1

Input/Output

Input:

bam: BAM file with reads aligned to transcriptome
fq_one: FASTQ file of reads (read_1 for paired-end sequencing)
fq_two: Optional second FASTQ file of reads (read_2 for paired-end sequencing)
reference: Index files created by rsem-prepare-reference
reference_bowtie: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome

Output:

genes_results: This file contains per-gene quantification data for the sample
isoforms_results: This file contains per-transcript quantification data for the sample

Authors

Brett Copeland

Code

__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os

from snakemake.shell import shell

bam = snakemake.input.get("bam", "")
fq_one = snakemake.input.get("fq_one", "")
fq_two = snakemake.input.get("fq_two", "")
if bam:
    if fq_one:
        raise Exception("Only input.bam or input.fq_one expected, got both.")
    input_bam = "--alignments"
    input_string = bam
    paired_end = snakemake.params.get("paired_end", False)
else:
    input_bam = ""
    if fq_one:
        if isinstance(fq_one, list):
            num_fq_one = len(fq_one)
            input_string = ",".join(fq_one)
        else:
            num_fq_one = 1
            input_string = fq_one
        if fq_two:
            paired_end = True
            if isinstance(fq_two, list):
                num_fq_two = len(fq_two)
                if num_fq_one != num_fq_two:
                    raise Exception(
                        "Got {} R1 FASTQs, {} R2 FASTQs.".format(num_fq_one, num_fq_two)
                    )
            else:
                fq_two = [fq_two]
            input_string += " " + ",".join(fq_two)
        else:
            paired_end = False
    else:
        raise Exception("Expected input.bam or input.fq_one, got neither.")

if paired_end:
    paired_end_string = "--paired-end"
else:
    paired_end_string = ""

genes_results = snakemake.output.genes_results
if genes_results.endswith(".genes.results"):
    output_prefix = genes_results[: -len(".genes.results")]
else:
    raise Exception(
        "output.genes_results file name malformed "
        "(rsem will append .genes.results suffix)"
    )
if not snakemake.output.isoforms_results.endswith(".isoforms.results"):
    raise Exception(
        "output.isoforms_results file name malformed "
        "(rsem will append .isoforms.results suffix)"
    )

reference_prefix = os.path.splitext(snakemake.input.reference[0])[0]

extra = snakemake.params.get("extra", "")
threads = snakemake.threads
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "rsem-calculate-expression --num-threads {snakemake.threads} {extra} "
    "{paired_end_string} {input_bam} {input_string} "
    "{reference_prefix} {output_prefix} "
    "{log}"
)