RSEM CALCULATE EXPRESSION

Run rsem-calculate-expression to estimate gene and isoform expression from RNA-Seq data.

URL:

Example

This wrapper can be used in the following way:

rule calculate_expression:
    input:
        # input.bam or input.fq_one must be specified (and if input.fq_one, optionally input.fq_two if paired-end)
        # an aligned to transcriptome BAM
        bam="mapped/a.bam",
        # one of the index files created by rsem-prepare-reference; the file suffix is stripped and passed on to rsem
        reference="index/reference.seq",
    output:
        # genes_results must end in .genes.results; this suffix is stripped and passed to rsem as an output name prefix
        # this file contains per-gene quantification data for the sample
        genes_results="output/a.genes.results",
        # isoforms_results must end in .isoforms.results and otherwise have the same prefix as genes_results
        # this file contains per-transcript quantification data for the sample
        isoforms_results="output/a.isoforms.results",
    params:
        # optional, specify if sequencing is paired-end
        paired_end=True,
        # additional optional parameters to pass to rsem, for example,
        extra="--seed 42",
    log:
        "logs/rsem/calculate_expression/a.log",
    wrapper:
        "v1.1.0/bio/rsem/calculate-expression"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • rsem==1.3.3

Input/Output

Input:

  • BAM aligned to transcriptome

Output:

  • Per-gene and per-isoform read quantification.

Notes

Authors

  • Brett Copeland

Code

__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"


import os

from snakemake.shell import shell

bam = snakemake.input.get("bam", "")
fq_one = snakemake.input.get("fq_one", "")
fq_two = snakemake.input.get("fq_two", "")
if bam:
    if fq_one:
        raise Exception("Only input.bam or input.fq_one expected, got both.")
    input_bam = " --bam"
    input_string = bam
    paired_end = snakemake.params.get("paired-end", False)
else:
    input_bam = ""
    if fq:
        input_bam = False
        if isinstance(fq, list):
            num_fq_one = len(fq)
            input_string = ",".join(fq)
        else:
            num_fq_one = 1
            input_string = fq
        if fq_two:
            paired_end = True
            if isinstance(fq_two, list):
                num_fq_two = len(fq_two)
                if num_fq_one != num_fq_two:
                    raise Exception(
                        "Got {} R1 FASTQs, {} R2 FASTQs.".format(num_fq_one, num_fq_two)
                    )
            else:
                fq_two = [fq_two]
            input_string += " " + ",".join(fq_two)
        else:
            paired_end = False
    else:
        raise Exception("Expected input.bam or input.fq_one, got neither.")

if paired_end:
    paired_end_string = "--paired-end"
else:
    paired_end_string = ""

genes_results = snakemake.output.genes_results
if genes_results.endswith(".genes.results"):
    output_prefix = genes_results[: -len(".genes.results")]
else:
    raise Exception(
        "output.genes_results file name malformed "
        "(rsem will append .genes.results suffix)"
    )
if not snakemake.output.isoforms_results.endswith(".isoforms.results"):
    raise Exception(
        "output.isoforms_results file name malformed "
        "(rsem will append .isoforms.results suffix)"
    )

reference_prefix = os.path.splitext(snakemake.input.reference)[0]

extra = snakemake.params.get("extra", "")
threads = snakemake.threads
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
    "rsem-calculate-expression --num-threads {snakemake.threads} {extra} "
    "{paired_end_string} {input_bam} {input_string} "
    "{reference_prefix} {output_prefix} "
    "{log}"
)