RSEM CALCULATE EXPRESSION¶
Run rsem-calculate-expression to estimate gene and isoform expression from RNA-Seq data.
URL: http://deweylab.github.io/RSEM/rsem-calculate-expression.html
Example¶
This wrapper can be used in the following way:
rule calculate_expression:
input:
# input.bam or input.fq_one must be specified (and if input.fq_one, optionally input.fq_two if paired-end)
# an aligned to transcriptome BAM
bam="mapped/a.bam",
# Index files created by rsem-prepare-reference
reference=multiext("index/reference", ".grp", ".ti", ".transcripts.fa", ".seq", ".idx.fa", ".n2g.idx.fa"),
# reference_bowtie: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome
# reference_bowtie=multiext("index/reference", ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"),
output:
# genes_results must end in .genes.results; this suffix is stripped and passed to rsem as an output name prefix
# this file contains per-gene quantification data for the sample
genes_results="output/a.genes.results",
# isoforms_results must end in .isoforms.results and otherwise have the same prefix as genes_results
# this file contains per-transcript quantification data for the sample
isoforms_results="output/a.isoforms.results",
params:
# optional, specify if sequencing is paired-end
paired_end=True,
# additional optional parameters to pass to rsem, for example,
extra="--seed 42",
log:
"logs/rsem/calculate_expression/a.log",
threads: 2
wrapper:
"v2.6.0/bio/rsem/calculate-expression"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes¶
- For more information, see https://github.com/deweylab/RSEM.
Software dependencies¶
rsem=1.3.3
bowtie=1.3.1
Input/Output¶
Input:
bam
: BAM file with reads aligned to transcriptomefq_one
: FASTQ file of reads (read_1 for paired-end sequencing)fq_two
: Optional second FASTQ file of reads (read_2 for paired-end sequencing)reference
: Index files created by rsem-prepare-referencereference_bowtie
: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome
Output:
genes_results
: This file contains per-gene quantification data for the sampleisoforms_results
: This file contains per-transcript quantification data for the sample
Authors¶
- Brett Copeland
Code¶
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
import os
from snakemake.shell import shell
bam = snakemake.input.get("bam", "")
fq_one = snakemake.input.get("fq_one", "")
fq_two = snakemake.input.get("fq_two", "")
if bam:
if fq_one:
raise Exception("Only input.bam or input.fq_one expected, got both.")
input_bam = "--alignments"
input_string = bam
paired_end = snakemake.params.get("paired_end", False)
else:
input_bam = ""
if fq_one:
if isinstance(fq_one, list):
num_fq_one = len(fq_one)
input_string = ",".join(fq_one)
else:
num_fq_one = 1
input_string = fq_one
if fq_two:
paired_end = True
if isinstance(fq_two, list):
num_fq_two = len(fq_two)
if num_fq_one != num_fq_two:
raise Exception(
"Got {} R1 FASTQs, {} R2 FASTQs.".format(num_fq_one, num_fq_two)
)
else:
fq_two = [fq_two]
input_string += " " + ",".join(fq_two)
else:
paired_end = False
else:
raise Exception("Expected input.bam or input.fq_one, got neither.")
if paired_end:
paired_end_string = "--paired-end"
else:
paired_end_string = ""
genes_results = snakemake.output.genes_results
if genes_results.endswith(".genes.results"):
output_prefix = genes_results[: -len(".genes.results")]
else:
raise Exception(
"output.genes_results file name malformed "
"(rsem will append .genes.results suffix)"
)
if not snakemake.output.isoforms_results.endswith(".isoforms.results"):
raise Exception(
"output.isoforms_results file name malformed "
"(rsem will append .isoforms.results suffix)"
)
reference_prefix = os.path.splitext(snakemake.input.reference[0])[0]
extra = snakemake.params.get("extra", "")
threads = snakemake.threads
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
shell(
"rsem-calculate-expression --num-threads {snakemake.threads} {extra} "
"{paired_end_string} {input_bam} {input_string} "
"{reference_prefix} {output_prefix} "
"{log}"
)