.. _`bio/rsem/calculate-expression`: RSEM CALCULATE EXPRESSION ========================= .. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/rsem/calculate-expression?label=version%20update%20pull%20requests :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/rsem/calculate-expression Run rsem-calculate-expression to estimate gene and isoform expression from RNA-Seq data. **URL**: http://deweylab.github.io/RSEM/rsem-calculate-expression.html Example ------- This wrapper can be used in the following way: .. code-block:: python rule calculate_expression: input: # input.bam or input.fq_one must be specified (and if input.fq_one, optionally input.fq_two if paired-end) # an aligned to transcriptome BAM bam="mapped/a.bam", # Index files created by rsem-prepare-reference reference=multiext("index/reference", ".grp", ".ti", ".transcripts.fa", ".seq", ".idx.fa", ".n2g.idx.fa"), # reference_bowtie: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome # reference_bowtie=multiext("index/reference", ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"), output: # genes_results must end in .genes.results; this suffix is stripped and passed to rsem as an output name prefix # this file contains per-gene quantification data for the sample genes_results="output/a.genes.results", # isoforms_results must end in .isoforms.results and otherwise have the same prefix as genes_results # this file contains per-transcript quantification data for the sample isoforms_results="output/a.isoforms.results", params: # optional, specify if sequencing is paired-end paired_end=True, # additional optional parameters to pass to rsem, for example, extra="--seed 42", log: "logs/rsem/calculate_expression/a.log", threads: 2 wrapper: "v3.0.1/bio/rsem/calculate-expression" Note that input, output and log file paths can be chosen freely. When running with .. code-block:: bash snakemake --use-conda the software dependencies will be automatically deployed into an isolated environment before execution. Notes ----- * For more information, see https://github.com/deweylab/RSEM. Software dependencies --------------------- * ``rsem=1.3.3`` * ``bowtie=1.3.1`` Input/Output ------------ **Input:** * ``bam``: BAM file with reads aligned to transcriptome * ``fq_one``: FASTQ file of reads (read_1 for paired-end sequencing) * ``fq_two``: Optional second FASTQ file of reads (read_2 for paired-end sequencing) * ``reference``: Index files created by rsem-prepare-reference * ``reference_bowtie``: Additionally needed for FASTQ input; Index files created (by bowtie-build) from the reference transcriptome **Output:** * ``genes_results``: This file contains per-gene quantification data for the sample * ``isoforms_results``: This file contains per-transcript quantification data for the sample Authors ------- * Brett Copeland Code ---- .. code-block:: python __author__ = "Brett Copeland" __copyright__ = "Copyright 2021, Brett Copeland" __email__ = "brcopeland@ucsd.edu" __license__ = "MIT" import os from snakemake.shell import shell bam = snakemake.input.get("bam", "") fq_one = snakemake.input.get("fq_one", "") fq_two = snakemake.input.get("fq_two", "") if bam: if fq_one: raise Exception("Only input.bam or input.fq_one expected, got both.") input_bam = "--alignments" input_string = bam paired_end = snakemake.params.get("paired_end", False) else: input_bam = "" if fq_one: if isinstance(fq_one, list): num_fq_one = len(fq_one) input_string = ",".join(fq_one) else: num_fq_one = 1 input_string = fq_one if fq_two: paired_end = True if isinstance(fq_two, list): num_fq_two = len(fq_two) if num_fq_one != num_fq_two: raise Exception( "Got {} R1 FASTQs, {} R2 FASTQs.".format(num_fq_one, num_fq_two) ) else: fq_two = [fq_two] input_string += " " + ",".join(fq_two) else: paired_end = False else: raise Exception("Expected input.bam or input.fq_one, got neither.") if paired_end: paired_end_string = "--paired-end" else: paired_end_string = "" genes_results = snakemake.output.genes_results if genes_results.endswith(".genes.results"): output_prefix = genes_results[: -len(".genes.results")] else: raise Exception( "output.genes_results file name malformed " "(rsem will append .genes.results suffix)" ) if not snakemake.output.isoforms_results.endswith(".isoforms.results"): raise Exception( "output.isoforms_results file name malformed " "(rsem will append .isoforms.results suffix)" ) reference_prefix = os.path.splitext(snakemake.input.reference[0])[0] extra = snakemake.params.get("extra", "") threads = snakemake.threads log = snakemake.log_fmt_shell(stdout=True, stderr=True) shell( "rsem-calculate-expression --num-threads {snakemake.threads} {extra} " "{paired_end_string} {input_bam} {input_string} " "{reference_prefix} {output_prefix} " "{log}" ) .. |nl| raw:: html