SALMON_INDEX

Index a transcriptome assembly with salmon

URL: https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode

Example

This wrapper can be used in the following way:

rule salmon_index:
    input:
        sequences="assembly/transcriptome.fasta",
    output:
        multiext(
            "salmon/transcriptome_index/",
            "complete_ref_lens.bin",
            "ctable.bin",
            "ctg_offsets.bin",
            "duplicate_clusters.tsv",
            "info.json",
            "mphf.bin",
            "pos.bin",
            "pre_indexing.log",
            "rank.bin",
            "refAccumLengths.bin",
            "ref_indexing.log",
            "reflengths.bin",
            "refseq.bin",
            "seq.bin",
            "versionInfo.json",
        ),
    log:
        "logs/salmon/transcriptome_index.log",
    threads: 2
    params:
        # optional parameters
        extra="",
    wrapper:
        "v1.9.0/bio/salmon/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • salmon==1.8.0

Input/Output

Input:

  • sequences: Path to the sequences to index with Salmon. This can be a transcriptome or a gentrome (a transcriptome with genome sequences appended as decoys).
  • decoys: Optional path to a file listing the decoy sequence names, for use when the sequences input is a gentrome.
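
The decoys file is a plain list of sequence names, one per line, matching the genome records in the gentrome. As a minimal sketch (the function name `decoy_names` and the example headers are hypothetical and not part of the wrapper), the names could be collected from FASTA header lines like this:

```python
def decoy_names(fasta_lines):
    """Return the sequence names found in FASTA header lines.

    A name is the text between '>' and the first whitespace character.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


# Hypothetical genome records, for illustration only.
genome = [
    ">chr1 dna:chromosome",
    "ACGTACGT",
    ">chr2 dna:chromosome",
    "TTGACCA",
]
print("\n".join(decoy_names(genome)))
```

Writing these names to a text file, one per line, should give a file that can be passed as the decoys input.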

Output:

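  • The Salmon index of the assembly, declared either as a single index path or, as in the example above, as the individual files inside the index directory.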
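
Because the example rule declares each index file explicitly, a quick check after the run can confirm that the index is complete. The following is only a sketch: the helper name `missing_index_files` is hypothetical, and the file list is copied from the example rule above, so other Salmon versions may write a different set.

```python
from pathlib import Path

# File names taken from the example rule above; this is an assumption
# about one Salmon version, not a guaranteed list.
EXPECTED_INDEX_FILES = [
    "complete_ref_lens.bin",
    "ctable.bin",
    "ctg_offsets.bin",
    "duplicate_clusters.tsv",
    "info.json",
    "mphf.bin",
    "pos.bin",
    "pre_indexing.log",
    "rank.bin",
    "refAccumLengths.bin",
    "ref_indexing.log",
    "reflengths.bin",
    "refseq.bin",
    "seq.bin",
    "versionInfo.json",
]


def missing_index_files(index_dir, expected=EXPECTED_INDEX_FILES):
    """Return the expected file names that are absent from index_dir."""
    index_dir = Path(index_dir)
    return [name for name in expected if not (index_dir / name).is_file()]
```

Calling `missing_index_files("salmon/transcriptome_index")` returns an empty list when every file from the example is present.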

Params

  • extra: Optional parameters besides --tmpdir, --threads, and the input/output options.

Authors

  • Tessa Pierce
  • Thibault Dayris

Code

"""Snakemake wrapper for Salmon Index."""

__author__ = "Tessa Pierce"
__copyright__ = "Copyright 2018, Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os.path import dirname
from snakemake.shell import shell
from tempfile import TemporaryDirectory

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")

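# Pass the decoy name list to Salmon only when one was provided.
decoys = snakemake.input.get("decoys", "")
if decoys:
    decoys = f"--decoys {decoys}"

# With several declared output files, the index directory is the parent of
# the first one; a single output is used as the index path directly.
output = snakemake.output
if len(output) > 1:
    output = dirname(snakemake.output[0])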

with TemporaryDirectory() as tempdir:
    shell(
        "salmon index "
        "--transcripts {snakemake.input.sequences} "
        "--index {output} "
        "--threads {snakemake.threads} "
        "--tmpdir {tempdir} "
        "{decoys} "
        "{extra} "
        "{log}"
    )
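
To make the argument handling above easier to follow outside of Snakemake, here is a standalone sketch that assembles the same command line from plain values. It is not the wrapper's implementation: the function name `salmon_index_command` is hypothetical, the arguments are returned as a list instead of a shell string, and log redirection is left out.

```python
import shlex
from os.path import dirname


def salmon_index_command(sequences, outputs, threads, tmpdir, decoys="", extra=""):
    """Assemble a 'salmon index' command line as a list of arguments."""
    # Several output files: the index directory is the parent of the first one.
    # A single output is used as the index path directly.
    index = dirname(outputs[0]) if len(outputs) > 1 else outputs[0]
    cmd = [
        "salmon", "index",
        "--transcripts", sequences,
        "--index", index,
        "--threads", str(threads),
        "--tmpdir", tmpdir,
    ]
    if decoys:
        cmd += ["--decoys", decoys]
    if extra:
        # Split the free-form extra string the way a shell would.
        cmd += shlex.split(extra)
    return cmd


cmd = salmon_index_command(
    sequences="assembly/transcriptome.fasta",
    outputs=[
        "salmon/transcriptome_index/info.json",
        "salmon/transcriptome_index/seq.bin",
    ],
    threads=2,
    tmpdir="/tmp/salmon_index",  # hypothetical temporary directory
)
print(" ".join(cmd))
```

With the two output files shown, the index argument resolves to `salmon/transcriptome_index`, mirroring the `dirname` step in the wrapper.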