PYROE MAKE-SPLICED+INTRONIC

https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/pyroe/makesplicedintronic?label=version%20update%20pull%20requests

Build splici reference files for Alevin-fry. The splici index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species.

URL: https://pyroe.readthedocs.io/en/latest/building_splici_index.html#preparing-a-spliced-intronic-transcriptome-reference

Example

This wrapper can be used in the following way:

rule test_pyroe_makesplicedintronic:
    input:
        fasta="genome.fasta",
        gtf="annotation.gtf",
        spliced="extra_spliced.fasta",  # Optional path to additional spliced sequences (FASTA)
        unspliced="extra_unspliced.fasta",  # Optional path to additional unspliced sequences (FASTA)
    output:
        fasta="splici_full/spliced_intronic_sequences.fasta",
        gene_id_to_name="splici_full/gene_id_to_name.tsv",
        t2g="splici_full/t2g.tsv",
    threads: 1
    log:
        "logs/pyroe.log",
    params:
        read_length=91,  # Required
        flank_trim_length=5,  # Optional l
        extra="",  # Optional parameters
    wrapper:
        "v3.9.0-14-g476823b/bio/pyroe/makesplicedintronic"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • pyroe=0.9.3

  • bedtools=2.31.1

Input/Output

Input:

  • gtf: Path to the genome annotation (GTF formatted)

  • fasta: Path to the genome sequence (Fasta formatted)

  • spliced: Optional path to additional spliced sequences (Fasta formatted)

  • unspliced: Optional path to unspliced sequences (Fasta formatted)

Output:

  • fasta: Path to spliced+intronic sequences (Fasta formatted)

  • gene_id_to_name: Path to a TSV formatted text file containing gene_id <-> gene_name correspondence

  • t2g: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence

Params

  • read_length: The read length of the single-cell experiment being processed (determines flank size). Default is 100.

  • extra: Optional parameters to be passed to pyroe

Authors

Code

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"


from tempfile import TemporaryDirectory
from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
extra = snakemake.params.get("extra", "")

# pyroe uses the flank-length value to name its output files
# in the result directory. We need this value to acquired output
# files and let snakemake-wrapper choose its output file names.
read_length = snakemake.params.get("read_length", 101)
flank_trim_length = snakemake.params.get("flank_trim_length", 5)
flank_length = read_length - flank_trim_length

spliced = snakemake.input.get("spliced", "")
if spliced:
    spliced = "--extra-spliced " + spliced


unspliced = snakemake.input.get("unspliced", "")
if unspliced:
    unspliced = "--extra-unspliced " + unspliced


with TemporaryDirectory() as tempdir:
    shell(
        "pyroe make-spliced+intronic "
        "{extra} {spliced} "
        "{unspliced} "
        "{snakemake.input.fasta} "
        "{snakemake.input.gtf} "
        "{read_length} "
        "{tempdir} "
        "{log}"
    )

    if snakemake.output.get("fasta", False):
        shell(
            "mv --verbose "
            "{tempdir}/splici_fl{flank_length}.fa "
            "{snakemake.output.fasta} {log}"
        )

    if snakemake.output.get("gene_id_to_name", False):
        shell(
            "mv --verbose "
            "{tempdir}/gene_id_to_name.tsv "
            "{snakemake.output.gene_id_to_name} {log}"
        )

    if snakemake.output.get("t2g", False):
        shell(
            "mv --verbose "
            "{tempdir}/splici_fl{flank_length}_t2g_3col.tsv "
            "{snakemake.output.t2g} {log} "
        )