PYROE MAKE-SPLICED+UNSPLICED

Build spliceu reference files for Alevin-fry. The spliceu (spliced + unspliced) transcriptome reference combines spliced and unspliced transcripts, where the unspliced transcripts of each gene represent the entire genomic interval of that gene.

URL: https://pyroe.readthedocs.io/en/latest/building_splici_index.html#preparing-a-spliced-unspliced-transcriptome-reference

Example

This wrapper can be used in the following way:

rule test_pyroe_makesplicedunspliced:
    input:
        fasta="genome.fasta",
        gtf="annotation.gtf",
        spliced="extra_spliced.fasta",  # Optional path to additional spliced sequences (FASTA)
        unspliced="extra_unspliced.fasta",  # Optional path to additional unspliced sequences (FASTA)
    output:
        gene_id_to_name="gene_id_to_name.tsv",
        fasta="spliceu.fa",
        g2g="spliceu_g2g.tsv",
        t2g_3col="spliceu_t2g_3col.tsv",
        t2g="spliceu_t2g.tsv",
    threads: 1
    log:
        "logs/pyroe.log",
    params:
        extra="",  # Optional parameters
    wrapper:
        "v4.6.0-24-g250dd3e/bio/pyroe/makeunspliceunspliced/"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • pyroe=0.9.3

  • bedtools=2.31.1

Input/Output

Input:

  • gtf: Path to the genome annotation (GTF formatted)

  • fasta: Path to the genome sequence (Fasta formatted)

  • spliced: Optional path to additional spliced sequences (Fasta formatted)

  • unspliced: Optional path to additional unspliced sequences (Fasta formatted)

Output:

  • fasta: Path to spliced+unspliced sequences (Fasta formatted)

  • gene_id_to_name: Path to a TSV formatted text file containing gene_id <-> gene_name correspondence

  • t2g_3col: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence

  • t2g: Path to a TSV formatted text file containing the transcript_id <-> gene_name

  • g2g: Path to a TSV formatted text file containing the gene_id <-> gene_name
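
The t2g_3col table described above can be inspected with a few lines of Python. This is a minimal sketch, assuming the file has exactly three tab-separated columns without a header row and that the status column uses labels such as "S" and "U"; the labels, the function name count_splicing_status, and the example rows are illustrative assumptions, not taken from the wrapper.

```python
import csv
import io
from collections import Counter


def count_splicing_status(handle):
    """Count rows per splicing status in a 3-column, tab-separated table.

    Assumed column order: transcript id, gene name, splicing status.
    """
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        counts[row[2]] += 1
    return counts


# Hypothetical content; a real table comes from the wrapper's t2g_3col output.
example = io.StringIO("tx1\tgeneA\tS\ntx2\tgeneA\tS\ngeneA-U\tgeneA\tU\n")
status_counts = count_splicing_status(example)
print(dict(status_counts))
```

To run this on a real file, pass an open file handle, for example open("spliceu_t2g_3col.tsv", newline=""), instead of the in-memory example.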

Params

  • extra: Optional parameters to be passed to pyroe
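
How the optional inputs and the extra parameter end up on the pyroe command line can be sketched as a small pure function. The function name build_pyroe_command and the output directory "out" are hypothetical; the flags --extra-spliced and --extra-unspliced and the argument order follow the wrapper code shown under Code. Unlike the wrapper, this sketch builds an argument list rather than a shell string.

```python
import shlex


def build_pyroe_command(fasta, gtf, outdir, spliced=None, unspliced=None, extra=""):
    """Assemble the argument list for `pyroe make-spliced+unspliced`."""
    cmd = ["pyroe", "make-spliced+unspliced"]
    cmd += shlex.split(extra)  # free-form extra options, split like a shell would
    if spliced:
        cmd += ["--extra-spliced", spliced]  # only added when the input is given
    if unspliced:
        cmd += ["--extra-unspliced", unspliced]
    cmd += [fasta, gtf, outdir]  # positional arguments, in the wrapper's order
    return cmd


cmd = build_pyroe_command(
    "genome.fasta", "annotation.gtf", "out", spliced="extra_spliced.fasta"
)
print(shlex.join(cmd))
```

Omitting an optional input simply leaves its flag out of the command, which mirrors the empty-string default used in the wrapper.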

Authors

  • Thibault Dayris

Code

__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"


from tempfile import TemporaryDirectory
from snakemake.shell import shell


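# Redirect stdout and stderr of every shell call to the rule's log file (append mode)
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
extra = snakemake.params.get("extra", "")

# Optional additional spliced sequences, passed to pyroe only when provided
spliced = snakemake.input.get("spliced", "")
if spliced:
    spliced = "--extra-spliced " + spliced


# Optional additional unspliced sequences, passed to pyroe only when provided
unspliced = snakemake.input.get("unspliced", "")
if unspliced:
    unspliced = "--extra-unspliced " + unspliced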


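# Run pyroe in a temporary directory, then move only the outputs
# requested by the rule to their final paths
with TemporaryDirectory() as tempdir: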
    shell(
        "pyroe make-spliced+unspliced "
        "{extra} {spliced} "
        "{unspliced} "
        "{snakemake.input.fasta} "
        "{snakemake.input.gtf} "
        "{tempdir} "
        "{log}"
    )

    if snakemake.output.get("fasta", False):
        shell("mv --verbose {tempdir}/spliceu.fa {snakemake.output.fasta} {log}")

    if snakemake.output.get("gene_id_to_name", False):
        shell(
            "mv --verbose "
            "{tempdir}/gene_id_to_name.tsv "
            "{snakemake.output.gene_id_to_name} {log}"
        )

    if snakemake.output.get("t2g_3col", False):
        shell(
            "mv --verbose "
            "{tempdir}/spliceu_t2g_3col.tsv "
            "{snakemake.output.t2g_3col} {log} "
        )

    if snakemake.output.get("t2g", False):
        shell("mv --verbose {tempdir}/spliceu_t2g.tsv {snakemake.output.t2g} {log} ")

    if snakemake.output.get("g2g", False):
        shell("mv --verbose {tempdir}/spliceu_g2g.tsv {snakemake.output.g2g} {log} ")
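
The "move only what was requested" step at the end of the wrapper can also be expressed without shell calls. The following is a sketch rather than the wrapper's implementation: collect_outputs and PRODUCED are hypothetical names, the file names are the ones the wrapper moves, and the demonstration uses empty placeholder files instead of real pyroe output.

```python
import shutil
import tempfile
from pathlib import Path

# File names the wrapper moves out of pyroe's output directory, keyed by output name.
PRODUCED = {
    "fasta": "spliceu.fa",
    "gene_id_to_name": "gene_id_to_name.tsv",
    "t2g_3col": "spliceu_t2g_3col.tsv",
    "t2g": "spliceu_t2g.tsv",
    "g2g": "spliceu_g2g.tsv",
}


def collect_outputs(workdir, requested):
    """Move each requested file from workdir to its destination path."""
    moved = []
    for key, destination in requested.items():
        source = Path(workdir) / PRODUCED[key]
        # Snakemake normally creates output directories; done here for the demo.
        Path(destination).parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination))
        moved.append(key)
    return moved


# Demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as workdir, tempfile.TemporaryDirectory() as results:
    for name in PRODUCED.values():
        (Path(workdir) / name).touch()
    requested = {"fasta": f"{results}/ref/spliceu.fa", "t2g": f"{results}/ref/t2g.tsv"}
    moved = collect_outputs(workdir, requested)
    kept = sorted(p.name for p in Path(results, "ref").iterdir())
    print(moved, kept)
```

Outputs that are not requested stay in the temporary directory and are discarded when it is cleaned up, which matches the behaviour of the wrapper's conditional mv calls.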