.. _`bio/pyroe/makesplicedintronic`: PYROE MAKE-SPLICED+INTRONIC =========================== .. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/pyroe/makesplicedintronic?label=version%20update%20pull%20requests :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/pyroe/makesplicedintronic Build splici reference files for Alevin-fry. The splici index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species. **URL**: https://pyroe.readthedocs.io/en/latest/building_splici_index.html#preparing-a-spliced-intronic-transcriptome-reference Example ------- This wrapper can be used in the following way: .. code-block:: python rule test_pyroe_makesplicedintronic: input: fasta="genome.fasta", gtf="annotation.gtf", spliced="extra_spliced.fasta", # Optional path to additional spliced sequences (FASTA) unspliced="extra_unspliced.fasta", # Optional path to additional unspliced sequences (FASTA) output: fasta="splici_full/spliced_intronic_sequences.fasta", gene_id_to_name="splici_full/gene_id_to_name.tsv", t2g="splici_full/t2g.tsv", threads: 1 log: "logs/pyroe.log", params: read_length=91, # Required flank_trim_length=5, # Optional l extra="", # Optional parameters wrapper: "v3.0.1/bio/pyroe/makesplicedintronic" Note that input, output and log file paths can be chosen freely. When running with .. code-block:: bash snakemake --use-conda the software dependencies will be automatically deployed into an isolated environment before execution. Software dependencies --------------------- * ``pyroe=0.9.3`` * ``bedtools=2.31.1`` Input/Output ------------ **Input:** * ``gtf``: Path to the genome annotation (GTF formatted) * ``fasta``: Path to the genome sequence (Fasta formatted) * ``spliced``: Optional path to additional spliced sequences (Fasta formatted) * ``unspliced``: Optional path to unspliced sequences (Fasta formatted) **Output:** * ``fasta``: Path to spliced+intronic sequences (Fasta formatted) * ``gene_id_to_name``: Path to a TSV formatted text file containing gene_id <-> gene_name correspondence * ``t2g``: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence Params ------ * ``read_length``: The read length of the single-cell experiment being processed (determines flank size). Default is 100. * ``extra``: Optional parameters to be passed to pyroe Authors ------- Code ---- .. code-block:: python __author__ = "Thibault Dayris" __copyright__ = "Copyright 2023, Thibault Dayris" __email__ = "thibault.dayris@gustaveroussy.fr" __license__ = "MIT" from tempfile import TemporaryDirectory from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True) extra = snakemake.params.get("extra", "") # pyroe uses the flank-length value to name its output files # in the result directory. We need this value to acquired output # files and let snakemake-wrapper choose its output file names. read_length = snakemake.params.get("read_length", 101) flank_trim_length = snakemake.params.get("flank_trim_length", 5) flank_length = read_length - flank_trim_length spliced = snakemake.input.get("spliced", "") if spliced: spliced = "--extra-spliced " + spliced unspliced = snakemake.input.get("unspliced", "") if unspliced: unspliced = "--extra-unspliced " + unspliced with TemporaryDirectory() as tempdir: shell( "pyroe make-spliced+intronic " "{extra} {spliced} " "{unspliced} " "{snakemake.input.fasta} " "{snakemake.input.gtf} " "{read_length} " "{tempdir} " "{log}" ) if snakemake.output.get("fasta", False): shell( "mv --verbose " "{tempdir}/splici_fl{flank_length}.fa " "{snakemake.output.fasta} {log}" ) if snakemake.output.get("gene_id_to_name", False): shell( "mv --verbose " "{tempdir}/gene_id_to_name.tsv " "{snakemake.output.gene_id_to_name} {log}" ) if snakemake.output.get("t2g", False): shell( "mv --verbose " "{tempdir}/splici_fl{flank_length}_t2g_3col.tsv " "{snakemake.output.t2g} {log} " ) .. |nl| raw:: html