PYROE MAKE-SPLICED+INTRONIC
Build splici reference files for Alevin-fry. The splici index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species.
Example
This wrapper can be used in the following way:
rule test_pyroe_makesplicedintronic:
input:
fasta="genome.fasta",
gtf="annotation.gtf",
spliced="extra_spliced.fasta", # Optional path to additional spliced sequences (FASTA)
unspliced="extra_unspliced.fasta", # Optional path to additional unspliced sequences (FASTA)
output:
fasta="splici_full/spliced_intronic_sequences.fasta",
gene_id_to_name="splici_full/gene_id_to_name.tsv",
t2g="splici_full/t2g.tsv",
threads: 1
log:
"logs/pyroe.log",
params:
read_length=91, # Required
flank_trim_length=5, # Optional l
extra="", # Optional parameters
wrapper:
"v3.3.5/bio/pyroe/makesplicedintronic"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
pyroe=0.9.3
bedtools=2.31.1
Input/Output
Input:
gtf
: Path to the genome annotation (GTF formatted)fasta
: Path to the genome sequence (Fasta formatted)spliced
: Optional path to additional spliced sequences (Fasta formatted)unspliced
: Optional path to unspliced sequences (Fasta formatted)
Output:
fasta
: Path to spliced+intronic sequences (Fasta formatted)gene_id_to_name
: Path to a TSV formatted text file containing gene_id <-> gene_name correspondencet2g
: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence
Params
read_length
: The read length of the single-cell experiment being processed (determines flank size). Default is 100.extra
: Optional parameters to be passed to pyroe
Code
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from tempfile import TemporaryDirectory
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
extra = snakemake.params.get("extra", "")
# pyroe uses the flank-length value to name its output files
# in the result directory. We need this value to acquired output
# files and let snakemake-wrapper choose its output file names.
read_length = snakemake.params.get("read_length", 101)
flank_trim_length = snakemake.params.get("flank_trim_length", 5)
flank_length = read_length - flank_trim_length
spliced = snakemake.input.get("spliced", "")
if spliced:
spliced = "--extra-spliced " + spliced
unspliced = snakemake.input.get("unspliced", "")
if unspliced:
unspliced = "--extra-unspliced " + unspliced
with TemporaryDirectory() as tempdir:
shell(
"pyroe make-spliced+intronic "
"{extra} {spliced} "
"{unspliced} "
"{snakemake.input.fasta} "
"{snakemake.input.gtf} "
"{read_length} "
"{tempdir} "
"{log}"
)
if snakemake.output.get("fasta", False):
shell(
"mv --verbose "
"{tempdir}/splici_fl{flank_length}.fa "
"{snakemake.output.fasta} {log}"
)
if snakemake.output.get("gene_id_to_name", False):
shell(
"mv --verbose "
"{tempdir}/gene_id_to_name.tsv "
"{snakemake.output.gene_id_to_name} {log}"
)
if snakemake.output.get("t2g", False):
shell(
"mv --verbose "
"{tempdir}/splici_fl{flank_length}_t2g_3col.tsv "
"{snakemake.output.t2g} {log} "
)