PYROE MAKE-SPLICED+UNSPLICED
Build spliceu reference files for Alevin-fry. The spliceu (spliced + unspliced) reference is a transcriptome reference in which the unspliced transcripts of each gene represent the entire genomic interval of that gene.
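As a toy illustration of what "the entire genomic interval of a gene" means, the sketch below derives one spanning interval from the exon coordinates of a gene. It is not how pyroe is implemented; the function name `gene_interval` and the coordinates are hypothetical, and all exons are assumed to lie on the same chromosome and strand.

```python
def gene_interval(exons):
    """Return (start, end) spanning all exons of one gene.

    `exons` is an iterable of (start, end) pairs, assumed to be on the
    same chromosome and strand.
    """
    starts, ends = zip(*exons)
    return min(starts), max(ends)


# Hypothetical exon coordinates from two transcripts of one gene.
exons = [(100, 200), (300, 400), (150, 250), (500, 650)]
start, end = gene_interval(exons)
print(start, end)
```

The unspliced sequence for the gene would then cover everything between `start` and `end`, introns included.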
Example
This wrapper can be used in the following way:
rule test_pyroe_makesplicedunspliced:
    input:
        fasta="genome.fasta",
        gtf="annotation.gtf",
        spliced="extra_spliced.fasta",  # Optional path to additional spliced sequences (FASTA)
        unspliced="extra_unspliced.fasta",  # Optional path to additional unspliced sequences (FASTA)
    output:
        gene_id_to_name="gene_id_to_name.tsv",
        fasta="spliceu.fa",
        g2g="spliceu_g2g.tsv",
        t2g_3col="spliceu_t2g_3col.tsv",
        t2g="spliceu_t2g.tsv",
    threads: 1
    log:
        "logs/pyroe.log",
    params:
        extra="",  # Optional parameters
    wrapper:
        "v4.6.0-24-g250dd3e/bio/pyroe/makeunspliceunspliced/"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
pyroe=0.9.3
bedtools=2.31.1
Input/Output
Input:
gtf
: Path to the genome annotation (GTF formatted)
fasta
: Path to the genome sequence (Fasta formatted)
spliced
: Optional path to additional spliced sequences (Fasta formatted)
unspliced
: Optional path to additional unspliced sequences (Fasta formatted)
Output:
fasta
: Path to spliced+unspliced sequences (Fasta formatted)
gene_id_to_name
: Path to a TSV formatted text file containing the gene_id <-> gene_name correspondence
t2g_3col
: Path to a TSV formatted text file containing the transcript_id <-> gene_name <-> splicing status correspondence
t2g
: Path to a TSV formatted text file containing the transcript_id <-> gene_name correspondence
g2g
: Path to a TSV formatted text file containing the gene_id <-> gene_name correspondence
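To illustrate how the three-column mapping might be consumed downstream, here is a minimal Python sketch. It assumes the file is tab-separated, has no header line, and lists transcript ID, gene, and splicing status in that order. The function name `read_t2g_3col` and the sample rows (including the status labels `S` and `U`) are hypothetical, so check them against your own output before relying on this.

```python
import csv
import io
from collections import defaultdict


def read_t2g_3col(handle):
    """Group transcript IDs by (gene, splicing status).

    Assumes three tab-separated columns without a header:
    transcript ID, gene, splicing status.
    """
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        transcript_id, gene, status = row[:3]
        groups[(gene, status)].append(transcript_id)
    return dict(groups)


# Hypothetical sample content, for illustration only.
sample = io.StringIO("tx1\tgeneA\tS\ntx2\tgeneA\tS\ngeneA-U\tgeneA\tU\n")
mapping = read_t2g_3col(sample)
print(mapping)
```

With a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("spliceu_t2g_3col.tsv", newline="")`.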
Params
extra
: Optional parameters to be passed to pyroe
Code
__author__ = "Thibault Dayris"
__copyright__ = "Copyright 2023, Thibault Dayris"
__email__ = "thibault.dayris@gustaveroussy.fr"
__license__ = "MIT"
from tempfile import TemporaryDirectory
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
extra = snakemake.params.get("extra", "")

# Optional additional sequences are passed only when provided as inputs
spliced = snakemake.input.get("spliced", "")
if spliced:
    spliced = "--extra-spliced " + spliced

unspliced = snakemake.input.get("unspliced", "")
if unspliced:
    unspliced = "--extra-unspliced " + unspliced

# pyroe writes all files into one directory; run it in a temporary one
with TemporaryDirectory() as tempdir:
    shell(
        "pyroe make-spliced+unspliced "
        "{extra} {spliced} "
        "{unspliced} "
        "{snakemake.input.fasta} "
        "{snakemake.input.gtf} "
        "{tempdir} "
        "{log}"
    )

    # Move only the outputs requested in the rule to their final paths
    if snakemake.output.get("fasta", False):
        shell("mv --verbose {tempdir}/spliceu.fa {snakemake.output.fasta} {log}")

    if snakemake.output.get("gene_id_to_name", False):
        shell(
            "mv --verbose "
            "{tempdir}/gene_id_to_name.tsv "
            "{snakemake.output.gene_id_to_name} {log}"
        )

    if snakemake.output.get("t2g_3col", False):
        shell(
            "mv --verbose "
            "{tempdir}/spliceu_t2g_3col.tsv "
            "{snakemake.output.t2g_3col} {log} "
        )

    if snakemake.output.get("t2g", False):
        shell("mv --verbose {tempdir}/spliceu_t2g.tsv {snakemake.output.t2g} {log} ")

    if snakemake.output.get("g2g", False):
        shell("mv --verbose {tempdir}/spliceu_g2g.tsv {snakemake.output.g2g} {log} ")
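The way the wrapper code above turns optional inputs into command-line flags can be sketched as a standalone function, which makes the argument order easy to inspect without running Snakemake or pyroe. This is only an illustration: the helper name `build_pyroe_command` is hypothetical, and it returns an argument list rather than the formatted shell string the wrapper actually uses. The flags `--extra-spliced` and `--extra-unspliced` and the positional order (FASTA, GTF, output directory) are taken from the wrapper code.

```python
import shlex


def build_pyroe_command(fasta, gtf, outdir, spliced="", unspliced="", extra=""):
    """Return the pyroe argument list, mirroring the wrapper's ordering."""
    cmd = ["pyroe", "make-spliced+unspliced"]
    cmd += shlex.split(extra)  # free-form extra parameters come first
    if spliced:
        cmd += ["--extra-spliced", spliced]
    if unspliced:
        cmd += ["--extra-unspliced", unspliced]
    cmd += [fasta, gtf, outdir]  # positional arguments, in the wrapper's order
    return cmd


cmd = build_pyroe_command(
    "genome.fasta", "annotation.gtf", "out", spliced="extra_spliced.fasta"
)
print(shlex.join(cmd))
```

When neither optional input is given, the list contains only the subcommand and the three positional arguments, which matches the wrapper leaving `{spliced}` and `{unspliced}` empty.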