GTFTOGENEPRED
Convert a GTF file to genePred format (see https://genome.ucsc.edu/FAQ/FAQformat.html#format9)
URL: https://hgdownload.cse.ucsc.edu/admin/exe/
Example
This wrapper can be used in the following way:
rule gtfToGenePred:
    input:
        # annotations containing gene, transcript, exon, etc. data in GTF format
        "annotation.gtf",
    output:
        "annotation.genePred",
    log:
        "logs/gtfToGenePred.log",
    params:
        extra="-genePredExt",  # optional parameters to pass to gtfToGenePred
    wrapper:
        "v4.6.0/bio/ucsc/gtfToGenePred"

rule gtfToGenePred_CollectRnaSeqMetrics:
    input:
        # annotations containing gene, transcript, exon, etc. data in GTF format
        "annotation.gtf",
    output:
        "annotation.PicardCollectRnaSeqMetrics.genePred",
    log:
        "logs/gtfToGenePred.PicardCollectRnaSeqMetrics.log",
    params:
        convert_out="PicardCollectRnaSeqMetrics",
        extra="-genePredExt -geneNameAsName2",  # optional parameters to pass to gtfToGenePred
    wrapper:
        "v4.6.0/bio/ucsc/gtfToGenePred"
Note that input, output and log file paths can be chosen freely.
When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The extra param passes additional command-line arguments to gtfToGenePred (for example -genePredExt).
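The convert_out param (default: raw, which leaves the output unchanged) applies an optional conversion to the genePred output. Setting it to PicardCollectRnaSeqMetrics moves column 12 (the gene name) to the front and keeps columns 1-10, producing the refFlat layout expected by Picard CollectRnaSeqMetrics; this mode also requires extra to be set to -genePredExt -geneNameAsName2. Any other value raises an error.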
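The column reordering performed by the csvcut/csvformat pipe can be illustrated in plain Python. This is a sketch for understanding, not the wrapper's implementation: the function name genepred_ext_to_refflat is hypothetical, and it assumes a tab-separated genePredExt table (as written with -genePredExt) in which column 12 holds name2, the gene name when -geneNameAsName2 is used.

```python
def genepred_ext_to_refflat(text: str) -> str:
    """Reorder genePredExt rows into refFlat column order (hypothetical helper).

    Mirrors the column selection "12,1-10": column 12 (name2) first,
    followed by columns 1-10 (name ... exonEnds). Column numbers are 1-based.
    """
    out_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 12:
            # A plain genePred (without -genePredExt) has no name2 column.
            raise ValueError(f"expected >= 12 columns (genePredExt), got {len(fields)}")
        # fields[11] is column 12; fields[0:10] are columns 1-10.
        out_lines.append("\t".join([fields[11]] + fields[0:10]))
    return "\n".join(out_lines) + ("\n" if out_lines else "")


row = "tx1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\t0\tGENE1\tcmpl\tcmpl\t0,1,"
print(genepred_ext_to_refflat(row), end="")
```

The result has 11 columns (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), which is why the Picard mode depends on -genePredExt being set.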
Software dependencies
ucsc-gtftogenepred=469
csvkit=2.0.1
Input/Output
Input:
GTF file
Output:
genePred table
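A genePred table has one tab-separated transcript per line. The sketch below parses a single row into a record, assuming the column order from the UCSC format page linked above: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, followed in genePredExt by score, name2 and further columns. GenePredRecord and parse_genepred_line are hypothetical names, not part of the wrapper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePredRecord:
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]
    name2: str = ""  # only present in genePredExt output (-genePredExt)


def _int_list(field: str) -> List[int]:
    # exonStarts/exonEnds are comma-separated with a trailing comma, e.g. "100,300,"
    return [int(x) for x in field.split(",") if x]


def parse_genepred_line(line: str) -> GenePredRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) < 10:
        raise ValueError(f"genePred needs at least 10 columns, got {len(f)}")
    starts, ends = _int_list(f[8]), _int_list(f[9])
    # Sanity check: exonCount must agree with both exon coordinate lists.
    if not (int(f[7]) == len(starts) == len(ends)):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return GenePredRecord(
        name=f[0], chrom=f[1], strand=f[2],
        tx_start=int(f[3]), tx_end=int(f[4]),
        cds_start=int(f[5]), cds_end=int(f[6]),
        exon_starts=starts, exon_ends=ends,
        name2=f[11] if len(f) > 11 else "",
    )


rec = parse_genepred_line("tx2\tchr2\t-\t10\t90\t10\t10\t1\t10,\t90,")
print(rec.name, rec.exon_starts, rec.exon_ends)
```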
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"
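from snakemake.shell import shell

# Optional command-line arguments passed through to gtfToGenePred.
extra = snakemake.params.get("extra", "")
# Post-processing mode for the genePred output; "raw" leaves it unchanged.
convert_out = snakemake.params.get("convert_out", "raw")
# Only stderr goes to the log; stdout is the table written to the output file.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

if convert_out == "raw":
    pipes = ""
elif convert_out == "PicardCollectRnaSeqMetrics":
    # Put column 12 (name2, the gene name with -geneNameAsName2) first,
    # keep columns 1-10, and emit tab-separated refFlat for Picard.
    pipes = " | csvcut -t -c 12,1-10 | csvformat -T"
else:
    raise ValueError(
        f"Unsupported conversion mode {convert_out}. Please check wrapper documentation."
    )

# gtfToGenePred writes to /dev/stdout so the optional pipes can post-process it.
shell(
    "(gtfToGenePred {extra} {snakemake.input} /dev/stdout {pipes} > {snakemake.output}) {log}"
)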
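To inspect the shell command the wrapper assembles for each convert_out value without running Snakemake, its dispatch can be mirrored in a standalone function. This is a hypothetical helper (build_command is not part of the wrapper); the log argument stands in for the redirection string that snakemake.log_fmt_shell returns, and the paths are placeholders.

```python
# Supported conversion modes and the pipe each one appends (as in the wrapper).
PIPES = {
    "raw": "",
    "PicardCollectRnaSeqMetrics": " | csvcut -t -c 12,1-10 | csvformat -T",
}


def build_command(input_path: str, output_path: str, extra: str = "",
                  convert_out: str = "raw", log: str = "") -> str:
    """Return the shell command string for the given settings (hypothetical)."""
    try:
        pipes = PIPES[convert_out]
    except KeyError:
        raise ValueError(
            f"Unsupported conversion mode {convert_out}. Please check wrapper documentation."
        ) from None
    return (
        f"(gtfToGenePred {extra} {input_path} /dev/stdout {pipes} > {output_path}) {log}"
    )


print(build_command("annotation.gtf", "annotation.genePred",
                    extra="-genePredExt -geneNameAsName2",
                    convert_out="PicardCollectRnaSeqMetrics"))
```

Keeping the modes in one dictionary puts the supported values in a single place, and an unknown mode fails with the same ValueError message as the wrapper.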