GTFTOGENEPRED
Convert a GTF file to genePred format (see https://genome.ucsc.edu/FAQ/FAQformat.html#format9)
URL: https://hgdownload.cse.ucsc.edu/admin/exe/
Example
This wrapper can be used in the following way:
rule gtfToGenePred:
    input:
        # annotations containing gene, transcript, exon, etc. data in GTF format
        "annotation.gtf",
    output:
        "annotation.genePred",
    log:
        "logs/gtfToGenePred.log",
    params:
        extra="-genePredExt",  # optional parameters to pass to gtfToGenePred
    wrapper:
        "v3.0.2-2-g0dea6a1/bio/ucsc/gtfToGenePred"
rule gtfToGenePred_CollectRnaSeqMetrics:
    input:
        # annotations containing gene, transcript, exon, etc. data in GTF format
        "annotation.gtf",
    output:
        "annotation.PicardCollectRnaSeqMetrics.genePred",
    log:
        "logs/gtfToGenePred.PicardCollectRnaSeqMetrics.log",
    params:
        convert_out="PicardCollectRnaSeqMetrics",
        extra="-genePredExt -geneNameAsName2",  # optional parameters to pass to gtfToGenePred
    wrapper:
        "v3.0.2-2-g0dea6a1/bio/ucsc/gtfToGenePred"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
The extra param passes additional command-line arguments to gtfToGenePred.
The convert_out param applies a conversion to the output. It defaults to raw, which leaves the genePred output unchanged. Setting it to PicardCollectRnaSeqMetrics reorders the columns into a refFlat-style table compatible with Picard CollectRnaSeqMetrics (this mode also requires extra to be set to -genePredExt -geneNameAsName2).
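The column reordering applied in PicardCollectRnaSeqMetrics mode can be illustrated with a small standalone sketch. It mirrors the column selection the wrapper delegates to csvcut (column 12 first, then columns 1-10) in plain Python. The function name genepredext_to_refflat and the sample row are hypothetical and only serve as an illustration; the wrapper itself does not contain this function.

```python
def genepredext_to_refflat(line: str) -> str:
    """Reorder one tab-separated genePredExt row: column 12, then columns 1-10."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    # Column 12 (1-based) holds name2, which -geneNameAsName2 fills with the gene name.
    # Columns 1-10 are name through exonEnds.
    return "\t".join([fields[11]] + fields[:10])


# Hypothetical genePredExt row with made-up values, for illustration only.
row = "\t".join([
    "tx1", "chr1", "+", "100", "900", "150", "850", "2",
    "100,500,", "300,900,", "0", "GENE1", "cmpl", "cmpl", "0,1,",
])
converted = genepredext_to_refflat(row)
print(converted)
```

The result has 11 columns with the gene name in front, which is why -genePredExt is needed: without it, the output has no column 12 to move.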
Software dependencies
ucsc-gtftogenepred=447
csvkit=1.3.0
Input/Output
Input:
GTF file
Output:
genePred table
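As a rough aid for inspecting the output, the following sketch parses one row of a basic genePred table into named fields. It assumes the standard ten-column layout described in the UCSC format FAQ (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). The names GenePredRecord and parse_genepred_line are hypothetical helpers, and the sample row uses made-up values.

```python
from typing import List, NamedTuple, Tuple


class GenePredRecord(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]


def parse_genepred_line(line: str) -> GenePredRecord:
    """Parse the first ten tab-separated columns of a genePred row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")
    exon_count = int(fields[7])
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return GenePredRecord(
        name=fields[0],
        chrom=fields[1],
        strand=fields[2],
        tx_start=int(fields[3]),
        tx_end=int(fields[4]),
        cds_start=int(fields[5]),
        cds_end=int(fields[6]),
        exons=list(zip(starts, ends)),
    )


# Hypothetical row with made-up values, for illustration only.
record = parse_genepred_line("tx1\tchr1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,")
print(record.name, record.exons)
```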
Code
__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"

import os

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
convert_out = snakemake.params.get("convert_out", "raw")
# Send stderr (but not stdout) to the log file.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Optional post-processing appended to the gtfToGenePred command.
pipes = ""
if convert_out == "raw":
    pipes = ""
elif convert_out == "PicardCollectRnaSeqMetrics":
    # Select column 12 followed by columns 1-10, then write tab-separated output.
    pipes += " | csvcut -t -c 12,1-10 | csvformat -T"
else:
    raise ValueError(
        f"Unsupported conversion mode {convert_out}. Please check wrapper documentation."
    )

shell(
    "(gtfToGenePred {extra} {snakemake.input} /dev/stdout {pipes} > {snakemake.output}) {log}"
)
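To preview the shell command assembled for each convert_out mode without running Snakemake, the logic above can be mirrored in a standalone function. This is only a sketch: build_command is a hypothetical helper that is not part of the wrapper, and the log redirection is passed in as a plain string (here "2> file", which is an assumption about what snakemake.log_fmt_shell produces for stderr-only logging).

```python
def build_command(input_gtf, output, extra="", convert_out="raw", log=""):
    """Return the command string the wrapper would hand to the shell (sketch)."""
    if convert_out == "raw":
        pipes = ""
    elif convert_out == "PicardCollectRnaSeqMetrics":
        pipes = " | csvcut -t -c 12,1-10 | csvformat -T"
    else:
        raise ValueError(
            f"Unsupported conversion mode {convert_out}. Please check wrapper documentation."
        )
    return f"(gtfToGenePred {extra} {input_gtf} /dev/stdout {pipes} > {output}) {log}"


raw_cmd = build_command(
    "annotation.gtf",
    "annotation.genePred",
    extra="-genePredExt",
    log="2> logs/gtfToGenePred.log",  # assumed form of the log redirection
)
picard_cmd = build_command(
    "annotation.gtf",
    "annotation.PicardCollectRnaSeqMetrics.genePred",
    extra="-genePredExt -geneNameAsName2",
    convert_out="PicardCollectRnaSeqMetrics",
    log="2> logs/gtfToGenePred.PicardCollectRnaSeqMetrics.log",
)
print(raw_cmd)
print(picard_cmd)
```

Writing to /dev/stdout lets the same gtfToGenePred invocation serve both modes: the raw mode redirects straight to the output file, while the Picard mode pipes through csvcut and csvformat first.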