GTFTOGENEPRED

Convert a GTF file to genePred format (see https://genome.ucsc.edu/FAQ/FAQformat.html#format9)

URL: https://hgdownload.cse.ucsc.edu/admin/exe/

Example

This wrapper can be used in the following way:

rule gtfToGenePred:
    input:
        # annotations containing gene, transcript, exon, etc. data in GTF format
        "annotation.gtf",
    output:
        "annotation.genePred",
    log:
        "logs/gtfToGenePred.log",
    params:
        extra="-genePredExt",  # optional parameters to pass to gtfToGenePred
    wrapper:
        "v5.0.0/bio/ucsc/gtfToGenePred"


rule gtfToGenePred_CollectRnaSeqMetrics:
    input:
        # annotations containing gene, transcript, exon, etc. data in GTF format
        "annotation.gtf",
    output:
        "annotation.PicardCollectRnaSeqMetrics.genePred",
    log:
        "logs/gtfToGenePred.PicardCollectRnaSeqMetrics.log",
    params:
        convert_out="PicardCollectRnaSeqMetrics",
        extra="-genePredExt -geneNameAsName2",  # optional parameters to pass to gtfToGenePred
    wrapper:
        "v5.0.0/bio/ucsc/gtfToGenePred"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • The extra param passes additional command-line arguments to gtfToGenePred.

  • The convert_out param applies a conversion to the gtfToGenePred output. It defaults to raw, which leaves the output unchanged. If set to PicardCollectRnaSeqMetrics, the output is rearranged into a refFlat layout compatible with Picard CollectRnaSeqMetrics (this mode also requires extra to be set to -genePredExt -geneNameAsName2). Any other value raises an error.
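
The PicardCollectRnaSeqMetrics conversion is a column reorder: the wrapper pipes the output through `csvcut -t -c 12,1-10 | csvformat -T`, which moves column 12 (name2 in the genePredExt layout) in front of the ten core genePred columns. The sketch below reproduces that reorder in plain Python for illustration only. The function name `genepredext_to_refflat` and the sample row are hypothetical and not part of the wrapper, and the sketch does not emulate csvkit's parsing rules.

```python
import csv
import io


def genepredext_to_refflat(text: str) -> str:
    """Reorder tab-separated genePredExt rows into refFlat column order.

    Mirrors the 1-based column selection 12,1-10: name2 first,
    followed by the ten core genePred columns.
    (Hypothetical helper for illustration; not part of the wrapper.)
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) < 12:
            raise ValueError(f"expected at least 12 columns, got {len(row)}")
        # row[11] is column 12 (name2); row[:10] are columns 1-10
        writer.writerow([row[11]] + row[:10])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up genePredExt record with 15 columns
    sample = (
        "ENST0001\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"
        "\t0\tGENE1\tcmpl\tcmpl\t0,1,\n"
    )
    print(genepredext_to_refflat(sample), end="")
```

Without -genePredExt the output has only ten columns, so column 12 does not exist; this is why the conversion requires the extended format.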

Software dependencies

  • ucsc-gtftogenepred=469

  • csvkit=2.0.1

Input/Output

Input:

  • GTF file

Output:

  • genePred table

Authors

  • Brett Copeland

  • Filipe G. Vieira

Code

__author__ = "Brett Copeland"
__copyright__ = "Copyright 2021, Brett Copeland"
__email__ = "brcopeland@ucsd.edu"
__license__ = "MIT"

from snakemake.shell import shell


extra = snakemake.params.get("extra", "")
convert_out = snakemake.params.get("convert_out", "raw")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)


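# Optional post-processing appended after gtfToGenePred
if convert_out == "raw":
    pipes = ""
elif convert_out == "PicardCollectRnaSeqMetrics":
    # Put column 12 (name2) in front of columns 1-10 to obtain a refFlat layout
    pipes = " | csvcut -t -c 12,1-10 | csvformat -T"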
else:
    raise ValueError(
        f"Unsupported conversion mode {convert_out}. Please check wrapper documentation."
    )


shell(
    "(gtfToGenePred {extra} {snakemake.input} /dev/stdout {pipes} > {snakemake.output}) {log}"
)
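
To see what command line each convert_out mode produces, the selection logic above can be sketched as a standalone function. This is only an illustration under assumptions: `CONVERSIONS` and `build_command` are hypothetical names, not part of the wrapper, and the log redirection added by `snakemake.log_fmt_shell` is omitted.

```python
# Hypothetical lookup table mirroring the wrapper's two supported modes
CONVERSIONS = {
    "raw": "",
    "PicardCollectRnaSeqMetrics": "| csvcut -t -c 12,1-10 | csvformat -T",
}


def build_command(gtf: str, out: str, extra: str = "", convert_out: str = "raw") -> str:
    """Return the shell command string for the given mode (log redirection omitted)."""
    if convert_out not in CONVERSIONS:
        raise ValueError(
            f"Unsupported conversion mode {convert_out}. Please check wrapper documentation."
        )
    pipes = CONVERSIONS[convert_out]
    command = f"gtfToGenePred {extra} {gtf} /dev/stdout {pipes} > {out}"
    # Collapse repeated spaces left by empty fields
    return " ".join(command.split())


if __name__ == "__main__":
    print(build_command("annotation.gtf", "annotation.genePred", extra="-genePredExt"))
    print(
        build_command(
            "annotation.gtf",
            "annotation.PicardCollectRnaSeqMetrics.genePred",
            extra="-genePredExt -geneNameAsName2",
            convert_out="PicardCollectRnaSeqMetrics",
        )
    )
```

A dictionary lookup keeps the supported modes in one place, so adding a mode would mean adding a single entry rather than another branch.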