TRANSDECODER PREDICT

Predict the likely coding regions from the ORFs identified by TransDecoder.LongOrfs. Optionally, use results from homology searches (blastp or Pfam/hmmer hits) as additional ORF retention criteria.

Example

This wrapper can be used in the following way:

rule transdecoder_predict:
    input:
        fasta="test.fa.gz",  # required; may be gzipped
        pfam_hits="pfam_hits.txt",  # optional: retain ORFs with Pfam hits (run the Pfam search separately)
        blastp_hits="blastp_hits.txt",  # optional: retain ORFs with blastp hits (run blastp separately)
        # TransDecoder.Predict fails unless TransDecoder.LongOrfs has been run first,
        # so you may also want to list the LongOrfs result as an input:
        # longorfs="test.fa.transdecoder_dir/longest_orfs.pep"
    output:
        "test.fa.transdecoder.bed",
        "test.fa.transdecoder.cds",
        "test.fa.transdecoder.pep",
        "test.fa.transdecoder.gff3"
    log:
        "logs/transdecoder/test-predict.log"
    params:
        extra=""
    wrapper:
        "v4.6.0/bio/transdecoder/predict"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • transdecoder=5.7.1

Input/Output

Input:

  • FASTA assembly (optionally gzipped)

Output:

  • candidate coding regions (pep, cds, gff3, bed output formats)
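
The output names in the example rule follow the input name: test.fa.gz is decompressed to test.fa, and the four result files are test.fa.transdecoder.bed, .cds, .pep and .gff3. The sketch below reproduces that pattern. It assumes the outputs are always named after the basename of the uncompressed FASTA, as the example suggests; expected_outputs is a hypothetical helper, not part of the wrapper.

```python
from pathlib import Path

# Suffixes listed in the example rule's output section.
TRANSDECODER_SUFFIXES = ("bed", "cds", "pep", "gff3")


def expected_outputs(fasta: str) -> list[str]:
    """Return the output file names implied by the example rule (hypothetical helper)."""
    name = Path(fasta).name
    # The wrapper decompresses gzipped input first, so drop a trailing ".gz".
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    # Assumption: outputs are named "<fasta basename>.transdecoder.<suffix>".
    return [f"{name}.transdecoder.{suffix}" for suffix in TRANSDECODER_SUFFIXES]


print(expected_outputs("test.fa.gz"))
```

A helper like this could be used to fill in the output section of a rule when the input name varies.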

Authors

    1. Tessa Pierce

Code

"""Snakemake wrapper for Transdecoder Predict"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Optional homology-search results, passed to TransDecoder as ORF retention criteria.
addl_outputs = ""
pfam = snakemake.input.get("pfam_hits", "")
if pfam:
    addl_outputs += " --retain_pfam_hits " + pfam

blast = snakemake.input.get("blastp_hits", "")
if blast:
    addl_outputs += " --retain_blastp_hits " + blast

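input_fasta = str(snakemake.input.fasta)
# Decompress gzipped input next to the original before passing it to TransDecoder.
if input_fasta.endswith(".gz"):
    # Strip only the trailing ".gz" suffix.
    input_fa = input_fasta[: -len(".gz")]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta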

shell("TransDecoder.Predict -t {input_fa} {addl_outputs} {extra} {log}")
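
To see which command line the wrapper ends up running for a given set of inputs, the assembly logic can be sketched as a standalone function. This is an illustrative sketch only: build_predict_command is a hypothetical helper, and the shlex quoting of paths is an addition that the wrapper itself does not perform. The flags -t, --retain_pfam_hits and --retain_blastp_hits are the ones used in the wrapper code above.

```python
import shlex


def build_predict_command(fasta, pfam_hits="", blastp_hits="", extra=""):
    """Assemble a TransDecoder.Predict command line (hypothetical helper)."""
    args = ["TransDecoder.Predict", "-t", shlex.quote(fasta)]
    # Each optional hit file adds its own retention flag.
    if pfam_hits:
        args += ["--retain_pfam_hits", shlex.quote(pfam_hits)]
    if blastp_hits:
        args += ["--retain_blastp_hits", shlex.quote(blastp_hits)]
    # Extra arguments are passed through verbatim, as in the wrapper.
    if extra:
        args.append(extra)
    return " ".join(args)


print(build_predict_command("test.fa", pfam_hits="pfam_hits.txt"))
```

With both optional inputs left empty, the function returns only the base command with the -t argument, which matches the wrapper's behaviour when pfam_hits and blastp_hits are absent from the rule.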