TRANSDECODER PREDICT
Predict the likely coding regions from the ORFs identified by TransDecoder.LongOrfs. Optionally, supply homology search results (BLAST or HMMER hits) as additional ORF retention criteria.
Example
This wrapper can be used in the following way:
rule transdecoder_predict:
    input:
        fasta="test.fa.gz", # required input; optionally gzipped
        pfam_hits="pfam_hits.txt", # optionally retain ORFs with hits by inputting pfam results here (run separately)
        blastp_hits="blastp_hits.txt", # optionally retain ORFs with hits by inputting blastp results here (run separately)
        # you may also want to add your transdecoder longorfs result here - predict will fail if you haven't first run longorfs
        #longorfs="test.fa.transdecoder_dir/longest_orfs.pep"
    output:
        "test.fa.transdecoder.bed",
        "test.fa.transdecoder.cds",
        "test.fa.transdecoder.pep",
        "test.fa.transdecoder.gff3"
    log:
        "logs/transdecoder/test-predict.log"
    params:
        extra=""
    wrapper:
        "v5.2.1/bio/transdecoder/predict"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
transdecoder=5.7.1
Input/Output
Input:
fasta assembly
Output:
candidate coding regions (pep, cds, gff3, bed output formats)
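The four output formats in the example rule share one naming pattern: the FASTA file name without any ".gz" suffix, followed by ".transdecoder." and the format extension. The sketch below derives those names in Python. The helper expected_outputs is hypothetical and not part of the wrapper, and it assumes the naming pattern shown in the example rule; where the files actually land depends on the working directory of the job.

```python
from pathlib import PurePosixPath

# Output formats listed in the example rule above.
SUFFIXES = ("bed", "cds", "pep", "gff3")


def expected_outputs(fasta: str) -> list[str]:
    """Return the output file names the example rule declares for `fasta`.

    Hypothetical helper: assumes names follow
    <fasta name without .gz>.transdecoder.<suffix>.
    """
    name = PurePosixPath(fasta).name
    if name.endswith(".gz"):
        # Drop only the trailing ".gz" suffix.
        name = name[: -len(".gz")]
    return [f"{name}.transdecoder.{suffix}" for suffix in SUFFIXES]


print(expected_outputs("test.fa.gz"))
# ['test.fa.transdecoder.bed', 'test.fa.transdecoder.cds',
#  'test.fa.transdecoder.pep', 'test.fa.transdecoder.gff3']
```

A list built this way could be used to keep the output: section of a rule in sync with its input when the FASTA name changes.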
Code
"""Snakemake wrapper for Transdecoder Predict"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
# Collect optional homology-search flags; ORFs with hits are retained.
addl_outputs = ""
pfam = snakemake.input.get("pfam_hits", "")
if pfam:
    addl_outputs += " --retain_pfam_hits " + pfam
blast = snakemake.input.get("blastp_hits", "")
if blast:
    addl_outputs += " --retain_blastp_hits " + blast
input_fasta = str(snakemake.input.fasta)
# Decompress gzipped input before passing it to TransDecoder.
if input_fasta.endswith(".gz"):
    # Strip only the trailing ".gz" suffix.
    input_fa = input_fasta[: -len(".gz")]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta
shell("TransDecoder.Predict -t {input_fa} {addl_outputs} {extra} {log}")
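To see which command line a given set of inputs produces, the wrapper's logic can be mirrored in a small pure function that needs neither Snakemake nor TransDecoder. This is a minimal sketch, not the wrapper itself: the names strip_gz and build_predict_command are hypothetical, log redirection is left out, and only the flags that appear in the wrapper code (-t, --retain_pfam_hits, --retain_blastp_hits) are used.

```python
import shlex


def strip_gz(fasta: str) -> str:
    """Return the path TransDecoder would read after decompression."""
    return fasta[: -len(".gz")] if fasta.endswith(".gz") else fasta


def build_predict_command(fasta, pfam_hits="", blastp_hits="", extra=""):
    """Assemble the TransDecoder.Predict argument list (hypothetical helper)."""
    cmd = ["TransDecoder.Predict", "-t", strip_gz(fasta)]
    if pfam_hits:
        # Retain ORFs that have Pfam hits.
        cmd += ["--retain_pfam_hits", pfam_hits]
    if blastp_hits:
        # Retain ORFs that have blastp hits.
        cmd += ["--retain_blastp_hits", blastp_hits]
    # Free-form extra arguments, split the way a shell would split them.
    cmd += shlex.split(extra)
    return cmd


print(shlex.join(build_predict_command("test.fa.gz", pfam_hits="pfam_hits.txt")))
# TransDecoder.Predict -t test.fa --retain_pfam_hits pfam_hits.txt
```

Building an argument list rather than a single string avoids quoting problems when a path contains spaces; the wrapper itself formats a shell string instead.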