TRANSDECODER LONGORFS
TransDecoder.LongOrfs identifies candidate coding regions (open reading frames, ORFs) within transcript sequences that are at least 100 amino acids long. You can lower this minimum with the '-m' parameter, but the rate of false-positive ORF predictions increases drastically as the minimum length is shortened.
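To illustrate why a lower minimum length admits more candidates, here is a deliberately simplified sketch of length-filtered ORF scanning. It is not TransDecoder's algorithm: it assumes forward-strand, ATG-to-stop ORFs only, and the names `find_orfs` and `min_aa` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=100):
    """Return (start, end, length_aa) for ATG..stop ORFs on the forward strand.

    Simplified illustration only: no reverse strand, no partial ORFs.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3  # stop codon is not counted
                if length_aa >= min_aa:
                    orfs.append((start, i + 3, length_aa))
                start = None
    return orfs


# A toy transcript with one 5-residue ORF (M + 4 x A) followed by a stop codon
toy = "ATG" + "GCT" * 4 + "TAA"
kept_at_5 = find_orfs(toy, min_aa=5)
kept_at_6 = find_orfs(toy, min_aa=6)
```

In this toy case the ORF is reported when the threshold is 5 and dropped when it is 6. Short ORFs of this kind occur frequently by chance, which is why a small threshold lets many spurious candidates through.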
Example
This wrapper can be used in the following way:
rule transdecoder_longorfs:
    input:
        fasta="test.fa.gz",  # required
        gene_trans_map="test.gtm"  # optional gene-to-transcript identifier mapping file (tab-delimited, gene_id<tab>trans_id<return> )
    output:
        "test.fa.transdecoder_dir/longest_orfs.pep"
    log:
        "logs/transdecoder/test-longorfs.log"
    params:
        extra=""
    wrapper:
        "v2.6.0-35-g755343f/bio/transdecoder/longorfs"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
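The optional gene_trans_map input in the rule above is described as a tab-delimited file with one gene_id and trans_id pair per line. A minimal sketch of producing that layout follows; the identifiers and the helper name `format_gene_trans_map` are made up for illustration.

```python
def format_gene_trans_map(pairs):
    """Render (gene_id, trans_id) pairs as tab-delimited lines."""
    return "".join(f"{gene_id}\t{trans_id}\n" for gene_id, trans_id in pairs)


# Hypothetical identifiers: two transcripts of gene1, one of gene2
pairs = [
    ("gene1", "gene1_i1"),
    ("gene1", "gene1_i2"),
    ("gene2", "gene2_i1"),
]
gtm_text = format_gene_trans_map(pairs)
```

Writing `gtm_text` to a file such as test.gtm, for example with `open("test.gtm", "w").write(gtm_text)`, would give a file in the layout the rule comment describes.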
Software dependencies
transdecoder=5.7.1
Authors
- Tessa Pierce
Code
"""Snakemake wrapper for Transdecoder LongOrfs"""
__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Optional gene-to-transcript mapping file
gtm_cmd = ""
gtm = snakemake.input.get("gene_trans_map", "")
if gtm:
    gtm_cmd = " --gene_trans_map " + gtm

output_dir = path.dirname(str(snakemake.output[0]))

# transdecoder fails if output already exists. No force option available
shell("rm -rf {output_dir}")

input_fasta = str(snakemake.input.fasta)
if input_fasta.endswith(".gz"):
    # Strip only the trailing ".gz" suffix to get the uncompressed file name
    input_fa = input_fasta[: -len(".gz")]
    shell("gunzip -c {input_fasta} > {input_fa}")
else:
    input_fa = input_fasta

shell("TransDecoder.LongOrfs -t {input_fa} {gtm_cmd} {extra} {log}")
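The wrapper derives two paths from its inputs: the uncompressed FASTA name and the directory it removes before running. The sketch below shows both derivations in isolation. The helper names `strip_gz` and `output_dir_for` are hypothetical and are not part of the wrapper.

```python
from os import path


def strip_gz(fasta_path):
    """Drop exactly one trailing '.gz' suffix; return other paths unchanged."""
    if fasta_path.endswith(".gz"):
        return fasta_path[: -len(".gz")]
    return fasta_path


def output_dir_for(output_file):
    """Directory that holds the declared output file."""
    return path.dirname(output_file)


uncompressed = strip_gz("test.fa.gz")
out_dir = output_dir_for("test.fa.transdecoder_dir/longest_orfs.pep")
```

Slicing off a single suffix is the safer choice because `str.rsplit(".gz")` without a `maxsplit` argument splits on every occurrence, so a path that contains ".gz" more than once would be truncated at the first one.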