INFERNAL CMSCAN

cmscan is used to search sequences against collections of covariance models that have been prepared with cmpress. The output format is designed to be human- readable, but is often so voluminous that reading it is impractical, and parsing it is a pain. The –tblout option saves output in a simple tabular format that is concise and easier to parse. The -o option allows redirecting the main output, including throwing it away in /dev/null. Infernal (‘INFERence of RNA ALignment’) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

Software dependencies

  • infernal=1.1.2

Example

This wrapper can be used in the following way:

rule cmscan_profile:
    input:
        fasta="test-transcript.fa",
        profile="test-covariance-model.cm.i1i"
    output:
        tblout="tr-infernal-tblout.txt",
    log:
        "logs/cmscan.log"
    params:
        evalue_threshold=10, # In the per-target output, report target sequences with an E-value of <= <x>. default=10.0 (on average, ~10 false positives reported per query)
        extra= "",
        #score_threshold=50, # Instead of thresholding per-CM output on E-value, report target sequences with a bit score of >= <x>.
    threads: 4
    wrapper:
        "0.65.0/bio/infernal/cmscan"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

    1. Tessa Pierce

Code

"""Snakemake wrapper for Infernal CMscan"""

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from os import path
from snakemake.shell import shell

profile = snakemake.input.get("profile")
profile = profile.rsplit(".i", 1)[0]

assert profile.endswith(".cm"), 'your profile file should end with ".cm"'

# direct output to file <f>, not stdout
out_cmd = ""
outfile = snakemake.output.get("outfile", "")
if outfile:
    out_cmd += " -o {} ".format(outfile)

# save parseable table of hits to file <s>
tblout = snakemake.output.get("tblout", "")
if tblout:
    out_cmd += " --tblout {} ".format(tblout)

## default params: enable evalue threshold. If bitscore thresh is provided, use that instead (both not allowed)

# report <= this evalue threshold in output
evalue_threshold = snakemake.params.get("evalue_threshold", 10)  # use cmscan default
# report >= this score threshold in output
score_threshold = snakemake.params.get("score_threshold", "")

if score_threshold:
    thresh_cmd = f" -T {float(score_threshold)} "
else:
    thresh_cmd = f" -E {float(evalue_threshold)} "

extra = snakemake.params.get("extra", "")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "cmscan {out_cmd} {thresh_cmd} {extra} --cpu {snakemake.threads} {profile} {snakemake.input.fasta} {log}"
)