LASTAL

LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads)

Software dependencies

  • last=874

Example

This wrapper can be used in the following way:

rule lastal_nucl_x_nucl:
    input:
        data="test-transcript.fa",
        lastdb="test-transcript.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-transcript.maf",
        #tab="test-transcript.tab",
        #blasttab="test-transcript.blasttab",
        #blasttabplus="test-transcript.blasttabplus",
    params:
        #Report alignments that are expected by chance at most once per LENGTH query letters. By default, LAST reports alignments that are expected by chance at most once per million query letters (for a given database). http://last.cbrc.jp/doc/last-evalues.html
        D_length=1000000,
        extra=""
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "0.65.0/bio/last/lastal"

rule lastal_nucl_x_prot:
    input:
        data="test-transcript.fa",
        lastdb="test-protein.fa.prj"
    output:
        # only one of these outputs is allowed
        maf="test-tr-x-prot.maf"
        #tab="test-tr-x-prot.tab",
        #blasttab="test-tr-x-prot.blasttab",
        #blasttabplus="test-tr-x-prot.blasttabplus",
    params:
        frameshift_cost=15, #Align DNA queries to protein reference sequences using specified frameshift cost. 15 is reasonable. Special case, -F0 means DNA-versus-protein alignment without frameshifts, which is faster.)
        extra="",
    log:
        "logs/lastal/test.log"
    threads: 8
    wrapper:
        "0.65.0/bio/last/lastal"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

    1. Tessa Pierce

Code

""" Snakemake wrapper for lastal """

__author__ = "N. Tessa Pierce"
__copyright__ = "Copyright 2019, N. Tessa Pierce"
__email__ = "ntpierce@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# http://last.cbrc.jp/doc/last-evalues.html
d_len = float(snakemake.params.get("D_length", 1000000))  # last default

# set output file formats
maf_out = snakemake.output.get("maf", "")
tab_out = snakemake.output.get("tab", "")
btab_out = snakemake.output.get("blasttab", "")
btabplus_out = snakemake.output.get("blasttabplus", "")
outfiles = [maf_out, tab_out, btab_out, btabplus_out]
# TAB, MAF, BlastTab, BlastTab+ (default=MAF)
assert (
    list(map(bool, outfiles)).count(True) == 1
), "please specify ONE output file using one of: 'maf', 'tab', 'blasttab', or 'blasttabplus' keywords in the output field)"

out_cmd = ""

if maf_out:
    out_cmd = "-f {}".format("MAF")
    outF = maf_out
elif tab_out:
    out_cmd = "-f {}".format("TAB")
    outF = tab_out
if btab_out:
    out_cmd = "-f {}".format("BlastTab")
    outF = btab_out
if btabplus_out:
    out_cmd = "-f {}".format("BlastTab+")
    outF = btabplus_out

frameshift_cost = snakemake.params.get("frameshift_cost", "")
if frameshift_cost:
    f_cmd = f"-F {frameshift_cost}"


lastdb_name = str(snakemake.input["lastdb"]).rsplit(".", 1)[0]

shell(
    "lastal -D {d_len} -P {snakemake.threads} {extra} {lastdb_name} {snakemake.input.data} > {outF} {log}"
)