.. _`bio/last/lastal`: LASTAL ====== .. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/last/lastal?label=version%20update%20pull%20requests :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/last/lastal LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads) Example ------- This wrapper can be used in the following way: .. code-block:: python rule lastal_nucl_x_nucl: input: data="test-transcript.fa", lastdb="test-transcript.fa.prj" output: # only one of these outputs is allowed maf="test-transcript.maf", #tab="test-transcript.tab", #blasttab="test-transcript.blasttab", #blasttabplus="test-transcript.blasttabplus", params: #Report alignments that are expected by chance at most once per LENGTH query letters. By default, LAST reports alignments that are expected by chance at most once per million query letters (for a given database). http://last.cbrc.jp/doc/last-evalues.html D_length=1000000, extra="" log: "logs/lastal/test.log" threads: 8 wrapper: "v3.0.1/bio/last/lastal" rule lastal_nucl_x_prot: input: data="test-transcript.fa", lastdb="test-protein.fa.prj" output: # only one of these outputs is allowed maf="test-tr-x-prot.maf" #tab="test-tr-x-prot.tab", #blasttab="test-tr-x-prot.blasttab", #blasttabplus="test-tr-x-prot.blasttabplus", params: frameshift_cost=15, #Align DNA queries to protein reference sequences using specified frameshift cost. 15 is reasonable. Special case, -F0 means DNA-versus-protein alignment without frameshifts, which is faster.) extra="", log: "logs/lastal/test.log" threads: 8 wrapper: "v3.0.1/bio/last/lastal" Note that input, output and log file paths can be chosen freely. When running with .. code-block:: bash snakemake --use-conda the software dependencies will be automatically deployed into an isolated environment before execution. Software dependencies --------------------- * ``last=1513`` Authors ------- * N. Tessa Pierce Code ---- .. code-block:: python """ Snakemake wrapper for lastal """ __author__ = "N. Tessa Pierce" __copyright__ = "Copyright 2019, N. Tessa Pierce" __email__ = "ntpierce@gmail.com" __license__ = "MIT" from snakemake.shell import shell extra = snakemake.params.get("extra", "") log = snakemake.log_fmt_shell(stdout=False, stderr=True) # http://last.cbrc.jp/doc/last-evalues.html d_len = float(snakemake.params.get("D_length", 1000000)) # last default # set output file formats maf_out = snakemake.output.get("maf", "") tab_out = snakemake.output.get("tab", "") btab_out = snakemake.output.get("blasttab", "") btabplus_out = snakemake.output.get("blasttabplus", "") outfiles = [maf_out, tab_out, btab_out, btabplus_out] # TAB, MAF, BlastTab, BlastTab+ (default=MAF) assert ( list(map(bool, outfiles)).count(True) == 1 ), "please specify ONE output file using one of: 'maf', 'tab', 'blasttab', or 'blasttabplus' keywords in the output field)" out_cmd = "" if maf_out: out_cmd = "-f {}".format("MAF") outF = maf_out elif tab_out: out_cmd = "-f {}".format("TAB") outF = tab_out if btab_out: out_cmd = "-f {}".format("BlastTab") outF = btab_out if btabplus_out: out_cmd = "-f {}".format("BlastTab+") outF = btabplus_out frameshift_cost = snakemake.params.get("frameshift_cost", "") if frameshift_cost: f_cmd = f"-F {frameshift_cost}" lastdb_name = str(snakemake.input["lastdb"]).rsplit(".", 1)[0] shell( "lastal -D {d_len} -P {snakemake.threads} {extra} {lastdb_name} {snakemake.input.data} > {outF} {log}" ) .. |nl| raw:: html