MLST¶

Scan contig files against traditional PubMLST typing schemes

URL: https://github.com/tseemann/mlst

Example¶

This wrapper can be used in the following way:

rule run_mlst:
    input:
        # Input assembly
        assembly="{sample}.fasta",
    output:
        # Tab-delimited mlst designation
        mlst="{sample}_mlst.txt",
    params:
        # Extra parameters should be space-delimited; the mlst help text follows for reference.
        # SYNOPSIS
        #   Automatic MLST calling from assembled contigs
        # USAGE
        #   % mlst --list                                            # list known schemes
        #   % mlst [options] <contigs.{fasta,gbk,embl}[.gz]          # auto-detect scheme
        #   % mlst --scheme <scheme> <contigs.{fasta,gbk,embl}[.gz]> # force a scheme
        # GENERAL
        #   --help            This help
        #   --version         Print version and exit(default ON)
        #   --check           Just check dependencies and exit (default OFF)
        #   --quiet           Quiet - no stderr output (default OFF)
        #   --threads [N]     Number of BLAST threads (suggest GNU Parallel instead) (default '1')
        #   --debug           Verbose debug output to stderr (default OFF)
        # SCHEME
        #   --scheme [X]      Don't autodetect, force this scheme on all inputs (default '')
        #   --list            List available MLST scheme names (default OFF)
        #   --longlist        List alleles for all MLST schemes (default OFF)
        #   --exclude [X]     Ignore these schemes (comma sep. list) (default 'ecoli_2,abaumannii')
        # OUTPUT
        #   --csv             Output CSV instead of TSV (default OFF)
        #   --json [X]        Also write results to this file in JSON format (default '')
        #   --label [X]       Replace FILE with this name instead (default '')
        #   --nopath          Strip filename paths from FILE column (default OFF)
        #   --novel [X]       Save novel alleles to this FASTA file (default '')
        #   --legacy          Use old legacy output with allele header row (requires --scheme) (default OFF)
        # SCORING
        #   --minid [n.n]     DNA %identity of full allele to consider 'similar' [~] (default '95')
        #   --mincov [n.n]    DNA %cov to report partial allele at all [?] (default '10')
        #   --minscore [n.n]  Minimum score out of 100 to match a scheme (when auto --scheme) (default '50')
        # PATHS
        #   --blastdb [X]     BLAST database
        #   --datadir [X]     PubMLST data
        # HOMEPAGE
        #   https://github.com/tseemann/mlst - Torsten Seemann
        extra="--nopath",
    log:
        "logs/{sample}.mlst.log",
    threads: 1
    wrapper:
        "v1.2.0/bio/mlst"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies¶

mlst=2.19

Input/Output¶

Input:

Genomic assembly (FASTA format)

Output:

A tab-separated line containing the filename, the matching PubMLST scheme name, the ST (sequence type), and the allele IDs. Other output formats are also available (e.g. CSV via --csv, JSON via --json).
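To illustrate the default output, the following is a minimal Python sketch that splits one result line into its fields. The function name `parse_mlst_line`, the `MlstResult` type, and the sample line are hypothetical. The sketch also assumes that each allele column has the form `gene(allele)`, which is not stated on this page, so check it against your own mlst output.

```python
import re
from typing import Dict, NamedTuple, Optional

# Assumed allele column layout: gene name followed by the allele ID in parentheses.
ALLELE_RE = re.compile(r"^(?P<gene>[^()]+)\((?P<allele>.*)\)$")


class MlstResult(NamedTuple):
    filename: str
    scheme: str
    sequence_type: str
    alleles: Dict[str, Optional[str]]


def parse_mlst_line(line: str) -> MlstResult:
    """Split one tab-separated mlst result line into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least filename, scheme and ST columns")
    filename, scheme, sequence_type = fields[:3]

    alleles: Dict[str, Optional[str]] = {}
    for column in fields[3:]:
        match = ALLELE_RE.match(column)
        if match:
            # Keep the allele ID as text; it is not guaranteed to be a plain integer.
            alleles[match["gene"]] = match["allele"]
        else:
            # Column does not follow the assumed layout: keep it verbatim.
            alleles[column] = None
    return MlstResult(filename, scheme, sequence_type, alleles)


# Made-up example line, for illustration only.
example = "sample1.fasta\tsaureus\t239\tarcc(2)\taroe(3)\tglpf(1)\n"
result = parse_mlst_line(example)
print(result.scheme, result.sequence_type, result.alleles)
```

The ST is kept as a string rather than converted to an integer, so that non-numeric values pass through unchanged.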

Notes¶

The extra param passes additional command-line arguments to mlst.
For more information, see https://github.com/tseemann/mlst
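As a rough illustration of how the extra string ends up on the command line, the sketch below assembles a command in the same order the wrapper uses (mlst, extra arguments, assembly, output redirection). The helper `build_mlst_command` is hypothetical and is not part of the wrapper. Unlike the wrapper, it quotes each argument, and the scheme name `saureus` is only an illustrative value.

```python
import shlex


def build_mlst_command(assembly: str, output: str, extra: str = "") -> str:
    """Return a shell command string resembling the one the wrapper runs."""
    parts = ["mlst"]
    # Space-delimited extra arguments are split into individual tokens.
    parts.extend(shlex.split(extra))
    parts.append(assembly)
    quoted = " ".join(shlex.quote(part) for part in parts)
    # mlst writes its result to stdout, so redirect it to the output file.
    return f"{quoted} > {shlex.quote(output)}"


print(build_mlst_command("sampleA.fasta", "sampleA_mlst.txt", "--nopath --scheme saureus"))
# prints: mlst --nopath --scheme saureus sampleA.fasta > sampleA_mlst.txt
```

Previewing the command this way can help check that several flags combined in extra are separated by spaces as intended.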

Authors¶

Torsten Seemann (mlst tool) - https://github.com/tseemann/mlst
Max Cummins (Snakemake wrapper [unaffiliated with Torsten Seemann])

Code¶

__author__ = "Max Cummins"
__copyright__ = "Copyright 2021, Max Cummins"
__email__ = "max.l.cummins@gmail.com"
__license__ = "MIT"

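from snakemake.shell import shell

# Send stderr to the rule's log file; stdout carries the mlst result
# and is redirected to the output file below.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Optional extra command-line arguments; empty if the rule defines none.
extra = snakemake.params.get("extra", "")

shell(
    "mlst"
    " {extra}"
    " {snakemake.input.assembly}"
    " > {snakemake.output.mlst}"
    " {log}"
)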