MLST

Scan contig files against traditional PubMLST typing schemes

URL:

Example

This wrapper can be used in the following way:

rule run_mlst:
    input:
        #Input assembly
        assembly="{sample}.fasta",
    output:
        #Tab delimited mlst designation
        mlst="{sample}_mlst.txt",
    params:
    #extra parameters should be space delimited
        # SYNOPSIS
        #   Automatic MLST calling from assembled contigs
        # USAGE
        #   % mlst --list                                            # list known schemes
        #   % mlst [options] <contigs.{fasta,gbk,embl}[.gz]          # auto-detect scheme
        #   % mlst --scheme <scheme> <contigs.{fasta,gbk,embl}[.gz]> # force a scheme
        # GENERAL
        #   --help            This help
        #   --version         Print version and exit(default ON)
        #   --check           Just check dependencies and exit (default OFF)
        #   --quiet           Quiet - no stderr output (default OFF)
        #   --threads [N]     Number of BLAST threads (suggest GNU Parallel instead) (default '1')
        #   --debug           Verbose debug output to stderr (default OFF)
        # SCHEME
        #   --scheme [X]      Don't autodetect, force this scheme on all inputs (default '')
        #   --list            List available MLST scheme names (default OFF)
        #   --longlist        List allelles for all MLST schemes (default OFF)
        #   --exclude [X]     Ignore these schemes (comma sep. list) (default 'ecoli_2,abaumannii')
        # OUTPUT
        #   --csv             Output CSV instead of TSV (default OFF)
        #   --json [X]        Also write results to this file in JSON format (default '')
        #   --label [X]       Replace FILE with this name instead (default '')
        #   --nopath          Strip filename paths from FILE column (default OFF)
        #   --novel [X]       Save novel alleles to this FASTA file (default '')
        #   --legacy          Use old legacy output with allele header row (requires --scheme) (default OFF)
        # SCORING
        #   --minid [n.n]     DNA %identity of full allelle to consider 'similar' [~] (default '95')
        #   --mincov [n.n]    DNA %cov to report partial allele at all [?] (default '10')
        #   --minscore [n.n]  Minumum score out of 100 to match a scheme (when auto --scheme) (default '50')
        # PATHS
        #   --blastdb [X]     BLAST database
        #   --datadir [X]     PubMLST data
        # HOMEPAGE
        #   https://github.com/tseemann/mlst - Torsten Seemann
        extra="--nopath",
    log:
        "logs/{sample}.mlst.log",
    threads: 1
    wrapper:
        "v1.2.0/bio/mlst"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • mlst=2.19

Input/Output

Input:

  • Genomic assembly (fasta format)

Output:

  • Returns a tab-separated line containing the filename, matching PubMLST scheme name, ST (sequence type) and the allele IDs. Other output formats are also available (eg. CSV, JSON)

Notes

Authors

Code

__author__ = "Max Cummins"
__copyright__ = "Copyright 2021, Max Cummins"
__email__ = "max.l.cummins@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "mlst"
    " {snakemake.params.extra}"
    " {snakemake.input.assembly}"
    " > {snakemake.output.mlst}"
    " {log}"
)