MLST
Scan contig files against traditional PubMLST typing schemes
Example
This wrapper can be used in the following way:
rule run_mlst:
    input:
        # Input assembly
        assembly="{sample}.fasta",
    output:
        # Tab-delimited mlst designation
        mlst="{sample}_mlst.txt",
    params:
        # Extra parameters should be space-delimited
        # SYNOPSIS
        #   Automatic MLST calling from assembled contigs
        # USAGE
        #   % mlst --list                                            # list known schemes
        #   % mlst [options] <contigs.{fasta,gbk,embl}[.gz]>         # auto-detect scheme
        #   % mlst --scheme <scheme> <contigs.{fasta,gbk,embl}[.gz]> # force a scheme
        # GENERAL
        #   --help            This help
        #   --version         Print version and exit (default ON)
        #   --check           Just check dependencies and exit (default OFF)
        #   --quiet           Quiet - no stderr output (default OFF)
        #   --threads [N]     Number of BLAST threads (suggest GNU Parallel instead) (default '1')
        #   --debug           Verbose debug output to stderr (default OFF)
        # SCHEME
        #   --scheme [X]      Don't autodetect, force this scheme on all inputs (default '')
        #   --list            List available MLST scheme names (default OFF)
        #   --longlist        List alleles for all MLST schemes (default OFF)
        #   --exclude [X]     Ignore these schemes (comma sep. list) (default 'ecoli_2,abaumannii')
        # OUTPUT
        #   --csv             Output CSV instead of TSV (default OFF)
        #   --json [X]        Also write results to this file in JSON format (default '')
        #   --label [X]       Replace FILE with this name instead (default '')
        #   --nopath          Strip filename paths from FILE column (default OFF)
        #   --novel [X]       Save novel alleles to this FASTA file (default '')
        #   --legacy          Use old legacy output with allele header row (requires --scheme) (default OFF)
        # SCORING
        #   --minid [n.n]     DNA %identity of full allele to consider 'similar' [~] (default '95')
        #   --mincov [n.n]    DNA %cov to report partial allele at all [?] (default '10')
        #   --minscore [n.n]  Minimum score out of 100 to match a scheme (when auto --scheme) (default '50')
        # PATHS
        #   --blastdb [X]     BLAST database
        #   --datadir [X]     PubMLST data
        # HOMEPAGE
        #   https://github.com/tseemann/mlst - Torsten Seemann
        extra="--nopath",
    log:
        "logs/{sample}.mlst.log",
    threads: 1
    wrapper:
        "v2.6.0-35-g755343f/bio/mlst"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
- The extra param allows for additional program arguments.
- For more information, see https://github.com/tseemann/mlst
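Since the extra param is a single space-delimited string, it can help to assemble it from individual flags rather than typing it by hand. The following is a minimal sketch only: `build_extra` is a hypothetical helper, not part of the wrapper, and the scheme name `saureus` is an illustrative value. The flags themselves come from the mlst help text quoted in the example rule.

```python
import shlex


def build_extra(flags, options):
    """Join boolean flags and option/value pairs into one space-delimited string.

    Hypothetical helper for illustration; the wrapper itself only expects a string.
    """
    parts = list(flags)
    for name, value in options.items():
        parts.extend([name, str(value)])
    # shlex.join quotes any value containing spaces or shell metacharacters
    return shlex.join(parts)


extra = build_extra(["--nopath"], {"--scheme": "saureus", "--minid": 95})
print(extra)  # --nopath --scheme saureus --minid 95
```

The resulting string could then be assigned to `extra` in the rule's `params` section.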
Software dependencies
mlst=2.23.0
Input/Output
Input:
- Genomic assembly (fasta format)
Output:
- A tab-separated line containing the filename, the matching PubMLST scheme name, the ST (sequence type) and the allele IDs. Other output formats are also available (e.g. CSV, JSON).
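The default output layout described above can be read back with a few lines of Python. This is a sketch under stated assumptions: `MlstResult` and `parse_mlst_line` are hypothetical names, the example line is made up for illustration rather than real mlst output, and the ST is kept as text because it may not always be a number.

```python
from typing import List, NamedTuple


class MlstResult(NamedTuple):
    filename: str
    scheme: str
    sequence_type: str  # kept as text; not assumed to be numeric
    alleles: List[str]  # remaining columns, one entry per allele ID


def parse_mlst_line(line: str) -> MlstResult:
    """Split one tab-separated result line into filename, scheme, ST and alleles."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(
            f"expected at least 3 tab-separated fields, got {len(fields)}"
        )
    filename, scheme, sequence_type, *alleles = fields
    return MlstResult(filename, scheme, sequence_type, alleles)


# Illustrative line only; values are invented for this example
example = "sample1.fasta\tsaureus\t5\tarcC(1)\taroE(4)\n"
result = parse_mlst_line(example)
print(result.scheme, result.sequence_type, result.alleles)
```

This applies only to the default tab-separated output; if `--csv` or `--legacy` is passed through `extra`, the layout differs and the parser would need adjusting.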
Authors
- Torsten Seemann (mlst tool) - https://github.com/tseemann/mlst
- Max Cummins (Snakemake wrapper [unaffiliated with Torsten Seemann])
Code
__author__ = "Max Cummins"
__copyright__ = "Copyright 2021, Max Cummins"
__email__ = "max.l.cummins@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

# Redirect only stderr to the log file; stdout carries the mlst result
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

shell(
    "mlst"
    " {snakemake.params.extra}"
    " {snakemake.input.assembly}"
    " > {snakemake.output.mlst}"
    " {log}"
)
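
To see what the wrapper code above ends up running, the shell template can be expanded by hand. The sketch below is a standalone approximation: `build_mlst_command` is a hypothetical function, and it assumes that `log_fmt_shell(stdout=False, stderr=True)` produces a `2> <log file>` redirect. The file names follow the example rule with `sample` set to `sampleA`.

```python
def build_mlst_command(extra: str, assembly: str, output: str, log_path: str) -> str:
    """Mimic how the wrapper's shell() template expands into one command line."""
    # Assumption: stderr-only logging is rendered as "2> <log file>"
    log = f"2> {log_path}"
    # stdout (the mlst result) goes to the output file, stderr to the log
    return f"mlst {extra} {assembly} > {output} {log}"


command = build_mlst_command(
    extra="--nopath",
    assembly="sampleA.fasta",
    output="sampleA_mlst.txt",
    log_path="logs/sampleA.mlst.log",
)
print(command)
# mlst --nopath sampleA.fasta > sampleA_mlst.txt 2> logs/sampleA.mlst.log
```

Note that the template does not reference the rule's `threads` value, so changing `threads` in the rule does not by itself add `--threads` to the command; it would have to be supplied through `extra`.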