BLAST BLASTN
Blastn performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information, see the BLAST documentation.
Different output formats and format specifiers (see the tables below) can be selected via the ‘format’ parameter, as shown in the example Snakemake rule below.
Alignment view options                  Formatting output option   Format specifiers
Pairwise                                0
Query-anchored showing identities       1
Query-anchored no identities            2
Flat query-anchored showing identities  3
Flat query-anchored no identities       4
BLAST XML                               5
Tabular                                 6                          available
Tabular with comment lines              7                          available
Seqalign (Text ASN.1)                   8
Seqalign (Binary ASN.1)                 9
Comma-separated values                  10                         available
BLAST archive (ASN.1)                   11
Seqalign (JSON)                         12
Multiple-file BLAST JSON                13
Multiple-file BLAST XML2                14
Single-file BLAST JSON                  15
Single-file BLAST XML2                  16
Sequence Alignment/Map (SAM)            17
Organism Report                         18
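As a sketch of how a value for the ‘format’ parameter is put together: the string is the option number, optionally followed by specifiers, and the table lists specifiers as available only for options 6, 7 and 10. The helper build_format below is hypothetical (it is not part of the wrapper or of BLAST); the wrapper itself passes the string to -outfmt without any such check.

```python
from typing import Optional, Sequence

# Options whose row in the table lists format specifiers as "available".
OPTIONS_WITH_SPECIFIERS = {6, 7, 10}


def build_format(option: int, specifiers: Optional[Sequence[str]] = None) -> str:
    """Hypothetical helper: build a string such as "6 qseqid sseqid evalue"."""
    if not 0 <= option <= 18:
        raise ValueError(f"unknown formatting output option: {option}")
    if specifiers:
        if option not in OPTIONS_WITH_SPECIFIERS:
            raise ValueError(f"option {option} does not take format specifiers")
        return " ".join([str(option), *specifiers])
    # Without specifiers, the option number alone selects the output format.
    return str(option)


print(build_format(6, ["qseqid", "sseqid", "evalue"]))  # 6 qseqid sseqid evalue
print(build_format(5))  # 5
```

The resulting string is what goes into format= in the rule's params.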
Specifiers for formatting options 6, 7 and 10:
Format specifier  Description
qseqid            Query Seq-id
qgi               Query GI
qacc              Query accession
qaccver           Query accession.version
qlen              Query sequence length
sseqid            Subject Seq-id
sallseqid         All subject Seq-id(s), separated by a ‘;’
sgi               Subject GI
sallgi            All subject GIs
sacc              Subject accession
saccver           Subject accession.version
sallacc           All subject accessions
slen              Subject sequence length
qstart            Start of alignment in query
qend              End of alignment in query
sstart            Start of alignment in subject
send              End of alignment in subject
qseq              Aligned part of query sequence
sseq              Aligned part of subject sequence
evalue            Expect value
bitscore          Bit score
score             Raw score
length            Alignment length
pident            Percentage of identical matches
nident            Number of identical matches
mismatch          Number of mismatches
positive          Number of positive-scoring matches
gapopen           Number of gap openings
gaps              Total number of gaps
ppos              Percentage of positive-scoring matches
frames            Query and subject frames separated by a ‘/’
qframe            Query frame
sframe            Subject frame
btop              Blast traceback operations (BTOP)
staxid            Subject Taxonomy ID
ssciname          Subject Scientific Name
scomname          Subject Common Name
sblastname        Subject Blast Name
sskingdom         Subject Super Kingdom
staxids           Unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames         Unique Subject Scientific Name(s), separated by a ‘;’
scomnames         Unique Subject Common Name(s), separated by a ‘;’
sblastnames       Unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms        Unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle            Subject Title
salltitles        All Subject Title(s), separated by a ‘<>’
sstrand           Subject Strand
qcovs             Query Coverage Per Subject
qcovhsp           Query Coverage Per HSP
qcovus            Query Coverage Per Unique Subject (blastn only)
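With options 6, 7 and 10 the output columns follow the order of the specifiers, so the results can be read back using the same format string. The sketch below is a hypothetical parser (parse_tabular and its NUMERIC mapping are not part of the wrapper). It assumes that the specifiers are always given explicitly and that no value contains the delimiter (a tab for options 6 and 7, a comma for option 10); it skips the ‘#’ comment lines that option 7 adds.

```python
from typing import Dict, Iterable, List

# Hypothetical type conversions for some numeric specifiers; others stay strings.
NUMERIC = {
    "qlen": int, "slen": int, "qstart": int, "qend": int, "sstart": int,
    "send": int, "length": int, "mismatch": int, "gapopen": int,
    "evalue": float, "bitscore": float, "pident": float,
}


def parse_tabular(lines: Iterable[str], fmt: str) -> List[Dict[str, object]]:
    """Parse -outfmt 6, 7 or 10 output that was written with format string fmt."""
    option, *fields = fmt.split()
    if option not in {"6", "7", "10"}:
        raise ValueError("only options 6, 7 and 10 produce tabular output")
    if not fields:
        raise ValueError("pass the specifiers explicitly, e.g. '6 qseqid sseqid'")
    delimiter = "," if option == "10" else "\t"
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' comment lines written by option 7.
        if not line or line.startswith("#"):
            continue
        values = line.split(delimiter)
        records.append(
            {name: NUMERIC.get(name, str)(value) for name, value in zip(fields, values)}
        )
    return records


# Made-up example lines in the style of option 7 output.
example = ["# BLASTN 2.13.0+\n", "read1\tchr1\t1e-50\n", "read2\tchr2\t0.003\n"]
for record in parse_tabular(example, "7 qseqid sseqid evalue"):
    print(record)
```

In a downstream rule, the lines would come from the file named in the blastn rule's output, parsed with the same string that was passed as format.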
Example
This wrapper can be used in the following way:
rule blast_nucleotide:
    input:
        query="{sample}.fasta",
        blastdb=multiext(
            "blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto",
        ),
    output:
        "{sample}.blast.txt",
    log:
        "logs/{sample}.blast.log",
    threads: 2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra="",
    wrapper:
        "v1.21.1/bio/blast/blastn"
Note that input, output and log file paths can be chosen freely.
When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
blast=2.13.0
Input/Output
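Input:
- query: a FASTA file, a bare sequence file (more information), or identifiers (more information)
- blastdb: the files of a nucleotide BLAST database; the name passed to -db is the path of the first file without its extension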
Output:
- depending on the formatting option, different output files can be generated (see tables above)
Authors
- Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
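from os import path

from snakemake.shell import shell

# `snakemake` is injected into the wrapper's namespace by Snakemake at run time.
# Redirect only stderr of blastn to the rule's log file.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Optional params: the -outfmt string and any extra blastn arguments.
out_fmt = snakemake.params.get("format", "")
extra = snakemake.params.get("extra", "")
# Empty if no format is given, so blastn falls back to its default output.
out_format = " -outfmt '{}'".format(out_fmt) if out_fmt else ""

# blastn expects the database name without an extension, so strip it from the
# first database file, e.g. "blastdb/blastdb.ndb" -> "blastdb/blastdb".
blastdb = snakemake.input.blastdb[0]
db_name = path.splitext(blastdb)[0]

shell(
    "blastn"
    " -query {snakemake.input.query}"
    "{out_format}"
    " {extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
    " {log}"
)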
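To check which command line a rule produces without running Snakemake, the wrapper's string assembly can be mirrored in plain Python. blastn_command below is a hypothetical helper, not part of the wrapper: it derives the database name from the first blastdb file as the wrapper does, quotes the format string, and assumes stderr is sent to the log file as log_fmt_shell(stdout=False, stderr=True) requests. Like the wrapper, it leaves file paths unquoted.

```python
import os
import shlex
from typing import Sequence


def blastn_command(
    query: str,
    blastdb_files: Sequence[str],
    output: str,
    threads: int = 1,
    fmt: str = "",
    extra: str = "",
    log: str = "",
) -> str:
    """Hypothetical mirror of the blastn call assembled by the wrapper."""
    # Database name = first database file without its extension.
    db_name = os.path.splitext(blastdb_files[0])[0]
    parts = ["blastn", "-query", query]
    if fmt:
        # Quote the format string so the shell passes it as a single argument.
        parts += ["-outfmt", shlex.quote(fmt)]
    if extra:
        parts.append(extra)
    parts += ["-db", db_name, "-num_threads", str(threads), "-out", output]
    if log:
        # Redirect stderr only (assumed to match the wrapper's log setting).
        parts += ["2>", log]
    return " ".join(parts)


print(
    blastn_command(
        "A.fasta",
        ["blastdb/blastdb.ndb", "blastdb/blastdb.nhr"],
        "A.blast.txt",
        threads=2,
        fmt="6 qseqid sseqid evalue",
        log="logs/A.blast.log",
    )
)
```

For the example rule with sample A, this prints blastn -query A.fasta -outfmt '6 qseqid sseqid evalue' -db blastdb/blastdb -num_threads 2 -out A.blast.txt 2> logs/A.blast.log.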