BLAST BLASTN
Blastn performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information, see the BLAST documentation.
Different output formatting options and format specifiers (see tables below) can be selected via the ‘format’ parameter, as shown in the example Snakemake rule below.
Alignment view options                 Formatting output option   Format specifiers
Pairwise                               0
Query-anchored showing identities      1
Query-anchored no identities           2
Flat query-anchored showing identities 3
Flat query-anchored no identities      4
BLAST XML                              5
Tabular                                6                          available
Tabular with comment lines             7                          available
Seqalign (Text ASN.1)                  8
Seqalign (Binary ASN.1)                9
Comma-separated values                 10                         available
BLAST archive (ASN.1)                  11
Seqalign (JSON)                        12
Multiple-file BLAST JSON               13
Multiple-file BLAST XML2               14
Single-file BLAST JSON                 15
Single-file BLAST XML2                 16
Sequence Alignment/Map (SAM)           17
Organism Report                        18
Specifiers for formatting options 6, 7 and 10:
Format specifiers   Description
qseqid              Query Seq-id
qgi                 Query GI
qacc                Query accession
qaccver             Query accession.version
qlen                Query sequence length
sseqid              Subject Seq-id
sallseqid           All subject Seq-id(s), separated by a ‘;’
sgi                 Subject GI
sallgi              All subject GIs
sacc                Subject accession
saccver             Subject accession.version
sallacc             All subject accessions
slen                Subject sequence length
qstart              Start of alignment in query
qend                End of alignment in query
sstart              Start of alignment in subject
send                End of alignment in subject
qseq                Aligned part of query sequence
sseq                Aligned part of subject sequence
evalue              Expect value
bitscore            Bit score
score               Raw score
length              Alignment length
pident              Percentage of identical matches
nident              Number of identical matches
mismatch            Number of mismatches
positive            Number of positive-scoring matches
gapopen             Number of gap openings
gaps                Total number of gaps
ppos                Percentage of positive-scoring matches
frames              Query and subject frames separated by a ‘/’
qframe              Query frame
sframe              Subject frame
btop                Blast traceback operations (BTOP)
staxid              Subject Taxonomy ID
ssciname            Subject Scientific Name
scomname            Subject Common Name
sblastname          Subject Blast Name
sskingdom           Subject Super Kingdom
staxids             Unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames           Unique Subject Scientific Name(s), separated by a ‘;’
scomnames           Unique Subject Common Name(s), separated by a ‘;’
sblastnames         Unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms          Unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle              Subject Title
salltitles          All Subject Title(s), separated by a ‘<>’
sstrand             Subject Strand
qcovs               Query Coverage Per Subject
qcovhsp             Query Coverage Per HSP
qcovus              Query Coverage Per Unique Subject (blastn only)
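As a rough illustration of how a value for the ‘format’ parameter is put together, the sketch below joins an output option with a list of specifiers and rejects specifiers for options that do not accept them. The helper name build_format and the abbreviated KNOWN_SPECIFIERS set are assumptions made for this example; they are not part of BLAST or of the wrapper.

```python
# Hypothetical helper, not part of BLAST or the Snakemake wrapper.
# Options that accept format specifiers according to the table above.
OPTIONS_WITH_SPECIFIERS = {6, 7, 10}

# Abbreviated subset of the specifiers listed above, for illustration only.
KNOWN_SPECIFIERS = {
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
}


def build_format(option, specifiers=()):
    """Return a string suitable for the wrapper's 'format' parameter."""
    if not 0 <= option <= 18:
        raise ValueError(f"unknown formatting output option: {option}")
    if specifiers and option not in OPTIONS_WITH_SPECIFIERS:
        raise ValueError(f"option {option} does not accept format specifiers")
    unknown = [s for s in specifiers if s not in KNOWN_SPECIFIERS]
    if unknown:
        raise ValueError(f"specifiers not in the illustrative subset: {unknown}")
    # The option number comes first, followed by space-separated specifiers.
    return " ".join([str(option), *specifiers])


print(build_format(6, ["qseqid", "sseqid", "evalue"]))  # 6 qseqid sseqid evalue
print(build_format(5))  # 5
```

The order of the specifiers in the string determines the order of the columns in the result file, so it is worth keeping the list in one place and reusing it when the output is read back.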
URL: https://blast.ncbi.nlm.nih.gov/
Example
This wrapper can be used in the following way:
rule blast_nucleotide:
    input:
        query = "{sample}.fasta",
        blastdb=multiext("blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    output:
        "{sample}.blast.txt"
    log:
        "logs/{sample}.blast.log"
    threads:
        2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra=""
    wrapper:
        "v4.6.0-24-g250dd3e/bio/blast/blastn"
Note that input, output and log file paths can be chosen freely.
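With format="6 qseqid sseqid evalue" as in the rule above, each result line holds three tab-separated fields in the order given. A minimal sketch of reading such a file in Python follows. The function name read_hits and the sample identifiers and values are made up for illustration.

```python
import csv
import io

# Column order must match the specifiers passed via the 'format' parameter.
COLUMNS = ["qseqid", "sseqid", "evalue"]


def read_hits(handle, columns=COLUMNS):
    """Parse tabular BLAST output (option 6) into a list of dicts."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and option-7 style comment lines
        hit = dict(zip(columns, row))
        if "evalue" in hit:
            hit["evalue"] = float(hit["evalue"])
        hits.append(hit)
    return hits


# Made-up example data standing in for "{sample}.blast.txt".
example = io.StringIO("query1\tsubjectA\t1e-50\nquery1\tsubjectB\t0.002\n")
hits = read_hits(example)
best = min(hits, key=lambda h: h["evalue"])
print(best["sseqid"])  # subjectA
```

For a real result file, the same function can be called with a handle obtained from open(..., newline="").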
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
blast=2.16.0
Input/Output
Input:
query: FASTA file OR bare sequence file OR identifiers
blastdb: Path to blast database
Output:
Path to the result file. Depending on the formatting option, different output files can be generated (see tables above).
Params
extra: Optional parameters besides -query, -db, -num_threads and -out.
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

format = snakemake.params.get("format", "")
# Take the first database file and strip its extension to get the database name.
blastdb = snakemake.input.get("blastdb", "")[0]
db_name = path.splitext(blastdb)[0]

# Default to an empty string so the command below is still valid
# when no 'format' parameter is given.
out_format = ""
if format:
    out_format = " -outfmt '{}'".format(format)

shell(
    "blastn"
    " -query {snakemake.input.query}"
    " {out_format}"
    " {snakemake.params.extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
    " {log}"  # redirect stderr to the log file declared in the rule
)
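The wrapper derives the database name passed to -db by taking the first blastdb input file and stripping its extension. The short sketch below replays that step on the file list from the example rule. It only manipulates path strings and makes no BLAST call. The multi-volume file name at the end is a made-up example.

```python
from os import path

# File list as produced by multiext("blastdb/blastdb", ".ndb", ".nhr", ...)
# in the example rule.
blastdb_files = [
    "blastdb/blastdb" + ext
    for ext in (".ndb", ".nhr", ".nin", ".not", ".nsq", ".ntf", ".nto")
]

# Same derivation as in the wrapper: first file, extension removed.
db_name = path.splitext(blastdb_files[0])[0]
print(db_name)  # blastdb/blastdb

# Caveat: splitext removes only the last extension, so a volume file
# such as the hypothetical "nt.00.nsq" would yield "nt.00", not "nt".
print(path.splitext("nt.00.nsq")[0])  # nt.00
```

Because only the first file is inspected, all files listed under blastdb are expected to share the same prefix.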