BLAST BLASTN

Blastn performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information, see the BLAST documentation.

The output format and, for the tabular formats, the format specifiers (see the tables below) are selected via the ‘format’ parameter, as shown in the example Snakemake rule below.

Alignment view options                  Formatting output option  Format specifiers
Pairwise                                0
Query-anchored showing identities       1
Query-anchored no identities            2
Flat query-anchored showing identities  3
Flat query-anchored no identities       4
BLAST XML                               5
Tabular                                 6                         available
Tabular with comment lines              7                         available
Seqalign (Text ASN.1)                   8
Seqalign (Binary ASN.1)                 9
Comma-separated values                  10                        available
BLAST archive (ASN.1)                   11
Seqalign (JSON)                         12
Multiple-file BLAST JSON                13
Multiple-file BLAST XML2                14
Single-file BLAST JSON                  15
Single-file BLAST XML2                  16
Sequence Alignment/Map (SAM)            17
Organism Report                         18

Specifiers for formatting options 6, 7 and 10:

Format specifiers  Description
qseqid             Query Seq-id
qgi                Query GI
qacc               Query accession
qaccver            Query accession.version
qlen               Query sequence length
sseqid             Subject Seq-id
sallseqid          All subject Seq-id(s), separated by a ‘;’
sgi                Subject GI
sallgi             All subject GIs
sacc               Subject accession
saccver            Subject accession.version
sallacc            All subject accessions
slen               Subject sequence length
qstart             Start of alignment in query
qend               End of alignment in query
sstart             Start of alignment in subject
send               End of alignment in subject
qseq               Aligned part of query sequence
sseq               Aligned part of subject sequence
evalue             Expect value
bitscore           Bit score
score              Raw score
length             Alignment length
pident             Percentage of identical matches
nident             Number of identical matches
mismatch           Number of mismatches
positive           Number of positive-scoring matches
gapopen            Number of gap openings
gaps               Total number of gaps
ppos               Percentage of positive-scoring matches
frames             Query and subject frames separated by a ‘/’
qframe             Query frame
sframe             Subject frame
btop               Blast traceback operations (BTOP)
staxid             Subject Taxonomy ID
ssciname           Subject Scientific Name
scomname           Subject Common Name
sblastname         Subject Blast Name
sskingdom          Subject Super Kingdom
staxids            Unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames          Unique Subject Scientific Name(s), separated by a ‘;’
scomnames          Unique Subject Common Name(s), separated by a ‘;’
sblastnames        Unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms         Unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle             Subject Title
salltitles         All Subject Title(s), separated by a ‘<>’
sstrand            Subject Strand
qcovs              Query Coverage Per Subject
qcovhsp            Query Coverage Per HSP
qcovus             Query Coverage Per Unique Subject (blastn only)
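
Because options 6, 7 and 10 write one row per hit, with columns in the order of the specifiers, the string given as ‘format’ can also name the columns when the output is read back. The following is a minimal sketch of that idea, not part of the wrapper: the helper parse_blast_tabular and the sample rows are hypothetical, and the format string is assumed to list its specifiers explicitly.

```python
import io


def parse_blast_tabular(handle, format_string):
    """Yield one dict per hit from tabular blastn output (options 6, 7, 10)."""
    option, *columns = format_string.split()
    if option not in {"6", "7", "10"}:
        raise ValueError("specifiers only apply to options 6, 7 and 10")
    if not columns:
        raise ValueError("this sketch needs explicit specifiers")
    # Options 6 and 7 are tab-separated; option 10 is comma-separated.
    delimiter = "," if option == "10" else "\t"
    for line in handle:
        line = line.rstrip("\n")
        # Option 7 adds comment lines starting with '#'; skip them and blanks.
        if not line or line.startswith("#"):
            continue
        yield dict(zip(columns, line.split(delimiter)))


# Made-up rows in the layout produced by format="6 qseqid sseqid evalue".
sample = io.StringIO("read1\tchr1\t1e-50\nread1\tchr2\t3e-10\n")
hits = list(parse_blast_tabular(sample, "6 qseqid sseqid evalue"))
print(hits[0]["sseqid"], float(hits[0]["evalue"]))
```

All values are returned as strings, so numeric columns such as evalue need an explicit conversion. The plain split on ‘,’ is likely to be too naive for option 10 when free-text specifiers such as stitle are requested, since titles may themselves contain commas.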

Example

This wrapper can be used in the following way:

rule blast_nucleotide:
    input:
        query = "{sample}.fasta",
        blastdb=multiext("blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    output:
        "{sample}.blast.txt"
    log:
        "logs/{sample}.blast.log"
    threads:
        2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra=""
    wrapper:
        "v1.9.0/bio/blast/blastn"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • blast==2.11

Input/Output

Input:

Output:

  • depending on the formatting option, different output files can be generated (see tables above)

Authors

  • Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

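format = snakemake.params.get("format", "")
# All database files share one prefix; strip the extension of the first
# file (e.g. "blastdb/blastdb.ndb") to obtain the name passed to -db.
blastdb = snakemake.input.get("blastdb", "")[0]
db_name = path.splitext(blastdb)[0]

# Only pass -outfmt when a format was requested; otherwise use an empty
# string so that {out_format} is always defined in the command below.
out_format = " -outfmt '{}'".format(format) if format else ""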

shell(
    "blastn"
    " -query {snakemake.input.query}"
    " {out_format}"
    " {snakemake.params.extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
    " {log}"
)
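
The way the wrapper code turns the rule's inputs and parameters into a command line can be tried outside Snakemake. The sketch below mirrors that logic under stated assumptions: build_blastn_command is a hypothetical helper, the file names are taken from the example rule with the sample wildcard filled in, and the shlex quoting is an addition that the wrapper itself does not perform.

```python
import shlex
from os import path


def build_blastn_command(query, blastdb_files, output, threads=1, fmt="", extra=""):
    """Assemble a blastn command line, mirroring the wrapper's logic."""
    # "blastdb/blastdb.ndb" -> "blastdb/blastdb": -db takes the shared prefix.
    db_name = path.splitext(blastdb_files[0])[0]
    parts = ["blastn", "-query", shlex.quote(query)]
    if fmt:
        # Quoted so the option number and specifiers stay one argument.
        parts += ["-outfmt", shlex.quote(fmt)]
    if extra:
        parts.append(extra)  # passed through verbatim, like params.extra
    parts += ["-db", shlex.quote(db_name)]
    parts += ["-num_threads", str(threads)]
    parts += ["-out", shlex.quote(output)]
    return " ".join(parts)


command = build_blastn_command(
    query="sample.fasta",
    blastdb_files=["blastdb/blastdb.ndb", "blastdb/blastdb.nhr"],
    output="sample.blast.txt",
    threads=2,
    fmt="6 qseqid sseqid evalue",
)
print(command)
```

Since only the first database file is used to derive the prefix, all files listed under blastdb need to share the same base name for the resulting -db argument to be valid.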