BLAST BLASTN

blastn performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information, please see the BLAST documentation.

The output format and, for the tabular and CSV formats, the columns to report (see the tables below) are selected via the ‘format’ parameter, as shown in the example Snakemake rule below.

Alignment view options	Formatting output option	Format specifiers
Pairwise	0
Query-anchored showing identities	1
Query-anchored no identities	2
Flat query-anchored showing identities	3
Flat query-anchored no identities	4
BLAST XML	5
Tabular	6	available
Tabular with comment lines	7	available
Seqalign (Text ASN.1)	8
Seqalign (Binary ASN.1)	9
Comma-separated values	10	available
BLAST archive (ASN.1)	11
Seqalign (JSON)	12
Multiple-file BLAST JSON	13
Multiple-file BLAST XML2	14
Single-file BLAST JSON	15
Single-file BLAST XML2	16
Sequence Alignment/Map (SAM)	17
Organism Report	18

Specifiers for formatting options 6, 7 and 10:

Format specifier	Description
qseqid	Query Seq-id
qgi	Query GI
qacc	Query accession
qaccver	Query accession.version
qlen	Query sequence length
sseqid	Subject Seq-id
sallseqid	All subject Seq-id(s), separated by a ‘;’
sgi	Subject GI
sallgi	All subject GIs
sacc	Subject accession
saccver	Subject accession.version
sallacc	All subject accessions
slen	Subject sequence length
qstart	Start of alignment in query
qend	End of alignment in query
sstart	Start of alignment in subject
send	End of alignment in subject
qseq	Aligned part of query sequence
sseq	Aligned part of subject sequence
evalue	Expect value
bitscore	Bit score
score	Raw score
length	Alignment length
pident	Percentage of identical matches
nident	Number of identical matches
mismatch	Number of mismatches
positive	Number of positive-scoring matches
gapopen	Number of gap openings
gaps	Total number of gaps
ppos	Percentage of positive-scoring matches
frames	Query and subject frames separated by a ‘/’
qframe	Query frame
sframe	Subject frame
btop	Blast traceback operations (BTOP)
staxid	Subject Taxonomy ID
ssciname	Subject Scientific Name
scomname	Subject Common Name
sblastname	Subject Blast Name
sskingdom	Subject Super Kingdom
staxids	unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames	unique Subject Scientific Name(s), separated by a ‘;’
scomnames	unique Subject Common Name(s), separated by a ‘;’
sblastnames	unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms	unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle	Subject Title
salltitles	All Subject Title(s), separated by a ‘<>’
sstrand	Subject Strand
qcovs	Query Coverage Per Subject
qcovhsp	Query Coverage Per HSP
qcovus	Query Coverage Per Unique Subject (blastn only)
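
As an illustration of how the two tables combine, the sketch below assembles a value for the ‘format’ parameter. The helper build_format and the constant SPECIFIER_FORMATS are hypothetical names, not part of the wrapper; the only rule taken from the tables is that format specifiers apply to options 6, 7 and 10.

```python
# Hypothetical helper, not part of the wrapper: builds a value for the
# 'format' parameter from an output option and optional specifiers.
SPECIFIER_FORMATS = {6, 7, 10}  # options marked "available" in the first table


def build_format(option, specifiers=()):
    """Return a string such as '6 qseqid sseqid evalue'."""
    if not 0 <= option <= 18:
        raise ValueError(f"unknown output option: {option}")
    if specifiers and option not in SPECIFIER_FORMATS:
        raise ValueError(f"option {option} does not accept format specifiers")
    return " ".join([str(option), *specifiers])


print(build_format(6, ["qseqid", "sseqid", "evalue"]))  # 6 qseqid sseqid evalue
```

The resulting string can be used directly as the value of format in the params section of a rule.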

URL: https://blast.ncbi.nlm.nih.gov/

Example

This wrapper can be used in the following way:

rule blast_nucleotide:
    input:
        query = "{sample}.fasta",
        blastdb=multiext("blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    output:
        "{sample}.blast.txt"
    log:
        "logs/{sample}.blast.log"
    threads:
        2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra=""
    wrapper:
        "v3.8.0-1-g149ef14/bio/blast/blastn"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

blast=2.15.0

Input/Output

Input:

query: FASTA file OR bare sequence file OR identifiers
blastdb: Path to blast database
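
The wrapper code at the end of this page uses only the first file listed under blastdb and strips its extension to obtain the database name passed to -db. A minimal sketch of that step, using file names from the example rule (the variable blastdb_files is illustrative only):

```python
from os import path

# Index files as listed under 'blastdb' in the example rule (subset shown).
blastdb_files = ["blastdb/blastdb.ndb", "blastdb/blastdb.nhr", "blastdb/blastdb.nin"]

# Only the first entry is used; dropping its extension gives the database prefix.
db_name = path.splitext(blastdb_files[0])[0]
print(db_name)  # blastdb/blastdb
```

Because only the last extension is removed, all files listed under blastdb are expected to share the same prefix.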

Output:

Path to the result file. Depending on the formatting option, different output files can be generated (see tables above).
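
With tabular output (option 6), each hit is written as one tab-separated line whose columns follow the order of the format specifiers. The sketch below reads such output back into dictionaries, assuming the format string from the example rule. The function name read_blast_tabular and the sample hit line are made up for illustration.

```python
import csv
import io

# Column order must match the specifiers given in the 'format' parameter.
FORMAT = "6 qseqid sseqid evalue"
COLUMNS = FORMAT.split()[1:]


def read_blast_tabular(handle, columns=COLUMNS):
    """Yield one dict per hit from tab-separated BLAST output (option 6)."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if row:  # skip empty lines
            yield dict(zip(columns, row))


# Made-up example line standing in for a real result file.
sample = io.StringIO("query1\tsubject7\t1e-50\n")
hits = list(read_blast_tabular(sample))
print(hits[0]["sseqid"], float(hits[0]["evalue"]))
```

For a real run, the StringIO object would be replaced by an open file handle on the rule's output file. All values are read as strings, so numeric columns such as evalue need an explicit conversion.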

Params

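format: Formatting output option, optionally followed by format specifiers (see tables above), for example "6 qseqid sseqid evalue".
extra: Optional parameters besides -query, -db, -num_threads and -out.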
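
To make the role of the parameters concrete, the sketch below mirrors the order in which the wrapper code at the end of this page strings the command line together. The function build_blastn_command is a hypothetical stand-in that only builds the string and does not run blastn; the file names and the -evalue setting passed as extra are example values.

```python
def build_blastn_command(query, db_name, out, threads=1, fmt="", extra=""):
    """Assemble a blastn command line in the same order as the wrapper code."""
    parts = [
        "blastn",
        f"-query {query}",
        f"-outfmt '{fmt}'" if fmt else "",  # omitted when no format is given
        extra,  # passed through unchanged
        f"-db {db_name}",
        f"-num_threads {threads}",
        f"-out {out}",
    ]
    # Drop empty parts so that missing options leave no stray spaces.
    return " ".join(p for p in parts if p)


cmd = build_blastn_command(
    "a.fasta",
    "blastdb/blastdb",
    "a.blast.txt",
    threads=2,
    fmt="6 qseqid sseqid evalue",
    extra="-evalue 1e-5",
)
print(cmd)
```

Since -query, -db, -num_threads and -out are already set from the rule, they should not be repeated in extra.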

Authors

Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

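format = snakemake.params.get("format", "")
# Only the first database file is needed; stripping its extension gives the database name.
blastdb = snakemake.input.get("blastdb", "")[0]
db_name = path.splitext(blastdb)[0]

# Default to no -outfmt argument so that out_format is always defined.
out_format = ""
if format:
    out_format = " -outfmt '{}'".format(format)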

shell(
    "blastn"
    " -query {snakemake.input.query}"
    " {out_format}"
    " {snakemake.params.extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
)