BLAST BLASTN

Blastn performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information, see the BLAST documentation.

The output format and, for some formats, a list of format specifiers (see the tables below) are selected via the ‘format’ parameter, as shown in the example Snakemake rule below.

Alignment view options                  Formatting output option  Format specifiers
Pairwise                                0
Query-anchored showing identities       1
Query-anchored no identities            2
Flat query-anchored showing identities  3
Flat query-anchored no identities       4
BLAST XML                               5
Tabular                                 6                         available
Tabular with comment lines              7                         available
Seqalign (Text ASN.1)                   8
Seqalign (Binary ASN.1)                 9
Comma-separated values                  10                        available
BLAST archive (ASN.1)                   11
Seqalign (JSON)                         12
Multiple-file BLAST JSON                13
Multiple-file BLAST XML2                14
Single-file BLAST JSON                  15
Single-file BLAST XML2                  16
Sequence Alignment/Map (SAM)            17
Organism Report                         18
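
The ‘format’ parameter is a single string: the output option number, followed by space-separated specifiers when the option is 6, 7 or 10. The following is a minimal Python sketch of assembling and checking such a string. The helper name `build_format` is hypothetical and is not part of the wrapper or of BLAST.

```python
# Hypothetical helper, not part of the wrapper: builds the value for the
# rule's `format` parameter from an option number and optional specifiers.

# Output options that accept format specifiers, per the table above.
OPTIONS_WITH_SPECIFIERS = {6, 7, 10}


def build_format(option, specifiers=()):
    """Return a string such as '6 qseqid sseqid evalue'."""
    if not 0 <= option <= 18:
        raise ValueError(f"unknown output option: {option}")
    if specifiers and option not in OPTIONS_WITH_SPECIFIERS:
        raise ValueError(f"output option {option} does not take specifiers")
    return " ".join([str(option), *specifiers])


if __name__ == "__main__":
    print(build_format(6, ["qseqid", "sseqid", "evalue"]))
    print(build_format(5))
```

The returned string can be used directly as the value of `format` in the rule's `params` section.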

Specifiers for formatting options 6, 7 and 10:

Format specifier  Description
qseqid            Query Seq-id
qgi               Query GI
qacc              Query accession
qaccver           Query accession.version
qlen              Query sequence length
sseqid            Subject Seq-id
sallseqid         All subject Seq-id(s), separated by a ‘;’
sgi               Subject GI
sallgi            All subject GIs
sacc              Subject accession
saccver           Subject accession.version
sallacc           All subject accessions
slen              Subject sequence length
qstart            Start of alignment in query
qend              End of alignment in query
sstart            Start of alignment in subject
send              End of alignment in subject
qseq              Aligned part of query sequence
sseq              Aligned part of subject sequence
evalue            Expect value
bitscore          Bit score
score             Raw score
length            Alignment length
pident            Percentage of identical matches
nident            Number of identical matches
mismatch          Number of mismatches
positive          Number of positive-scoring matches
gapopen           Number of gap openings
gaps              Total number of gaps
ppos              Percentage of positive-scoring matches
frames            Query and subject frames separated by a ‘/’
qframe            Query frame
sframe            Subject frame
btop              Blast traceback operations (BTOP)
staxid            Subject Taxonomy ID
ssciname          Subject Scientific Name
scomname          Subject Common Name
sblastname        Subject Blast Name
sskingdom         Subject Super Kingdom
staxids           Unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames         Unique Subject Scientific Name(s), separated by a ‘;’
scomnames         Unique Subject Common Name(s), separated by a ‘;’
sblastnames       Unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms        Unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle            Subject Title
salltitles        All Subject Title(s), separated by a ‘<>’
sstrand           Subject Strand
qcovs             Query Coverage Per Subject
qcovhsp           Query Coverage Per HSP
qcovus            Query Coverage Per Unique Subject (blastn only)
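
With option 6 or 7, each hit is written as one tab-separated line whose columns follow the order of the specifiers given in ‘format’; option 7 additionally writes comment lines starting with ‘#’. The following is a minimal sketch of reading such output into dictionaries. The function name `parse_tabular` and the sample rows are made up for illustration, and the sketch assumes the specifiers are listed explicitly in the format string.

```python
import csv
import io


def parse_tabular(handle, format_string):
    """Yield one dict per hit from BLAST tabular output (option 6 or 7)."""
    option, *columns = format_string.split()
    if option not in ("6", "7"):
        raise ValueError("expected output option 6 or 7")
    if not columns:
        raise ValueError("this sketch needs explicit format specifiers")
    # Skip the '#' comment lines that option 7 adds.
    rows = (line for line in handle if not line.startswith("#"))
    for fields in csv.reader(rows, delimiter="\t"):
        if fields:  # ignore empty lines
            yield dict(zip(columns, fields))


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = io.StringIO("q1\ts1\t1e-50\nq1\ts2\t0.002\n")
    for hit in parse_tabular(sample, "6 qseqid sseqid evalue"):
        print(hit["qseqid"], hit["sseqid"], float(hit["evalue"]))
```

All values are returned as strings, so numeric columns such as `evalue` or `bitscore` need an explicit conversion.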

Example

This wrapper can be used in the following way:

rule blast_nucleotide:
    input:
        query = "{sample}.fasta",
        blastdb=multiext("blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    output:
        "{sample}.blast.txt"
    log:
        "logs/{sample}.blast.log"
    threads:
        2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra=""
    wrapper:
        "v1.21.1/bio/blast/blastn"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

blast=2.13.0

Input/Output

Input:

query: FASTA file OR
bare sequence file OR
identifiers
blastdb: the files of the nucleotide BLAST database to search

Output:

depending on the formatting option, different output files can be generated (see tables above)

Authors

Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

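from os import path

from snakemake.shell import shell

# Send the command's stderr to the rule's log file.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Optional parameters; both default to an empty string.
extra = snakemake.params.get("extra", "")
out_format = snakemake.params.get("format", "")
if out_format:
    out_format = "-outfmt '{}'".format(out_format)

# BLAST expects the database prefix, so strip the extension from the
# first database file (e.g. blastdb/blastdb.ndb -> blastdb/blastdb).
blastdb = snakemake.input.blastdb[0]
db_name = path.splitext(blastdb)[0]

shell(
    "blastn"
    " -query {snakemake.input.query}"
    " {out_format}"
    " {extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
    " {log}"
)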
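
The wrapper passes BLAST a database prefix rather than a single file: it takes the first file listed under `blastdb` and strips its extension. The following is a small standalone sketch of that step, using file names from the example rule. The function `expand_db_files` is a hypothetical stand-in for the list that `multiext` produces there, and `db_prefix` is likewise a name chosen for illustration.

```python
from os import path


def expand_db_files(prefix, *extensions):
    """Hypothetical stand-in for multiext: prefix plus each extension."""
    return [prefix + ext for ext in extensions]


def db_prefix(db_files):
    """Strip the extension of the first database file, as the wrapper does."""
    return path.splitext(db_files[0])[0]


if __name__ == "__main__":
    files = expand_db_files("blastdb/blastdb", ".ndb", ".nhr", ".nin")
    print(db_prefix(files))  # blastdb/blastdb
```

Note that `path.splitext` removes only the final extension, so a file name such as `db.00.nsq` would yield `db.00`.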