BLAST BLASTN

Blastn performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information, please see the BLAST documentation.

The output format and its format specifiers (see tables below) are selected via the ‘format’ parameter, as shown in the example Snakemake rule below.

Alignment view options                    Formatting output option   Format specifiers
----------------------------------------  -------------------------  -----------------
Pairwise                                  0
Query-anchored showing identities         1
Query-anchored no identities              2
Flat query-anchored showing identities    3
Flat query-anchored no identities         4
BLAST XML                                 5
Tabular                                   6                          available
Tabular with comment lines                7                          available
Seqalign (Text ASN.1)                     8
Seqalign (Binary ASN.1)                   9
Comma-separated values                    10                         available
BLAST archive (ASN.1)                     11
Seqalign (JSON)                           12
Multiple-file BLAST JSON                  13
Multiple-file BLAST XML2                  14
Single-file BLAST JSON                    15
Single-file BLAST XML2                    16
Sequence Alignment/Map (SAM)              17
Organism Report                           18

Specifiers for formatting options 6, 7 and 10:

Format specifiers   Description
------------------  ---------------------------------------------------------------------------
qseqid              Query Seq-id
qgi                 Query GI
qacc                Query accession
qaccver             Query accession.version
qlen                Query sequence length
sseqid              Subject Seq-id
sallseqid           All subject Seq-id(s), separated by a ‘;’
sgi                 Subject GI
sallgi              All subject GIs
sacc                Subject accession
saccver             Subject accession.version
sallacc             All subject accessions
slen                Subject sequence length
qstart              Start of alignment in query
qend                End of alignment in query
sstart              Start of alignment in subject
send                End of alignment in subject
qseq                Aligned part of query sequence
sseq                Aligned part of subject sequence
evalue              Expect value
bitscore            Bit score
score               Raw score
length              Alignment length
pident              Percentage of identical matches
nident              Number of identical matches
mismatch            Number of mismatches
positive            Number of positive-scoring matches
gapopen             Number of gap openings
gaps                Total number of gaps
ppos                Percentage of positive-scoring matches
frames              Query and subject frames separated by a ‘/’
qframe              Query frame
sframe              Subject frame
btop                Blast traceback operations (BTOP)
staxid              Subject Taxonomy ID
ssciname            Subject Scientific Name
scomname            Subject Common Name
sblastname          Subject Blast Name
sskingdom           Subject Super Kingdom
staxids             Unique Subject Taxonomy ID(s), separated by a ‘;’ (in numerical order)
sscinames           Unique Subject Scientific Name(s), separated by a ‘;’
scomnames           Unique Subject Common Name(s), separated by a ‘;’
sblastnames         Unique Subject Blast Name(s), separated by a ‘;’ (in alphabetical order)
sskingdoms          Unique Subject Super Kingdom(s), separated by a ‘;’ (in alphabetical order)
stitle              Subject Title
salltitles          All Subject Title(s), separated by a ‘<>’
sstrand             Subject Strand
qcovs               Query Coverage Per Subject
qcovhsp             Query Coverage Per HSP
qcovus              Query Coverage Per Unique Subject (blastn only)
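
To illustrate how the two tables combine, the sketch below assembles a value for the ‘format’ parameter from an output option and a list of specifiers. The helper name build_outfmt and the small subset of specifiers it checks are assumptions made for this example. They are not part of the wrapper, which passes the string to blastn unchanged.

```python
# Hypothetical helper, not part of the wrapper: builds a value for the
# 'format' parameter from an output option and a list of specifiers.

# Output options marked "available" for specifiers in the first table.
OPTIONS_WITH_SPECIFIERS = {6, 7, 10}

# A subset of the specifiers from the second table, for illustration only.
KNOWN_SPECIFIERS = {
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
}


def build_outfmt(option, specifiers=()):
    """Return a string such as '6 qseqid sseqid evalue'."""
    if not 0 <= option <= 18:
        raise ValueError(f"unknown output option: {option}")
    if specifiers and option not in OPTIONS_WITH_SPECIFIERS:
        raise ValueError(f"output option {option} does not take specifiers")
    unknown = [s for s in specifiers if s not in KNOWN_SPECIFIERS]
    if unknown:
        raise ValueError(f"unrecognised specifiers: {', '.join(unknown)}")
    return " ".join([str(option), *specifiers])


print(build_outfmt(6, ["qseqid", "sseqid", "evalue"]))
```

Validating the string before the workflow starts may catch a misspelled specifier earlier than a failed blastn job would.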

URL: https://blast.ncbi.nlm.nih.gov/

Example

This wrapper can be used in the following way:

rule blast_nucleotide:
    input:
        query = "{sample}.fasta",
        blastdb=multiext("blastdb/blastdb",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    output:
        "{sample}.blast.txt"
    log:
        "logs/{sample}.blast.log"
    threads:
        2
    params:
        # Usable options and specifiers for the different output formats are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
        format="6 qseqid sseqid evalue",
        extra=""
    wrapper:
        "v3.10.2-32-gf4e5b66/bio/blast/blastn"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • blast=2.15.0

Input/Output

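Input:

  • query: FASTA file with the nucleotide query sequences
  • blastdb: files of the nucleotide BLAST database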

Output:

  • Path to the result file. Depending on the formatting option, different output files can be generated (see tables above).
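
Output option 6 writes one tab-separated line per hit, with the columns in the order in which the specifiers were given, so a result file can be read back with the standard library. The following is a minimal sketch, assuming a format such as "6 qseqid sseqid evalue" from the example rule. The function name read_tabular and the sample rows are made up for illustration.

```python
import csv
import io


def read_tabular(handle, fmt):
    """Yield one dict per hit from tabular BLAST output (option 6 or 7).

    `fmt` is the same string passed as the 'format' parameter; the
    specifiers after the option number become the column names.
    """
    option, *columns = fmt.split()
    if option not in ("6", "7"):
        raise ValueError("only tab-separated output (6 or 7) is handled here")
    if not columns:
        raise ValueError("list the specifiers explicitly to name the columns")
    # Skip the '#' comment lines that option 7 adds.
    rows = (line for line in handle if not line.startswith("#"))
    for fields in csv.reader(rows, delimiter="\t"):
        if fields:
            yield dict(zip(columns, fields))


# Made-up sample content standing in for "{sample}.blast.txt".
sample = io.StringIO("read1\tchr1\t1e-50\nread2\tchr2\t3e-12\n")
hits = list(read_tabular(sample, "6 qseqid sseqid evalue"))
print(hits[0]["sseqid"], float(hits[1]["evalue"]))
```

All values are returned as strings. Numeric columns such as evalue need an explicit conversion, as in the last line.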

Params

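  • format: Output format option, optionally followed by format specifiers (see tables above); passed to blastn as -outfmt.
  • extra: Optional additional blastn parameters besides -query, -db, -num_threads and -out.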

Authors

  • Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

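from snakemake.shell import shell
from os import path

# Redirect stderr of the blastn call to the rule's log file.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Named outfmt to avoid shadowing the built-in format().
outfmt = snakemake.params.get("format", "")
extra = snakemake.params.get("extra", "")

# The first database file is enough to derive the database name,
# e.g. "blastdb/blastdb.ndb" -> "blastdb/blastdb".
blastdb = snakemake.input.get("blastdb", "")[0]
db_name = path.splitext(blastdb)[0]

# Default to an empty string so the command stays valid without a format.
out_format = " -outfmt '{}'".format(outfmt) if outfmt else ""

shell(
    "blastn"
    " -query {snakemake.input.query}"
    " {out_format}"
    " {extra}"
    " -db {db_name}"
    " -num_threads {snakemake.threads}"
    " -out {snakemake.output[0]}"
    " {log}"
)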
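
To make the database handling easier to follow outside of Snakemake, the sketch below mirrors the two steps the wrapper performs. It strips the extension from the first database file to obtain the name passed to -db, then assembles the blastn command line. The function names and the use of shlex.quote are choices made for this illustration and are not how the wrapper itself is written.

```python
import shlex
from os import path


def db_name_from_files(blastdb_files):
    """Derive the -db argument from the list of database files."""
    # "blastdb/blastdb.ndb" -> "blastdb/blastdb"
    return path.splitext(blastdb_files[0])[0]


def blastn_command(query, blastdb_files, out, threads=1, fmt="", extra=""):
    """Return the blastn command line as a single string."""
    parts = ["blastn", "-query", shlex.quote(query)]
    if fmt:
        parts += ["-outfmt", shlex.quote(fmt)]
    if extra:
        parts.append(extra)  # passed through unquoted, like params.extra
    parts += [
        "-db", shlex.quote(db_name_from_files(blastdb_files)),
        "-num_threads", str(threads),
        "-out", shlex.quote(out),
    ]
    return " ".join(parts)


print(
    blastn_command(
        "a.fasta",
        ["blastdb/blastdb.ndb", "blastdb/blastdb.nhr"],
        "a.blast.txt",
        threads=2,
        fmt="6 qseqid sseqid evalue",
    )
)
```

Because only the extension of the first file is removed, all database files are expected to share one prefix, as they do in the multiext() call of the example rule.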