.. _`bio/blast/blastn`:
BLAST BLASTN
============
.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/blast/blastn?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/blast/blastn

``Blastn`` performs a sequence similarity search of nucleotide query sequences against a nucleotide database. For more information please see `BLAST documentation `_.
Different `formatting output options `_ and `formatting specifiers `_ (see tables below) can be selected via the ``format`` parameter, as shown in the example Snakemake rule below.

+----------------------------------------+---------------+------------+
| Alignment view options                 | Formatting    | Format     |
|                                        | output option | specifiers |
+========================================+===============+============+
| Pairwise                               | 0             |            |
+----------------------------------------+---------------+------------+
| Query-anchored showing identities      | 1             |            |
+----------------------------------------+---------------+------------+
| Query-anchored no identities           | 2             |            |
+----------------------------------------+---------------+------------+
| Flat query-anchored showing identities | 3             |            |
+----------------------------------------+---------------+------------+
| Flat query-anchored no identities      | 4             |            |
+----------------------------------------+---------------+------------+
| BLAST XML                              | 5             |            |
+----------------------------------------+---------------+------------+
| Tabular                                | 6             | available  |
+----------------------------------------+---------------+------------+
| Tabular with comment lines             | 7             | available  |
+----------------------------------------+---------------+------------+
| Seqalign (Text ASN.1)                  | 8             |            |
+----------------------------------------+---------------+------------+
| Seqalign (Binary ASN.1)                | 9             |            |
+----------------------------------------+---------------+------------+
| Comma-separated values                 | 10            | available  |
+----------------------------------------+---------------+------------+
| BLAST archive (ASN.1)                  | 11            |            |
+----------------------------------------+---------------+------------+
| Seqalign (JSON)                        | 12            |            |
+----------------------------------------+---------------+------------+
| Multiple-file BLAST JSON               | 13            |            |
+----------------------------------------+---------------+------------+
| Multiple-file BLAST XML2               | 14            |            |
+----------------------------------------+---------------+------------+
| Single-file BLAST JSON                 | 15            |            |
+----------------------------------------+---------------+------------+
| Single-file BLAST XML2                 | 16            |            |
+----------------------------------------+---------------+------------+
| Sequence Alignment/Map (SAM)           | 17            |            |
+----------------------------------------+---------------+------------+
| Organism Report                        | 18            |            |
+----------------------------------------+---------------+------------+

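The table above marks only options 6, 7 and 10 as accepting format specifiers. As a minimal sketch, a ``format`` string could be sanity-checked before it is passed to the wrapper. The names ``check_format``, ``ALL_OPTIONS`` and ``OPTIONS_WITH_SPECIFIERS`` are hypothetical and not part of the wrapper, and the check encodes only what the table states, not everything BLAST itself may accept.

```python
from typing import List, Tuple

# Formatting output options listed in the table (0 to 18) and the subset
# the table marks as accepting format specifiers. Hypothetical helper names.
ALL_OPTIONS = set(range(19))
OPTIONS_WITH_SPECIFIERS = {6, 7, 10}


def check_format(fmt: str) -> Tuple[int, List[str]]:
    """Split a format string into (option, specifiers) and sanity-check it."""
    tokens = fmt.split()
    if not tokens:
        raise ValueError("format string is empty")
    try:
        option = int(tokens[0])
    except ValueError:
        raise ValueError(f"first token must be a number, got {tokens[0]!r}") from None
    if option not in ALL_OPTIONS:
        raise ValueError(f"unknown formatting output option: {option}")
    specifiers = tokens[1:]
    # Per the table, specifiers are only meaningful for options 6, 7 and 10.
    if specifiers and option not in OPTIONS_WITH_SPECIFIERS:
        raise ValueError(f"option {option} does not take format specifiers")
    return option, specifiers


option, specifiers = check_format("6 qseqid sseqid evalue")
print(option, specifiers)
```

The check does not verify that each specifier name is a known one; that could be added by comparing against the specifier table.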
Specifiers for formatting options 6, 7 and 10:

+-------------+-----------------------------------------------------------------------------+
| Format      | Description                                                                 |
| specifiers  |                                                                             |
+=============+=============================================================================+
| qseqid      | Query Seq-id                                                                |
+-------------+-----------------------------------------------------------------------------+
| qgi         | Query GI                                                                    |
+-------------+-----------------------------------------------------------------------------+
| qacc        | Query accession                                                             |
+-------------+-----------------------------------------------------------------------------+
| qaccver     | Query accession.version                                                     |
+-------------+-----------------------------------------------------------------------------+
| qlen        | Query sequence length                                                       |
+-------------+-----------------------------------------------------------------------------+
| sseqid      | Subject Seq-id                                                              |
+-------------+-----------------------------------------------------------------------------+
| sallseqid   | All subject Seq-id(s), separated by a ';'                                   |
+-------------+-----------------------------------------------------------------------------+
| sgi         | Subject GI                                                                  |
+-------------+-----------------------------------------------------------------------------+
| sallgi      | All subject GIs                                                             |
+-------------+-----------------------------------------------------------------------------+
| sacc        | Subject accession                                                           |
+-------------+-----------------------------------------------------------------------------+
| saccver     | Subject accession.version                                                   |
+-------------+-----------------------------------------------------------------------------+
| sallacc     | All subject accessions                                                      |
+-------------+-----------------------------------------------------------------------------+
| slen        | Subject sequence length                                                     |
+-------------+-----------------------------------------------------------------------------+
| qstart      | Start of alignment in query                                                 |
+-------------+-----------------------------------------------------------------------------+
| qend        | End of alignment in query                                                   |
+-------------+-----------------------------------------------------------------------------+
| sstart      | Start of alignment in subject                                               |
+-------------+-----------------------------------------------------------------------------+
| send        | End of alignment in subject                                                 |
+-------------+-----------------------------------------------------------------------------+
| qseq        | Aligned part of query sequence                                              |
+-------------+-----------------------------------------------------------------------------+
| sseq        | Aligned part of subject sequence                                            |
+-------------+-----------------------------------------------------------------------------+
| evalue      | Expect value                                                                |
+-------------+-----------------------------------------------------------------------------+
| bitscore    | Bit score                                                                   |
+-------------+-----------------------------------------------------------------------------+
| score       | Raw score                                                                   |
+-------------+-----------------------------------------------------------------------------+
| length      | Alignment length                                                            |
+-------------+-----------------------------------------------------------------------------+
| pident      | Percentage of identical matches                                             |
+-------------+-----------------------------------------------------------------------------+
| nident      | Number of identical matches                                                 |
+-------------+-----------------------------------------------------------------------------+
| mismatch    | Number of mismatches                                                        |
+-------------+-----------------------------------------------------------------------------+
| positive    | Number of positive-scoring matches                                          |
+-------------+-----------------------------------------------------------------------------+
| gapopen     | Number of gap openings                                                      |
+-------------+-----------------------------------------------------------------------------+
| gaps        | Total number of gaps                                                        |
+-------------+-----------------------------------------------------------------------------+
| ppos        | Percentage of positive-scoring matches                                      |
+-------------+-----------------------------------------------------------------------------+
| frames      | Query and subject frames separated by a '/'                                 |
+-------------+-----------------------------------------------------------------------------+
| qframe      | Query frame                                                                 |
+-------------+-----------------------------------------------------------------------------+
| sframe      | Subject frame                                                               |
+-------------+-----------------------------------------------------------------------------+
| btop        | Blast traceback operations (BTOP)                                           |
+-------------+-----------------------------------------------------------------------------+
| staxid      | Subject Taxonomy ID                                                         |
+-------------+-----------------------------------------------------------------------------+
| ssciname    | Subject Scientific Name                                                     |
+-------------+-----------------------------------------------------------------------------+
| scomname    | Subject Common Name                                                         |
+-------------+-----------------------------------------------------------------------------+
| sblastname  | Subject Blast Name                                                          |
+-------------+-----------------------------------------------------------------------------+
| sskingdom   | Subject Super Kingdom                                                       |
+-------------+-----------------------------------------------------------------------------+
| staxids     | unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)      |
+-------------+-----------------------------------------------------------------------------+
| sscinames   | unique Subject Scientific Name(s), separated by a ';'                       |
+-------------+-----------------------------------------------------------------------------+
| scomnames   | unique Subject Common Name(s), separated by a ';'                           |
+-------------+-----------------------------------------------------------------------------+
| sblastnames | unique Subject Blast Name(s), separated by a ';' (in alphabetical order)    |
+-------------+-----------------------------------------------------------------------------+
| sskingdoms  | unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order) |
+-------------+-----------------------------------------------------------------------------+
| stitle      | Subject Title                                                               |
+-------------+-----------------------------------------------------------------------------+
| salltitles  | All Subject Title(s), separated by a '<>'                                   |
+-------------+-----------------------------------------------------------------------------+
| sstrand     | Subject Strand                                                              |
+-------------+-----------------------------------------------------------------------------+
| qcovs       | Query Coverage Per Subject                                                  |
+-------------+-----------------------------------------------------------------------------+
| qcovhsp     | Query Coverage Per HSP                                                      |
+-------------+-----------------------------------------------------------------------------+
| qcovus      | Query Coverage Per Unique Subject (blastn only)                             |
+-------------+-----------------------------------------------------------------------------+

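Because the tabular options emit one tab-separated line per hit, with columns in the order of the chosen specifiers, the result file can be read back with the standard library. The following is a minimal sketch only: ``parse_tabular`` is a hypothetical helper, not part of the wrapper, and the example rows are made up for illustration.

```python
import csv
import io


def parse_tabular(handle, specifiers):
    """Yield one dict per hit from tab-separated BLAST output (options 6 and 7).

    `specifiers` must list the columns in the order used in the format
    string, e.g. ["qseqid", "sseqid", "evalue"].
    """
    # QUOTE_NONE keeps double quotes inside fields (e.g. titles) untouched.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank lines and the '#' comment lines that option 7 adds.
        if not row or row[0].startswith("#"):
            continue
        if len(row) != len(specifiers):
            raise ValueError(f"expected {len(specifiers)} columns, got {len(row)}")
        yield dict(zip(specifiers, row))


# Made-up example output for format="6 qseqid sseqid evalue".
example = "read1\tchr1\t1e-50\nread2\tchr2\t0.002\n"
hits = list(parse_tabular(io.StringIO(example), ["qseqid", "sseqid", "evalue"]))

# Values arrive as strings, so convert before comparing numerically.
best = min(hits, key=lambda hit: float(hit["evalue"]))
print(best["qseqid"], best["sseqid"])
```

For option 10 the same approach should work with ``delimiter=","``, although fields that themselves contain commas would need extra care.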
**URL**: https://blast.ncbi.nlm.nih.gov/
Example
-------
This wrapper can be used in the following way:
.. code-block:: python

    rule blast_nucleotide:
        input:
            query="{sample}.fasta",
            blastdb=multiext("blastdb/blastdb",
                ".ndb",
                ".nhr",
                ".nin",
                ".not",
                ".nsq",
                ".ntf",
                ".nto"
            )
        output:
            "{sample}.blast.txt"
        log:
            "logs/{sample}.blast.log"
        threads:
            2
        params:
            # Usable options and specifiers for the different output formats are listed here:
            # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
            format="6 qseqid sseqid evalue",
            extra=""
        wrapper:
            "v3.0.1/bio/blast/blastn"

Note that input, output and log file paths can be chosen freely.
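To make the ``blastdb`` input of the example rule concrete, the sketch below spells out the file list that the ``multiext`` call is expected to produce and how a database prefix can be recovered from it. ``blastdb_files`` and ``BLASTDB_EXTENSIONS`` are hypothetical stand-ins written in plain Python so the snippet runs without Snakemake; they are not part of the wrapper.

```python
from os import path

# Extensions listed in the example rule above.
BLASTDB_EXTENSIONS = (".ndb", ".nhr", ".nin", ".not", ".nsq", ".ntf", ".nto")


def blastdb_files(prefix, extensions=BLASTDB_EXTENSIONS):
    """Plain-Python stand-in for multiext(prefix, *extensions)."""
    return [prefix + ext for ext in extensions]


files = blastdb_files("blastdb/blastdb")
print(files[0])

# Stripping the extension of any of these files gives back the shared prefix,
# which is the database name that blastn expects for -db.
print(path.splitext(files[0])[0])
```

Changing the prefix, for example to a per-sample database directory, changes all seven paths consistently.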
When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
---------------------
* ``blast=2.15.0``
Input/Output
------------
**Input:**
* ``query``: FASTA file OR bare sequence file (`more information `_) OR identifiers (`more information `_)
* ``blastdb``: Paths to the BLAST database files (the wrapper takes the first file and strips its extension to obtain the database name)
**Output:**
* Path to the result file. Depending on the formatting option, different output files can be generated (see tables above).
Params
------
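* ``format``: Optional formatting output option, optionally followed by format specifiers (see tables above), e.g. ``"6 qseqid sseqid evalue"``; passed to ``-outfmt``.
* ``extra``: Optional parameters besides ``-query``, ``-outfmt``, ``-db``, ``-num_threads`` and ``-out``.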
Authors
-------
* Antonie Vietor
Code
----
.. code-block:: python

    __author__ = "Antonie Vietor"
    __copyright__ = "Copyright 2021, Antonie Vietor"
    __email__ = "antonie.v@gmx.de"
    __license__ = "MIT"

    from snakemake.shell import shell
    from os import path

    # Shell redirection that sends stderr of the command to the rule's log file.
    log = snakemake.log_fmt_shell(stdout=False, stderr=True)

    format = snakemake.params.get("format", "")
    # The database name is the first database file without its extension.
    blastdb = snakemake.input.get("blastdb", "")[0]
    db_name = path.splitext(blastdb)[0]

    # Default to an empty string so the command still formats without a format.
    out_format = ""
    if format:
        out_format = " -outfmt '{}'".format(format)

    shell(
        "blastn"
        " -query {snakemake.input.query}"
        " {out_format}"
        " {snakemake.params.extra}"
        " -db {db_name}"
        " -num_threads {snakemake.threads}"
        " -out {snakemake.output[0]}"
        " {log}"
    )

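To see which command line the wrapper code assembles for a given rule, its string construction can be mirrored outside Snakemake. This is a minimal sketch under stated assumptions: ``build_blastn_command`` is a hypothetical helper, it quotes arguments with ``shlex.quote`` rather than the hard-coded single quotes used above, and the file names are taken from the example rule.

```python
import shlex
from os import path


def build_blastn_command(query, blastdb_files, output, threads=1, fmt="", extra=""):
    """Return a blastn command line following the same argument order as the wrapper."""
    # As in the wrapper: database name = first database file minus its extension.
    db_name = path.splitext(blastdb_files[0])[0]
    parts = ["blastn", "-query", shlex.quote(query)]
    if fmt:
        # Quoting keeps the option number and its specifiers together as one argument.
        parts += ["-outfmt", shlex.quote(fmt)]
    if extra:
        parts.append(extra)
    parts += [
        "-db", shlex.quote(db_name),
        "-num_threads", str(threads),
        "-out", shlex.quote(output),
    ]
    return " ".join(parts)


command = build_blastn_command(
    query="sample.fasta",
    blastdb_files=["blastdb/blastdb.ndb", "blastdb/blastdb.nhr"],
    output="sample.blast.txt",
    threads=2,
    fmt="6 qseqid sseqid evalue",
)
print(command)
```

Printing the command this way can help when debugging a rule, since it shows how ``format`` and ``extra`` end up on the ``blastn`` command line.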