.. _`bio/blast/blastn`:

BLAST BLASTN
============

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/blast/blastn?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/blast/blastn

``Blastn`` performs a sequence similarity search of nucleotide query sequences against a nucleotide database.
For more information, please see the `BLAST documentation `_.
Different `formatting output options `_ and `formatting specifiers `_ (see tables below) can be selected via the ``format`` parameter, as shown in the example Snakemake rule below.

+----------------------------------------+---------------+------------+
| Alignment view options                 | Formatting    | Format     |
|                                        | output option | specifiers |
+========================================+===============+============+
| Pairwise                               | 0             |            |
+----------------------------------------+---------------+------------+
| Query-anchored showing identities      | 1             |            |
+----------------------------------------+---------------+------------+
| Query-anchored no identities           | 2             |            |
+----------------------------------------+---------------+------------+
| Flat query-anchored showing identities | 3             |            |
+----------------------------------------+---------------+------------+
| Flat query-anchored no identities      | 4             |            |
+----------------------------------------+---------------+------------+
| BLAST XML                              | 5             |            |
+----------------------------------------+---------------+------------+
| Tabular                                | 6             | available  |
+----------------------------------------+---------------+------------+
| Tabular with comment lines             | 7             | available  |
+----------------------------------------+---------------+------------+
| Seqalign (Text ASN.1)                  | 8             |            |
+----------------------------------------+---------------+------------+
| Seqalign (Binary ASN.1)                | 9             |            |
+----------------------------------------+---------------+------------+
| Comma-separated values                 | 10            | available  |
+----------------------------------------+---------------+------------+
| BLAST archive (ASN.1)                  | 11            |            |
+----------------------------------------+---------------+------------+
| Seqalign (JSON)                        | 12            |            |
+----------------------------------------+---------------+------------+
| Multiple-file BLAST JSON               | 13            |            |
+----------------------------------------+---------------+------------+
| Multiple-file BLAST XML2               | 14            |            |
+----------------------------------------+---------------+------------+
| Single-file BLAST JSON                 | 15            |            |
+----------------------------------------+---------------+------------+
| Single-file BLAST XML2                 | 16            |            |
+----------------------------------------+---------------+------------+
| Sequence Alignment/Map (SAM)           | 17            |            |
+----------------------------------------+---------------+------------+
| Organism Report                        | 18            |            |
+----------------------------------------+---------------+------------+

Specifiers for formatting options 6, 7 and 10:

+-------------+-----------------------------------------------------------------------------+
| Format      | Description                                                                 |
| specifiers  |                                                                             |
+=============+=============================================================================+
| qseqid      | Query Seq-id                                                                |
+-------------+-----------------------------------------------------------------------------+
| qgi         | Query GI                                                                    |
+-------------+-----------------------------------------------------------------------------+
| qacc        | Query accession                                                             |
+-------------+-----------------------------------------------------------------------------+
| qaccver     | Query accession.version                                                     |
+-------------+-----------------------------------------------------------------------------+
| qlen        | Query sequence length                                                       |
+-------------+-----------------------------------------------------------------------------+
| sseqid      | Subject Seq-id                                                              |
+-------------+-----------------------------------------------------------------------------+
| sallseqid   | All subject Seq-id(s), separated by a ';'                                   |
+-------------+-----------------------------------------------------------------------------+
| sgi         | Subject GI                                                                  |
+-------------+-----------------------------------------------------------------------------+
| sallgi      | All subject GIs                                                             |
+-------------+-----------------------------------------------------------------------------+
| sacc        | Subject accession                                                           |
+-------------+-----------------------------------------------------------------------------+
| saccver     | Subject accession.version                                                   |
+-------------+-----------------------------------------------------------------------------+
| sallacc     | All subject accessions                                                      |
+-------------+-----------------------------------------------------------------------------+
| slen        | Subject sequence length                                                     |
+-------------+-----------------------------------------------------------------------------+
| qstart      | Start of alignment in query                                                 |
+-------------+-----------------------------------------------------------------------------+
| qend        | End of alignment in query                                                   |
+-------------+-----------------------------------------------------------------------------+
| sstart      | Start of alignment in subject                                               |
+-------------+-----------------------------------------------------------------------------+
| send        | End of alignment in subject                                                 |
+-------------+-----------------------------------------------------------------------------+
| qseq        | Aligned part of query sequence                                              |
+-------------+-----------------------------------------------------------------------------+
| sseq        | Aligned part of subject sequence                                            |
+-------------+-----------------------------------------------------------------------------+
| evalue      | Expect value                                                                |
+-------------+-----------------------------------------------------------------------------+
| bitscore    | Bit score                                                                   |
+-------------+-----------------------------------------------------------------------------+
| score       | Raw score                                                                   |
+-------------+-----------------------------------------------------------------------------+
| length      | Alignment length                                                            |
+-------------+-----------------------------------------------------------------------------+
| pident      | Percentage of identical matches                                             |
+-------------+-----------------------------------------------------------------------------+
| nident      | Number of identical matches                                                 |
+-------------+-----------------------------------------------------------------------------+
| mismatch    | Number of mismatches                                                        |
+-------------+-----------------------------------------------------------------------------+
| positive    | Number of positive-scoring matches                                          |
+-------------+-----------------------------------------------------------------------------+
| gapopen     | Number of gap openings                                                      |
+-------------+-----------------------------------------------------------------------------+
| gaps        | Total number of gaps                                                        |
+-------------+-----------------------------------------------------------------------------+
| ppos        | Percentage of positive-scoring matches                                      |
+-------------+-----------------------------------------------------------------------------+
| frames      | Query and subject frames separated by a '/'                                 |
+-------------+-----------------------------------------------------------------------------+
| qframe      | Query frame                                                                 |
+-------------+-----------------------------------------------------------------------------+
| sframe      | Subject frame                                                               |
+-------------+-----------------------------------------------------------------------------+
| btop        | Blast traceback operations (BTOP)                                           |
+-------------+-----------------------------------------------------------------------------+
| staxid      | Subject Taxonomy ID                                                         |
+-------------+-----------------------------------------------------------------------------+
| ssciname    | Subject Scientific Name                                                     |
+-------------+-----------------------------------------------------------------------------+
| scomname    | Subject Common Name                                                         |
+-------------+-----------------------------------------------------------------------------+
| sblastname  | Subject Blast Name                                                          |
+-------------+-----------------------------------------------------------------------------+
| sskingdom   | Subject Super Kingdom                                                       |
+-------------+-----------------------------------------------------------------------------+
| staxids     | unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)      |
+-------------+-----------------------------------------------------------------------------+
| sscinames   | unique Subject Scientific Name(s), separated by a ';'                       |
+-------------+-----------------------------------------------------------------------------+
| scomnames   | unique Subject Common Name(s), separated by a ';'                           |
+-------------+-----------------------------------------------------------------------------+
| sblastnames | unique Subject Blast Name(s), separated by a ';' (in alphabetical order)    |
+-------------+-----------------------------------------------------------------------------+
| sskingdoms  | unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order) |
+-------------+-----------------------------------------------------------------------------+
| stitle      | Subject Title                                                               |
+-------------+-----------------------------------------------------------------------------+
| salltitles  | All Subject Title(s), separated by a '<>'                                   |
+-------------+-----------------------------------------------------------------------------+
| sstrand     | Subject Strand                                                              |
+-------------+-----------------------------------------------------------------------------+
| qcovs       | Query Coverage Per Subject                                                  |
+-------------+-----------------------------------------------------------------------------+
| qcovhsp     | Query Coverage Per HSP                                                      |
+-------------+-----------------------------------------------------------------------------+
| qcovus      | Query Coverage Per Unique Subject (blastn only)                             |
+-------------+-----------------------------------------------------------------------------+

**URL**: https://blast.ncbi.nlm.nih.gov/

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule blast_nucleotide:
        input:
            query = "{sample}.fasta",
            blastdb=multiext("blastdb/blastdb",
                ".ndb",
                ".nhr",
                ".nin",
                ".not",
                ".nsq",
                ".ntf",
                ".nto"
            )
        output:
            "{sample}.blast.txt"
        log:
            "logs/{sample}.blast.log"
        threads:
            2
        params:
            # Usable options and specifiers for the different output formats are listed here:
            # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/blast/blastn.html.
            format="6 qseqid sseqid evalue",
            extra=""
        wrapper:
            "v3.0.1/bio/blast/blastn"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies
---------------------

* ``blast=2.15.0``

Input/Output
------------

**Input:**

* ``query``: FASTA file OR bare sequence file (`more information `_) OR identifiers (`more information `_)
* ``blastdb``: Path to blast database

**Output:**

* Path to the result file. Depending on the formatting option, different output files can be generated (see tables above).

Params
------

* ``format``: Output format, passed to ``-outfmt`` (see tables above).
* ``extra``: Optional parameters besides ``-query``, ``-db``, ``-num_threads`` and ``-out``.

Authors
-------

* Antonie Vietor

Code
----

.. code-block:: python

    __author__ = "Antonie Vietor"
    __copyright__ = "Copyright 2021, Antonie Vietor"
    __email__ = "antonie.v@gmx.de"
    __license__ = "MIT"

    from os import path

    from snakemake.shell import shell

    # Redirect stderr of the blastn call to the rule's log file.
    log = snakemake.log_fmt_shell(stdout=False, stderr=True)

    format = snakemake.params.get("format", "")

    # All database files share one prefix; strip the extension of the first file.
    blastdb = snakemake.input.get("blastdb", "")[0]
    db_name = path.splitext(blastdb)[0]

    # Default to an empty string so the command is still valid without a format.
    out_format = ""
    if format:
        out_format = " -outfmt '{}'".format(format)

    shell(
        "blastn"
        " -query {snakemake.input.query}"
        " {out_format}"
        " {snakemake.params.extra}"
        " -db {db_name}"
        " -num_threads {snakemake.threads}"
        " -out {snakemake.output[0]}"
        " {log}"
    )
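
As a rough sketch of how the result of the example rule above might be read downstream: with ``format="6 qseqid sseqid evalue"``, each hit is written as one tab-separated line whose columns follow the order of the specifiers. The helper ``parse_blast_tabular`` and the sample rows are hypothetical and serve only as illustration; they are not part of the wrapper.

.. code-block:: python

    import csv
    import io


    def parse_blast_tabular(handle, columns):
        """Yield one dict per hit from BLAST tabular output (format 6 or 7)."""
        reader = csv.reader(handle, delimiter="\t")
        for row in reader:
            # Skip blank lines and the comment lines written by format 7.
            if not row or row[0].startswith("#"):
                continue
            yield dict(zip(columns, row))


    # Column names mirror the specifiers given in the ``format`` parameter.
    columns = "qseqid sseqid evalue".split()

    # Made-up rows standing in for the content of "{sample}.blast.txt".
    example = io.StringIO("query1\tsubjA\t1e-50\nquery1\tsubjB\t3e-10\n")

    hits = list(parse_blast_tabular(example, columns))

    # Values are read as strings, so convert the e-value before comparing.
    best = min(hits, key=lambda hit: float(hit["evalue"]))
    print(best["sseqid"])

In a real workflow, the ``io.StringIO`` object would be replaced by an open handle on the rule's output file, and ``columns`` would have to match whatever specifiers were passed via ``format``.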