BLAST MAKEBLASTDB FOR FASTA FILES

makeblastdb builds local BLAST databases from nucleotide or protein FASTA files. For more information, see the BLAST documentation.

Example

This wrapper can be used in the following way:

rule blast_makedatabase_nucleotide:
    input:
        fasta="genome/{genome}.fasta"
    output:
        multiext("results/{genome}.fasta",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    log:
        "logs/{genome}.log"
    params:
        "-input_type fasta -blastdb_version 5 -parse_seqids"
    wrapper:
        "v1.9.0/bio/blast/makeblastdb"

rule blast_makedatabase_protein:
    input:
        fasta="protein/{protein}.fasta"
    output:
        multiext("results/{protein}.fasta",
            ".pdb",
            ".phr",
            ".pin",
            ".pot",
            ".psq",
            ".ptf",
            ".pto"
        )
    log:
        "logs/{protein}.log"
    params:
        "-input_type fasta -blastdb_version 5"
    wrapper:
        "v1.9.0/bio/blast/makeblastdb"

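Note that input, output and log file paths can be chosen freely, with one constraint visible in the wrapper code below: the extension of the first output file must start with .n (nucleotide) or .p (protein), because the wrapper infers the -dbtype value from it.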

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • blast==2.11.0

Input/Output

Input:

  • FASTA file

Output:

  • database files that share a common prefix and differ only by extension (e.g. .nin, .nsq, .nhr for nucleotides or .pin, .psq, .phr for proteins)
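
The extensions used in the example rules can be collected into a small helper for checking that a database is complete. This is only a sketch: expected_db_files and missing_db_files are hypothetical helpers, not part of the wrapper, and the extension sets are copied from the example rules above (which pass -blastdb_version 5). The files makeblastdb actually writes may differ with other options or BLAST versions.

```python
from pathlib import Path

# Extension sets copied from the example rules (assumption: -blastdb_version 5).
DB_EXTENSIONS = {
    "nucl": (".ndb", ".nhr", ".nin", ".not", ".nsq", ".ntf", ".nto"),
    "prot": (".pdb", ".phr", ".pin", ".pot", ".psq", ".ptf", ".pto"),
}


def expected_db_files(prefix, db_type):
    """Return the database file paths expected for a given prefix and type."""
    if db_type not in DB_EXTENSIONS:
        raise ValueError(f"db_type must be 'nucl' or 'prot', got {db_type!r}")
    return [f"{prefix}{ext}" for ext in DB_EXTENSIONS[db_type]]


def missing_db_files(prefix, db_type):
    """Return the expected files that do not exist on disk."""
    return [f for f in expected_db_files(prefix, db_type) if not Path(f).exists()]
```

Calling missing_db_files with the same prefix as in the rule's multiext call gives a quick completeness check after a run; an empty list means every file named in the rule is present.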

Authors

  • Antonie Vietor

Code

__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log
out = snakemake.output[0]

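# Infer the database type from the extension of the first output file.
(out_name, ext) = path.splitext(out)

if ext.startswith(".n"):
    db_type = "nucl"
elif ext.startswith(".p"):
    db_type = "prot"
else:
    raise ValueError(
        f"Cannot infer database type from output extension '{ext}'; "
        "expected an extension starting with '.n' or '.p'."
    )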

shell(
    "makeblastdb"
    " -in {snakemake.input.fasta}"
    " -dbtype {db_type}"
    " {snakemake.params}"
    " -logfile {log}"
    " -out {out_name}"
)
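
The inference step in the wrapper can be tried outside Snakemake. The sketch below mirrors its logic in plain Python so the resulting command line can be inspected without running makeblastdb. The function names infer_db_type and build_command, the hg38 file names, and the default log path are hypothetical, and the shlex quoting is an addition that the wrapper itself does not perform.

```python
import shlex
from os import path


def infer_db_type(first_output):
    """Split the first output path into (database prefix, -dbtype value)."""
    out_name, ext = path.splitext(first_output)
    if ext.startswith(".n"):
        return out_name, "nucl"
    if ext.startswith(".p"):
        return out_name, "prot"
    raise ValueError(f"cannot infer database type from extension {ext!r}")


def build_command(fasta, first_output, params="", log="makeblastdb.log"):
    """Assemble the makeblastdb command line as a single string."""
    out_name, db_type = infer_db_type(first_output)
    parts = [
        "makeblastdb",
        "-in", shlex.quote(fasta),
        "-dbtype", db_type,
        params,  # passed through unchanged, as in the wrapper
        "-logfile", shlex.quote(log),
        "-out", shlex.quote(out_name),
    ]
    # Skip empty entries so an empty params string leaves no double space.
    return " ".join(p for p in parts if p)


# Hypothetical file names, for illustration only.
print(build_command("genome/hg38.fasta", "results/hg38.fasta.ndb",
                    "-parse_seqids", "logs/hg38.log"))
```

Because only the first output file is inspected, the order of extensions in multiext does not matter as long as they all belong to the same database type.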