BLAST MAKEBLASTDB FOR FASTA FILES
`makeblastdb` builds local BLAST databases from nucleotide or protein FASTA files. For more information, see the BLAST documentation.
URL: https://blast.ncbi.nlm.nih.gov/
Example
This wrapper can be used in the following way:
rule blast_makedatabase_nucleotide:
    input:
        fasta="genome/{genome}.fasta"
    output:
        multiext("results/{genome}.fasta",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    log:
        "logs/{genome}.log"
    params:
        "-input_type fasta -blastdb_version 5 -parse_seqids"
    wrapper:
        "v3.0.1-5-gc155ca9/bio/blast/makeblastdb"

rule blast_makedatabase_protein:
    input:
        fasta="protein/{protein}.fasta"
    output:
        multiext("results/{protein}.fasta",
            ".pdb",
            ".phr",
            ".pin",
            ".pot",
            ".psq",
            ".ptf",
            ".pto"
        )
    log:
        "logs/{protein}.log"
    params:
        "-input_type fasta -blastdb_version 5"
    wrapper:
        "v3.0.1-5-gc155ca9/bio/blast/makeblastdb"
Note that input, output and log file paths can be chosen freely.
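Both example rules list the same seven file suffixes, differing only in the leading `n` (nucleotide) or `p` (protein). If you write several such rules, the extension list can be generated instead of typed out. The sketch below is a hypothetical helper, `blastdb_extensions`, that is not part of the wrapper; it assumes your BLAST version and options write exactly the seven files shown in the examples, which may not hold for every `makeblastdb` release or option set, so check the files actually produced.

```python
# Hypothetical helper (not part of the wrapper): build the extension list for
# snakemake's multiext() from the suffixes used in the example rules.
# Assumption: makeblastdb writes exactly these seven files for your BLAST
# version and options; verify against a real build before relying on it.
SUFFIXES = ("db", "hr", "in", "ot", "sq", "tf", "to")


def blastdb_extensions(db_type: str) -> list[str]:
    """Return ['.ndb', '.nhr', ...] for 'nucl' or ['.pdb', '.phr', ...] for 'prot'."""
    prefixes = {"nucl": ".n", "prot": ".p"}
    if db_type not in prefixes:
        raise ValueError(f"db_type must be 'nucl' or 'prot', got {db_type!r}")
    return [prefixes[db_type] + suffix for suffix in SUFFIXES]


if __name__ == "__main__":
    print(blastdb_extensions("nucl"))
```

In a Snakefile the list can be unpacked into `multiext`, e.g. `multiext("results/{genome}.fasta", *blastdb_extensions("nucl"))`. The order is kept identical to the example rules, so the first output still carries the `.n`/`.p` extension the wrapper inspects.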
When running with `snakemake --use-conda`, the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
blast=2.15.0
Input/Output
Input:
`fasta`: path to the FASTA file
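Output:
Paths to the database files, which share a common prefix and differ only by extension (e.g. `.nin`, `.nsq`, `.nhr` for nucleotide databases or `.pin`, `.psq`, `.phr` for protein databases). The wrapper infers the database type from the extension of the first output file.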
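The wrapper derives both the database type and the `-out` prefix from the first output path alone (see the Code section). The following standalone sketch reproduces that logic so it can be checked outside Snakemake; the function name `prefix_and_dbtype` is hypothetical, and raising on an unrecognized extension is an addition made here for clarity. It shows that the nucleotide example writes a database named `results/{genome}.fasta`, which is the name to pass to BLAST search programs via `-db`.

```python
from os import path


def prefix_and_dbtype(first_output: str) -> tuple[str, str]:
    """Mirror the wrapper: split off the last extension and map it to a -dbtype."""
    out_name, ext = path.splitext(first_output)
    if ext.startswith(".n"):
        return out_name, "nucl"
    if ext.startswith(".p"):
        return out_name, "prot"
    # Hypothetical addition: fail loudly instead of passing an empty -dbtype.
    raise ValueError(f"cannot infer database type from extension {ext!r}")


print(prefix_and_dbtype("results/genome.fasta.ndb"))
# ('results/genome.fasta', 'nucl')
```

Only the last extension is stripped, so the `.fasta` part of the `multiext` prefix stays in the database name.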
Params
Optional `makeblastdb` parameters besides `-in`, `-dbtype`, `-out`, and `-logfile`, which the wrapper sets itself.
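The wrapper appends the `params` string verbatim to a fixed `makeblastdb` invocation. As a rough sketch of the resulting argument list, the hypothetical function `build_makeblastdb_args` below expands the nucleotide example; `shlex.split` is used only to show how a shell would tokenize the string, and the check for clashing options is an assumption added here, not something the wrapper does. Repeating an option the wrapper already sets would duplicate it on the command line.

```python
import shlex

# Options the wrapper sets itself; they should not also appear in params.
WRAPPER_OPTIONS = {"-in", "-dbtype", "-out", "-logfile"}


def build_makeblastdb_args(fasta, db_type, params, log, out_name):
    """Return an argv list in the same order as the wrapper's shell command."""
    extra = shlex.split(params)
    clashes = WRAPPER_OPTIONS.intersection(extra)
    if clashes:
        raise ValueError(f"params must not set {sorted(clashes)}")
    return (
        ["makeblastdb", "-in", fasta, "-dbtype", db_type]
        + extra
        + ["-logfile", log, "-out", out_name]
    )


args = build_makeblastdb_args(
    "genome/genome.fasta",
    "nucl",
    "-input_type fasta -blastdb_version 5 -parse_seqids",
    "logs/genome.log",
    "results/genome.fasta",
)
print(" ".join(args))
# makeblastdb -in genome/genome.fasta -dbtype nucl -input_type fasta -blastdb_version 5 -parse_seqids -logfile logs/genome.log -out results/genome.fasta
```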
Code
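__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from os import path

from snakemake.shell import shell

# `snakemake` is injected into this script's namespace by Snakemake at runtime.
log = snakemake.log
out = snakemake.output[0]

# The database prefix is the first output path without its last extension,
# e.g. "results/genome.fasta.ndb" -> "results/genome.fasta".
(out_name, ext) = path.splitext(out)

# Infer -dbtype from that extension: nucleotide database files use ".n*"
# extensions, protein database files use ".p*" extensions.
if ext.startswith(".n"):
    db_type = "nucl"
elif ext.startswith(".p"):
    db_type = "prot"
else:
    raise ValueError(
        f"Cannot infer the database type from output extension '{ext}'; "
        "expected a nucleotide (.n*) or protein (.p*) BLAST database file."
    )

# Extra options from params are inserted verbatim into the command line.
shell(
    "makeblastdb"
    " -in {snakemake.input.fasta}"
    " -dbtype {db_type}"
    " {snakemake.params}"
    " -logfile {log}"
    " -out {out_name}"
)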