BLAST MAKEBLASTDB FOR FASTA FILES
makeblastdb builds local BLAST databases from nucleotide or protein FASTA files. For more information, see the BLAST documentation.
Example
This wrapper can be used in the following way:
rule blast_makedatabase_nucleotide:
    input:
        fasta="genome/{genome}.fasta"
    output:
        multiext("results/{genome}.fasta",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    log:
        "logs/{genome}.log"
    params:
        "-input_type fasta -blastdb_version 5 -parse_seqids"
    wrapper:
        "v1.17.4/bio/blast/makeblastdb"

rule blast_makedatabase_protein:
    input:
        fasta="protein/{protein}.fasta"
    output:
        multiext("results/{protein}.fasta",
            ".pdb",
            ".phr",
            ".pin",
            ".pot",
            ".psq",
            ".ptf",
            ".pto"
        )
    log:
        "logs/{protein}.log"
    params:
        "-input_type fasta -blastdb_version 5"
    wrapper:
        "v1.17.4/bio/blast/makeblastdb"
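Note that input, output and log file paths can be chosen freely, with two constraints: the input must be passed under the name fasta, and the extension of the first output file must start with .n (nucleotide) or .p (protein), because the wrapper infers the database type from it.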
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
blast==2.11.0
Input/Output
Input:
- FASTA file
Output:
- multiple database files that share one path prefix and differ by extension (e.g. .nin, .nsq, .nhr for nucleotides or .pin, .psq, .phr for proteins)
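As a rough illustration of the output naming, the sketch below lists the files that the example rules on this page declare for a given prefix. The helper name expected_db_files is hypothetical and belongs neither to the wrapper nor to Snakemake; it only imitates what multiext does in the rules, using the extensions shown there for -blastdb_version 5.

```python
# Extensions copied from the example rules on this page (-blastdb_version 5).
DB_EXTENSIONS = {
    "nucl": [".ndb", ".nhr", ".nin", ".not", ".nsq", ".ntf", ".nto"],
    "prot": [".pdb", ".phr", ".pin", ".pot", ".psq", ".ptf", ".pto"],
}


def expected_db_files(prefix, db_type):
    """Hypothetical helper: return the database file paths for one prefix."""
    if db_type not in DB_EXTENSIONS:
        raise ValueError(f"unknown db_type: {db_type!r}")
    # Same idea as multiext: append every extension to the shared prefix.
    return [prefix + ext for ext in DB_EXTENSIONS[db_type]]


if __name__ == "__main__":
    for db_file in expected_db_files("results/mygenome.fasta", "nucl"):
        print(db_file)
```

Requesting any one of these paths as a Snakemake target should trigger the corresponding rule, since all of them are declared as outputs of the same job.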
Authors
- Antonie Vietor
Code
__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log

# The first declared output file determines both the database prefix and its type.
out = snakemake.output[0]

db_type = ""
(out_name, ext) = path.splitext(out)

# Infer the database type from the extension: .n* is nucleotide, .p* is protein.
if ext.startswith(".n"):
    db_type = "nucl"
elif ext.startswith(".p"):
    db_type = "prot"

shell(
    "makeblastdb"
    " -in {snakemake.input.fasta}"
    " -dbtype {db_type}"
    " {snakemake.params}"
    " -logfile {log}"
    " -out {out_name}"
)
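The type inference and command assembly performed by the wrapper can be tried outside Snakemake as a plain function. This is a minimal sketch, not the wrapper itself: build_makeblastdb_command, its arguments and the default log name are hypothetical, and unlike the wrapper it raises an error when the extension starts with neither .n nor .p (the wrapper leaves the type empty in that case).

```python
from os import path


def build_makeblastdb_command(fasta, first_output, params="", log="makeblastdb.log"):
    """Hypothetical helper mirroring the wrapper's logic; returns the command string."""
    # Split off the extension; the remainder is the database prefix passed to -out.
    out_name, ext = path.splitext(first_output)

    # .n* extensions mean nucleotide, .p* extensions mean protein.
    if ext.startswith(".n"):
        db_type = "nucl"
    elif ext.startswith(".p"):
        db_type = "prot"
    else:
        # Assumption of this sketch: fail early instead of passing an empty -dbtype.
        raise ValueError(f"cannot infer database type from extension {ext!r}")

    parts = ["makeblastdb", "-in", fasta, "-dbtype", db_type]
    if params:
        parts.append(params)
    parts += ["-logfile", log, "-out", out_name]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_makeblastdb_command(
        "genome/mygenome.fasta",
        "results/mygenome.fasta.ndb",
        params="-input_type fasta -blastdb_version 5 -parse_seqids",
        log="logs/mygenome.log",
    ))
```

Returning the command as a string rather than running it keeps the sketch usable on machines without BLAST installed, which makes it convenient for checking how a given output path will be interpreted.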