BLAST MAKEBLASTDB FOR FASTA FILES¶
makeblastdb produces local BLAST databases from nucleotide or protein FASTA files. For more information, see the BLAST documentation.
URL:
Example¶
This wrapper can be used in the following way:
rule blast_makedatabase_nucleotide:
    input:
        fasta="genome/{genome}.fasta"
    output:
        multiext("results/{genome}.fasta",
            ".ndb",
            ".nhr",
            ".nin",
            ".not",
            ".nsq",
            ".ntf",
            ".nto"
        )
    log:
        "logs/{genome}.log"
    params:
        "-input_type fasta -blastdb_version 5 -parse_seqids"
    wrapper:
        "v1.1.0/bio/blast/makeblastdb"


rule blast_makedatabase_protein:
    input:
        fasta="protein/{protein}.fasta"
    output:
        multiext("results/{protein}.fasta",
            ".pdb",
            ".phr",
            ".pin",
            ".pot",
            ".psq",
            ".ptf",
            ".pto"
        )
    log:
        "logs/{protein}.log"
    params:
        "-input_type fasta -blastdb_version 5"
    wrapper:
        "v1.1.0/bio/blast/makeblastdb"
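Note that input, output and log file paths can be chosen freely, with two constraints that follow from the wrapper code: the input must be passed under the name fasta, and the first output file must have an extension starting with .n (nucleotide) or .p (protein), because the wrapper derives both the database type and the database name from it.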
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
blast==2.11.0
Input/Output¶
Input:
- FASTA file
Output:
- multiple files with different extensions (e.g. .nin, .nsq, .nhr for nucleotides or .pin, .psq, .phr for proteins)
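The output file names follow a regular pattern: a shared database prefix plus an extension whose first letter encodes the database type. The following is a minimal Python sketch of that pattern, using only the seven extensions listed in the example rules above (written for -blastdb_version 5). The helper names expected_db_files and missing_db_files are hypothetical and not part of the wrapper or of BLAST, and other makeblastdb options or versions may produce a different set of files.

```python
from pathlib import Path

# Extension suffixes copied from the example rules above (database version 5).
# This is an assumption about one configuration, not an exhaustive list.
V5_SUFFIXES = ("db", "hr", "in", "ot", "sq", "tf", "to")

# First letter of each extension, keyed by the makeblastdb -dbtype value.
TYPE_LETTER = {"nucl": "n", "prot": "p"}


def expected_db_files(prefix, db_type):
    """Return the database file paths expected for a prefix and database type."""
    if db_type not in TYPE_LETTER:
        raise ValueError(f"db_type must be 'nucl' or 'prot', got {db_type!r}")
    letter = TYPE_LETTER[db_type]
    return [f"{prefix}.{letter}{suffix}" for suffix in V5_SUFFIXES]


def missing_db_files(prefix, db_type):
    """Return the expected database files that do not exist on disk."""
    return [p for p in expected_db_files(prefix, db_type) if not Path(p).is_file()]


if __name__ == "__main__":
    for file_path in expected_db_files("results/ecoli.fasta", "nucl"):
        print(file_path)
```

A check like missing_db_files can be handy after a run to confirm that every file declared in the rule's output was actually written.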
Authors¶
- Antonie Vietor
Code¶
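__author__ = "Antonie Vietor"
__copyright__ = "Copyright 2021, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

from snakemake.shell import shell
from os import path

log = snakemake.log
# The first output file determines both the database name and its type.
out = snakemake.output[0]

db_type = ""
# Strip the final extension: "results/x.fasta.ndb" -> ("results/x.fasta", ".ndb").
(out_name, ext) = path.splitext(out)

# Extensions starting with ".n" mark nucleotide databases, ".p" protein databases.
if ext.startswith(".n"):
    db_type = "nucl"
elif ext.startswith(".p"):
    db_type = "prot"

# Extra command-line options are passed through unchanged from params.
shell(
    "makeblastdb"
    " -in {snakemake.input.fasta}"
    " -dbtype {db_type}"
    " {snakemake.params}"
    " -logfile {log}"
    " -out {out_name}"
)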
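The wrapper above leaves db_type empty when the first output extension starts with neither .n nor .p, in which case makeblastdb is called with an empty -dbtype value. The sketch below mirrors the same inference logic as a standalone function that fails early with an explicit message. It is an illustration, not the wrapper's actual code: the names infer_db_type and build_command are hypothetical, and only the makeblastdb options already used by the wrapper (-in, -dbtype, -logfile, -out) appear in it.

```python
import shlex
from os import path


def infer_db_type(first_output):
    """Return (db_type, out_name) derived from the first output file path."""
    out_name, ext = path.splitext(first_output)
    if ext.startswith(".n"):
        return "nucl", out_name
    if ext.startswith(".p"):
        return "prot", out_name
    raise ValueError(
        f"cannot infer database type from {first_output!r}: "
        "the extension must start with '.n' or '.p'"
    )


def build_command(fasta, first_output, log, params=""):
    """Assemble the makeblastdb argument list in the same order as the wrapper."""
    db_type, out_name = infer_db_type(first_output)
    return [
        "makeblastdb",
        "-in", fasta,
        "-dbtype", db_type,
        *shlex.split(params),  # extra options, split like a shell would
        "-logfile", log,
        "-out", out_name,
    ]


if __name__ == "__main__":
    cmd = build_command(
        "genome/ecoli.fasta",
        "results/ecoli.fasta.ndb",
        "logs/ecoli.log",
        "-input_type fasta -blastdb_version 5 -parse_seqids",
    )
    print(" ".join(cmd))
```

Returning an argument list rather than a single string is a design choice of this sketch: it can be handed to subprocess.run without shell quoting concerns, whereas the wrapper itself formats a shell string.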