BWA INDEX
Creates a BWA index.
URL: https://github.com/lh3/bwa
Example
This wrapper can be used in the following way:
rule bwa_index:
input:
"{genome}.fasta",
output:
idx=multiext("{genome}.{alg}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
log:
"logs/bwa_index/{genome}.{alg}.log",
params:
extra=lambda w: f"-a {w.alg}",
wrapper:
"v3.9.0-11-gcbe905b/bio/bwa/index"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
Wrapper automatically calculates block_size.
Software dependencies
bwa=0.7.18
Input/Output
Input:
fasta file
Output:
BWA index files
Params
extra
: aditional program arguments
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"
from os.path import splitext
from pathlib import Path
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")
# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", splitext(snakemake.output[0])[0])
# Block size should be a 10th of the reference length (https://github.com/lh3/bwa/issues/104)
block_size = Path(snakemake.input[0]).stat().st_size / 1024 / 1024 / 10
# If GZip, assume a 4-fold compression rate:
# - https://scfbm.biomedcentral.com/articles/10.1186/s13029-019-0073-5/tables/3
# - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336184/
# - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866555/bin/supp_btt594_supplement-rev2.pdf
# - https://softpanorama.org/HPC/DNA_sequencing/Genomic_data_compression/index.shtml
if snakemake.input[0].endswith(".gz"):
block_size *= 4
# Ensure minimum (10 Mb as BWA default) and maximum (50Gb since no apparent gain and to limit memory usage) block size
block_size = min(50 * 1024, max(10, int(block_size)))
shell("bwa index -b {block_size}M -p {prefix} {extra} {snakemake.input} {log}")