BWA INDEX

https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/bwa/index?label=version%20update%20pull%20requests

Creates a BWA index.

URL: https://github.com/lh3/bwa

Example

This wrapper can be used in the following way:

rule bwa_index:
    input:
        "{genome}.fasta",
    output:
        idx=multiext("{genome}.{alg}", ".amb", ".ann", ".bwt", ".pac", ".sa"),
    log:
        "logs/bwa_index/{genome}.{alg}.log",
    params:
        extra=lambda w: f"-a {w.alg}",
    wrapper:
        "v3.9.0/bio/bwa/index"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

  • Wrapper automatically calculates block_size.

Software dependencies

  • bwa=0.7.18

Input/Output

Input:

  • fasta file

Output:

  • BWA index files

Params

  • extra: aditional program arguments

Authors

  • Patrik Smeds

  • Filipe G. Vieira

Code

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2016, Patrik Smeds"
__email__ = "patrik.smeds@gmail.com"
__license__ = "MIT"

from os.path import splitext
from pathlib import Path
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
extra = snakemake.params.get("extra", "")

# Prefix that should be used for the database
prefix = snakemake.params.get("prefix", splitext(snakemake.output[0])[0])

# Block size should be a 10th of the reference length (https://github.com/lh3/bwa/issues/104)
block_size = Path(snakemake.input[0]).stat().st_size / 1024 / 1024 / 10
# If GZip, assume a 4-fold compression rate:
# - https://scfbm.biomedcentral.com/articles/10.1186/s13029-019-0073-5/tables/3
# - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336184/
# - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866555/bin/supp_btt594_supplement-rev2.pdf
# - https://softpanorama.org/HPC/DNA_sequencing/Genomic_data_compression/index.shtml
if snakemake.input[0].endswith(".gz"):
    block_size *= 4

# Ensure minimum (10 Mb as BWA default) and maximum (50Gb since no apparent gain and to limit memory usage) block size
block_size = min(50 * 1024, max(10, int(block_size)))

shell("bwa index -b {block_size}M -p {prefix} {extra} {snakemake.input} {log}")