BWA-MEM2¶
Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignment identical to bwa and is ~1.3-3.1x faster depending on the use-case, dataset and the running machine. Optional sorting using samtools or picard.
Example¶
This wrapper can be used in the following way:
rule bwa_mem2_mem:
input:
reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
# Index can be a list of (all) files created by bwa, or one of them
idx=multiext("genome.fasta", ".amb", ".ann", ".bwt.2bit.64", ".pac"),
output:
"mapped/{sample}.bam",
log:
"logs/bwa_mem2/{sample}.log",
params:
extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
sort="none", # Can be 'none', 'samtools' or 'picard'.
sort_order="coordinate", # Can be 'coordinate' (default) or 'queryname'.
sort_extra="", # Extra args for samtools/picard.
threads: 8
wrapper:
"v1.12.2/bio/bwa-mem2/mem"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies¶
bwa-mem2==2.2.1
samtools==1.12
picard-slim=2.27
Authors¶
- Christopher Schröder
- Johannes Köster
- Julian de Ruiter
Code¶
__author__ = "Christopher Schröder, Johannes Köster, Julian de Ruiter"
__copyright__ = (
"Copyright 2020, Christopher Schröder, Johannes Köster and Julian de Ruiter"
)
__email__ = "christopher.schroeder@tu-dortmund.de koester@jimmy.harvard.edu, julianderuiter@gmail.com"
__license__ = "MIT"
from os import path
from snakemake.shell import shell
# Extract arguments.
extra = snakemake.params.get("extra", "")
sort = snakemake.params.get("sort", "none")
sort_order = snakemake.params.get("sort_order", "coordinate")
sort_extra = snakemake.params.get("sort_extra", "")
index = snakemake.input.get("index", "")
if isinstance(index, str):
index = path.splitext(snakemake.input.idx)[0]
else:
index = path.splitext(snakemake.input.idx[0])[0]
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Check inputs/arguments.
if not isinstance(snakemake.input.reads, str) and len(snakemake.input.reads) not in {
1,
2,
}:
raise ValueError("input must have 1 (single-end) or 2 (paired-end) elements")
if sort_order not in {"coordinate", "queryname"}:
raise ValueError("Unexpected value for sort_order ({})".format(sort_order))
# Determine which pipe command to use for converting to bam or sorting.
if sort == "none":
# Simply convert to bam using samtools view.
pipe_cmd = "samtools view -Sbh -o {snakemake.output[0]} -"
elif sort == "samtools":
# Sort alignments using samtools sort.
pipe_cmd = "samtools sort {sort_extra} -o {snakemake.output[0]} -"
# Add name flag if needed.
if sort_order == "queryname":
sort_extra += " -n"
prefix = path.splitext(snakemake.output[0])[0]
sort_extra += " -T " + prefix + ".tmp"
elif sort == "picard":
# Sort alignments using picard SortSam.
pipe_cmd = (
"picard SortSam {sort_extra} INPUT=/dev/stdin"
" OUTPUT={snakemake.output[0]} SORT_ORDER={sort_order}"
)
else:
raise ValueError("Unexpected value for params.sort ({})".format(sort))
shell(
"(bwa-mem2 mem"
" -t {snakemake.threads}"
" {extra}"
" {index}"
" {snakemake.input.reads}"
" | " + pipe_cmd + ") {log}"
)