BAZAM


Bazam is a smarter way to realign reads from one genome to another. If you have tried Picard SamToFastq or samtools bam2fq and ended up with complicated, long-running, inefficient pipelines, Bazam might be what you want. Bazam writes FASTQ in a form that can stream directly into common aligners such as BWA or Bowtie2, so you can realign reads quickly without extracting them to an intermediate file. Bazam can also restrict extraction to a specific part of the genome, given either as a genomic region or as a gene name.
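The streaming idea can be sketched as a small Python helper that assembles the shell pipeline. This is a minimal sketch under stated assumptions: `streaming_realign_cmd` is a hypothetical helper, the `bazam` executable is on PATH (as with the bioconda package), Bazam writes interleaved FASTQ to stdout when no output file is given, and `bwa mem -p` reads interleaved pairs from stdin via `-`.

```python
import shlex


def streaming_realign_cmd(bam: str, ref_fasta: str, out_sam: str) -> str:
    """Build a shell pipeline that streams Bazam's FASTQ straight into BWA-MEM.

    Hypothetical helper. Assumes Bazam writes interleaved FASTQ to stdout
    when no -o/-r1/-r2 is given, and that `bwa mem -p` reads it from "-".
    """
    extract = ["bazam", "-bam", bam]
    align = ["bwa", "mem", "-p", ref_fasta, "-"]
    # Quote every argument so paths containing spaces survive the shell.
    return (
        " ".join(shlex.quote(a) for a in extract)
        + " | "
        + " ".join(shlex.quote(a) for a in align)
        + " > "
        + shlex.quote(out_sam)
    )


print(streaming_realign_cmd("mapped/a.bam", "new_genome.fasta", "realigned/a.sam"))
```

No FASTQ file is written to disk in this layout: the reads exist only in the pipe between the two processes.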

URL: https://github.com/ssadedin/bazam

Example

This wrapper can be used in the following way:

rule bazam_interleaved:
    input:
        bam="mapped/{sample}.bam",
        bai="mapped/{sample}.bam.bai",
    output:
        reads="results/reads/{sample}.fastq.gz",
    resources:
        # suggestion according to:
        # https://github.com/ssadedin/bazam/blob/c5988daf4cda4492e3d519c94f2f1e2022af5efe/README.md?plain=1#L46-L55
        mem_mb=lambda wildcards, input: max([0.2 * input.size_mb, 200]),
    log:
        "logs/bazam/{sample}.log",
    wrapper:
        "v4.6.0-24-g250dd3e/bio/bazam"


rule bazam_separated:
    input:
        bam="mapped/{sample}.cram",
        bai="mapped/{sample}.cram.crai",
        reference="genome.fasta",
    output:
        r1="results/reads/{sample}.r1.fastq.gz",
        r2="results/reads/{sample}.r2.fastq.gz",
    resources:
        # suggestion according to:
        # https://github.com/ssadedin/bazam/blob/c5988daf4cda4492e3d519c94f2f1e2022af5efe/README.md?plain=1#L46-L55
        mem_mb=lambda wildcards, input: max([0.4 * input.size_mb, 200]),
    log:
        "logs/bazam/{sample}.log",
    wrapper:
        "v4.6.0-24-g250dd3e/bio/bazam"

Note that input, output and log file paths can be chosen freely.
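The `mem_mb` lambdas in both rules follow the README suggestion linked in the comments: a fixed fraction of the input size (0.2 in the BAM rule, 0.4 in the CRAM rule) with a floor of 200 MB. The sketch below evaluates the same expressions outside Snakemake; `SimpleNamespace` is an assumption of this example, standing in for the Snakemake input object, of which only `size_mb` is read.

```python
from types import SimpleNamespace


def mem_mb_bam(wildcards, input):
    # Same expression as in rule bazam_interleaved: 20% of input size, at least 200 MB.
    return max([0.2 * input.size_mb, 200])


def mem_mb_cram(wildcards, input):
    # Same expression as in rule bazam_separated: 40% of input size, at least 200 MB.
    return max([0.4 * input.size_mb, 200])


# SimpleNamespace mimics Snakemake's input object; wildcards are unused here.
for size in (500, 5000, 50000):
    fake_input = SimpleNamespace(size_mb=size)
    print(size, mem_mb_bam(None, fake_input), mem_mb_cram(None, fake_input))
```

For a 500 MB input both expressions hit the 200 MB floor; at 5000 MB the BAM rule requests about 1000 MB and the CRAM rule about 2000 MB.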

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • bazam=1.0.1

  • snakemake-wrapper-utils=0.6.2

Input/Output

Input:

  • bam: Path to mapping file (BAM/CRAM formatted)

  • reference: Optional path to reference genome sequence (FASTA formatted). Required for CRAM input.

Output:

  • reads: Path to extracted reads, single-end or paired-end interleaved (FASTQ formatted) OR

  • r1: Path to first-mate reads, R1 (FASTQ formatted) AND

  • r2: Path to second-mate reads, R2 (FASTQ formatted)
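To make the two paired-end output layouts concrete: an interleaved `reads` file holds both mates of each pair as consecutive records, while `r1` and `r2` hold the first and second mates in separate files. The hypothetical helper below is not part of the wrapper; it splits interleaved FASTQ text into the two layouts, assuming plain 4-line records and strict R1, R2 alternation (single-end output is not covered).

```python
def deinterleave(fastq_text: str) -> tuple[list[str], list[str]]:
    """Split interleaved paired-end FASTQ text into R1 and R2 record lists.

    Hypothetical illustration. Assumes 4-line records whose mates
    alternate R1, R2, R1, R2, ...
    """
    lines = fastq_text.strip("\n").split("\n")
    if len(lines) % 8 != 0:
        raise ValueError("expected complete read pairs (a multiple of 8 lines)")
    # Group lines into 4-line FASTQ records.
    records = ["\n".join(lines[i:i + 4]) for i in range(0, len(lines), 4)]
    # Even-indexed records are first mates, odd-indexed records are second mates.
    return records[0::2], records[1::2]


interleaved = "@p1/1\nACGT\n+\nIIII\n@p1/2\nTTGA\n+\nIIII\n"
r1, r2 = deinterleave(interleaved)
print(r1)
print(r2)
```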

Params

  • extra: Optional parameters passed to bazam

Authors

  • Christopher Schröder

Code

__author__ = "Christopher Schröder"
__copyright__ = "Copyright 2022, Christopher Schröder"
__email__ = "christopher.schroeder@tu-dortmund.de"
__license__ = "MIT"

from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

java_opts = get_java_opts(snakemake)

log = snakemake.log_fmt_shell(stdout=False, stderr=True)
bam = snakemake.input.bam

# Extra parameters default value is an empty string
extra = snakemake.params.get("extra", "")

if bam.endswith(".cram"):
    if not (reference := snakemake.input.get("reference", "")):
        raise ValueError(
            "input 'reference' is required when working with CRAM input files"
        )
    reference_cmd = f"-Dsamjdk.reference_fasta={reference}"
else:
    reference_cmd = ""

# Build the output arguments: one FASTQ file ("reads") or separate R1/R2 files.
if reads := snakemake.output.get("reads", ""):
    out_cmd = f"-o {reads}"
elif (r1 := snakemake.output.get("r1", "")) and (r2 := snakemake.output.get("r2", "")):
    out_cmd = f"-r1 {r1} -r2 {r2}"
else:
    raise ValueError("either 'reads' or 'r1' and 'r2' must be specified in output")

shell("(bazam {java_opts} {reference_cmd} {extra} -bam {bam} {out_cmd}) {log}")
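The wrapper's command assembly can be exercised without a running Snakemake workflow. The sketch below is a hypothetical, standalone restatement of the logic above: `build_bazam_cmd` and its arguments are illustrative names, a plain dict stands in for `snakemake.output`, and the log redirection and the `get_java_opts` call are left out.

```python
def build_bazam_cmd(bam, outputs, reference="", extra="", java_opts=""):
    """Hypothetical re-statement of the wrapper's command assembly.

    `outputs` is a plain dict standing in for snakemake.output.
    """
    # CRAM decoding needs the reference, passed as a samjdk system property.
    if bam.endswith(".cram"):
        if not reference:
            raise ValueError(
                "input 'reference' is required when working with CRAM input files"
            )
        reference_cmd = f"-Dsamjdk.reference_fasta={reference}"
    else:
        reference_cmd = ""

    # "reads" takes precedence; otherwise both r1 and r2 are required.
    if reads := outputs.get("reads", ""):
        out_cmd = f"-o {reads}"
    elif (r1 := outputs.get("r1", "")) and (r2 := outputs.get("r2", "")):
        out_cmd = f"-r1 {r1} -r2 {r2}"
    else:
        raise ValueError("either 'reads' or 'r1' and 'r2' must be specified in output")

    # Join only non-empty parts so unused options leave no double spaces.
    parts = ["bazam", java_opts, reference_cmd, extra, f"-bam {bam}", out_cmd]
    return " ".join(p for p in parts if p)


print(build_bazam_cmd("mapped/a.bam", {"reads": "results/reads/a.fastq.gz"}))
```

Empty parts are dropped before joining, whereas the wrapper's shell string can contain doubled spaces; the shell treats both forms the same.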