PBMM2 ALIGN

Align reads with pbmm2, a minimap2 SMRT wrapper for PacBio data: https://github.com/PacificBiosciences/pbmm2/

Example

This wrapper can be used in the following way:

rule pbmm2_align:
    input:
        reference="target/{reference}.fasta", # can be either genome index or genome fasta
        query="{query}.bam", # can be unaligned bam, fastq, or fasta
    output:
        bam="aligned/{query}.{reference}.bam",
        index="aligned/{query}.{reference}.bam.bai",
    log:
        "logs/pbmm2_align/{query}.{reference}.log",
    params:
        preset="CCS", # SUBREAD, CCS, HIFI, ISOSEQ, UNROLLED
        sample="", # sample name for @RG header
        extra="--sort", # optional additional args; --sort gives sorted output, which the .bai index requires
        loglevel="INFO",
    threads: 12
    wrapper:
        "v3.9.0/bio/pbmm2/align"

Note that input, output and log file paths can be chosen freely.
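
As a rough illustration of how the wildcards in the rule above map to concrete file names, the following sketch fills the path patterns with plain string formatting. The wildcard values `sample1` and `hg38` and the helper `resolve` are hypothetical and chosen only for this example; Snakemake itself infers wildcard values by matching the requested output file against the output pattern, which this sketch does not reproduce.

```python
# Hypothetical wildcard values, chosen only for illustration.
wildcards = {"query": "sample1", "reference": "hg38"}

# Path patterns copied from the example rule.
patterns = {
    "reference": "target/{reference}.fasta",
    "query": "{query}.bam",
    "bam": "aligned/{query}.{reference}.bam",
    "index": "aligned/{query}.{reference}.bam.bai",
    "log": "logs/pbmm2_align/{query}.{reference}.log",
}


def resolve(patterns, wildcards):
    """Fill every pattern with the given wildcard values (hypothetical helper)."""
    return {name: pattern.format(**wildcards) for name, pattern in patterns.items()}


resolved = resolve(patterns, wildcards)
for name, path in resolved.items():
    print(f"{name}: {path}")
```

Because the output pattern separates `{query}` and `{reference}` with a dot, names that themselves contain dots may match in more than one way; keeping such names dot-free, or constraining the wildcards, is one way to avoid that ambiguity.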

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

pbmm2=1.13.1

Authors

William Rowell

Code

__author__ = "William Rowell"
__copyright__ = "Copyright 2020, William Rowell"
__email__ = "wrowell@pacb.com"
__license__ = "MIT"

import tempfile
from snakemake.shell import shell


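# Optional parameters. preset, sample, and loglevel are read directly from
# snakemake.params below and therefore must be set in the rule.
extra = snakemake.params.get("extra", "")
# tmp_root is optional; when unset, the system default temporary location is used.
tmp_root = snakemake.params.get("tmp_root", None)
# Redirect both stdout and stderr of the command to the rule's log file.
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Run with a scratch directory that is removed when the block exits.
with tempfile.TemporaryDirectory(dir=tmp_root) as tmp_dir:
    shell(
        """
        (export TMPDIR={tmp_dir}; \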
        pbmm2 align --num-threads {snakemake.threads} \
            --preset {snakemake.params.preset} \
            --sample {snakemake.params.sample} \
            --log-level {snakemake.params.loglevel} \
            {extra} \
            {snakemake.input.reference} \
            {snakemake.input.query} \
            {snakemake.output.bam}) {log}
        """
    )
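
The wrapper interpolates its parameters straight into a shell string, so with `sample=""` the rendered command contains `--sample` followed directly by `--log-level`. The sketch below is one possible way to assemble the same command as an argument list and skip `--sample` when it is empty. It is not part of the wrapper: `build_pbmm2_command` is a hypothetical helper, the file names are made up, and only flags that already appear in the wrapper code are used. The command is printed, not executed.

```python
import shlex


def build_pbmm2_command(reference, query, bam, threads,
                        preset="CCS", sample="", loglevel="INFO", extra=""):
    """Return the pbmm2 align call as an argument list (hypothetical helper)."""
    cmd = [
        "pbmm2", "align",
        "--num-threads", str(threads),
        "--preset", preset,
        "--log-level", loglevel,
    ]
    # Only pass --sample when a name was actually given.
    if sample:
        cmd += ["--sample", sample]
    # Split the free-form extra string the way a POSIX shell would.
    cmd += shlex.split(extra)
    # Positional arguments in the same order as in the wrapper.
    cmd += [reference, query, bam]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_pbmm2_command(
    reference="target/hg38.fasta",
    query="sample1.bam",
    bam="aligned/sample1.hg38.bam",
    threads=12,
    extra="--sort",
)
print(shlex.join(cmd))
```

Building a list and quoting it with `shlex.join` also guards against paths that contain spaces, which plain string interpolation does not.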