BIOBAMBAM2 BAMSORMADUP

Sort alignments and mark PCR and optical duplicates with bamsormadup from the BioBamBam2 tools.

Example

This wrapper can be used in the following way:

rule mark_duplicates_bamsormadup:
    input:
        "mapped/{sample}.bam",
    output:
        bam="dedup/{sample}.bam",
        index="dedup/{sample}.bai",
        metrics="dedup/{sample}.metrics.txt",
    log:
        "logs/{sample}.log",
    params:
        extra="SO=coordinate",
    resources:
        mem_mb=1024,
    wrapper:
        "v1.21.4-1-g701abe08/bio/biobambam2/bamsormadup"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • biobambam=2.0.183

Input/Output

Input:

  • SAM/BAM/CRAM file
  • reference (for CRAM output)

Output:

  • SAM/BAM/CRAM file with marked duplicates
  • BAM index file (optional)
  • metrics file (optional)
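
The wrapper code takes the input and output formats from the file extensions of the first input and the first output. A minimal sketch of that inference follows; `infer_format` is a hypothetical helper name used only for illustration and is not part of the wrapper.

```python
import os


def infer_format(path):
    """Return the file extension without its leading dot, e.g. 'bam'."""
    _, ext = os.path.splitext(path)
    return ext.lstrip(".")


# Only the last extension is used, so the directory and any earlier dots are ignored.
print(infer_format("mapped/A.bam"))
print(infer_format("dedup/A.sorted.cram"))
```

Under this scheme a name such as `A.sam.gz` would yield `gz`, so file names should end in the format extension itself (`sam`, `bam` or `cram`).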

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


import os
import random
import tempfile
from pathlib import Path
from snakemake.shell import shell


log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=True)
extra = snakemake.params.get("extra", "")


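# File formats, inferred from the extensions of the first input and output file
_, in_format = os.path.splitext(snakemake.input[0])
in_format = in_format.lstrip(".")
_, out_format = os.path.splitext(snakemake.output[0])
out_format = out_format.lstrip(".")


# Optional BAM index output, passed to bamsormadup as indexfilename=<path>
index = snakemake.output.get("index", "")
if index:
    index = f"indexfilename={index}"


# Optional duplication metrics output, passed to bamsormadup as M=<path>
metrics = snakemake.output.get("metrics", "")
if metrics:
    metrics = f"M={metrics}"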


with tempfile.TemporaryDirectory() as tmpdir:
    # This folder must not exist; it is created by BamSorMaDup
    tmpdir_bamsormadup = Path(tmpdir) / "bamsormadup_{:06d}".format(
        random.randrange(10**6)
    )

    shell(
        "bamsormadup threads={snakemake.threads}"
        " inputformat={in_format}"
        " tmpfile={tmpdir_bamsormadup}"
        " outputformat={out_format}"
        " {index} {metrics} {extra}"
        " < {snakemake.input[0]} > {snakemake.output[0]}"
        " {log}"
    )
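
To see what command the wrapper would run for a given rule, the string assembly above can be reproduced without Snakemake or bamsormadup installed. The following is a rough sketch, not the wrapper itself: `build_command` and its default `tmpfile` value are hypothetical, the log redirection is left out, and empty optional arguments are dropped rather than left as extra spaces.

```python
import os


def build_command(input_path, output_path, threads=1, index="", metrics="",
                  extra="", tmpfile="tmp/bamsormadup_000000"):
    """Assemble a bamsormadup command string in the same order as the wrapper."""
    in_format = os.path.splitext(input_path)[1].lstrip(".")
    out_format = os.path.splitext(output_path)[1].lstrip(".")
    parts = [
        "bamsormadup",
        f"threads={threads}",
        f"inputformat={in_format}",
        f"tmpfile={tmpfile}",
        f"outputformat={out_format}",
        f"indexfilename={index}" if index else "",
        f"M={metrics}" if metrics else "",
        extra,
        f"< {input_path} > {output_path}",
    ]
    # Skip empty optional pieces so the result has single spaces only.
    return " ".join(p for p in parts if p)


cmd = build_command(
    "mapped/A.bam",
    "dedup/A.bam",
    threads=4,
    index="dedup/A.bai",
    metrics="dedup/A.metrics.txt",
    extra="SO=coordinate",
)
print(cmd)
```

With the values from the example rule and the sample name `A`, this prints a single line starting with `bamsormadup threads=4 inputformat=bam` and ending with `< mapped/A.bam > dedup/A.bam`. In the real wrapper the `tmpfile` path lives inside a fresh temporary directory and gets a random six-digit suffix.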