BIOBAMBAM2 BAMSORMADUP

Sort alignments and mark PCR and optical duplicates with the BioBamBam2 tool bamsormadup

Example

This wrapper can be used in the following way:

rule mark_duplicates:
    input:
        "mapped/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        index="dedup/{sample}.bai",
        metrics="dedup/{sample}.metrics.txt",
    log:
        "logs/{sample}.log"
    params:
        extra="SO=coordinate"
    resources:
        mem_mb=1024
    wrapper:
        "0.76.0/bio/biobambam2/bamsormadup"

Note that input, output and log file paths can be chosen freely. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • biobambam=2.0

Input/Output

Input:

  • SAM/BAM/CRAM file
  • reference (for CRAM output)

Output:

  • sorted SAM/BAM/CRAM file with marked duplicates
  • BAM index file (optional)
  • metrics file (optional)
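The wrapper infers the SAM/BAM/CRAM format of the input and output from their file extensions and passes it through without validation. The following is a minimal sketch of a stricter check that could run before bamsormadup is started. The helper name `detect_format` and the case-insensitive matching are illustrative assumptions, not part of the wrapper.

```python
import os

# Formats accepted by bamsormadup's inputformat=/outputformat= options.
SUPPORTED_FORMATS = {"sam", "bam", "cram"}


def detect_format(path: str) -> str:
    """Return 'sam', 'bam' or 'cram' based on the extension of `path`.

    Hypothetical helper: raises ValueError for any other extension, so a
    typo such as 'sample.bma' fails early instead of inside bamsormadup.
    """
    # Take the extension without the dot and normalise its case.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(
            f"cannot infer SAM/BAM/CRAM format from {path!r} (extension {ext!r})"
        )
    return ext


if __name__ == "__main__":
    print(detect_format("mapped/A.bam"))
    print(detect_format("dedup/A.cram"))
```

Unlike the wrapper, this sketch lowercases the extension, so `A.BAM` is treated as `bam`.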

Notes

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"


import os
from snakemake.shell import shell


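# Log only stderr (appending); stdout carries the output alignments.
log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=True)
# Additional bamsormadup options, passed verbatim (e.g. "SO=coordinate").
extra = snakemake.params.get("extra", "")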


# File formats (sam, bam or cram), inferred from the file extensions
_, in_format = os.path.splitext(snakemake.input[0])
in_format = in_format.lstrip(".")
_, out_format = os.path.splitext(snakemake.output[0])
out_format = out_format.lstrip(".")


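# Optional BAM index, passed to bamsormadup as indexfilename=
index = snakemake.output.get("index", "")
if index:
    index = f"indexfilename={index}"


# Optional duplication metrics file, passed to bamsormadup as M=
metrics = snakemake.output.get("metrics", "")
if metrics:
    metrics = f"M={metrics}"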


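# bamsormadup reads the alignments from stdin and writes the sorted,
# duplicate-marked alignments to stdout.
shell(
    "bamsormadup threads={snakemake.threads}"
    " inputformat={in_format} outputformat={out_format}"
    " {index} {metrics} {extra}"
    " < {snakemake.input[0]} > {snakemake.output[0]} {log}"
)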
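Because the wrapper only assembles a command line, that logic can be exercised without Snakemake. The following is a minimal sketch of a pure function that mirrors it, assuming the same argument order as the wrapper. The function name `build_bamsormadup_cmd` and its parameters are hypothetical, the log redirection is omitted, and the result matches the wrapper's command up to whitespace.

```python
import os


def build_bamsormadup_cmd(input_path, output_path, threads=1,
                          index="", metrics="", extra=""):
    """Assemble a bamsormadup command line in the same order as the wrapper.

    Formats come from the file extensions; the optional index and metrics
    paths become indexfilename= and M= arguments only when given.
    """
    in_format = os.path.splitext(input_path)[1].lstrip(".")
    out_format = os.path.splitext(output_path)[1].lstrip(".")
    args = [
        "bamsormadup",
        f"threads={threads}",
        f"inputformat={in_format}",
        f"outputformat={out_format}",
    ]
    if index:
        args.append(f"indexfilename={index}")
    if metrics:
        args.append(f"M={metrics}")
    if extra:
        # Extra options are appended verbatim, as in the wrapper.
        args.append(extra)
    # Input is redirected from stdin and output to stdout.
    return " ".join(args) + f" < {input_path} > {output_path}"


if __name__ == "__main__":
    print(build_bamsormadup_cmd(
        "mapped/A.bam", "dedup/A.bam", threads=4,
        index="dedup/A.bai", metrics="dedup/A.metrics.txt",
        extra="SO=coordinate",
    ))
```

For the example rule with four threads, this prints `bamsormadup threads=4 inputformat=bam outputformat=bam indexfilename=dedup/A.bai M=dedup/A.metrics.txt SO=coordinate < mapped/A.bam > dedup/A.bam`.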