PBMARKDUP

Takes HiFi reads from one or more sequencing chips of an amplified library and marks or removes duplicate reads.

URL: https://github.com/PacificBiosciences/pbmarkdup

Example

This wrapper can be used in the following way:

rule pbmarkdup:
    input:
        "reads.fastq.gz",
    output:
        dedup="pbmarkdup1.fastq.gz",
    log:
        "logs/pbmarkdup.log",
    params:
        extra="",
    wrapper:
        "v9.8.0/bio/pbmarkdup"


rule pbmarkdup_dedup:
    input:
        "reads.fastq.gz",
    output:
        dedup="pbmarkdup2.fastq.gz",
        dup="dedup.fastq.gz",
    log:
        "logs/pbmarkdup_dedup.log",
    params:
        extra="",
    wrapper:
        "v9.8.0/bio/pbmarkdup"


rule pbmarkdup_rmdup:
    input:
        reads="reads.fastq.gz",
    output:
        "pbmarkdup3.fastq.gz",
    log:
        "logs/pbmarkdup_rmdup.log",
    params:
        extra="--rmdup",
    wrapper:
        "v9.8.0/bio/pbmarkdup"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • pbmarkdup=1.2.0

Input/Output

Input:

  • One or more BAM, FASTQ, FASTA (gzipped or plain), or FOFN input files

Output:

  • dedup: Deduplicated reads file (BAM, FASTQ, or FASTA; format inferred from file extension)

  • dup: Duplicate reads file (optional); cannot be combined with --rmdup in extra

Params

  • extra: Additional program arguments (except --dup-file, which is set through the dup output). Use --rmdup to remove duplicates instead of marking them, or --cross-library for library-agnostic detection.
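
The two constraints stated above (no --dup-file in extra, and no dup output together with --rmdup) can be checked before a workflow is launched. The following is a minimal sketch in plain Python and is not part of the wrapper. The function name `check_pbmarkdup_params` is hypothetical, and the sketch assumes that extra is a shell-style string of flags.

```python
import shlex


def check_pbmarkdup_params(extra, has_dup_output):
    """Return a list of problems found in a rule's `extra` string.

    Hypothetical helper for illustration only; it encodes just the two
    constraints stated in this documentation.
    """
    # Split the string the way a shell would, so quoted values stay together.
    tokens = shlex.split(extra)
    # Keep only long options such as "--rmdup".
    flags = {tok for tok in tokens if tok.startswith("--")}

    problems = []
    if "--dup-file" in flags:
        problems.append("--dup-file is set by the wrapper via the 'dup' output")
    if "--rmdup" in flags and has_dup_output:
        problems.append("--rmdup cannot be combined with a 'dup' output")
    return problems
```

Unlike the wrapper, which looks for the substring --rmdup anywhere in extra, this sketch compares whole tokens, so the two can disagree on unusual input.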

Authors

  • Patrik Smeds

Code

__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2026, Patrik Smeds"
__license__ = "MIT"

from snakemake.shell import shell

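# Optional extra command-line arguments passed through to pbmarkdup.
extra = snakemake.params.get("extra", "")
# Redirect stderr (not stdout) to the rule's log file.
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Use the named "dedup" output if present, otherwise the first output.
dedup_file = snakemake.output.get("dedup", snakemake.output[0])

# The optional "dup" output is passed to pbmarkdup as --dup-file.
dup_file = snakemake.output.get("dup", "")
if dup_file:
    dup_file = f"--dup-file {dup_file}"

# All input files are passed as positional arguments.
inputs = " ".join(snakemake.input)

# A separate duplicates file cannot be requested together with --rmdup.
if "--rmdup" in extra and dup_file:
    raise ValueError(
        "Cannot specify --rmdup and output duplicates to file using the 'dup' output."
    )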

shell(
    "pbmarkdup --num-threads {snakemake.threads} {extra} {inputs} {dedup_file} {dup_file} {log}"
)
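
To make the command assembly above easier to follow, here is a minimal sketch that reproduces it as a pure function, without Snakemake. The function `build_pbmarkdup_command` and its arguments are hypothetical stand-ins for the `snakemake` object, and the log redirection produced by `log_fmt_shell` is left out.

```python
def build_pbmarkdup_command(inputs, dedup, dup="", extra="", threads=1):
    """Assemble a pbmarkdup command line in the same order as the wrapper.

    Hypothetical stand-in for the wrapper: plain arguments replace the
    `snakemake` object, and log redirection is omitted.
    """
    # Same guard as the wrapper: no duplicates file together with --rmdup.
    if "--rmdup" in extra and dup:
        raise ValueError("--rmdup cannot be combined with a 'dup' output")

    parts = ["pbmarkdup", "--num-threads", str(threads)]
    if extra:
        parts.append(extra)
    parts.extend(inputs)  # input files come first
    parts.append(dedup)   # followed by the deduplicated output file
    if dup:
        parts.extend(["--dup-file", dup])
    return " ".join(parts)


print(
    build_pbmarkdup_command(
        ["reads.fastq.gz"], "pbmarkdup2.fastq.gz", dup="dedup.fastq.gz", threads=4
    )
)
# prints: pbmarkdup --num-threads 4 reads.fastq.gz pbmarkdup2.fastq.gz --dup-file dedup.fastq.gz
```

The sketch skips empty fields, whereas the wrapper's format string leaves extra spaces in their place; the shell treats both forms the same.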