PBMARKDUP
Takes one or multiple sequencing chips of an amplified library as HiFi reads and marks or removes duplicates.
URL: https://github.com/PacificBiosciences/pbmarkdup
Example
This wrapper can be used in the following way:
rule pbmarkdup:
    input:
        "reads.fastq.gz",
    output:
        dedup="pbmarkdup1.fastq.gz",
    log:
        "logs/pbmarkdup.log",
    params:
        extra="",
    wrapper:
        "v9.8.0/bio/pbmarkdup"


rule pbmarkdup_dedup:
    input:
        "reads.fastq.gz",
    output:
        dedup="pbmarkdup2.fastq.gz",
        dup="dedup.fastq.gz",
    log:
        "logs/pbmarkdup_dedup.log",
    params:
        extra="",
    wrapper:
        "v9.8.0/bio/pbmarkdup"


rule pbmarkdup_rmdup:
    input:
        reads="reads.fastq.gz",
    output:
        "pbmarkdup3.fastq.gz",
    log:
        "logs/pbmarkdup_rmdup.log",
    params:
        extra="--rmdup",
    wrapper:
        "v9.8.0/bio/pbmarkdup"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
pbmarkdup=1.2.0
Input/Output
Input:
One or more BAM, FASTQ, FASTA (gzipped or plain), or FOFN input files
Output:
dedup: Deduplicated reads file (BAM, FASTQ, or FASTA; format inferred from file extension)
dup: Duplicated reads (optional); cannot be combined with --rmdup in extra
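A FOFN (file of file names) is a plain-text file that lists one input path per line. As a minimal sketch, assuming the conventional `.fofn` extension, such a file could be generated as shown here. The helper name `write_fofn` and the file names are hypothetical and are not part of the wrapper or of pbmarkdup.

```python
from pathlib import Path


def write_fofn(paths, fofn_path):
    """Write one input path per line to a FOFN file (hypothetical helper)."""
    fofn = Path(fofn_path)
    # One path per line, with a trailing newline after the last entry.
    fofn.write_text("".join(f"{p}\n" for p in paths))
    return fofn


if __name__ == "__main__":
    # Illustrative file names only; they are not taken from the wrapper's test data.
    out = write_fofn(["movie1.hifi_reads.bam", "movie2.hifi_reads.bam"], "reads.fofn")
    print(out.read_text(), end="")
```

The resulting `reads.fofn` could then be given as the rule's input in place of the individual read files.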
Params
extra: Additional program arguments (not --dup-file). Use --rmdup to remove duplicates instead of marking, --cross-library for library-agnostic detection.
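The restrictions on extra can be checked before a workflow is run. The sketch below is an assumption-laden illustration, not part of the wrapper: `check_extra` is a hypothetical helper, and it is stricter than the wrapper itself, which only rejects the combination of --rmdup with a dup output and does not inspect extra for --dup-file.

```python
import shlex


def check_extra(extra, has_dup_output=False):
    """Split `extra` into tokens and reject unsupported combinations (hypothetical helper)."""
    tokens = shlex.split(extra)
    # --dup-file is derived from the `dup` output, so it should not be passed by hand.
    # The `--dup-file=PATH` spelling is checked as a precaution only.
    if any(t == "--dup-file" or t.startswith("--dup-file=") for t in tokens):
        raise ValueError("Use the `dup` output instead of --dup-file in extra.")
    # Removed duplicates cannot also be written to a separate file.
    if "--rmdup" in tokens and has_dup_output:
        raise ValueError("--rmdup cannot be combined with the `dup` output.")
    return tokens
```

Splitting with `shlex` compares whole tokens, whereas the wrapper uses a plain substring test on the extra string.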
Code
__author__ = "Patrik Smeds"
__copyright__ = "Copyright 2026, Patrik Smeds"
__license__ = "MIT"

from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

dedup_file = snakemake.output.get("dedup", snakemake.output[0])
dup_file = snakemake.output.get("dup", "")
if dup_file:
    dup_file = f"--dup-file {dup_file}"

inputs = " ".join(snakemake.input)

if "--rmdup" in extra and dup_file:
    raise ValueError(
        "Cannot specify --rmdup and output duplicates to file using dup_file output option."
    )

shell(
    "pbmarkdup --num-threads {snakemake.threads} {extra} {inputs} {dedup_file} {dup_file} {log}"
)
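To see which command line a given rule would produce, the assembly logic above can be mirrored in a standalone function that needs no `snakemake` object. This is only a sketch: `build_command` is a hypothetical name, the log redirection that `log_fmt_shell` appends is left out, and empty pieces are dropped so the printed command has no doubled spaces.

```python
def build_command(inputs, dedup, dup="", extra="", threads=1):
    """Assemble a pbmarkdup command line in the same order as the wrapper (hypothetical helper)."""
    dup_arg = f"--dup-file {dup}" if dup else ""
    # Same guard as the wrapper: --rmdup and a dup output are mutually exclusive.
    if "--rmdup" in extra and dup_arg:
        raise ValueError("Cannot combine --rmdup with a dup output file.")
    parts = ["pbmarkdup", f"--num-threads {threads}", extra, " ".join(inputs), dedup, dup_arg]
    # Drop empty pieces (for example an empty `extra`) before joining.
    return " ".join(p for p in parts if p)


# Values taken from the pbmarkdup_dedup rule; the thread count is an arbitrary example.
print(build_command(["reads.fastq.gz"], "pbmarkdup2.fastq.gz", dup="dedup.fastq.gz", threads=4))
# pbmarkdup --num-threads 4 reads.fastq.gz pbmarkdup2.fastq.gz --dup-file dedup.fastq.gz
```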