BIOBAMBAM2 BAMSORMADUP
Mark PCR and optical duplicates, followed by sorting, with BioBamBam2 tools.
URL: https://gitlab.com/german.tischler/biobambam2
Example
This wrapper can be used in the following way:
rule mark_duplicates_bamsormadup:
    input:
        "mapped/{sample}.bam",
    output:
        bam="dedup/{sample}.bam",
        index="dedup/{sample}.bai",
        metrics="dedup/{sample}.metrics.txt",
    log:
        "logs/{sample}.log",
    params:
        extra="SO=coordinate",
    resources:
        mem_mb=1024,
    wrapper:
        "v5.8.0-3-g915ba34/bio/biobambam2/bamsormadup"
Note that input, output and log file paths can be chosen freely.
When running with snakemake --use-conda, the software dependencies will be automatically deployed into an isolated environment before execution.
Software dependencies
biobambam=2.0.185
Input/Output
Input:
Path to SAM/BAM/CRAM file (the format is inferred from the file extension); this must be the first file in the input file list.
Path to reference (for CRAM output)
Output:
Path to SAM/BAM/CRAM file with marked duplicates (the format is inferred from the file extension). This must be the first output file in the output file list.
index
: Path to BAM index file (optional)
metrics
: Path to metrics file (optional)
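The wrapper derives bamsormadup's inputformat and outputformat from the file extensions, and it only passes the optional index and metrics outputs to the tool when the rule declares them. The standalone sketch below mirrors that mapping outside Snakemake so it can be checked in isolation; the helper names infer_format and optional_output_args are hypothetical and are not part of the wrapper.

```python
import os


def infer_format(path: str) -> str:
    """Return the file extension without the dot, as the wrapper does for in/out formats."""
    return os.path.splitext(path)[1].lstrip(".")


def optional_output_args(index: str = "", metrics: str = "") -> str:
    """Translate the optional index/metrics outputs into bamsormadup arguments."""
    args = []
    if index:
        args.append(f"indexfilename={index}")
    if metrics:
        args.append(f"M={metrics}")
    return " ".join(args)


# Paths follow the example rule, with the {sample} wildcard set to "A"
print(infer_format("mapped/A.bam"))  # bam
print(optional_output_args("dedup/A.bai", "dedup/A.metrics.txt"))
# indexfilename=dedup/A.bai M=dedup/A.metrics.txt
```

Because the format comes only from the last extension, a file named e.g. sample.bam.tmp would be passed to bamsormadup as format "tmp".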
Params
extra
: Additional program arguments. Do not set inputformat, outputformat or tmpfile here; the wrapper sets them itself.
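The wrapper appends extra to the command verbatim and does not check it, so a duplicated inputformat, outputformat or tmpfile would reach bamsormadup unnoticed. The following is a minimal pre-check one could run on a params string, assuming options are written in the key=value form used by the example (SO=coordinate); reserved_keys_in_extra is a hypothetical helper, not something the wrapper provides.

```python
import shlex

# Keys the wrapper already sets on the command line, per the Params description
RESERVED_KEYS = {"inputformat", "outputformat", "tmpfile"}


def reserved_keys_in_extra(extra: str) -> list[str]:
    """List reserved key=value options found in an extra string (hypothetical helper)."""
    found = []
    for token in shlex.split(extra):
        key = token.split("=", 1)[0]
        if key in RESERVED_KEYS:
            found.append(key)
    return found


print(reserved_keys_in_extra("SO=coordinate"))  # []
print(reserved_keys_in_extra("SO=coordinate tmpfile=/scratch/x"))  # ['tmpfile']
```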
Code
__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2021, Filipe G. Vieira"
__license__ = "MIT"

import os
import random
import tempfile
from pathlib import Path
from snakemake.shell import shell

# Append stderr to the rule's log file; stdout carries the output alignment
log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=True)
extra = snakemake.params.get("extra", "")

# File formats, derived from the file extensions (e.g. "bam", "cram")
in_name, in_format = os.path.splitext(snakemake.input[0])
in_format = in_format.lstrip(".")
out_name, out_format = os.path.splitext(snakemake.output[0])
out_format = out_format.lstrip(".")

# Optional outputs are passed to bamsormadup only when the rule declares them
index = snakemake.output.get("index", "")
if index:
    index = f"indexfilename={index}"
metrics = snakemake.output.get("metrics", "")
if metrics:
    metrics = f"M={metrics}"

with tempfile.TemporaryDirectory() as tmpdir:
    # This folder must not exist; it is created by BamSorMaDup
    tmpdir_bamsormadup = Path(tmpdir) / "bamsormadup_{:06d}".format(
        random.randrange(10**6)
    )
    # bamsormadup reads the input from stdin and writes the result to stdout
    shell(
        "bamsormadup threads={snakemake.threads}"
        " inputformat={in_format}"
        " tmpfile={tmpdir_bamsormadup}"
        " outputformat={out_format}"
        " {index} {metrics} {extra}"
        " < {snakemake.input[0]} > {snakemake.output[0]}"
        " {log}"
    )
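The wrapper only chooses a name for bamsormadup's temporary path inside a fresh TemporaryDirectory and leaves creating it to the tool. The sketch below reproduces that naming scheme in isolation (reserve_tmpfile is a hypothetical helper name) to show that the path does not exist beforehand and that the whole directory is removed once the with block exits.

```python
import random
import tempfile
from pathlib import Path


def reserve_tmpfile(tmpdir: str) -> Path:
    """Return a not-yet-existing path inside tmpdir, named like the wrapper's tmpfile."""
    return Path(tmpdir) / "bamsormadup_{:06d}".format(random.randrange(10**6))


with tempfile.TemporaryDirectory() as tmpdir:
    tmpfile = reserve_tmpfile(tmpdir)
    # Only the name is chosen here; nothing is created on disk
    print(tmpfile.parent == Path(tmpdir), tmpfile.exists())  # True False

# Leaving the block deletes tmpdir together with anything written into it
print(Path(tmpdir).exists())  # False
```

Nesting the path inside TemporaryDirectory means cleanup happens even if bamsormadup fails, since the context manager removes the directory recursively on exit.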