MAPDAMAGE2

https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/mapdamage2?label=version%20update%20pull%20requests

mapDamage2 is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

Example

This wrapper can be used in the following way:

rule mapdamage2:
    input:
        ref="genome.fasta",
        bam="mapped/{sample}.bam",
    output:
        log="results/{sample}/Runtime_log.txt",  # output folder is infered from this file, so it needs to be the same folder for all output files
        GtoA3p="results/{sample}/3pGtoA_freq.txt",
        CtoT5p="results/{sample}/5pCtoT_freq.txt",
        dnacomp="results/{sample}/dnacomp.txt",
        frag_misincorp="results/{sample}/Fragmisincorporation_plot.pdf",
        len="results/{sample}/Length_plot.pdf",
        lg_dist="results/{sample}/lgdistribution.txt",
        misincorp="results/{sample}/misincorporation.txt",
    #   rescaled_bam="results/{sample}.rescaled.bam", # uncomment if you want the rescaled BAM file
    params:
        extra="--no-stats",  # optional parameters for mapdamage2 (except -i, -r, -d, --rescale)
    log:
        "logs/{sample}/mapdamage2.log",
    threads: 1  # MapDamage2 is not threaded
    wrapper:
        "v1.20.0-15-g28df43c2/bio/mapdamage2"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

Software dependencies

  • mapdamage2=2.2.1
  • python=3.10.6
  • pysam=0.19.1

Input/Output

Input:

  • reference genome
  • SAM/BAM/CRAM alignemnt

Output:

  • Runtime_log.txt: log file with a summary of command lines used and timestamps.
  • Fragmisincorporation_plot.pdf, a pdf file that displays both fragmentation and misincorporation patterns.
  • Length_plot.pdf, a pdf file that displays length distribution of singleton reads per strand and cumulative frequencies of C->T at 5’-end and G->A at 3’-end are also displayed per strand.
  • misincorporation.txt, contains a table with occurrences for each type of mutations and relative positions from the reads ends.
  • 5pCtoT_freq.txt, contains frequencies of Cytosine to Thymine mutations per position from the 5’-ends.
  • 3pGtoA_freq.txt, contains frequencies of Guanine to Adenine mutations per position from the 3’-ends.
  • dnacomp.txt, contains a table of the reference genome base composition per position, inside reads and adjacent regions.
  • lgdistribution.txt, contains a table with read length distributions per strand.
  • Stats_out_MCMC_hist.pdf, MCMC histogram for the damage parameters and log likelihood.
  • Stats_out_MCMC_iter.csv, values for the damage parameters and log likelihood in each MCMC iteration.
  • Stats_out_MCMC_trace.pdf, a MCMC trace plot for the damage parameters and log likelihood.
  • Stats_out_MCMC_iter_summ_stat.csv, summary statistics for the damage parameters estimated posterior distributions.
  • Stats_out_post_pred.pdf, empirical misincorporation frequency and posterior predictive intervals from the fitted model.
  • Stats_out_MCMC_correct_prob.csv, position specific probability of a C->T and G->A misincorporation is due to damage.
  • dnacomp_genome.txt, contains the global reference genome base composition (computed by seqtk).
  • Rescaled BAM file, where likely post-mortem damaged bases have downscaled quality scores.

Authors

  • Filipe G. Vieira

Code

__author__ = "Filipe G. Vieira"
__copyright__ = "Copyright 2020, Filipe G. Vieira"
__license__ = "MIT"

import os.path
from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

in_bam = snakemake.input.get("bam", "")
if in_bam:
    in_bam = "--input " + in_bam

output_folder = os.path.dirname(snakemake.output.get("log", ""))
if not output_folder:
    raise ValueError("mapDamage2 rule needs output 'log'.")

rescaled_bam = snakemake.output.get("rescaled_bam", "")
if rescaled_bam:
    rescaled_bam = "--rescale-out " + rescaled_bam


shell(
    "mapDamage "
    "{in_bam} "
    "--reference {snakemake.input.ref} "
    "--folder {output_folder} "
    "{rescaled_bam} "
    "{extra} "
    "{log}"
)