.. _`bio/mapdamage2`: MAPDAMAGE2 ========== .. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/mapdamage2?label=version%20update%20pull%20requests :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/mapdamage2 mapDamage2 is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms. Example ------- This wrapper can be used in the following way: .. code-block:: python rule mapdamage2: input: ref="genome.fasta", bam="mapped/{sample}.bam", output: log="results/{sample}/Runtime_log.txt", # output folder is infered from this file, so it needs to be the same folder for all output files GtoA3p="results/{sample}/3pGtoA_freq.txt", CtoT5p="results/{sample}/5pCtoT_freq.txt", dnacomp="results/{sample}/dnacomp.txt", frag_misincorp="results/{sample}/Fragmisincorporation_plot.pdf", len="results/{sample}/Length_plot.pdf", lg_dist="results/{sample}/lgdistribution.txt", misincorp="results/{sample}/misincorporation.txt", # rescaled_bam="results/{sample}.rescaled.bam", # uncomment if you want the rescaled BAM file params: extra="--no-stats", # optional parameters for mapdamage2 (except -i, -r, -d, --rescale) log: "logs/{sample}/mapdamage2.log", threads: 1 # MapDamage2 is not threaded wrapper: "v3.0.1/bio/mapdamage2" Note that input, output and log file paths can be chosen freely. When running with .. code-block:: bash snakemake --use-conda the software dependencies will be automatically deployed into an isolated environment before execution. Notes ----- * The `extra` param allows for additional program arguments. * For more information see, https://ginolhac.github.io/mapDamage/ Software dependencies --------------------- * ``mapdamage2=2.2.2`` * ``python=3.10.13`` * ``pysam=0.22.0`` Input/Output ------------ **Input:** * reference genome * SAM/BAM/CRAM alignemnt **Output:** * ``Runtime_log.txt``: log file with a summary of command lines used and timestamps. * Fragmisincorporation_plot.pdf, a pdf file that displays both fragmentation and misincorporation patterns. * Length_plot.pdf, a pdf file that displays length distribution of singleton reads per strand and cumulative frequencies of C->T at 5'-end and G->A at 3'-end are also displayed per strand. * misincorporation.txt, contains a table with occurrences for each type of mutations and relative positions from the reads ends. * 5pCtoT_freq.txt, contains frequencies of Cytosine to Thymine mutations per position from the 5'-ends. * 3pGtoA_freq.txt, contains frequencies of Guanine to Adenine mutations per position from the 3'-ends. * dnacomp.txt, contains a table of the reference genome base composition per position, inside reads and adjacent regions. * lgdistribution.txt, contains a table with read length distributions per strand. * Stats_out_MCMC_hist.pdf, MCMC histogram for the damage parameters and log likelihood. * Stats_out_MCMC_iter.csv, values for the damage parameters and log likelihood in each MCMC iteration. * Stats_out_MCMC_trace.pdf, a MCMC trace plot for the damage parameters and log likelihood. * Stats_out_MCMC_iter_summ_stat.csv, summary statistics for the damage parameters estimated posterior distributions. * Stats_out_post_pred.pdf, empirical misincorporation frequency and posterior predictive intervals from the fitted model. * Stats_out_MCMC_correct_prob.csv, position specific probability of a C->T and G->A misincorporation is due to damage. * dnacomp_genome.txt, contains the global reference genome base composition (computed by seqtk). * Rescaled BAM file, where likely post-mortem damaged bases have downscaled quality scores. Authors ------- * Filipe G. Vieira Code ---- .. code-block:: python __author__ = "Filipe G. Vieira" __copyright__ = "Copyright 2020, Filipe G. Vieira" __license__ = "MIT" import os.path from snakemake.shell import shell extra = snakemake.params.get("extra", "") log = snakemake.log_fmt_shell(stdout=True, stderr=True) in_bam = snakemake.input.get("bam", "") if in_bam: in_bam = "--input " + in_bam output_folder = os.path.dirname(snakemake.output.get("log", "")) if not output_folder: raise ValueError("mapDamage2 rule needs output 'log'.") rescaled_bam = snakemake.output.get("rescaled_bam", "") if rescaled_bam: rescaled_bam = "--rescale-out " + rescaled_bam shell( "mapDamage " "{in_bam} " "--reference {snakemake.input.ref} " "--folder {output_folder} " "{rescaled_bam} " "{extra} " "{log}" ) .. |nl| raw:: html