PICARD COLLECTMULTIPLEMETRICS

A Picard meta-metrics tool that collects multiple classes of metrics. For usage information about CollectMultipleMetrics, see Picard’s documentation. For more information about Picard, see also its source code.

You select which tool(s) to run by adding the corresponding extension(s) (see the table below) to the output files requested in the wrapper invocation (see the example Snakemake rule below).

Tool                               Extension(s) for the output files
CollectAlignmentSummaryMetrics     “.alignment_summary_metrics”
CollectInsertSizeMetrics           “.insert_size_metrics”, “.insert_size_histogram.pdf”
QualityScoreDistribution           “.quality_distribution_metrics”, “.quality_distribution.pdf”
MeanQualityByCycle                 “.quality_by_cycle_metrics”, “.quality_by_cycle.pdf”
CollectBaseDistributionByCycle     “.base_distribution_by_cycle_metrics”, “.base_distribution_by_cycle.pdf”
CollectGcBiasMetrics               “.gc_bias.detail_metrics”, “.gc_bias.summary_metrics”, “.gc_bias.pdf”
RnaSeqMetrics                      “.rna_metrics”
CollectSequencingArtifactMetrics   “.bait_bias_detail_metrics”, “.bait_bias_summary_metrics”, “.error_summary_metrics”, “.pre_adapter_detail_metrics”, “.pre_adapter_summary_metrics”
CollectQualityYieldMetrics         “.quality_yield_metrics”
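
As a rough illustration of how requested file extensions translate into tools, the following plain-Python sketch looks up a few file names in an abbreviated copy of the table. The names EXTS_TO_PROG and tools_for_outputs are hypothetical and exist only for this example, the file paths are made up, and the sketch runs without Snakemake or Picard.

```python
# Abbreviated, illustrative copy of the extension-to-tool table above.
EXTS_TO_PROG = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
    ".quality_yield_metrics": "CollectQualityYieldMetrics",
}


def tools_for_outputs(outputs):
    """Return the sorted names of the tools implied by the requested files."""
    tools = set()
    for path in outputs:
        matches = [prog for ext, prog in EXTS_TO_PROG.items() if path.endswith(ext)]
        if not matches:
            raise ValueError(f"Unknown type of metrics file requested: {path}")
        tools.update(matches)
    return sorted(tools)


# Hypothetical output files for one sample: two belong to the same tool.
requested = [
    "stats/a.insert_size_metrics",
    "stats/a.insert_size_histogram.pdf",
    "stats/a.quality_yield_metrics",
]
print(tools_for_outputs(requested))
# → ['CollectInsertSizeMetrics', 'CollectQualityYieldMetrics']
```

Requesting both files of one tool selects that tool only once, and a file with an unlisted extension is rejected rather than silently ignored.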

Software dependencies

  • picard ==2.23.0

Example

This wrapper can be used in the following way:

rule collect_multiple_metrics:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        # The output file extensions select which metrics tools are run, so
        # there is no need to specify them under params with the "PROGRAM" option.
        # Usable extensions (and the tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html
        multiext("stats/{sample}",
                 ".alignment_summary_metrics",
                 ".insert_size_metrics",
                 ".insert_size_histogram.pdf",
                 ".quality_distribution_metrics",
                 ".quality_distribution.pdf",
                 ".quality_by_cycle_metrics",
                 ".quality_by_cycle.pdf",
                 ".base_distribution_by_cycle_metrics",
                 ".base_distribution_by_cycle.pdf",
                 ".gc_bias.detail_metrics",
                 ".gc_bias.summary_metrics",
                 ".gc_bias.pdf",
                 ".rna_metrics",
                 ".bait_bias_detail_metrics",
                 ".bait_bias_summary_metrics",
                 ".error_summary_metrics",
                 ".pre_adapter_detail_metrics",
                 ".pre_adapter_summary_metrics",
                 ".quality_yield_metrics"
                 )
    resources:
        # Memory in GB (default 3). The wrapper passes it to Picard as the Java heap size (-Xmx), and
        # Snakemake can use it to limit the total resources a pipeline is allowed to use, see:
        #     https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
        mem_gb=3
    log:
        "logs/picard/multiple_metrics/{sample}.log"
    params:
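        # optional parameters
        # In Picard, setting a list option to null clears its default values,
        # so the two METRIC_ACCUMULATION_LEVEL lines together select SAMPLE only.
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE "
        "REF_FLAT=ref_flat.txt "   # required if RnaSeqMetrics is used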
    wrapper:
        "0.65.0/bio/picard/collectmultiplemetrics"

Note that input, output and log file paths can be chosen freely, as long as each output file ends in one of the extensions listed in the table above. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Authors

  • David Laehnemann
  • Antonie Vietor

Code

__author__ = "David Laehnemann, Antonie Vietor"
__copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import sys
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Java heap size in GB; fall back to 3 if mem_gb is unset or empty
res = snakemake.resources.get("mem_gb", 3) or 3

exts_to_prog = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
    ".quality_distribution_metrics": "QualityScoreDistribution",
    ".quality_distribution.pdf": "QualityScoreDistribution",
    ".quality_by_cycle_metrics": "MeanQualityByCycle",
    ".quality_by_cycle.pdf": "MeanQualityByCycle",
    ".base_distribution_by_cycle_metrics": "CollectBaseDistributionByCycle",
    ".base_distribution_by_cycle.pdf": "CollectBaseDistributionByCycle",
    ".gc_bias.detail_metrics": "CollectGcBiasMetrics",
    ".gc_bias.summary_metrics": "CollectGcBiasMetrics",
    ".gc_bias.pdf": "CollectGcBiasMetrics",
    ".rna_metrics": "RnaSeqMetrics",
    ".bait_bias_detail_metrics": "CollectSequencingArtifactMetrics",
    ".bait_bias_summary_metrics": "CollectSequencingArtifactMetrics",
    ".error_summary_metrics": "CollectSequencingArtifactMetrics",
    ".pre_adapter_detail_metrics": "CollectSequencingArtifactMetrics",
    ".pre_adapter_summary_metrics": "CollectSequencingArtifactMetrics",
    ".quality_yield_metrics": "CollectQualityYieldMetrics",
}
progs = set()

# Map every requested output file to the Picard program that produces it;
# abort if a file has an extension that no program is known to write.
for file in snakemake.output:
    matched = False
    for ext in exts_to_prog:
        if file.endswith(ext):
            progs.add(exts_to_prog[ext])
            matched = True
    if not matched:
        sys.exit(
            "Unknown type of metrics file requested. For possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
        )

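# One PROGRAM=<name> argument per selected program
programs = " PROGRAM=" + " PROGRAM=".join(progs)

# Derive the common output prefix (Picard's O=) by stripping the known
# extension from the first output file. Every output was validated above,
# so a match always exists; this also avoids requiring a `sample` wildcard.
output_file = str(snakemake.output[0])
out = output_file
for ext in exts_to_prog:
    if output_file.endswith(ext):
        out = output_file[: -len(ext)]
        break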

shell(
    "(picard -Xmx{res}g CollectMultipleMetrics "
    "I={snakemake.input.bam} "
    "O={out} "
    "R={snakemake.input.ref} "
    "{snakemake.params}{programs}) {log}"
)
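
To see roughly what command line the wrapper ends up with, the prefix derivation and argument assembly can be tried outside Snakemake. This is a minimal sketch, not the wrapper itself: output_prefix and build_command are hypothetical helper names, the paths are made up, the programs are sorted only to make the result reproducible (the wrapper joins an unordered set), and the log redirection is left out.

```python
def output_prefix(first_output, extensions, fallback):
    """Strip the first matching known extension to obtain the O= prefix."""
    for ext in extensions:
        if first_output.endswith(ext):
            return first_output[: -len(ext)]
    return fallback


def build_command(bam, ref, prefix, programs, extra="", mem_gb=3):
    """Assemble the picard call as a single string (nothing is executed)."""
    parts = [
        f"picard -Xmx{mem_gb}g CollectMultipleMetrics",
        f"I={bam}",
        f"O={prefix}",
        f"R={ref}",
        extra.strip(),
    ]
    # sorted() is an addition for this sketch so the output is deterministic.
    parts += [f"PROGRAM={prog}" for prog in sorted(programs)]
    return " ".join(part for part in parts if part)


known = [".insert_size_metrics", ".insert_size_histogram.pdf", ".quality_yield_metrics"]
prefix = output_prefix("stats/a.insert_size_metrics", known, fallback="a")
command = build_command(
    "mapped/a.bam",
    "genome.fasta",
    prefix,
    {"CollectInsertSizeMetrics", "CollectQualityYieldMetrics"},
    extra="VALIDATION_STRINGENCY=LENIENT ",
)
print(prefix)   # → stats/a
print(command)
```

Because Picard appends the metric-specific extensions to the O= prefix itself, only the shared prefix of the requested output files is passed on.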