PICARD COLLECTMULTIPLEMETRICS

A Picard meta-metrics tool that collects multiple classes of metrics. For usage information on CollectMultipleMetrics, see Picard's documentation; for more about Picard itself, see its source code.

Software dependencies

  • picard ==2.23.0

Example

This wrapper can be used in the following way:

rule collect_multiple_metrics:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        # The output file extensions select which metrics tools are run,
        # so the tools do not need to be listed under params with the "PROGRAM" option.
        # Usable extensions (and the tools they implicitly call) are listed here:
        #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html
        multiext("stats/{sample}",
                 ".alignment_summary_metrics",
                 ".insert_size_metrics",
                 ".insert_size_histogram.pdf",
                 ".quality_distribution_metrics",
                 ".quality_distribution.pdf",
                 ".quality_by_cycle_metrics",
                 ".quality_by_cycle.pdf",
                 ".base_distribution_by_cycle_metrics",
                 ".base_distribution_by_cycle.pdf",
                 ".gc_bias.detail_metrics",
                 ".gc_bias.summary_metrics",
                 ".gc_bias.pdf",
                 ".rna_metrics",
                 ".bait_bias_detail_metrics",
                 ".bait_bias_summary_metrics",
                 ".error_summary_metrics",
                 ".pre_adapter_detail_metrics",
                 ".pre_adapter_summary_metrics",
                 ".quality_yield_metrics"
                 )
    resources:
        # Memory in GB (default 3). The wrapper passes it to Picard as the Java heap size (-Xmx), and
        # Snakemake can use it to limit the total resources a pipeline is allowed to use, see:
        #     https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
        mem_gb=3
    log:
        "logs/picard/multiple_metrics/{sample}.log"
    params:
        # optional parameters
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE "
        "REF_FLAT=ref_flat.txt "   # is required if RnaSeqMetrics are used
    wrapper:
        "0.62.0/bio/picard/collectmultiplemetrics"

Note that input, output and log file paths can be chosen freely, as long as the output files keep the extensions listed above. When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.
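
The comments in the rule above state that the output file extensions decide which Picard programs are run. A minimal sketch of that selection step follows. The table `KEYWORD_TO_PROGRAM` and the helper `select_programs` are hypothetical names introduced for illustration only; they are not part of Snakemake or of the wrapper's interface. The keyword/program pairs are copied from the wrapper code shown under "Code".

```python
# Hypothetical lookup table (illustration only): file-name keyword -> Picard program.
# The pairs mirror the if/elif chain in the wrapper code on this page.
KEYWORD_TO_PROGRAM = {
    "alignment_summary": "CollectAlignmentSummaryMetrics",
    "insert_size": "CollectInsertSizeMetrics",
    "quality_distribution": "QualityScoreDistribution",
    "quality_by_cycle": "MeanQualityByCycle",
    "base_distribution_by_cycle": "CollectBaseDistributionByCycle",
    "gc_bias": "CollectGcBiasMetrics",
    "rna_metrics": "RnaSeqMetrics",
    "bait_bias": "CollectSequencingArtifactMetrics",
    "error_summary": "CollectSequencingArtifactMetrics",
    "pre_adapter": "CollectSequencingArtifactMetrics",
    "quality_yield": "CollectQualityYieldMetrics",
}


def select_programs(output_files):
    """Return the sorted Picard programs implied by the given output paths."""
    programs = set()
    for path in output_files:
        # The first keyword found in the path wins, as in the wrapper.
        for keyword, program in KEYWORD_TO_PROGRAM.items():
            if keyword in path:
                programs.add(program)
                break
        else:
            # No keyword matched: the requested file is not a known metrics file.
            raise ValueError(f"Unknown type of metrics file requested: {path}")
    return sorted(programs)


print(select_programs([
    "stats/a.alignment_summary_metrics",
    "stats/a.gc_bias.summary_metrics",
    "stats/a.gc_bias.pdf",
]))
# ['CollectAlignmentSummaryMetrics', 'CollectGcBiasMetrics']
```

Because the check is a substring test on the whole path, a directory or sample name that happens to contain one of the keywords would match as well; the sketch shares this behaviour with the wrapper.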

Authors

  • David Laehnemann
  • Antonie Vietor

Code

__author__ = "David Laehnemann, Antonie Vietor"
__copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"

import sys
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

# Java heap size in GB, taken from the rule's resources (default: 3)
res = snakemake.resources.get("mem_gb", 3)
if not res:
    res = 3

progs = set()
extensions = set()

for file in snakemake.output:
    if "alignment_summary" in file:
        progs.add("CollectAlignmentSummaryMetrics ")
        extensions.add(".alignment_summary_metrics")
    elif "insert_size" in file:
        progs.add("CollectInsertSizeMetrics ")
        extensions.add(".insert_size_metrics")
        extensions.add(".insert_size_histogram.pdf")
    elif "quality_distribution" in file:
        progs.add("QualityScoreDistribution ")
        extensions.add(".quality_distribution_metrics")
        extensions.add(".quality_distribution.pdf")
    elif "quality_by_cycle" in file:
        progs.add("MeanQualityByCycle ")
        extensions.add(".quality_by_cycle_metrics")
        extensions.add(".quality_by_cycle.pdf")
    elif "base_distribution_by_cycle" in file:
        progs.add("CollectBaseDistributionByCycle ")
        extensions.add(".base_distribution_by_cycle_metrics")
        extensions.add(".base_distribution_by_cycle.pdf")
    elif "gc_bias" in file:
        progs.add("CollectGcBiasMetrics ")
        extensions.add(".gc_bias.detail_metrics")
        extensions.add(".gc_bias.summary_metrics")
        extensions.add(".gc_bias.pdf")
    elif "rna_metrics" in file:
        progs.add("RnaSeqMetrics ")
        extensions.add(".rna_metrics")
    elif "bait_bias" in file or "error_summary" in file or "pre_adapter" in file:
        progs.add("CollectSequencingArtifactMetrics ")
        extensions.add(".bait_bias_detail_metrics")
        extensions.add(".bait_bias_summary_metrics")
        extensions.add(".error_summary_metrics")
        extensions.add(".pre_adapter_detail_metrics")
        extensions.add(".pre_adapter_summary_metrics")
    elif "quality_yield" in file:
        progs.add("CollectQualityYieldMetrics ")
        extensions.add(".quality_yield_metrics")
    else:
        sys.exit(
            "Unknown type of metrics file requested, for possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
        )
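# Every entry in progs ends with a space, so the join yields
# " PROGRAM=<tool> PROGRAM=<tool> " with one PROGRAM= option per selected tool.
programs = " PROGRAM=" + "PROGRAM=".join(progs)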

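# Picard expects an output prefix (O=), not a file name: strip the known
# extension from the first output file, falling back to the sample wildcard.
out = str(snakemake.wildcards.sample)  # as default
output_file = str(snakemake.output[0])
for ext in extensions:
    if output_file.endswith(ext):
        out = output_file[: -len(ext)]
        break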

shell(
    "(picard -Xmx{res}g CollectMultipleMetrics "
    "I={snakemake.input.bam} "
    "O={out} "
    "R={snakemake.input.ref} "
    "{snakemake.params}{programs}) {log}"
)
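
The last two steps of the wrapper, building the PROGRAM options and deriving the output prefix passed as O=, can be tried outside of Snakemake. The following is a minimal sketch, assuming two hypothetical helpers, `derive_prefix` and `program_args`, that do not exist in the wrapper and are named here only for illustration. Unlike the wrapper, the sketch sorts the program names so that the resulting string is deterministic.

```python
def derive_prefix(first_output, extensions, default):
    """Strip a known metrics extension from the first output path.

    Returns `default` if the path ends with none of the extensions,
    which mirrors the wrapper's fallback to the sample wildcard.
    """
    for ext in extensions:
        if first_output.endswith(ext):
            return first_output[: -len(ext)]
    return default


def program_args(programs):
    """Build one PROGRAM=<tool> option per selected Picard program."""
    return " ".join(f"PROGRAM={program}" for program in sorted(programs))


extensions = {".gc_bias.detail_metrics", ".gc_bias.summary_metrics", ".gc_bias.pdf"}

print(derive_prefix("stats/a.gc_bias.pdf", extensions, default="a"))
# stats/a

print(program_args({"CollectGcBiasMetrics", "CollectAlignmentSummaryMetrics"}))
# PROGRAM=CollectAlignmentSummaryMetrics PROGRAM=CollectGcBiasMetrics
```

Note that the fallback value carries no directory part, so if the first output file does not end with one of the known extensions, the prefix is just the sample name.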