PICARD COLLECTMULTIPLEMETRICS
A picard meta-metrics tool that collects multiple classes of metrics. For usage information about CollectMultipleMetrics, please see picard’s documentation. For more information about picard, also see the source code.
Software dependencies
- picard ==2.23.0
Example
This wrapper can be used in the following way:
rule collect_multiple_metrics:
    input:
        bam="mapped/{sample}.bam",
        ref="genome.fasta"
    output:
        # Through the output file extensions the different tools for the metrics can be selected
        # so that it is not necessary to specify them under params with the "PROGRAM" option.
        # Usable extensions (and which tools they implicitly call) are listed here:
        # https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html.
        multiext("stats/{sample}",
                 ".alignment_summary_metrics",
                 ".insert_size_metrics",
                 ".insert_size_histogram.pdf",
                 ".quality_distribution_metrics",
                 ".quality_distribution.pdf",
                 ".quality_by_cycle_metrics",
                 ".quality_by_cycle.pdf",
                 ".base_distribution_by_cycle_metrics",
                 ".base_distribution_by_cycle.pdf",
                 ".gc_bias.detail_metrics",
                 ".gc_bias.summary_metrics",
                 ".gc_bias.pdf",
                 ".rna_metrics",
                 ".bait_bias_detail_metrics",
                 ".bait_bias_summary_metrics",
                 ".error_summary_metrics",
                 ".pre_adapter_detail_metrics",
                 ".pre_adapter_summary_metrics",
                 ".quality_yield_metrics"
                 )
    resources:
        # This parameter (default 3 GB) can be used to limit the total resources a pipeline is allowed to use, see:
        # https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
        mem_gb=3
    log:
        "logs/picard/multiple_metrics/{sample}.log"
    params:
        # optional parameters
        "VALIDATION_STRINGENCY=LENIENT "
        "METRIC_ACCUMULATION_LEVEL=null "
        "METRIC_ACCUMULATION_LEVEL=SAMPLE "
        "REF_FLAT=ref_flat.txt "  # is required if RnaSeqMetrics are used
    wrapper:
        "0.62.0/bio/picard/collectmultiplemetrics"
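Note that input, output and log file paths can be chosen freely, provided that the output files keep the metric extensions shown in the rule above, since the wrapper uses them to select the tools to run. When running with

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.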
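The extension-based selection described in the rule comments can be sketched in plain Python. The snippet below is an illustrative approximation, not the wrapper itself: the names `KEYWORD_TO_PROGRAM` and `programs_for` are hypothetical, while the keyword-to-program pairs follow the checks in the Code section of this page.

```python
# Hypothetical helper: mirrors the keyword checks in the wrapper's Code section.
# Dict order follows the order of the wrapper's if/elif chain.
KEYWORD_TO_PROGRAM = {
    "alignment_summary": "CollectAlignmentSummaryMetrics",
    "insert_size": "CollectInsertSizeMetrics",
    "quality_distribution": "QualityScoreDistribution",
    "quality_by_cycle": "MeanQualityByCycle",
    "base_distribution_by_cycle": "CollectBaseDistributionByCycle",
    "gc_bias": "CollectGcBiasMetrics",
    "rna_metrics": "RnaSeqMetrics",
    "bait_bias": "CollectSequencingArtifactMetrics",
    "error_summary": "CollectSequencingArtifactMetrics",
    "pre_adapter": "CollectSequencingArtifactMetrics",
    "quality_yield": "CollectQualityYieldMetrics",
}


def programs_for(outputs):
    """Return the sorted Picard programs implied by the requested output files."""
    programs = set()
    for path in outputs:
        for keyword, program in KEYWORD_TO_PROGRAM.items():
            if keyword in path:
                programs.add(program)
                break
        else:
            # No keyword matched this output file name
            raise ValueError(f"Unknown type of metrics file requested: {path}")
    return sorted(programs)


requested = [
    "stats/a.alignment_summary_metrics",
    "stats/a.insert_size_metrics",
    "stats/a.insert_size_histogram.pdf",
]
print(programs_for(requested))
```

Requesting only a subset of the extensions in the rule's output therefore limits the run to the corresponding programs. Several extensions can belong to the same program, as with the two insert size files here.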
Authors
- David Laehnemann
- Antonie Vietor
Code
__author__ = "David Laehnemann, Antonie Vietor"
__copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
__email__ = "antonie.v@gmx.de"
__license__ = "MIT"
import sys
from snakemake.shell import shell
log = snakemake.log_fmt_shell(stdout=False, stderr=True)
# Java heap size in GB; fall back to 3 if mem_gb is unset or empty
res = snakemake.resources.get("mem_gb", "3")
if not res:
    res = 3
progs = set()
extensions = set()
for file in snakemake.output:
    if "alignment_summary" in file:
        progs.add("CollectAlignmentSummaryMetrics ")
        extensions.add(".alignment_summary_metrics")
    elif "insert_size" in file:
        progs.add("CollectInsertSizeMetrics ")
        extensions.add(".insert_size_metrics")
        extensions.add(".insert_size_histogram.pdf")
    elif "quality_distribution" in file:
        progs.add("QualityScoreDistribution ")
        extensions.add(".quality_distribution_metrics")
        extensions.add(".quality_distribution.pdf")
    elif "quality_by_cycle" in file:
        progs.add("MeanQualityByCycle ")
        extensions.add(".quality_by_cycle_metrics")
        extensions.add(".quality_by_cycle.pdf")
    elif "base_distribution_by_cycle" in file:
        progs.add("CollectBaseDistributionByCycle ")
        extensions.add(".base_distribution_by_cycle_metrics")
        extensions.add(".base_distribution_by_cycle.pdf")
    elif "gc_bias" in file:
        progs.add("CollectGcBiasMetrics ")
        extensions.add(".gc_bias.detail_metrics")
        extensions.add(".gc_bias.summary_metrics")
        extensions.add(".gc_bias.pdf")
    elif "rna_metrics" in file:
        progs.add("RnaSeqMetrics ")
        extensions.add(".rna_metrics")
    elif "bait_bias" in file or "error_summary" in file or "pre_adapter" in file:
        progs.add("CollectSequencingArtifactMetrics ")
        extensions.add(".bait_bias_detail_metrics")
        extensions.add(".bait_bias_summary_metrics")
        extensions.add(".error_summary_metrics")
        extensions.add(".pre_adapter_detail_metrics")
        extensions.add(".pre_adapter_summary_metrics")
    elif "quality_yield" in file:
        progs.add("CollectQualityYieldMetrics ")
        extensions.add(".quality_yield_metrics")
    else:
        sys.exit(
            "Unknown type of metrics file requested, for possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
        )
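# Each entry in progs ends with a space, so the join yields one
# "PROGRAM=<tool> " argument per selected tool
programs = " PROGRAM=" + "PROGRAM=".join(progs)
out = str(snakemake.wildcards.sample)  # as default
output_file = str(snakemake.output[0])
# Derive the common output prefix by stripping a known extension
for ext in extensions:
    if output_file.endswith(ext):
        out = output_file[: -len(ext)]
        break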
shell(
    "(picard -Xmx{res}g CollectMultipleMetrics "
    "I={snakemake.input.bam} "
    "O={out} "
    "R={snakemake.input.ref} "
    "{snakemake.params}{programs}) {log}"
)
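To make the command assembly concrete, the following standalone sketch reproduces the prefix derivation and the final command string without requiring a `snakemake` object. The function names `output_prefix` and `build_command` and the example paths are hypothetical. The programs are sorted here only to keep the result deterministic, which the wrapper itself does not do, and log redirection is left out.

```python
def output_prefix(first_output, extensions, default):
    """Strip a known metrics extension from the first output path."""
    for ext in extensions:
        if first_output.endswith(ext):
            return first_output[: -len(ext)]
    # No known extension matched: fall back to the default (the sample name)
    return default


def build_command(bam, ref, prefix, programs, params="", mem_gb=3):
    """Assemble a picard CollectMultipleMetrics call similar to the wrapper's."""
    program_args = " ".join(f"PROGRAM={p}" for p in sorted(programs))
    parts = [
        f"picard -Xmx{mem_gb}g CollectMultipleMetrics",
        f"I={bam}",
        f"O={prefix}",
        f"R={ref}",
        params.strip(),
        program_args,
    ]
    # Skip empty parts so that missing params do not leave double spaces
    return " ".join(part for part in parts if part)


prefix = output_prefix(
    "stats/a.alignment_summary_metrics",
    {".alignment_summary_metrics", ".insert_size_metrics"},
    default="a",
)
command = build_command(
    bam="mapped/a.bam",
    ref="genome.fasta",
    prefix=prefix,
    programs={"CollectInsertSizeMetrics", "CollectAlignmentSummaryMetrics"},
    params="VALIDATION_STRINGENCY=LENIENT ",
)
print(command)
```

Because a single `O=` prefix is passed and Picard appends each metrics extension to it, all requested output files are expected to share the same path prefix.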