.. _`bio/picard/collectmultiplemetrics`:

PICARD COLLECTMULTIPLEMETRICS
=============================

.. image:: https://img.shields.io/github/issues-pr/snakemake/snakemake-wrappers/bio/picard/collectmultiplemetrics?label=version%20update%20pull%20requests
   :target: https://github.com/snakemake/snakemake-wrappers/pulls?q=is%3Apr+is%3Aopen+label%3Abio/picard/collectmultiplemetrics

A ``picard`` meta-metrics tool that collects multiple classes of metrics.

You can select which tool(s) to run by adding the respective extension(s) (see table below) to the requested output of the wrapper invocation (see example Snakemake rule below).

+-----------------------------------+-----------------------------------------+
| Tool                              | Extension(s) for the output files       |
+===================================+=========================================+
| CollectAlignmentSummaryMetrics    | `.alignment_summary_metrics`            |
+-----------------------------------+-----------------------------------------+
| CollectInsertSizeMetrics          | `.insert_size_metrics`,                 |
|                                   |                                         |
|                                   | `.insert_size_histogram.pdf`            |
+-----------------------------------+-----------------------------------------+
| QualityScoreDistribution          | `.quality_distribution_metrics`,        |
|                                   |                                         |
|                                   | `.quality_distribution.pdf`             |
+-----------------------------------+-----------------------------------------+
| MeanQualityByCycle                | `.quality_by_cycle_metrics`,            |
|                                   |                                         |
|                                   | `.quality_by_cycle.pdf`                 |
+-----------------------------------+-----------------------------------------+
| CollectBaseDistributionByCycle    | `.base_distribution_by_cycle_metrics`,  |
|                                   |                                         |
|                                   | `.base_distribution_by_cycle.pdf`       |
+-----------------------------------+-----------------------------------------+
| CollectGcBiasMetrics              | `.gc_bias.detail_metrics`,              |
|                                   |                                         |
|                                   | `.gc_bias.summary_metrics`,             |
|                                   |                                         |
|                                   | `.gc_bias.pdf`                          |
+-----------------------------------+-----------------------------------------+
| RnaSeqMetrics                     | `.rna_metrics`                          |
+-----------------------------------+-----------------------------------------+
| CollectSequencingArtifactMetrics  | `.bait_bias_detail_metrics`,            |
|                                   |                                         |
|                                   | `.bait_bias_summary_metrics`,           |
|                                   |                                         |
|                                   | `.error_summary_metrics`,               |
|                                   |                                         |
|                                   | `.pre_adapter_detail_metrics`,          |
|                                   |                                         |
|                                   | `.pre_adapter_summary_metrics`          |
+-----------------------------------+-----------------------------------------+
| CollectQualityYieldMetrics        | `.quality_yield_metrics`                |
+-----------------------------------+-----------------------------------------+

**URL**: https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics

Example
-------

This wrapper can be used in the following way:

.. code-block:: python

    rule collect_multiple_metrics:
        input:
            bam="mapped/{sample}.bam",
            ref="genome.fasta",
        output:
            # Through the output file extensions the different tools for the metrics can be selected
            # so that it is not necessary to specify them under params with the "PROGRAM" option.
            # Usable extensions (and which tools they implicitly call) are listed here:
            #         https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html.
            multiext(
                "stats/{sample}",
                ".alignment_summary_metrics",
                ".insert_size_metrics",
                ".insert_size_histogram.pdf",
                ".quality_distribution_metrics",
                ".quality_distribution.pdf",
                ".quality_by_cycle_metrics",
                ".quality_by_cycle.pdf",
                ".base_distribution_by_cycle_metrics",
                ".base_distribution_by_cycle.pdf",
                ".gc_bias.detail_metrics",
                ".gc_bias.summary_metrics",
                ".gc_bias.pdf",
                ".rna_metrics",
                ".bait_bias_detail_metrics",
                ".bait_bias_summary_metrics",
                ".error_summary_metrics",
                ".pre_adapter_detail_metrics",
                ".pre_adapter_summary_metrics",
                ".quality_yield_metrics",
            ),
        # optional specification of memory usage of the JVM that snakemake will respect with global
        # resource restrictions (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources)
        # and which can be used to request RAM during cluster job submission as `{resources.mem_mb}`:
        # https://snakemake.readthedocs.io/en/latest/executing/cluster.html#job-properties
        resources:
            mem_mb=4096,
        log:
            "logs/picard/multiple_metrics/{sample}.log",
        params:
            # optional parameters
            # REF_FLAT is required if RnaSeqMetrics are used
            extra="--VALIDATION_STRINGENCY LENIENT --METRIC_ACCUMULATION_LEVEL null --METRIC_ACCUMULATION_LEVEL SAMPLE --REF_FLAT ref_flat.txt",
        wrapper:
            "v3.0.1/bio/picard/collectmultiplemetrics"

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash

    snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes
-----

* The `java_opts` param allows for additional arguments to be passed to the Java virtual machine (JVM), e.g. `-XX:ParallelGCThreads=10` (not for `-Xmx` or `-Djava.io.tmpdir`, since they are handled automatically).
* The `extra` param allows for additional program arguments.
* `--TMP_DIR` is automatically set by `resources.tmpdir`.

Software dependencies
---------------------

* ``picard=3.1.1``
* ``snakemake-wrapper-utils=0.6.2``

Input/Output
------------

**Input:**

* BAM file (.bam)
* FASTA reference sequence file (.fasta or .fa)

**Output:**

* multiple metrics text files (_metrics) AND
* multiple metrics pdf files (.pdf)
* the appropriate extensions for the output files must be used depending on the desired tools

Authors
-------

* David Laehnemann
* Antonie Vietor
* Filipe G. Vieira

Code
----

.. code-block:: python

    __author__ = "David Laehnemann, Antonie Vietor"
    __copyright__ = "Copyright 2020, David Laehnemann, Antonie Vietor"
    __email__ = "antonie.v@gmx.de"
    __license__ = "MIT"

    import tempfile
    from pathlib import Path
    from snakemake.shell import shell
    from snakemake_wrapper_utils.java import get_java_opts

    log = snakemake.log_fmt_shell(stdout=True, stderr=True)

    extra = snakemake.params.get("extra", "")
    java_opts = get_java_opts(snakemake)

    exts_to_prog = {
        ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
        ".insert_size_metrics": "CollectInsertSizeMetrics",
        ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
        ".quality_distribution_metrics": "QualityScoreDistribution",
        ".quality_distribution.pdf": "QualityScoreDistribution",
        ".quality_by_cycle_metrics": "MeanQualityByCycle",
        ".quality_by_cycle.pdf": "MeanQualityByCycle",
        ".base_distribution_by_cycle_metrics": "CollectBaseDistributionByCycle",
        ".base_distribution_by_cycle.pdf": "CollectBaseDistributionByCycle",
        ".gc_bias.detail_metrics": "CollectGcBiasMetrics",
        ".gc_bias.summary_metrics": "CollectGcBiasMetrics",
        ".gc_bias.pdf": "CollectGcBiasMetrics",
        ".rna_metrics": "RnaSeqMetrics",
        ".bait_bias_detail_metrics": "CollectSequencingArtifactMetrics",
        ".bait_bias_summary_metrics": "CollectSequencingArtifactMetrics",
        ".error_summary_metrics": "CollectSequencingArtifactMetrics",
        ".pre_adapter_detail_metrics": "CollectSequencingArtifactMetrics",
        ".pre_adapter_summary_metrics": "CollectSequencingArtifactMetrics",
        ".quality_yield_metrics": "CollectQualityYieldMetrics",
    }

    # Select programs to run from output files
    progs = set()
    for file in snakemake.output:
        matched = False
        for ext in exts_to_prog:
            if file.endswith(ext):
                progs.add(exts_to_prog[ext])
                matched = True
        if not matched:
            raise ValueError(
                "Unknown type of metrics file requested, for possible metrics files, see https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/picard/collectmultiplemetrics.html"
            )

    programs = "--PROGRAM null --PROGRAM " + " --PROGRAM ".join(progs)

    # Infer common output prefix
    output_file = str(snakemake.output[0])
    for ext in exts_to_prog:
        if output_file.endswith(ext):
            out = output_file[: -len(ext)]
            break

    with tempfile.TemporaryDirectory() as tmpdir:
        shell(
            "picard CollectMultipleMetrics"
            " {java_opts} {extra}"
            " --INPUT {snakemake.input.bam}"
            " --TMP_DIR {tmpdir}"
            " --OUTPUT {out}"
            " --REFERENCE_SEQUENCE {snakemake.input.ref}"
            " {programs}"
            " {log}"
        )

    # Under some circumstances, some picard programs might not produce an output
    # (https://github.com/snakemake/snakemake-wrappers/issues/357).
    # To avoid snakemake errors, the output files of those programs are created
    # empty (if they do not exist).
    for ext in [
        ext for ext, prog in exts_to_prog.items() if prog in ["CollectInsertSizeMetrics"]
    ]:
        for file in snakemake.output:
            if file.endswith(ext) and not Path(file).is_file():
                Path(file).touch()

.. |nl| raw:: html
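
The extension-based tool selection used by the wrapper code can be tried outside Snakemake with a small standalone sketch. This is an illustration under assumptions, not the wrapper itself: ``select_programs`` and ``common_prefix`` are hypothetical helper names, only a subset of the extension table is reproduced, and the programs are sorted here (the wrapper joins an unordered set) so that the resulting flags are deterministic.

```python
# Subset of the extension-to-program table from the wrapper code (illustration only).
EXTS_TO_PROG = {
    ".alignment_summary_metrics": "CollectAlignmentSummaryMetrics",
    ".insert_size_metrics": "CollectInsertSizeMetrics",
    ".insert_size_histogram.pdf": "CollectInsertSizeMetrics",
    ".quality_yield_metrics": "CollectQualityYieldMetrics",
}


def select_programs(outputs):
    """Hypothetical helper: picard programs implied by the requested output files."""
    progs = set()
    for path in outputs:
        matches = [prog for ext, prog in EXTS_TO_PROG.items() if path.endswith(ext)]
        if not matches:
            raise ValueError(f"Unknown type of metrics file requested: {path}")
        progs.update(matches)
    # Sorting is an assumption of this sketch; it only fixes the flag order.
    return sorted(progs)


def common_prefix(outputs):
    """Hypothetical helper: strip the known extension from the first output file."""
    first = str(outputs[0])
    for ext in EXTS_TO_PROG:
        if first.endswith(ext):
            return first[: -len(ext)]
    raise ValueError(f"Unknown type of metrics file requested: {first}")


outputs = [
    "stats/a.alignment_summary_metrics",
    "stats/a.insert_size_metrics",
    "stats/a.insert_size_histogram.pdf",
]
programs = select_programs(outputs)
flags = "--PROGRAM null --PROGRAM " + " --PROGRAM ".join(programs)
print(flags)
print(common_prefix(outputs))
```

With these three outputs, two programs are selected, because both insert-size extensions map to the same tool, and the value passed to ``--OUTPUT`` would be ``stats/a``.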