BISMARK_METHYLATION_EXTRACTOR

Call methylation counts from Bismark alignment results (see https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor).

URL:

Example

This wrapper can be used in the following way:

rule bismark_methylation_extractor:
    input: "bams/{sample}.bam"
    output:
        mbias_r1="qc/meth/{sample}.M-bias_R1.png",
        # Only for PE BAMS:
        # mbias_r2="qc/meth/{sample}.M-bias_R2.png",

        mbias_report="meth/{sample}.M-bias.txt",
        splitting_report="meth/{sample}_splitting_report.txt",

        # 1-based start, 1-based end ('inclusive') methylation info: % and counts
        methylome_CpG_cov="meth_cpg/{sample}.bismark.cov.gz",
        # BedGraph with methylation percentage: 0-based start, end exclusive
        methylome_CpG_mlevel_bedGraph="meth_cpg/{sample}.bedGraph.gz",

        # Primary output files: methylation status at each read cytosine position: (extremely large)
        read_base_meth_state_cpg="meth/CpG_context_{sample}.txt.gz",
        # * You could merge CHG, CHH using: --merge_non_CpG
        read_base_meth_state_chg="meth/CHG_context_{sample}.txt.gz",
        read_base_meth_state_chh="meth/CHH_context_{sample}.txt.gz"
    log:
        "logs/meth/{sample}.log"
    params:
        output_dir="meth",  # optional output dir
        extra="--gzip --comprehensive --bedGraph"  # optional params string
    wrapper:
        "v1.2.0/bio/bismark/bismark_methylation_extractor"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • bowtie2==2.4.2
  • bismark==0.23.0
  • samtools==1.9
  • perl-gdgraph==1.54

Input/Output

Input:

  • Input BAM file aligned by Bismark

Output:

  • Depends on bismark options passed to params.extra, optional for this wrapper
  • mbias_report: M-bias report, *.M-bias.txt (if key is provided, the out file will be renamed to this name)
  • mbias_r1: M-Bias plot for R1, *.M-bias_R1.png (if key is provided, the out file will be renamed to this name)
  • mbias_r2: M-Bias plot for R2, *.M-bias_R2.png (if key is provided, the out file will be renamed to this name)
  • splitting_report: Splitting report, *_splitting_report.txt (if key is provided, the out file will be renamed to this name)
  • methylome_CpG_cov: Bismark coverage file for CpG context, *.bismark.cov.gz (if key is provided, the out file will be renamed to this name)
  • methylome_CpG_mlevel_bedGraph: Bismark methylation level track, *.bedGraph.gz
  • read_base_meth_state_cpg: Per read CpG base methylation info, CpG_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
  • read_base_meth_state_chg: Per read CpG base methylation info, CHG_context_*.txt.gz (if key is provided, the out file will be renamed to this name)
  • read_base_meth_state_chh: Per read CpG base methylation info, CHH_context_*.txt.gz (if key is provided, the out file will be renamed to this name)

Params

  • output_dir: Output directory (current dir is used if not specified)
  • ignore: Number of bases to trim at 5’ end in R1 (see bismark_methylation_extractor documentation), optional argument
  • ignore_3prime: Number of bases to trim at 3’ end in R1 (see bismark_methylation_extractor documentation), optional argument
  • ignore_r2: Number of bases to trim at 5’ end in R2 (see bismark_methylation_extractor documentation), optional argument
  • ignore_3prime_r2: Number of bases to trim at 3’ end in R2 (see bismark_methylation_extractor documentation), optional argument
  • extra: Any additional args

Authors

  • Roman Cherniatchik

Code

"""Snakemake wrapper for Bismark methylation extractor tool: bismark_methylation_extractor."""
# https://github.com/FelixKrueger/Bismark/blob/master/bismark_methylation_extractor

__author__ = "Roman Chernyatchik"
__copyright__ = "Copyright (c) 2019 JetBrains"
__email__ = "roman.chernyatchik@jetbrains.com"
__license__ = "MIT"


import os
from snakemake.shell import shell

params_extra = snakemake.params.get("extra", "")
cmdline_args = ["bismark_methylation_extractor {params_extra}"]

# output dir
output_dir = snakemake.params.get("output_dir", "")
if output_dir:
    cmdline_args.append("-o {output_dir:q}")

# trimming options
trimming_options = [
    "ignore",  # meth_bias_r1_5end
    "ignore_3prime",  # meth_bias_r1_3end
    "ignore_r2",  # meth_bias_r2_5end
    "ignore_3prime_r2",  # meth_bias_r2_3end
]
for key in trimming_options:
    value = snakemake.params.get(key, None)
    if value:
        cmdline_args.append("--{} {}".format(key, value))

# Input
cmdline_args.append("{snakemake.input}")

# log
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
cmdline_args.append("{log}")

# run
shell(" ".join(cmdline_args))

key2prefix_suffix = [
    ("mbias_report", ("", ".M-bias.txt")),
    ("mbias_r1", ("", ".M-bias_R1.png")),
    ("mbias_r2", ("", ".M-bias_R2.png")),
    ("splitting_report", ("", "_splitting_report.txt")),
    ("methylome_CpG_cov", ("", ".bismark.cov.gz")),
    ("methylome_CpG_mlevel_bedGraph", ("", ".bedGraph.gz")),
    ("read_base_meth_state_cpg", ("CpG_context_", ".txt.gz")),
    ("read_base_meth_state_chg", ("CHG_context_", ".txt.gz")),
    ("read_base_meth_state_chh", ("CHH_context_", ".txt.gz")),
]

log_append = snakemake.log_fmt_shell(stdout=True, stderr=True, append=True)
for (key, (prefix, suffix)) in key2prefix_suffix:
    exp_path = snakemake.output.get(key, None)
    if exp_path:
        if len(snakemake.input) != 1:
            raise ValueError(
                "bismark/bismark_methylation_extractor: Error: only one BAM file is"
                " expected in input, but was <{}>".format(snakemake.input)
            )
        bam_file = snakemake.input[0]
        bam_name = os.path.basename(bam_file)
        bam_wo_ext = os.path.splitext(bam_name)[0]

        actual_path = os.path.join(output_dir, prefix + bam_wo_ext + suffix)
        if exp_path != actual_path:
            shell("mv {actual_path:q} {exp_path:q} {log_append}")