COOLTOOLS SADDLE


Calculate a saddle for a resolution in an .mcool file using a track

URL: https://github.com/open2c/cooltools

Example

This wrapper can be used in the following way:

rule cooltools_saddle:
    input:
        cooler="CN.mm9.1000kb.mcool",  ## Multiresolution cooler file
        track="CN_1000000.eigs.tsv",  ## Track file
        expected="CN_1000000.cis.expected.tsv",  ## Expected file
        view="mm9_view.txt",  ## File with the region names and coordinates
    output:
        saddle="CN_{resolution,[0-9]+}.saddledump.npz",
        digitized_track="CN_{resolution,[0-9]+}.digitized.tsv",
        fig="CN_{resolution,[0-9]+}.saddle.pdf",
    params:
        ## Add optional parameters
        range="--qrange 0.01 0.99",
        extra="",
    log:
        "logs/CN_{resolution}_saddle.log",
    wrapper:
        "v5.0.1/bio/cooltools/saddle"


# Note that in this test files are edited to remove

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Software dependencies

  • cooltools=0.7.0

Input/Output

Input:

  • a multiresolution cooler file (.mcool)

  • track file

  • expected file

  • (optional) view, a bed-style file with region coordinates and names to use for analysis

Output:

  • Saves a binary .npz file with the saddles and extra information about them, and a track file with digitized values. Can also save saddle plots using the extra --fig argument. All output files share the same prefix, taken from the first output argument, so giving one output argument is enough. The output can contain a {resolution} wildcard that specifies the resolution for the analysis, in which case the resolution does not need to be specified as a parameter.
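
The shared-prefix naming can be sketched in Python. This is an illustrative helper only, not part of the wrapper: `saddle_output_paths` is a hypothetical name, and the `.saddledump.npz` and `.digitized.tsv` suffixes are taken from the file names the wrapper code moves out of its temporary directory.

```python
from typing import Dict, Optional


def saddle_output_paths(prefix: str, fig_ext: Optional[str] = None) -> Dict[str, str]:
    """Return the files expected for one output prefix (hypothetical helper)."""
    paths = {
        "saddle": f"{prefix}.saddledump.npz",
        "digitized_track": f"{prefix}.digitized.tsv",
    }
    # A figure is only expected when a figure format was requested.
    if fig_ext:
        paths["fig"] = f"{prefix}.{fig_ext}"
    return paths


if __name__ == "__main__":
    for name, file_path in saddle_output_paths("out", fig_ext="pdf").items():
        print(f"{name}: {file_path}")
```

In the wrapper itself these prefixed files are then renamed to whatever paths the rule lists under output.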

Params

  • range: The range of track values to use, typically set to ignore outliers. --qrange 0 1 uses all data (default); --qrange 0.01 0.99 ignores the first and last percentiles; --range 0 5 uses values from 0 to 5.

  • resolution: Optional; can instead be specified as a wildcard in the output

  • extra: Any additional arguments to pass
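
How these parameters fall back to their defaults can be sketched as follows. This is a minimal illustration that uses plain dictionaries in place of the `snakemake.params` and `snakemake.wildcards` objects; `resolve_params` is a hypothetical name and is not part of the wrapper.

```python
from typing import Any, Dict, Mapping


def resolve_params(params: Mapping[str, Any], wildcards: Mapping[str, Any]) -> Dict[str, Any]:
    """Apply the defaults described above (hypothetical helper)."""
    # An explicit resolution parameter takes precedence over the wildcard.
    resolution = params.get("resolution", wildcards.get("resolution", 0))
    if not resolution:
        raise ValueError(
            "Please specify resolution either as a wildcard or as a parameter"
        )
    return {
        "resolution": resolution,
        # With no range given, all track values are used.
        "range": params.get("range", "--qrange 0 1"),
        "extra": params.get("extra", ""),
    }


if __name__ == "__main__":
    print(resolve_params({"range": "--qrange 0.01 0.99"}, {"resolution": "1000000"}))
```

A rule that sets neither the parameter nor the wildcard fails early with the ValueError instead of passing an invalid resolution to cooltools.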

Authors

  • Ilya Flyamer

Code

__author__ = "Ilya Flyamer"
__copyright__ = "Copyright 2022, Ilya Flyamer"
__email__ = "flyamer@gmail.com"
__license__ = "MIT"

from snakemake.shell import shell
from os import path
import tempfile

## Extract arguments
view = snakemake.input.get("view", "")
if view:
    view = f"--view {view}"

track = snakemake.input.get("track", "")
track_col_name = snakemake.params.get("track_col_name", "")
if track and track_col_name:
    track = f"{track}::{track_col_name}"

expected = snakemake.input.get("expected", "")
range = snakemake.params.get("range", "--qrange 0 1")
extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=False, stderr=True)

resolution = snakemake.params.get(
    "resolution", snakemake.wildcards.get("resolution", 0)
)
if not resolution:
    raise ValueError("Please specify resolution either as a wildcard or as a parameter")

## If a figure output is requested, pass its file extension to --fig
fig = snakemake.output.get("fig", "")
if fig:
    ext = path.splitext(fig)[1][1:]
    fig = f"--fig {ext}"

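## Write results under a fixed prefix in a temporary directory,
## then move them to the output paths requested in the rule
with tempfile.TemporaryDirectory() as tmpdir: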
    shell(
        "(cooltools saddle"
        " {snakemake.input.cooler}::resolutions/{resolution} "
        " {track} "
        " {expected} "
        " {view} "
        " {range} "
        " {fig} "
        " {extra} "
        " -o {tmpdir}/out)"
        " {log}"
    )

    shell("mv {tmpdir}/out.saddledump.npz {snakemake.output.saddle}")
    shell("mv {tmpdir}/out.digitized.tsv {snakemake.output.digitized_track}")
    if fig:
        shell("mv {tmpdir}/out.{ext} {snakemake.output.fig}")
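
To see what command line the code above produces for a given rule, the string assembly can be reproduced outside Snakemake. The following is a sketch under stated assumptions: `build_saddle_command` is a hypothetical helper that takes plain strings instead of the `snakemake` object, it omits log redirection, and it only builds the command without running cooltools.

```python
from os import path


def build_saddle_command(
    cooler: str,
    resolution: str,
    track: str,
    expected: str,
    out_prefix: str,
    track_col_name: str = "",
    view: str = "",
    value_range: str = "--qrange 0 1",
    fig: str = "",
    extra: str = "",
) -> str:
    """Assemble the cooltools saddle call as a string (hypothetical helper)."""
    # Select one resolution inside the multiresolution cooler file.
    parts = ["cooltools saddle", f"{cooler}::resolutions/{resolution}"]
    # A column name, if given, is appended to the track path after "::".
    parts.append(f"{track}::{track_col_name}" if track_col_name else track)
    parts.append(expected)
    if view:
        parts.append(f"--view {view}")
    parts.append(value_range)
    if fig:
        # Only the file extension of the figure path is passed to --fig.
        ext = path.splitext(fig)[1][1:]
        parts.append(f"--fig {ext}")
    if extra:
        parts.append(extra)
    parts.append(f"-o {out_prefix}")
    return " ".join(parts)


if __name__ == "__main__":
    print(
        build_saddle_command(
            cooler="CN.mm9.1000kb.mcool",
            resolution="1000000",
            track="CN_1000000.eigs.tsv",
            expected="CN_1000000.cis.expected.tsv",
            out_prefix="out",
            view="mm9_view.txt",
            value_range="--qrange 0.01 0.99",
            fig="CN_1000000.saddle.pdf",
        )
    )
```

Printing the command this way can help when checking which arguments a rule will pass before running the full workflow.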