MOFA2 TRAINING

Train a model on a multi-omic data set with default options.

URL: https://www.bioconductor.org/packages/release/bioc/html/MOFA2.html

Example

This wrapper can be used in the following way:

rule mofa2_training:
    input:
        "{data}.parquet",
    output:
        "{data}.hdf5",
    log:
        "log/{data}.log",
    params:
        scale_groups=False,  # set to True if groups have different ranges/variances
        scale_views=False,  # set to True if views have different ranges/variances
    wrapper:
        "v9.8.0/bio/mofa2/training"

Note that input, output and log file paths can be chosen freely.

When running with

snakemake --use-conda

the software dependencies will be automatically deployed into an isolated environment before execution.

Notes

All data, model, and training options other than scale_groups and scale_views are left at their MOFA2 defaults.
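
The wrapper overrides only scale_groups and scale_views when they appear in params. A minimal Python sketch of that pattern follows, assuming the options are held in a plain dictionary. The function name `apply_overrides` and the dictionary representation are illustrative only; the wrapper itself does this in R on the MOFA2 data options.

```python
# Only these two keys may be overridden (hypothetical constant for this sketch).
ALLOWED_OVERRIDES = ("scale_groups", "scale_views")


def apply_overrides(defaults, params, allowed=ALLOWED_OVERRIDES):
    """Return a copy of `defaults` in which only the allowed keys that are
    present in `params` are replaced; every other option keeps its default."""
    options = dict(defaults)
    for key in allowed:
        if key in params:
            options[key] = params[key]
    return options


# Starting values mirror the False settings shown in the example rule.
defaults = {"scale_groups": False, "scale_views": False}
print(apply_overrides(defaults, {"scale_views": True}))
```

Keys outside the allowed list are ignored in this sketch, which matches the note that everything else stays at its default.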

Software dependencies

bioconductor-mofa2=1.20.2
r-arrow=24.0.0
mofapy2=0.7.4

Input/Output

Input:

A parquet file in tidy (long) format with the columns sample, feature, view, group (optional), and value:

sample: The name of the sample

feature: The name of the observed feature

group (optional, advanced): The group the sample belongs to. Discouraged for beginners: the multi-group framework does not aim to capture differential changes in mean levels between groups (as in, for example, differential RNA expression analysis). Its goal is to compare the sources of variability that drive each group.

value: The observed value

view: The view the observed feature is grouped into
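
As a rough illustration of this layout, the following Python sketch flattens per-view measurements into long-format records with the columns listed above and checks them. The helper names (`to_long_records`, `check_columns`), the nested-dictionary input shape, and the toy values are assumptions made for this example and are not part of the wrapper. Rejecting extra columns is a conservative choice of the sketch.

```python
REQUIRED = ("sample", "feature", "view", "value")
OPTIONAL = ("group",)


def to_long_records(views, groups=None):
    """Flatten {view: {sample: {feature: value}}} into tidy records.

    `groups` is an optional {sample: group} mapping; when given, a `group`
    entry is added to every record.
    """
    records = []
    for view, samples in views.items():
        for sample, features in samples.items():
            for feature, value in features.items():
                record = {
                    "sample": sample,
                    "feature": feature,
                    "view": view,
                    "value": value,
                }
                if groups is not None:
                    record["group"] = groups[sample]
                records.append(record)
    return records


def check_columns(records):
    """Raise ValueError if a record lacks a required column or has an unknown one."""
    allowed = set(REQUIRED) | set(OPTIONAL)
    for i, record in enumerate(records):
        missing = set(REQUIRED) - set(record)
        unknown = set(record) - allowed
        if missing or unknown:
            raise ValueError(
                f"record {i}: missing={sorted(missing)}, unknown={sorted(unknown)}"
            )


# Hypothetical toy data: two views measured on two samples.
views = {
    "rna": {"s1": {"geneA": 1.5, "geneB": 0.2}, "s2": {"geneA": 0.7, "geneB": 1.1}},
    "protein": {"s1": {"protX": 3.0}, "s2": {"protX": 2.4}},
}
records = to_long_records(views)
check_columns(records)
print(len(records))
```

With pandas and pyarrow installed, `pandas.DataFrame(records).to_parquet("data.parquet")` would write such records to a parquet file; the file name here is only a placeholder.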

Output:

An HDF5 file containing the trained model.

Params

- scale_groups: set to `True` if groups have different ranges/variances

- scale_views: set to `True` if views have different ranges/variances
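
To give an intuition for what scaling means here, the conceptual Python sketch below rescales each view to unit variance by dividing its values by the view's standard deviation. This illustrates the general idea only and is not a copy of MOFA2's internal code; the function name `scale_to_unit_variance` and the toy values are made up for the example.

```python
from statistics import pstdev


def scale_to_unit_variance(values_by_view):
    """Divide each view's values by that view's population standard deviation."""
    scaled = {}
    for view, values in values_by_view.items():
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"view {view!r} has zero variance")
        scaled[view] = [v / sd for v in values]
    return scaled


# Hypothetical views measured on very different scales.
raw = {"rna": [0.1, 0.2, 0.3, 0.4], "protein": [100.0, 200.0, 300.0, 400.0]}
scaled = scale_to_unit_variance(raw)
for view, values in scaled.items():
    # After scaling, both views have a standard deviation of about 1.
    print(view, round(pstdev(values), 6))
```

The same idea carries over to scale_groups, with groups taking the place of views.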

Authors

Simon Sack

Code

#!/bin/R

# load libraries
library(MOFA2)
library(arrow)

# connect to conda environment
conda_prefix <- Sys.getenv("CONDA_PREFIX")
reticulate::use_condaenv(conda_prefix)

# if log file is provided, write log to that file
if (length(snakemake@log) > 0) {
  log <- file(snakemake@log[[1]], open = "wt")
  sink(log)
  sink(log, type = "message")
}

# load the long-format data frame from the parquet file; expected columns:
# `sample, feature, view, group (optional), value`

# cast input path as character to avoid errors
path <- as.character(snakemake@input[[1]])

df <- read_parquet(path)

mofa_object <- create_mofa(df)

data_opts <- get_default_data_options(mofa_object)
model_opts <- get_default_model_options(mofa_object)
train_opts <- get_default_training_options(mofa_object)

# data options that can be overridden via params: scale_groups, scale_views
if ("scale_groups" %in% names(snakemake@params)) {
  data_opts$scale_groups <- snakemake@params[["scale_groups"]]
}

if ("scale_views" %in% names(snakemake@params)) {
  data_opts$scale_views <- snakemake@params[["scale_views"]]
}

# prepare the MOFA object with the data, model, and training options
mofa_object <- prepare_mofa(
  object = mofa_object,
  data_options = data_opts,
  model_options = model_opts,
  training_options = train_opts
)

# train the MOFA model and write the result to `outfile`
run_mofa(
  mofa_object,
  snakemake@output[[1]]
)