MOFA2 TRAINING
Train a model on a multi-omic data set with default options.
URL: https://www.bioconductor.org/packages/release/bioc/html/MOFA2.html
Example
This wrapper can be used in the following way:
rule mofa2_training:
    input:
        "{data}.parquet",
    output:
        "{data}.hdf5",
    log:
        "log/{data}.log",
    params:
        scale_groups=False,  # set to True if groups have different ranges/variances
        scale_views=False,  # set to True if views have different ranges/variances
    wrapper:
        "v9.8.0/bio/mofa2/training"
Note that input, output and log file paths can be chosen freely.
When running with
snakemake --use-conda
the software dependencies will be automatically deployed into an isolated environment before execution.
Notes
Apart from scale_groups and scale_views, all data, model and training options are left at their MOFA2 default values.
Software dependencies
bioconductor-mofa2=1.20.2
r-arrow=24.0.0
mofapy2=0.7.4
Input/Output
Input:
A parquet file in tidy format containing data with the headers: sample, feature, view, group (optional), value
sample: The name of the sample
feature: The name of the observed feature
group (optional, advanced): Discouraged for beginners. The aim of the multi-group framework is not to capture differential changes in mean levels between the groups (as, for example, in differential RNA expression analysis). Instead, the goal is to compare the sources of variability that drive each group.
value: The observed value
view: The view the observed feature is grouped into
Output:
An HDF5-file with the trained model.
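The tidy layout described above can be illustrated with a minimal sketch. It assumes a hypothetical wide-format input (a nested dict of view, feature and sample) and reshapes it into one row per observation with the required headers. The names `wide`, `to_tidy` and `tidy_rows` are made up for illustration and are not part of the wrapper; the optional group column is left out.

```python
# Hypothetical wide-format data: view -> feature -> {sample: value}.
wide = {
    "rna": {
        "gene_a": {"s1": 1.2, "s2": 0.7},
        "gene_b": {"s1": 3.4, "s2": 2.9},
    },
    "protein": {
        "prot_x": {"s1": 0.5, "s2": 0.9},
    },
}

# Required headers of the tidy input (the optional "group" column is omitted).
COLUMNS = ("sample", "feature", "view", "value")


def to_tidy(wide_data):
    """Flatten the nested dict into one row per (sample, feature, view)."""
    rows = []
    for view, features in wide_data.items():
        for feature, samples in features.items():
            for sample, value in samples.items():
                rows.append(
                    {"sample": sample, "feature": feature, "view": view, "value": value}
                )
    return rows


tidy_rows = to_tidy(wide)
for row in tidy_rows:
    print(row)
```

Rows of this shape could then be written to a parquet file, for example with pandas (`pandas.DataFrame(tidy_rows).to_parquet(...)`, which needs a parquet engine such as pyarrow). That step is an assumption about the surrounding workflow and is not performed by this wrapper.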
Params
- scale_groups: set to `True` if groups have different ranges/variances
- scale_views: set to `True` if views have different ranges/variances
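To give a feel for when scaling matters, here is a small conceptual sketch. It assumes that scaling a view amounts to dividing its values by their standard deviation so that views end up with comparable variance. This is an illustration of the idea only, not MOFA2's actual implementation, and the view names and numbers are invented.

```python
import statistics


def scale_view(values):
    """Divide all values of one view by their sample standard deviation."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


# Two hypothetical views measured on very different scales.
views = {
    "rna": [10.0, 250.0, 40.0, 900.0],
    "methylation": [0.1, 0.8, 0.4, 0.6],
}

# Spread before scaling: the views differ by orders of magnitude.
for name, vals in views.items():
    print("raw", name, statistics.stdev(vals))

# Spread after scaling: each view has unit standard deviation.
scaled = {name: scale_view(vals) for name, vals in views.items()}
for name, vals in scaled.items():
    print("scaled", name, statistics.stdev(vals))
```

If the raw spreads are already similar, leaving both parameters at `False` keeps the data untouched.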
Code
#!/bin/R
# load libraries
library(MOFA2)
library(arrow)
# connect to conda environment
conda_prefix <- Sys.getenv("CONDA_PREFIX")
reticulate::use_condaenv(conda_prefix)
# if a log file is provided, redirect output and messages to that file
if (length(snakemake@log) > 0) {
    log <- file(snakemake@log[[1]], open = "wt")
    sink(log)
    sink(log, type = "message")
}
# load long data frame from parquet file with the following headers:
# `sample, feature, view, group (optional), value`
# cast input path as character to avoid errors
path <- as.character(snakemake@input[[1]])
df <- read_parquet(path)
mofa_object <- create_mofa(df)
data_opts <- get_default_data_options(mofa_object)
model_opts <- get_default_model_options(mofa_object)
train_opts <- get_default_training_options(mofa_object)
# data options taken from the rule's params: scale_groups, scale_views
if ("scale_groups" %in% names(snakemake@params)) {
    data_opts$scale_groups <- snakemake@params[["scale_groups"]]
}
if ("scale_views" %in% names(snakemake@params)) {
    data_opts$scale_views <- snakemake@params[["scale_views"]]
}
# prepare the MOFA object with the selected data, model and training options
mofa_object <- prepare_mofa(
    object = mofa_object,
    data_options = data_opts,
    model_options = model_opts,
    training_options = train_opts
)
# train the MOFA model and write the result to the output file
run_mofa(
    mofa_object,
    snakemake@output[[1]]
)