This repository provides MOSA, a bespoke Variational Autoencoder (VAE) that integrates the molecular and phenotypic data sets available for cancer cell lines.
- Clone this repository
- Create a Python 3.10 environment, e.g. `conda create -n mosa python=3.10`
- Activate the environment: `conda activate mosa`
- Run `pip install -r requirements.txt`
- Install shap from https://github.com/ZhaoxiangSimonCai/shap, which is customised to support the data format used in MOSA (see the example after this list)
- Run `pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118`
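One common way to install the customised shap fork, assuming pip can install it directly from GitHub (alternatively, clone the repository and run `pip install .` inside it):

```bash
pip install git+https://github.com/ZhaoxiangSimonCai/shap.git
```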
Installation time depends largely on internet speed, since the packages must be downloaded. Typically, installation takes less than 10 minutes.
- Download the data files from the figshare repository (see links in the manuscript)
- Configure the paths of the data files in `reports/vae/files/hyperparameters.json` (a sketch of the relevant entries is shown below)
- Run MOSA with `python PhenPred/vae/Main.py`
The expected output, including the latent space matrix and reconstructed data matrices, can be downloaded from the figshare repository as described in the paper.
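The exact key names are defined in the shipped `hyperparameters.json`; purely as an illustration (the `datasets` key and the omic names below are hypothetical), the idea is to map each omic layer to a local file path. The `skip_benchmarks` flag, used later in this README, is shown alongside:

```json
{
  "datasets": {
    "transcriptomics": "reports/vae/files/transcriptomics.csv",
    "proteomics": "reports/vae/files/proteomics.csv"
  },
  "skip_benchmarks": false
}
```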
As a deep learning-based method, the runtime of MOSA depends on whether a GPU is available for training. MOSA took 52 minutes to train and generate the results using a V100 GPU on the DepMap dataset.
Although MOSA is specifically designed for analysing the DepMap dataset, the model can be adapted to any multi-omic dataset. To use MOSA with custom datasets:
- Prepare the custom dataset following the format of the DepMap data, which can be downloaded from the figshare repositories described in the manuscript
- Configure the paths of the data files in `reports/vae/files/hyperparameters.json`; at least two omic datasets are required
- Run MOSA with `python PhenPred/vae/Main.py`
- If certain benchmark analyses cannot be run properly, MOSA can be run with `skip_benchmarks=true` set in `hyperparameters.json` to only save the output data, which includes the integrated latent space matrix and the reconstructed data for each omic
- To further customise data pre-processing, create your own dataset class following the style of `PhenPred/vae/DatasetDepMap23Q2.py`, and then use the custom dataset class in `Main.py` (a minimal sketch is given after this list)
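Purely as an illustration, a custom dataset class might look like the sketch below; the class name, constructor arguments and return format are hypothetical, and the actual interface expected by `Main.py` should be copied from `PhenPred/vae/DatasetDepMap23Q2.py`:

```python
# Hypothetical sketch of a custom multi-omic dataset class, loosely
# following the style of PhenPred/vae/DatasetDepMap23Q2.py; names and
# interface are illustrative, not the actual MOSA API.
import pandas as pd
import torch
from torch.utils.data import Dataset


class DatasetCustom(Dataset):
    def __init__(self, omics_paths):
        # omics_paths: dict mapping omic name -> CSV path
        # (samples as rows, features as columns); MOSA requires
        # at least two omic datasets.
        self.dfs = {n: pd.read_csv(p, index_col=0) for n, p in omics_paths.items()}

        # Keep only the samples present in every omic layer.
        samples = sorted(set.intersection(*(set(df.index) for df in self.dfs.values())))
        self.samples = samples
        self.dfs = {n: df.loc[samples] for n, df in self.dfs.items()}

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # One float32 tensor per omic layer for the idx-th sample.
        return {n: torch.tensor(df.iloc[idx].values, dtype=torch.float32)
                for n, df in self.dfs.items()}
```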
- Download the data from figshare
- Place the downloaded files in `reports/vae/files/`
- In `Main.py`, configure MOSA to run from the pre-computed data with `hyperparameters = Hypers.read_hyperparameters(timestamp="20231023_092657")` (see the snippet after this list)
- Directly run MOSA with the default configurations as described above.
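For context, the relevant change in `Main.py` is a single line (`Hypers` is the hyperparameter helper already used by `Main.py`; the timestamp identifies the pre-computed run downloaded from figshare):

```python
# Load the hyperparameters of the pre-computed run instead of training
# from scratch; "20231023_092657" is the run shipped via figshare.
hyperparameters = Hypers.read_hyperparameters(timestamp="20231023_092657")
```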
To incorporate disentanglement learning, two additional terms are included in the loss function, following the Disentangled Inferred Prior Variational Autoencoder (DIP-VAE) approach, as described by Kumar et al. (2018):
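Following Kumar et al. (2018), writing $C$ for the covariance matrix of the inferred prior (the covariance of the encoder means $\mu_\phi(x)$ for type `"i"`, and of the full latent posterior for type `"ii"`), the two added terms are

$$\lambda_{od} \sum_{i \neq j} C_{ij}^{2} + \lambda_{d} \sum_{i} \left( C_{ii} - 1 \right)^{2},$$

which push $C$ towards the identity matrix: off-diagonal entries (correlations between latent dimensions) towards zero and diagonal entries towards one.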
To use this, update the `hyperparameters.json` file by specifying `dip_vae_type` as either `"i"` or `"ii"` (type ii is recommended), and define the parameters `lambda_d` and `lambda_od` as float values, which control the diagonal and off-diagonal regularization, respectively.
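For example, the relevant entries in `hyperparameters.json` could look as follows (the lambda values are illustrative placeholders, not recommended settings):

```json
{
  "dip_vae_type": "ii",
  "lambda_d": 0.05,
  "lambda_od": 0.1
}
```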
The pre-trained models can be downloaded from the Hugging Face model hub: MOSA
Cai, Z. et al. Synthetic multi-omics augmentation of cancer cell lines using unsupervised deep learning (2023).