dyspyosis is a Python package for computing dysbiosis scores using autoencoder-based anomaly detection. Further details on this method are available here.
Before installing dyspyosis, make sure the CUDA Toolkit v11.x and a matching cuDNN are installed; TensorFlow requires both. The exact versions depend on your hardware. For a GTX 10XX card, for example, you need CUDA Toolkit 11.2 with cuDNN 8.1.1, while more recent cards can use more recent versions.
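To check whether TensorFlow can actually use your GPU after installing CUDA and cuDNN, you can ask it which GPUs it sees. This is a minimal sketch that uses TensorFlow's own `tf.config.list_physical_devices`; the helper name `visible_gpus` is hypothetical and not part of dyspyosis.

```python
import importlib.util


def visible_gpus():
    """Return the GPUs TensorFlow can see, or an empty list.

    Hypothetical helper (not part of dyspyosis). Returns [] when TensorFlow
    is not installed, so the check can run on any machine.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return []  # TensorFlow is not installed; nothing to query
    import tensorflow as tf

    # An empty list usually means no GPU is present or CUDA/cuDNN could not be loaded
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

An empty list on a machine with a GPU is a hint that the CUDA/cuDNN versions do not match the installed TensorFlow.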
Next, install dyspyosis using the command below.
pip install dyspyosis
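To confirm which version of dyspyosis ended up in your environment, the standard library's `importlib.metadata` can be queried. A small sketch; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="dyspyosis"):
    """Return the installed version string of `package`, or None if it is missing.

    Hypothetical helper for illustration; it only reads installed package metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("dyspyosis version:", installed_version() or "not installed")
```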
Below is an example of how to use the dyspyosis package. Note that it is meant for testing: its parameters have been set so that the script completes quickly. For real data, increase rarefication_count (the number of times each sample is rarefied) to a large number, so that the number of samples × rarefication_count exceeds 10k, and increase the number of epochs to 4000.
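The sizing rule above can be turned into a quick calculation, and the sketch below also shows what rarefying a single sample means in general: randomly drawing a fixed number of reads without replacement. Both helpers (`min_rarefication_count`, `rarefy`) are hypothetical illustrations of the general technique, not dyspyosis's internal implementation, which may differ.

```python
import numpy as np


def min_rarefication_count(n_samples, target=10_000):
    """Smallest rarefication_count for which n_samples * rarefication_count > target."""
    return target // n_samples + 1


def rarefy(counts, depth, rng=None):
    """Randomly subsample one sample's counts to exactly `depth` reads (no replacement).

    Generic rarefaction sketch; not taken from dyspyosis.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    # Drawing `depth` reads without replacement from all reads in the sample is a
    # multivariate hypergeometric draw over the per-genus counts.
    return rng.multivariate_hypergeometric(counts, depth)


if __name__ == "__main__":
    print(min_rarefication_count(200))  # 51, since 200 * 51 = 10,200 > 10,000
    print(rarefy([100, 50, 0, 850], depth=500, rng=np.random.default_rng(42)))
```

Each rarefied copy of a sample keeps the same total depth, so repeating the draw rarefication_count times yields several slightly different versions of every sample.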
The encode_dim parameter sets the size of the latent space. It has been found to work best between 4 and 8, depending on the number of genera in the input data, with lower encode_dim values working better for fewer genera.
The loss, the main metric for dysbiosis, can be computed using compute_loss(), while the latent space can be accessed using get_latent(). See the example below.
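To make "loss" and "latent space" concrete, here is a deliberately simplified stand-in: a linear autoencoder trained with squared error learns the same subspace as PCA, so its encoder and decoder can be taken directly from an SVD. dyspyosis itself trains a (non-linear) TensorFlow autoencoder, so this sketch only illustrates the idea: samples that the model reconstructs poorly get a high loss. The function name `linear_autoencoder_scores` is hypothetical.

```python
import numpy as np


def linear_autoencoder_scores(X, encode_dim):
    """Return (per-sample reconstruction loss, latent coordinates).

    Simplified stand-in for an autoencoder, NOT dyspyosis's model: the optimal
    linear autoencoder under squared error spans the top principal components.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are principal directions; keep the top `encode_dim` as the latent axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:encode_dim]
    latent = centered @ components.T                 # "encoder": project onto latent space
    reconstruction = latent @ components + mean      # "decoder": map back to input space
    loss = ((X - reconstruction) ** 2).mean(axis=1)  # per-sample mean squared error
    return loss, latent
```

Samples that resemble the bulk of the training data lie close to the learned subspace and get a low loss; an unusual sample does not, which is the anomaly-detection idea behind using the loss as a dysbiosis score.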
Note: Depending on your system, you might need to set the environment variable CUDA_VISIBLE_DEVICES to "0" before loading dyspyosis to use the GPU. Try this if CUDA is installed but you get an error that no CUDA device was found.
Note: The neural network dyspyosis is based on is relatively small. Depending on the complexity of your dataset and the size of the latent space, running dyspyosis on the CPU might outperform the GPU (see benchmarks)! To do so, set CUDA_VISIBLE_DEVICES to "-1" and CUDA_DEVICE_ORDER to "PCI_BUS_ID" in your environment before launching dyspyosis.
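Both notes come down to setting CUDA environment variables before TensorFlow initializes. A minimal sketch of doing this from Python, assuming the variables are set at the very top of your script, before dyspyosis (and therefore TensorFlow) is imported; `configure_device` is a hypothetical helper, not part of dyspyosis.

```python
import os


def configure_device(use_gpu):
    """Set CUDA environment variables; call BEFORE importing dyspyosis/TensorFlow.

    Hypothetical helper. use_gpu=True exposes only GPU "0"; use_gpu=False hides
    all GPUs so TensorFlow falls back to the CPU.
    """
    if use_gpu:
        os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU
    else:
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # no visible GPUs -> CPU only
```

Call `configure_device(use_gpu=False)` first and only then run `from dyspyosis import Dyspyosis`; setting the variables after TensorFlow has been imported typically has no effect.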
import pandas as pd
from dyspyosis import Dyspyosis

if __name__ == "__main__":
    df = pd.read_table("./data/test.tsv", index_col=0)

    dyspyosis = Dyspyosis(
        df.values,
        labels=df.index.tolist(),
        rarefication_depth=5000,
        rarefication_count=10,  # increase for real data (samples x count > 10k)
        encode_dim=4,           # size of the latent space
    )

    dyspyosis.run_training(epochs=5)  # use 4000 epochs for real data

    # The loss is the main dysbiosis metric
    loss = dyspyosis.compute_loss()
    loss.to_csv("./data/loss_out.tsv", sep="\t", index=False)

    # Latent-space representation learned by the autoencoder
    latent = dyspyosis.get_latent()
    latent.to_csv("./data/latent_out.tsv", sep="\t", index=False)
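If you want to try the example without the repository's ./data/test.tsv, you can generate a random table first. This sketch assumes the input layout implied by the example (rows are samples with their IDs in the first column, columns are genera, tab-separated) and gives every sample more reads than rarefication_depth=5000. The function name and all generated values are hypothetical test data.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def write_synthetic_counts(path, n_samples=50, n_genera=30, min_reads=6000, seed=0):
    """Write a random samples x genera count table (tab-separated, first column = sample IDs).

    Hypothetical test data only; the layout is an assumption based on the example.
    """
    rng = np.random.default_rng(seed)
    # Skewed genus proportions per sample, and a read depth of at least `min_reads`
    proportions = rng.dirichlet(np.ones(n_genera) * 0.5, size=n_samples)
    depths = rng.integers(min_reads, 2 * min_reads, size=n_samples)
    counts = np.vstack([rng.multinomial(d, p) for d, p in zip(depths, proportions)])
    df = pd.DataFrame(
        counts,
        index=[f"sample_{i}" for i in range(n_samples)],
        columns=[f"genus_{j}" for j in range(n_genera)],
    )
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, sep="\t")
    return df
```

For instance, `write_synthetic_counts("./data/test.tsv")` creates a file that `pd.read_table(..., index_col=0)` loads back as a 50 × 30 table.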
There are two benchmark scripts included in the repository: benchmark_cpu.py and benchmark_gpu.py. When running the CPU benchmark, it is important to set two environment variables before running the code: CUDA_VISIBLE_DEVICES must be "-1" and CUDA_DEVICE_ORDER must be "PCI_BUS_ID". This ensures that the CPU benchmark actually runs on the CPU even when a GPU is available.
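One way to guarantee the variables are in place before TensorFlow starts is to launch the benchmark as a child process with a modified environment. A sketch, assuming benchmark_cpu.py is in the current working directory (for example, the repository root); the helper names are hypothetical.

```python
import os
import subprocess
import sys


def cpu_only_env():
    """Copy of the current environment with all GPUs hidden from CUDA/TensorFlow."""
    env = os.environ.copy()  # leave this process's own environment untouched
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


def run_cpu_benchmark(script="benchmark_cpu.py"):
    """Run the CPU benchmark in a child process so the variables apply from startup."""
    return subprocess.run([sys.executable, script], env=cpu_only_env(), check=True)
```

Using a copy of `os.environ` keeps the calling process unaffected, which is convenient when the GPU benchmark is launched from the same script.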
Below are some results from running dyspyosis on hardware we have access to.
| Type | Hardware | Epochs | Time (s) |
|---|---|---|---|
| CPU | Intel i5-7500 @ 3.4 GHz | 100 | 185.0017 |
| CPU | AMD Ryzen 7 3700X | 100 | 115.1882 |
| GPU | NVIDIA GeForce GTX 1060 6GB | 100 | 691.4091 |
| GPU | NVIDIA GeForce RTX 4080 16GB | 100 | 340.6128 |
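The table is easier to compare as time per epoch and relative to the fastest machine. The snippet below only rearranges the numbers from the table; the dictionary and function names are illustrative.

```python
# Timings copied from the benchmark table (100 epochs per run)
BENCHMARK_SECONDS = {
    "Intel i5-7500 @ 3.4 GHz (CPU)": 185.0017,
    "AMD Ryzen 7 3700X (CPU)": 115.1882,
    "NVIDIA GeForce GTX 1060 6GB (GPU)": 691.4091,
    "NVIDIA GeForce RTX 4080 16GB (GPU)": 340.6128,
}
EPOCHS = 100


def per_epoch_seconds():
    """Average wall-clock time per training epoch for each machine."""
    return {hw: s / EPOCHS for hw, s in BENCHMARK_SECONDS.items()}


def relative_to_fastest():
    """How many times longer each machine took than the fastest one."""
    fastest = min(BENCHMARK_SECONDS.values())
    return {hw: s / fastest for hw, s in BENCHMARK_SECONDS.items()}


if __name__ == "__main__":
    per_epoch = per_epoch_seconds()
    for hw, ratio in sorted(relative_to_fastest().items(), key=lambda kv: kv[1]):
        print(f"{hw:36s} {per_epoch[hw]:.3f} s/epoch  {ratio:.2f}x the fastest")
```

With these numbers, the GTX 1060 needs about 6.0 times as long as the Ryzen 7 3700X, and the RTX 4080 about 3.0 times as long. This applies to the benchmark settings only; other datasets and latent sizes may behave differently.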
To recreate the environment the main developers use, install the exact versions of all packages listed in requirements.txt.
Clone the repository, create a virtual environment, and install all requirements first. Additionally, make sure the CUDA Toolkit v11.x and a matching cuDNN are installed, as TensorFlow requires them. The exact versions depend on your hardware. For a GTX 10XX card, for example, you need CUDA Toolkit 11.2 with cuDNN 8.1.1, while more recent cards can use more recent versions.
git clone https://github.com/raeslab/dyspyosis
cd dyspyosis
python -m venv venv
source venv/bin/activate
pip install -r docs/dev/requirements.txt
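Before installing the pinned requirements, it can help to confirm that the virtual environment is really active, so the packages do not end up in the system interpreter. A minimal standard-library check; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from the base interpreter's)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```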
To run the tests, use the command below. TensorFlow triggers a number of deprecation warnings, which can be suppressed with --disable-warnings.
pytest tests/ --disable-warnings --cov=src --cov-report=term-missing --cov-report=xml
Any contributions you make are greatly appreciated.
- Found a bug or have some suggestions? Open an issue.
- Pull requests are welcome! Please open an issue first, though, to discuss the features or changes you wish to implement.
dyspyosis was developed by Sebastian Proost at the RaesLab (part of VIB and KU Leuven). dyspyosis is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
For commercial access inquiries, please contact Jeroen Raes.