Larvae

Small programmes to load, process and classify larvae data in an unsupervised fashion.

Prerequisites

Python 3.x
numpy v.1.14.0
pandas v.0.23.0
tensorflow-gpu v.1.7.0
keras v.2.1.6
sklearn v.0.19.1
tables v.3.4.3
matplotlib v.2.2.2
scipy v.1.1.0
progress v.1.4
mpi4py v.3.0.0
OpenMPI v.3.1.1
Singularity v.2.5.2
CUDA v.9.0
cuDNN v.7.0 \

Loading

First, we need to load and preprocess larvae data. Three different loaders exist:

loader.py : to load and process data using a window of any length
loader_lwft.py : to load and process data using a layered window Fourier transform
loader_mpi.py : same as loader.py, compatible with MPI

All of these store data in .hdf5 files.

Usage :
For loader.py :

python loader.py --path=/path/to/the/screen --save_dir=/path/to/save/dir --lines=line1,line2,...lineN --window=1 --screen=t15

For loader_lwft.py :

python loader_lwft.py --path=/path/to/the/screen --features=all --save_dir=/path/to/save/dir --lines=line1,line2,...lineN --window=1 --screen=t15

For loader_mpi.py :

mpiexec -np 4 python loader_mpi.py --path=/path/to/the/screen --save_dir=/path/to/save/dir --lines=line1,line2,...lineN --window=1 --screen=t15

IDEC

The main model we use for our computations is based on the IDEC algorithm (Improved Deep Embedded Clustering with Local Structure Preservation by Guo et al.).

Usage : With data already transformed and stored in a hdf5 file:

python IDEC_GPU.py --dataset=/path/to/data.hdf5 --n_clusters=x --maxiter=200000 --update_interval=500 --save_dir=/path/to/save_dir --epochs=100

If you also have pretrained weights for the autoencoder:

python IDEC_GPU.py --dataset=/path/to/data.hdf5 --n_clusters=x --maxiter=200000 --update_interval=500 --save_dir=/path/to/save_dir --ae_weights=path/to/pretrained/ae.h5

To resume training :

python IDEC_GPU.py --dataset=/path/to/data.hdf5 --n_clusters=x --maxiter=200000 --update_interval=500 --save_dir=/path/to/save_dir --idec_weights=path/to/weights.h5

This implementation is based on the one of Guo et al. (see https://github.com/XifengGuo/IDEC)

Singularity

To allow to perform computations on a HPC cluster, we use encapsulated environments using Singularity. We provide here two recipes from which you can create environments that are fully compatible with our programmes.

To create a new sandbox environment with tensorflow-gpu :

singularity build --sandbox tensorflow_gpu.img tensorflow_gpu

Once the container built and this repo imported in it, you can perform computations. For example, on TARS, the Pasteur cluster, we use:

singularity exec --writable --nv -H $HOME:/home/$USER tensorflow_gpu.img/ python /Larvae/IDEC/IDEC_GPU.py --dataset=/Larvae/path/to/dataset.hdf5 --n_clusters=6 --maxiter=800000 --update_interval=500 --save_dir=/Larvae/save_dir --epochs=50

On a pretty large line (typically GMR_72F11), we determined empirically that 500,000 < maxiter < 800,000, epochs=50 and update_interval=500 are satisfying values to obtain convergence.

However, on smaller lines, the update_interval may be reduced to 140, which is the value used in the paper. For more information on the arguments:

singularity exec --writable --nv -H $HOME:/home/$USER tensorflow_gpu.img/ python /Larvae/IDEC/IDEC_GPU.py -h

The other recipe allows the use MPI thanks to mpi4py to perform parallel computations on several nodes and speed up loading and processing of data. Usage example:

mpiexec -np 2 singularity exec --writable -H $HOME:/home/$USER -B /pasteur/projets/policy02/Larva-Screen/screens/:/screen centos7_mpi4py.img/ python /Larvae/utils/loader_mpi.py /screen --save_dir=/Larvae/test/ --lines=GMR_75G10_AE_01@UAS_TNT_2_0003,FCF_attP2_1500062@UAS_TNT_2_0003 --window=1 --screen=t5

For this to work, please make sure the OpenMPI version in the path is 3.1.1.

For more information on Singularity, please refer to https://www.sylabs.io/docs/

Compute and save metrics

We can easily compute the time distributions, transitions and probabilities of the different clusters using:

python utils/metrics.py --path=/screen --n_clusters=6 --line=line --window=1 --screen=t15 --idec_weight=/path/to/weights.h5

The line should here include the experimental process (ex: GMR_72F11_AE_01@UAS_Chrimson_Venus_X_0070/r_LED100_30s2x15s30s) in order to remain coherent with the hit analysis. The metrics are saved in .npz files supported by numpy, and plots are generated in .png format.

Generate TRX files for visualisation of the classification

Depending on the model you trained, use the corresponding utils/plot_classification file. Usage :

python visualize_classification_idec.py --path=path/to/trx/folder/ --n_clusters=6 --idec_weights=path/to/weights.h5 --window=1

Then simply run MATLAB_visualisation/Visualize_Larva_Clustering.m in MATLAB. You may need to resize the window by launching Visualize_Larva_Clustering.fig and modifying the settings.

VaDE

We also made our own implementation of the VaDE algorithm (Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering by Jiang et al.). The usage is similar to IDEC:

python VaDE_tensorflow.py --dataset=path/to/dataset.hdf5 --epochs=5000 --epochs_pretrain=50

For more info, please use

python VaDE_tensorflow.py -h

Authors

Nicolas Skatchkovsky, Decision and Bayesian Computation, Institut Pasteur
Jean-Baptiste Masson, Decision and Bayesian Computation, Institut Pasteur

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Larvae

Prerequisites

Loading

IDEC

Singularity

Compute and save metrics

Generate TRX files for visualisation of the classification

VaDE

Authors

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
IDEC		IDEC
MATLAB_visualisation		MATLAB_visualisation
Singularity		Singularity
VaDE		VaDE
utils		utils
.gitignore		.gitignore
README.md		README.md

nskat/Larvae

Folders and files

Latest commit

History

Repository files navigation

Larvae

Prerequisites

Loading

IDEC

Singularity

Compute and save metrics

Generate TRX files for visualisation of the classification

VaDE

Authors

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages