This repo contains explorations of autoencoders in different settings using TensorFlow and Keras.
The main experiment focuses on the RandNet architecture (see Chen et al.) for unsupervised anomaly detection; to train the model, run the train_ensemble.py script.
The Wordline file needs to be downloaded manually into a data folder. The script should also run with other datasets if you change the DATA_PATH and INPUT_SHAPE parameters.
Other training parameters can be changed by modifying the corresponding capitalized variables.
The notebooks contain:
- utility functions for TensorBoard logging (both metrics and images),
- examples of constructing custom Keras models with weight masking and custom training steps (e.g., variational autoencoders),
- custom data loaders that change the model input per epoch for adaptive sampling.
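To illustrate the weight-masking idea mentioned above, here is a minimal framework-free sketch in NumPy. It is not the code from the notebooks: the names make_mask, masked_dense and keep_prob are hypothetical, and a Keras version would apply the same element-wise multiplication inside a custom layer. A fixed random binary mask is multiplied into the weight matrix, so the masked connections contribute nothing to the output.

```python
import numpy as np

def make_mask(shape, keep_prob, rng):
    """Return a binary mask: 1 keeps a connection, 0 removes it."""
    return (rng.random(shape) < keep_prob).astype(np.float32)

def masked_dense(x, weights, bias, mask):
    """Dense forward pass with masked weights, followed by ReLU."""
    return np.maximum(x @ (weights * mask) + bias, 0.0)

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 4  # illustrative sizes, not taken from the repo

weights = rng.normal(size=(n_in, n_hidden)).astype(np.float32)
bias = np.zeros(n_hidden, dtype=np.float32)
mask = make_mask(weights.shape, keep_prob=0.7, rng=rng)

x = rng.normal(size=(5, n_in)).astype(np.float32)
hidden = masked_dense(x, weights, bias, mask)
print(hidden.shape)
```

Because the mask is applied in the forward pass, a masked weight has no effect on the output. Sampling a different mask for each ensemble member gives each autoencoder its own sparse connectivity.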
We experiment with a batch-adaptive sampling method that increases the batch size over the epochs. As a result, each epoch covers more of the training data as training progresses.
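One way such a schedule could look is sketched below. The exact schedule used in this repo is not described here, so the fixed number of steps per epoch, the geometric growth, and the names growing_batches, start_size and growth are all assumptions made for illustration.

```python
import numpy as np

def growing_batches(data, n_epochs, steps_per_epoch, start_size, growth, rng):
    """Yield (epoch, batch) pairs where the batch size grows every epoch."""
    n = len(data)
    for epoch in range(n_epochs):
        # Assumed schedule: geometric growth, capped at the dataset size.
        batch_size = min(n, int(start_size * growth ** epoch))
        # Fresh shuffle per epoch; batches are consecutive slices of it.
        order = rng.permutation(n)
        for step in range(steps_per_epoch):
            idx = order[step * batch_size:(step + 1) * batch_size]
            if len(idx) == 0:
                break  # the shuffled data is exhausted for this epoch
            yield epoch, data[idx]

rng = np.random.default_rng(0)
data = np.arange(1000).reshape(-1, 1)  # toy stand-in for the training set

seen = {}
for epoch, batch in growing_batches(data, n_epochs=3, steps_per_epoch=5,
                                    start_size=16, growth=2, rng=rng):
    seen[epoch] = seen.get(epoch, 0) + len(batch)

print(seen)  # samples covered per epoch: {0: 80, 1: 160, 2: 320}
```

With the number of steps held fixed, doubling the batch size doubles the share of the training data visited in each epoch, which matches the behaviour described above.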
More ideas to explore:
- Sparse autoencoders
- Semantic hashing for texts
- Layer-wise pretraining of deep autoencoders
Resources (more links in the notebooks):