- Provides Hyperparameter Optimization (HPO) as a service to choose optimal values for the hyperparameters (tunables) provided by the user, for any model.
- The primary use case is to optimize the performance of an environment by tuning the layers below, independently or together (see the sample search space after this list).
  - Container: resource usage, such as CPU and memory requests/limits.
  - Runtime: e.g., HotSpot, OpenJ9, Node.js.
  - Application stack: e.g., Quarkus, Liberty, PostgreSQL.
  - OS layer.
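As an illustration, a search space covering the container layer might look like the sketch below. This is a hedged example: the field names are approximate, and the experiment name, tunable names, bounds, and objective are invented for illustration; consult the installation docs linked at the end for the exact schema.

```python
# Illustrative search space for the container layer.
# Field names are approximate; check the HPO service docs for the exact schema.
search_space = {
    "experiment_name": "quarkus-resource-tuning",        # hypothetical experiment name
    "total_trials": 10,
    "objective_function": "transaction_response_time",   # metric the benchmark reports
    "direction": "minimize",
    "hpo_algo_impl": "optuna_tpe",                       # assumed identifier for Optuna's TPE sampler
    "tunables": [
        {"name": "memoryRequest", "value_type": "double",
         "lower_bound": 150, "upper_bound": 300, "step": 1},    # MiB
        {"name": "cpuRequest", "value_type": "double",
         "lower_bound": 1.0, "upper_bound": 3.0, "step": 0.1},  # cores
    ],
}
```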
To improve the performance of a runtime or application server, there are many tunables to tune, and they vary with the application and the hardware it runs on. Manually tuning multiple parameters across large ranges is cumbersome, and it is hard to arrive at an optimal configuration that way.
HPO as a service helps the user find the optimal configuration values for any model.
- Step 1: Start HPOaaS.
- Step 2: Start a new experiment with the provided search space.
- Step 3: Get the HPO config from HPOaaS.
- Step 4: Run the benchmark with the HPO config.
- Step 5: Send the benchmark results to HPOaaS.
- Step 6: Generate a subsequent trial.
- Step 7: Loop through Steps 3 to 6 for the remaining trials of the experiment.
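Steps 2 through 7 can be sketched as a small REST client around the service, reusing the illustrative `search_space` dict from above. The URL, operation names, and payload fields below are assumptions for illustration only, not the service's documented API, and `run_benchmark` is a placeholder for your own benchmark driver.

```python
import requests

HPO_URL = "http://localhost:8085/experiment_trials"  # assumed host/port and endpoint

def run_benchmark(config):
    """Placeholder: apply `config` to the deployment, run the load test,
    and return the objective value (e.g., mean response time)."""
    raise NotImplementedError

# Step 2: start a new experiment with the provided search space.
resp = requests.post(HPO_URL, json={
    "operation": "EXP_TRIAL_GENERATE_NEW",  # assumed operation name
    "search_space": search_space,
})
trial = int(resp.text)  # assumed: the service returns the trial number

for _ in range(search_space["total_trials"]):
    # Step 3: get the HPO config (tunable values) for this trial.
    config = requests.get(HPO_URL, params={
        "experiment_name": search_space["experiment_name"],
        "trial_number": trial,
    }).json()

    # Step 4: run the benchmark with the HPO config.
    result = run_benchmark(config)

    # Step 5: send the benchmark result back to the service.
    requests.post(HPO_URL, json={
        "operation": "EXP_TRIAL_RESULT",  # assumed operation name
        "experiment_name": search_space["experiment_name"],
        "trial_number": trial,
        "trial_result": "success",
        "result_value_type": "double",
        "result_value": result,
    })

    # Step 6: generate the subsequent trial.
    resp = requests.post(HPO_URL, json={
        "operation": "EXP_TRIAL_GENERATE_SUBSEQUENT",  # assumed operation name
        "experiment_name": search_space["experiment_name"],
    })
    trial = int(resp.text)
```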
Hyperparameter optimization (HPO) is the process of choosing the set of hyperparameters that yields optimal performance according to a predefined objective function.
- Search space: the list of tunables, with their ranges, to optimize over.
- Study (an experiment in Autotune): the search for the optimal set of tunable values through multiple trials.
- Trials: each trial is one execution of the objective function, i.e., running a benchmark/application with the configuration generated by Bayesian optimization.
- Objective function: returns the value that represents the performance of a given set of tunables (hyperparameters); its results guide where the optimizer samples in upcoming trials.
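These terms map directly onto Optuna's API. Below is a minimal, self-contained sketch: the tunable names are hypothetical, and a synthetic score stands in for a real benchmark run.

```python
import optuna

def objective(trial):
    # Search space: each suggest_* call declares a tunable with its range.
    cpu = trial.suggest_float("cpuRequest", 1.0, 3.0)
    mem = trial.suggest_int("memoryRequest", 150, 300)
    # In the real service, the benchmark is run with these values and the
    # measured performance is returned; a synthetic score stands in here.
    return (cpu - 2.0) ** 2 + (mem - 200) ** 2

# Study: one experiment that searches for the optimal tunable values
# over multiple trials.
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)
print(study.best_params)  # best set of tunable values found
```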
- Optuna
- TPE: Tree-structured Parzen Estimator sampler.
- TPE with multivariate
- optuna-scikit
The tools mentioned above support Bayesian optimization, which belongs to the class of sequential model-based optimization (SMBO) algorithms that use the results of previous trials to improve the next ones.
Machine learning is the process of teaching a system to make accurate predictions from the data it is fed. Hyperparameter optimization (or tuning) in machine learning can use several methods, such as manual tuning, random search, grid search, and Bayesian optimization. This HPO service uses Bayesian optimization because of its multiple advantages.
To tune any OS or application stack, there are multiple tunables available at different layers, and it is difficult to tune each of them given the ranges available and the requirements at hand. Given a defined objective function, Bayesian optimization helps find optimal values for those tunables by taking past results into account: it explores the tunable ranges during the first few trials and then exploits the promising narrow ranges in subsequent trials.
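In Optuna's TPE sampler this explore-then-exploit behavior is explicit: the first `n_startup_trials` trials are sampled randomly (exploration) before the Parzen estimators are fitted to past results (exploitation). A short sketch:

```python
import optuna

# TPE explores randomly for `n_startup_trials` trials, then fits its
# Parzen estimators to past results and samples promising narrow ranges.
# `multivariate=True` enables the "TPE with multivariate" variant, which
# models interactions between tunables instead of treating each one
# independently.
sampler = optuna.samplers.TPESampler(n_startup_trials=5, multivariate=True, seed=42)
study = optuna.create_study(direction="minimize", sampler=sampler)
```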
HPO as a service allows you to:
- Select any tool/framework to use. Currently, it supports Optuna.
- Select any algorithm of the framework. Currently, it uses TPE from Optuna.
- Tune one or multiple stacks. Supports both Kubernetes and bare-metal modes.
- Add your own layer of tunables to be tuned.
- Configure how many trials an experiment needs to arrive at the optimal set.
- Run an experiment without any previous data available.
- Append existing data from an application/benchmark to an Optuna study to generate the next configuration (see the sketch after this list). This helps reuse available data to arrive at an optimal set of values in less time.
- Run multiple Optuna studies for multiple experiments.
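The data-appending feature above corresponds to adding completed trials to an Optuna study. A minimal sketch with a recent Optuna release; the tunable names and measured value are hypothetical:

```python
import optuna
from optuna.distributions import FloatDistribution, IntDistribution
from optuna.trial import create_trial

study = optuna.create_study(direction="minimize")

# Previously measured result, e.g. imported from past benchmark runs:
# cpuRequest=2.0, memoryRequest=250 gave a response time of 15.3 ms.
study.add_trial(create_trial(
    params={"cpuRequest": 2.0, "memoryRequest": 250},
    distributions={
        "cpuRequest": FloatDistribution(1.0, 3.0),
        "memoryRequest": IntDistribution(150, 300),
    },
    value=15.3,
))
# The sampler now treats this as a completed trial, so the next suggested
# configuration already accounts for the historical data.
```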
- Tune resource usage of a container in a cluster.
- Improve the response time of an application running on HotSpot + Quarkus on an OpenShift cluster.
  - A few experiments have been run for this use case with the TechEmpower Quarkus benchmark on an OpenShift cluster, and the results are published in the kruize/autotune-results repo. A total of 31 tunables were used to optimize performance, and over multiple experiments the objective function was also refined to better reflect the performance improvements, as documented in that repo.
- Tune RHEL OS environment variables to improve performance.
- Use existing application data (from Horreum) to generate the next optimal configuration set.
With Autotune: https://github.com/kruize/autotune/blob/master/docs/autotune_install.md
Containerised HPO: TBU
With Scripts: https://github.com/kruize/em-hpo-scripts/blob/main/README.md