This repo contains a PyTorch implementation of two reinforcement learning algorithms:
- PPO (Proximal Policy Optimization) (paper)
- VIME-PPO (Variational Information Maximizing Exploration) (paper)
The code makes use of openai/baselines.
The PPO implementation is mainly taken from ikostrikov/pytorch-a2c-ppo-acktr-gail.
The main contribution of this repository is an implementation of VIME's exploration strategy on top of the PPO algorithm.
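As a rough intuition for how this works (a minimal sketch, not this repository's actual code): VIME gives the agent an intrinsic reward equal to the information gain of a Bayesian dynamics model after observing a transition, approximated by the KL divergence between the model's weight posterior before and after an update on that transition. The names below (`BayesianLinear`, `info_gain`) are illustrative assumptions, not identifiers from this repo, and the model is deliberately reduced to a single mean-field layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    """Single linear layer with a diagonal-Gaussian posterior over its weights."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.w_logvar = nn.Parameter(torch.full((out_dim, in_dim), -5.0))
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # Reparameterized weight sample: w = mu + sigma * eps
        w = self.w_mu + torch.exp(0.5 * self.w_logvar) * torch.randn_like(self.w_mu)
        return F.linear(x, w, self.b)

    def kl_to(self, mu_old, logvar_old):
        # KL( N(mu, var) || N(mu_old, var_old) ) for diagonal Gaussians.
        var, var_old = self.w_logvar.exp(), logvar_old.exp()
        return 0.5 * torch.sum(
            (var + (self.w_mu - mu_old) ** 2) / var_old
            + logvar_old - self.w_logvar - 1.0
        )

def info_gain(model, optimizer, state, action, next_state):
    """Approximate information gain for one transition: take a gradient step on
    the dynamics prediction loss and measure how far the posterior moved."""
    mu_old = model.w_mu.detach().clone()
    logvar_old = model.w_logvar.detach().clone()
    pred = model(torch.cat([state, action], dim=-1))
    loss = F.mse_loss(pred, next_state)  # stand-in for the predictive NLL
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return model.kl_to(mu_old, logvar_old).item()
```

In the full VIME algorithm the per-transition posterior update is only used to compute the bonus, while the dynamics model itself is trained on a replay pool; the sketch collapses the two for brevity.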
Requirements:
- Python 3
- PyTorch
- OpenAI baselines
To install the requirements, run:
pip install -r requirements.txt
If you don't have MuJoCo installed, follow the instructions here.
If you run into issues with OpenAI baselines, try installing it from source:
# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
To run InvertedDoublePendulum-v2 with VIME-PPO, use the following command:
python main.py --env-name InvertedDoublePendulum-v2 --algo vime-ppo --use-gae --log-interval 1 --num-steps 2048 --num-processes 1 --lr 3e-4 --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 --gamma 0.99 --num-env-steps 1000000 --use-linear-lr-decay --no-cuda --log-dir /tmp/doublependulum/vimeppo/vimeppo-0 --seed 0 --use-proper-time-limits --eta 0.01
Instead, to run experiments with PPO, just replace `vime-ppo` with `ppo`.
For standard gym environments, I used `--eta 0.01`.
For sparse gym environments, I used `--eta 0.0001`.
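For intuition, `--eta` is the coefficient that scales the information-gain bonus before it is added to the environment reward. The snippet below is only an illustrative sketch of that reward shaping; the exact normalization of the bonus may differ in the code.

```python
def shaped_reward(extrinsic_reward, information_gain, eta):
    # eta trades off exploiting the environment reward against
    # exploring where the dynamics model's information gain is high.
    return extrinsic_reward + eta * information_gain
```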
[the number in parentheses represents how many experiments were run]
Any gym-compatible environment can be run, but the hyperparameters have not been tested for all of them.
However, the parameters used in the InvertedDoublePendulum-v2 example above are generally good enough for other MuJoCo environments.
- Integrate more args into the command line