
vime-pytorch

This repo contains PyTorch implementations of two reinforcement learning algorithms:

  • PPO (Proximal Policy Optimization) (paper)
  • VIME-PPO (Variational Information Maximizing Exploration) (paper)

The code makes use of openai/baselines.

The PPO implementation is mainly taken from ikostrikov/pytorch-a2c-ppo-acktr-gail.

The main novelty of this repository is the implementation of VIME's exploration strategy on top of the PPO algorithm.
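To give a rough idea of how such an exploration bonus works, here is a minimal, self-contained PyTorch sketch (not the code used in this repository; all class and function names are illustrative): a small Bayesian dynamics model with a diagonal-Gaussian posterior over its weights predicts the next state, and the intrinsic reward for a transition is approximated by the KL divergence between the weight posterior after and before updating on that transition.

# Minimal VIME-style sketch; illustrative only, not the repository's code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    # Linear layer with a diagonal-Gaussian posterior over its weights
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.w_logvar = nn.Parameter(torch.full((out_dim, in_dim), -5.0))
        self.b = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # Sample weights with the reparameterisation trick
        w = self.w_mu + torch.randn_like(self.w_mu) * (0.5 * self.w_logvar).exp()
        return F.linear(x, w, self.b)

class BNNDynamics(nn.Module):
    # Predicts the next state from the current state and action
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.l1 = BayesianLinear(obs_dim + act_dim, hidden)
        self.l2 = BayesianLinear(hidden, obs_dim)

    def forward(self, s, a):
        return self.l2(torch.tanh(self.l1(torch.cat([s, a], dim=-1))))

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians
    return 0.5 * ((logvar_p - logvar_q)
                  + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                  - 1.0).sum()

def information_gain(model, optimizer, s, a, s_next):
    # Approximate info gain of one transition:
    # KL(posterior after the update || posterior before the update)
    old = copy.deepcopy(model)
    loss = F.mse_loss(model(s, a), s_next)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    kl = sum(gaussian_kl(n.w_mu, n.w_logvar, o.w_mu, o.w_logvar)
             for n, o in zip([model.l1, model.l2], [old.l1, old.l2]))
    return kl.detach()

# The bonus is then added to the environment reward before the policy update,
# e.g. r_total = r_env + eta * information_gain(model, optimizer, s, a, s_next).

In the actual VIME algorithm the dynamics model is trained from a replay pool and the per-sample posterior update is handled more carefully, but the KL-based bonus above captures the core idea.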

Requirements

To install the requirements, run:

pip install -r requirements.txt

If you don't have MuJoCo installed, follow the instructions here.

If you run into issues with OpenAI baselines, try:

# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .

Instructions

To run InvertedDoublePendulum-v2 with VIME, use the following command:

python main.py --env-name InvertedDoublePendulum-v2 --algo vime-ppo --use-gae --log-interval 1 --num-steps 2048 --num-processes 1 --lr 3e-4 --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 --gamma 0.99 --num-env-steps 1000000 --use-linear-lr-decay --no-cuda --log-dir /tmp/doublependulum/vimeppo/vimeppo-0 --seed 0 --use-proper-time-limits --eta 0.01

To run experiments with plain PPO instead, just replace vime-ppo with ppo, as in the example below.
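For example, the same configuration with plain PPO would be (the log directory is just an illustrative choice; the other hyperparameters are unchanged):

python main.py --env-name InvertedDoublePendulum-v2 --algo ppo --use-gae --log-interval 1 --num-steps 2048 --num-processes 1 --lr 3e-4 --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 --gamma 0.99 --num-env-steps 1000000 --use-linear-lr-decay --no-cuda --log-dir /tmp/doublependulum/ppo/ppo-0 --seed 0 --use-proper-time-limits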

Results

For standard gym environments, I used --eta 0.01.

[Result plots: MountainCar-v0, InvertedDoublePendulum-v2]

For sparse gym environments, I used --eta 0.0001.

[Result plots: MountainCar-v0-Sparse, HalfCheetah-v3-Sparse]

[The number in parentheses indicates how many experiments were run.]
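Here eta weights the intrinsic exploration bonus against the environment reward. In the VIME paper, the augmented reward has roughly the form

r'(s_t, a_t, s_{t+1}) = r(s_t, a_t) + eta * D_KL[ q(theta; phi_{t+1}) || q(theta; phi_t) ]

where q(theta; phi) is the variational posterior over the dynamics model's parameters, so a larger eta pushes the agent harder towards transitions that are informative about the dynamics.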

Note:

Any gym-compatible environment can be run, but the hyperparameters have not been tested for all of them.

However, the parameters used in the InvertedDoublePendulum-v2 example above are generally good enough for other MuJoCo environments.

TODO:

  • Integrate more args into the command line
