Notes

This is a forked repository of OpenAI Gym, so its functionality and environments are identical except that we add two quadrotor models: rate control and a ball bouncing quadrotor. Therefore, this repo only focuses on these models; please refer to OpenAI Gym for more detail (e.g., installation and other models).

Training and testing system requirements

We use Mujoco 2.0 and Mujoco-py (Python wrapper). We recommend creating a Python 3.x conda environment for simplicity. Once you install this repo, check the installed environments by executing the following:

$ python
from gym import envs
print(envs.registry.all())

You should be able to find two custom environments: “QuadRate-v0” and “BallBouncingQuad-v0”. You also need the following repos for testing and training these environments.
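The check above can also be scripted. The following is a minimal sketch, assuming you pass in the list of registered environment IDs; the helper name `missing_envs` is hypothetical and not part of this repo.

```python
# Hypothetical helper (not part of this repo): report which of the two
# custom environment IDs are absent from a list of registered IDs.
EXPECTED_IDS = ("QuadRate-v0", "BallBouncingQuad-v0")


def missing_envs(registered_ids, expected=EXPECTED_IDS):
    """Return the expected IDs that do not appear in registered_ids."""
    registered = set(registered_ids)
    return [env_id for env_id in expected if env_id not in registered]


# Usage with this fork installed (same registry call as shown above):
#   from gym import envs
#   ids = [spec.id for spec in envs.registry.all()]
#   print(missing_envs(ids))  # an empty list means both are registered
```

An empty result means both environments were registered by the installation.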

1. Continuous control of a quadrotor via rate commands (rate control)

The animation below is what you can expect after training your rate control model. Rate control means that we command body rates about 3 axes: roll (rotation about the x-axis, which is the forward direction of the vehicle), pitch (rotation about the y-axis, which points left), and yaw (rotation about the z-axis, which points up). The units are rad/s for rates and [0,1] for thrust.
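As an illustration of this command format, here is a minimal sketch of a 4-element rate command with thrust clamped to [0,1]. The class and function names, and the ordering of the action vector (roll rate, pitch rate, yaw rate, thrust), are assumptions made for the example; check quad_rate.py for the actual layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCommand:
    """Hypothetical container for one rate-control command."""
    roll_rate: float   # rad/s, rotation about the body x-axis (forward)
    pitch_rate: float  # rad/s, rotation about the body y-axis (left)
    yaw_rate: float    # rad/s, rotation about the body z-axis (up)
    thrust: float      # normalized thrust in [0, 1]


def make_command(roll_rate, pitch_rate, yaw_rate, thrust):
    """Build a command, clamping thrust to [0, 1]; rates pass through unchanged."""
    clamped = min(1.0, max(0.0, float(thrust)))
    return RateCommand(float(roll_rate), float(pitch_rate), float(yaw_rate), clamped)


def as_action(cmd):
    """Flatten to a 4-element list (ordering is an assumption of this sketch)."""
    return [cmd.roll_rate, cmd.pitch_rate, cmd.yaw_rate, cmd.thrust]
```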

The task in this environment is simple: we provide a goal position, and a policy is trained to minimize the distance between the goal and the vehicle (i.e., maximize cumulative reward). For details on reward shaping, please have a look here. For training, we use PPO2 from stable-baselines. In summary, we have 4 input commands (three body rates and thrust) and 13 observations: a unit quaternion (4x1) plus position, linear velocity, and angular velocity (3x1 vector each).

In this example, we set .
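To make the 13-element observation concrete, the sketch below splits it into its four parts and computes the goal distance that the policy is trained to reduce. The ordering (quaternion, position, linear velocity, angular velocity) and the function names are assumptions for illustration only. The actual reward shaping lives in the environment file and is not reproduced here.

```python
import math


def split_observation(obs):
    """Split a 13-element observation into (quat, pos, lin_vel, ang_vel).

    The ordering is an assumption of this sketch, not taken from quad_rate.py.
    """
    if len(obs) != 13:
        raise ValueError("expected 13 observations, got %d" % len(obs))
    quat = tuple(obs[0:4])       # unit quaternion (4 values)
    pos = tuple(obs[4:7])        # position (3 values)
    lin_vel = tuple(obs[7:10])   # linear velocity (3 values)
    ang_vel = tuple(obs[10:13])  # angular velocity (3 values)
    return quat, pos, lin_vel, ang_vel


def distance_to_goal(position, goal):
    """Euclidean distance between the vehicle position and a goal position."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(position, goal)))
```

For example, with a hypothetical goal at the origin, `distance_to_goal(split_observation(obs)[1], (0.0, 0.0, 0.0))` gives the quantity a distance-based reward would penalize.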

quad_rate.py is the OpenAI Gym environment file and quadrotor_quat.xml is the Mujoco model file that describes the physical vehicle model and other simulation properties such as air density, gravity, viscosity, and so on. quadrotor_quat_fancy.xml is only for fancier rendering (more lighting and shadow effects, etc.), which usually takes more time to visualize. We therefore recommend using quadrotor_quat.xml to keep training time down.

1.1 Testing pre-trained weight

We provide a pre-trained weight and you can obtain it from another repository.

  • Clone it, go to the openai_train_scripts folder, and execute the following command:
source ./rateQuad_test_script.sh ./model

You should be able to see the same animation we saw earlier. Please note that you need to change system-dependent variables (e.g., RL_BASELINES_ZOO_PATH) to match your setup.

1.1.1 What does the policy learn?

As we can see from the above animation, our agent is able to fly to the goal and hover at that position. To do this task, the policy has to learn the underlying attitude and position controllers. The former controls the attitude of the vehicle (roll, pitch, and yaw angles) and the latter regulates position (i.e., tracks the position error and minimizes it).

1.2 Hyperparameters

It is often quite important to properly tune hyperparameters for a particular environment, yet PPO2 is relatively robust to these params. We use the following setup, which is likely suboptimal but seems to work well. You are always more than welcome to tune your own params and test them. The hyperparameters can be found here.

The table below summarizes hyperparameters used for training both rate control and ball bouncing quadrotor.

| Name of param | Value |
|---|---|
| normalize | true |
| n_envs | 32 |
| n_timesteps | 50e7 |
| policy | 'MlpPolicy' |
| policy_act_fun | 'tanh' |
| n_steps | 2048 |
| nminibatches | 50e7 |
| lam | 0.95 |
| noptepochs | 10 |
| ent_coef | 0.001 |
| learning_rate | 2.5e-4 |
| cliprange | 0.2 |
| max_episode_steps | 8000 |
| reward_threshold | 9600 |
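A quick sanity check on these values is sketched below. It assumes, as in stable-baselines PPO2 to the best of our knowledge, that each rollout of n_envs * n_steps samples is split into nminibatches equal parts, so nminibatches must divide that total. The function names are hypothetical. Under that assumption, the nminibatches value listed above is larger than a whole rollout, so it is worth confirming against the hyperparameter file before training.

```python
def rollout_batch_size(n_envs, n_steps):
    """Number of samples collected per rollout across all environments."""
    return n_envs * n_steps


def valid_minibatch_counts(n_envs, n_steps, limit=64):
    """Minibatch counts up to `limit` that evenly divide one rollout."""
    n_batch = rollout_batch_size(n_envs, n_steps)
    return [k for k in range(1, limit + 1) if n_batch % k == 0]


# With n_envs = 32 and n_steps = 2048 from the table above:
print(rollout_batch_size(32, 2048))      # 65536
print(valid_minibatch_counts(32, 2048))  # [1, 2, 4, 8, 16, 32, 64]
```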

1.3 Training procedures

Analogous to the testing above, training is straightforward once the dependencies are installed. Go to the openai_train_scripts folder and execute the following command:

source ./train_rateQuad_script_module.sh

Please note that you need to change system-dependent variables (e.g., RL_BASELINES_ZOO_PATH, --tensorboard-log, --log-folder, and lockFile) to match your setup.

2. Ball Bouncing Quadrotor (BBQ)

This environment is a minor extension of the previous rate control environment. We introduce a ball above the vehicle and shape the reward so that the vehicle hits the ball at its center. The animation below demonstrates this.

Similar to the previous example, we have 4 input commands, but 19 observations: a unit quaternion (4x1) plus the vehicle and ball positions, the linear velocities of the vehicle and ball, and the vehicle angular velocity (all 3x1).

One tricky part of this model was simulating elastic collisions (Mujoco 1.5 didn't fully support this). According to the Mujoco 2.0 description, fully elastic simulation is supported and a user can enable it by specifying negative numbers in solref (see here). For an in-depth explanation, please refer to link1 and link2.
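To build intuition for why the damping term matters, here is a toy 1D sketch of a point mass dropped onto a spring-damper contact. It is not Mujoco's solver; the function name, parameter values, and integrator are assumptions for illustration. As we read the Mujoco documentation, negative solref values are interpreted as (-stiffness, -damping), so a zero damping term corresponds to the undamped case below.

```python
def rebound_ratio(stiffness, damping, drop_height=1.0, mass=0.1,
                  gravity=9.81, dt=1e-4, max_steps=200_000):
    """Drop a point mass onto the ground at z = 0 and return
    (rebound apex height) / (drop height) for a spring-damper contact."""
    z, v = drop_height, 0.0
    touched = False
    peak = 0.0
    for _ in range(max_steps):
        force = -mass * gravity
        if z < 0.0:
            touched = True
            # The contact can only push, never pull, hence the clamp at zero.
            force += max(0.0, -stiffness * z - damping * v)
        # Semi-implicit Euler: update velocity first, then position.
        v += force / mass * dt
        z += v * dt
        if touched and z > 0.0:
            peak = max(peak, z)
            if v <= 0.0:  # apex of the first rebound reached
                break
    return peak / drop_height


print(rebound_ratio(1000.0, 0.0))  # close to 1: nearly elastic
print(rebound_ratio(1000.0, 2.0))  # clearly below 1: energy lost to damping
```

With zero damping the ball returns to almost its drop height, while any positive damping removes energy on each bounce.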

The trained policy performs well (I think), but it sometimes can't handle the ball rolling off the vehicle when the bounce is very small.

ball_bouncing_quad.py is the OpenAI Gym environment file and ball_bouncing_quad.xml is the Mujoco model file that describes the physical vehicle and ball models and other simulation properties such as air density, gravity, viscosity, and so on. ball_bouncing_quad_fancy.xml is only for fancier rendering (more lighting and shadow effects, etc.), which usually takes more time to visualize. We therefore recommend using ball_bouncing_quad.xml to keep training time down. Note that in this Mujoco model, we set contype and conaffinity to 0 for the vehicle arms and propellers to avoid possible collisions with the ball. Only the top plate has contype and conaffinity of 1 to enable collision with the ball. This may differ from a real quadrotor scenario.
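The effect of these settings follows from Mujoco's collision filtering rule: two geoms can collide when the contype of one shares a bit with the conaffinity of the other. The sketch below applies that rule to the values described above. The geom names and the ball's contype and conaffinity of 1 are assumptions for the example, not values read from ball_bouncing_quad.xml.

```python
def can_collide(contype_a, conaffinity_a, contype_b, conaffinity_b):
    """Mujoco bitmask filter: collide if either geom's contype shares a
    set bit with the other geom's conaffinity."""
    return bool((contype_a & conaffinity_b) or (contype_b & conaffinity_a))


# (contype, conaffinity) per geom. Names are illustrative only, and the
# ball entry is an assumption of this sketch.
GEOMS = {
    "arm": (0, 0),
    "propeller": (0, 0),
    "top_plate": (1, 1),
    "ball": (1, 1),
}

for name in ("arm", "propeller", "top_plate"):
    print(name, can_collide(*GEOMS[name], *GEOMS["ball"]))
```

Under these assumptions only the top plate reports a possible collision with the ball.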

2.1 Testing pre-trained weight

We provide a pre-trained weight and you can obtain it from another repository.

  • Clone it, go to the openai_train_scripts folder, and execute the following command:
source ./bbq_test_script.sh ./model

You should be able to see the same animation we saw earlier.

2.2 Hyperparameters

We use the same hyperparameters as for the rate control model.

2.3 Training procedures

Analogous to the testing above, training is straightforward once the dependencies are installed. Go to the openai_train_scripts folder and execute the following command:

source ./train_bbq_script_module.sh

3. Deploying the trained weight to the real world

WIP... but you can have a look at our previous work, Control of a Quadrotor with Reinforcement Learning (i.e., outputting direct rotor speed commands instead of rate commands). Please stay tuned; we will update once we have some interesting results.

Publications

If our work helps your work in an academic/research context, please cite the following publication(s):

@ARTICLE{7961277, 
author={J. {Hwangbo} and I. {Sa} and R. {Siegwart} and M. {Hutter}}, 
journal={IEEE Robotics and Automation Letters}, 
title={Control of a Quadrotor With Reinforcement Learning}, 
year={2017}, 
volume={2}, 
number={4}, 
pages={2096-2103}, 
keywords={aircraft control;helicopters;learning systems;neurocontrollers;stability;step response;quadrotor control;reinforcement learning;neural network;step response;stabilization;Trajectory;Junctions;Learning (artificial intelligence);Computational modeling;Neural networks;Robots;Optimization;Aerial systems: mechanics and control;learning and adaptive systems}, 
doi={10.1109/LRA.2017.2720851}, 
ISSN={2377-3766}, 
month={Oct},}