Skip to content

Latest commit

 

History

History
189 lines (159 loc) · 7.78 KB

readme.md

File metadata and controls

189 lines (159 loc) · 7.78 KB

Robot Navigation using Reinforcement Learning

This repo solves robot navigation using deep reinforcement learning. Two cases are implemented: discrete action space and continuous action space. The envrionment is inspired from turtlebot3 and is build from scratch in world.py file. Also, there are seperate files for DQN, DDPG, and TD3.

Environment

There are two variants of the environments: one with obstalces and another without. The first variant is shown in the image below, where the robot navigates within the gray areas, within the bounds of the green pillars and black walls. The white circles indicates obstacles. Also

Image of World

We converted the original map to a grayscale version using computer vision techniques. Now the white space indicates movable areas while the black circles within the wall indicates the obstacles. See the file rl_map.ipynb for details.

Image of World with Robot

With this background information, we build our world having the following additional properties.

  1. Our observation space given by x, y and theta indicating the robot's current pose and the target position specified by its x and y coordinates.
  2. x and y have a range of (0 to the maximum x & y coordinates of the world), while theta has a range of (-pi to pi)
  3. The action space is given by the linear (v) and angular velocity (w) of the robot.
  4. v and w are designed to be continous between 0 to 1 for v and -1 to 1 for w. These based values are scaled by 20 - 100 for v and 3 for w, in some World implementations.
  5. The initial robot pose and the target pose are initialized randomly within the movable area, with only two constraints.
    1. The robot pose and the target pose should not be equivalent
    2. The robot pose and the target pose should not be on the walls or obstacles.
  6. We calculate the reward signal as a function of the euclidean distance to the target.
  7. We set a threshold of 3units to acceptable target and robot location equivalence.
  8. Given an action (v, w), new position and orientation are calculated using the integration approximation formula
    x0, y0, t0 = robot_pose 
    x = x0 + v*cos(t0 + 0.5*w)
    y = y0 + v*sin(t0 + 0.5*w)
    t = t0 + w   //wrapped to values between -pi to pi
    

Map Version

As stated before, two map versions exist. The map shown in the preceeding sections (with obstalces), and another shown below, (without obstacles).

The env can be created using the code fragment below,which generate the environment without obstacles:

env = gym.make('bot3RLNav/World-v2', map_file="data/map01.jpg",
             robot_file="data/robot.png")

Image of World

Note: World-v1 is the environment with obstalces and World-v2 is the environment without obstacles.

Reward System

Walls and all obstacles are assigned a negative reward -10. The step method also sends a done signal when reward is negative. Rewards for the navigable region are calculated in two sections as explained below.

Distance Reward

This calculates the distance between robot pose and target location and generates a reward from it.

distance = sqrt(((xg - x)^2) + ((yg - y)^2))
reward = 1/(1 + distance)

By this system, rewards are always between 0 and 1. When robot is close to target, the distance is zero, which in turn yields reward = 1/(1 + 0) = 1/1 = 1.

Conversely, as robot moves farther from goal, distance becomes large, and reward tends to zero. reward = 1/(1 + inf) ~= 1/inf ~= 0

Bearing reward.

We generate another reward which is a function of the robot's orientation and the orientation it needs to take in order to align with the target. This reward is also set between 0 and 1. 0 for when the robot's orientation is completely offset, and 1 for when it is aligned to the target. Given robot pose denoted as x, y, theta and goal location denoted as xg, yg, we calculate reward as

bearing = arctan2(yg - y, xg - x)
error = bearing - theta

This gives an error between 0 and pi, which we then we normalize this error to values between 0 and 1

error = error/pi
reward = 1 - error

With the above system, the highest error is 1 and the lowest 0 and similarly, highest reward is 1 and lowest is 0.

The distance reward is then added to the bearing reward to obtain a maximum of 2 and a minimum of 0.

Helpers

Workspace Setup

Set up a conda environment.

To activate terminal.

conda activate bot3RLNavigation

Install gym using the code below. See here for details.

conda install -c conda-forge gym

To create requirements file. See More

conda list -e > requirements.txt

For pip, use

pip freeze > pip_reqs.txt

You can create an environment for work using this package's requirements via

conda create --name <env> --file requirements.txt

Install opencv. See here for details.

conda install -c conda-forge opencv

Install PILLOW See here for details

conda install -c conda-forge pillow

Install stable-baseline3 See here for more details.

conda install -c conda-forge stable-baselines3

Gym Environment Registration

In the __init__.py file in envs add an import statement to your created world.
In the package's __init__.py supply the id, entry_point and other necessary options for your environment.
After registration, our custom World environment can be created with

env = gym.make('id')

See the mentioned __init__ files for details.

Simulation

To render simulation using World-v2, use the format below

env.reset()
name = "bot3"
cv2.namedWindow(name)
rate = 500  # frame rate in ms
while True:
    frame = env.render(mode="rgb_array")
    cv2.imshow("bot3", frame)
    cv2.waitKey(rate)
    action = env.action_space.sample() # random (or use policy)
    obs, reward, done, info step = env.step(action)
    if done:
      frame = env.render(mode="rgb_array")
      cv2.imshow("bot3", frame)
      cv2.waitKey(rate)
      break

Resetting

To set fixed targets use the code block below. This feature means, the robot initial pose and the target location are always reset to the same first poses that the preceeding .reset() call made.

env = gym.make(...)
env.reset(options={'reset': False})
# or env.reset(options=dict(reset=False))

Works on v0 and all the subclasses.

Methodology

The network archeticture for the three networks: Image of World

Results

DQN:

DDPG:

TD3:

References

  1. Custom Environment
  2. Spaces
  3. Gym Github
  4. Markdown
  5. Create CV Window
  6. CV imshow
  7. DDPG Network
  8. TD3 Network
  9. Network Archeticture