This project is about training an RL agent to control a double pendulum (Pendubot) and solve the swing-up and balancing problem under parameter uncertainty.
Here are some videos that show how our agent was able to swing up and balance the Pendubot under parameter uncertainty using the Greedy-Divide and Conquer algorithm:
In all the videos the models were trained only on these parameters:
We then test several cases.
In this case we test the original configuration:
1.1.160.160.mp4
In this case
1.1.2.1.1.160.2.160.mp4
In this case the second mass
1.1.2.2.1.160.2.160.mp4
In this case the first mass is set to 2 kg:
m1.2kg.mp4
In this case the first mass is set to 3 kg:
m1.3kg.mp4
In this case the second mass is set to 2.5 kg:
m2.2.5kg.mp4
In this case all the parameters are randomised
Rand1.mp4
In this case all the parameters are randomised
Rand2.mp4
- Brotli>=1.0.9
- ConfigParser>=5.3.0
- cryptography>=38.0.3
- Cython>=0.29.32
- dl>=0.1.0
- docutils>=0.19
- gym>=0.21.0
- HTMLParser>=0.0.2
- importlib_metadata>=4.13.0
- ipaddr>=2.2.0
- keyring>=23.11.0
- lockfile>=0.12.2
- lxml>=4.9.1
- matplotlib>=3.6.1
- mypy_extensions>=0.4.3
- numpy>=1.23.4
- opencv_python>=4.6.0.66
- ordereddict>=1.1
- protobuf>=4.21.9
- pyOpenSSL>=22.1.0
- scipy>=1.7.1
- stable_baselines3>=1.6.2
- typing_extensions>=4.4.0
- wincertstore>=0.2.1
- xmlrpclib>=1.0.1
- zipp>=3.10.0
All the libraries can be pip-installed using `python3 -m pip install -r requirements.txt`.
- Clone this repo (for help see this tutorial).
- Navigate to the repository folder.
- Install the dependencies specified in requirements.txt using `python3 -m pip install -r requirements.txt`.
- Run `project.py`.
- Run `Uncertainity.py` if you want to test the model under uncertainty in the mass of the pendulum.
The project starts with a single pendulum. It is better to run it on a local machine, because otherwise cv2.imshow() won't work and will raise an error. You can set the parameters of your own system; it is very clear how to do that:
env = Pendulum(m=m, L=L, I=I, b=b, dt=dt, mode='balance')
m # mass of the pendulum bob
L # length of the pendulum
I # inertia of the actuator
b # friction in the actuator
g # gravitational acceleration
dt # step size
theta # initial angle
dtheta # initial angular speed
mode # working mode ['balance', 'swing_up']
max_itr # maximum iterations per episode (balance = 200, swing_up = 500)
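For example, a swing-up environment could be created like this (the numeric values below are illustrative placeholders only, not the parameters used in our experiments):

```python
# Illustrative values only; replace them with your own system's parameters.
env = Pendulum(m=1.0,      # mass of the pendulum bob [kg]
               L=1.0,      # length of the pendulum [m]
               I=0.01,     # inertia of the actuator
               b=0.1,      # friction in the actuator
               dt=0.01,    # step size [s]
               mode='swing_up')
```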
If the mode is set to balance, the pendulum environment behaves as follows:
- It starts near the upright balance angle.
- The agent gets +1 reward for each step it keeps the pendulum angle within [-12, 12].
- The episode terminates if the angle leaves that range.
- The maximum episode length is 200 steps by default, so the maximum return is 200.
If the mode is set to swing_up, the pendulum environment behaves as follows:
- It starts near the downward (hanging) balance angle.
- reward = $-(2\theta^2 + 0.1\dot{\theta}^2 + 0.01\tau^2)$, where $\tau$ is the applied torque (a small sketch of this computation is given after this list).
- The episode terminates if the agent keeps $\theta$ within [-12, 12] for longer than half of the maximum episode length.
- The maximum episode length is 500 steps by default.
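As referenced above, here is a minimal sketch of how that reward could be computed (assuming $\theta$ is measured from the upright position and wrapped to $[-\pi, \pi]$; the environment's actual implementation may differ):

```python
import numpy as np

def swing_up_reward(theta, dtheta, torque):
    # Sketch of reward = -(2*theta^2 + 0.1*dtheta^2 + 0.01*torque^2);
    # theta is assumed to be the angle from the upright position.
    theta = np.arctan2(np.sin(theta), np.cos(theta))  # wrap to [-pi, pi]
    return -(2.0 * theta**2 + 0.1 * dtheta**2 + 0.01 * torque**2)
```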
To test the system you can run it by repeatedly sampling a random action from the action space and applying it to the environment, as in the following code:
import cv2

# Taking random actions and showing the real-time simulation
obs = env.reset()
while True:
    # Take a random action
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    # Render the simulation
    env.render(mode="human")
    if done:
        break
cv2.waitKey(2000)
env.close()
After running the previous code you will get the following result if the mode is set to balance:
And the following result if the mode is set to swing_up:
Now we have to train the agent. Depending on the mode, you can set the maximum number of training timesteps; the swing_up mode is more general but also needs much more training time than the balance mode:
- balance mode:
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

vec_env = DummyVecEnv([lambda: env])
model = PPO('MlpPolicy', vec_env, verbose=1)
model.learn(total_timesteps=20000)
- swing_up mode:
vec_env = DummyVecEnv([lambda: env])
model = PPO('MlpPolicy', vec_env, verbose=1)
model.learn(total_timesteps=200000)
To test the agent we first activate the continuous running mode:
env.continues_run_mode = True
In this mode the system interacts with the user, who can use the keyboard to apply an external disturbance to the system. The arrow keys increase or decrease the magnitude of the external torque and choose its direction, and pressing any other key ends the test.
from stable_baselines3.common.evaluation import evaluate_policy

# Evaluating the results of training
env.continues_run_mode = True
print(evaluate_policy(model, env, n_eval_episodes=1, render=True))
env.close()
i, up arrow : increase the external torque
d, down arrow : decrease the external torque
l, left arrow : apply the external torque to the left
r, right arrow : apply the external torque to the right
q, any other key : finish the testing
You can see all of the above in the following window:
- balance mode: We change the mass randomly (by 20%) and then evaluate the model; the success rate is 100%. That is logical, because the system is fully actuated, and the only situation in which it could fail is if the motor torque is not able to hold the mass.
- swing_up mode: We change the mass randomly (by 20%) and then evaluate the model; the success rate is again 100%, for the same reason. Here, however, it is very clear that the return is strongly tied to the value of the mass, because the dynamics of the system change, which means the response of the system to any action will be different. The result is good enough, so we made no further improvements. (A sketch of this robustness check is given after this list.)
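A minimal sketch of such a robustness check might look like the following. The attribute `env.m` and the numeric values `nominal_mass` and `success_threshold` are assumptions for illustration, not names from the repo:

```python
import numpy as np
from stable_baselines3.common.evaluation import evaluate_policy

# Randomise the pendulum mass by +/-20% before each evaluation episode
# and count how often the agent still succeeds.
nominal_mass = 1.0          # placeholder nominal mass [kg]
success_threshold = 195.0   # placeholder return that counts as "success"
n_trials, successes = 20, 0
for _ in range(n_trials):
    env.m = nominal_mass * np.random.uniform(0.8, 1.2)
    mean_return, _ = evaluate_policy(model, env, n_eval_episodes=1)
    if mean_return >= success_threshold:
        successes += 1
print(f"success rate: {successes / n_trials:.0%}")
```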
A double pendulum is a pendulum with another pendulum attached to its end. It is a simple physical system that exhibits rich dynamic behavior with a strong sensitivity to initial conditions.
The motion of a double pendulum is governed by a set of coupled ordinary differential equations and is chaotic.
$$\ddot{\theta}_1 = \frac{-g(2m_1+m_2)\sin\theta_1 - m_2 g\sin(\theta_1-2\theta_2) - 2\sin(\theta_1-\theta_2)\, m_2\left(\dot{\theta}_2^2 L_2 + \dot{\theta}_1^2 L_1\cos(\theta_1-\theta_2)\right)}{L_1\left(2m_1+m_2-m_2\cos(2\theta_1-2\theta_2)\right)}$$

$$\ddot{\theta}_2 = \frac{2\sin(\theta_1-\theta_2)\left(\dot{\theta}_1^2 L_1(m_1+m_2) + g(m_1+m_2)\cos\theta_1 + \dot{\theta}_2^2 L_2 m_2\cos(\theta_1-\theta_2)\right)}{L_2\left(2m_1+m_2-m_2\cos(2\theta_1-2\theta_2)\right)}$$
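As an illustration, these frictionless equations translate almost directly into an ODE right-hand side that odeint can integrate. This is only a sketch; the actual sys_ode used in the repo additionally handles the control torque (and, later, friction):

```python
import numpy as np

def dp_ode_frictionless(x, t, m1, m2, L1, L2, g=9.81):
    # State x = [theta1, theta2, dtheta1, dtheta2]; returns its time derivative.
    th1, th2, dth1, dth2 = x
    delta = th1 - th2
    den = 2*m1 + m2 - m2*np.cos(2*th1 - 2*th2)
    ddth1 = (-g*(2*m1 + m2)*np.sin(th1) - m2*g*np.sin(th1 - 2*th2)
             - 2*np.sin(delta)*m2*(dth2**2*L2 + dth1**2*L1*np.cos(delta))) / (L1*den)
    ddth2 = (2*np.sin(delta)*(dth1**2*L1*(m1 + m2) + g*(m1 + m2)*np.cos(th1)
             + dth2**2*L2*m2*np.cos(delta))) / (L2*den)
    return [dth1, dth2, ddth1, ddth2]
```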
and after solving the differential equations using the scipy library:
from scipy.integrate import odeint
sol = odeint(self.sys_ode, x0, [0, self.dt], args=(action, ))
self.theta1, self.theta2, self.dtheta1, self.dtheta2 = sol[-1, 0], sol[-1, 1], sol[-1, 2], sol[-1, 3]
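The positions of the two masses needed for plotting can then be recovered from the joint angles, for instance like this (a sketch; it assumes the angles are measured from the downward vertical, which may differ from the repo's convention):

```python
import numpy as np

def mass_positions(theta1, theta2, L1, L2):
    # Cartesian positions of the first and second mass,
    # with angles measured from the downward vertical.
    x1 = L1 * np.sin(theta1)
    y1 = -L1 * np.cos(theta1)
    x2 = x1 + L2 * np.sin(theta2)
    y2 = y1 - L2 * np.cos(theta2)
    return (x1, y1), (x2, y2)
```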
and then, after calculating the positions of the masses as sketched above, simply plotting the results we get:
But as you can see, we still have the problem of friction; without it the model is not realistic enough. To model friction on top of the previous equations in Python, we use the dynamics of manipulators (we consider the double pendulum as a 2-DOF manipulator).
The equations of motion for most mechanical systems may be written in the following form:

$$\mathbf{D}(\mathbf{q})\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q},\dot{\mathbf{q}})\dot{\mathbf{q}} + \mathbf{g}(\mathbf{q}) = \mathbf{Q} + \mathbf{Q}_d$$

where:
- $\mathbf{Q} \in \mathbb{R}^n$ - generalized forces corresponding to the generalized coordinates
- $\mathbf{Q}_d \in \mathbb{R}^n$ - generalized dissipative forces (for instance friction)
- $\mathbf{q} \in \mathbb{R}^{n}$ - vector of generalized coordinates
- $\mathbf{D} \in \mathbb{R}^{n \times n}$ - positive definite symmetric inertia matrix
- $\mathbf{C} \in \mathbb{R}^{n \times n}$ - describes the 'coefficients' of centrifugal and Coriolis forces
- $\mathbf{g} \in \mathbb{R}^{n}$ - describes the effect of gravity and other position-dependent forces
- $\mathbf{h} \in \mathbb{R}^n$ - combined effect of $\mathbf{g}$ and $\mathbf{C}$, i.e. $\mathbf{h} = \mathbf{C}\dot{\mathbf{q}} + \mathbf{g}$
In order to find the EoM we will use the Lagrange-Euler equations:

$$\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{\mathbf{q}}} - \frac{\partial \mathcal{L}}{\partial \mathbf{q}} + \frac{\partial \mathcal{R}}{\partial \dot{\mathbf{q}}} = \mathbf{Q}$$

where:
- $\mathcal{L}(\mathbf{q},\dot{\mathbf{q}}) \triangleq E_K - E_\Pi \in \mathbb{R}$ - Lagrangian of the system
- $\mathcal{R} \in \mathbb{R}$ - Rayleigh function (describes energy dissipation)
and here we add two dissipative elements in this system, namely "dampers" with coefficients
and after applying the Lagrange formalism to obtain the equations of motion:
Now we can find the
and so we get:
For balancing, the task is simple enough to be solved with a simple reward function: since the double pendulum starts from around the vertical position, we just give a negative reward for the speeds and for theta1 and theta2 being far away from the vertical position, as follows:
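A minimal sketch of a balancing reward of this kind, with illustrative weights and assuming theta1 and theta2 are measured from the upright position, could look like this (not the exact function used in the trained models):

```python
def balance_reward(theta1, theta2, dtheta1, dtheta2):
    # Penalise deviation from the upright position and large angular speeds.
    # The weights here are illustrative, not the ones used in the trained models.
    return -(theta1**2 + theta2**2) - 0.1 * (dtheta1**2 + dtheta2**2)
```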
We tried over 50 different reward functions (the models and some of the reward functions are uploaded) to make it swing up AND balance, and we could not succeed until we used an if statement in the reward function. For continuous-input RL problems, continuous reward functions might not work! For example, one of the reward functions (explained in detail below) was trained for 10,000,000 timesteps over 24 hours and could not swing up and balance; it could only swing up. Since it seemed that the task needs two agents, one for swing-up and the other for balancing vertically, we decided to use an if statement in the reward function.
The task that the agent must perform consists of two phases. In the first one, it has to swing the second pendulum up to the vertical. In the second, while keeping the balance, it has to move the first pendulum to the target point. For this reason, the reward function has been split into two expressions. The first is a weighted sum of linear dependencies on the pendulum deflection angles. It is applied when the second pendulum is inclined from the vertical by an angle of more than 10°. The values of the parameters of this sum were selected to promote the swing of the second pendulum more than the alignment of the first one. The second formula applies when the angle of the pendulum to the vertical is less than 10°. In this phase, the agent must be concerned mainly with not losing its balance and moving the first pendulum closer to the target point. For this reason, this part of the function is a linear dependence on only the angle of the first pendulum, plus a penalty for loss of balance. (All angles in the equations are normalized.)
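A minimal sketch of such a two-phase ("if statement") reward, with illustrative weights and the normalisation done inside the function, could look like the following; it is not the tuned function from the repo:

```python
import numpy as np

def two_phase_reward(theta1, theta2, lost_balance):
    # theta1/theta2: deflection of the first/second pendulum from the vertical [rad].
    # lost_balance: flag set by the environment when balance is lost.
    # All weights are illustrative placeholders, not the tuned values.
    n1 = abs(theta1) / np.pi          # normalised deflection of the first pendulum
    n2 = abs(theta2) / np.pi          # normalised deflection of the second pendulum
    if abs(theta2) > np.deg2rad(10):  # phase 1: swing-up
        # promote swinging the second pendulum up more than aligning the first one
        return -(3.0 * n2 + 1.0 * n1)
    # phase 2: keep balance and move the first pendulum towards the target
    return -n1 - (10.0 if lost_balance else 0.0)
```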