Legged robots stand still despite the rewards of tracking linear and angular velocities #602

RainJCloude · 2024-06-04T09:56:02Z

RainJCloude
Jun 4, 2024

Hello, I am trying to use Isaac Orbit to train solo12. I followed the guide to generate the .usd, and the robot spawns in the environment. I am also using the same reward functions of ANYmal (of course changing the termination and every thing related to the name of the links), and the same observation and action space. However, solo12 prefer to stand still after falling several time instead of moving.

I really have no idea that why this is happening. It doesn't make sense that the robot that receive reward function for the tracking of linear and angular velocity, remains still.

Again, the rl_task_env_cfg is basically the same as the repository, except the termination condition. Also the rest of the training algorithm is the same. The only thing that I've generated is the asset solo12.py in which i gave the gains for the PD controller and the initial configuration.

Thanks in advance

Mayankm96 · 2024-06-04T11:47:39Z

Mayankm96
Jun 4, 2024
Maintainer

Are you using the rough terrain, or is this on the flat terrain?

The robot may learn to stay still if the penalty for moving is too high compared to the reward for the task. It is hard to provide feedback on that without more information.

0 replies

RainJCloude · 2024-06-04T12:27:52Z

RainJCloude
Jun 4, 2024
Author

Flat terrain. I know that it's hard try to help me with these few information, but I am using the whole orbit code without any kind of modification, just with another robot.
The fact that it is standing still could be due to wrong setting of PD gains? A wrong initial configuration? A wrong conversion of the URDF to USD?
Have anyone ever had a similar issue?

0 replies

dhananjayrajmane · 2024-06-07T04:33:03Z

dhananjayrajmane
Jun 7, 2024

@RainJCloude Can you check enabled_self_collisions variable ?
Set to false and check. Similar kind of issue faced previously, after setting enabled_self_collisions variable to False training worked perfect

0 replies

RainJCloude · 2024-06-10T10:10:22Z

RainJCloude
Jun 10, 2024
Author

It is set to False. I've basically copied and pasted all the code but just including my Usd. I also think that my usd is correct because I am able to give open loop command to the joints through the Omnigraph controller.

0 replies

RainJCloude · 2024-06-10T15:36:56Z

RainJCloude
Jun 10, 2024
Author

Okay maybe I can explain in a better way my problem:

The quadruped is not able to train because the value loss starts from a gigantic value, and the robot never learns to walk within its 900 training episodes.

About the configuration of the environment I've set reward, and observation copying that from UnitreeGo1.

About the asset, I've used this values for the actuator, maybe the problem could be related to these:

ACTUATOR_LEGS_ALL = IdealPDActuatorCfg(
    joint_names_expr=[".*_HAA", ".*_HFE", ".*_KFE"],
    effort_limit=5.0,
    velocity_limit=1.0,
    stiffness={".*": 5},
    damping={".*": 0.05}
)

0 replies

KyleM73 · 2024-06-22T18:54:42Z

KyleM73
Jun 22, 2024

I know this issue is a little old but a few suggestions:

let it keep training. even for standard robots like Go1 ive seen it learn to stand still and it wasn't until iteration ~2000 that it started learning to walk
turn down the weight of l2 acceleration, it should never be that large
record videos during training so you can see what behaviors the robot is learning
increase your PD actuator gains depending on the size of the robot. for reference Spot (35 kg) is P=60 D=1.5, Go1 (13 kg) is P=20 D=0.5
make the standard deviation param for velocity tracking smaller. I find that the defaults are too permissive and rewards the robot even for somewhat large velocity errors. Making std smaller forces the policy to minimize the error more aggressively. for example see this comparison: https://www.desmos.com/calculator/hchecap9yl
you might also experiment with a normed exponential kernel instead of a square exponential, as the norm exp(-||x||/std^2) is more aggressive (but has some artifacting for very small errors about zero)

good luck!!

0 replies

RainJCloude · 2024-06-26T07:15:55Z

RainJCloude
Jun 26, 2024
Author

Thank you so much!
I solved last week and the problem was actually the acceleration.
Moreover, also solo12 also need 800 episode to be trained. Do you know why it requires so much time whereas ANYmal learns to walk in just 300 episodes?

thanks again a lot. Your suggestions have been really appreciated

0 replies

KyleM73 · 2024-06-26T16:08:58Z

KyleM73
Jun 26, 2024

Great! As for the Anymal and Go1 training speeds, I think they're influenced by two factors:

Both robots have actuator nets trained for them. This helps bias the policy towards walking very quickly.
The rewards were tuned for these robots specifically- with sufficient reward tuning other robots should be able to approach similar training speeds, but it won't happen out of the box with rewards designed for a different robot (even though very similar)

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Legged robots stand still despite the rewards of tracking linear and angular velocities #602

{{title}}

Replies: 8 comments

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Legged robots stand still despite the rewards of tracking linear and angular velocities #602

RainJCloude Jun 4, 2024

Replies: 8 comments

Mayankm96 Jun 4, 2024 Maintainer

RainJCloude Jun 4, 2024 Author

dhananjayrajmane Jun 7, 2024

RainJCloude Jun 10, 2024 Author

RainJCloude Jun 10, 2024 Author

KyleM73 Jun 22, 2024

RainJCloude Jun 26, 2024 Author

KyleM73 Jun 26, 2024

RainJCloude
Jun 4, 2024

Mayankm96
Jun 4, 2024
Maintainer

RainJCloude
Jun 4, 2024
Author

dhananjayrajmane
Jun 7, 2024

RainJCloude
Jun 10, 2024
Author

RainJCloude
Jun 10, 2024
Author

KyleM73
Jun 22, 2024

RainJCloude
Jun 26, 2024
Author

KyleM73
Jun 26, 2024