Legged robots stand still despite rewards for tracking linear and angular velocities #602
Replies: 8 comments
-
Are you using the rough terrain, or is this on the flat terrain? The robot may learn to stay still if the penalty for moving is too high compared to the reward for the task. It is hard to provide feedback on that without more information.
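If the penalties dominate, a quick experiment is to scale them down relative to the tracking terms. A minimal sketch, assuming the manager-based velocity task config from Orbit; the class and reward-term names (LocomotionVelocityRoughEnvCfg, dof_torques_l2, ...) may differ in your version, and the weights below are only illustrative:

```python
from omni.isaac.orbit.utils import configclass
from omni.isaac.orbit_tasks.locomotion.velocity.velocity_env_cfg import (
    LocomotionVelocityRoughEnvCfg,
)


@configclass
class Solo12FlatEnvCfg(LocomotionVelocityRoughEnvCfg):
    """Illustrative override: make the tracking rewards dominate the penalties."""

    def __post_init__(self):
        super().__post_init__()
        # Shrink the regularization penalties so that "do nothing" is no longer optimal.
        self.rewards.dof_torques_l2.weight = -1.0e-5
        self.rewards.dof_acc_l2.weight = -2.5e-7
        self.rewards.action_rate_l2.weight = -0.01
        # Keep the velocity-tracking terms as the dominant signal.
        self.rewards.track_lin_vel_xy_exp.weight = 1.5
        self.rewards.track_ang_vel_z_exp.weight = 0.75
```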
-
Flat terrain. I know it's hard to help with so little information, but I am using the whole Orbit code without any kind of modification, just with another robot.
-
@RainJCloude Can you check the enabled_self_collisions variable?
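For reference, that flag lives in the articulation properties of the asset's spawn configuration. A rough sketch, assuming Orbit's ArticulationCfg/UsdFileCfg API; the USD path and solver values are placeholders:

```python
import omni.isaac.orbit.sim as sim_utils
from omni.isaac.orbit.assets import ArticulationCfg

SOLO12_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path="/path/to/solo12.usd",  # placeholder path
        articulation_props=sim_utils.ArticulationRootPropertiesCfg(
            enabled_self_collisions=False,  # the flag in question
            solver_position_iteration_count=4,
            solver_velocity_iteration_count=0,
        ),
    ),
)
```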
-
It is set to False. I've basically copied and pasted all the code, just swapping in my USD. I also think that my USD is correct, because I am able to give open-loop commands to the joints through the OmniGraph controller.
-
I know this issue is a little old, but here are a few suggestions:
Good luck!
-
Thank you so much! Your suggestions are really appreciated.
-
Great! As for the ANYmal and Go1 training speeds, I think they're influenced by two factors:
-
Hello, I am trying to use Isaac Orbit to train Solo12. I followed the guide to generate the .usd, and the robot spawns in the environment. I am also using the same reward functions as ANYmal (of course changing the termination conditions and everything related to the names of the links), and the same observation and action spaces. However, Solo12 prefers to stand still after falling several times instead of moving.
I really have no idea why this is happening. It doesn't make sense that a robot that receives rewards for tracking linear and angular velocity remains still.
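For context, the tracking terms I am reusing have roughly this exponential form, so standing still only scores well when the commanded velocity happens to be near zero. This is a simplified sketch, not the exact Orbit implementation, and the std value is illustrative:

```python
import torch


def track_lin_vel_xy_exp(cmd_vel_xy: torch.Tensor, base_vel_xy: torch.Tensor,
                         std: float = 0.25) -> torch.Tensor:
    """Reward close to 1 when the base velocity matches the command, decaying with error."""
    error = torch.sum(torch.square(cmd_vel_xy - base_vel_xy), dim=-1)
    return torch.exp(-error / std**2)
```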
Again, the rl_task_env_cfg is basically the same as in the repository, except for the termination condition. The rest of the training algorithm is also the same. The only thing I've written myself is the asset file solo12.py, in which I set the gains for the PD controller and the initial configuration, roughly along the lines of the sketch below.
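A sketch only: the joint-name patterns, gains, spawn height, and USD path are illustrative placeholders rather than my actual Solo12 values, and the ArticulationCfg/ImplicitActuatorCfg fields may differ slightly between Orbit versions.

```python
import omni.isaac.orbit.sim as sim_utils
from omni.isaac.orbit.actuators import ImplicitActuatorCfg
from omni.isaac.orbit.assets import ArticulationCfg

SOLO12_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path="/path/to/solo12.usd",  # placeholder path
        activate_contact_sensors=True,
    ),
    init_state=ArticulationCfg.InitialStateCfg(
        pos=(0.0, 0.0, 0.35),  # spawn height above the ground (placeholder)
        joint_pos={".*HAA": 0.0, ".*HFE": 0.8, ".*KFE": -1.6},  # illustrative patterns
    ),
    actuators={
        "legs": ImplicitActuatorCfg(
            joint_names_expr=[".*"],
            stiffness=3.0,    # PD proportional gain (placeholder)
            damping=0.3,      # PD derivative gain (placeholder)
            effort_limit=2.7, # placeholder torque limit
        ),
    },
)
```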
Thanks in advance