SafeRL-Lab/Robust-Gymnasium

Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

Paper · Website · Code · Tutorial (under active development) · Issue



This benchmark aims to advance robust reinforcement learning (RL) for real-world applications and domain adaptation. It provides a comprehensive set of tasks that cover various robustness requirements under uncertainty in state, action, reward, and environmental dynamics, and that span diverse applications including control, robot manipulation, and dexterous hands, among others. (This repository is under active development. We appreciate any constructive comments and suggestions.)

🔥 Benchmark Features:

  • High Modularity: It is designed for flexible adaptation to a variety of research needs, featuring high modularity to support a wide range of experiments.
  • Task Coverage: It provides a comprehensive set of tasks to evaluate robustness across different RL scenarios (at least 170 tasks).
  • High Compatibility: It integrates seamlessly with a wide range of existing environments.
  • Support Vectorized Environments: It enables parallel processing of multiple environments for efficient experimentation.
  • Support for New Gym API: It fully supports the latest standards in Gym API, facilitating easy integration and expansion.
  • LLMs Guide Robust Learning: Leverage LLMs to set robust parameters (LLMs as adversary policies).

🔥 Benchmark Tasks:

  • Robust MuJoCo Tasks: Tackle complex simulations with enhanced robustness.
  • Robust Box2D Tasks: Engage with 2D physics environments designed for robustness evaluation.
  • Robust Robot Manipulation Tasks: Robust robotic manipulation with Kuka and Franka robots.
  • Robust Safety Tasks: Prioritize safety in robustness evaluation.
  • Robust Adroit Hand Tasks: Explore sophisticated hand manipulation challenges in robust settings.
  • Robust Dexterous Tasks: Advance the robust capabilities in dexterous robotics.
  • Robust Fetch Manipulation Tasks: Robust object manipulation with Fetch robots.
  • Robust Robot Kitchen Tasks: Robust manipulation in Kitchen environments with robots.
  • Robust Maze Tasks: Robust navigation with robots.
  • Robust Multi-Agent Tasks: Facilitate robust coordination among multiple agents.

Each of these robust tasks incorporates robust elements such as robust observations, actions, reward signals, and dynamics to evaluate the robustness of RL algorithms.

🔥 Our Vision: We hope this benchmark serves as a useful platform for pushing the boundaries of RL in real-world problems --- promoting robustness and domain adaptation ability!

Any suggestions and issues are welcome. If you have any questions, please open an issue or pull request, or contact us directly via email at shangding.gu@berkeley.edu; we will respond within one week.


Content


Introduction

Reinforcement Learning against Uncertainty/Perturbation

A reinforcement learning (RL) problem is formulated as an agent seeking a policy that optimizes the long-term expected return by interacting with an environment. While standard RL has been heavily investigated recently, its use in practice can be significantly hampered by noise, malicious attacks, the sim-to-real gap, domain generalization requirements, or a combination of these and other factors. Consequently, in addition to maximizing cumulative reward, robustness to unexpected uncertainty/perturbation emerges as another critical goal for RL, especially in high-stakes applications such as robotics, financial investment, and autonomous driving. This has led to a surge of interest in more robust RL algorithms for different problems, collectively termed robust RL, including but not limited to single-agent RL, safe RL, and multi-agent RL.

A Unified Robust Reinforcement Learning Framework: MDP with Disruption

Robust RL problems typically consist of three modules:

  • An agent (a policy): tries to learn a strategy $\pi$ (a policy) based on observations from the environment to achieve the optimal long-term return.
  • An environment/task: a task that determines the agent's immediate reward $r(\cdot |s,a)$ and the physical or logical dynamics (transition function $P_t( \cdot | s,a)$).
  • The disruptor module: represents the uncertainty/perturbation events that happen during any part of the interaction process between the agent and the environment, with different modes, sources, and frequencies.
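The interplay of the three modules can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the benchmark's API: `ToyEnv`, `policy`, `Disruptor`, and `run_episode` are hypothetical names invented for this example, the task is a made-up 1-D problem, and Gaussian noise stands in for whatever perturbation a real disruptor would apply.

```python
import random


class ToyEnv:
    """Hypothetical 1-D task: start at position 5 and move toward 0."""

    def reset(self) -> float:
        self.pos = 5.0
        return self.pos

    def step(self, action: float) -> tuple[float, float]:
        # Dynamics: s_{t+1} = s_t + a_t. Reward: negative distance to the origin.
        self.pos += action
        return self.pos, -abs(self.pos)


def policy(obs: float) -> float:
    """Agent: step toward the origin, with the action clipped to [-1, 1]."""
    return max(-1.0, min(1.0, -obs))


class Disruptor:
    """Hypothetical disruptor: zero-mean Gaussian noise on state, action, reward."""

    def __init__(self, state_std=0.0, action_std=0.0, reward_std=0.0, seed=0):
        self.state_std = state_std
        self.action_std = action_std
        self.reward_std = reward_std
        self.rng = random.Random(seed)

    def perturb_state(self, s: float) -> float:
        return s + self.rng.gauss(0.0, self.state_std)

    def perturb_action(self, a: float) -> float:
        return a + self.rng.gauss(0.0, self.action_std)

    def perturb_reward(self, r: float) -> float:
        return r + self.rng.gauss(0.0, self.reward_std)


def run_episode(env, policy, disruptor, steps=20):
    """Return (true return, return as observed by the agent)."""
    true_return, observed_return = 0.0, 0.0
    obs = disruptor.perturb_state(env.reset())           # agent sees a perturbed state
    for _ in range(steps):
        action = disruptor.perturb_action(policy(obs))   # action is perturbed in transit
        state, reward = env.step(action)
        obs = disruptor.perturb_state(state)
        true_return += reward
        observed_return += disruptor.perturb_reward(reward)  # agent sees a perturbed reward
    return true_return, observed_return


if __name__ == "__main__":
    print("nominal:    ", run_episode(ToyEnv(), policy, Disruptor()))
    print("noisy state:", run_episode(ToyEnv(), policy, Disruptor(state_std=1.0)))
```

Keeping the disruptor separate from both the agent and the environment is the point of the sketch: the same policy and task can be re-run under a different perturbation by swapping only the `Disruptor` instance.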

As an example, we illustrate the framework of robust RL for single-agent problems:

Robust-Gymnasium: A Unified Modular Benchmark

This benchmark supports various 1) environments/tasks and 2) disruptors (perturbations to the interaction process). This allows users to design and evaluate different algorithms in different application scenarios when encountering diverse uncertainty issues. See the sections below for a quick overview of the environments and perturbations that Robust-Gymnasium supports.


Environments and Tasks

Tasks: Random, Adversary, Semantic Tasks (Robot Manipulation Tasks).

Robust MuJoCo Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward | Robust Dynamics |
| --- | --- | --- | --- | --- |
| Ant-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| HalfCheetah-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| Hopper-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| Walker2d-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| Swimmer-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| Humanoid-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| HumanoidStandup-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| Pusher-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| Reacher-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| InvertedDoublePendulum-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
| InvertedPendulum-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |

Robust Box2D Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| CarRacing-v2 | ✅ | ✅ | ✅ |
| LunarLanderContinuous-v3 | ✅ | ✅ | ✅ |
| BipedalWalker-v3 | ✅ | ✅ | ✅ |
| LunarLander-v3 (Discrete Task) | ✅ | ✅ | ✅ |

Robust Robot Manipulation Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| RobustLift | ✅ | ✅ | ✅ |
| RobustDoor | ✅ | ✅ | ✅ |
| RobustNutAssembly | ✅ | ✅ | ✅ |
| RobustPickPlace | ✅ | ✅ | ✅ |
| RobustStack | ✅ | ✅ | ✅ |
| RobustWipe | ✅ | ✅ | ✅ |
| RobustToolHang | ✅ | ✅ | ✅ |
| RobustTwoArmLift | ✅ | ✅ | ✅ |
| RobustTwoArmPegInHole | ✅ | ✅ | ✅ |
| RobustTwoArmHandover | ✅ | ✅ | ✅ |
| RobustTwoArmTransport | ✅ | ✅ | ✅ |
| MultiRobustDoor | ✅ | ✅ | ✅ |

Robust Safety Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| RobustSafetyAnt-v4 | ✅ | ✅ | ✅ |
| RobustSafetyHalfCheetah-v4 | ✅ | ✅ | ✅ |
| RobustSafetyHopper-v4 | ✅ | ✅ | ✅ |
| RobustSafetyWalker2d-v4 | ✅ | ✅ | ✅ |
| RobustSafetySwimmer-v4 | ✅ | ✅ | ✅ |
| RobustSafetyHumanoid-v4 | ✅ | ✅ | ✅ |
| RobustSafetyHumanoidStandup-v4 | ✅ | ✅ | ✅ |
| RobustSafetyPusher-v4 | ✅ | ✅ | ✅ |
| RobustSafetyReacher-v4 | ✅ | ✅ | ✅ |

Robust Adroit Hand Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| RobustAdroitHandDoor-v1 | ✅ | ✅ | ✅ |
| RobustAdroitHandHammer-v1 | ✅ | ✅ | ✅ |
| RobustAdroitHandPen-v1 | ✅ | ✅ | ✅ |
| RobustAdroitHandRelocate-v1 | ✅ | ✅ | ✅ |

Robust Dexterous Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| RobustHandManipulateEgg_BooleanTouchSensors-v1 | ✅ | ✅ | ✅ |
| RobustHandReach-v2 | ✅ | ✅ | ✅ |
| RobustHandManipulateBlock-v1 | ✅ | ✅ | ✅ |
| RobustHandManipulateEgg-v1 | ✅ | ✅ | ✅ |
| RobustHandManipulatePen-v1 | ✅ | ✅ | ✅ |

Robust Fetch Manipulation Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| RobustFetchPush-v3 | ✅ | ✅ | ✅ |
| RobustFetchReach-v3 | ✅ | ✅ | ✅ |
| RobustFetchSlide-v3 | ✅ | ✅ | ✅ |
| RobustFetchPickAndPlace-v3 | ✅ | ✅ | ✅ |

Robust Robot Kitchen Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| FrankaKitchen-v1 | ✅ | ✅ | ✅ |

Robust Maze Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| AntMaze_UMaze-v4 | ✅ | ✅ | ✅ |
| PointMaze_UMaze-v3 | ✅ | ✅ | ✅ |

Robust Multi-Agent Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| MA-Ant-2x4, 2x4d, 4x2, 4x1 | ✅ | ✅ | ✅ |
| MA-HalfCheetah-2x3, 6x1 | ✅ | ✅ | ✅ |
| MA-Hopper-3x1 | ✅ | ✅ | ✅ |
| MA-Walker2d-2x3 | ✅ | ✅ | ✅ |
| MA-Swimmer-2x1 | ✅ | ✅ | ✅ |
| MA-Humanoid-9\|8 | ✅ | ✅ | ✅ |
| MA-HumanoidStandup-v4 | ✅ | ✅ | ✅ |
| MA-Pusher-3p | ✅ | ✅ | ✅ |
| MA-Reacher-2x1 | ✅ | ✅ | ✅ |
| Many-MA-Swimmer-10x2, 5x4, 6x1, 1x2 | ✅ | ✅ | ✅ |
| Many-MA-Ant-2x3, 3x1 | ✅ | ✅ | ✅ |
| CoupledHalfCheetah-p1p | ✅ | ✅ | ✅ |

Robust Humanoid Tasks

| Tasks\Robust type | Robust State | Robust Action | Robust Reward |
| --- | --- | --- | --- |
| Robusth1hand-reach-v0 | ✅ | ✅ | ✅ |
| Robusth1hand-push-v0 | ✅ | ✅ | ✅ |
| h1hand-truck-v0 | ✅ | ✅ | ✅ |
| Robusth1hand-slide-v0 | ✅ | ✅ | ✅ |

Disruptor Module for Perturbations

Before introducing the disruptor module, we recall that an RL problem can be formulated as a process involving several key concepts: an agent, a state, an action, a reward, and an environment. Specifically, at each time $t$, the environment generates a state $s_t$ and a reward $r_t$ and sends them to the agent; the agent then chooses an action $a_t$ and sends it back to the environment, which generates the next state $s_{t+1}$ conditioned on the current state $s_t$ and the action $a_t$.

With this in mind, the benchmark covers a wide range of potential uncertainty, disturbance, and generalization events that can occur in this process (in both the training and testing phases), at any place, in any mode, and at any time, as summarized in the following table.

| Perturbation modes\sources | Observed state | Observed reward | Action | Environment/task |
| --- | --- | --- | --- | --- |
| Random | ✅ | ✅ | ✅ | ✅ |
| Adversarial | ✅ | \ | ✅ | \ |
| Set arbitrarily | \ | \ | \ | ✅ |
| Semantic Domain shift | \ | \ | \ | ✅ |

Those perturbation events can be generally categorized from three different perspectives:

  • Sources: which component is perturbed/attacked.
    • Agent's observed state: The agent observes a noisy/attacked 'state' $\widetilde{s}_t$ (which diverges from the real state $s_t$) and uses it as the input of its policy to determine the action.
    • Agent's observed reward: The agent observes a noisy/attacked 'reward' $\widetilde{r}_t$ (which differs from the real immediate reward $r_t$ obtained from the environment) and constructs its policy according to it.
    • Action: The action $a_t$ chosen by the agent is contaminated before being sent to the environment. Namely, a perturbed action $\widetilde{a}_t$ serves as the input of the environment for the next step.
    • Environment: An environment includes both the immediate reward function $r$ and the dynamics function $P_t$. An agent may interact with a shifted or non-stationary environment.
  • Modes: what kind of perturbation is imposed.
    • Random: Random noise following some distribution, such as a Gaussian or uniform distribution, is added to the nominal variable. This mode can be applied to all perturbation sources.
    • Adversarial: An adversarial attacker chooses the perturbed output within some admissible set to degrade the agent's performance. This mode can be applied to perturbations of the observation and the action.
    • Set arbitrarily: The environment can be set to any fixed one within some prescribed uncertainty set of environments.
    • Semantic-domain-shifted: We offer partially similar environments/tasks with some semantic diversity (such as different goals) for domain generalization or transfer learning tasks.
  • Frequency: when the perturbation happens. Viewed through the lens of time, perturbations can happen at different periods during the training and testing processes, and with different frequencies. We provide interactive modes that support step-wise varying interaction between disruptors, agents, and environments, so the user can apply perturbations at any point in time and in any way.
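As a rough illustration of the random and adversarial modes and of the frequency dimension, the sketch below perturbs a single scalar nominal value. Everything in it is an assumption made for this example rather than part of the Robust-Gymnasium API: the function names (`random_perturb`, `adversarial_perturb`, `should_perturb`), the interval-shaped admissible set, the grid search used by the adversary, and the periodic schedule.

```python
import random
from typing import Callable


def random_perturb(x: float, rng: random.Random, dist: str = "gaussian",
                   scale: float = 0.1) -> float:
    """Random mode: add Gaussian or uniform noise to the nominal value x."""
    if dist == "gaussian":
        return x + rng.gauss(0.0, scale)
    if dist == "uniform":
        return x + rng.uniform(-scale, scale)
    raise ValueError(f"unknown distribution: {dist}")


def adversarial_perturb(x: float, objective: Callable[[float], float],
                        eps: float = 0.1, n_candidates: int = 11) -> float:
    """Adversarial mode: search the admissible set [x - eps, x + eps] on a grid
    and return the candidate that minimizes the agent's objective."""
    candidates = [x - eps + 2 * eps * i / (n_candidates - 1)
                  for i in range(n_candidates)]
    return min(candidates, key=objective)


def should_perturb(t: int, period: int = 1, start: int = 0) -> bool:
    """Frequency: perturb every `period` steps, beginning at step `start`."""
    return t >= start and (t - start) % period == 0


if __name__ == "__main__":
    rng = random.Random(0)
    nominal_action = 1.0

    def agent_objective(a: float) -> float:
        return a  # assumed here: the agent benefits from larger actions

    for t in range(6):
        if should_perturb(t, period=2):
            noisy = random_perturb(nominal_action, rng, dist="uniform", scale=0.2)
            attacked = adversarial_perturb(nominal_action, agent_objective, eps=0.2)
            print(t, "random:", noisy, "adversarial:", attacked)
        else:
            print(t, "unperturbed:", nominal_action)
```

The grid search is used only because it keeps the example short; any procedure that selects a point inside the admissible set to degrade the agent's objective would play the same role.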
💡 Tip
Not all environments support all kinds of disruptors (perturbations). Please refer to the above section (Environments and Tasks) for more information.
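For the two environment-level modes described above ("set arbitrarily" and semantic domain shift), a minimal sketch could look as follows. The 1-D point task, the `PointEnvConfig` fields `gain` and `goal`, the fixed proportional policy, and the particular uncertainty set are all hypothetical and chosen only to keep the example self-contained; they do not correspond to any environment in the benchmark.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PointEnvConfig:
    """Hypothetical 1-D point task: start at 0 and move toward `goal`."""

    gain: float = 1.0  # dynamics parameter: how strongly an action moves the point
    goal: float = 3.0  # task semantics: where the point should end up


def rollout_return(cfg: PointEnvConfig, steps: int = 10) -> float:
    """Return of a fixed goal-seeking policy that assumes gain == 1."""
    pos, total = 0.0, 0.0
    for _ in range(steps):
        action = max(-1.0, min(1.0, cfg.goal - pos))  # clipped proportional policy
        pos += cfg.gain * action                      # perturbed dynamics
        total += -abs(cfg.goal - pos)                 # reward: negative distance to goal
    return total


nominal = PointEnvConfig()

# "Set arbitrarily": fix the environment to any member of a prescribed uncertainty set.
uncertainty_set = [replace(nominal, gain=g) for g in (0.5, 0.75, 1.0, 1.25, 1.5)]

# Semantic domain shift: same dynamics, different goal.
shifted = replace(nominal, goal=-3.0)

if __name__ == "__main__":
    for cfg in uncertainty_set:
        print(f"gain={cfg.gain}: return={rollout_return(cfg):.3f}")
    worst = min(uncertainty_set, key=rollout_return)
    print("worst-case member of the set: gain =", worst.gain)
    print("shifted goal return:", rollout_return(shifted))
```

Evaluating the same policy on every member of the set, and reporting the worst case, is one common way to summarize robustness to this kind of perturbation.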

Tutorials

Here, we provide a step-by-step tutorial on creating and using a domain-shifted/noisy task: choose any environment/task and combine it with any uncertainty factor to perturb the original environment; see the link.


Installation of the Environments

  1. Create an environment (requires Conda installation): We currently develop our environments on an Ubuntu system. The operating system version on our server is 20.04.3 LTS.

    Use the following command to create a new Conda environment named robustgymnasium with Python 3.11:

    conda create -n robustgymnasium python=3.11

    Activate the newly created environment:

    conda activate robustgymnasium
  2. Install dependency packages:

    Install the necessary packages using pip. Make sure you are in the project directory where the setup.py file is located:

    pip install -r requirements.txt
    pip install -e .

Testing the Tasks

To run the tests, navigate to the examples directory and execute the test script, e.g.,

cd examples/robust_action/mujoco/ 
chmod +x test.sh
./test.sh

Ensure you follow these steps to set up and test the environment properly. Adjust paths and versions as necessary based on your specific setup requirements.

If you run into problems, please check the existing solutions to previously reported issues; they may help you resolve yours.


Selected Demos

Robust MuJoCo Tasks

These demonstrations are from version 4 of the MuJoCo tasks with robust settings.

Robust MuJoCo Variant Tasks

These demonstrations are Robust MuJoCo variant tasks with robust settings.

Robust Robot Manipulation Tasks

These demonstrations are from robot manipulation tasks with robust settings.

Robust Dexterous Hand and Maze Tasks

These demonstrations are from dexterous hand and maze tasks with robust settings.


Citation

If you find the repository useful, please cite the study:

@article{robustrl2024,
  title={Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning},
  author={Gu, Shangding and Shi, Laixi and Wen, Muning and Jin, Ming and Mazumdar, Eric and Chi, Yuejie and Wierman, Adam and Spanos, Costas},
  journal={Github},
  year={2024}
}

Acknowledgments

We thank the contributors from MuJoCo, Gymnasium, Humanoid-bench and Robosuite.