Merge pull request #4 from camel-ai/update-docs
Update and clean outdated code
dandansamax authored Jul 31, 2024
2 parents 395298c + 05a2ba9 commit 79b4af3
Showing 27 changed files with 1,090 additions and 1,081 deletions.
2 changes: 1 addition & 1 deletion .github/actions/crab_install/action.yml
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ runs:
path: ./.venv
key: venv-${{ hashFiles('poetry.lock') }}
- name: Install the project dependencies
run: poetry install -E visual-prompt -E server
run: poetry install -E client -E server
shell: bash
- uses: actions/cache/save@v3
name: Save caches based on poetry.lock
23 changes: 8 additions & 15 deletions README.md
@@ -1,4 +1,4 @@
# 🦀 Crab: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents
# 🦀 CRAB: Cross-platform Agent Benchmark for Multimodal Embodied Language Model Agents

[![arXiv][arxiv-image]][arxiv-url]
[![Slack][slack-image]][slack-url]
@@ -8,11 +8,11 @@

## Overview

Crab is a framework for building LLM agent benchmark environments in a Python-centric way.
CRAB is a framework for building LLM agent benchmark environments in a Python-centric way.

#### Key Features

🌐 Cross-platform
🌐 Cross-platform and Multi-environment
* Build agent environments that support various deployment options, including in-memory, Docker-hosted, virtual machines, or distributed physical machines, provided they are accessible via Python functions.
* Let the agent access all the environments at the same time through a unified interface.

@@ -29,32 +29,25 @@ Crab is a framework for building LLM agent benchmark environments in a Python-ce
#### Prerequisites

- Python 3.10 or newer
- pip

```bash
pip install crab-framework[visual-prompt]
pip install crab-framework[client]
```

## Experiment on CRAB-Benchmark-v0

All datasets and experiment code are in the [crab-benchmark-v0](./crab-benchmark-v0/) directory. Please read the [benchmark tutorial](./crab-benchmark-v0/README.md) carefully before using our benchmark.

## Examples

#### Run template environment with openai agent

You can run the examples using the following command.

```bash
export OPENAI_API_KEY=<your api key>
python examples/single_env.py
python examples/multi_env.py
```

#### Run desktop environment with openai agent

You can run the examples using the following command.

```bash
export OPENAI_API_KEY=<your api key>
python examples/desktop_env.py "Open Firefox"
```

## Cite
Please cite [our paper](https://arxiv.org/abs/2407.01511) if you use anything related in your work:
19 changes: 18 additions & 1 deletion crab-benchmark-v0/README.md
@@ -8,7 +8,24 @@

Our benchmark contains two important parts: **Environments** and **Tasks**.

#### Environments

Since our Ubuntu environment is built upon KVM, setting it up locally requires an experienced Linux user to deal with many small and miscellaneous issues. Therefore, we provide two environment setup methods:

* [Local setup](./docs/environment_local_setup.md) provides a step-by-step guide to building the environments on a Linux machine with **at least one monitor and 32 GB of memory**, but it doesn't cover details such as how to install KVM on your machine, because they vary across Linux distros.
* For those who want a quicker setup, we also provide a setup through [Google Cloud Platform](./docs/environment_gcp_setup.md). Specifically, we publish a disk image on Google Cloud that contains all the required software and configuration. You can use your own Google account to create a cloud computer from this disk image and use [Google Remote Desktop](https://remotedesktop.google.com/access/) to connect to it. This method doesn't have any hardware limitations, and once it is set up you can run the experiment immediately. As a tradeoff, a cloud computer that meets the minimum hardware requirements costs around $0.4 per hour (depending on the machine zone).

We connect to the Android environment via ADB, so any Android device, from an emulator to a physical smartphone, will work. You should ensure ADB is installed on your system and can be called directly from the command line. In our experiment, we used the built-in emulator of [Android Studio](https://developer.android.com/studio) to create a Google Pixel 8 Pro virtual device with the release name R and installed the necessary extra apps.
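
As a quick sanity check for the ADB requirement above, the following minimal sketch tests whether an `adb` executable can be found on `PATH`. The helper name `adb_available` is hypothetical and not part of CRAB; it relies only on `shutil.which` from the Python standard library.

```python
import shutil


def adb_available(executable: str = "adb") -> bool:
    """Return True if `executable` can be found on the current PATH.

    Hypothetical helper for illustration; CRAB does not ship this function.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if adb_available():
        print("adb found on PATH")
    else:
        print("adb not found; install the Android platform tools and add them to PATH")
```

This only confirms that the executable is discoverable; it does not check that a device or emulator is actually connected.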

#### Tasks

We manage our task dataset using a CRAB-recommended method. Sub-tasks are defined through Pydantic models written in Python code, and composed tasks are defined in JSON format, typically combining several sub-tasks. The sub-tasks are defined in [android_subtasks](./dataset/android_subtasks.py) and [ubuntu_subtasks](./dataset/ubuntu_subtasks.py). The JSON files storing composed tasks are categorized into [android](./dataset/android/), [ubuntu](./dataset/ubuntu/), and [cross-platform](./dataset/cross/). The tasks in the android and ubuntu directories are single-environment tasks, and those in the cross directory are cross-environment tasks. Additionally, we create several tasks by hand instead of composing sub-tasks to provide semantically more meaningful tasks, which are found in [handmade tasks](./dataset/handmade_tasks.py).
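
To make the idea of composing code-defined sub-tasks from JSON more concrete, here is a rough sketch. It is not the CRAB schema: the `SubTask` class, the `SUBTASKS` registry, the field names, and the JSON layout are all assumptions made for illustration, and it uses standard-library dataclasses in place of Pydantic so that it runs without extra dependencies.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    """Hypothetical stand-in for a sub-task definition (CRAB itself uses Pydantic models)."""

    id: str
    description: str  # template filled in with the arguments from the JSON file


# Hypothetical sub-task registry, keyed by sub-task ID.
SUBTASKS = {
    "open-app": SubTask("open-app", "Open the {app} app"),
    "send-message": SubTask("send-message", "Send '{text}' to the first contact"),
}

# Hypothetical composed task: an ordered list of sub-task references with arguments.
COMPOSED_TASK_JSON = """
{
  "id": "example-composed-task",
  "subtasks": [
    {"id": "open-app", "args": {"app": "Messages"}},
    {"id": "send-message", "args": {"text": "hello"}}
  ]
}
"""


def describe(composed_json: str) -> str:
    """Resolve each referenced sub-task and join the filled-in descriptions."""
    composed = json.loads(composed_json)
    steps = [
        SUBTASKS[ref["id"]].description.format(**ref["args"])
        for ref in composed["subtasks"]
    ]
    return ", then ".join(steps)


if __name__ == "__main__":
    print(describe(COMPOSED_TASK_JSON))
```

The point of the split is that sub-task logic stays in Python while the combinations live in data files; refer to the dataset files linked above for the actual formats.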

## Experiment

After setting up the environment, you can start the experiment. A brief overview of the experiment is as follows:

1. Open the Ubuntu environment virtual machine and the Android environment emulator.
2. Start the CRAB server in the Ubuntu environment and get its IP address and port. Let's say they are `192.168.122.72` and `8000`.
3. Choose a task. As an example, we take the task with ID `a3476778-e512-40ca-b1c0-d7aab0c7f18b` from [handmade_tasks](./dataset/handmade_tasks.py). The task is: "Open the 'Tasks' app on Android, check the first incomplete task, then perform the task according to its description."
4. Run [main.py](./main.py) with the command `poetry run python -m crab-benchmark-v0.main --model gpt4o --policy single --remote-url http://192.168.122.72:8000 --task-id a3476778-e512-40ca-b1c0-d7aab0c7f18b`. In this command, `--model gpt4o` and `--policy single` determine the agent system, `--remote-url` specifies the Ubuntu environment interface, and `--task-id` indicates the task to be performed.
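
The command in step 4 can also be assembled programmatically, which may help when running several tasks in a row. The sketch below is only an illustration: `build_command` is a hypothetical helper, and the flags and values are the ones quoted in the steps above.

```python
import shlex


def build_command(model: str, policy: str, remote_url: str, task_id: str) -> str:
    """Assemble the benchmark command line from its four parameters.

    Hypothetical helper; it only formats the command quoted in the steps above.
    """
    args = [
        "poetry", "run", "python", "-m", "crab-benchmark-v0.main",
        "--model", model,
        "--policy", policy,
        "--remote-url", remote_url,
        "--task-id", task_id,
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(
        build_command(
            model="gpt4o",
            policy="single",
            remote_url="http://192.168.122.72:8000",
            task_id="a3476778-e512-40ca-b1c0-d7aab0c7f18b",
        )
    )
```

The resulting string can be pasted into a shell, or the argument list can be passed to `subprocess.run` directly instead of joining it.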
@@ -370,7 +370,7 @@ def evaluator_ca79febf():
# * Make sure the init page of "Calendar" app is "Day" view. There should be at least one element today.


handmade_subtasks = [
handmade_tasks = [
Task(
id="79832e15-5fd3-43b8-b3e3-66249edfe1db",
description='Open slack in Ubuntu desktop, summarize the last two messages in current channel, then use "Messages" app in android phone to send the summary to the first contact in the list.',
Binary file added crab-benchmark-v0/docs/assets/android_1.png
Binary file added crab-benchmark-v0/docs/assets/android_2.png
30 changes: 26 additions & 4 deletions crab-benchmark-v0/docs/environment_gcp_setup.md
@@ -1,6 +1,30 @@
Reminder: This method is currently under preparation and is not available.

## Setup and Start the VM Instance

TODO
The development image is hosted in the project `capable-vista-420022` with image name `crab-benchmark-v0-0`.

You can use [gcloud](https://cloud.google.com/sdk/docs/install) to create an instance from this image.

```bash
gcloud compute instances create \
crab-instance \
--zone=us-central1-a \
--machine-type=n2-standard-8 \
--image=https://www.googleapis.com/compute/v1/projects/capable-vista-420022/global/images/crab-benchmark-v0-0 \
--enable-nested-virtualization
# You can change instance name, zone, machine type as you want.
# Remember that the CPU must support nested virtualization and should have at least 32G memory.
```

After creating the instance, you can connect to it using SSH.

User account information:

* user: `root`; password: `crab`
* user: `crab`; password: `crab`

**IMPORTANT: You must switch to user `crab` before setting up remote desktop.** Use `sudo su crab`.

## Connect the Instance through a remote desktop service

@@ -10,6 +34,4 @@ There are many possible remote desktop products you can use. Here, we provide in

1. Go to [Google Remote Desktop Headless](https://remotedesktop.google.com/headless). Click **Begin** -> **Next** -> **Authorize**. On the resulting page, copy the command from the `Debian Linux` section.
2. Connect to the VM instance through SSH, paste the copied command, and run it. You will be prompted to set a six-digit PIN.
3. Go to [Google Remote Desktop Access](https://remotedesktop.google.com/access). You should see a remote device marked as online. Click it and enter the PIN. You will then see the desktop of the VM instance.
48 changes: 47 additions & 1 deletion crab-benchmark-v0/docs/environment_local_setup.md
@@ -1 +1,47 @@
TODO
## Install CRAB

First, install `poetry`, a modern Python dependency management tool.

Then pull the crab repo and install:

```bash
git clone https://github.com/camel-ai/crab

cd crab
poetry install -E client
```

## Install Ubuntu VM

**IMPORTANT: If you are using an Ubuntu VM, the Python version in the VM must match the Python version on the host machine. If you follow these instructions to install Ubuntu, the Python version in the VM will be 3.10.12. Consider using `conda` or `pyenv` to install the same Python version on the host machine.**
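
One way to verify the version requirement is to compare the interpreter running on the host with the version reported inside the VM. The following is a minimal sketch with hypothetical helper names (`parse_version`, `matches_vm`). It compares the full `major.minor.patch` version, which is the strictest reading of "must match"; whether a looser match would also work is not stated here.

```python
import sys
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.10.12' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def matches_vm(vm_version: str, host_version: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if the host interpreter version equals the VM's version.

    When `host_version` is omitted, the running interpreter's version is used.
    """
    if host_version is None:
        host_version = tuple(sys.version_info[:3])
    return parse_version(vm_version) == tuple(host_version)


if __name__ == "__main__":
    vm_version = "3.10.12"  # version stated above for the Ubuntu VM
    host = ".".join(str(part) for part in sys.version_info[:3])
    if matches_vm(vm_version):
        print("Host Python matches the VM:", host)
    else:
        print(f"Host Python is {host}, but the VM uses {vm_version}")
```

Run it with the interpreter you intend to use for CRAB on the host, for example inside the `conda` or `pyenv` environment.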

Install `virt-manager`. If you are using Ubuntu or Debian, try `sudo apt install virt-manager`.

Download [Ubuntu 22.04 image](https://releases.ubuntu.com/jammy/ubuntu-22.04.4-desktop-amd64.iso), then create a new machine with at least 8 GB of RAM and a 30 GB disk in virt-manager using the image. Follow the installer's instructions to complete the installation. (It's better to use `crab` as the main user name.)

After installing Ubuntu, install crab-server on it and perform the necessary initialization. In the Ubuntu VM, run:

```bash
git clone https://github.com/camel-ai/crab.git ~/crab/
cd ~/crab/crab-benchmark-v0/scripts
chmod +x ubuntu_env_init.sh
./ubuntu_env_init.sh
```

The VM will reboot after initialization. After it reboots, note its IP address.
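
Once the VM is back up, it can be useful to confirm from the host that the server port accepts connections. The sketch below is a plain TCP check using the standard library; `server_reachable` is a hypothetical helper, and the address `192.168.122.72` and port `8000` are only the example values used in the benchmark README, so substitute your own.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example values only; replace them with your VM's address and port.
    host, port = "192.168.122.72", 8000
    state = "reachable" if server_reachable(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

A successful connection only shows that something is listening on the port; it does not validate that the CRAB server itself is healthy.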

## Install Android Emulator

Download and install the newest version of [Android Studio](https://developer.android.com/studio).

Open Android Studio and use the built-in device manager to create a Pixel 8 Pro with system image release "R".

![](./assets/android_1.png)

![](./assets/android_2.png)

Then boot it.

## Install ADB

Download and install ADB from its [official website](https://developer.android.com/tools/releases/platform-tools).
1 change: 0 additions & 1 deletion crab-benchmark-v0/docs/experiment_instruction.md

This file was deleted.

4 changes: 2 additions & 2 deletions crab-benchmark-v0/main.py
@@ -35,7 +35,7 @@

from .android_env import ANDROID_ENV
from .dataset.android_subtasks import android_subtasks
from .dataset.handmade_subtasks import handmade_subtasks
from .dataset.handmade_tasks import handmade_tasks
from .dataset.ubuntu_subtasks import ubuntu_subtasks
from .ubuntu_env import UBUNTU_ENV
from .visual_prompt_actions import (
@@ -134,7 +134,7 @@ def get_benchmark(env: str, ubuntu_url: str):
benchmark_config.tasks.extend(tasks)

# Load from handmade tasks
benchmark_config.tasks.extend(handmade_subtasks)
benchmark_config.tasks.extend(handmade_tasks)

benchmark_config.step_limit = 15
return create_benchmark(benchmark_config)
73 changes: 73 additions & 0 deletions crab-benchmark-v0/scripts/ubuntu_env_init.sh
@@ -0,0 +1,73 @@
#!/bin/bash

# Disable screen autolock
gsettings set org.gnome.desktop.screensaver lock-enabled false
gsettings set org.gnome.desktop.session idle-delay 0

# Disable automatic updates
sudo bash -c 'cat <<EOF > /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";
EOF'

# Allow sudo without password for the current user
CURRENT_USER=$(whoami)
sudo bash -c "echo \"$CURRENT_USER ALL=(ALL) NOPASSWD: ALL\" | tee /etc/sudoers.d/$CURRENT_USER"

# Install required packages
sudo apt update
sudo apt install -y openssh-server git vim python3-pip xdotool python3-tk python3.10-venv

# Install pipx
python3 -m pip install pipx
python3 -m pipx ensurepath

# Modify .bashrc to alias python to python3 for the current user
echo 'alias python=python3' >> /home/$CURRENT_USER/.bashrc

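# Reload .bashrc for the current user
source /home/$CURRENT_USER/.bashrc

# Make sure ~/.local/bin, where user-level executables such as pipx and poetry are installed,
# is on PATH for the rest of this script
export PATH="/home/$CURRENT_USER/.local/bin:$PATH"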

# Install poetry using pipx
pipx install poetry

# Pull CRAB repo
if [ ! -d "/home/$CURRENT_USER/crab" ]; then
    git clone https://github.com/camel-ai/crab.git /home/$CURRENT_USER/crab/
fi

# Create poetry environment
cd /home/$CURRENT_USER/crab
poetry install -E server

# Change to X11 from Wayland
sudo sed -i 's/#WaylandEnable=false/WaylandEnable=false/g' /etc/gdm3/custom.conf
touch /home/$CURRENT_USER/.Xauthority

# Create the crab.service file with dynamic user and group
sudo bash -c "cat <<EOF > /etc/systemd/system/crab.service
[Unit]
Description=My Python Script Service
After=network.target
[Service]
WorkingDirectory=/home/$CURRENT_USER/crab/
ExecStart=/home/$CURRENT_USER/.local/bin/poetry run python -m crab.server.main --HOST 0.0.0.0
Restart=always
User=$CURRENT_USER
Group=$CURRENT_USER
[Install]
WantedBy=multi-user.target
EOF"

# Reload systemd to recognize the new service
sudo systemctl daemon-reload

# Enable the crab service so that it starts on boot (it first runs after the reboot below)
sudo systemctl enable crab.service

# Reboot the system to apply changes for X11
echo "System will reboot in 10 seconds to apply changes..."
sleep 10
sudo reboot