
VLM-GCBF+ for deadlock resolution

Official JAX implementation of the paper: K. Garg*, D. S. K. Nair, J. Arkin, S. Zhang, N. Roy, C. Fan, "Online Motion Planning for Connected Multi-Robot Systems using Vision Language Models as High-level Planners".

Dependencies

We recommend using Conda to install the requirements:

conda create -n gcbfplus python=3.10
conda activate gcbfplus
cd gcbfplus

Then install JAX following the official instructions, and install the rest of the dependencies:

pip install -r requirements.txt

Installation

Install GCBF:

pip install -e .
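
If the installation succeeded, JAX should import cleanly and report at least one device. A quick sanity check (illustrative, not part of the repository):

import jax
import jax.numpy as jnp

print(jax.devices())          # e.g. [CpuDevice(id=0)], or a GPU device if available
print(jnp.arange(3.0) + 1.0)  # [1. 2. 3.]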

Run

High-level planner for deadlock resolution

To run the high-level planner for deadlock resolution, use:

python -u test_with_LLM.py --path logs/SingleIntegrator/gcbf+/model_with_traj/seed0_20240227110346 -n 10 --epi 20 --obs 25 --max-step 2500 --area-size 4 --keep-mode 20 --nojit-rollout --num-incontext-prompts 0 --leader_model 'gpt3.5' --modality 'llm' --num_LLM_calls 1

where the flags are:

  • -n: number of agents
  • --obs: number of obstacles
  • --area-size: side length of the environment
  • --max-step: maximum number of steps for each episode, increase this if you have a large environment
  • --path: path to the log folder
  • --keep-mode: keep mode for the high-level planner
  • --num-incontext-prompts: number of in-context examples
  • --leader_model: leader model for the high-level planner including
    • 'gpt-5'
    • 'gpt-4o'
    • 'gpt-4o-mini'
    • 'gpt3.5'
    • 'gpt4'
    • 'claude-opus-4.1'
    • 'claude-sonnet-3.5'
    • 'claude-haiku-3'
    • 'opus-3'
    • 'sonnet-3'
    • 'hand' for hand-designed heuristic leader-assignment
    • 'fixed' for fixed leader-assignment
    • 'random' for random leader-assignment
    • 'none' for no leader-assignment
  • --modality: selects the model type, 'vlm' or 'llm'. Each modality supports the following models:
    • 'vlm': 'gpt-4o', 'gpt-4o-mini', 'claude-opus-4.1', 'claude-sonnet-3.5', 'claude-haiku-3', 'sonnet-3'
    • 'llm': 'gpt3.5', 'gpt4'
    • 'vlm' or 'llm': 'gpt-5', 'opus-3'
    • 'none': 'hand', 'fixed', 'random', 'none' (the heuristic, fixed, random, and no leader-assignment options described above)
  • --num_LLM_calls: number of LLM calls for the "Ensemble" implementation of the high-level planner (see the sketch after this list)
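
With --num_LLM_calls greater than 1, the planner queries the model several times and aggregates the answers. A minimal sketch of one plausible aggregation, majority voting over proposed leader IDs; query_leader_model is a hypothetical stand-in for the repository's model-query function:

from collections import Counter

def ensemble_leader(query_leader_model, num_calls: int) -> int:
    """Majority vote over repeated model queries (illustrative sketch)."""
    # Each call is assumed to return a proposed leader agent ID.
    votes = Counter(query_leader_model() for _ in range(num_calls))
    leader, _ = votes.most_common(1)[0]
    return leader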

For testing on "Randomized room" environment, use:

python -u test_with_LLM.py --path logs/SingleIntegrator/gcbf+/model_with_traj/seed0_20240227110346/ -n 1 --epi 20 --obs 1 --preset_reset --preset_scene 'rand box' --max-step 2500 --area-size 1 --keep-mode 20 --nojit-rollout --num-incontext-prompts 0 --leader_model 'gpt3.5' --modality 'llm' --num_LLM_calls 1

where --preset_reset resets the environment to a preset initial state, and --preset_scene selects the scene:

  • 'rand box' for a random room environment
  • 'original box' for a fixed room environment
  • 'box' for a room-like environment with more obstacles

Scene and Waypoint Accuracy Evaluation

This repository contains tools to evaluate both scene extraction accuracy and waypoint generation accuracy for different large language models (LLMs). These tools operate on grid environments where an agent and obstacles are provided, and the LLM must analyze the environment image and produce a JSON description of the scene or generate feasible waypoints.
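
Concretely, scene-extraction accuracy comes down to parsing the model's JSON output and comparing it against the ground-truth grid. A minimal sketch, assuming hypothetical JSON field names ("agent", "obstacles"); the evaluation scripts define the actual format and metric:

import json

def scene_accuracy(llm_output: str, gt_agent, gt_obstacles) -> float:
    """Fraction of scene elements the model located correctly (illustrative)."""
    scene = json.loads(llm_output)  # the model's JSON scene description
    correct = int(tuple(scene["agent"]) == tuple(gt_agent))
    gt_cells = {tuple(o) for o in gt_obstacles}
    correct += sum(tuple(o) in gt_cells for o in scene.get("obstacles", []))
    return correct / (1 + len(gt_obstacles))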


Running Scene & Waypoint Accuracy Experiments

To generate scene and waypoint accuracy results, first navigate to the Scene_and_waypoint_accuracy_files directory and run environment.py. After that, enter the specific accuracy folder you want to evaluate (e.g., scene_accuracy_gpt, waypoint_accuracy_aws, waypoint_accuracy_gpt) and run the script inside that folder.

Accessing LLMs and VLMs for the high-level planner

Accessing GPT Models

The code requires access to GPT models via the OpenAI API. To access the models, you need an OpenAI account and an API key: sign up and create a key on the OpenAI platform, and see the OpenAI documentation for instructions on using the key.
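
A minimal sketch of querying a GPT model with the official openai Python package (the model name and prompt are placeholders; the repository's own prompting code may differ):

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # key read from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model --leader_model selects
    messages=[{"role": "user", "content": "Which agent should lead?"}],
)
print(response.choices[0].message.content)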

Accessing Claude Models

The code uses AWS-based access to Claude models through the boto3 package. You need an AWS account and must configure your credentials as described in the AWS documentation.
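
A minimal sketch of invoking a Claude model through Amazon Bedrock with boto3 (the region and model ID are examples; the repository's call may differ):

import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # example region
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Which agent should lead?"}],
})
response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    body=body,
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])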

GCBF+ low-level controller for safe multi-agent navigation

Environments

We provide three 2D environments (SingleIntegrator, DoubleIntegrator, and DubinsCar) and two 3D environments (LinearDrone and CrazyFlie).

Algorithms

We provide algorithms including GCBF+ (gcbf+), GCBF (gcbf), centralized CBF-QP (centralized_cbf), and decentralized CBF-QP (dec_share_cbf). Use --algo to specify the algorithm.

Hyper-parameters

To reproduce the results shown in our paper, one can refer to settings.yaml.

Train

To train the model (only GCBF+ and GCBF need training), use:

python train.py --algo gcbf+ --env DoubleIntegrator -n 8 --area-size 4 --loss-action-coef 1e-4 --n-env-train 16 --lr-actor 1e-5 --lr-cbf 1e-5 --horizon 32

In our paper, we use 8 agents with 1000 training steps. The training logs will be saved in folder ./logs/<env>/<algo>/seed<seed>_<training-start-time>. We also provide the following flags:

  • -n: number of agents
  • --env: environment, including SingleIntegrator, DoubleIntegrator, DubinsCar, LinearDrone, and CrazyFlie
  • --algo: algorithm, including gcbf, gcbf+
  • --seed: random seed
  • --steps: number of training steps
  • --name: name of the experiment
  • --debug: debug mode (no recording, no saving)
  • --obs: number of obstacles
  • --n-rays: number of LiDAR rays
  • --area-size: side length of the environment
  • --n-env-train: number of environments for training
  • --n-env-test: number of environments for testing
  • --log-dir: path to save the training logs
  • --eval-interval: interval of evaluation
  • --eval-epi: number of episodes for evaluation
  • --save-interval: interval of saving the model

In addition, use the following flags to specify the hyper-parameters:

  • --alpha: GCBF alpha
  • --horizon: GCBF+ look forward horizon
  • --lr-actor: learning rate of the actor
  • --lr-cbf: learning rate of the CBF
  • --loss-action-coef: coefficient of the action loss
  • --loss-h-dot-coef: coefficient of the h_dot loss
  • --loss-safe-coef: coefficient of the safe loss
  • --loss-unsafe-coef: coefficient of the unsafe loss
  • --buffer-size: size of the replay buffer
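
How these coefficients enter the objective can be read off the flag names: the total loss is a weighted sum of the four terms. A schematic sketch, illustrative only; the actual loss definitions live in the training code:

import jax
import jax.numpy as jnp

def weighted_loss(h_safe, h_unsafe, h_dot_loss, action_loss, coef):
    """Weighted sum keyed by the --loss-*-coef flags (illustrative sketch)."""
    loss_safe = jnp.mean(jax.nn.relu(-h_safe))     # CBF should be >= 0 on safe samples
    loss_unsafe = jnp.mean(jax.nn.relu(h_unsafe))  # CBF should be <= 0 on unsafe samples
    return (coef["safe"] * loss_safe
            + coef["unsafe"] * loss_unsafe
            + coef["h_dot"] * h_dot_loss
            + coef["action"] * action_loss)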

Test

To test the learned model, use:

python test.py --path <path-to-log> --epi 5 --area-size 4 -n 16 --obs 0

This should report the safety rate, goal reaching rate, and success rate of the learned model, and generate videos of the learned model in <path-to-log>/videos. Use the following flags to customize the test:

  • -n: number of agents
  • --obs: number of obstacles
  • --area-size: side length of the environment
  • --max-step: maximum number of steps for each episode, increase this if you have a large environment
  • --path: path to the log folder
  • --n-rays: number of LiDAR rays
  • --alpha: CBF alpha, used in centralized CBF-QP and decentralized CBF-QP
  • --max-travel: maximum travel distance of agents
  • --cbf: plot the CBF contour of this agent, only support 2D environments
  • --seed: random seed
  • --debug: debug mode
  • --cpu: use CPU
  • --u-ref: test the nominal controller
  • --env: test environment (not needed if the log folder is specified)
  • --algo: test algorithm (not needed if the log folder is specified)
  • --step: test step (not needed if testing the last saved model)
  • --epi: number of episodes to test
  • --offset: offset of the random seeds
  • --no-video: do not generate videos
  • --log: log the results to a file
  • --dpi: dpi of the video
  • --nojit-rollout: do not use jit to speed up the rollout, used for large-scale tests

To test the nominal controller, use:

python test.py --env SingleIntegrator -n 16 --u-ref --epi 1 --area-size 4 --obs 0

To test the CBF-QPs, use:

python test.py --env SingleIntegrator -n 16 --algo dec_share_cbf --epi 1 --area-size 4 --obs 0 --alpha 1
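
For reference, --alpha is the class-K gain α in the standard CBF-QP; in its textbook form (which may differ in detail from the implementation here), the controller solves

$$\min_{u}\ \|u - u_{\mathrm{ref}}\|^2 \quad \text{s.t.} \quad \nabla h(x)^\top \big(f(x) + g(x)\,u\big) + \alpha\, h(x) \ge 0,$$

where h is the control barrier function and u_ref is the nominal control; larger α lets the system approach the safety boundary more aggressively.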

Pre-trained models

We provide the pre-trained models in the folder logs.
