
@yhnsu (Collaborator) commented Dec 17, 2025

Description

This PR introduces a new LeRobot data handler that exports EmbodiChain environment episodes to the LeRobot dataset format. With this integration, robotic manipulation data generated in EmbodiChain can be inspected and trained on with LeRobot's data visualization and training tools.

Key Features

  • Automatic Data Conversion: Converts EmbodiChain observations and actions to a LeRobot-compatible format
  • Multi-Modal Support: Handles RGB images (including stereo cameras), proprioceptive states, and actions
  • Flexible Configuration: Supports both video and image encoding for visual data
  • Easy Integration: Simple API through the env.to_dataset() method
  • Hugging Face Hub Compatible: Optional push to Hugging Face Hub for sharing datasets
  • Robust Error Handling: Graceful handling of missing dependencies and import errors

Changes Summary

New Files

  • embodichain/data/handler/lerobot_data_handler.py: Core handler implementation
    • LerobotDataHandler class for data extraction and conversion
    • save_to_lerobot_format() function for easy episode saving
    • Automatic feature schema generation from environment metadata
    • Support for stereo cameras and multi-camera setups

Modified Files

  • embodichain/data/enum.py:

    • Added Modality enum for data modality types (states, actions, images, etc.)
    • Added JointType and EefType enums for robot data types
    • Added ActionMode enum for absolute/relative action specifications
    • Added SUPPORTED_PROPRIO_TYPES and SUPPORTED_ACTION_TYPES constants for the data schema
    • Implemented HandQposNormalizer class for dexterous hand joint normalization/denormalization
  • embodichain/lab/gym/envs/embodied_env.py:

    • Added to_dataset() method to EmbodiedEnv class
    • Integrated the LeRobot data handler, with error handling for missing dependencies and import errors
    • Automatic episode counter management
  • embodichain/lab/scripts/run_env.py:

    • Updated to call env.to_dataset() after successful task completion
    • Generates unique dataset IDs based on timestamp and trajectory index
  • embodichain/lab/gym/utils/misc.py:

    • Added utility functions for data processing compatibility
  • configs/gym/pour_water/gym_config.json:

    • Added robot_meta configuration with observation and action specifications
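The HandQposNormalizer added to enum.py normalizes and denormalizes dexterous hand joint positions, but the PR does not show its formula. A common choice, assumed here, is per-joint min-max scaling between the joint limits. MinMaxHandQposNormalizer below is a hypothetical sketch of that idea, not the class from the PR:

```python
import numpy as np


class MinMaxHandQposNormalizer:
    """Hypothetical min-max normalizer for dexterous-hand joint positions.

    Assumes each joint has known lower/upper limits and maps qpos linearly
    into [0, 1]; the real HandQposNormalizer may use a different scheme.
    """

    def __init__(self, lower, upper):
        self.lower = np.asarray(lower, dtype=np.float32)
        self.upper = np.asarray(upper, dtype=np.float32)
        if self.lower.shape != self.upper.shape:
            raise ValueError("lower and upper limits must have the same shape")
        if np.any(self.upper <= self.lower):
            raise ValueError("each upper limit must exceed its lower limit")

    def normalize(self, qpos):
        # Map [lower, upper] -> [0, 1]; clip so out-of-range readings stay bounded.
        # Broadcasting lets qpos be a single frame (n,) or a batch (T, n).
        qpos = np.asarray(qpos, dtype=np.float32)
        return np.clip((qpos - self.lower) / (self.upper - self.lower), 0.0, 1.0)

    def denormalize(self, qpos_norm):
        # Inverse map [0, 1] -> [lower, upper].
        qpos_norm = np.asarray(qpos_norm, dtype=np.float32)
        return self.lower + qpos_norm * (self.upper - self.lower)
```

Clipping in normalize() makes the round trip exact only for in-range joint values; whether the real class clips is not stated in the PR.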

Usage Example

```python
# During environment execution
env = make_env("PourWater-v0")

# ... run episode ...

# Save the episode in LeRobot format
dataset_path = env.to_dataset(
    repo_id="my_username/pour_water_dataset",
    fps=30,
    use_videos=True,
    push_to_hub=False,
)

# Visualize with LeRobot tools:
# lerobot-dataset-viz --repo-id my_username/pour_water_dataset --episode-index 0
```
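run_env.py derives each repo_id from a timestamp and the trajectory index. The exact format is not shown in this PR; one possible scheme, with make_dataset_id as a hypothetical helper, is:

```python
from datetime import datetime
from typing import Optional


def make_dataset_id(user: str, task: str, traj_idx: int,
                    now: Optional[datetime] = None) -> str:
    """Build a unique repo_id such as 'user/task_20251217_083000_traj0003'.

    Hypothetical naming scheme; run_env.py's actual format may differ.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Zero-padding the index keeps IDs from the same run sorted in order.
    return f"{user}/{task}_{stamp}_traj{traj_idx:04d}"
```

Passing an explicit `now` makes the function deterministic, which is convenient for tests; in a recording loop it would normally be omitted.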

Technical Details

Data Schema:

  • Observations:
    • RGB images from all configured cameras (with stereo support)
    • Proprioceptive states (joint positions for arm, hand, etc.)
  • Actions: Robot joint commands
  • Metadata: Task description, robot type, FPS
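LeRobot datasets describe each stored key through a features dictionary (dtype, shape, names), conventionally keyed as observation.images.<camera>, observation.state and action. The sketch below builds such a dictionary from camera shapes and joint names. build_lerobot_features and the camera names are hypothetical, and representing a stereo camera as two separate left/right image keys is an assumption about how the stereo support might work:

```python
def build_lerobot_features(camera_shapes, state_names, action_names, use_videos=True):
    """Sketch of a LeRobot-style features dict.

    camera_shapes: mapping camera name -> (height, width, channels).
    state_names / action_names: one name per proprioceptive or action dimension.
    """
    features = {}
    for cam, (h, w, c) in camera_shapes.items():
        # Each camera (or each eye of a stereo pair) becomes its own image key.
        features[f"observation.images.{cam}"] = {
            "dtype": "video" if use_videos else "image",
            "shape": (h, w, c),
            "names": ["height", "width", "channels"],
        }
    features["observation.state"] = {
        "dtype": "float32",
        "shape": (len(state_names),),
        "names": list(state_names),
    }
    features["action"] = {
        "dtype": "float32",
        "shape": (len(action_names),),
        "names": list(action_names),
    }
    return features
```

The result would typically be passed as the features argument of LeRobotDataset.create, together with repo_id, fps and use_videos; check the signature of the installed lerobot version before relying on it.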

Format Compatibility:

  • Converts torch tensors and numpy arrays to appropriate formats
  • Converts float32 images in [0, 1] to uint8 in [0, 255]
  • Preserves temporal ordering and episode structure
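The float-to-uint8 conversion above is simple but easy to get subtly wrong, for example by truncating instead of rounding or by overflowing on values slightly outside [0, 1]. A minimal sketch, assuming float images are already scaled to [0, 1]; to_uint8_image is a hypothetical name, and torch tensors are detected by duck typing so torch stays optional:

```python
import numpy as np


def to_uint8_image(img):
    """Convert an image array or tensor to uint8, preserving its layout.

    Assumes float inputs are normalized to [0, 1] and uint8 inputs are
    already in [0, 255].
    """
    if hasattr(img, "detach"):  # torch.Tensor -> numpy without importing torch
        img = img.detach().cpu().numpy()
    img = np.asarray(img)
    if img.dtype == np.uint8:
        return img
    # Clip first so values like 1.0000001 cannot overflow, then round to nearest.
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```

Rounding rather than truncating keeps a float-to-uint8-to-float round trip within half a quantization step.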

Storage:

  • Default location: ~/.cache/huggingface/lerobot/<repo_id>/
  • Configurable through HF_LEROBOT_HOME environment variable
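The two storage rules above can be expressed as a small path resolver. This sketch mirrors only what is stated here; LeRobot's own lookup may also take other Hugging Face cache variables (such as HF_HOME) into account, and lerobot_dataset_root is a hypothetical name:

```python
import os
from pathlib import Path

# Default location stated in this PR.
DEFAULT_LEROBOT_HOME = "~/.cache/huggingface/lerobot"


def lerobot_dataset_root(repo_id: str) -> Path:
    """Return the local directory for repo_id, honoring HF_LEROBOT_HOME."""
    base = os.environ.get("HF_LEROBOT_HOME", DEFAULT_LEROBOT_HOME)
    # expanduser() turns a leading "~" into the current user's home directory.
    return Path(base).expanduser() / repo_id
```

With HF_LEROBOT_HOME=/data/lerobot set, my_username/pour_water_dataset would resolve to /data/lerobot/my_username/pour_water_dataset.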

Dependencies

This feature requires the lerobot package to be installed:

pip install lerobot

The handler gracefully handles missing dependencies and provides clear error messages.
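The PR does not show how the handler detects a missing lerobot install. One common pattern, assumed here, defers the import until export time and re-raises with an installation hint, so importing EmbodiChain itself never requires lerobot. require_optional is a hypothetical helper:

```python
import importlib
from types import ModuleType
from typing import Optional


def require_optional(module_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper; the handler's real checks and messages may differ.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        hint = pip_name or module_name
        raise ImportError(
            f"This feature requires the '{module_name}' package. "
            f"Install it with: pip install {hint}"
        ) from exc
```

Calling require_optional("lerobot") at the start of a method such as to_dataset() keeps the dependency opt-in: users who never export data never hit the import.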

Type of change

  • New feature (non-breaking change which adds functionality)

Benefits

  1. Interoperability: Use EmbodiChain-generated data directly within LeRobot's ecosystem
  2. Visualization: Inspect data with LeRobot's GUI tools
  3. Training: Direct compatibility with LeRobot's imitation learning pipelines
  4. Sharing: Easy dataset sharing through the Hugging Face Hub
  5. Standardization: Adopts a dataset format widely used in the robotics community

Checklist

  • I have run the black . command to format the code base
  • Dependencies have been documented (lerobot package)
  • Error handling has been implemented for missing dependencies
  • Code follows existing project structure and conventions

Migration Guide

For existing EmbodiChain users:

  1. Install the lerobot package: pip install lerobot
  2. Downgrade gymnasium to the version EmbodiChain uses: pip install gymnasium==0.29.1
  3. Ensure your environment config includes robot_meta configuration with observation and action specifications
  4. Call env.to_dataset(repo_id="your/dataset") after episode completion
  5. Optionally visualize with lerobot-dataset-viz --repo-id your/dataset --episode-index 0

There are no breaking changes to existing code: the feature is opt-in through the to_dataset() method.
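Steps 1 and 2 can be verified before recording any data. A minimal sketch using only the standard library's importlib.metadata; installed_version and check_pin are illustrative helpers, not part of EmbodiChain:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package: str, expected: str) -> bool:
    """Report whether package is installed at exactly the expected version."""
    found = installed_version(package)
    if found != expected:
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
        return False
    return True


if __name__ == "__main__":
    # Requirements taken from the migration steps above.
    if installed_version("lerobot") is None:
        print("lerobot: not installed (pip install lerobot)")
    check_pin("gymnasium", "0.29.1")
```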

@yhnsu requested a review from yuecideng on December 17, 2025 08:29
@yuecideng changed the title from "draft: Add LeRobot Data Handler for Dataset Export" to "[Draft]: Add LeRobot Data Handler for Dataset Export" on Dec 17, 2025
@yuecideng marked this pull request as draft on December 17, 2025 08:32