README.md (113 additions, 0 deletions)

@@ -8,6 +8,10 @@ Moonfish is a didactic Python chess engine designed to showcase parallel search

The engine reaches approximately 2000 Elo against the Lichess Stockfish bots (it beats level 5 and loses to level 6) and ships with comprehensive test suites, including the Bratko-Kopec tactical test positions.

## Play Online

**[Play against Moonfish in your browser](https://huggingface.co/spaces/luccabb/moonfish_chess)** - No installation required!

# Quickstart

## Requirements
@@ -79,6 +83,7 @@ $ curl "http://localhost:5000/?fen=rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR%2
- **UCI Protocol** - Compatible with popular chess GUIs
- **Web API** - RESTful interface for online integration
- **Lichess Bot** - Ready for deployment on [Lichess.org](/CONTRIBUTING.md#lichess-bot-python-bridge)
- **RL Environment** - OpenEnv-compatible environment for reinforcement learning

## Configuration Options

@@ -92,6 +97,114 @@ $ curl "http://localhost:5000/?fen=rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR%2
| `--quiescence-search-depth` | Max depth of quiescence search | `3` | `1-N` |
| `--syzygy-path` | Tablebase directory | `None` | Valid path |
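For illustration, flag-style options like the ones in this table could be consumed with `argparse` roughly as follows. This is a hypothetical sketch, not moonfish's actual CLI code:

```python
import argparse

# Hypothetical parser mirroring two of the options above;
# moonfish's real CLI may wire these differently.
parser = argparse.ArgumentParser(prog="moonfish")
parser.add_argument("--quiescence-search-depth", type=int, default=3,
                    help="Max depth of quiescence search")
parser.add_argument("--syzygy-path", default=None,
                    help="Tablebase directory")

args = parser.parse_args(["--quiescence-search-depth", "5"])
print(args.quiescence_search_depth)  # 5
print(args.syzygy_path)              # None
```

Note that `argparse` converts dashes in flag names to underscores when building attribute names.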

## Reinforcement Learning Environment

Moonfish includes an [OpenEnv](https://github.com/huggingface/openenv)-compatible RL environment for training chess agents. OpenEnv is a framework for RL environments that supports both local and remote (HTTP) execution.

### Installation

```shell
pip install moonfish[rl]
```

### Local Usage

```python
from moonfish.rl import ChessEnvironment, ChessAction

# Create environment with moonfish as opponent
env = ChessEnvironment(opponent="moonfish", opponent_depth=2)

# Reset and play
obs = env.reset()
while not obs.done:
    move = select_move(obs.legal_moves)  # Your policy here
    obs, reward, done = env.step(ChessAction(move=move))

env.close()
```

### Configuration Options

```python
from moonfish.rl import ChessEnvironment, RewardConfig

# Custom reward shaping
config = RewardConfig(
    win=1.0,              # Reward for winning
    loss=-1.0,            # Penalty for losing
    draw=0.0,             # Reward for draw
    illegal_move=-0.1,    # Penalty for illegal moves
    use_evaluation=True,  # Enable position-based intermediate rewards
    evaluation_scale=0.001,
)

# Environment options
env = ChessEnvironment(
    reward_config=config,
    opponent="moonfish",  # "moonfish", "random", or None (self-play)
    opponent_depth=2,     # Search depth for moonfish opponent
    agent_color=True,     # True=White, False=Black, None=alternate
    max_moves=500,        # Max half-moves before draw
)
```

### OpenEnv Server Mode

For distributed training or integration with OpenEnv-compatible frameworks:

```shell
# Start the server locally
python -m uvicorn moonfish.rl.server.app:app --port 8000
```

```python
# Connect via HTTP client
from moonfish.rl import make_env, ChessAction

client = make_env("http://localhost:8000")
obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
print(result.observation.fen)
```

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment configuration |
| `/reset` | POST | Start new game (optional: `fen`, `seed`) |
| `/step` | POST | Make a move (`{"move": "e2e4"}`) |
| `/state` | GET | Current game state |
| `/engine-move` | POST | Get best move for position |

### Hosted on Hugging Face

A hosted version is available on Hugging Face Spaces, so you can train against it without running your own server:

```python
from moonfish.rl import make_env, ChessAction

client = make_env("https://luccabb-moonfish-chess.hf.space")
obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
```

**Space URL:** https://huggingface.co/spaces/luccabb/moonfish_chess

### Deploy Your Own

```shell
# Install OpenEnv CLI
pip install openenv

# Clone and deploy to Hugging Face Spaces
cd moonfish/rl
openenv validate # Check environment structure
openenv push # Deploy to HF Spaces
```

## Contributing

We welcome contributions; feel free to open PRs and issues! Areas of interest:
moonfish/rl/README.md (185 additions, 0 deletions)

@@ -0,0 +1,185 @@
---
title: Moonfish Chess
emoji: ♟️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# Chess OpenEnv

A chess environment for reinforcement learning, built on [moonfish](https://github.com/luccab/moonfish) and compatible with the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.

## Features

- **Full Chess Rules**: Legal move generation, checkmate/stalemate detection, draw conditions
- **Position Evaluation**: PeSTO evaluation function from moonfish for reward shaping
- **OpenEnv Compatible**: Standard `reset()`, `step()`, `state()` interface
- **Configurable Rewards**: Win/loss/draw payoffs, illegal move penalties, evaluation-based rewards
- **HTTP API**: FastAPI server for remote training and multi-agent setups
- **Containerized**: Docker support for reproducible deployments

## Quick Start

### Local Usage (No Server)

```python
from moonfish.rl import ChessEnvironment, ChessAction

# Create environment
env = ChessEnvironment()

# Start a new game
obs = env.reset()
print(f"Legal moves: {obs.legal_moves}")

# Make a move
action = ChessAction(move="e2e4")
obs, reward, done = env.step(action)

print(f"FEN: {obs.fen}")
print(f"Reward: {reward}, Done: {done}")
```

### Client-Server Usage

Start the server:

```bash
cd moonfish/rl
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```

Connect with the client:

```python
from moonfish.rl import ChessEnvClient, ChessAction

client = ChessEnvClient("http://localhost:8000")

obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
print(f"Reward: {result.reward}")

client.close()
```

## Data Models

### ChessAction
```python
@dataclass
class ChessAction:
    move: str  # UCI format: "e2e4", "e7e8q" (promotion)
```
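A UCI move string encodes a source square, a target square, and an optional promotion piece. A small helper (hypothetical, not part of the moonfish API) can split it:

```python
def parse_uci(move: str):
    """Split a UCI move like 'e2e4' or 'e7e8q' into its parts."""
    src, dst = move[:2], move[2:4]
    promotion = move[4] if len(move) == 5 else None
    return src, dst, promotion

print(parse_uci("e2e4"))   # ('e2', 'e4', None)
print(parse_uci("e7e8q"))  # ('e7', 'e8', 'q')
```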

### ChessObservation
```python
@dataclass
class ChessObservation:
    fen: str                  # Board state in FEN notation
    legal_moves: List[str]    # Available moves in UCI format
    is_check: bool            # Current player in check
    done: bool                # Game over
    reward: Optional[float]   # Terminal reward
    result: Optional[str]     # "1-0", "0-1", "1/2-1/2"
    metadata: Dict[str, Any]  # Evaluation, material, etc.
```

### ChessState
```python
@dataclass
class ChessState:
    episode_id: str          # Unique game identifier
    step_count: int          # Half-moves played
    current_player: str      # "white" or "black"
    fen: str                 # Current position
    move_history: List[str]  # All moves in UCI format
```
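A policy typically only needs the observation's `legal_moves` and `done` fields. The sketch below uses a local stand-in mirroring the `ChessObservation` fields above, purely for illustration:

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Stand-in with the same fields as ChessObservation, for illustration only.
@dataclass
class ChessObservation:
    fen: str
    legal_moves: List[str]
    is_check: bool = False
    done: bool = False
    reward: Optional[float] = None
    result: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

def pick_move(obs: ChessObservation) -> Optional[str]:
    """A trivial policy: random legal move, or None when the game is over."""
    if obs.done or not obs.legal_moves:
        return None
    return random.choice(obs.legal_moves)

start = ChessObservation(
    fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    legal_moves=["e2e4", "d2d4", "g1f3"],
)
print(pick_move(start) in start.legal_moves)  # True
```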

## Reward Configuration

```python
from moonfish.rl import ChessEnvironment, RewardConfig

config = RewardConfig(
    win=1.0,                  # Reward for winning
    loss=-1.0,                # Penalty for losing
    draw=0.0,                 # Reward for draw
    illegal_move=-0.1,        # Penalty for illegal moves
    use_evaluation=True,      # Enable intermediate rewards
    evaluation_scale=0.0001,  # Scale for eval-based rewards
)

env = ChessEnvironment(reward_config=config)
```
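With `use_evaluation=True`, intermediate rewards are derived from the engine's position evaluation. As a rough sketch of the idea (the exact formula here is an assumption; see the environment source for the real one), a centipawn swing scaled by `evaluation_scale` might look like:

```python
def shaped_reward(eval_before: int, eval_after: int,
                  scale: float = 0.0001) -> float:
    """Hypothetical intermediate reward: scaled change in evaluation,
    in centipawns from the agent's point of view."""
    return (eval_after - eval_before) * scale

# Gaining about a pawn (+100 cp) yields a small positive reward.
print(round(shaped_reward(0, 100), 6))   # 0.01
print(round(shaped_reward(50, -50), 6))  # -0.01
```

Small scales like this keep intermediate rewards from dominating the terminal win/loss signal.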

## Docker

Build and run:

```bash
docker build -t chess-openenv .
docker run -p 8000:8000 chess-openenv
```

## Integration with RL Frameworks

### With TorchRL

```python
from moonfish.rl import ChessEnvironment, ChessAction

class ChessTorchRLWrapper:
    def __init__(self):
        self.env = ChessEnvironment()

    def reset(self):
        obs = self.env.reset()
        return self._obs_to_tensor(obs)

    def step(self, action_idx):
        # _idx_to_move and _obs_to_tensor are left to the user: map an
        # action index to a UCI move and encode the observation as a tensor.
        move = self._idx_to_move(action_idx)
        obs, reward, done = self.env.step(ChessAction(move=move))
        return self._obs_to_tensor(obs), reward, done
```

### With OpenEnv Training Loop

```python
from moonfish.rl import make_env, ChessAction
import random

client = make_env("http://localhost:8000")

for episode in range(100):
    obs = client.reset()
    episode_reward = 0

    while not obs.done:
        # Your policy here (random for demo)
        move = random.choice(obs.legal_moves)
        result = client.step(ChessAction(move=move))
        obs = result.observation
        episode_reward += result.reward

    print(f"Episode {episode}: reward={episode_reward}")

client.close()
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment configuration |
| `/reset` | POST | Start new episode |
| `/step` | POST | Execute a move |
| `/state` | GET | Get episode metadata |

## License

MIT - See the moonfish repository for full license details.
moonfish/rl/__init__.py (18 additions, 0 deletions)

@@ -0,0 +1,18 @@
"""Chess OpenEnv - A chess environment for reinforcement learning."""

from .client import ChessEnvClient, make_env, StepResult
from .models import ChessAction, ChessObservation, ChessState, RewardConfig
from .server.chess_environment import ChessEnvironment

__all__ = [
    "ChessAction",
    "ChessObservation",
    "ChessState",
    "RewardConfig",
    "ChessEnvClient",
    "StepResult",
    "make_env",
    "ChessEnvironment",
]

__version__ = "1.0.0"