README.md (113 additions, 0 deletions)

@@ -8,6 +8,10 @@ Moonfish is a didactic Python chess engine designed to showcase parallel search

The engine reaches approximately 2000 Elo against the Lichess Stockfish bots (it beats level 5 and loses to level 6) and ships with comprehensive test suites, including the Bratko-Kopec tactical test positions.

## Play Online

**[Play against Moonfish in your browser](https://huggingface.co/spaces/luccabb/moonfish_chess)** - No installation required!

# Quickstart

## Requirements
@@ -79,6 +83,7 @@ $ curl "http://localhost:5000/?fen=rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR%2
- **UCI Protocol** - Compatible with popular chess GUIs
- **Web API** - RESTful interface for online integration
- **Lichess Bot** - Ready for deployment on [Lichess.org](/CONTRIBUTING.md#lichess-bot-python-bridge)
- **RL Environment** - OpenEnv-compatible environment for reinforcement learning

## Configuration Options

@@ -92,6 +97,114 @@ $ curl "http://localhost:5000/?fen=rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR%2
| `--quiescence-search-depth` | Max depth of quiescence search | `3` | `1-N` |
| `--syzygy-path` | Tablebase directory | `None` | Valid path |
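For illustration, flag-style options like the ones in this table could be consumed with `argparse` roughly as follows. This is a hypothetical sketch, not moonfish's actual CLI code:

```python
import argparse

# Hypothetical parser mirroring two of the options above;
# moonfish's real CLI may wire these differently.
parser = argparse.ArgumentParser(prog="moonfish")
parser.add_argument("--quiescence-search-depth", type=int, default=3,
                    help="Max depth of quiescence search")
parser.add_argument("--syzygy-path", default=None,
                    help="Tablebase directory")

args = parser.parse_args(["--quiescence-search-depth", "5"])
print(args.quiescence_search_depth)  # 5
print(args.syzygy_path)              # None
```

Note that `argparse` converts dashes in flag names to underscores when building attribute names.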

## Reinforcement Learning Environment

Moonfish includes an [OpenEnv](https://github.com/huggingface/openenv)-compatible RL environment for training chess agents. OpenEnv is a framework for RL environments that supports both local and remote (HTTP) execution.

### Installation

```shell
pip install moonfish[rl]
```

### Local Usage

```python
from moonfish.rl import ChessEnvironment, ChessAction

# Create environment with moonfish as opponent
env = ChessEnvironment(opponent="moonfish", opponent_depth=2)

# Reset and play
obs = env.reset()
while not obs.done:
    move = select_move(obs.legal_moves)  # Your policy here
    obs, reward, done = env.step(ChessAction(move=move))

env.close()
```

### Configuration Options

```python
from moonfish.rl import ChessEnvironment, RewardConfig

# Custom reward shaping
config = RewardConfig(
    win=1.0,              # Reward for winning
    loss=-1.0,            # Penalty for losing
    draw=0.0,             # Reward for draw
    illegal_move=-0.1,    # Penalty for illegal moves
    use_evaluation=True,  # Enable position-based intermediate rewards
    evaluation_scale=0.001,
)

# Environment options
env = ChessEnvironment(
    reward_config=config,
    opponent="moonfish",  # "moonfish", "random", or None (self-play)
    opponent_depth=2,     # Search depth for moonfish opponent
    agent_color=True,     # True=White, False=Black, None=alternate
    max_moves=500,        # Max half-moves before draw
)
```

### OpenEnv Server Mode

For distributed training or integration with OpenEnv-compatible frameworks:

```shell
# Start the server locally
python -m uvicorn moonfish.rl.server.app:app --port 8000
```

```python
# Connect via HTTP client
from moonfish.rl import make_env, ChessAction

client = make_env("http://localhost:8000")
obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
print(result.observation.fen)
```

### API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment configuration |
| `/reset` | POST | Start new game (optional: `fen`, `seed`) |
| `/step` | POST | Make a move (`{"move": "e2e4"}`) |
| `/state` | GET | Current game state |
| `/engine-move` | POST | Get best move for position |

### Hosted on Hugging Face

A hosted version is available on Hugging Face Spaces, so you can train against it without running your own server:

```python
from moonfish.rl import make_env, ChessAction

client = make_env("https://luccabb-moonfish-chess.hf.space")
obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
```

**Space URL:** https://huggingface.co/spaces/luccabb/moonfish_chess

### Deploy Your Own

```shell
# Install OpenEnv CLI
pip install openenv

# Clone and deploy to Hugging Face Spaces
cd moonfish/rl
openenv validate # Check environment structure
openenv push # Deploy to HF Spaces
```

## Contributing

We welcome contributions; feel free to open PRs and issues! Areas of interest:
moonfish/rl/README.md (185 additions, 0 deletions)

@@ -0,0 +1,185 @@
---
title: Moonfish Chess
emoji: ♟️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# Chess OpenEnv

A chess environment for reinforcement learning, built on [moonfish](https://github.com/luccab/moonfish) and compatible with the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.

## Features

- **Full Chess Rules**: Legal move generation, checkmate/stalemate detection, draw conditions
- **Position Evaluation**: PeSTO evaluation function from moonfish for reward shaping
- **OpenEnv Compatible**: Standard `reset()`, `step()`, `state()` interface
- **Configurable Rewards**: Win/loss/draw payoffs, illegal move penalties, evaluation-based rewards
- **HTTP API**: FastAPI server for remote training and multi-agent setups
- **Containerized**: Docker support for reproducible deployments

## Quick Start

### Local Usage (No Server)

```python
from moonfish.rl import ChessEnvironment, ChessAction

# Create environment
env = ChessEnvironment()

# Start a new game
obs = env.reset()
print(f"Legal moves: {obs.legal_moves}")

# Make a move
action = ChessAction(move="e2e4")
obs, reward, done = env.step(action)

print(f"FEN: {obs.fen}")
print(f"Reward: {reward}, Done: {done}")
```

### Client-Server Usage

Start the server:

```bash
cd moonfish/rl
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```

Connect with the client:

```python
from moonfish.rl import ChessEnvClient, ChessAction

client = ChessEnvClient("http://localhost:8000")

obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
print(f"Reward: {result.reward}")

client.close()
```

## Data Models

### ChessAction
```python
@dataclass
class ChessAction:
    move: str  # UCI format: "e2e4", "e7e8q" (promotion)
```
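A UCI move string encodes a source square, a target square, and an optional promotion piece. A small helper (hypothetical, not part of the moonfish API) can split it:

```python
def parse_uci(move: str):
    """Split a UCI move like 'e2e4' or 'e7e8q' into its parts."""
    src, dst = move[:2], move[2:4]
    promotion = move[4] if len(move) == 5 else None
    return src, dst, promotion

print(parse_uci("e2e4"))   # ('e2', 'e4', None)
print(parse_uci("e7e8q"))  # ('e7', 'e8', 'q')
```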

### ChessObservation
```python
@dataclass
class ChessObservation:
    fen: str                  # Board state in FEN notation
    legal_moves: List[str]    # Available moves in UCI format
    is_check: bool            # Current player in check
    done: bool                # Game over
    reward: Optional[float]   # Terminal reward
    result: Optional[str]     # "1-0", "0-1", "1/2-1/2"
    metadata: Dict[str, Any]  # Evaluation, material, etc.
```

### ChessState
```python
@dataclass
class ChessState:
    episode_id: str          # Unique game identifier
    step_count: int          # Half-moves played
    current_player: str      # "white" or "black"
    fen: str                 # Current position
    move_history: List[str]  # All moves in UCI format
```
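A policy typically only needs the observation's `legal_moves` and `done` fields. The sketch below uses a local stand-in mirroring the `ChessObservation` fields above, purely for illustration:

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Stand-in with the same fields as ChessObservation, for illustration only.
@dataclass
class ChessObservation:
    fen: str
    legal_moves: List[str]
    is_check: bool = False
    done: bool = False
    reward: Optional[float] = None
    result: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

def pick_move(obs: ChessObservation) -> Optional[str]:
    """A trivial policy: random legal move, or None when the game is over."""
    if obs.done or not obs.legal_moves:
        return None
    return random.choice(obs.legal_moves)

start = ChessObservation(
    fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    legal_moves=["e2e4", "d2d4", "g1f3"],
)
print(pick_move(start) in start.legal_moves)  # True
```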

## Reward Configuration

```python
from moonfish.rl import ChessEnvironment, RewardConfig

config = RewardConfig(
    win=1.0,                  # Reward for winning
    loss=-1.0,                # Penalty for losing
    draw=0.0,                 # Reward for draw
    illegal_move=-0.1,        # Penalty for illegal moves
    use_evaluation=True,      # Enable intermediate rewards
    evaluation_scale=0.0001,  # Scale for eval-based rewards
)

env = ChessEnvironment(reward_config=config)
```
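With `use_evaluation=True`, intermediate rewards are derived from the engine's position evaluation. As a rough sketch of the idea (the exact formula here is an assumption; see the environment source for the real one), a centipawn swing scaled by `evaluation_scale` might look like:

```python
def shaped_reward(eval_before: int, eval_after: int,
                  scale: float = 0.0001) -> float:
    """Hypothetical intermediate reward: scaled change in evaluation,
    in centipawns from the agent's point of view."""
    return (eval_after - eval_before) * scale

# Gaining about a pawn (+100 cp) yields a small positive reward.
print(round(shaped_reward(0, 100), 6))   # 0.01
print(round(shaped_reward(50, -50), 6))  # -0.01
```

Small scales like this keep intermediate rewards from dominating the terminal win/loss signal.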

## Docker

Build and run:

```bash
docker build -t chess-openenv .
docker run -p 8000:8000 chess-openenv
```

## Integration with RL Frameworks

### With TorchRL

```python
from moonfish.rl import ChessEnvironment, ChessAction

class ChessTorchRLWrapper:
    def __init__(self):
        self.env = ChessEnvironment()

    def reset(self):
        obs = self.env.reset()
        return self._obs_to_tensor(obs)

    def step(self, action_idx):
        # _idx_to_move and _obs_to_tensor are left to the user: map an
        # action index to a UCI move and encode the observation as a tensor.
        move = self._idx_to_move(action_idx)
        obs, reward, done = self.env.step(ChessAction(move=move))
        return self._obs_to_tensor(obs), reward, done
```

### With OpenEnv Training Loop

```python
from moonfish.rl import make_env, ChessAction
import random

client = make_env("http://localhost:8000")

for episode in range(100):
    obs = client.reset()
    episode_reward = 0

    while not obs.done:
        # Your policy here (random for demo)
        move = random.choice(obs.legal_moves)
        result = client.step(ChessAction(move=move))
        obs = result.observation
        episode_reward += result.reward

    print(f"Episode {episode}: reward={episode_reward}")

client.close()
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment configuration |
| `/reset` | POST | Start new episode |
| `/step` | POST | Execute a move |
| `/state` | GET | Get episode metadata |

## License

MIT - See the moonfish repository for full license details.
moonfish/rl/__init__.py (18 additions, 0 deletions)

@@ -0,0 +1,18 @@
"""Chess OpenEnv - A chess environment for reinforcement learning."""

from .client import ChessEnvClient, make_env, StepResult
from .models import ChessAction, ChessObservation, ChessState, RewardConfig
from .server.chess_environment import ChessEnvironment

__all__ = [
    "ChessAction",
    "ChessObservation",
    "ChessState",
    "RewardConfig",
    "ChessEnvClient",
    "StepResult",
    "make_env",
    "ChessEnvironment",
]

__version__ = "1.0.0"