SkyDeck Dashboard

Web-based dashboard and controller for managing SkyPilot experiments with declarative desired state management, automatic reconciliation, and persistent job history tracking.

Features

Declarative State Management: Set a desired state (RUNNING/STOPPED/TERMINATED) and the system reconciles toward it automatically
Job History Tracking: Complete history of all job executions per experiment
Dynamic Flag Configuration: Type-safe configuration using flags with autocomplete
Automatic Reconciliation: Background reconciler ensures current state matches desired state
SQLite Persistence: All state survives restarts
Real-time Updates: Web UI polls and displays current status

Quick Start

# Install dependencies
cd packages/skydeck
uv pip install -e .

# Run the dashboard
uv run python -m skydeck.run

# Access at http://localhost:8000

Architecture

Core Components

Data Models (models.py): Job and Experiment Pydantic models
Database Layer (database.py): SQLite async operations
Flag Schema (flag_schema.py): Dynamic flag inference from Pydantic models
State Manager (state_manager.py): Cache of current SkyPilot state
Desired State Manager (desired_state.py): CRUD for experiments
Background Poller (poller.py): Polls SkyPilot every 30s by default (see SKYDECK_POLL_INTERVAL)
Reconciler (reconciler.py): Drives current state toward desired state, every 60s by default (see SKYDECK_RECONCILE_INTERVAL)
FastAPI Backend (app.py): REST API and static file serving
Web UI (static/): Single-page dashboard

Data Model

Experiment: Configuration template that spawns jobs

Has desired_state (what you want) and current_state (actual status)
Configuration stored as flags: Dict[str, Union[str, int, float, bool]]
Only one running job per experiment at a time

Job: Single SkyPilot job execution

Linked to parent experiment
Full execution history: timestamps, status, logs, exit code
Status: INIT, PENDING, RUNNING, SUCCEEDED, FAILED, CANCELLED
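
The two records above can be sketched as follows. This is a minimal illustration that uses stdlib dataclasses in place of the project's Pydantic models so that it runs without dependencies. The field names experiment_id, started_at, and ended_at, the STOPPED default, and the ACTIVE_STATUSES set are assumptions for illustration, not taken from models.py.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Dict, Optional, Union

FlagValue = Union[str, int, float, bool]


class DesiredState(str, Enum):
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    TERMINATED = "TERMINATED"


class JobStatus(str, Enum):
    INIT = "INIT"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Assumption: these statuses mean the job still occupies the
# experiment's single "running job" slot.
ACTIVE_STATUSES = {JobStatus.INIT, JobStatus.PENDING, JobStatus.RUNNING}


@dataclass
class Experiment:
    id: str
    name: str
    flags: Dict[str, FlagValue] = field(default_factory=dict)
    # Assumption: a new experiment does not launch until asked to.
    desired_state: DesiredState = DesiredState.STOPPED
    # Unknown until the poller reports something.
    current_state: Optional[str] = None


@dataclass
class Job:
    id: str
    experiment_id: str  # hypothetical name for the link to the parent
    status: JobStatus = JobStatus.INIT
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None
    exit_code: Optional[int] = None


def is_active(job: Job) -> bool:
    """True while the job counts against the one-job-per-experiment limit."""
    return job.status in ACTIVE_STATUSES
```

Separating desired_state from current_state is what lets the reconciler compare the two and act only on mismatches.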

Reconciliation Loop

User sets desired_state=RUNNING
  ↓
Reconciler detects mismatch
  ↓
Calls sky.launch() with experiment flags
  ↓
Creates new Job record
  ↓
Poller updates job status from SkyPilot
  ↓
Experiment current_state updated
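
The decision at the heart of this loop can be sketched as a pure function. This is an illustration under stated assumptions, not the code in reconciler.py: the Action enum and decide function are hypothetical names, and the sketch treats any experiment without an active job (including one whose last job finished) as needing a launch, which may not match the real reconciler's policy.

```python
from enum import Enum
from typing import Optional


class Action(str, Enum):
    LAUNCH = "launch"
    CANCEL = "cancel"
    NONE = "none"


# Assumption: a job in one of these statuses is still "in flight".
ACTIVE = {"INIT", "PENDING", "RUNNING"}


def decide(desired_state: str, current_job_status: Optional[str]) -> Action:
    """Pick the action that moves an experiment toward its desired state.

    current_job_status is the status of the experiment's most recent job,
    or None if it has never had one.
    """
    has_active_job = current_job_status in ACTIVE
    if desired_state == "RUNNING":
        # Never launch a second job while one is in flight.
        return Action.NONE if has_active_job else Action.LAUNCH
    # STOPPED or TERMINATED: make sure nothing is still running.
    return Action.CANCEL if has_active_job else Action.NONE


print(decide("RUNNING", None))
print(decide("STOPPED", "RUNNING"))
```

Keeping the decision free of side effects makes the one-running-job rule easy to test separately from the SkyPilot calls.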

Usage Examples

Create and Run Experiment

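# Assumes Experiment and DesiredState are imported from the skydeck package,
# db is an initialized database handle, and this runs inside an async function.
experiment = Experiment(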
    id="ppo_4layer",
    name="PPO 4 Layers",
    flags={
        "trainer.losses.ppo.enabled": True,
        "policy_architecture.core_resnet_layers": 4,
    },
    base_command="lt",
    run_name="daveey.ppo_4layer",
    nodes=4,
    gpus=4,
    desired_state=DesiredState.RUNNING,
)
await db.save_experiment(experiment)

Grid Search

for layers in [1, 4, 16, 64]:
    experiment = Experiment(
        id=f"ppo_{layers}layer",
        name=f"PPO {layers} Layers",
        flags={
            "trainer.losses.ppo.enabled": True,
            "policy_architecture.core_resnet_layers": layers,
        },
        nodes=4,
        gpus=4,
        desired_state=DesiredState.RUNNING,
    )
    await db.save_experiment(experiment)
# Reconciler automatically launches all experiments!

View Job History

jobs = await db.get_jobs_for_experiment("ppo_4layer", limit=10)
for job in jobs:
    print(f"{job.id}: {job.status} (exit={job.exit_code})")

API Endpoints

Experiments

GET /api/experiments - List all experiments
POST /api/experiments - Create new experiment
GET /api/experiments/{id} - Get experiment details
DELETE /api/experiments/{id} - Delete experiment
POST /api/experiments/{id}/state - Update desired state
GET /api/experiments/{id}/status - Full status with current job
GET /api/experiments/{id}/jobs - Job history
POST /api/experiments/{id}/flags - Update flags

Jobs

POST /api/jobs/{id}/cancel - Cancel running job

System

GET /api/health - System health status
GET /api/flag-schemas - Flag metadata for autocomplete
POST /api/refresh - Force SkyPilot state refresh
POST /api/reconcile - Force reconciliation
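
As an illustration of calling these endpoints from a script, the sketch below builds the request for POST /api/experiments/{id}/state using only the standard library. The JSON body shape ({"desired_state": ...}) is an assumption, since the request schema is not documented here, and build_state_request is a hypothetical helper name.

```python
import json
import urllib.request
from urllib.parse import quote


def build_state_request(
    base_url: str, experiment_id: str, desired_state: str
) -> urllib.request.Request:
    """Build (but do not send) a request that updates an experiment's desired state."""
    url = (
        f"{base_url.rstrip('/')}/api/experiments/"
        f"{quote(experiment_id, safe='')}/state"
    )
    # Assumption: the endpoint accepts a JSON object with a desired_state key.
    body = json.dumps({"desired_state": desired_state}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_state_request("http://localhost:8000", "ppo_4layer", "STOPPED")
print(req.get_method(), req.full_url)
```

With the dashboard running, the request could be sent with urllib.request.urlopen(req).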

Configuration

Environment variables:

SKYDECK_DB_PATH: Database file path (default: skydeck.db)
SKYDECK_POLL_INTERVAL: Poll interval in seconds (default: 30)
SKYDECK_RECONCILE_INTERVAL: Reconcile interval in seconds (default: 60)
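
One way the variables above could be read is sketched below. Settings and load_settings are hypothetical names, and how SkyDeck itself parses these values, or how they interact with command line options, is not specified here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_path: str
    poll_interval: float
    reconcile_interval: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read SkyDeck settings from the environment, falling back to the documented defaults."""
    return Settings(
        db_path=env.get("SKYDECK_DB_PATH", "skydeck.db"),
        poll_interval=float(env.get("SKYDECK_POLL_INTERVAL", "30")),
        reconcile_interval=float(env.get("SKYDECK_RECONCILE_INTERVAL", "60")),
    )


print(load_settings({"SKYDECK_POLL_INTERVAL": "5"}))
```

Passing the environment in as a mapping keeps the function easy to test without touching os.environ.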

Command line options:

python -m skydeck.run --host 0.0.0.0 --port 8000 --db-path /path/to/db.sqlite

Development

# Install dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Run with auto-reload
uv run uvicorn skydeck.app:app --reload

Key Design Principles

Experiments are templates, jobs are instances - Like classes vs objects
Flags are first-class - Flag types are inferred dynamically from Pydantic models
SQLite for everything - Survives restarts, single dependency
Automatic reconciliation - Set desired state, system does the rest
Job history tracking - Never lose experiment outcomes
One running job per experiment - Enforced by reconciler

License

MIT
