GPU Scheduler

A command-line utility built on top of nvitop that monitors NVIDIA GPUs and automatically executes queued commands when specified GPUs become idle.

✨ Features

  • 🔍 GPU Monitoring: Real-time GPU status display with live updates
  • 📋 Task Queue: Persistent queue for commands waiting on specific GPUs
  • ⚙️ Background Scheduler: Daemon process that continues running after TUI exit
  • 🎯 Auto-Execution: Automatically launches tasks when GPUs become idle
  • 💾 Data Persistence: All data saved to disk, survives TUI/terminal restarts
  • 🎨 Interactive TUI: Beautiful terminal UI for task management
  • 🔔 Notifications: Email, webhook, or stdout notifications
  • 🔒 Single Instance: Prevents multiple scheduler conflicts

🚀 Quick Start

Installation

Install globally to use gst anywhere (Recommended):

# Using uv
uv tool install .

# Using pipx
pipx install .

Or install in the current virtual environment:

# Using pip
pip install .
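
After installing, you can confirm that both entry points (gst for the TUI, gpus for the CLI) are on your PATH; this quick check uses only a standard shell built-in:

# Verify the installed entry points resolve
command -v gst gpus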

5-Minute Tutorial

1. Launch TUI

gst

2. Submit a task

  • Select "📝 Submit"
  • Enter GPU indices: 0
  • Enter command: python train.py --epochs 3
  • Confirm

3. Start scheduler

  • Select "⚙️ Start Scheduler"
  • Exit TUI (scheduler keeps running!)

4. Check status anytime

gst  # Reopen TUI
# Select "📊 View Status" to see running tasks

That's it! Your tasks will execute automatically when GPUs become idle. 🎉

Command-Line Quick Reference

# Monitor GPU
gpus monitor --watch

# Submit task
gpus submit --gpu 0 -- python train.py

# View queue
gpus queue

# Start scheduler (background)
nohup gpus scheduler > scheduler.log 2>&1 &

📖 Documentation

New to GPU Scheduler? Start here:

Need detailed reference?

Want to understand internals?

Developer resources:

💡 Key Concepts

Non-Interactive Shell Environment

⚠️ Important: Tasks run in non-interactive shells, so your .bashrc is not sourced and no conda/virtualenv is activated automatically.

Solution: use an absolute interpreter path or a wrapper script:

# ✅ Recommended
gpus submit --gpu 0 -- /path/to/.venv/bin/python train.py

# ✅ Or use wrapper script
cat > run_task.sh << 'EOF'
#!/bin/bash
cd /path/to/project
source .venv/bin/activate
python train.py
EOF

gpus submit --gpu 0 -- bash run_task.sh
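
The same absolute-path approach works for a conda environment: point the command at that environment's own python binary. The path below is a placeholder for illustration, not something created by this tool:

# ✅ Conda env via its absolute interpreter path (adjust the placeholder path)
gpus submit --gpu 0 -- /path/to/miniconda3/envs/myenv/bin/python train.py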

See FAQ - Virtual Environments for details.

Background Scheduler

The scheduler runs independently as a daemon process:

# Start via TUI
gst → Start Scheduler → Exit

# Or via CLI (recommended for production)
nohup gpus scheduler --log-dir ./logs > scheduler.log 2>&1 &

# Scheduler survives:
# ✅ TUI exit
# ✅ Terminal close  
# ✅ SSH disconnect

See FAQ - Background Running for production setup.
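
If you prefer to manage the daemon by hand, the following is a minimal sketch using generic process tools; the pgrep/pkill match pattern is an assumption about the process command line, not part of the gpus CLI:

# Is the scheduler still alive after an SSH reconnect?
pgrep -af "gpus scheduler"

# Follow its log (default data directory, see Data Persistence below)
tail -f ~/.local/share/gpu_scheduler/scheduler.log

# Ask it to exit gracefully (SIGTERM); assumes nothing else matches this pattern
pkill -TERM -f "gpus scheduler"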

Data Persistence

All data is stored in ~/.local/share/gpu_scheduler/:

~/.local/share/gpu_scheduler/
├── queue.json              # Task queue
├── task_history.json       # Execution history
├── scheduler_state.json    # Running tasks
└── scheduler.log           # Logs

Guarantees:

  • ✅ Tasks persist across TUI sessions
  • ✅ Queue survives system restarts
  • ✅ Atomic operations prevent corruption
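
Because the state is plain JSON plus a log file, it can be inspected or backed up with standard tools; this sketch does not depend on any particular schema inside the files:

# Pretty-print the current queue (read-only)
python -m json.tool ~/.local/share/gpu_scheduler/queue.json

# Snapshot the whole data directory before experimenting
cp -r ~/.local/share/gpu_scheduler ~/gpu_scheduler_backup_$(date +%Y%m%d)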

🎯 Common Use Cases

Multi-Task Training

# Submit multiple training jobs
gpus submit --gpu 0 -- python train_model_a.py
gpus submit --gpu 0 -- python train_model_b.py
gpus submit --gpu 1 -- python train_model_c.py

# Start scheduler
gst → Start Scheduler → Exit

# Jobs execute automatically in order

Multi-GPU Task

# Task requires 2 GPUs
gpus submit --gpu 0 1 -- python large_model.py

# The task will wait until BOTH GPUs are idle

Task Retry

# Enable automatic retry (3 attempts, exponential backoff)
gpus submit --gpu 0 --retry -- python flaky_job.py

# Custom retry
gpus submit \
  --gpu 0 \
  --retry \
  --max-retries 5 \
  --retry-delay 60.0 \
  -- python train.py
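
For intuition only: if the delay doubles after each failed attempt (the multiplier is not stated in this README, so treat it as an assumption), the waits for the custom example above would be:

# Hypothetical doubling backoff starting from --retry-delay 60.0
for i in 0 1 2 3 4; do
  echo "wait before retry $((i + 1)): $((60 * 2 ** i))s"
done
# -> 60s, 120s, 240s, 480s, 960s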

Stop Running Task

# Via TUI
gst → Stop Task → Select task → Confirm

# Via CLI
gpus queue --stop <task_id>

# Graceful termination:
# 1. SIGTERM sent (5s timeout)
# 2. SIGKILL if needed
# 3. GPU resources released
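
The same escalation can be reproduced by hand for a stray process; this is a generic shell sketch of the behaviour described above, not the scheduler's own code, and the PID is a placeholder:

PID=12345                                         # placeholder: replace with the stray process ID
kill -TERM "$PID"                                 # polite shutdown request
sleep 5                                           # the 5 s grace period described above
kill -0 "$PID" 2>/dev/null && kill -KILL "$PID"   # force-kill only if it is still running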

🧪 Testing

# Quick GPU test (allocate 2GB for 60s)
python tests/tools/gpu_test.py --memory 2048 --duration 60

# Interactive test scenarios
bash tests/tools/run_scenarios.sh

# Run automated tests
pytest tests/integration/

See tests/README.md for a comprehensive testing guide.

🆘 Troubleshooting

Tasks not executing?

  1. Check scheduler is running: ps aux | grep "gpus scheduler"
  2. Check GPU status: gpus monitor
  3. View logs: tail -f ~/.local/share/gpu_scheduler/scheduler.log

Virtual environment issues?

More help:

🤝 Contributing

Contributions welcome! Please check:

📄 License

See LICENSE file for details.

🙏 Acknowledgments

Built on top of the excellent nvitop library.
