CTFd plugin for provisioning on-demand remote desktop sessions across a distributed cluster of Docker hosts.
This plugin orchestrates containerized desktop environments across multiple remote hosts, using SSH for control plane communication and nginx for VNC traffic proxying. Container creation runs asynchronously via gevent greenlets to avoid blocking request handlers.
CTFd (gevent WSGI)
|
+-- ContainerManager (spawns gevent greenlets)
|
+-- HostOrchestrator (load balances across hosts)
|
+-- ConnectionPool[] (SSH to each Docker host)
|
+-- Remote Docker Hosts (run VNC containers)
|
+-- nginx auth_request (validates, proxies VNC traffic)
Loads hosts.yml containing workspace host definitions, container resource limits, and session timer defaults. Falls back to localhost if config is missing.
Per-host paramiko SSH connection pool with max 20 connections. Uses paramiko's default SSH authentication (SSH agent, default key locations, agent forwarding). Validates connection health on checkout/checkin to handle stale TCP connections.
Lock contention minimized by keeping available_connections.get() call inside lock only when pool is exhausted, preventing connection count from exceeding max under concurrent load.
Manages multiple connection pools and tracks per-host container counts. Implements least-loaded scheduling by sorting hosts by active container count. Marks hosts unhealthy on failure to remove from rotation.
On startup, tests connectivity to all configured hosts:
- SSH connection test
- Docker daemon accessibility (
docker ps) - Docker image presence (
docker image inspect)
Hosts that fail any check are marked unhealthy and excluded from scheduling. Results logged to admin event feed.
Single global lock protects all host state since operations are infrequent and contention is minimal.
Core state machine for container lifecycle:
Creation flow:
- User requests session via
/api/create - Spawns gevent greenlet with Flask app context
- Background task selects least-loaded healthy host
- Checks out SSH connection from pool
- Executes
docker runwith dynamic port mapping (0:5900, 0:6080) - Polls
docker portto discover mapped ports - HTTP polls noVNC endpoint until ready (max 90s)
- Updates
active_containersdict and initializes session timer - Returns connection to pool
Destruction flow:
- User or periodic cleanup triggers destruction
- SSH to host, execute
docker stopanddocker rm - Release host slot, clear timer and status
Session timers:
- Initial duration: 3600s (configurable)
- Extension duration: 1200s (configurable)
- Max extensions: 3 (configurable)
- Timer starts on first status poll after container ready
- Periodic cleanup (300s interval) auto-destroys expired sessions
All state mutations protected by single lock. Status dicts (active_containers, session_timers, creation_status) keyed by user_id.
Error handling in creation path individually wraps SSH checkin, slot release, and health marking to prevent cascading failures from masking root cause.
Thread-safe event log using deque with 500 event limit. Supports real-time listener callbacks for SSE streaming to admin dashboard. Failed listeners are removed and logged.
The plugin relies on nginx auth_request subrequest pattern for access control and dynamic backend routing.
- Client requests
/remote-desktop/vnc/123/websockify - Nginx extracts user_id=123 from URL via regex capture
- Nginx triggers internal subrequest to
/remote-desktop/auth-checkwithX-User-ID: 123header - CTFd validates:
- User is authenticated
- User owns container 123 OR user is admin
- Container exists
- CTFd returns 200 with
X-VNC-Host: hostnameandX-VNC-Port: 12345headers - Nginx captures headers via
auth_request_setand proxies tohttp://$vnc_host:$vnc_port/websockify - WebSocket upgrade headers preserved for VNC connection
Static assets (vnc.html, noVNC JS) follow same pattern but without WebSocket upgrade.
- Nginx handles WebSocket proxying and load distribution
- CTFd maintains session state and authorization logic
- No need for CTFd to serve binary VNC traffic
- Per-request authorization without shared session store
- Dynamic backend routing without nginx config reloads
CTFd runs under gunicorn with gevent workers. This plugin uses:
gevent.spawn() for container creation to avoid blocking request threads during SSH operations and container startup polling.
threading.Lock for state protection since gevent greenlets within same worker share memory. All state dicts are guarded by component-level locks.
Background cleanup thread runs via threading.Thread(daemon=True) to periodically scan for expired sessions. Uses Event.wait(300) for cancellable sleep.
workspace_hosts:
- hostname: host1.internal
user: docker_user
pub_hostname: host1.external.com
docker_image: ctfd-remote-desktop:latest
container_defaults:
memory_limit: 4g
shm_size: 2gb
resolution: 1920x1080
cpu_limit: 2
session_defaults:
initial_duration: 3600
extension_duration: 1200
max_extensions: 3ConnectionPool uses paramiko's default SSH authentication which automatically tries:
- SSH agent (if available)
- Default key locations (~/.ssh/id_rsa, ~/.ssh/id_ed25519, etc.)
- SSH agent forwarding (if configured)
No explicit key paths needed. Works with standard SSH setups and deployment tools.
Image must:
- Expose VNC on port 5900
- Expose noVNC on port 6080
- Accept
VNC_PASSWORDandRESOLUTIONenv vars - Serve noVNC web client at
/vnc.html - Provide WebSocket endpoint at
/websockify
GET /remote-desktop- Main UI pagePOST /remote-desktop/api/create- Request new sessionGET /remote-desktop/api/creation-status- Poll creation progressGET /remote-desktop/api/status- Get current session statusPOST /remote-desktop/api/destroy- Destroy current sessionPOST /remote-desktop/api/extend- Extend session timer
GET /remote-desktop/admin- Admin dashboardGET /remote-desktop/admin/api/containers- List all active sessionsPOST /remote-desktop/admin/api/kill- Force kill any user sessionPOST /remote-desktop/admin/api/extend- Extend any user sessionGET /remote-desktop/admin/api/events/stream- SSE event streamGET /remote-desktop/admin/api/events/recent- Recent event log
GET /remote-desktop/api/auth-check- Nginx subrequest for VNC auth
Hosts are marked unhealthy when:
- Startup connectivity test fails (SSH, docker daemon, or image missing)
- Container creation fails
- SSH connection fails
Startup tests check each host for:
- SSH connectivity
- Docker daemon accessibility
- Required image presence (does not pull automatically)
Unhealthy hosts are excluded from scheduling but pools remain active. Currently no automatic recovery mechanism - requires manual intervention via admin API.
Periodic cleanup (300s): Scans session_timers for expired entries and auto-destroys containers.
Signal handlers (SIGTERM, SIGINT): Spawns cleanup_all_containers via gevent with 2s grace period before exit.
atexit handler: Calls cleanup_all_containers to stop all containers on process shutdown.
All cleanup operations iterate over snapshot of active_containers to avoid dict mutation during iteration.
All shared state protected by component-level locks:
- ConnectionPool.lock - connection count and creation
- ContainerManager.lock - active_containers, session_timers, creation_status
- HostOrchestrator.global_lock - host_container_counts, host_health
- EventLogger.lock - events deque, listeners list
Lock acquisition order is deterministic (never nested) to prevent deadlock.
To add custom session lifecycle hooks:
- Add listener to EventLogger for session_created, session_destroyed, session_expired events
- Subscribe to admin events stream for real-time monitoring
- Extend ContainerManager timer logic for custom expiration policies