feat: Implement k8s-job runtime for ephemeral workloads #5

@gouravjshah

Summary

Implement a Kubernetes Job runtime that spawns MCP servers as ephemeral K8s Jobs. This suits batch operations and one-shot tasks.

Parent Epic

Part of #1 - Production Kubernetes & Container Support

Use Cases

  • Running data migrations via MCP
  • Batch processing tasks
  • One-time analysis jobs
  • Resource-intensive operations with defined completion

RuntimeConfig Addition

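use std::collections::HashMap;
// Assumption: the Kubernetes types below come from the k8s-openapi crate.
use k8s_openapi::api::core::v1::{EnvFromSource, ResourceRequirements, Volume};

pub enum RuntimeConfig {
    // ... existing variants
    /// Runs the MCP server as an ephemeral Kubernetes Job.
    KubernetesJob {
        /// Namespace the Job is created in.
        namespace: String,
        /// Container image that runs the MCP server.
        image: String,
        /// ServiceAccount the Job's pod runs as.
        service_account: Option<String>,
        /// CPU and memory requests/limits for the container.
        resources: Option<ResourceRequirements>,
        /// Literal environment variables.
        env: HashMap<String, String>,
        /// Environment sourced from ConfigMaps or Secrets.
        env_from: Vec<EnvFromSource>,
        /// Volumes made available to the pod.
        volumes: Vec<Volume>,
        /// Seconds to keep a finished Job before it is deleted.
        ttl_seconds_after_finished: Option<i32>,
    },
}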

Catalog Example

servers:
  - id: data-migration
    runtime:
      type: k8s-job
      namespace: mcp-workloads
      image: myregistry/migration-mcp:v1.2
      service_account: mcp-job-runner
      resources:
        requests:
          memory: "256Mi"
          cpu: "100m"
        limits:
          memory: "1Gi"
          cpu: "1000m"
      env:
        DATABASE_URL: "${DATABASE_URL}"
      ttl_seconds_after_finished: 3600
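
The catalog example uses a `${DATABASE_URL}` placeholder, but the issue does not say how placeholders are resolved. As a minimal sketch, assuming the gateway substitutes values from a variable map before building the Job spec, a hypothetical `expand_placeholders` helper could look like this. The function name and its error behavior are illustrative assumptions, not part of the proposal.

```rust
use std::collections::HashMap;

/// Expands `${NAME}` placeholders in `input` using `vars`.
/// Returns `Err(name)` when a placeholder has no value in `vars`.
/// Hypothetical helper: the issue does not specify expansion semantics.
pub fn expand_placeholders(
    input: &str,
    vars: &HashMap<String, String>,
) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy everything before the placeholder unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                let value = vars.get(name).ok_or_else(|| name.to_string())?;
                out.push_str(value);
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep the text verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

Failing on a missing variable, rather than silently substituting an empty string, is one possible design choice: it surfaces a misconfigured catalog before any Job is created.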

Implementation Details

Job Creation Flow

  1. Client calls a tool on the server
  2. Gateway creates a K8s Job with a generated name
  3. Gateway waits for the Job's pod to be ready
  4. Gateway streams stdio via pod exec/logs
  5. Job completes or times out
  6. Finished Job is cleaned up based on ttl_seconds_after_finished
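
Step 2 mentions a generated Job name without giving a scheme. Kubernetes requires a Job name to be no longer than 63 characters and recommends DNS-label rules (lowercase alphanumerics and `-`). The sketch below shows one way to derive such a name. The `mcp-` prefix, the `job_name` function, and the caller-supplied short suffix are assumptions made for illustration.

```rust
/// Maximum length of a DNS label, which Job names should respect.
const MAX_NAME_LEN: usize = 63;

/// Builds a Job name such as `mcp-data-migration-1a2b3c` from a catalog
/// server id and a short, caller-supplied unique suffix.
/// The `mcp-` prefix and this naming scheme are assumptions, not taken
/// from the issue. The suffix is expected to be short (a few characters).
pub fn job_name(server_id: &str, suffix: &str) -> String {
    // Lowercase everything and replace characters outside [a-z0-9] with '-'.
    let sanitize = |s: &str| -> String {
        s.chars()
            .map(|c| c.to_ascii_lowercase())
            .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
            .collect()
    };
    let suffix = sanitize(suffix);
    let suffix = suffix.trim_matches('-');
    // Reserve room for "mcp-", the separator, and the suffix.
    let budget = MAX_NAME_LEN.saturating_sub("mcp-".len() + 1 + suffix.len());
    let id = sanitize(server_id);
    let id: String = id.trim_matches('-').chars().take(budget).collect();
    let id = id.trim_end_matches('-');
    // A trailing '-' (empty suffix) would be invalid, so trim it.
    format!("mcp-{}-{}", id, suffix)
        .trim_end_matches('-')
        .to_string()
}
```

For the catalog entry above, `job_name("data-migration", "1a2b3c")` yields `mcp-data-migration-1a2b3c`. Truncating the id rather than the suffix keeps the unique part of the name intact.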

Communication Pattern

Gateway <--exec/logs--> Job Pod <--stdio--> MCP Server
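
MCP's stdio transport exchanges newline-delimited JSON-RPC messages, while a pod exec or log stream delivers arbitrary byte chunks. The gateway therefore has to reassemble chunks into whole messages before forwarding them. The following is a minimal sketch of that framing step. `LineFramer` is a hypothetical name, and the real stream handling (which would likely go through a Kubernetes client library) is left out.

```rust
/// Accumulates raw bytes from a stream and yields complete lines.
/// Hypothetical helper for reassembling newline-delimited messages.
#[derive(Default)]
pub struct LineFramer {
    buf: Vec<u8>,
}

impl LineFramer {
    /// Appends a chunk and returns every complete, non-empty line it
    /// finishes, without the trailing newline. Partial data stays buffered
    /// until a later chunk completes it.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            // Remove the line (including '\n') from the front of the buffer.
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..pos]);
            let text = text.trim_end_matches('\r');
            if !text.is_empty() {
                lines.push(text.to_string());
            }
        }
        lines
    }
}
```

Each returned string would then be parsed as one JSON-RPC message. Keeping the framer free of I/O makes it easy to test without a cluster.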

Features Required

  • Job creation with configurable spec
  • Pod readiness waiting
  • Stdio streaming via exec
  • Log retrieval
  • Automatic cleanup (TTL)
  • Error handling (ImagePullBackOff, OOMKilled, etc.)
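
For the error-handling item, Kubernetes reports failures as reason strings: container waiting or terminated reasons such as `ImagePullBackOff` and `OOMKilled`, and Job condition reasons such as `DeadlineExceeded`. One way to report these consistently is to map them onto a small error type. The `JobFailure` enum and `classify_reason` function below are illustrative assumptions, not an API defined by this issue.

```rust
/// Coarse failure categories the gateway could report to clients.
/// Hypothetical type: the issue only lists example reasons.
#[derive(Debug, PartialEq, Eq)]
pub enum JobFailure {
    /// The container image could not be pulled.
    ImagePull,
    /// The container exceeded its memory limit.
    OutOfMemory,
    /// The container keeps crashing and is being restarted with backoff.
    CrashLoop,
    /// The container could not be configured (e.g. missing Secret).
    BadConfig,
    /// The Job exceeded its deadline or retry budget.
    Timeout,
    /// Any reason not recognized above, kept verbatim.
    Other(String),
}

/// Maps a Kubernetes reason string to a `JobFailure`.
pub fn classify_reason(reason: &str) -> JobFailure {
    match reason {
        // Container waiting reasons.
        "ImagePullBackOff" | "ErrImagePull" | "InvalidImageName" => JobFailure::ImagePull,
        "CrashLoopBackOff" => JobFailure::CrashLoop,
        "CreateContainerConfigError" => JobFailure::BadConfig,
        // Container terminated reason.
        "OOMKilled" => JobFailure::OutOfMemory,
        // Job condition reasons.
        "DeadlineExceeded" | "BackoffLimitExceeded" => JobFailure::Timeout,
        other => JobFailure::Other(other.to_string()),
    }
}
```

Keeping the unrecognized reason in `Other` means new or unexpected Kubernetes reasons are still surfaced to the client rather than dropped.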

Acceptance Criteria

  • A server can be defined with type: k8s-job in the catalog
  • The Job is created in the specified namespace
  • The ServiceAccount is properly configured
  • Resource limits are applied
  • Stdio communication works
  • Jobs are cleaned up after the TTL expires
  • Errors are properly reported

Dependencies

References

Labels: enhancement, kubernetes, runtime, v0.3
