
Title : opsctrl fix — View or apply the most recent LLM-recommended patch for a pod #5

@orchide

As an

Engineer who previously ran opsctrl diagnose

I want to

retrieve and optionally apply the recommended fix for a pod issue

So that

I can resolve issues faster using previously validated suggestions


✅ Acceptance Criteria

🧩 CLI Behavior

  1. Command Format:

    • User runs: opsctrl fix <pod> --namespace <ns>
  2. Fetch Prior Diagnosis:

    • CLI sends a request to the backend with:

      • pod name, namespace, org_id, auth token
    • Backend returns the most recent diagnosis for this pod, provided it is still within a TTL window (e.g., 15 minutes).

  3. Display Mode (default):

    • CLI prints:

      💡 Suggested Fix (from last diagnosis):
      - "Update the image tag to a valid one or configure imagePullSecrets."
      
      🛠️ Suggested Patch:
      kubectl set image deployment/api api=myregistry.io/api:1.2.4
      
      ⚠️ Apply this fix with: opsctrl fix <pod> --namespace <ns> --apply
  4. Apply Mode (opt-in):

    • If --apply flag is passed:

      • CLI prompts for confirmation:

        You are about to apply the suggested fix to pod 'api-123' in namespace 'tools'.
        This may affect live workloads.
        
        Proceed? (y/N)
        
      • On confirmation, CLI runs the fix using:

        • kubectl directly, if the fix is a safe one-liner (e.g., an image update)
        • Otherwise, a generated patch YAML applied via kubectl apply -f -
      • Output includes success/failure and the actual command run.

  5. RBAC & Gating:

    • If the user is not authorized (based on org policy):

      • CLI exits with:

        🚫 Fix application is disabled for your role. Contact your platform admin.
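
The display, apply, and RBAC rules above can be read as one decision function. The following is a minimal Python sketch, not the opsctrl implementation. The names `Diagnosis` and `decide_action` are hypothetical, the 15-minute TTL is only the example value from this issue, and the client-side TTL check, the order of the checks, and accepting "yes" as well as "y" are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed TTL; the issue gives 15 minutes only as an example value.
DIAGNOSIS_TTL = timedelta(minutes=15)


@dataclass
class Diagnosis:
    """Hypothetical client-side view of the backend response."""
    suggested_fix: str
    fix_command: str
    timestamp: datetime   # when the diagnosis was produced (UTC)
    can_apply_fix: bool   # RBAC context (canApplyFix)
    runnable: bool        # fix metadata (runnable)


def decide_action(diag, apply, answer, now):
    """Return 'expired', 'display', 'denied', 'not_runnable', 'aborted', or 'run'."""
    if now - diag.timestamp > DIAGNOSIS_TTL:
        return "expired"        # diagnosis is outside the TTL window
    if not apply:
        return "display"        # default mode: print the fix, change nothing
    if not diag.can_apply_fix:
        return "denied"         # RBAC gate: role may not apply fixes
    if not diag.runnable:
        return "not_runnable"   # fix cannot be executed automatically
    # The prompt is "Proceed? (y/N)", so anything but an explicit yes aborts.
    if answer.strip().lower() not in ("y", "yes"):
        return "aborted"
    return "run"
```

Keeping the decision separate from the code that prints output or shells out to kubectl would make the RBAC and expired-fix edge cases easy to cover in unit tests.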
        

🔐 Backend Responsibilities

  1. Last Diagnosis Lookup:

    • API: GET /diagnosis/<pod>?namespace=<ns>

    • Lookup by pod + org + timestamp

    • Must return:

      {
        "diagnosis": "...",
        "suggested_fix": "...",
        "fix_command": "...",
        "confidence_score": 0.92,
        "timestamp": "2025-05-07T13:04Z"
      }
  2. Role Validation:

    • Return RBAC context (e.g., canApplyFix: true/false)
    • Fix metadata includes runnable: true/false
  3. Audit Log (if applied):

    • If the user runs with --apply, the backend receives a webhook or event and logs:

      • Who ran the fix
      • Pod, time, command, outcome
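
The lookup and audit responsibilities above could look roughly like this. It is a Python sketch over an in-memory list, under stated assumptions: the record keys `org_id`, `namespace`, and `pod`, the function names, and the audit field names are all hypothetical, the 15-minute TTL is the example value from this issue, and timestamps are assumed to follow the minute-precision UTC format shown in the sample payload.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(minutes=15)  # assumed; the issue gives 15 minutes as an example
TS_FORMAT = "%Y-%m-%dT%H:%MZ"  # matches the sample "2025-05-07T13:04Z"


def parse_ts(value):
    """Parse a timestamp shaped like the example payload into an aware UTC datetime."""
    return datetime.strptime(value, TS_FORMAT).replace(tzinfo=timezone.utc)


def latest_diagnosis(records, org_id, namespace, pod, now):
    """Return the newest matching record inside the TTL window, or None."""
    matches = [
        r for r in records
        if (r["org_id"], r["namespace"], r["pod"]) == (org_id, namespace, pod)
        and now - parse_ts(r["timestamp"]) <= TTL
    ]
    return max(matches, key=lambda r: parse_ts(r["timestamp"]), default=None)


def audit_event(user, pod, namespace, command, outcome, now):
    """Build the audit entry: who ran the fix, plus pod, time, command, outcome."""
    return {
        "user": user,
        "pod": pod,
        "namespace": namespace,
        "time": now.strftime(TS_FORMAT),  # `now` is assumed to be UTC
        "command": command,
        "outcome": outcome,
    }
```

Returning `None` when nothing falls inside the TTL window gives the CLI a single signal for both the "no diagnosis" and "expired fix" edge cases listed under Integration.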

🧩 Suggested Tasks (GitHub Issues)

CLI

  • [CLI] fix Command Scaffolding
  • [CLI] Fetch + Display Suggested Fix
  • [CLI] Confirmation Prompt + --apply Execution
  • [CLI] RBAC Gate Handling + Error Messaging
  • [CLI] Audit Hook (optional)

Backend

  • [Backend] Diagnosis Lookup Endpoint
  • [Backend] RBAC Enforcement Logic
  • [Backend] Fix Metadata Model (Patch/Command)
  • [Backend] Audit Logging (Fixes Applied)

Integration

  • [Tests] End-to-End Fix Flow with Real Diagnosis
  • [Tests] RBAC + Edge Cases (no diagnosis, expired fix)

