Description
As an engineer who previously ran opsctrl diagnose,
I want to retrieve and optionally apply the recommended fix for a pod issue,
so that I can resolve issues faster using previously validated suggestions.
✅ Acceptance Criteria
🧩 CLI Behavior
- Command Format:
  - User runs: opsctrl fix <pod> --namespace <ns>
- Fetch Prior Diagnosis:
  - CLI sends a request to the backend with: pod name, namespace, org_id, auth token
  - Backend returns the last diagnosis for this pod (within a TTL window, e.g. 15 minutes).
- Display Mode (default):
  - CLI prints:

        💡 Suggested Fix (from last diagnosis):
        - "Update the image tag to a valid one or configure imagePullSecrets."

        🛠️ Suggested Patch:
        kubectl set image deployment/api api=myregistry.io/api:1.2.4

        ⚠️ Apply this fix with:
        opsctrl fix <pod> --apply
- Apply Mode (opt-in):
  - If the --apply flag is passed:
    - CLI prompts for confirmation:

          You are about to apply the suggested fix to pod 'api-123' in namespace 'tools'.
          This may affect live workloads. Proceed? (y/N)

    - On confirmation, CLI runs the fix using:
      - kubectl, if it’s a safe 1-liner (e.g., image update)
      - Or creates a patch YAML and applies it via kubectl apply -f -
    - Output includes success/failure and the actual command run.
- RBAC & Gating:
  - If the user is not authorized (based on org policy), CLI exits with:

        🚫 Fix application is disabled for your role. Contact your platform admin.
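
The CLI behavior above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the opsctrl implementation: the `Diagnosis` fields `can_apply_fix` and `runnable`, the `SAFE_PREFIXES` allowlist, and all helper names are hypothetical, and the spec does not define what counts as a "safe 1-liner". The sketch only decides what would be run; it never executes anything.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass
from typing import Callable

# Hypothetical allowlist: only `kubectl set image ...` is treated as a safe 1-liner.
SAFE_PREFIXES = (("kubectl", "set", "image"),)
DENIED_MSG = "🚫 Fix application is disabled for your role. Contact your platform admin."


@dataclass
class Diagnosis:
    suggested_fix: str
    fix_command: str
    can_apply_fix: bool  # assumed to mirror the backend's canApplyFix
    runnable: bool       # assumed to mirror the fix metadata's runnable flag


def render_suggestion(d: Diagnosis) -> str:
    """Build the default (display-only) output."""
    return "\n".join([
        "💡 Suggested Fix (from last diagnosis):",
        f'- "{d.suggested_fix}"',
        "",
        "🛠️ Suggested Patch:",
        d.fix_command,
        "",
        "⚠️ Apply this fix with:",
        "opsctrl fix <pod> --apply",
    ])


def is_safe_one_liner(command: str) -> bool:
    """True if the command has no shell metacharacters and an allowlisted prefix."""
    if any(ch in command for ch in ";|&$`><\n"):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return any(tuple(argv[:len(p)]) == p for p in SAFE_PREFIXES)


def plan_apply(d: Diagnosis, pod: str, namespace: str,
               ask: Callable[[str], str] = input) -> tuple[str, list[str]]:
    """Return (status, argv). The caller prints DENIED_MSG or runs argv."""
    if not d.can_apply_fix:
        return "denied", []
    if not d.runnable:
        return "not-runnable", []
    prompt = (f"You are about to apply the suggested fix to pod '{pod}' "
              f"in namespace '{namespace}'. This may affect live workloads. "
              "Proceed? (y/N) ")
    if ask(prompt).strip().lower() not in ("y", "yes"):
        return "aborted", []  # default answer is No
    if is_safe_one_liner(d.fix_command):
        return "run", shlex.split(d.fix_command)
    # Fallback from the spec: a generated patch YAML is piped to kubectl on stdin.
    return "patch", ["kubectl", "apply", "-f", "-"]
```

Returning an argument list instead of a shell string means the caller can run it without a shell, which keeps the "actual command run" in the output identical to what was executed.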
🔐 Backend Responsibilities
- Last Diagnosis Lookup:
  - API: GET /diagnosis/<pod>?namespace=<ns>
  - Lookup by pod + org + timestamp
  - Must return:

        {
          "diagnosis": "...",
          "suggested_fix": "...",
          "fix_command": "...",
          "confidence_score": 0.92,
          "timestamp": "2025-05-07T13:04Z"
        }
- Role Validation:
  - Return RBAC context (e.g., canApplyFix: true/false)
  - Fix metadata includes runnable: true/false
- Audit Log (if applied):
  - If user runs with --apply, backend receives webhook or event to log:
    - Who ran the fix
    - Pod, time, command, outcome
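
The backend responsibilities above could look something like the following Python sketch. It assumes an in-memory list of records instead of a real datastore, and the names `StoredDiagnosis`, `last_diagnosis`, `to_response`, and `audit_event` are hypothetical. Placing `canApplyFix` and `runnable` at the top level of the response is also an assumption; the spec only says they are returned.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

TTL = timedelta(minutes=15)  # the example window given in the spec
TS_FORMAT = "%Y-%m-%dT%H:%MZ"  # matches the sample "2025-05-07T13:04Z"


@dataclass
class StoredDiagnosis:
    org_id: str
    namespace: str
    pod: str
    diagnosis: str
    suggested_fix: str
    fix_command: str
    confidence_score: float
    timestamp: datetime  # timezone-aware, UTC


def last_diagnosis(records, org_id, namespace, pod, now, ttl=TTL):
    """Newest record for this org/namespace/pod within the TTL window, else None."""
    fresh = [
        r for r in records
        if (r.org_id, r.namespace, r.pod) == (org_id, namespace, pod)
        and timedelta(0) <= now - r.timestamp <= ttl
    ]
    return max(fresh, key=lambda r: r.timestamp, default=None)


def to_response(record, can_apply_fix, runnable):
    """Response body: the fields required by the spec plus the RBAC context."""
    return {
        "diagnosis": record.diagnosis,
        "suggested_fix": record.suggested_fix,
        "fix_command": record.fix_command,
        "confidence_score": record.confidence_score,
        "timestamp": record.timestamp.strftime(TS_FORMAT),
        "canApplyFix": can_apply_fix,  # assumed placement
        "runnable": runnable,          # assumed placement
    }


def audit_event(user, record, command, outcome, now):
    """Audit entry: who ran the fix, plus pod, time, command, and outcome."""
    return {
        "user": user,
        "pod": record.pod,
        "namespace": record.namespace,
        "time": now.strftime(TS_FORMAT),
        "command": command,
        "outcome": outcome,
    }
```

Passing `now` in explicitly, instead of reading the clock inside the functions, makes the expired-fix edge case straightforward to cover in tests.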
🧩 Suggested Tasks (GitHub Issues)
CLI
- [CLI] fix Command Scaffolding
- [CLI] Fetch + Display Suggested Fix
- [CLI] Confirmation Prompt + --apply Execution
- [CLI] RBAC Gate Handling + Error Messaging
- [CLI] Audit Hook (optional)
Backend
- [Backend] Diagnosis Lookup Endpoint
- [Backend] RBAC Enforcement Logic
- [Backend] Fix Metadata Model (Patch/Command)
- [Backend] Audit Logging (Fixes Applied)
Integration
- [Tests] End-to-End Fix Flow with Real Diagnosis
- [Tests] RBAC + Edge Cases (no diagnosis, expired fix)