Hillyon-Labs/opsctrl_daemon

OpsCtrl Daemon

License: MIT Docker Helm

Kubernetes pod monitoring daemon with automated failure detection, diagnosis, and Slack alerting.

Overview

OpsCtrl Daemon watches your Kubernetes cluster for pod failures and automatically diagnoses root causes using LLM-powered analysis. When issues occur, it sends detailed alerts with remediation suggestions directly to Slack.

Key Features:

  • Real-time detection of CrashLoopBackOff, OOMKill, ImagePull failures, and more
  • AI-powered root cause analysis with actionable fix suggestions
  • Slack integration for instant incident notifications
  • Read-only operation - never executes into containers or accesses secrets
  • Lightweight single-replica deployment

Installation

Prerequisites

  • Kubernetes 1.21+
  • Helm 3.x
  • An OpsCtrl account (sign up free)

Quick Start (Helm)

# 1. Add the OpsCtrl Helm repository
helm repo add opsctrl https://charts.opsctrl.dev
helm repo update

# 2. Create the namespace
kubectl create namespace opsctrl


# 3. Install OpsCtrl Daemon
helm install opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  --set clusterRegistration.clusterName="my-cluster" \
  --set clusterRegistration.userEmail="you@example.com" \
  --set monitoring.watchNamespaces="default"

Verify Installation

# Check the pod is running
kubectl get pods -n opsctrl

# View logs to confirm monitoring started
kubectl logs -n opsctrl -l app.kubernetes.io/name=opsctrl-daemon -f

You should see:

🚀 Starting opsctrl-daemon...
🔗 Cluster registration is required before starting monitoring...
✅ Cluster registered successfully: <cluster-id>
✅ Monitoring started for 1 namespaces

Installation Options

Monitor multiple namespaces

helm install opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  --set clusterRegistration.clusterName="my-cluster" \
  --set clusterRegistration.userEmail="you@example.com" \
  --set monitoring.watchNamespaces="default\,staging\,production" \
  --set secrets.existingSecret="opsctrl-secrets"

Note: Escape commas with \, in --set or use a values file instead.
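
The escaping matters because Helm treats an unescaped comma in --set as a separator between key=value pairs. If you generate the install command from a script, a small helper can build the escaped value. This is a minimal sketch; `toHelmSetValue` is a hypothetical name and is not part of the daemon or the chart.

```typescript
// Hypothetical helper (not part of opsctrl-daemon): joins namespace names
// into a single value that is safe to pass to `helm --set`, where a bare
// comma would otherwise start a new key=value pair.
function toHelmSetValue(namespaces: string[]): string {
  return namespaces
    .map((ns) => ns.trim())
    .filter((ns) => ns.length > 0)
    .join("\\,"); // a literal backslash followed by a comma
}

const value = toHelmSetValue(["default", "staging", "production"]);
console.log(`--set monitoring.watchNamespaces="${value}"`);
```

With a values file no escaping is needed, which is usually the simpler choice for more than one or two namespaces.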

Using a values file

Create my-values.yaml:

clusterRegistration:
  clusterName: "production-cluster"
  userEmail: "platform-team@company.com"

monitoring:
  watchNamespaces: "default,staging,production"
  excludeNamespaces: "kube-system,kube-public"
  minRestartThreshold: 3

secrets:
  existingSecret: "opsctrl-secrets"

Install with:

helm install opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  -f my-values.yaml

Upgrade an existing installation

helm repo update
helm upgrade opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  --reuse-values

Uninstall

helm uninstall opsctrl-daemon --namespace opsctrl
kubectl delete namespace opsctrl

kubectl (Alternative)

kubectl apply -f https://raw.githubusercontent.com/Hillyon-Labs/opsctrl_daemon/main/k8s-deployment.yaml

See values.yaml for all configuration options.

Configuration

Environment Variables

| Variable | Description | Required | Default |
| --- | --- | --- | --- |
| WATCH_NAMESPACES | Comma-separated namespaces to monitor | Yes | - |
| OPSCTRL_BACKEND_URL | Backend API URL | Yes | - |
| CLUSTER_NAME | Unique cluster identifier | No | - |
| WEBHOOK_URL | Slack webhook URL for alerts | No | - |
| MIN_RESTART_THRESHOLD | Container restarts before alerting | No | 3 |
| LOG_LEVEL | Logging verbosity (error, warn, info, debug) | No | info |
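
To make the required/optional split and the defaults concrete, here is a minimal sketch of how these variables could be read and validated at startup. The `DaemonConfig` shape and the `loadConfig` function are assumptions made for illustration; the daemon's actual loader may differ.

```typescript
type LogLevel = "error" | "warn" | "info" | "debug";

// Hypothetical config shape mirroring the variables documented above.
interface DaemonConfig {
  watchNamespaces: string[];
  backendUrl: string;
  clusterName?: string;
  webhookUrl?: string;
  minRestartThreshold: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly LogLevel[] = ["error", "warn", "info", "debug"];

// Takes the environment as a plain object so it can be tested without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): DaemonConfig {
  const optional = (name: string): string | undefined =>
    env[name]?.trim() || undefined;

  const required = (name: string): string => {
    const value = optional(name);
    if (!value) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  const watchNamespaces = required("WATCH_NAMESPACES")
    .split(",")
    .map((ns) => ns.trim())
    .filter((ns) => ns.length > 0);
  if (watchNamespaces.length === 0) {
    throw new Error("WATCH_NAMESPACES must list at least one namespace");
  }

  // Documented default: 3
  const minRestartThreshold = Number(optional("MIN_RESTART_THRESHOLD") ?? "3");
  if (!Number.isInteger(minRestartThreshold) || minRestartThreshold < 0) {
    throw new Error("MIN_RESTART_THRESHOLD must be a non-negative integer");
  }

  // Documented default: info
  const logLevel = (optional("LOG_LEVEL") ?? "info") as LogLevel;
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
  }

  return {
    watchNamespaces,
    backendUrl: required("OPSCTRL_BACKEND_URL"),
    clusterName: optional("CLUSTER_NAME"),
    webhookUrl: optional("WEBHOOK_URL"),
    minRestartThreshold,
    logLevel,
  };
}

// Example with placeholder values; in the daemon this would be the real environment.
console.log(
  loadConfig({
    WATCH_NAMESPACES: "default, staging",
    OPSCTRL_BACKEND_URL: "https://backend.example.invalid",
  })
);
```

Failing fast on a missing required variable keeps misconfiguration visible in the pod logs rather than surfacing later as silent non-monitoring.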

Monitored Failure Types

| Failure Type | Description |
| --- | --- |
| CrashLoopBackOff | Container repeatedly crashing |
| OOMKilled | Out of memory termination |
| ImagePullBackOff | Failed to pull container image |
| Pending | Pod stuck in pending state |
| Failed | Pod entered failed phase |
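
As an illustration of how these failure types map onto pod status fields, the sketch below classifies a pod from its `status.phase` and `containerStatuses` (both standard Kubernetes Pod status fields). The `classifyPod` function, the simplified types, and the way the restart threshold is applied are assumptions, not the daemon's actual detection logic. Detecting a pod that is "stuck" in Pending would also need a time check, which is omitted here.

```typescript
type FailureType =
  | "CrashLoopBackOff"
  | "OOMKilled"
  | "ImagePullBackOff"
  | "Pending"
  | "Failed";

// Simplified subset of the Kubernetes ContainerStatus and PodStatus fields.
interface ContainerStatusLike {
  restartCount: number;
  state?: { waiting?: { reason?: string }; terminated?: { reason?: string } };
  lastState?: { terminated?: { reason?: string } };
}

interface PodLike {
  status?: { phase?: string; containerStatuses?: ContainerStatusLike[] };
}

// Hypothetical classifier: returns the first matching failure type, or null
// when the pod looks healthy. Container-level reasons are checked before the
// pod phase because they are more specific (an image pull failure also
// leaves the pod in the Pending phase).
function classifyPod(pod: PodLike, minRestartThreshold = 3): FailureType | null {
  for (const cs of pod.status?.containerStatuses ?? []) {
    const waiting = cs.state?.waiting?.reason;
    const terminated =
      cs.state?.terminated?.reason ?? cs.lastState?.terminated?.reason;

    if (terminated === "OOMKilled") return "OOMKilled";
    if (waiting === "ImagePullBackOff" || waiting === "ErrImagePull") {
      return "ImagePullBackOff";
    }
    // Assumption: the restart threshold gates crash-loop alerts only.
    if (waiting === "CrashLoopBackOff" && cs.restartCount >= minRestartThreshold) {
      return "CrashLoopBackOff";
    }
  }

  if (pod.status?.phase === "Failed") return "Failed";
  if (pod.status?.phase === "Pending") return "Pending";
  return null;
}

const crashing: PodLike = {
  status: {
    phase: "Running",
    containerStatuses: [
      { restartCount: 5, state: { waiting: { reason: "CrashLoopBackOff" } } },
    ],
  },
};
console.log(classifyPod(crashing));
```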

Usage

Viewing Logs

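# Use the namespace the daemon was installed into (opsctrl in the Quick Start)
kubectl logs -n opsctrl -l app.kubernetes.io/name=opsctrl-daemon -f

Health Check

# Forward the daemon's port locally, then query the health endpoint
kubectl port-forward -n opsctrl svc/opsctrl-daemon 3000:3000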
curl http://localhost:3000/health

Example Slack Alert

When a failure is detected, you'll receive alerts like:

🛑 CrashLoopBackOff in orders-api (production)

Root Cause: Readiness probe failing on /healthz - connection timeout after 1s

Suggested Fix:
kubectl patch deployment orders-api -n production \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/timeoutSeconds","value":5}]'
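
For readers wiring up their own notifications, the sketch below shows one way an alert like this could be turned into a Slack incoming-webhook payload, which accepts a JSON body with a `text` field. The `Diagnosis` type and `buildSlackMessage` function are hypothetical; this README does not specify the message format OpsCtrl actually sends.

```typescript
// Hypothetical diagnosis record; field names are illustrative only.
interface Diagnosis {
  failureType: string;
  workload: string;
  namespace: string;
  rootCause: string;
  suggestedFix?: string;
}

// Builds a payload for a Slack incoming webhook. Slack renders *text* as
// bold and text wrapped in triple backticks as a code block.
function buildSlackMessage(d: Diagnosis): { text: string } {
  const fence = "`".repeat(3);
  const lines = [
    `🛑 *${d.failureType}* in ${d.workload} (${d.namespace})`,
    "",
    `*Root Cause:* ${d.rootCause}`,
  ];
  if (d.suggestedFix) {
    lines.push("", "*Suggested Fix:*", fence, d.suggestedFix, fence);
  }
  return { text: lines.join("\n") };
}

const payload = buildSlackMessage({
  failureType: "CrashLoopBackOff",
  workload: "orders-api",
  namespace: "production",
  rootCause: "Readiness probe failing on /healthz - connection timeout after 1s",
  suggestedFix: "kubectl patch deployment orders-api -n production ...",
});
console.log(JSON.stringify(payload, null, 2));
```

The payload would then be sent as an HTTP POST with a `Content-Type: application/json` header to the URL configured in WEBHOOK_URL.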

Architecture

┌────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │
│  │   Pod A     │    │   Pod B     │    │   Pod C     │     │
│  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘     │
│         │                  │                  │            │
│         └──────────────────┼──────────────────┘            │
│                            ▼                               │
│                   ┌─────────────────┐                      │
│                   │ OpsCtrl Daemon  │ ◄── Watch API        │
│                   └────────┬────────┘                      │
└────────────────────────────┼───────────────────────────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │ OpsCtrl Backend │ ◄── LLM Analysis
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │     Slack       │
                    └─────────────────┘

Development

Prerequisites

  • Node.js 20+
  • npm
  • Access to a Kubernetes cluster (local or remote)

Setup

# Clone the repository
git clone https://github.com/Hillyon-Labs/opsctrl_daemon.git
cd opsctrl_daemon

# Install dependencies
npm install

# Copy environment template
cp .env.example .env
# Edit .env with your configuration

# Run in development mode
npm run dev

Testing

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

Building

# Build TypeScript
npm run build

# Build Docker image
docker build -t opsctrl/daemon:local .

RBAC & Security

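The daemon is read-only with respect to your workloads and requires minimal permissions. Its only write access is to events and to leases (used for leader election):

| Resource | Verbs | Purpose |
| --- | --- | --- |
| pods | get, list, watch | Monitor pod status |
| namespaces | get, list, watch | Namespace filtering |
| events | get, list, watch, create | Failure detection |
| leases | get, list, watch, create, update, patch | Leader election |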

The daemon does not:

  • Execute into containers
  • Access Secrets or ConfigMaps
  • Modify any workloads
  • Send container logs externally

Contributing

Contributions are welcome! Please read our Contributing Guide before submitting a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with care by Hillyon Labs
