OpsCtrl Daemon

Kubernetes pod monitoring daemon with automated failure detection, diagnosis, and Slack alerting.

Overview

OpsCtrl Daemon watches your Kubernetes cluster for pod failures and automatically diagnoses root causes using LLM-powered analysis. When issues occur, it sends detailed alerts with remediation suggestions directly to Slack.

Key Features:

Real-time detection of CrashLoopBackOff, OOMKilled, ImagePullBackOff, and other pod failures
AI-powered root cause analysis with actionable fix suggestions
Slack integration for instant incident notifications
Read-only operation - never execs into containers or reads Secrets
Lightweight single-replica deployment

Installation

Prerequisites

Kubernetes 1.21+
Helm 3.x
An OpsCtrl account (sign up free)

Quick Start (Helm)

# 1. Add the OpsCtrl Helm repository
helm repo add opsctrl https://charts.opsctrl.dev
helm repo update

# 2. Create the namespace
kubectl create namespace opsctrl


# 3. Install OpsCtrl Daemon
helm install opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  --set clusterRegistration.clusterName="my-cluster" \
  --set clusterRegistration.userEmail="you@example.com" \
  --set monitoring.watchNamespaces="default"

Verify Installation

# Check the pod is running
kubectl get pods -n opsctrl

# View logs to confirm monitoring started
kubectl logs -n opsctrl -l app.kubernetes.io/name=opsctrl-daemon -f

You should see:

🚀 Starting opsctrl-daemon...
🔗 Cluster registration is required before starting monitoring...
✅ Cluster registered successfully: <cluster-id>
✅ Monitoring started for 1 namespaces

Installation Options

Monitor multiple namespaces

helm install opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  --set clusterRegistration.clusterName="my-cluster" \
  --set clusterRegistration.userEmail="you@example.com" \
  --set monitoring.watchNamespaces="default\,staging\,production" \
  --set secrets.existingSecret="opsctrl-secrets"

Note: Escape commas with \, in --set or use a values file instead.

Using a values file

Create my-values.yaml:

clusterRegistration:
  clusterName: "production-cluster"
  userEmail: "platform-team@company.com"

monitoring:
  watchNamespaces: "default,staging,production"
  excludeNamespaces: "kube-system,kube-public"
  minRestartThreshold: 3

secrets:
  existingSecret: "opsctrl-secrets"

Install with:

helm install opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  -f my-values.yaml
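
As a rough illustration of how the comma-separated `watchNamespaces` and `excludeNamespaces` values could be interpreted, here is a minimal TypeScript sketch. The function names `parseList` and `shouldWatch` are hypothetical, and the rule that an exclusion wins over an inclusion is an assumption, not documented daemon behavior.

```typescript
// Hypothetical helpers; not part of the daemon's published API.

// Split a comma-separated value such as "default,staging,production"
// into trimmed, non-empty namespace names.
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Assumption: an excluded namespace is never watched, even if it also
// appears in the watch list.
function shouldWatch(namespace: string, watch: string[], exclude: string[]): boolean {
  if (exclude.includes(namespace)) {
    return false;
  }
  return watch.includes(namespace);
}
```

With the values file above, `shouldWatch("staging", ...)` would be true, while `kube-system` and any unlisted namespace would be skipped.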

Upgrade an existing installation

helm repo update
helm upgrade opsctrl-daemon opsctrl/opsctrl-daemon \
  --namespace opsctrl \
  --reuse-values

Uninstall

helm uninstall opsctrl-daemon --namespace opsctrl
kubectl delete namespace opsctrl

kubectl (Alternative)

kubectl apply -f https://raw.githubusercontent.com/Hillyon-Labs/opsctrl_daemon/main/k8s-deployment.yaml

See values.yaml for all configuration options.

Configuration

Environment Variables

Variable	Description	Required	Default
`WATCH_NAMESPACES`	Comma-separated namespaces to monitor	Yes	-
`OPSCTRL_BACKEND_URL`	Backend API URL	Yes	-
`CLUSTER_NAME`	Unique cluster identifier	No	-
`WEBHOOK_URL`	Slack webhook URL for alerts	No	-
`MIN_RESTART_THRESHOLD`	Container restarts before alerting	No	`3`
`LOG_LEVEL`	Logging verbosity (error, warn, info, debug)	No	`info`
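
The table above can be read as a small validation contract: two required variables, and defaults for the rest. The following TypeScript sketch shows one way to load it. The `DaemonConfig` shape and `loadConfig` function are hypothetical names chosen for illustration; only the variable names, required flags, and defaults come from the table.

```typescript
// Hypothetical config shape; field names are illustrative only.
type LogLevel = "error" | "warn" | "info" | "debug";

interface DaemonConfig {
  watchNamespaces: string[];
  backendUrl: string;
  clusterName: string | undefined;
  webhookUrl: string | undefined;
  minRestartThreshold: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly LogLevel[] = ["error", "warn", "info", "debug"];

// Takes the environment as a plain object (for example process.env in Node.js).
function loadConfig(env: Record<string, string | undefined>): DaemonConfig {
  // Required variables fail fast with a clear message.
  const required = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  const watchNamespaces = required("WATCH_NAMESPACES")
    .split(",")
    .map((ns) => ns.trim())
    .filter((ns) => ns.length > 0);
  if (watchNamespaces.length === 0) {
    throw new Error("WATCH_NAMESPACES must list at least one namespace");
  }

  // Default of 3 restarts, per the table.
  const minRestartThreshold = Number.parseInt(env["MIN_RESTART_THRESHOLD"] ?? "3", 10);
  if (!Number.isInteger(minRestartThreshold) || minRestartThreshold < 0) {
    throw new Error("MIN_RESTART_THRESHOLD must be a non-negative integer");
  }

  // Default of "info", per the table.
  const rawLevel = env["LOG_LEVEL"] ?? "info";
  if (!LOG_LEVELS.includes(rawLevel as LogLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
  }

  return {
    watchNamespaces,
    backendUrl: required("OPSCTRL_BACKEND_URL"),
    clusterName: env["CLUSTER_NAME"],
    webhookUrl: env["WEBHOOK_URL"],
    minRestartThreshold,
    logLevel: rawLevel as LogLevel,
  };
}
```

Passing the environment in as an argument, rather than reading a global, keeps the function easy to test.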

Monitored Failure Types

Failure Type	Description
`CrashLoopBackOff`	Container repeatedly crashing
`OOMKilled`	Out of memory termination
`ImagePullBackOff`	Failed to pull container image
`Pending`	Pod stuck in pending state
`Failed`	Pod entered failed phase
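
To make the failure types above concrete, the sketch below maps a simplified pod status onto them. It is not the daemon's actual detection logic. The `PodStatusLike` types mirror a subset of the Kubernetes `PodStatus` fields, `classifyPod` is a hypothetical name, and applying the restart threshold only to CrashLoopBackOff is an assumption. A real detector would also consider how long a pod has been Pending, which this sketch ignores.

```typescript
type FailureType =
  | "CrashLoopBackOff"
  | "OOMKilled"
  | "ImagePullBackOff"
  | "Pending"
  | "Failed";

// Minimal subset of the Kubernetes ContainerStatus and PodStatus fields.
interface ContainerStatusLike {
  restartCount: number;
  state?: { waiting?: { reason?: string }; terminated?: { reason?: string } };
  lastState?: { terminated?: { reason?: string } };
}

interface PodStatusLike {
  phase?: string;
  containerStatuses?: ContainerStatusLike[];
}

// Returns the first matching failure type, or null if the pod looks healthy.
function classifyPod(status: PodStatusLike, minRestartThreshold = 3): FailureType | null {
  for (const container of status.containerStatuses ?? []) {
    const waiting = container.state?.waiting?.reason;
    const terminated =
      container.state?.terminated?.reason ?? container.lastState?.terminated?.reason;

    if (terminated === "OOMKilled") {
      return "OOMKilled";
    }
    if (waiting === "ImagePullBackOff" || waiting === "ErrImagePull") {
      return "ImagePullBackOff";
    }
    // Assumption: the restart threshold gates CrashLoopBackOff alerts.
    if (waiting === "CrashLoopBackOff" && container.restartCount >= minRestartThreshold) {
      return "CrashLoopBackOff";
    }
  }

  if (status.phase === "Failed") {
    return "Failed";
  }
  if (status.phase === "Pending") {
    return "Pending";
  }
  return null;
}
```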

Usage

Viewing Logs

kubectl logs -n opsctrl -l app.kubernetes.io/name=opsctrl-daemon -f

Health Check

kubectl port-forward -n opsctrl svc/opsctrl-daemon 3000:3000
curl http://localhost:3000/health
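
If you want to script the same check, the TypeScript sketch below probes the `/health` endpoint through the port-forward. The response body format is not documented here, so the sketch assumes that any 2xx status means healthy. `isHealthy` and `FetchLike` are hypothetical names, and the fetch function is injected so the example stays self-contained.

```typescript
// Minimal view of an HTTP response; the global fetch Response satisfies it.
interface HttpResponseLike {
  ok: boolean;
  status: number;
}

type FetchLike = (url: string) => Promise<HttpResponseLike>;

// Assumption: a 2xx response from /health means the daemon is healthy.
// Network errors (for example, no port-forward running) count as unhealthy.
async function isHealthy(
  fetchFn: FetchLike,
  baseUrl = "http://localhost:3000",
): Promise<boolean> {
  try {
    const response = await fetchFn(`${baseUrl}/health`);
    return response.ok;
  } catch {
    return false;
  }
}
```

On Node.js 20 you could call `isHealthy(fetch)` with the built-in `fetch` while the port-forward is active.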

Example Slack Alert

When a failure is detected, you'll receive alerts like:

🛑 CrashLoopBackOff in orders-api (production)

Root Cause: Readiness probe failing on /healthz - connection timeout after 1s

Suggested Fix:
kubectl patch deployment orders-api -n production \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/timeoutSeconds","value":5}]'
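
For readers wondering how an alert like this could be assembled, here is a minimal TypeScript sketch that builds a Slack incoming-webhook payload with the same layout. Slack incoming webhooks accept a JSON body with a `text` field. The `Diagnosis` shape and `buildSlackMessage` function are hypothetical and do not reflect the daemon's actual message format.

```typescript
// Hypothetical diagnosis shape, modeled on the fields visible in the example alert.
interface Diagnosis {
  failureType: string;
  workload: string;
  namespace: string;
  rootCause: string;
  suggestedFix: string;
}

// Builds the JSON body for a Slack incoming webhook.
function buildSlackMessage(diagnosis: Diagnosis): { text: string } {
  const lines = [
    `🛑 ${diagnosis.failureType} in ${diagnosis.workload} (${diagnosis.namespace})`,
    "",
    `Root Cause: ${diagnosis.rootCause}`,
    "",
    "Suggested Fix:",
    diagnosis.suggestedFix,
  ];
  return { text: lines.join("\n") };
}
```

The returned object would be serialized with `JSON.stringify` and sent as the POST body to the URL configured in `WEBHOOK_URL`.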

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                       │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │
│  │   Pod A     │    │   Pod B     │    │   Pod C     │     │
│  └─────────────┘    └─────────────┘    └─────────────┘     │
│         │                 │                  │              │
│         └────────────────┼──────────────────┘              │
│                          ▼                                  │
│                 ┌─────────────────┐                        │
│                 │ OpsCtrl Daemon  │ ◄── Watch API          │
│                 └────────┬────────┘                        │
└──────────────────────────┼──────────────────────────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │ OpsCtrl Backend │ ◄── LLM Analysis
                  └────────┬────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │     Slack       │
                  └─────────────────┘

Development

Prerequisites

Node.js 20+
npm
Access to a Kubernetes cluster (local or remote)

Setup

# Clone the repository
git clone https://github.com/Hillyon-Labs/opsctrl_daemon.git
cd opsctrl_daemon

# Install dependencies
npm install

# Copy environment template
cp .env.example .env
# Edit .env with your configuration

# Run in development mode
npm run dev

Testing

# Run tests
npm test

# Run tests with coverage
npm run test:coverage

Building

# Build TypeScript
npm run build

# Build Docker image
docker build -t opsctrl/daemon:local .

RBAC & Security

The daemon is read-only with respect to your workloads and requires minimal permissions. Its only write verbs are on events and leases:

Resource	Verbs	Purpose
`pods`	get, list, watch	Monitor pod status
`namespaces`	get, list, watch	Namespace filtering
`events`	get, list, watch, create	Failure detection
`leases`	get, list, watch, create, update, patch	Leader election
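
One way to audit a permission set like the one in this table is to list every resource that carries a verb beyond get, list, and watch. The TypeScript sketch below does that over rules shaped like Kubernetes `PolicyRule` entries. The API groups shown are assumptions (core group for pods, namespaces, and events; `coordination.k8s.io` for leases), and `writableResources` is a hypothetical helper. Check the chart's own RBAC templates for the authoritative rules.

```typescript
// Shape of a Kubernetes RBAC PolicyRule (subset).
interface PolicyRule {
  apiGroups: string[];
  resources: string[];
  verbs: string[];
}

// Rules transcribed from the table; API groups are assumed, not confirmed.
const rules: PolicyRule[] = [
  { apiGroups: [""], resources: ["pods", "namespaces"], verbs: ["get", "list", "watch"] },
  { apiGroups: [""], resources: ["events"], verbs: ["get", "list", "watch", "create"] },
  {
    apiGroups: ["coordination.k8s.io"],
    resources: ["leases"],
    verbs: ["get", "list", "watch", "create", "update", "patch"],
  },
];

const READ_VERBS = new Set(["get", "list", "watch"]);

// Returns the resources that have at least one non-read verb.
function writableResources(policy: PolicyRule[]): string[] {
  const writable: string[] = [];
  for (const rule of policy) {
    if (rule.verbs.some((verb) => !READ_VERBS.has(verb))) {
      writable.push(...rule.resources);
    }
  }
  return writable;
}
```

Applied to the rules above, the helper reports only `events` and `leases`, so pods and namespaces stay strictly read-only.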

The daemon does not:

Exec into containers
Access Secrets or ConfigMaps
Modify any workloads
Send container logs externally

Contributing

Contributions are welcome! Please read our Contributing Guide before submitting a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Support

GitHub Issues - Bug reports and feature requests
Documentation - Full documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with care by Hillyon Labs
