Chashma is a production‑ready Prometheus + Grafana bundle with prebuilt dashboards and alerts for AI gateway workloads (like pine‑gate). Install it, point it at your gateway’s /metrics, and get useful graphs and SLO alerts within minutes.
- Gateway dashboards: requests/sec, latency p50/p95/p99, errors and 429s
- Backend views: per‑backend traffic share, latency, error rate
- Quota/Errors view: error codes, error%, 429 pressure; tokens/bytes placeholders
- SLO alerts: p95 > 2s, error rate > 1%, 429 rate > 5% (tunable)
- Optional: GPU (DCGM) and tracing (OTel Collector) overview panels
- Helm chart + small CLI for install, connect, and port‑forward
- Kubernetes: 1.24+
- kube‑prometheus‑stack: 55.x+ (bundled by default)
- pine‑gate: emits `gateway_*` metrics as described below
- Install the chart (bundles kube‑prometheus‑stack by default):

  ```bash
  helm upgrade --install chashma charts/chashma -n monitoring --create-namespace
  ```

- Port‑forward Prometheus and Grafana:

  ```bash
  # Prometheus
  kubectl -n monitoring port-forward svc/prometheus-operated 9090:9090
  # Grafana
  kubectl -n monitoring port-forward svc/chashma-grafana 3000:80
  # Grafana password (user: admin)
  kubectl -n monitoring get secret chashma-grafana -o jsonpath='{.data.admin-password}' | base64 -d; echo
  ```

- Add a ServiceMonitor for your gateway (see next section)
CLI alternative
```bash
# Build
make cli-build
# Validate environment
./bin/chashma validate
# Install chart
./bin/chashma install --namespace monitoring
# Connect pine‑gate (replace placeholders)
./bin/chashma connect pine-gate \
  --namespace monitoring \
  --pine-namespace <NS> \
  --pine-selector app=<APP_LABEL> \
  --pine-port-name http
# Port‑forward helpers
./bin/chashma port-forward grafana
./bin/chashma port-forward prometheus
```
Chashma discovers scrape targets via a ServiceMonitor that must match your gateway Service.
Required shape of your Service
- Namespace: your pine‑gate namespace (for example, `default`)
- Label: `app=<APP_LABEL>` (for pine‑gate Helm, this is usually `<release>-pine-gate`)
- Port: named `http`, serving `/metrics`
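A Service matching that shape could look like the following sketch (the Service name, ports, and `targetPort` are illustrative assumptions; only the namespace, `app` label, and the `http` port name are requirements stated above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pine-gate            # illustrative name
  namespace: <NS>
  labels:
    app: <APP_LABEL>         # the label the ServiceMonitor selects on
spec:
  selector:
    app: <APP_LABEL>
  ports:
    - name: http             # port name must be "http"
      port: 80
      targetPort: 8080       # container port serving /metrics (adjust)
```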
Create a matching ServiceMonitor (pick one)
- CLI (recommended):

  ```bash
  ./bin/chashma connect pine-gate \
    --namespace monitoring \
    --pine-namespace <NS> \
    --pine-selector app=<APP_LABEL> \
    --pine-port-name http
  ```

- Helm values (chart‑only):

  ```bash
  helm upgrade --install chashma charts/chashma -n monitoring --reuse-values \
    --set pineGate.namespace=<NS> \
    --set pineGate.selector.app=<APP_LABEL> \
    --set pineGate.portName=http
  ```
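For reference, the ServiceMonitor either path produces should look roughly like this sketch (the resource name and scrape interval are assumptions, not taken from the chart; the `release: chashma` label and selectors match the troubleshooting section below):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: chashma-pine-gate    # illustrative name
  namespace: monitoring
  labels:
    release: chashma         # must match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames: ["<NS>"]
  selector:
    matchLabels:
      app: <APP_LABEL>
  endpoints:
    - port: http             # named Service port
      path: /metrics
      interval: 15s          # illustrative
```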
- Prometheus targets: open `http://localhost:9090/targets` → a pine‑gate job should be UP
- Discover the job label Prometheus uses (note: `label_values(...)` is Grafana variable-query syntax, not PromQL; in the Prometheus "Graph" view, use an aggregation instead):

  ```
  count by (job) (gateway_requests_total)
  ```

- Grafana dashboards:
  - Browse (if not organized into folders yet, look under “k8s‑sidecar‑target‑directory”)
  - In “Gateway — pine‑gate” (and other dashboards), set the `job` dashboard variable to your job value (for example, `chashma-pine-gate`)
```bash
# Port‑forward pine‑gate
kubectl -n <NS> port-forward svc/<pine-gate-svc> 8080:80
# Requests
curl -sS -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
  -X POST localhost:8080/v1/completions -d '{"model":"echo","prompt":"hi"}'
curl -N -H 'x-api-key: dev-key' \
  'http://localhost:8080/v1/stream?model=echo&prompt=hi'
# Optional load burst
for i in {1..50}; do \
  curl -s -o /dev/null -H 'x-api-key: dev-key' -H 'Content-Type: application/json' \
  -X POST localhost:8080/v1/completions -d '{"model":"echo","prompt":"load"}'; \
done
```
- `gateway_requests_total{route,method,backend}`
- `gateway_request_latency_seconds_bucket{route,method,backend}` (histogram)
- `gateway_request_errors_total{route,method,code,backend}`

These are emitted by pine‑gate. Mock exporters that do not emit `gateway_*` metrics will not populate these dashboards.
- `kps.enabled` (bool): install kube‑prometheus‑stack (default true)
- `kps.releaseLabel` (string): label to match an existing KPS release if not installing
- `pineGate.serviceMonitor.enabled` (bool): create the ServiceMonitor (default true)
- `pineGate.namespace` (string): namespace of the pine‑gate Service
- `pineGate.selector.*` (map): labels to match the pine‑gate Service (e.g., `app`)
- `pineGate.portName` (string): named port serving `/metrics` (default `http`)
- `dashboards.gateway|backends|quotaErrors|gpu|tracing` (bools): enable individual dashboards
- `alerts.enabled` (bool): install the PrometheusRule for gateway SLOs
- Optional Grafana folders: set `kube-prometheus-stack.grafana.sidecar.dashboards.folderAnnotation=grafana_folder`
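Put together in a values file, these options might look like the following (illustrative; the key names come from the list above, the specific values are assumptions for your environment):

```yaml
# values.yaml (sketch)
kps:
  enabled: true              # install bundled kube‑prometheus‑stack
pineGate:
  namespace: <NS>
  selector:
    app: <APP_LABEL>
  portName: http
  serviceMonitor:
    enabled: true
dashboards:                  # enable/disable individual dashboards
  gateway: true
  backends: true
  quotaErrors: true
  gpu: false                 # optional DCGM panels
  tracing: false             # optional OTel Collector panels
alerts:
  enabled: true              # install the gateway SLO PrometheusRule
```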
```bash
# Install/upgrade the chart
chashma install --namespace monitoring
# Create/patch the ServiceMonitor
chashma connect pine-gate --pine-namespace <NS> --pine-selector app=<APP_LABEL> --pine-port-name http
# List pods, ServiceMonitors, and services
chashma status
# Port‑forward helpers
chashma port-forward grafana
chashma port-forward prometheus
# Preflight check
chashma validate
# Uninstall
chashma uninstall
```
- p95 latency > 2s for 5m (by backend/route)
- Error rate > 1% for 5m (by backend/route)
- 429 rate > 5% for 5m (by backend/route)

Tune thresholds by editing the PrometheusRule or templating them via values in your fork.
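A sketch of what such a PrometheusRule could look like, built from standard PromQL over the `gateway_*` metrics listed earlier (alert names, label groupings, and the resource name are illustrative assumptions, not the chart's actual rule file):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: chashma-gateway-slo   # illustrative name
  namespace: monitoring
  labels:
    release: chashma
spec:
  groups:
    - name: gateway-slo
      rules:
        - alert: GatewayP95LatencyHigh       # p95 > 2s for 5m
          expr: |
            histogram_quantile(0.95,
              sum by (le, backend, route) (rate(gateway_request_latency_seconds_bucket[5m]))) > 2
          for: 5m
        - alert: GatewayErrorRateHigh        # error rate > 1% for 5m
          expr: |
            sum by (backend, route) (rate(gateway_request_errors_total[5m]))
              / sum by (backend, route) (rate(gateway_requests_total[5m])) > 0.01
          for: 5m
        - alert: Gateway429RateHigh          # 429 rate > 5% for 5m
          expr: |
            sum by (backend, route) (rate(gateway_request_errors_total{code="429"}[5m]))
              / sum by (backend, route) (rate(gateway_requests_total[5m])) > 0.05
          for: 5m
```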
- No targets in Prometheus:
  - Ensure a Service exists in `<NS>` with `app=<APP_LABEL>` and a port named `http`
  - Ensure endpoints are ready: `kubectl -n <NS> get endpoints <SVC>`
  - Ensure the ServiceMonitor has `release: chashma`, `namespaceSelector.matchNames: [<NS>]`, and `selector.matchLabels.app: <APP_LABEL>`
  - One‑liners:
    - Patch SM release label: `kubectl -n monitoring patch servicemonitor chashma -p '{"metadata":{"labels":{"release":"chashma"}}}' --type=merge`
    - Patch SM selector/namespace: use `./bin/chashma connect …` or `helm upgrade --reuse-values --set pineGate.*`
- Dashboards empty:
  - Set the dashboard `job` variable to your actual job value (see Verify)
  - In Grafana Explore, check that `gateway_requests_total` returns series
- Dashboards not visible:
  - Check sidecar imports: `kubectl -n monitoring logs deploy/chashma-grafana -c grafana-sc-dashboard | tail -n 100`
  - Ensure dashboard ConfigMaps are labeled `grafana_dashboard: "1"` (created by the chart)
  - Optional: set `folderAnnotation` (see Configuration)
- Set a non‑default Grafana admin password: `--set kube-prometheus-stack.grafana.adminPassword=<strong-password>`
- Restrict `/metrics` exposure with a NetworkPolicy; do not expose pods externally
- Prefer Ingress + auth (basic/OIDC) over port‑forwards in shared clusters
- Prometheus retention and resources: adjust in kube‑prometheus‑stack values to control cost
- Upgrades: use `helm upgrade --install chashma … --reuse-values`
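The NetworkPolicy suggestion above could be sketched like this, assuming the gateway pods carry the `app=<APP_LABEL>` label, listen on container port 8080, and Prometheus runs in the `monitoring` namespace (all assumptions to adjust for your cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pine-gate-allow-prometheus   # illustrative name
  namespace: <NS>
spec:
  podSelector:
    matchLabels:
      app: <APP_LABEL>
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080                 # container port serving /metrics (adjust)
```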
```bash
# Fetch and build chart dependencies
make chart-deps
# Package chart to dist/
make chart-package
# GitHub Actions on tags build CLI binaries and attach a packaged chart
# (push a tag like v0.1.0 to trigger)
```