This project aims to create a secure and observable Kubernetes environment using Kind (Kubernetes in Docker). It integrates monitoring tools like Grafana and Prometheus for real-time visibility, and Tetragon for enhanced security.
The core functionality involves deploying a CLSTM (Convolutional Long Short-Term Memory) model to detect anomalies in network traffic. When anomalies are identified, the system triggers alerts through Slack and Firebase, and enforces rate limiting policies to mitigate potential threats.
Key Features:
- Kind Cluster Setup: Easily reproducible Kubernetes environment.
- Monitoring: Integrated Grafana and Prometheus stack for comprehensive monitoring.
- Security: Tetragon for eBPF-based security observability and runtime enforcement.
- Anomaly Detection: CLSTM model for identifying suspicious network traffic patterns.
- Alerting: Real-time notifications via Slack and Firebase.
- Rate Limiting: Automated enforcement of rate limits upon anomaly detection.
The system's architecture is designed around a Kind Kubernetes cluster and several integrated components:
- Kind Cluster: A local Kubernetes cluster with one control plane node and two worker nodes. This setup provides a lightweight and developer-friendly environment for deploying and managing containerized applications.
- Monitoring Stack:
  - Prometheus: Deployed within the cluster to collect metrics from various Kubernetes components and applications.
  - Grafana: Used for visualizing the metrics collected by Prometheus, providing dashboards for monitoring cluster health and application performance.
- Security Enforcement:
  - Tetragon: Leverages eBPF for real-time security observability and runtime enforcement. It monitors system calls and network activity to detect and prevent malicious behavior.
- Traffic Analysis and Anomaly Detection:
  - Traffic Mirroring (cicflowmeter): Network traffic within the cluster is mirrored and processed by cicflowmeter. This tool captures network flows and extracts relevant features for analysis.
  - CLSTM Model: The extracted flow data is fed into a Convolutional Long Short-Term Memory (CLSTM) model. This deep learning model is trained to identify anomalous patterns in network traffic that may indicate security threats or system misconfigurations.
- Alerting Mechanism:
  - Slack Integration: Upon detection of an anomaly, alerts are sent to a designated Slack channel for immediate notification of the operations team.
  - Firebase Integration: Alerts and relevant event data are also sent to Firebase, potentially for persistent storage, further analysis, or integration with other services.
- Backend APIs: A set of backend APIs (likely running within the Kubernetes cluster) manage the overall workflow, including:
  - Receiving data from the CLSTM model.
  - Triggering alerts to Slack and Firebase.
  - Initiating rate-limiting actions based on detected anomalies.
  - Interfacing with Tetragon for security policy enforcement.
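
As a rough illustration of the traffic analysis stage described above, the sketch below groups per-flow feature vectors into fixed-length sequences, which is the kind of input a sequence model such as a CLSTM typically consumes. The function name, the window length, the step, and the two-feature layout are all assumptions made for illustration; the project's actual preprocessing in Traffic_Mirroring/server.py is not shown here and may differ.

```python
from typing import Iterable, List, Sequence


def sliding_windows(
    flows: Iterable[Sequence[float]], window: int, step: int = 1
) -> List[List[List[float]]]:
    """Group consecutive flow feature vectors into overlapping windows.

    Hypothetical helper: each returned window holds `window` consecutive
    rows, and successive windows start `step` rows apart.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    rows = [[float(x) for x in row] for row in flows]
    # range() is empty when there are fewer rows than one full window.
    return [rows[i:i + window] for i in range(0, len(rows) - window + 1, step)]


# Five flows with two made-up features each, grouped three at a time.
example_flows = [[i, i * 2] for i in range(5)]
example_windows = sliding_windows(example_flows, window=3)
```

Each window could then be converted to whatever tensor format the model expects before inference.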
Before you begin, ensure you have the following tools installed on your system:
- Docker: Required to run Kind (Kubernetes in Docker). (Installation Guide)
- Kind: A tool for running local Kubernetes clusters using Docker container "nodes". (Installation Guide)
- kubectl: The Kubernetes command-line tool, used to interact with your cluster. (Installation Guide)
- Helm: The package manager for Kubernetes, used to deploy and manage applications. (Installation Guide)
- Install Prerequisites: Ensure all tools listed in the Prerequisites section are installed and configured correctly on your system.
- Set Up the Kind Cluster: This project uses a Kind cluster defined in kind-config.yaml. To create the cluster, run the following command from the root of the repository:

      kind create cluster --config kind-config.yaml

  This will set up a Kubernetes cluster with one control plane and two worker nodes.
- Deploy Grafana and Prometheus: The monitoring stack (Grafana and Prometheus) is deployed using Helm.

      helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
      helm repo update
      kubectl create namespace monitoring
      helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring -f monitoring/values.yaml

- Install Tetragon: Tetragon is used for eBPF-based security observability and runtime enforcement. Install it using Helm:

      helm repo add cilium https://helm.cilium.io
      helm repo update
      helm install tetragon cilium/tetragon -n kube-system -f tetragon/values.yaml

- Apply Tetragon Policies: Apply the necessary Tetragon TracingPolicies to monitor for security events. The specific policies are located in the tetragon/policies directory.

      kubectl apply -f tetragon/policies/

  (Note: You may need to adjust the path if your policies are in a different subdirectory or if you want to apply them individually.)
- Set Up Traffic Mirroring and Anomaly Detection Server: The traffic mirroring and anomaly detection server (Traffic_Mirroring/server.py) captures network traffic using cicflowmeter, processes it, and uses a CLSTM model to detect anomalies.
  - Firebase Credentials: Place your Firebase service account key file, named service-key.json, in the Traffic_Mirroring/ directory. This is required for sending alerts to Firebase.
  - Slack Webhook: Set the SLACK_WEBHOOK_URL environment variable to your Slack incoming webhook URL.

        export SLACK_WEBHOOK_URL="YOUR_SLACK_WEBHOOK_URL_HERE"

  - Running the Server: The server needs to be run with sudo privileges because cicflowmeter requires root access to capture network packets.

        cd Traffic_Mirroring/
        sudo python3 server.py

    (Note: sudo resets most environment variables by default. If the server cannot see SLACK_WEBHOOK_URL, pass it through explicitly, for example by running sudo -E python3 server.py.)
- Set Up and Run Backend API Server: The backend API server (Website/app/main.py) provides endpoints for managing detected anomalies, interacting with Firebase, and potentially other administrative tasks.
  - Firebase Credentials: Ensure your Firebase service account key file, named service-key.json, is present in the Website/app/ directory (or update the path in Website/app/main.py if you place it elsewhere).
  - Running the Server: The server is started using the start-server.sh script located in the Website/ directory. This script also handles setting up Telebit to expose the local server to the internet for testing or remote access.

        cd Website/
        ./start-server.sh

    (Note: Review start-server.sh for any specific configurations related to Telebit or other environment settings you might need to adjust.)
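
Before starting the servers, it can help to confirm that the Slack webhook configured in the steps above is usable. The following is a minimal sketch, not the project's own alerting code: it builds the JSON POST that Slack incoming webhooks accept (a body with a text field) and reads the URL from SLACK_WEBHOOK_URL. The function names and the message text are hypothetical.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(message: str) -> int:
    """Send the message to the webhook in SLACK_WEBHOOK_URL; return the HTTP status."""
    # Raises KeyError if the variable is not exported in this environment.
    url = os.environ["SLACK_WEBHOOK_URL"]
    with urllib.request.urlopen(build_slack_request(url, message), timeout=10) as resp:
        return resp.status
```

Calling send_slack_alert("Test alert from the anomaly detection setup") from the same shell that will run the server should post a message to the configured channel if the webhook is valid.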
The end-to-end workflow of the system is as follows:
- Traffic Ingress: Network traffic enters the Kubernetes cluster, typically through an Ingress controller, and is routed to the appropriate services.
- Traffic Capture: The Traffic_Mirroring/server.py script, running with sudo privileges, uses cicflowmeter to capture network traffic from the cluster's network interfaces (e.g., eth0 or a specific Kind bridge interface).
- Data Processing and Anomaly Detection:
  - Captured traffic data is processed to extract relevant features.
  - These features are then fed into the pre-trained CLSTM model (Models/CLSTM/model.py) for anomaly detection.
- Anomaly Response: If the CLSTM model identifies an anomaly:
  - Slack Alert: An alert notification is sent to a pre-configured Slack channel using the SLACK_WEBHOOK_URL.
  - Firebase Alert: Details of the anomaly are stored in the Firebase Realtime Database for persistence and potential further analysis.
  - Rate Limiting: An automated process attempts to enforce rate limiting on the Nginx Ingress Controller. This is typically done by patching the Ingress resource associated with the affected service to include rate-limiting annotations.
- Continuous Monitoring and Security:
  - Tetragon: Provides continuous eBPF-based security observability and runtime enforcement. It monitors kernel-level events and can block malicious activities based on the applied TracingPolicies.
  - Prometheus & Grafana: Collect and visualize metrics from the cluster, applications, and network, offering insights into system health and performance.
- Backend API Interaction:
  - The backend API server (Website/app/main.py), exposed via Telebit if enabled, provides RESTful endpoints.
  - These endpoints can be used to:
    - View detected anomalies and alerts from Firebase.
    - Potentially manage or update Tetragon policies (depending on API capabilities).
    - Monitor overall system status and cluster information.
    - Manually trigger or adjust rate-limiting configurations.
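
The rate-limiting step in the workflow above, patching an Ingress with rate-limiting annotations, can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: it assumes the ingress-nginx annotations limit-rpm and limit-burst-multiplier are the ones being set, and the helper names, the ingress name, and the namespace are hypothetical. The patch is a JSON merge patch, where a null value removes a key, so the same shape also covers reverting the limit.

```python
import json
from typing import List, Optional

# Annotation prefix used by the ingress-nginx controller.
PREFIX = "nginx.ingress.kubernetes.io"


def rate_limit_patch(rpm: Optional[int], burst_multiplier: Optional[int] = 5) -> dict:
    """Build a merge patch that sets (or, with None, removes) rate-limit annotations."""
    def as_value(n: Optional[int]) -> Optional[str]:
        # Annotation values must be strings; None becomes JSON null (delete key).
        return None if n is None else str(n)

    return {
        "metadata": {
            "annotations": {
                f"{PREFIX}/limit-rpm": as_value(rpm),
                f"{PREFIX}/limit-burst-multiplier": as_value(burst_multiplier),
            }
        }
    }


def kubectl_patch_args(ingress: str, namespace: str, patch: dict) -> List[str]:
    """Return the kubectl argument list that would apply the patch."""
    return [
        "kubectl", "patch", "ingress", ingress,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


# Hypothetical ingress "demo" in namespace "default": apply, then revert.
apply_args = kubectl_patch_args("demo", "default", rate_limit_patch(60))
revert_args = kubectl_patch_args("demo", "default", rate_limit_patch(None, None))
```

The argument lists can be passed to subprocess.run, or the same patch body can be sent through a Kubernetes client library instead of kubectl.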
The Website/app/main.py application serves the following key API endpoints:
- /api/login (POST):
  - Purpose: Authenticates users.
  - Request: JSON body with username and password.
  - Response: JWT token upon successful authentication, or an error message.
- /get-alerts (GET):
  - Purpose: Fetches anomaly alerts from the Firebase Realtime Database.
  - Response: JSON array of alert objects.
- /api/events (GET):
  - Purpose: Retrieves Tetragon security events.
  - Response: JSON array of Tetragon event objects.
- /api/detect-smells (POST):
  - Purpose: Analyzes a submitted Kubernetes YAML manifest file for potential misconfigurations or "code smells."
  - Request: Expects a YAML file (file) in a multipart/form-data request.
  - Response: JSON object detailing detected smells or a success message.
- /api/enforce-policy (POST):
  - Purpose: Applies a new Tetragon TracingPolicy to the Kubernetes cluster.
  - Request: JSON body containing the Tetragon policy definition.
  - Response: Success or error message.
- /api/get-policies (GET):
  - Purpose: Retrieves all currently applied Tetragon TracingPolicies from the cluster.
  - Response: JSON array of policy objects.
- /api/delete-policy (DELETE):
  - Purpose: Deletes a Tetragon TracingPolicy from the cluster by its name.
  - Request: JSON body with policy_name.
  - Response: Success or error message.
- /revert_rate_limit (POST):
  - Purpose: Removes or reverts rate-limiting configurations from the Nginx Ingress Controller for a specified ingress.
  - Request: JSON body likely including ingress_name and namespace.
  - Response: Success or error message.
- /apply_rate_limit (POST):
  - Purpose: Applies rate-limiting configurations to the Nginx Ingress Controller for a specified ingress.
  - Request: JSON body likely including ingress_name, namespace, and rate limit parameters (e.g., rpm, burst).
  - Response: Success or error message.
- /check_rate_limit (GET):
  - Purpose: Checks the current rate-limiting status for a specified ingress on the Nginx Ingress Controller.
  - Request: Query parameters likely including ingress_name and namespace.
  - Response: JSON object detailing the current rate limit settings.
- /cluster-info (GET):
  - Purpose: Retrieves general information about the Kubernetes cluster, such as node status, version, etc.
  - Response: JSON object with cluster details.
- /api/service-map (GET):
  - Purpose: Provides a map or list of services, deployments, and pods within the Kubernetes cluster, showing their relationships.
  - Response: JSON object representing the cluster resource map.
- /api/scan-image (POST):
  - Purpose: Scans a specified container image for known vulnerabilities using Trivy.
  - Request: JSON body with image_name.
  - Response: JSON report of vulnerabilities found in the image.
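
To show how the endpoints listed above might be called from a script, here is a small request-building sketch using only the Python standard library. Several details are assumptions, since the API's exact contract is not documented here: the base URL http://localhost:8000 is a placeholder, sending the JWT as an Authorization: Bearer header is a common convention that this API may or may not follow, and the example field values are made up. The helper name json_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # placeholder; use your server or Telebit URL


def json_request(
    base_url: str,
    path: str,
    method: str = "GET",
    body: Optional[dict] = None,
    query: Optional[dict] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for one of the backend endpoints."""
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        # Assumption: the API accepts the JWT as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


login = json_request(
    BASE_URL, "/api/login", "POST",
    body={"username": "admin", "password": "example-password"},
)
check = json_request(
    BASE_URL, "/check_rate_limit",
    query={"ingress_name": "demo", "namespace": "default"},
)
```

Passing either request to urllib.request.urlopen sends it; the response body can then be decoded with json.load.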
Contributions are welcome! If you have suggestions for improvements or new features, please feel free to:
- Open an issue to discuss the change.
- Submit a pull request with your contribution.
We appreciate your help in making this project better.

