A comprehensive, Docker-based monitoring solution for modern infrastructure and applications. This stack provides full observability with metrics, logs, and alerts using industry-standard open-source tools.
This project delivers a complete monitoring infrastructure as code, allowing you to quickly deploy a production-ready monitoring solution. The stack includes:
- Metrics collection: Prometheus, Node Exporter, cAdvisor, Telegraf
- Log aggregation: Loki, Promtail
- Alerting: Alertmanager with email and Slack integration
- Visualization: Grafana with pre-configured dashboards
- Endpoint monitoring: Blackbox Exporter for HTTP/HTTPS/TCP checks
This monitoring solution is designed to provide immediate visibility into your infrastructure while remaining highly customizable to meet specific requirements.
- Zero-configuration deployment - Works out of the box with sensible defaults
- Environment-based configuration - Easily customize via the `.env` file
- Template-based configuration files - All configuration files use templates for easy customization
- Comprehensive metrics collection - From system metrics to container stats
- Centralized logging - Aggregate and search logs from all systems
- Multi-channel alerting - Email, Slack, and more
- Pre-built dashboards - Hit the ground running with ready-to-use dashboards
- Secure by default - Grafana authentication is enabled out of the box; basic auth can be enabled for other components
- Docker-compose deployment - Simple to deploy and manage
- Development-friendly - Includes MailHog for testing email alerts locally
- Docker Engine (19.03.0+)
- Docker Compose (1.27.0+)
- 2GB+ RAM recommended
- 10GB+ disk space
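The version requirements above can be checked before deploying. The following is a minimal sketch and not part of the repository: `clean_version` and `version_ge` are hypothetical helper names, and the comparison only understands dotted numeric versions.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with this project).

# clean_version VERSION: strip a leading "v" and any suffix such as "-ce" or "+dfsg1".
clean_version() {
  printf '%s\n' "$1" | sed -e 's/^v//' -e 's/[^0-9.].*$//'
}

# version_ge HAVE WANT: succeed when dotted version HAVE is >= WANT.
version_ge() {
  have=$1
  want=$2
  while [ -n "$have" ] || [ -n "$want" ]; do
    h=${have%%.*}
    w=${want%%.*}
    # Drop the field just read; clear the string when it was the last one.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    h=${h:-0}
    w=${w:-0}
    [ "$h" -gt "$w" ] && return 0
    [ "$h" -lt "$w" ] && return 1
  done
  return 0
}
```

A possible use is `version_ge "$(clean_version "$(docker version --format '{{.Server.Version}}')")" 19.03.0 || echo "Docker Engine is older than 19.03.0"`, with `docker-compose version --short` compared against `1.27.0` in the same way.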
- Clone the repository

      git clone https://github.com/amirk1998/monitoring-stack.git
      cd monitoring-stack

- Configure your environment

      cp .env.example .env
      # Edit .env file with your preferred settings

- Generate configuration files

      ./setup-config.sh

- Launch the stack

      docker-compose up -d

- Access the dashboards
  - Grafana: http://localhost:3000 (default credentials: admin/ChangeMe123!)
  - Prometheus: http://localhost:9090
  - Alertmanager: http://localhost:9093
  - MailHog (development only): http://localhost:8025
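Once the containers are running, the services can be probed from the command line instead of opening each URL. This is a rough sketch under a few assumptions: `check_stack` is a hypothetical helper, the ports are the defaults listed above, and the paths are the standard health endpoints of Prometheus and Alertmanager (`/-/healthy`) and Grafana (`/api/health`). If basic auth or a proxy sits in front of a service, the request may need credentials.

```shell
#!/bin/sh
# Hypothetical helper: report which core services answer their health endpoint.
# Ports are the documented defaults; adjust them if .env overrides them.
check_stack() {
  failed=0
  for entry in \
    "Prometheus=http://localhost:9090/-/healthy" \
    "Alertmanager=http://localhost:9093/-/healthy" \
    "Grafana=http://localhost:3000/api/health"
  do
    name=${entry%%=*}
    url=${entry#*=}
    # -f turns HTTP errors into a non-zero exit; --max-time avoids hanging.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "OK $name"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `check_stack` shortly after `docker-compose up -d` may report failures while services are still starting, so retrying after a few seconds is reasonable.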
| Component | Description | Port |
|---|---|---|
| Prometheus | Time-series database and metrics collector | 9090 |
| Grafana | Visualization and dashboarding platform | 3000 |
| Alertmanager | Alert handling and routing | 9093 |
| Node Exporter | Host system metrics (CPU, memory, disk, network) | 9100 |
| cAdvisor | Container metrics and resource usage | 8080 |
| Blackbox Exporter | Probes endpoints over HTTP, HTTPS, DNS, TCP | 9115 |
| Telegraf | Pluggable metrics collection agent | 9273 |
| Loki | Log aggregation system | 3100 |
| Promtail | Log collector and forwarder | - |
| MailHog | SMTP testing server with web interface | 1025, 8025 |
    .
    ├── alertmanager/           # Alertmanager configuration
    ├── blackbox_exporter/      # Blackbox Exporter configuration
    ├── grafana/                # Grafana dashboards and datasources
    ├── loki/                   # Loki configuration
    ├── prometheus/             # Prometheus configuration and rules
    │   ├── alerts/             # Alert rules
    │   └── ...
    ├── promtail/               # Promtail configuration
    ├── telegraf/               # Telegraf configuration
    ├── docker-compose.yml      # Service definitions
    ├── .env.example            # Example environment variables
    ├── setup-config.sh         # Configuration generator script
    └── README.md               # This file
The .env file controls key aspects of the monitoring stack:
- Service ports
- Credentials
- Alerting channels
- Retention settings
- Resource limits
See .env.example for all available options.
All configuration files use templates (.yml.template, .conf.template) that are processed during setup:
- Values from the `.env` file are substituted
- Final configuration files are generated
- Changes to templates require running `setup-config.sh` again
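How `setup-config.sh` performs the substitution is not shown here. As an illustration only, the sketch below renders `${NAME}` placeholders from an env file using POSIX shell and awk. The function name `render_template` and the `${NAME}` placeholder syntax are assumptions, not the project's actual implementation, and the env file must be valid shell syntax for this approach to work.

```shell
#!/bin/sh
# Hypothetical sketch: replace ${NAME} placeholders in a template with values
# from an env file. The real setup-config.sh may work differently.
render_template() {
  env_file=$1
  template=$2
  # "." searches PATH for bare names, so anchor them to the current directory.
  case $env_file in */*) ;; *) env_file=./$env_file ;; esac
  (
    set -a            # export every variable the env file defines
    . "$env_file"
    set +a
    awk '{
      out = ""
      line = $0
      # Consume the line left to right so substituted values are never rescanned.
      while (match(line, /\$\{[A-Za-z_][A-Za-z0-9_]*\}/)) {
        name = substr(line, RSTART + 2, RLENGTH - 3)
        out = out substr(line, 1, RSTART - 1) ENVIRON[name]
        line = substr(line, RSTART + RLENGTH)
      }
      print out line
    }' "$template"
  )
}
```

With this sketch, a call such as `render_template .env prometheus/prometheus.yml.template > prometheus/prometheus.yml` would write the rendered file. Placeholders for variables that are not set are replaced with an empty string.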
The stack comes with several pre-configured dashboards:
| Dashboard | Description |
|---|---|
| Node Exporter Overview | Host-level metrics (CPU, memory, disk, network) |
| Docker Containers | Container metrics from cAdvisor |
| Prometheus Stats | Prometheus performance and health |
| Alertmanager Overview | Alert status and history |
| Loki Logs | Log exploration and search |
To add custom dashboards:
- Export dashboard JSON from Grafana
- Place it in `grafana/provisioning/dashboards/`
- Update `grafana/provisioning/dashboards/dashboard.yml` if needed
- Restart Grafana:

      docker-compose restart grafana
- Email: Configure via SMTP settings in `.env`
- Slack: Configure via webhook URL in `.env`
- Other integrations: Can be added in `alertmanager/alertmanager.yml.template`
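To confirm that a channel actually delivers, a test alert can be pushed straight into Alertmanager. The sketch below assumes Alertmanager is reachable on its default port and uses its v2 HTTP API (`POST /api/v2/alerts`). `send_test_alert` and the label values are hypothetical, and whether the alert reaches email or Slack depends on the routes defined in the generated Alertmanager configuration.

```shell
#!/bin/sh
# Hypothetical helper: push a test alert into Alertmanager.
# No end time is set, so the alert resolves after Alertmanager's resolve_timeout.
send_test_alert() {
  base_url=${1:-http://localhost:9093}
  payload='[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Manual test alert"}}]'
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    -d "$payload" \
    "$base_url/api/v2/alerts"
}
```

In a development setup that routes email through MailHog, the resulting message should appear in the MailHog web interface after the route's group wait has elapsed.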
- Default rules are in `prometheus/alerts/custom_alerts.yml`
- Add new rules by creating files in `prometheus/alerts/`
- Rule files in this directory are picked up by Prometheus on its next restart or configuration reload
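Because a syntax error in a rule file can stop Prometheus from loading its configuration, it can help to validate rule files first. A minimal sketch, assuming `promtool` (distributed with Prometheus) is on the `PATH`; `check_rules` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run "promtool check rules" over every rule file in a directory.
check_rules() {
  rules_dir=${1:-prometheus/alerts}
  status=0
  for file in "$rules_dir"/*.yml; do
    [ -e "$file" ] || continue   # the glob matched nothing
    promtool check rules "$file" || status=1
  done
  return "$status"
}
```

If `promtool` is not installed on the host, the same check can usually be run inside the Prometheus container with `docker-compose exec prometheus promtool check rules <path>`, where the path depends on how the rules directory is mounted.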
Example alert rule:

    groups:
      - name: host
        rules:
          - alert: HighCpuLoad
            expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: High CPU load (instance {{ $labels.instance }})
              description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

To add a new scrape target:
- Add a new section to `prometheus/prometheus.yml.template`:

      - job_name: 'new-service'
        static_configs:
          - targets: ['new-service:9090']

- Run `./setup-config.sh` to regenerate configurations
- Restart Prometheus:

      docker-compose restart prometheus
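After the restart, one way to confirm that the new job is being scraped is to query the `up` metric through the Prometheus HTTP API. This is a sketch under assumptions: `job_is_up` is a hypothetical helper, Prometheus is reachable without authentication on its default port, and the response is matched with a simple pattern rather than a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: succeed when at least one target of a job reports up == 1.
# Returns 2 when Prometheus itself cannot be queried.
job_is_up() {
  job=$1
  base_url=${2:-http://localhost:9090}
  response=$(curl -fsS -G "$base_url/api/v1/query" \
    --data-urlencode "query=up{job=\"$job\"}") || return 2
  # Instant-query samples look like "value":[<timestamp>,"1"].
  printf '%s' "$response" | grep -q '"value":\[[0-9.]*,"1"]'
}
```

For the example job above, `job_is_up new-service && echo "new-service is being scraped"` would print the message once the first successful scrape has completed.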
To add a custom exporter:
- Add the exporter to `docker-compose.yml`:

      custom-exporter:
        image: custom-exporter:latest
        ports:
          - '9999:9999'
        networks:
          - monitoring

- Add a scrape configuration to `prometheus/prometheus.yml.template`
- Run `./setup-config.sh`
- Restart the stack:

      docker-compose up -d
- Grafana: Protected by username/password (configured in `.env`)
- Basic auth can be enabled for other components by editing their respective config templates
- Default configuration exposes ports to host
- For production, consider:
  - Using a reverse proxy with TLS
  - Implementing network isolation
  - Setting up firewall rules
- Change all default passwords
- Enable TLS for all connections
- Use Docker secrets or Kubernetes secrets for sensitive values
- Implement proper backup for data volumes
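The first recommendation can be enforced with a small pre-deployment check. The sketch below is hypothetical (`uses_default_password` is not part of the project) and only looks for the default Grafana password documented above. It does not judge the strength of any other value.

```shell
#!/bin/sh
# Hypothetical pre-deployment check: succeed (exit 0) when the env file still
# contains the documented default Grafana password.
uses_default_password() {
  env_file=${1:-.env}
  grep -qF 'ChangeMe123!' "$env_file"
}
```

A deployment script could then guard the rollout with `if uses_default_password .env; then echo "Change the default password before deploying" >&2; exit 1; fi`.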
- Loki fails to start: Ensure schema and index type configuration match (see loki-config.yml)
- Prometheus can't scrape targets: Check network connectivity and firewall rules
- Grafana doesn't show data: Verify data source configuration and test connection
- Alerts not sending: Check SMTP or webhook configuration
View logs for any service:

    docker-compose logs -f [service_name]

Example:

    docker-compose logs -f prometheus
    docker-compose logs -f loki

To update the stack to the latest images:

    docker-compose pull
    docker-compose up -d

Back up configuration and data:

    # Configuration
    tar -czvf config-backup.tar.gz */*.yml */*.conf
    # Data volumes (Compose may prefix volume names with the project name; check with `docker volume ls`)
    docker run --rm -v prometheus_data:/data -v "$(pwd)":/backup alpine tar -czvf /backup/prometheus-data.tar.gz /data
    docker run --rm -v grafana_data:/data -v "$(pwd)":/backup alpine tar -czvf /backup/grafana-data.tar.gz /data
    docker run --rm -v loki_data:/data -v "$(pwd)":/backup alpine tar -czvf /backup/loki-data.tar.gz /data

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Run the tests (if any)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.