This stack monitors servers, endpoints and various other targets using the following open source tools:
- Prometheus: An open source event monitoring and alerting solution. We use it as the time series database. It pulls data from various data sources, such as the Blackbox Exporter.
- Blackbox Exporter: Performs blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP and ICMP. It can be configured to hit any endpoint with several types of request to find out whether the endpoint is down, whether it is responding slowly, what its response time is, and so on. More on: https://github.com/prometheus/blackbox_exporter
- Alertmanager: Delivers the alerts generated by Prometheus to clients such as Slack, PagerDuty and email. It handles the integration with each client's API and sends a notification when something goes wrong and again when it has recovered. More on: https://github.com/prometheus/alertmanager
- Grafana: Visualizes time series metrics. Any metric in Prometheus can be plotted on a graph in a dashboard, which makes it possible to compare metrics over a chosen span of time. More on: https://github.com/grafana/grafana
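As an illustration of how Prometheus pulls probe results from the Blackbox Exporter, a scrape job could look roughly like the sketch below. It follows the relabeling pattern from the Blackbox Exporter README; the job name, the `http_2xx` module, the container path of the targets file and the `blackbox:9115` address are assumptions and may differ from the configuration shipped in this repository.

```yaml
# Sketch of a Prometheus scrape job for the Blackbox Exporter (assumed values).
scrape_configs:
  - job_name: 'blackbox'            # hypothetical job name
    metrics_path: /probe            # the exporter serves probe results here
    params:
      module: [http_2xx]            # assumed probe module: expect an HTTP 2xx response
    file_sd_configs:
      - files:
          - /etc/prometheus/blackbox_targets.yml   # assumed path inside the container
    relabel_configs:
      # Pass the original target as the ?target= query parameter of the probe.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label on the resulting metrics.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to the exporter, not to the target.
      - target_label: __address__
        replacement: blackbox:9115  # assumed service name and default exporter port
```

With this pattern Prometheus never contacts the monitored site directly: it asks the exporter to probe the site and stores the metrics that come back, such as probe_success.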
All of these services are defined in the docker-compose.yml file, so there is no need to install each of them separately. Just run docker-compose up -d; it pulls the images at the specified versions and starts each service on its configured port.
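For orientation, a compose file of this shape might look like the sketch below. The image names are the official ones published for each tool, but the version tags, the service names, the volume mount and every port other than Grafana's 3000 are assumptions and not taken from this repository's docker-compose.yml.

```yaml
# Sketch only: tags, service names, mounts and most ports are assumptions.
version: "3"
services:
  prometheus:
    image: prom/prometheus:v2.45.0          # hypothetical pinned version
    ports:
      - "9090:9090"                         # default Prometheus port
    volumes:
      - ./config:/etc/prometheus            # assumed mount of the config/ directory
  blackbox:
    image: prom/blackbox-exporter:v0.24.0   # hypothetical pinned version
    ports:
      - "9115:9115"                         # default Blackbox Exporter port
  alertmanager:
    image: prom/alertmanager:v0.26.0        # hypothetical pinned version
    ports:
      - "9093:9093"                         # default Alertmanager port
  grafana:
    image: grafana/grafana:10.0.0           # hypothetical pinned version
    env_file: .env                          # Grafana credentials come from .env
    ports:
      - "3000:3000"
```

Pinning each image to an explicit tag keeps the stack reproducible, since docker-compose up -d then pulls the same versions on every machine.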
- config/blackbox_targets-example.yml has the list of URLs or endpoints to monitor. Copy the file to config/blackbox_targets.yml and update the targets.
- The .env file contains the Grafana credentials.
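As a rough idea of what the copied targets file could contain, the sketch below assumes that Prometheus reads it through file-based service discovery (file_sd_configs). The URLs and the label are placeholders, and the actual layout of config/blackbox_targets-example.yml may differ.

```yaml
# Hypothetical contents of config/blackbox_targets.yml (file_sd format assumed).
- targets:
    - https://example.com            # placeholder endpoint
    - https://example.org/health     # placeholder endpoint
  labels:
    env: production                  # optional label attached to every target in this group
```

In this format each list entry is a group of targets that share the same labels, so endpoints can be split into several groups when they need different labels.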
An example of a Prometheus rule from config/prometheus-rules.yml:
    - name: SiteDownName
      rules:
        - alert: SiteDown
          expr: probe_success < 1
          for: 30s
          labels:
            severity: page
            type: http
          annotations:
            identifier: '{{ $labels.job }}'
            description: '{{ $labels.instance }} exporter job has been down for more than 30s'
This rule fires an alert when the probe of any site in blackbox_targets.yml keeps failing, i.e. when probe_success < 1 holds for more than 30 seconds.
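Once the rule fires, Alertmanager decides where the notification goes. The sketch below shows one way a Slack receiver could be configured. The receiver name, the channel and the webhook URL are placeholders, and the repository's own Alertmanager configuration may look different.

```yaml
# Sketch of an Alertmanager configuration with a single Slack receiver (assumed values).
route:
  receiver: slack-notifications          # hypothetical receiver name
  group_by: ['alertname', 'instance']    # one notification per alert and site
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'   # placeholder webhook URL
        channel: '#alerts'                                       # placeholder channel
        send_resolved: true              # also notify when the site has recovered
        text: '{{ .CommonAnnotations.description }}'   # reuses the rule's description annotation
```

Setting send_resolved to true is what produces the recovery message in addition to the initial alert.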
Grafana URL: http://localhost:3000/login
Log in with the user and password set in the .env file.