DevOps-Kathmandu/monitoring

Monitoring With Prometheus Stack

This stack monitors servers, endpoints, and other targets using open source tools:

  1. Prometheus: An open source monitoring and alerting system. We use it as the time series database. It pulls (scrapes) metrics from data sources such as the Blackbox Exporter.

  2. Blackbox Exporter: Probes endpoints from the outside over HTTP, HTTPS, DNS, TCP, and ICMP. It can be configured to hit any endpoint with several types of request to find out whether the endpoint is down or responding slowly, and to record its response time. More on: https://github.com/prometheus/blackbox_exporter

  3. Alertmanager: Alerts generated by Prometheus need to reach clients such as Slack, PagerDuty, or email. Alertmanager handles the integration with those clients' APIs and sends a notification when something goes wrong and again when it has recovered. More on: https://github.com/prometheus/alertmanager

  4. Grafana: Visualizes metrics as time series. Any metric in Prometheus can be plotted on a dashboard graph, which makes it possible to compare metrics over a chosen span of time. More on: https://github.com/grafana/grafana
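To illustrate how Prometheus pulls data through the Blackbox Exporter, here is a minimal sketch of a scrape job. The job name, the http_2xx module, the file path, and the blackbox-exporter:9115 address are assumptions for illustration and may not match this repository's actual Prometheus configuration.

```yaml
# Sketch of a Prometheus scrape job that probes targets via the Blackbox Exporter.
# All names, paths, and addresses below are assumptions.
scrape_configs:
  - job_name: blackbox
    metrics_path: /probe            # the exporter's probe endpoint
    params:
      module: [http_2xx]            # assumed module name from the exporter's config
    file_sd_configs:
      - files:
          - /etc/prometheus/blackbox_targets.yml   # assumed mount path
    relabel_configs:
      # Pass the listed target as the ?target= query parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter itself.
      - target_label: __address__
        replacement: blackbox-exporter:9115        # assumed service name and port
```

The relabeling is what lets one exporter probe many endpoints: Prometheus scrapes the exporter, and the exporter probes the URL given in the target parameter.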

Getting the stack running with Docker

All of these services are defined in the docker-compose.yml file, so there is no need to install each one separately. Run docker-compose up -d; it pulls the images at the versions specified and starts each service on its configured port.
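For orientation, a compose file for a stack like this might look roughly like the sketch below. It is not a copy of this repository's docker-compose.yml: the service names, port mappings, volume mount, and unpinned image tags are assumptions, and the file in the repository is authoritative.

```yaml
# Hypothetical sketch only; see the repository's docker-compose.yml for the real definitions.
services:
  prometheus:
    image: prom/prometheus              # pin a specific tag in practice
    ports:
      - "9090:9090"
    volumes:
      - ./config:/etc/prometheus        # assumes config/ also holds prometheus.yml
  blackbox-exporter:
    image: prom/blackbox-exporter
    ports:
      - "9115:9115"
  alertmanager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    env_file:
      - .env                            # assumed to supply the Grafana credentials
```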

Adding Variables

  • config/blackbox_targets-example.yml lists the URLs or endpoints to monitor. Copy the file to config/blackbox_targets.yml and update the targets.

  • The .env file contains the Grafana credentials.
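As a rough guide, a targets file read through Prometheus file-based service discovery could look like the sketch below. The URLs and the label are placeholders, and the assumption that the file uses the file_sd format should be checked against config/blackbox_targets-example.yml.

```yaml
# Hypothetical contents for config/blackbox_targets.yml (file_sd format assumed).
- targets:
    - https://example.com
    - https://example.org/health
  labels:
    type: http        # placeholder label attached to every target in this group
```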

Prometheus Rules

An example Prometheus rule from config/prometheus-rules.yml:

- name: SiteDownName
  rules:
  - alert: SiteDown
    expr: probe_success < 1
    for: 30s
    labels:
      severity: page
      type: http
    annotations:
      identifier: '{{ $labels.job }}'
      description: '{{ $labels.instance }} exporter job has been down for more than 30s'

This rule fires an alert when the probe for any site in blackbox_targets.yml fails, i.e. when probe_success < 1 holds continuously for 30 seconds.
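Once the rule fires, Alertmanager decides where the alert goes. The following is a minimal sketch of routing alerts labeled severity: page to Slack. The receiver names, the channel, and the webhook URL are placeholders, not values from this repository.

```yaml
# Sketch of an Alertmanager config; receiver names, channel, and URL are placeholders.
route:
  receiver: default                     # fallback for alerts that match no sub-route
  routes:
    - matchers:
        - severity="page"               # matches the label set by the SiteDown rule
      receiver: slack-oncall
receivers:
  - name: default                       # no integrations: alerts are silently dropped
  - name: slack-oncall
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME
        channel: "#alerts"
        send_resolved: true             # also notify when the site recovers
```

Setting send_resolved to true is what produces the recovery notification described for Alertmanager above.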

Accessing Current Grafana Dashboard

URL: http://localhost:3000/login. Log in with the username and password set in the .env file.
