Prometheus

  • Second project to graduate from the CNCF (right after Kubernetes). Originally developed at SoundCloud.
  • not a logging / event-based tool, not a tracing tool
  • it's a metrics / alerting tool (time series)
  • pull-based model: scrapes /metrics endpoints
  • created for dynamic / scalable environments
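To make the pull model concrete, here is a minimal sketch of an application exposing a /metrics endpoint, using only the Python standard library. The metric name demo_requests_total, the port and the helper names are assumptions made up for this illustration; a real service would more likely rely on an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter, incremented by the application code.
REQUESTS_TOTAL = {"value": 0}


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Requests handled by this demo app.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {REQUESTS_TOTAL['value']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics; every other path returns 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, waiting for Prometheus to pull from /metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. The application never pushes anything: it only answers when the Prometheus server comes to scrape it.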

Architecture

  • Prometheus server
  • Jobs/Exporters: endpoints to scrape. Exporters are adapters that translate a third-party system's native metrics into the Prometheus format (most popular: node exporter, blackbox exporter)
  • Grafana (where Prometheus is used as a datasource)

Scrape configuration

= prometheus.yml

Job: a type of server to scrape. If the server runs as multiple instances, list each instance as a separate target within the same job.

Target: the IP address or URL where the /metrics endpoint can be found
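The job/target relationship can be sketched as follows. The dict mirrors the YAML structure of prometheus.yml (scrape_configs, job_name, metrics_path, static_configs, targets); the job name and the two addresses are hypothetical, and expand_targets is an illustrative helper, not part of Prometheus.

```python
# Same shape as prometheus.yml, written as a Python dict.
# Job name and target addresses are assumptions for this example.
config = {
    "scrape_configs": [
        {
            "job_name": "traefik",
            "metrics_path": "/metrics",
            "static_configs": [
                {"targets": ["reverse_proxy_1:8080", "reverse_proxy_2:8080"]}
            ],
        }
    ]
}


def expand_targets(cfg: dict) -> list[dict]:
    """Return one entry per target, with the job/instance labels attached."""
    scrapes = []
    for job in cfg["scrape_configs"]:
        path = job.get("metrics_path", "/metrics")
        for static in job.get("static_configs", []):
            for target in static["targets"]:
                scrapes.append(
                    {
                        "url": f"http://{target}{path}",
                        "labels": {"job": job["job_name"], "instance": target},
                    }
                )
    return scrapes


for scrape in expand_targets(config):
    print(scrape["url"], scrape["labels"])
```

One job with two targets yields two scraped endpoints. Each resulting series carries the same job label but a different instance label.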

Metrics format

// <metric_name>{<label_name>="<label_value>", ...}
traefik_config_reloads_total{instance="reverse_proxy:8080", job="traefik"}
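As a rough illustration of this format, the sketch below splits a series identifier into its metric name and its labels. The regular expressions follow the classic naming rules (letters, digits, underscores, plus colons in metric names). parse_series is a hypothetical helper, and it deliberately does not unescape label values.

```python
import re

# <metric_name> optionally followed by {<labels>}
SERIES_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?$'
)
# One label_name="label_value" pair; the value may contain escaped characters.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_series(text: str) -> tuple[str, dict[str, str]]:
    """Split 'name{k="v", ...}' into (name, {k: v, ...})."""
    match = SERIES_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid series identifier: {text!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels


name, labels = parse_series(
    'traefik_config_reloads_total{instance="reverse_proxy:8080", job="traefik"}'
)
print(name)    # traefik_config_reloads_total
print(labels)  # {'instance': 'reverse_proxy:8080', 'job': 'traefik'}
```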

Metrics types

OpenTelemetry

4 types of metrics: Counter, Gauge, Histogram, Summary

rate: https://www.innoq.com/en/blog/prometheus-counters/
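The idea behind rate on a counter can be sketched in a few lines: add up the increases between consecutive samples, treat any drop as a counter reset, and divide by the elapsed time. This is a simplified approximation written for these notes, not the PromQL implementation (the real rate() also extrapolates to the window boundaries). simple_rate and the sample values are assumptions.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over the sampled window.

    samples: (timestamp_seconds, counter_value) pairs, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. process restart):
        # count from zero instead of producing a negative delta.
        increase += (curr - prev) if curr >= prev else curr
    return increase / elapsed


# Scrapes every 15 s, with a restart between the 2nd and 3rd sample.
samples = [(0, 100), (15, 130), (30, 10), (45, 40)]
print(simple_rate(samples))  # (30 + 10 + 30) / 45 seconds
```

This is why counters are almost always queried through rate rather than read raw: the absolute value depends on when the process last restarted, while the per-second increase does not.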

  • https://medium.com/platform-engineering/monitoring-traefik-with-grafana-1d037af5b952
  • https://docs.signalfx.com/en/latest/integrations/agent/monitors/traefik.html
  • https://prometheus.io/docs/practices/histograms/

Operating

Prometheus weak points

No clustering: the usual answer for high availability is "just run 2 instances!"

Local storage: not durable, only meant to hold a few weeks/months of data

Service discovery

Choose your strategy: Kubernetes, DNS A records...
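To illustrate the DNS A record strategy: a single name resolves to several addresses, and each address becomes one scrape target. Prometheus handles this natively through dns_sd_configs; the sketch below only shows the principle. targets_from_a_records, the hostname and the port are hypothetical, and the resolver is injectable so the function can be exercised without a real DNS zone.

```python
import socket
from typing import Callable


def targets_from_a_records(
    hostname: str,
    port: int,
    resolve: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> list[str]:
    """Resolve hostname and return one 'ip:port' target per A record."""
    _name, _aliases, addresses = resolve(hostname)
    return sorted(f"{ip}:{port}" for ip in addresses)


def fake_resolver(hostname: str) -> tuple:
    """Stand-in for DNS: pretend the name has two A records."""
    return (hostname, [], ["10.0.0.2", "10.0.0.1"])


print(targets_from_a_records("traefik.internal", 8080, resolve=fake_resolver))
# ['10.0.0.1:8080', '10.0.0.2:8080']
```

Re-running the lookup periodically is what lets the target list follow instances as they appear and disappear.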