Prometheus
- 2nd project to graduate from the CNCF (right after Kubernetes). Originally developed at SoundCloud.
- not a logging / event-based tool, not a tracing tool
- it's a metrics / alerting tool (time series)
- pull-based model: scrapes /metrics endpoints; built for dynamic / scalable environments
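To illustrate the pull model, here is a minimal sketch using only the Python standard library (not an official Prometheus client library). A toy application exposes a hypothetical myapp_requests_total counter on /metrics in the text exposition format, and scrape() performs the HTTP GET that the Prometheus server would issue on every scrape interval.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that the app would update as it serves traffic.
REQUESTS_TOTAL = 42

class MetricsHandler(BaseHTTPRequestHandler):
    """Target side: expose the current values as plain text on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = (
            "# TYPE myapp_requests_total counter\n"
            f"myapp_requests_total {REQUESTS_TOTAL}\n"
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

# Server side: a scrape is just an HTTP GET initiated by Prometheus.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxy env vars

def scrape(url, timeout=5.0):
    with _OPENER.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(scrape(f"http://127.0.0.1:{server.server_address[1]}/metrics"), end="")
    server.shutdown()
    server.server_close()
```

The application never pushes anything: it only keeps its current values ready, and the scraper decides when to collect them.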
Architecture
- Prometheus server
- Jobs/Exporters: endpoints to scrape. Exporters are adapters that translate another system's metrics (whatever their format) into the Prometheus format (most popular: node exporter, blackbox exporter)
- Grafana (where Prometheus is used as a datasource)
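An exporter is essentially a translator. The sketch below assumes a hypothetical service that reports its stats as flat JSON and converts the numeric fields into Prometheus text format. The myapp prefix and the rule "names ending in _total are counters, everything else is a gauge" are illustrative assumptions, not the behaviour of any real exporter. A real exporter would also serve this text over HTTP on /metrics.

```python
import json
import re

def sanitize(key):
    # Metric names may only contain [a-zA-Z0-9_:]; replace anything else with "_".
    return re.sub(r"[^a-zA-Z0-9_:]", "_", key)

def json_to_exposition(raw_json, prefix):
    """Translate a flat JSON object of stats into Prometheus text exposition format."""
    stats = json.loads(raw_json)
    lines = []
    for key in sorted(stats):
        value = stats[key]
        # Only numbers can become samples; skip strings, lists, booleans, etc.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = f"{prefix}_{sanitize(key)}"
        # Assumed convention: a "_total" suffix marks a counter, everything else is a gauge.
        kind = "counter" if name.endswith("_total") else "gauge"
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

For example, json_to_exposition('{"active_connections": 7, "requests.total": 1234, "version": "1.2"}', "myapp") yields a gauge and a counter and drops the non-numeric version field.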
Notes
CNCF - How to Export Prometheus Metrics from Just About Anything - creating your own exporters in Go
Scrape configuration
Defined in prometheus.yml (scrape_configs section).
Job: a type of server to scrape. If the server runs as multiple instances, list each instance as a separate target in the job.
Target: host:port (IP or hostname) where the /metrics endpoint can be found
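Prometheus turns each job/target pair into labels on every scraped series: job is the job_name, and instance is the target's host:port. The sketch below models that expansion in Python. The dict only mirrors the shape of the real YAML keys (job_name, static_configs, targets, scheme, metrics_path); the hosts are made up.

```python
# Hypothetical config, mirroring the shape of scrape_configs / static_configs in prometheus.yml.
SCRAPE_CONFIGS = [
    {"job_name": "node",
     "static_configs": [{"targets": ["10.0.0.1:9100", "10.0.0.2:9100"]}]},
    {"job_name": "traefik",
     "static_configs": [{"targets": ["reverse_proxy:8080"]}]},
]

def expand_targets(scrape_configs):
    """List every URL to scrape, with the job/instance labels Prometheus attaches to its series."""
    result = []
    for job in scrape_configs:
        scheme = job.get("scheme", "http")          # Prometheus default: http
        path = job.get("metrics_path", "/metrics")  # Prometheus default: /metrics
        for static in job.get("static_configs", []):
            for target in static["targets"]:
                result.append({
                    "url": f"{scheme}://{target}{path}",
                    "labels": {"job": job["job_name"], "instance": target},
                })
    return result
```

The "node" job has two instances, so it gets two targets and produces two label sets that differ only in instance.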
Metrics format
<metric_name>{<label_name>="<label_value>", ...}
traefik_config_reloads_total{instance="reverse_proxy:8080", job="traefik"}
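To make the format concrete, a simplified parser for a single series line (name, optional labels, optional value and timestamp) could look like the sketch below. It is not a full implementation of the exposition format: it does not unescape label values, assumes no "}" inside them, and ignores the timestamp.

```python
import re

SERIES_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'                # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'                         # optional {label="value", ...}
    r'(?:\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?)?\s*$'    # optional value and timestamp
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_series(line):
    """Return (name, labels dict, value or None) for one series line."""
    m = SERIES_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a valid series line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    value = m.group("value")
    # float() also accepts the special values NaN, +Inf and -Inf.
    return m.group("name"), labels, float(value) if value is not None else None
```

Parsing the traefik line above gives the name traefik_config_reloads_total, the labels instance and job, and no value (the example shows only the series identifier).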
Metrics types
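4 types of metrics: Counter (only goes up; resets to 0 when the process restarts), Gauge (goes up and down), Histogram (observations counted in configurable buckets), Summary (quantiles computed on the client side)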
rate: https://www.innoq.com/en/blog/prometheus-counters/
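The core idea behind rate() on a counter: sum the increases between consecutive samples, treat any drop as a counter reset (the process restarted and counted up again from 0), then divide by the elapsed time. The sketch below shows only that idea. Prometheus's real rate() also extrapolates to the edges of the range window, so its results will differ slightly.

```python
def counter_increase(samples):
    """samples: (timestamp_s, value) pairs of one counter, sorted by timestamp."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur < prev:
            # Counter reset: the process restarted and counted up again from 0.
            total += cur
        else:
            total += cur - prev
    return total

def simple_rate(samples):
    """Per-second average increase over the samples' time span (no edge extrapolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    span = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / span
```

With samples [(0, 10), (15, 25), (30, 5), (45, 20)] the drop from 25 to 5 is read as a reset, so the increase is 15 + 5 + 15 = 35 over 45 s. A naive "last minus first" would give 10 instead.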
https://medium.com/platform-engineering/monitoring-traefik-with-grafana-1d037af5b952
https://docs.signalfx.com/en/latest/integrations/agent/monitors/traefik.html
https://prometheus.io/docs/practices/histograms/
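Histograms are exposed as cumulative buckets. The le label is each bucket's upper bound, and the last bucket is +Inf. histogram_quantile() estimates a quantile by linear interpolation inside the bucket that contains the target rank. The function below is a simplified sketch of that estimate and assumes positive bucket bounds. Like Prometheus, it takes 0 as the lower bound of the first bucket and returns the highest finite bound when the rank lands in the +Inf bucket.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative buckets [(upper_bound, count), ...].

    Buckets must be sorted by upper bound and end with +Inf; bounds assumed positive.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must be +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    lower, count_below = 0.0, 0  # the first bucket is assumed to start at 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: return the highest finite bound instead.
                return lower
            if count == count_below:
                return upper  # empty bucket, nothing to interpolate
            # Linear interpolation: assume observations are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - count_below) / (count - count_below)
        lower, count_below = upper, count
    return lower  # unreachable for well-formed cumulative buckets
```

The estimate can only be as precise as the bucket layout allows, which is why bucket boundaries should be chosen around the latencies you care about.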
Operating
Prometheus weak points
No native clustering: for high availability, "just run 2 instances!" (two identical servers scraping the same targets)
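Local storage is not clustered or replicated, so it is not durable if the disk or node is lost; it is meant to hold only a few weeks/months of data (default retention: 15 days)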
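For sizing local storage, the Prometheus storage documentation gives a rough rule of thumb: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, at about 1 to 2 bytes per sample. Here is a quick calculator. The workload of 100k active series scraped every 15 s is a hypothetical example.

```python
SECONDS_PER_DAY = 86_400

def samples_per_second(active_series, scrape_interval_s):
    # Each active series yields one sample per scrape.
    return active_series / scrape_interval_s

def needed_disk_bytes(retention_days, ingested_samples_per_s, bytes_per_sample=2.0):
    # Rule of thumb from the Prometheus storage docs (1-2 bytes/sample); 2 is the pessimistic end.
    return retention_days * SECONDS_PER_DAY * ingested_samples_per_s * bytes_per_sample

if __name__ == "__main__":
    rate = samples_per_second(100_000, 15)  # hypothetical: 100k series, 15 s interval
    print(f"{needed_disk_bytes(15, rate) / 1e9:.1f} GB for 15 days")  # 15 days = default retention
```

For that workload this comes to roughly 17 GB. Retention therefore scales linearly with disk, which is why keeping months or years of data usually means sending it to external long-term storage.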
Service discovery
Choose your strategy: Kubernetes API (kubernetes_sd_configs), DNS A records (dns_sd_configs)...
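With DNS A-record discovery, Prometheus periodically resolves a name and scrapes every returned address on a configured port. Below is a rough equivalent built on the standard library resolver. It is only an approximation of dns_sd_configs, because getaddrinfo also consults /etc/hosts and is therefore not pure DNS.

```python
import socket

def dns_a_targets(name, port):
    """Resolve name to its IPv4 addresses and return sorted, de-duplicated host:port targets."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port) for IPv4.
    return sorted({f"{sockaddr[0]}:{port}" for _, _, _, _, sockaddr in infos})
```

For example, dns_a_targets("node-exporter.example.internal", 9100) (a hypothetical name) would return one target per A record. When replicas are added or removed, the target list follows automatically, which is the point of service discovery in dynamic environments.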