Why Prometheus + Grafana over other monitoring options?

silmarine@discuss.tchncs.de · 4 months ago

Why Prometheus + Grafana over other monitoring options?

Max-P@lemmy.max-p.me · 4 months ago

Separate components that each do one thing, and do it well, are good. Extra containers are basically free.

The exporters provide the metrics. They can be standalone executables like the node exporter, or they can easily be built into the apps themselves since it’s just HTTP. It’s trivial to add metrics to just about anything without needing extra ports. The protocol is also easier and more efficient than SNMP.
Prometheus scrapes those metrics and stores them in its database. In other apps that’d be the role something like PostgreSQL has: you don’t really use it directly, but it’s no less important.
Grafana is the frontend you slap in front of Prometheus to actually display your metrics.
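Alertmanager sends the alerts. Prometheus evaluates the alerting rules against the metrics and hands the firing alerts to Alertmanager, which takes care of grouping, routing and notifications. It’s separate because if your Prometheus box goes down, how are you gonna be alerted of that?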
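
To show what "it’s just HTTP" means for an exporter, here is a minimal sketch using only the Python standard library. It serves two made-up metrics in the Prometheus text exposition format on `/metrics`. The metric names (`demo_uptime_seconds`, `demo_scrapes_total`), the port 8000 and the helper names are all assumptions for illustration; a real app would more likely use an official client library than hand-roll this.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()
SCRAPES = {"count": 0}  # hypothetical counter, bumped on every scrape


def render_metrics(metrics):
    """Render {name: (help, type, value)} as Prometheus text exposition format."""
    lines = []
    for name, (help_text, metric_type, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Gather the current values; both metrics are invented for this sketch."""
    return {
        "demo_uptime_seconds": (
            "Seconds since the demo app started.",
            "gauge",
            round(time.time() - START_TIME, 3),
        ),
        "demo_scrapes_total": (
            "Times /metrics has been served.",
            "counter",
            SCRAPES["count"],
        ),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        SCRAPES["count"] += 1
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given port (assumed free)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scrape job at `host:8000` should be enough for Prometheus to pick these up. In an app that already has an HTTP server, the same `/metrics` route can live on the existing port, which is why no extra ports are needed.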

All 4 of those can be swapped for something equivalent and it all still works. Don’t like the UI? Replace Grafana. Don’t like Prometheus? There’s VictoriaMetrics and InfluxDB.
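
The reason swapping works is that the pieces only talk to each other over small HTTP APIs. As a rough sketch, this is how a frontend could run an instant query against the Prometheus HTTP API (`/api/v1/query`). Swapping the backend then just means changing the base URL, assuming the replacement speaks the same Prometheus-compatible API (VictoriaMetrics does; other backends may need their own data source). The function names and URLs here are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against a Prometheus-compatible API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Turn an instant-query JSON response into {instance: float_value}."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels}, "value": [timestamp, "1"]}.
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in data["result"]
    }


def query(base_url, promql, timeout=5):
    """Run the query over HTTP; base_url is the only backend-specific part."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `query("http://localhost:9090", "up")` would ask a local Prometheus which targets are up, and the same call with a different base URL would ask whatever compatible backend you replaced it with. Both addresses are assumptions about your setup.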

It looks silly on a small scale, but it scales up very well: a couple hundred VMs per Prometheus install, node exporters on every VM, and a single Grafana cluster to visualize the data for the whole infrastructure at once.
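
At that scale nobody lists targets by hand. One common approach, sketched here under assumptions, is to generate a target file for Prometheus's file-based service discovery: a JSON list of groups, each with `targets` and optional `labels`. The hostnames, the `env` label and the helper name are invented for the example; 9100 is the node exporter's default port.

```python
import json


def node_targets(hostnames, port=9100, labels=None):
    """Build one file_sd target group covering every host's node exporter."""
    group = {"targets": [f"{host}:{port}" for host in hostnames]}
    if labels:
        # Label values must be strings in the file_sd format.
        group["labels"] = {key: str(value) for key, value in labels.items()}
    return [group]


# Hypothetical inventory: 200 VMs named vm001 .. vm200.
hosts = [f"vm{i:03d}.example.internal" for i in range(1, 201)]
groups = node_targets(hosts, labels={"env": "prod"})

# This text is what would be written to the file Prometheus watches.
file_sd_json = json.dumps(groups, indent=2)
```

Written to a file referenced from a `file_sd_configs` entry in the scrape config, this lets a single Prometheus pick up all 200 node exporters, and regenerating the file is all it takes to add or remove VMs.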

That makes it all well liked in enterprise, which means there are exporters for damn near anything (even the Lemmy server has a built-in exporter I can scrape with Prometheus), which in turn makes it the easy solution for self-hosters too, and here we are.

I feel like it’s easier to set up than some of the all-in-one solutions I’ve used previously, despite being several components. They’re all components that basically just work out of the box.