Feature request: network throughput alerting
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
NRPE Charm | Won't Fix | Wishlist | Unassigned |
Bug Description
It's been suggested that we may want to add alerts for detecting excessive throughput on interfaces, as this could be a symptom of a problem which requires attention.
Concrete examples I can think of:
* Interface has a max throughput of 2 Gbps, and an immediate throughput measurement spikes to 95% of that value.
* Interface has a max throughput of 2 Gbps, and sustained average throughput for the last 5 minutes is 90% of that value.
I think the latter of the two is more valuable and less likely to be an unnecessarily noisy alert, but maybe both have value? Or maybe there are other ideas as well?
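To make the two examples concrete, here is a rough sketch of how both measurements could be derived from the same series of byte-counter samples. The `ThroughputWindow` class and `utilization` helper are hypothetical names for illustration, not anything that exists in charm-nrpe. It assumes timestamps strictly increase and ignores counter resets.

```python
from collections import deque


def utilization(bytes_delta, seconds, max_bps):
    """Return the fraction of link capacity used over one interval."""
    return (bytes_delta * 8) / (seconds * max_bps)


class ThroughputWindow:
    """Keeps (timestamp, byte_counter) samples for one interface.

    Hypothetical helper: assumes strictly increasing timestamps and a
    byte counter that never resets or wraps.
    """

    def __init__(self, max_bps, window_seconds=300):
        self.max_bps = max_bps
        self.window_seconds = window_seconds
        self.samples = deque()

    def add(self, timestamp, byte_counter):
        self.samples.append((timestamp, byte_counter))
        cutoff = timestamp - self.window_seconds
        # Drop the oldest sample while the next one still covers the window.
        while len(self.samples) > 2 and self.samples[1][0] <= cutoff:
            self.samples.popleft()

    def instantaneous(self):
        """Utilization between the two most recent samples (spike check)."""
        if len(self.samples) < 2:
            return None
        (t0, b0), (t1, b1) = self.samples[-2], self.samples[-1]
        return utilization(b1 - b0, t1 - t0, self.max_bps)

    def sustained(self):
        """Average utilization over the window, or None without enough history."""
        if len(self.samples) < 2:
            return None
        (t0, b0), (t1, b1) = self.samples[0], self.samples[-1]
        if t1 - t0 < self.window_seconds:
            return None
        return utilization(b1 - b0, t1 - t0, self.max_bps)
```

With a 2 Gbps interface (`max_bps=2_000_000_000`), the first example corresponds to `instantaneous() >= 0.95` and the second to `sustained() >= 0.90`. Returning `None` until a full window of history exists is one way to keep the sustained check from firing on a short burst right after startup.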
I'm suggesting this as a general check for charm-nrpe that may examine metrics directly available on the host, although I'd be open to this being implemented as a check which queries Prometheus. However, this is intended as a check which would work with the current nagios/nrpe charms, regardless of the exact implementation.
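For the host-side variant, a minimal sketch of what the check could read, assuming a Linux host where the kernel exposes link speed (in Mbit/s) and byte counters under `/sys/class/net/<iface>/`. The function names and the default thresholds are illustrative assumptions; only the sysfs paths and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are taken as given.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def read_int(path):
    """Read a single integer from a sysfs-style file."""
    with open(path) as f:
        return int(f.read().strip())


def read_counters(iface, sysfs="/sys/class/net"):
    """Return (max_bps, rx_bytes, tx_bytes) for an interface.

    Raises ValueError when the kernel reports no usable speed, which can
    happen for virtual interfaces or links that are down.
    """
    base = os.path.join(sysfs, iface)
    speed_mbps = read_int(os.path.join(base, "speed"))
    if speed_mbps <= 0:
        raise ValueError(f"{iface}: link speed not reported")
    rx = read_int(os.path.join(base, "statistics", "rx_bytes"))
    tx = read_int(os.path.join(base, "statistics", "tx_bytes"))
    return speed_mbps * 1_000_000, rx, tx


def nagios_status(iface, util, warn=0.90, crit=0.95):
    """Map a utilization fraction to (exit_code, message).

    The warn/crit defaults are placeholders, not agreed thresholds.
    """
    if util is None:
        return UNKNOWN, f"UNKNOWN: {iface} has no throughput data yet"
    if util >= crit:
        return CRITICAL, f"CRITICAL: {iface} at {util:.1%} of link capacity"
    if util >= warn:
        return WARNING, f"WARNING: {iface} at {util:.1%} of link capacity"
    return OK, f"OK: {iface} at {util:.1%} of link capacity"
```

A real check would need to persist the previous counter sample between NRPE invocations (for example in a small state file) in order to compute a rate, and would have to decide how to treat interfaces whose `speed` file cannot be read at all.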
I'm marking this as Won't Fix for nrpe, but I'll forward it to the observability team as a good addition to the (future) grafana agent machine charm.