nrpe default thresholds are too lax
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph Monitor Charm | Fix Committed | Low | Unassigned |
Bug Description
The current default tuning of the check_ceph_status nrpe check is quite lax, which makes it hard for that check to go critical.
However, I think charms should in general default to deploying rather noisy checks and let operators tune them down for their environment, rather than the opposite. This is for two reasons:
1. be secure by default / opt _into_ insecurity
2. as an operator, you may never learn a check exists if you don't get alerted by it
Specifically, I would propose the following changes in defaults:
nagios_
nagios_
nagios_
I think nagios_
Picking one-size-fits-all thresholds for these checks is borderline impossible because use cases vary so widely. Having said that, I agree that these defaults
nagios_degraded_thresh: from 1 to 0.1
misplaced_thresh: from 1 to 0.1
nagios_
are far too lax for almost any use case; one might as well have no monitoring at all.
That said, a case could be made for a lax recovery rate check, as many clouds would prefer to prioritize client traffic over recovery.
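
To illustrate why a degraded threshold of 1 makes the check nearly impossible to trip, here is a minimal sketch. It assumes the threshold is a ratio between 0 and 1 and that the check goes critical only when the observed ratio strictly exceeds it. The function name `degraded_check_state` is hypothetical and is not taken from the charm's actual check_ceph_status code.

```python
def degraded_check_state(degraded_ratio: float, thresh: float) -> str:
    """Hypothetical helper: report CRITICAL when the ratio exceeds thresh.

    Assumption: both values are fractions in the range 0..1 and the
    comparison is a strict "greater than".
    """
    return "CRITICAL" if degraded_ratio > thresh else "OK"


# Example: a cluster where a quarter of the objects are degraded.
ratio = 0.25

# With the current default of 1, the ratio would have to exceed 100%.
print(degraded_check_state(ratio, thresh=1))

# With the proposed default of 0.1, the same cluster raises an alert.
print(degraded_check_state(ratio, thresh=0.1))
```

Under these assumptions the first call reports OK and the second reports CRITICAL, which is the gap the proposed change from 1 to 0.1 is meant to close.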