Nagios not reloading due to config errors after dedupe patch
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Nagios Charm | Won't Fix | Undecided | Unassigned |
Bug Description
I've found that, in one particular environment, the new nagios charm's dedupe logic isn't quite working as intended.
Environment before the upgrade, and the upgrade steps taken:
* Duplicates were observed across ceph-osd and nova-compute-kvm. Both apps are deployed to the same bare-metal machines, and each app has its own nrpe subordinate.
* Upgraded nagios first to cs:nagios-46 and waited for it to settle, then upgraded the nrpe apps to cs:nrpe-75. I'm not sure of the original revisions, but both predated the addition of the dedupe logic.
Expected behavior: multiple entries for the same host, each with a unique prefix.
Observed behavior:
* Some hosts kept their old entry, and additional entries with the prefix were created alongside them.
* Some records list a parent by its prefixed ID, but no record for that parent appears to exist on disk, so nagios refuses to reload. (As a side effect, hooks take a while to rerun because they wait for a nagios reload.)
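The two symptoms above can be checked for in the generated config with a rough script. The following is a minimal sketch, not part of the charm: it assumes the host definitions use plain `define host { ... }` blocks with `host_name`, `address` and `parents` directives, does not resolve templates, and uses invented host names, addresses and prefix format. Nagios' own `nagios -v <main config file>` pre-flight check remains the authoritative validator; this only illustrates the two failure modes (several host entries for one machine, and a `parents` entry naming a host that is never defined).

```python
import re
from collections import defaultdict

# Matches the body of each "define host { ... }" block. "define hostgroup"
# and similar object types are not matched because "{" must follow "host".
HOST_BLOCK = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)


def parse_hosts(config_text):
    """Return one {directive: value} dict per host definition.

    Simplified: templates ("use") are not resolved; ';' inline comments
    and '#' full-line comments are stripped.
    """
    hosts = []
    for body in HOST_BLOCK.findall(config_text):
        directives = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        hosts.append(directives)
    return hosts


def find_dangling_parents(hosts):
    """List (host, parent) pairs whose parent has no host definition."""
    defined = {h["host_name"] for h in hosts if "host_name" in h}
    dangling = []
    for h in hosts:
        for parent in h.get("parents", "").split(","):
            parent = parent.strip()
            if parent and parent not in defined:
                dangling.append((h.get("host_name"), parent))
    return dangling


def find_shared_addresses(hosts):
    """Map each address used by more than one host entry to those host names."""
    by_address = defaultdict(list)
    for h in hosts:
        if "address" in h and "host_name" in h:
            by_address[h["address"]].append(h["host_name"])
    return {a: names for a, names in by_address.items() if len(names) > 1}


# Hypothetical config fragment: host names, the prefix format and the
# addresses are invented for illustration only.
SAMPLE = """
define host {
    host_name   juju-host-1
    address     10.0.0.5
}
define host {
    host_name   ceph-osd-juju-host-1
    address     10.0.0.5
}
define host {
    host_name   juju-host-2
    address     10.0.0.6
    parents     nova-compute-kvm-juju-host-1   ; prefixed parent, never defined
}
"""

if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE)
    print(find_shared_addresses(hosts))
    print(find_dangling_parents(hosts))
```

In the sample, the old and prefixed entries for the same machine share an address, and the third host's parent points at a prefixed name with no definition, which is the kind of reference that makes Nagios refuse to load the config.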
Workaround: wait for all the hooks to run (even if they take a while), then run the rewrite-peer-config action to rewrite the config cleanly.
Note: after running rewrite-peer-config (unfortunately without checking the nagios status beforehand), the duplicate records disappeared; instead, I now have merged host records with no duplicate prefixes. So whatever happened was likely a side effect of an intermediate state. This is definitely a bug, but rewrite-peer-config appears to have got things into a good state in the end.