Incorrect parent values when using cross-model relations
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Nagios Charm | Fix Released | Undecided | Unassigned |
Bug Description
Nagios sets a "parents" value in host definitions so that when a host dies and alerts, the containers running on that host do not also alert. This functionality works fine when charm-nagios and charm-nrpe are related within the same model, but this value is never set properly when a cross-model relation is used.
When these two charms are related, the following takes place:
1. Each nrpe unit provides its machine id and hostname to nagios
2. Nagios extracts the parent's machine id from the unit's machine id, if the unit runs in an lxd container
3. If that parent machine id exists in a dictionary of hosts, then the parent relationship is set up
This process can break down when nagios is working with multiple models that have overlapping machine ids (which is very likely, since each model numbers its own machines). As an example, it's possible for machine "3/lxd/0" in model "openstack" to be given machine 3 in the "lma" model as a parent, when it should be machine 3 in the "openstack" model.
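The collision described above can be reproduced with a minimal sketch. This is not the charm's actual code: the function names, the unit dictionaries, and the hostnames are hypothetical, and the second variant assumes nagios can tell which model each unit belongs to.

```python
import re

# Matches container machine ids such as "3/lxd/0" and captures the host id "3".
LXD_ID = re.compile(r"^(?P<parent>\d+)/lxd/\d+$")


def parent_machine_id(machine_id):
    """Return the parent machine id for an lxd machine id, else None."""
    match = LXD_ID.match(machine_id)
    return match.group("parent") if match else None


def assign_parents_naive(units):
    """Key hosts by machine id only, so models with the same ids collide."""
    hosts = {}
    for unit in units:
        if parent_machine_id(unit["machine_id"]) is None:
            # A later model silently overwrites an earlier one here.
            hosts[unit["machine_id"]] = unit["hostname"]
    parents = {}
    for unit in units:
        pid = parent_machine_id(unit["machine_id"])
        if pid is not None and pid in hosts:
            parents[unit["hostname"]] = hosts[pid]
    return parents


def assign_parents_scoped(units):
    """Key hosts by (model, machine id); assumes the model name is known."""
    hosts = {}
    for unit in units:
        if parent_machine_id(unit["machine_id"]) is None:
            hosts[(unit["model"], unit["machine_id"])] = unit["hostname"]
    parents = {}
    for unit in units:
        pid = parent_machine_id(unit["machine_id"])
        key = (unit["model"], pid)
        if pid is not None and key in hosts:
            parents[unit["hostname"]] = hosts[key]
    return parents


# Hypothetical units from two models that both have a machine 3.
units = [
    {"model": "openstack", "machine_id": "3", "hostname": "openstack-compute-3"},
    {"model": "openstack", "machine_id": "3/lxd/0", "hostname": "openstack-keystone-0"},
    {"model": "lma", "machine_id": "3", "hostname": "lma-nagios-3"},
]

naive = assign_parents_naive(units)
scoped = assign_parents_scoped(units)
```

With these inputs the naive lookup assigns the "lma" machine 3 as the parent of the "openstack" container, while the model-scoped lookup picks the host from the container's own model.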
Related branches
- Paul Goins: Needs Fixing
- 🤖 prod-jenkaas-bootstack (community): Approve (continuous-integration)
- Zachary Zehring (community): Needs Fixing
- Drew Freiberger (community): Needs Fixing
- James Troup: Pending requested
- Diff: 153 lines (+51/-16), 1 file modified: hooks/monitors_relation_changed.py (+51/-16)
- Drew Freiberger (community): Approve
- James Troup (community): Approve
- 🤖 prod-jenkaas-bootstack (community): Approve (continuous-integration)
- Diff: 11 lines (+1/-0), 1 file modified: hooks/nrpe_helpers.py (+1/-0)
- 🤖 prod-jenkaas-bootstack (community): Approve (continuous-integration)
- James Troup (community): Needs Fixing
- Zachary Zehring (community): Needs Fixing
- BootStack Reviewers: Pending requested
- Diff: 73 lines (+26/-4), 1 file modified: hooks/monitors_relation_changed.py (+26/-4)
summary: "Parents are missing from host definitions" → "Incorrect parent values when using cross-model relations"
description: updated
Changed in charm-nagios:
status: New → Fix Released
milestone: none → 21.10
I think one way to avoid this problem is to use a different nagios_host_context on each model, and have the model name be part of that context.
I've seen this done concretely for a cloud where keystone is deployed to both openstack and kubernetes models. Hostnames come up as e.g. <customer>-<cloud>-openstack-keystone-1 and <customer>-<cloud>-k8s-keystone-1, with no collision.
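A rough sketch of why a per-model context avoids the collision, assuming the nagios host name is built as the context followed by the unit name with "/" replaced by "-" (that format is inferred from the hostnames above). The helper `nagios_hostname` and the "acme" and "cloud1" values are hypothetical stand-ins for <customer> and <cloud>.

```python
def nagios_hostname(host_context, unit_name):
    """Build a host name as '<context>-<application>-<unit number>' (assumed format)."""
    return "{}-{}".format(host_context, unit_name.replace("/", "-"))


# One context per model, with the model name as part of the context.
openstack_name = nagios_hostname("acme-cloud1-openstack", "keystone/1")
k8s_name = nagios_hostname("acme-cloud1-k8s", "keystone/1")

# With a single shared context, both models would produce this same name.
shared_name = nagios_hostname("acme-cloud1", "keystone/1")
```

Here `openstack_name` and `k8s_name` differ, whereas keystone/1 in either model maps to the same `shared_name` when the context does not include the model.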