Add nrpe check for related ovn-chassis units not present in 'ovn-sbctl show'

Bug #1929838 reported by Drew Freiberger
This bug affects 1 person
Affects: charm-ovn-central
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

We have seen communication problems between the OVN southbound database (sb_db) and ovn-chassis units on a number of occasions. The most visible symptom is dead ("XXX") entries in 'openstack network agent list'. When this happens, some of the ovn-controllers (the ovn-chassis service) are sometimes also missing from the southbound database.

To make the state of southbound communication with ovn-chassis units visible, it would be useful to have a check that compares the FQDNs of the related ovn-chassis units against the Chassis entries listed in the southbound database by 'ovn-sbctl show'.

For instance, you might have three related units on the ovsdb relation with the following private-addresses:
10.0.0.1,ovn-chassis/0
10.0.0.2,ovn-chassis/1
10.0.0.3,ovn-chassis/2

In the output of 'ovn-sbctl show | grep ip:', you may see only:
ip: 10.0.0.1
ip: 10.0.0.2

It would then be helpful for operational visibility to see an alert such as:
"Critical: ovn-chassis/2 with ip: 10.0.0.3 not found in the southbound database"
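The comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the charm's implementation: it assumes the check receives a hypothetical mapping of related unit names to addresses plus the raw text of 'ovn-sbctl show', and that encap IPs appear on lines of the form 'ip: "10.0.0.1"' (quoted or unquoted). The function names and the sample output are invented for illustration. As the follow-up comment notes, the encap IP may not match the private-address, so the comparison key may need to change. The result is a Nagios-style (exit code, message) pair, where 0 is OK and 2 is CRITICAL.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical parser: matches encap lines such as 'ip: "10.0.0.1"' or 'ip: 10.0.0.1'.
_IP_LINE = re.compile(r'^\s*ip:\s*"?([^"\s]+)"?\s*$')

NAGIOS_OK, NAGIOS_CRITICAL = 0, 2


def encap_ips(sbctl_show: str) -> Set[str]:
    """Return every encap IP found in 'ovn-sbctl show' output."""
    ips = set()
    for line in sbctl_show.splitlines():
        match = _IP_LINE.match(line)
        if match:
            ips.add(match.group(1))
    return ips


def check_chassis_ips(expected: Dict[str, str], sbctl_show: str) -> Tuple[int, str]:
    """Compare {unit name: IP} against the southbound database contents."""
    found = encap_ips(sbctl_show)
    missing = [
        f"{unit} with ip: {ip} not found in the southbound database"
        for unit, ip in sorted(expected.items())
        if ip not in found
    ]
    if missing:
        return NAGIOS_CRITICAL, "CRITICAL: " + "; ".join(missing)
    return NAGIOS_OK, f"OK: all {len(expected)} related units found in the southbound database"


# Illustrative 'ovn-sbctl show' output in which ovn-chassis/2 is absent.
SAMPLE = '''Chassis "chassis-0"
    hostname: host-0
    Encap geneve
        ip: "10.0.0.1"
        options: {csum="true"}
Chassis "chassis-1"
    hostname: host-1
    Encap geneve
        ip: "10.0.0.2"
        options: {csum="true"}
'''

units = {
    "ovn-chassis/0": "10.0.0.1",
    "ovn-chassis/1": "10.0.0.2",
    "ovn-chassis/2": "10.0.0.3",
}
code, message = check_chassis_ips(units, SAMPLE)
print(code, message)
# prints: 2 CRITICAL: ovn-chassis/2 with ip: 10.0.0.3 not found in the southbound database
```

Reporting every missing unit in one message, rather than stopping at the first, keeps a single NRPE alert informative when several chassis drop out at once.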

In the past we have resolved this by pausing the ovn-central unit holding the sb_db; once the sb_db fails over, it picks up all of the ovn-chassis units again.

Drew Freiberger (afreiberger) wrote:

Correction: the Geneve encap IP does not match the private-address. Matching units to chassis would likely require updating the ovsdb relation to pass each unit's FQDN, which could then be compared with the Chassis names in the sb_db.
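If the relation did carry each unit's FQDN, the check could match on names instead of IPs. A minimal Python sketch, assuming that (currently hypothetical) relation data and assuming the FQDN shows up either as the Chassis name or on the chassis's 'hostname:' line in 'ovn-sbctl show' output. The function names and sample hostnames are invented for illustration.

```python
import re
from typing import Dict, List, Set

# 'Chassis "name"' header lines and 'hostname: ...' lines; quotes are optional.
_CHASSIS_LINE = re.compile(r'^\s*Chassis\s+"?([^"\s]+)"?\s*$')
_HOSTNAME_LINE = re.compile(r'^\s*hostname:\s*"?([^"\s]+)"?\s*$')


def chassis_identities(sbctl_show: str) -> Set[str]:
    """Collect every Chassis name and hostname from 'ovn-sbctl show' output."""
    identities = set()
    for line in sbctl_show.splitlines():
        for pattern in (_CHASSIS_LINE, _HOSTNAME_LINE):
            match = pattern.match(line)
            if match:
                identities.add(match.group(1))
    return identities


def units_missing_by_fqdn(expected: Dict[str, str], sbctl_show: str) -> List[str]:
    """Return related units whose FQDN matches no Chassis name or hostname."""
    present = chassis_identities(sbctl_show)
    return [unit for unit, fqdn in sorted(expected.items()) if fqdn not in present]


# Hypothetical FQDNs, as an extended ovsdb relation might publish them.
SAMPLE = '''Chassis "compute-0.example.com"
    hostname: compute-0.example.com
    Encap geneve
        ip: "192.168.10.5"
Chassis "compute-1.example.com"
    hostname: compute-1.example.com
    Encap geneve
        ip: "192.168.10.6"
'''

related = {
    "ovn-chassis/0": "compute-0.example.com",
    "ovn-chassis/1": "compute-1.example.com",
    "ovn-chassis/2": "compute-2.example.com",
}
print(units_missing_by_fqdn(related, SAMPLE))  # ['ovn-chassis/2']
```

Accepting either the Chassis name or the hostname avoids depending on which of the two the deployment sets to the FQDN.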

A simpler alternative would be to count the Chassis entries in the sb_db and alert if that count does not match the number of related ovsdb units.
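A minimal Python sketch of this count-based alert, assuming the number of related ovsdb units is already known to the check (how the charm would supply it is left open) and that each chassis in 'ovn-sbctl show' output begins with a 'Chassis' line. The names are hypothetical. It needs no extra relation data, but it only reports that the counts differ, not which unit is missing.

```python
import re
from typing import Tuple

NAGIOS_OK, NAGIOS_CRITICAL = 0, 2

# Each chassis starts with a 'Chassis <name>' line in 'ovn-sbctl show' output.
_CHASSIS_LINE = re.compile(r'^\s*Chassis\s+\S')


def count_chassis(sbctl_show: str) -> int:
    """Count the Chassis entries listed in 'ovn-sbctl show' output."""
    return sum(1 for line in sbctl_show.splitlines() if _CHASSIS_LINE.match(line))


def check_chassis_count(related_units: int, sbctl_show: str) -> Tuple[int, str]:
    """CRITICAL when the Chassis count differs from the related ovsdb unit count."""
    found = count_chassis(sbctl_show)
    if found != related_units:
        return (NAGIOS_CRITICAL,
                f"CRITICAL: {found} Chassis in the southbound database, "
                f"expected {related_units} related ovsdb units")
    return NAGIOS_OK, f"OK: {found} Chassis match {related_units} related ovsdb units"


# Illustrative output with two chassis while three units are related.
SAMPLE = '''Chassis "chassis-0"
    Encap geneve
        ip: "10.0.0.1"
Chassis "chassis-1"
    Encap geneve
        ip: "10.0.0.2"
'''

print(check_chassis_count(3, SAMPLE))
# prints: (2, 'CRITICAL: 2 Chassis in the southbound database, expected 3 related ovsdb units')
```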

Changed in charm-ovn-central:
importance: Undecided → Wishlist
status: New → Triaged