NRPE check for detecting RabbitMQ split brain

Bug #1902791 reported by Paul Goins
This bug affects 1 person
Affects: OpenStack RabbitMQ Server Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

We may want to add an NRPE check that inspects the RabbitMQ logs (or another source) to detect whether RabbitMQ is potentially running in a split-brain state.

We saw a situation on a customer cloud where the cluster looked OK according to "rabbitmqctl cluster_status", but messages were being dropped (e.g. "Discarding message {'$gen_call',{<0.1234.0>,#Ref<0.1231231231.4564564564.123456>},stat} from <0.1234.0> to <0.56789.9876> in an old incarnation (1) of this node (2)"). We ultimately had to shut everything down and rebuild the cluster.

An alert for this would save time in diagnosing the problem.
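
As a rough illustration of the log-based approach, here is a minimal sketch of the core of such a check. It only looks for the "old incarnation" message quoted above, which is one symptom and not proof of a split brain. The function names, the threshold parameter, and the regular expression are hypothetical. The log file location is deployment-specific and is left to the caller. The exit codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
"""Sketch of the core of an NRPE check for RabbitMQ split-brain symptoms."""
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: this pattern is modelled on the single log line quoted in this
# report; other RabbitMQ/Erlang versions may word the message differently.
OLD_INCARNATION = re.compile(
    r"Discarding message .* in an old incarnation \(\d+\) of this node \(\d+\)"
)


def count_old_incarnation(lines):
    """Return how many of the given log lines match the pattern."""
    return sum(1 for line in lines if OLD_INCARNATION.search(line))


def evaluate(lines, crit_threshold=1):
    """Map the number of matching lines to a (status, message) pair."""
    hits = count_old_incarnation(lines)
    if hits >= crit_threshold:
        return CRITICAL, (
            "CRITICAL: %d 'old incarnation' message(s) found, "
            "possible split brain" % hits
        )
    return OK, "OK: no 'old incarnation' messages found"


def check_log(path, crit_threshold=1):
    """Evaluate a log file; report UNKNOWN if it cannot be read."""
    try:
        with open(path, errors="replace") as fh:
            return evaluate(fh, crit_threshold)
    except OSError as exc:
        return UNKNOWN, "UNKNOWN: cannot read %s: %s" % (path, exc)
```

A plugin entry point would call check_log() with the node's log path, print the message, and exit with the returned status. Note that this sketch scans the whole file, so it would keep alerting on old entries until the log rotates; a real check would probably need to limit itself to recent lines.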
