Comment 3 for bug 1472135

Bogdan Dobrelya (bogdando) wrote :

This looks like a flaw in the promote logic of the rabbitmq pacemaker RA.
The logs show that node-2 was selected for promotion, but something went wrong and the post-promote action exited in an unexpected state: http://paste.openstack.org/show/VK4oWR2SroyJK5Pvh5WQ/ (see lines 18-22).

The corresponding OCF script code is https://github.com/stackforge/fuel-library/blob/master/files/fuel-ha-utils/ocf/rabbitmq#L1415-L1419, and it exited after line 1455 below. As a result, post-promote exited with the rabbit app running but, it seems, completely broken (the logs show that list_channels reported error 2).
Other rabbit nodes failed to join this master and could not operate normally, which caused the node removal operation to fail as well.

So this situation is definitely a bug and should be fixed. The post-promote action should exit with a generic error when the rabbit app is running but list_channels reports errors.
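
A minimal sketch of the proposed check, not the actual RA code. The function name post_promote_rc and its two arguments are hypothetical and only illustrate the decision. OCF_SUCCESS=0 and OCF_ERR_GENERIC=1 are the standard OCF return codes. Treating "app not running" as a generic error is also an assumption here.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical helper (not part of the real RA): decide the post-promote result.
#   $1 - 0 if the rabbit app is running, non-zero otherwise
#   $2 - exit code reported by the list_channels check
post_promote_rc() {
    app_rc=$1
    channels_rc=$2

    # Assumption: an app that is not running is also a generic failure.
    if [ "$app_rc" -ne 0 ]; then
        return "$OCF_ERR_GENERIC"
    fi

    # The app is running but list_channels failed: the node is broken,
    # so the promotion must not be reported as successful.
    if [ "$channels_rc" -ne 0 ]; then
        return "$OCF_ERR_GENERIC"
    fi

    return "$OCF_SUCCESS"
}
```

With the values from the logs above (app running, list_channels reporting error 2), `post_promote_rc 0 2` returns 1, so pacemaker would see the promotion as failed instead of keeping a broken master.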