Comment 4 for bug 1437348

Bogdan Dobrelya (bogdando) wrote :

Note for QA on how to test:
0) deploy any HA environment with 3 controllers;
on any controller node, issue "pcs resource unmanage master_p_rabbitmq-server"
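Before running the cases below, it is worth confirming the resource really is unmanaged. A minimal sketch, assuming pcs appends an "(unmanaged)" suffix to unmanaged resources in its status output (the sample line is hypothetical; in practice feed the function "$(pcs status)"):

```shell
# Check a "pcs status" dump for the unmanaged rabbitmq master resource.
# Assumption: pcs marks unmanaged resources with an "(unmanaged)" suffix.
is_unmanaged() {
  echo "$1" | grep -q 'p_rabbitmq-server.*(unmanaged)'
}

# Hypothetical sample line; in practice use "$(pcs status)" instead.
sample='Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server] (unmanaged)'
is_unmanaged "$sample" && echo "rabbitmq resource is unmanaged"
```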

should not kick alive nodes:
1) on the 1st controller (for example, node-1), stop the corosync service gracefully
2) on the master node, check the /var/log/remote/node-*/rabbit-fence.log:
* it should contain info like:
"Got node-1.test.domain.local that left cluster
...
Preparing to fence node rabbit@node-1 from rabbit cluster
... (within 1 minute) ...
Ignoring alive node rabbit@node-1"
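The log check in step 2 can be scripted with grep. A sketch using a hard-coded excerpt from the expected log above (in practice, grep the rabbit-fence.log files directly):

```shell
# Expected "ignored" outcome from the fence log (excerpt from this note).
log='Got node-1.test.domain.local that left cluster
Preparing to fence node rabbit@node-1 from rabbit cluster
Ignoring alive node rabbit@node-1'

# In practice: grep 'Ignoring alive node' /var/log/remote/node-*/rabbit-fence.log
if echo "$log" | grep -q 'Ignoring alive node rabbit@node-1'; then
  echo "PASS: alive node was not fenced"
else
  echo "FAIL: expected ignore message not found"
fi
```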
3) on the other controllers (not node-1, where corosync was stopped), check rabbitmq cluster_status:
* it should show all 3 rabbit nodes running and listed as cluster members
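The membership check in step 3 can be scripted against the cluster_status output. A sketch assuming the Erlang-term output format of RabbitMQ 3.x; the sample below is hypothetical but mirrors the expected healthy state:

```shell
# Count running members in "rabbitmqctl cluster_status" output.
# Assumption: RabbitMQ 3.x Erlang-term format; hypothetical sample data.
status="[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]"

running=$(echo "$status" | grep -o "{running_nodes,[^}]*}" \
          | grep -o "rabbit@node-[0-9]*" | wc -l | tr -d ' ')
echo "running nodes: $running"
```

In practice, replace the sample with `status=$(rabbitmqctl cluster_status)` and expect 3 while all members are healthy.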
4) teardown:
* start the stopped corosync service, then restart the pacemaker service on the same node
* pcs status should show all 3 nodes online within 1 minute
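The "all online within a minute" check in the teardown can be scripted as well. A sketch of the membership test, assuming pacemaker's usual "Online: [ ... ]" status line (node names are this note's examples):

```shell
# Succeeds when a "pcs status" dump lists all three controllers online.
# Assumption: pacemaker reports membership as "Online: [ node-1 ... ]".
all_online() {
  echo "$1" | grep -q 'Online: \[ node-1 node-2 node-3 \]'
}

# Hypothetical sample; in practice use "$(pcs status)".
sample='Online: [ node-1 node-2 node-3 ]'
all_online "$sample" && echo "all controllers online"
```

In practice, poll `all_online "$(pcs status)"` every few seconds for up to a minute after restarting the services.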

should kick the failed rabbit node only once:
5) on the 1st controller (for example, node-1), issue "rabbitmqctl stop_app", then stop
the corosync service gracefully
6) on the master node, check the /var/log/remote/node-*/rabbit-fence.log:
* one of the controller nodes' logs should contain info like:
"Got node-1.test.domain.local that left cluster
...
Preparing to fence node rabbit@node-1 from rabbit cluster
... (within 1 minute) ...
Disconnecting rabbit@node-1
Forgetting cluster node rabbit@node-1"
7) on the other controllers (not node-1, where corosync was stopped), check rabbitmq cluster_status:
* it should show only 2 rabbit nodes running and listed as cluster members (node-1 should not be listed)
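The final check can be scripted the same way as step 3, this time asserting that node-1 is gone. A sketch with a hypothetical sample mimicking "rabbitmqctl cluster_status" after fencing (RabbitMQ 3.x Erlang terms):

```shell
# Verify rabbit@node-1 was forgotten after fencing.
# Hypothetical sample; in practice use "$(rabbitmqctl cluster_status)".
status="[{nodes,[{disc,['rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-2','rabbit@node-3']}]"

if echo "$status" | grep -q 'rabbit@node-1'; then
  echo "FAIL: rabbit@node-1 is still a cluster member"
else
  echo "PASS: rabbit@node-1 was forgotten"
fi
```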