Yes, I can now see that there were a few events after which something in the Pacemaker cluster seems to have become badly broken, even though the cluster looked healthy and was reported as healthy:
/var/log/remote/node-6.domain.tld/crmd.log:2015-08-27T11:15:52.760185+00:00 notice: notice: peer_update_callback: Our peer on the DC (node-1.domain.tld) is dead
/var/log/remote/node-6.domain.tld/crmd.log:2015-08-27T11:16:22.346080+00:00 warning: warning: reap_dead_nodes: Our DC node (node-7.domain.tld) left the cluster
Without STONITH enabled, this situation could probably lead to this type of bug. We should probably address this in the ops guide.
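For anyone who wants to check other nodes' logs for the same pattern, here is a minimal sketch that picks out the two crmd messages quoted above. The regular expression and the helper name `find_dc_loss_events` are my own assumptions based only on those two lines; other Pacemaker versions may word these messages differently.

```python
import re

# Matches only the two crmd message wordings quoted in this comment
# (an assumption; other Pacemaker versions may phrase them differently).
DC_LOSS = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{2}:\d{2}) .*?"
    r"(?P<func>peer_update_callback|reap_dead_nodes): "
    r"Our (?:peer on the DC|DC node) \((?P<dc>[^)]+)\) (?:is dead|left the cluster)"
)


def find_dc_loss_events(lines):
    """Return (timestamp, function, dc_node) for each line reporting a lost DC."""
    events = []
    for line in lines:
        match = DC_LOSS.search(line)
        if match:
            events.append((match.group("ts"), match.group("func"), match.group("dc")))
    return events


# The two log lines from this comment, used as sample input.
sample = [
    "/var/log/remote/node-6.domain.tld/crmd.log:2015-08-27T11:15:52.760185+00:00 "
    "notice: notice: peer_update_callback: Our peer on the DC (node-1.domain.tld) is dead",
    "/var/log/remote/node-6.domain.tld/crmd.log:2015-08-27T11:16:22.346080+00:00 "
    "warning: warning: reap_dead_nodes: Our DC node (node-7.domain.tld) left the cluster",
]

events = find_dc_loss_events(sample)
for ts, func, dc in events:
    print(ts, func, dc)
```

Two different DC nodes reported lost within about 30 seconds, as in the sample, would be the kind of sequence worth flagging.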