Comment 21 for bug 1323277

Aleksandr Didenko (adidenko) wrote: Re: vip__management recovered with error

When you put br-mgmt down, you effectively remove the controller from the cluster, because it can no longer communicate with any other node over the management network. So commands like "crm_mon -1" run on node-1 provide no useful information, and neither do its pacemaker logs. All the corosync checks, "nova service-list", etc. should be performed on one of the remaining controllers.
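
For reference, a minimal sketch of the checks run from a surviving controller (picking node-2 is just an example; the openrc path is an assumption):

# Run on any remaining controller, e.g. node-2 (example choice)
crm_mon -1                                  # one-shot Pacemaker cluster/resource status
corosync-cfgtool -s                         # Corosync ring status as seen from this node
crm_mon -1 | grep vip__                     # where the vip__* resources are running now
source /root/openrc && nova service-list    # OpenStack service state via the VIP (openrc path assumed)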

I've checked the snapshot and I see the following records in node-2 crmd.log:

2014-07-04T13:53:35.982239+00:00 warning: warning: reap_dead_nodes: Our DC node (node-1.test.domain.local) left the cluster
2014-07-04T13:53:38.941189+00:00 notice: notice: te_rsc_command: Initiating action 15: start vip__management_old_start_0 on node-2.test.domain.local (local)
2014-07-04T13:53:38.941189+00:00 notice: notice: te_rsc_command: Initiating action 17: start vip__public_old_start_0 on node-4.test.domain.local
2014-07-04T13:53:40.111458+00:00 notice: notice: process_lrm_event: LRM operation vip__management_old_start_0 (call=159, rc=0, cib-update=108, confirmed=true) ok

node-4 crmd.log:

2014-07-04T13:53:35.982245+00:00 warning: warning: reap_dead_nodes: Our DC node (node-1.test.domain.local) left the cluster
2014-07-04T13:53:39.954750+00:00 notice: notice: process_lrm_event: LRM operation vip__public_old_start_0 (call=155, rc=0, cib-update=80, confirmed=true) ok
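
(For anyone repeating this against a snapshot, the records above can be filtered with a grep like the one below; the per-node log paths are an assumption about the snapshot layout, adjust as needed:)

# Hypothetical snapshot layout; adjust the paths to where the node logs unpack
grep -E 'reap_dead_nodes|te_rsc_command|process_lrm_event' node-2*/var/log/crmd.log
grep -E 'reap_dead_nodes|process_lrm_event' node-4*/var/log/crmd.log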

I can also see the following in the node-4 netstat output (management_vip: 10.108.2.2):

tcp 0 0 10.108.2.6:46448 10.108.2.2:3306 ESTABLISHED 29702/python
tcp 0 0 10.108.2.6:46451 10.108.2.2:3306 ESTABLISHED 4500/python
tcp 0 0 10.108.2.6:39112 10.108.2.2:3306 ESTABLISHED 2058/python
tcp 0 0 10.108.2.6:46879 10.108.2.2:3306 ESTABLISHED 30656/python

So it looks like both VIPs were successfully migrated to other controllers, and node-4 was even able to connect to MySQL via the management VIP.
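
If anyone wants to re-verify this on a live env, a quick sketch (the interface name and VIP address are taken from this report; MySQL client credentials are assumed to be available on the node):

# On the controller expected to hold the management VIP:
ip addr show br-mgmt | grep 10.108.2.2     # is the VIP configured on br-mgmt here?
# On any controller: live MySQL connections going through the VIP
netstat -tnp | grep '10.108.2.2:3306'
# Optional end-to-end check (assumes local MySQL client credentials)
mysql -h 10.108.2.2 -e 'SELECT 1;'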

If by any chance you still have this env around, or you're able to reproduce the issue, please let me know so I can check it on the live env.