When you put br-mgmt down, you effectively remove the controller from the cluster, because it can no longer communicate with any other node over the management network. So commands like "crm_mon -1" run on node-1 unfortunately provide no useful info, and neither do its pacemaker logs. All the corosync checks, "nova service-list", etc. should be performed on one of the remaining controllers.
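For reference, the checks from a surviving controller could look roughly like this (a sketch, not a prescribed procedure; it assumes the usual crmsh/corosync CLI tools and sourced OpenStack credentials, and skips any tool that isn't installed):

```shell
#!/bin/sh
# Cluster-health checks to run on a *surviving* controller (not node-1).
# Each command is skipped if its tool isn't installed, so the script is
# safe to run anywhere; meaningful output requires a live cluster.
for cmd in "crm_mon -1" "corosync-quorumtool -s" "corosync-cfgtool -s" "nova service-list"; do
    tool=${cmd%% *}
    if command -v "$tool" >/dev/null 2>&1; then
        echo "=== $cmd ==="
        $cmd
    else
        echo "=== $cmd === (skipped: $tool not found)"
    fi
done
```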
I've checked the snapshot and I see the following records in node-2 crmd.log:
2014-07-04T13:53:35.982239+00:00 warning: warning: reap_dead_nodes: Our DC node (node-1.test.domain.local) left the cluster
2014-07-04T13:53:38.941189+00:00 notice: notice: te_rsc_command: Initiating action 15: start vip__management_old_start_0 on node-2.test.domain.local (local)
2014-07-04T13:53:38.941189+00:00 notice: notice: te_rsc_command: Initiating action 17: start vip__public_old_start_0 on node-4.test.domain.local
2014-07-04T13:53:40.111458+00:00 notice: notice: process_lrm_event: LRM operation vip__management_old_start_0 (call=159, rc=0, cib-update=108, confirmed=true) ok
node-4 crmd.log:
2014-07-04T13:53:35.982245+00:00 warning: warning: reap_dead_nodes: Our DC node (node-1.test.domain.local) left the cluster
2014-07-04T13:53:39.954750+00:00 notice: notice: process_lrm_event: LRM operation vip__public_old_start_0 (call=155, rc=0, cib-update=80, confirmed=true) ok
Also I can see the following in node-4 netstat output (management_vip: 10.108.2.2):
tcp 0 0 10.108.2.6:46448 10.108.2.2:3306 ESTABLISHED 29702/python
tcp 0 0 10.108.2.6:46451 10.108.2.2:3306 ESTABLISHED 4500/python
tcp 0 0 10.108.2.6:39112 10.108.2.2:3306 ESTABLISHED 2058/python
tcp 0 0 10.108.2.6:46879 10.108.2.2:3306 ESTABLISHED 30656/python
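To pull just those rows out on a live node you could pipe `netstat -tnp` through an awk filter keyed on the VIP; the sketch below runs the same filter against two of the captured sample lines (the VIP address is the one from this env):

```shell
#!/bin/sh
# On a live node you would pipe real output:  netstat -tnp | awk ...
# Here the filter is demonstrated against captured sample lines.
VIP="10.108.2.2"
sample='tcp 0 0 10.108.2.6:46448 10.108.2.2:3306 ESTABLISHED 29702/python
tcp 0 0 10.108.2.6:39112 10.108.2.2:3306 ESTABLISHED 2058/python'
# Field 5 is the foreign address, field 6 the state. The dots in the
# VIP are regex wildcards here, which is close enough for a filter.
matches=$(printf '%s\n' "$sample" \
  | awk -v vip="$VIP" '$5 ~ ("^" vip ":3306$") && $6 == "ESTABLISHED"')
printf '%s\n' "$matches"
```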
So it looks like both VIPs were successfully migrated to the other controllers, and node-4 was even able to connect to MySQL via management_vip.
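On a live env, Pacemaker itself can confirm the placement: `crm_resource --locate` prints the node currently running a resource. A minimal sketch, using the resource names from the logs above and skipping gracefully if pacemaker isn't installed:

```shell
#!/bin/sh
# Ask Pacemaker where each VIP resource is currently running.
# Requires a live cluster; degrades to a notice otherwise.
for res in vip__management_old vip__public_old; do
    if command -v crm_resource >/dev/null 2>&1; then
        crm_resource --resource "$res" --locate
    else
        echo "crm_resource not found; skipping $res"
    fi
done
```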
If by any chance you still have this env around, or you're able to reproduce the issue, please let me know so I can check it on the live env.