HA doesn't work: pacemaker doesn't detect that several controllers gone down

Bug #1434471 reported by Timur Nurlygayanov
Affects: Fuel for OpenStack
Status: Invalid
Importance: Critical
Assigned to: Fuel Library (Deprecated)
Milestone: (not set)

Bug Description

Note: the bug was found on MOS 6.1, but it looks like it can be reproduced on MOS 5.x and other 6.x releases as well.
It was reproduced on a VirtualBox environment, but it likely can be reproduced on KVM/bare metal as well.
The diagnostic snapshot is available at: https://yadi.sk/d/4am7qbV_fP7NU

Steps To Reproduce:
1. Take a fresh MOS image (in my case it was the 202 image: http://mc0n2-msk.msk.mirantis.net/fuelweb-iso/fuel-6.1-202-2015-03-16_22-54-44.iso)
2. Deploy an OpenStack cloud with the following configuration: Ubuntu, HA, 3 controllers, 1 compute, Neutron VLAN, Swift file storage backend (using the VirtualBox scripts).
3. Shut down 2 controllers (to make sure the issue is reproduced, shut down both the primary and a non-primary controller). Example for VirtualBox:
VBoxManage controlvm "fuel-slave-1" poweroff
VBoxManage controlvm "fuel-slave-2" poweroff
4. Wait 10 minutes
5. Try to open the Horizon dashboard. It will not be available (404 code).
6. Log in to the remaining controller node and try to run any OpenStack CLI command; it will fail:
source openrc ; keystone user-list
7. Check the status of the OpenStack services in Pacemaker (see also the node-level check sketched after this list):
pcs resource
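
Note: "pcs resource" only lists resource status; to see whether Pacemaker has actually noticed the node failures, a node-level check on the surviving controller is more telling. A minimal sketch using standard Pacemaker/pcs tooling (the exact output wording varies between versions):

________
# On the surviving controller:
crm_mon -1      # one-shot cluster overview; the two powered-off controllers should show as OFFLINE
pcs status      # node list plus quorum state ("partition with quorum" vs "partition WITHOUT quorum")
________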

Observed Result:
The cluster doesn't work at all: we can't access the Horizon dashboard (404 code) and OpenStack CLI commands fail (500 code), but pcs shows that all is 'good':

________
root@node-2:~# pcs resource
 vip__public_vrouter (ocf::fuel:ns_IPaddr2): Started
 vip__management_vrouter (ocf::fuel:ns_IPaddr2): Started
 vip__public (ocf::fuel:ns_IPaddr2): Started
 Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-1 node-2 node-3 ]
 vip__management (ocf::fuel:ns_IPaddr2): Started
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1 node-2 node-3 ]
 Clone Set: clone_p_dns [p_dns]
     Started: [ node-1 node-2 node-3 ]
 Clone Set: clone_p_ntp [p_ntp]
     Started: [ node-1 node-2 node-3 ]
 Clone Set: clone_p_mysql [p_mysql]
     p_mysql (ocf::fuel:mysql-wss): Started FAILED
     Started: [ node-1 node-3 ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-2 ]
     Slaves: [ node-1 node-3 ]
 Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-1 node-2 node-3 ]
 Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
     Started: [ node-1 node-2 node-3 ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1 node-2 node-3 ]
 Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-1 node-2 node-3 ]
 Clone Set: clone_p_heat-engine [p_heat-engine]
     Started: [ node-1 node-2 node-3 ]
________
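
The mismatch between the "Started" states above and the actual service health can be confirmed directly on the surviving node. The sketch below checks the Galera state using the standard wsrep status variables; it assumes MySQL client credentials are available on the controller (e.g. via /root/.my.cnf), otherwise pass -u/-p explicitly:

________
# On the surviving controller: with 2 of 3 Galera nodes gone, the survivor normally
# drops to a non-Primary component, which would explain the 500 errors from the API services.
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status';"   # expected: non-Primary
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"     # expected: 1
________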

Tags: ha pacemaker
summary: - HA doesn't work: pacemaker doesn't detect that several controller go down
+ HA doesn't work: pacemaker doesn't detect that several controllers go down
summary: - HA doesn't work: pacemaker doesn't detect that several controllers go down
+ HA doesn't work: pacemaker doesn't detect that several controllers gone down
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
status: Confirmed → Invalid
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

According to the HA reference architecture http://docs.mirantis.com/fuel/fuel-6.0/reference-architecture.html#openstack-environment-architecture, this test case is invalid. When you deploy 3 controllers, you must maintain a quorum, which is 2 nodes, in order to keep your cluster operating. That means the failover procedure can succeed only for the 3-1 case, but will fail for the 3-2 case.
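
For illustration (not part of the original comment): a 3-node cluster needs floor(3/2) + 1 = 2 votes for quorum, so a single surviving controller is inquorate by design. This can be verified with standard Corosync/Pacemaker tools, assuming Corosync 2.x tooling is present:

________
# On the surviving controller:
corosync-quorumtool -s                                           # membership and vote count; "Quorate: No" expected
crm_attribute --type crm_config --name no-quorum-policy --query  # what Pacemaker is configured to do without quorum
________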

Note that we have an unresolved documentation bug about the missing description of the supported failover cases: https://bugs.launchpad.net/fuel/+bug/1326605
