Fuel for OpenStack

Ubuntu HA with neutron: rabbit cluster is completely broken after failovers

Bug #1401956 reported by Tatyanka on 2014-12-12

This bug report is a duplicate of: Bug #1396946: Rabbitmq OCF script requires additional criteria to be met for Master/Slave statuses.

This bug affects 1 person

Affects             Status  Importance  Assigned to                Milestone
Fuel for OpenStack  New     Critical    Fuel Library (Deprecated)  Fuel for OpenStack 6.0

Bug Description

{"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "auth_required": true, "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}}}, "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}

Steps:
1. Deploy Ubuntu HA with neutron: 3 controllers, 2 computes.
2. When the cluster is ready, run OSTF (it passed).
3. Turn off a non-primary controller, wait until the cluster recovers, and run OSTF (it passed).
4. Turn the non-primary controller back on, sync its time if needed, wait until the cluster recovers, and run OSTF (it passed again).
5. Turn off the primary controller.
6. Wait 20 minutes.
7. Run the OSTF HA suite.

Actual result:
The RabbitMQ test failed.
crm_mon -1 shows the following:
Online: [ node-4 node-5 ]
OFFLINE: [ node-1 ]

vip__public (ocf::mirantis:ns_IPaddr2): Started node-4
Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-4 node-5 ]
vip__management (ocf::mirantis:ns_IPaddr2): Started node-5
Clone Set: clone_p_heat-engine [p_heat-engine]
     Started: [ node-4 node-5 ]
Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Slaves: [ node-4 node-5 ]
Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-4 node-5 ]
p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-4
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-4 node-5 ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-4 node-5 ]
Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-4 node-5 ]
Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-4 node-5 ]
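The telling part of the crm_mon output above is the Master/Slave set master_p_rabbitmq-server: it lists node-4 and node-5 only as Slaves and has no Masters line, so no surviving node was promoted to RabbitMQ master after node-1 went offline. A minimal Python sketch for checking this from saved crm_mon -1 text; the function name is hypothetical, and it assumes the plain-text layout shown above (set header followed by indented Masters:/Slaves: lines), which may differ between Pacemaker versions.

```python
import re


def master_slave_status(crm_output: str, set_name: str) -> dict:
    """Return {'Masters': [...], 'Slaves': [...]} for one Master/Slave set.

    Hypothetical helper: assumes the crm_mon -1 text layout from this report,
    where the set header is followed by indented 'Masters:'/'Slaves:' lines.
    """
    result = {"Masters": [], "Slaves": []}
    in_set = False
    for line in crm_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Master/Slave Set:"):
            # Only track the requested set, e.g. master_p_rabbitmq-server
            in_set = stripped.split()[2] == set_name
            continue
        if not in_set:
            continue
        match = re.match(r"(Masters|Slaves):\s*\[(.*)\]", stripped)
        if match:
            result[match.group(1)] = match.group(2).split()
        else:
            # Any other line ends the role listing of this set
            in_set = False
    return result
```

Run against the output above, master_slave_status(text, "master_p_rabbitmq-server") would give an empty Masters list, which matches the failed RabbitMQ test.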

root@node-5:/var/log/rabbitmq# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-5' ...
Error: unable to connect to node 'rabbit@node-5': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@node-5']

rabbit@node-5:
  * connected to epmd (port 4369) on node-5
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on node-5
  * suggestion: start the node

current node details:
- node name: 'rabbitmqctl18364@node-5'
- home dir: /var/lib/rabbitmq
- cookie hash: soeIWU2jk2YNseTyDSlsEA==
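The diagnostics above show that epmd on node-5 answers on port 4369 but has no 'rabbit' node registered, i.e. the Erlang VM for RabbitMQ is not running there at all. The same check can be reproduced with a small Python sketch of the epmd NAMES request from the Erlang distribution protocol (request tag 110, reply is epmd's 4-byte port followed by "name <node> at port <port>" lines). The helper names are hypothetical and the sketch has not been verified against the epmd shipped in this build.

```python
import socket
import struct

EPMD_PORT = 4369  # default epmd port, as in the diagnostics above
NAMES_REQ = 110   # epmd NAMES_REQ tag (ASCII 'n')


def build_names_request() -> bytes:
    # 2-byte big-endian length prefix followed by the 1-byte request tag
    return struct.pack(">HB", 1, NAMES_REQ)


def parse_names_response(data: bytes) -> dict:
    """Map registered node name -> distribution port from a NAMES reply."""
    nodes = {}
    # Skip epmd's own 4-byte port number; the rest is plain text lines
    for line in data[4:].decode("utf-8", "replace").splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
            nodes[parts[1]] = int(parts[4])
    return nodes


def query_epmd(host: str, timeout: float = 5.0) -> dict:
    with socket.create_connection((host, EPMD_PORT), timeout=timeout) as sock:
        sock.sendall(build_names_request())
        chunks = []
        # epmd closes the connection after sending the full reply
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_names_response(b"".join(chunks))
```

In the state reported here, "rabbit" in query_epmd("node-5") would be expected to return False.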

Status on node-4:
[root@nailgun ~]# ssh node-4
Warning: Permanently added 'node-4' (RSA) to the list of known hosts.
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.13.0-40-generic x86_64)


Last login: Fri Dec 12 14:42:32 2014 from 10.120.0.2
root@node-4:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-4' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-4','rabbit@node-5']}]},
{running_nodes,['rabbit@node-4']},
{cluster_name,<<"<email address hidden>">>},
{partitions,[]}]
...done.
root@node-4:~#
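On node-4, cluster_status still lists all three disc nodes as members while running_nodes contains only rabbit@node-4, so node-4 is effectively a one-node cluster. A minimal Python sketch that extracts the members that are configured but not running; the function name is hypothetical, and it uses regular expressions over the Erlang-term text shown above rather than a real Erlang term parser, so it assumes this exact output format.

```python
import re


def missing_nodes(cluster_status: str) -> list:
    """Nodes listed as cluster members but absent from running_nodes.

    Hypothetical helper: regex-based, assumes the rabbitmqctl cluster_status
    text layout from this report ({disc,[...]}, {running_nodes,[...]}).
    """
    members = []
    # Collect every quoted node name from the {disc,[...]} and {ram,[...]} groups
    for group in re.findall(r"\{(?:disc|ram),\[([^\]]*)\]\}", cluster_status):
        members += re.findall(r"'([^']+)'", group)
    running_match = re.search(r"\{running_nodes,\[([^\]]*)\]\}", cluster_status)
    running = re.findall(r"'([^']+)'", running_match.group(1)) if running_match else []
    return [node for node in members if node not in running]
```

For the node-4 output above this would return ['rabbit@node-1', 'rabbit@node-5']: node-1 is powered off, and node-5 is up but its rabbit application is not running.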