Ubuntu HA with neutron, rabbit cluster is completely broken after failovers

Bug #1401956 reported by Tatyanka
Affects: Fuel for OpenStack
Status: New
Importance: Critical
Assigned to: Fuel Library (Deprecated)
Milestone: (none listed)

Bug Description

{"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "auth_required": true, "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}}}, "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}

Steps:
1. Deploy Ubuntu HA with Neutron: 3 controllers, 2 computes.
2. When the cluster is ready, run OSTF (it passed).
3. Turn off a non-primary controller, wait for the cluster to recover, and run OSTF (it passed).
4. Turn the non-primary controller back on, sync time on it (if needed), wait for the cluster to recover, and run OSTF (it passed again).
5. Turn off the primary controller.
6. Wait 20 minutes.
7. Run the OSTF HA suite.

Actual result:
The RabbitMQ test failed.
crm_mon -1 reports the following:
Online: [ node-4 node-5 ]
OFFLINE: [ node-1 ]

 vip__public (ocf::mirantis:ns_IPaddr2): Started node-4
 Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-4 node-5 ]
 vip__management (ocf::mirantis:ns_IPaddr2): Started node-5
 Clone Set: clone_p_heat-engine [p_heat-engine]
     Started: [ node-4 node-5 ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Slaves: [ node-4 node-5 ]
 Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-4 node-5 ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-4
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-4 node-5 ]
 Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-4 node-5 ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-4 node-5 ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-4 node-5 ]
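
In the output above, the master_p_rabbitmq-server set lists only Slaves and no Masters line. A minimal sketch for spotting that condition automatically is shown below. It assumes crm_mon -1 prints a "Masters: [ ... ]" line under a Master/Slave set whenever one instance has been promoted. The function name masterless_sets is hypothetical and is not part of Fuel or Pacemaker.

```python
import re


def masterless_sets(crm_mon_text):
    """Return names of Master/Slave sets that show no 'Masters:' line.

    Assumption: a promoted instance appears as 'Masters: [ ... ]'
    directly under its 'Master/Slave Set:' header in crm_mon -1 output.
    """
    result = []
    current = None       # name of the Master/Slave set being read
    has_master = False
    for line in crm_mon_text.splitlines():
        header = re.match(r"\s*Master/Slave Set: (\S+)", line)
        if header:
            # A new set starts; close the previous one if still open.
            if current is not None and not has_master:
                result.append(current)
            current, has_master = header.group(1), False
        elif current is not None:
            stripped = line.strip()
            if stripped.startswith("Masters:"):
                has_master = True
            elif not stripped.startswith(("Slaves:", "Stopped:")):
                # Any other line ends the current set.
                if not has_master:
                    result.append(current)
                current = None
    if current is not None and not has_master:
        result.append(current)
    return result


sample = """\
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Slaves: [ node-4 node-5 ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-4 node-5 ]
"""
print(masterless_sets(sample))  # ['master_p_rabbitmq-server']
```

The sample text is copied from the crm_mon output in this report; on a live controller the text would come from running crm_mon -1 and capturing its standard output.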

root@node-5:/var/log/rabbitmq# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-5' ...
Error: unable to connect to node 'rabbit@node-5': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@node-5']

rabbit@node-5:
  * connected to epmd (port 4369) on node-5
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on node-5
  * suggestion: start the node

current node details:
- node name: 'rabbitmqctl18364@node-5'
- home dir: /var/lib/rabbitmq
- cookie hash: soeIWU2jk2YNseTyDSlsEA==

Status on node-4:
[root@nailgun ~]# ssh node-4
Warning: Permanently added 'node-4' (RSA) to the list of known hosts.
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.13.0-40-generic x86_64)

 * Documentation: https://help.ubuntu.com/
New release '14.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Last login: Fri Dec 12 14:42:32 2014 from 10.120.0.2
root@node-4:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-4' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-4','rabbit@node-5']}]},
 {running_nodes,['rabbit@node-4']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
...done.
root@node-4:~#
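
The cluster_status output on node-4 lists three disc members but only one running node. As an illustration, the sketch below extracts both lists and reports the members that are not running. It assumes the Erlang-term layout shown above, with the nodes section printed before running_nodes. The function name stopped_nodes is hypothetical.

```python
import re


def _quoted(text):
    """Return all single-quoted names found in text."""
    return re.findall(r"'([^']+)'", text)


def stopped_nodes(cluster_status_text):
    """Return cluster members that are absent from running_nodes.

    Assumption: the text contains '{nodes,...}' followed by
    '{running_nodes,[...]}', as in the rabbitmqctl output in this report.
    """
    nodes = re.search(r"\{nodes,(.*?)\{running_nodes,",
                      cluster_status_text, re.S)
    running = re.search(r"\{running_nodes,\[(.*?)\]\}",
                        cluster_status_text, re.S)
    if not nodes or not running:
        raise ValueError("unrecognised cluster_status output")
    members = _quoted(nodes.group(1))
    up = set(_quoted(running.group(1)))
    return [name for name in members if name not in up]


sample = """\
Cluster status of node 'rabbit@node-4' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-4','rabbit@node-5']}]},
 {running_nodes,['rabbit@node-4']},
 {partitions,[]}]
...done.
"""
print(stopped_nodes(sample))  # ['rabbit@node-1', 'rabbit@node-5']
```

Here rabbit@node-1 is expected to be missing because that controller was powered off in step 5, while rabbit@node-5 is missing even though crm_mon reports node-5 as online.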

Tags: rabbitmq
Bogdan Dobrelya (bogdando) wrote :

This bug is not a dup of https://bugs.launchpad.net/bugs/1394635. I'm working on this case as well as related fixes for https://bugs.launchpad.net/fuel/+bug/1396946, so it is probably a dup of https://bugs.launchpad.net/fuel/+bug/1396946.
