RabbitMQ cluster broke after one controller node was deleted and another was added at the same time

Bug #1370558 reported by Tatyana Dubyk
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Confirmed | High | Vladimir Kuklin |
5.1.x | Won't Fix | High | Fuel Library (Deprecated) |

Bug Description

On CentOS in HA mode on a vCenter machine, the OpenStack deployment on the primary controller fails after one secondary controller is added and another is deleted, because RabbitMQ cannot connect to the primary controller.

==============vcenter settings===========================
export VCENTER_IP='172.16.0.254'
export <email address hidden>'
export VCENTER_PASSWORD='Qwer!1234'
export VCENTER_CLUSTERS='Cluster1,Cluster2'

=====================================================
Configuration:
===================================================
Steps to reproduce:
1. Set up the lab on the vCenter machine from the 5.1-9 (RC3) ISO.
2. Create an environment and start deployment:
   OS: CentOS (HA mode)
   roles: 2 controllers,
          1 controller + 1 cinder (vmdk)
3. Check that the OpenStack deployment finishes successfully.
   Check that the services on the primary controller are available.
4. Add one secondary controller and delete another secondary controller.
5. Redeploy.
6. Observe that the OpenStack deployment on the primary controller fails with:
    (/Stage[main]/Rabbitmq::Server/Rabbitmq_user[guest])
Error: unable to connect to node 'rabbit@node-2': nodedown

node-2 is the primary controller, and it is available.

Expected result: the OpenStack deployment finishes successfully on every node.
Actual result: the OpenStack deployment on the primary controller fails with:
 (/Stage[main]/Rabbitmq::Server/Rabbitmq_user[guest])
Error: unable to connect to node 'rabbit@node-2': nodedown

------------Logs------------------------------------

---------------------fuel-version-------------------------------
[root@nailgun ~]# fuel --fuel-version
api: '1.0'
astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
auth_required: true
build_id: 2014-09-17_04-49-39
build_number: '9'
feature_groups:
- mirantis
fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
nailgun_sha: 51231834c61920a5dea8ce402ad027b2505d632d
ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
production: docker
release: '5.1'
release_versions:
  2014.1.1-5.1:
    VERSION:
      api: '1.0'
      astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
      build_id: 2014-09-17_04-49-39
      build_number: '9'
      feature_groups:
      - mirantis
      fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
      fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
      nailgun_sha: 51231834c61920a5dea8ce402ad027b2505d632d
      ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
      production: docker
      release: '5.1'
----

[root@node-2 rabbitmq]# vim <email address hidden>

=INFO REPORT==== 17-Sep-2014::13:46:56 ===
Error description:
   {error,{inconsistent_cluster,"Node 'rabbit@node-2' thinks it's clustered with node 'rabbit@node-4', but 'rabbit@node-4' disagrees"}}

Log files (may contain more information):
   /<email address hidden>
   /<email address hidden>

[root@node-2 rabbitmq]# vim <email address hidden>
Stack trace:
   [{rabbit_mnesia,check_cluster_consistency,0},
    {rabbit,'-start/0-fun-1-',0},
    {rabbit,start_it,1},
    {rpc,'-handle_call_call/6-fun-0-',5}]
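The error above reports a broken mutual-membership rule: node-2's stored cluster view lists node-4, but node-4's view does not list node-2. The Python sketch below models that rule on hypothetical in-memory membership sets. It is a simplified illustration of the condition the error message describes, not RabbitMQ's actual `rabbit_mnesia:check_cluster_consistency` implementation. The function name `find_disagreement` and the data layout are assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical model: each node name maps to the set of nodes it believes
# are members of its cluster (including itself). This is NOT RabbitMQ's
# real Mnesia schema, only an illustration of the consistency rule.
ClusterViews = Dict[str, Set[str]]


def find_disagreement(views: ClusterViews, node: str) -> Optional[str]:
    """Return an error message if a peer that `node` believes it is
    clustered with does not list `node` back; otherwise return None."""
    for peer in sorted(views.get(node, set()) - {node}):
        # A peer missing from `views` is treated as having an empty view.
        if node not in views.get(peer, set()):
            return (
                f"Node '{node}' thinks it's clustered with node '{peer}', "
                f"but '{peer}' disagrees"
            )
    return None


if __name__ == "__main__":
    # Hypothetical state matching the log excerpt: node-2 still lists
    # node-4, while node-4 only knows about itself.
    views = {
        "rabbit@node-2": {"rabbit@node-2", "rabbit@node-4"},
        "rabbit@node-4": {"rabbit@node-4"},
    }
    print(find_disagreement(views, "rabbit@node-2"))
```

In this model, checking from node-4's side reports nothing, because node-4 lists no peers at all. The inconsistency is visible only from node-2, which is where the deployment failed.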

Tatyana Dubyk (tdubyk) wrote :
description: updated
Changed in fuel:
assignee: nobody → Fuel Partner Integration Team (fuel-partner)
importance: Undecided → High
Changed in fuel:
assignee: Fuel Partner Integration Team (fuel-partner) → Fuel Library Team (fuel-library)
Vladimir Kuklin (vkuklin) wrote :

I am not sure this use case is currently supported; we need to investigate how RabbitMQ behaves when the cluster is reset completely.

Also, it is not clear whether you were deleting and adding controllers simultaneously. The RabbitMQ cluster may well break if its membership is changed this drastically in one step.

I will mark this bug as a known issue for 5.1 and target it to 6.0 for further investigation.

tags: added: release-notes
Changed in fuel:
status: New → Won't Fix
no longer affects: fuel/5.1.x
Andrey Danin (gcon-monolake) wrote :

When a user deletes nodes from an already deployed environment, Nailgun just reboots the deleted nodes into the bootstrap image and does not reconfigure the remaining nodes in the cluster. This leaves the RabbitMQ and Galera cluster configurations inconsistent, and it may be the root cause of the problem Tatyana ran into.
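The effect described in the previous comment can be illustrated with a small Python model. It assumes that a deleted node simply drops out of the cluster (it is rebooted into the bootstrap image) and that each surviving node keeps its own membership list. The helper names `delete_node` and `dangling_peers` are hypothetical, and the node names are illustrative rather than taken from the affected environment. Deleting a node without reconfiguring the survivors leaves them pointing at a peer that no longer acknowledges them. A real RabbitMQ cluster would also need the survivors to drop the departed member, for example with `rabbitmqctl forget_cluster_node`.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: node name -> set of cluster members that node believes in.
Views = Dict[str, Set[str]]


def delete_node(views: Views, node: str, reconfigure_survivors: bool) -> Views:
    """Return a new membership map after removing `node`.

    The deleted node is assumed to be rebooted into the bootstrap image, so it
    simply drops out of the map. Survivors forget it only when
    `reconfigure_survivors` is True. The input map is not modified.
    """
    remaining = {n: set(peers) for n, peers in views.items() if n != node}
    if reconfigure_survivors:
        for peers in remaining.values():
            peers.discard(node)
    return remaining


def dangling_peers(views: Views) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where `a` lists `b` as a peer but `b` does not list `a`,
    including the case where `b` is no longer in the cluster at all."""
    return {
        (a, b)
        for a, peers in views.items()
        for b in peers
        if b != a and a not in views.get(b, set())
    }


if __name__ == "__main__":
    members = {"rabbit@node-2", "rabbit@node-3", "rabbit@node-4"}
    cluster = {n: set(members) for n in members}

    # Nailgun-style deletion: survivors are left untouched.
    naive = delete_node(cluster, "rabbit@node-4", reconfigure_survivors=False)
    print(sorted(dangling_peers(naive)))
    # [('rabbit@node-2', 'rabbit@node-4'), ('rabbit@node-3', 'rabbit@node-4')]

    # Deletion that also reconfigures the survivors.
    clean = delete_node(cluster, "rabbit@node-4", reconfigure_survivors=True)
    print(sorted(dangling_peers(clean)))  # []
```

The model covers only the membership bookkeeping. It does not reproduce RabbitMQ's startup behaviour or the Galera side of the problem.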

summary: - On CentOS in HA mode on vcenter's machine,on primary controller deploy
- of openstack is crashed due to reason that rabbitmq can't connect to
- primary controller
+ rabbitMQ became broken after 1 controller node was deleted and 1 added
+ at the same time
tags: removed: vcenter
Vladimir Kuklin (vkuklin) wrote :

This bug looks like a duplicate of https://bugs.launchpad.net/fuel/+bug/1394188, so I am marking it as such. Please check whether the issue is solved; otherwise, reopen the bug.

no longer affects: fuel/6.0.x
Changed in fuel:
milestone: 5.1 → 6.0
status: Won't Fix → Confirmed