Fuel for OpenStack

rabbitMQ became broken after 1 controller node was deleted and 1 added at the same time

Bug #1370558 reported by Tatyana Dubyk on 2014-09-17

This bug report is a duplicate of: Bug #1394188: [pacemaker provider] Deployment of additional/replace controller fails.

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Confirmed	High	Vladimir Kuklin	Fuel for OpenStack 6.0
	5.1.x	Won't Fix	High	Fuel Library (Deprecated)	Fuel for OpenStack 5.1.1

Bug Description

On CentOS in HA mode on a vCenter machine, the OpenStack deployment on the primary controller crashes after one secondary controller is added and another is deleted, because RabbitMQ cannot connect to the primary controller.

==============vcenter settings===========================
export VCENTER_IP='172.16.0.254'
export <email address hidden>'
export VCENTER_PASSWORD='Qwer!1234'
export VCENTER_CLUSTERS='Cluster1,Cluster2'

=====================================================
Configuration:
===================================================
Steps to reproduce:
1. Set up a lab on the vCenter machine from the 5.1-9 (RC3) ISO.
2. Create an environment and start the deployment:
   OS: CentOS (HA mode)
   roles: 2 controllers,
          1 controller + 1 cinder (vmdk)
3. Check that the OpenStack deployment finishes successfully.
   Check that the services on the primary controller are available.
4. Add one secondary controller and delete another secondary controller.
5. Re-deploy.
6. Observe that the OpenStack deployment on the primary controller fails with the error:
    (/Stage[main]/Rabbitmq::Server/Rabbitmq_user[guest])
    Error: unable to connect to node 'rabbit@node-2': nodedown

node-2 is the primary controller, and it is available.
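A host can be up and still have no running `rabbit` Erlang node. The `nodedown` message from `rabbitmqctl` only says that the Erlang node could not be contacted.

One way to separate those cases is to ask the Erlang Port Mapper Daemon (EPMD, default TCP port 4369) on the host which node names it has registered. The sketch below does this with the `NAMES_REQ` request from the Erlang distribution protocol. `epmd_names` is a hypothetical helper written for illustration, not part of Fuel or RabbitMQ tooling.

- If `rabbit` is missing from the result, the broker's Erlang node is not running.
- If it is present, `nodedown` more likely has another cause, such as a mismatched Erlang cookie or hostname resolution.

```python
import socket
import struct
from typing import Dict

EPMD_PORT = 4369  # default Erlang Port Mapper Daemon port
NAMES_REQ = 110   # EPMD request code: list registered node names


def epmd_names(host: str, port: int = EPMD_PORT, timeout: float = 3.0) -> Dict[str, int]:
    """Return {node_name: distribution_port} as reported by EPMD on `host`.

    Hypothetical diagnostic helper; it only talks to EPMD, not to RabbitMQ itself.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Request = 2-byte big-endian length, then the 1-byte request code.
        sock.sendall(struct.pack(">HB", 1, NAMES_REQ))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # EPMD closes the connection after the last entry
                break
            chunks.append(data)
    reply = b"".join(chunks)
    # The first 4 bytes are EPMD's own port; the rest is "name X at port N\n" lines.
    names: Dict[str, int] = {}
    for line in reply[4:].decode("utf-8").splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
            names[parts[1]] = int(parts[4])
    return names
```

For example, `"rabbit" in epmd_names("node-2")` run from another controller would show whether the broker's Erlang node was registered on node-2 at the time of the failure.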

Expected result: the OpenStack deployment finishes successfully on every node.
Actual result: the OpenStack deployment on the primary controller fails with:
(/Stage[main]/Rabbitmq::Server/Rabbitmq_user[guest])
Error: unable to connect to node 'rabbit@node-2': nodedown

------------Logs------------------------------------

---------------------fuel-version-------------------------------
[root@nailgun ~]# fuel --fuel-version
api: '1.0'
astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
auth_required: true
build_id: 2014-09-17_04-49-39
build_number: '9'
feature_groups:
- mirantis
fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
nailgun_sha: 51231834c61920a5dea8ce402ad027b2505d632d
ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
production: docker
release: '5.1'
release_versions:
  2014.1.1-5.1:
    VERSION:
      api: '1.0'
      astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
      build_id: 2014-09-17_04-49-39
      build_number: '9'
      feature_groups:
      - mirantis
      fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
      fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
      nailgun_sha: 51231834c61920a5dea8ce402ad027b2505d632d
      ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
      production: docker
      release: '5.1'
----

[root@node-2 rabbitmq]# vim <email address hidden>

=INFO REPORT==== 17-Sep-2014::13:46:56 ===
Error description:
{error,{inconsistent_cluster,"Node 'rabbit@node-2' thinks it's clustered with node 'rabbit@node-4', but 'rabbit@node-4' disagrees"}}

Log files (may contain more information):
/<email address hidden>
/<email address hidden>

[root@node-2 rabbitmq]# vim <email address hidden>
Stack trace:
   [{rabbit_mnesia,check_cluster_consistency,0},
    {rabbit,'-start/0-fun-1-',0},
    {rabbit,start_it,1},
    {rpc,'-handle_call_call/6-fun-0-',5}]
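The `inconsistent_cluster` error comes from a startup check (`rabbit_mnesia:check_cluster_consistency/0` in the stack trace). In it, a node compares its own view of cluster membership with what its peers report.

The sketch below is a simplified Python model of that idea, assuming each node's view is just a set of member names. It is not RabbitMQ's actual Erlang implementation. `find_inconsistencies` and the node names other than `rabbit@node-2` and `rabbit@node-4` are hypothetical.

```python
from typing import Dict, List, Set


def find_inconsistencies(views: Dict[str, Set[str]]) -> List[str]:
    """Report every node that lists a peer which does not list it back.

    `views` maps each node to the set of nodes it believes are cluster
    members (itself included). Simplified model, not RabbitMQ's code.
    """
    errors = []
    for node in sorted(views):
        for peer in sorted(views[node] - {node}):
            peer_view = views.get(peer)
            if peer_view is None:
                # The peer's view is unknown in this model; skip it.
                continue
            if node not in peer_view:
                errors.append(
                    f"Node '{node}' thinks it's clustered with node '{peer}', "
                    f"but '{peer}' disagrees"
                )
    return errors
```

Consider a hypothetical three-node view where `rabbit@node-4` has been reset and now only knows itself. In that case `find_inconsistencies` returns two messages, and the first has the same wording as the log line above.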

Tatyana Dubyk (tdubyk) wrote on 2014-09-17:

Diagnostic snapshot (10.8 MiB, application/x-tar)

description:

updated

Evgeniya Shumakher (eshumakher) on 2014-09-17

Changed in fuel:
assignee:	nobody → Fuel Partner Integration Team (fuel-partner)
importance:	Undecided → High

Evgeniya Shumakher (eshumakher) on 2014-09-17

Changed in fuel:
assignee:	Fuel Partner Integration Team (fuel-partner) → Fuel Library Team (fuel-library)

Vladimir Kuklin (vkuklin) wrote on 2014-09-17:

I am not sure this use case is supported right now; we need to investigate how RabbitMQ behaves when you reset the cluster completely.

Also, it is not clear whether you were deleting and adding controllers simultaneously. It looks like bad things can happen if we shake up the RabbitMQ cluster that way.

I will mark this bug as a known issue for 5.1 and target it to 6.0 for further investigation.

tags:

added: release-notes

Vladimir Kuklin (vkuklin) on 2014-09-17

Changed in fuel:
status:	New → Won't Fix
no longer affects:	fuel/5.1.x

Andrey Danin (gcon-monolake) wrote on 2014-10-23:

When a user deletes nodes from an already deployed environment, Nailgun just reboots the deleted nodes into the Bootstrap image and does not reconfigure the other nodes in the cluster. This leaves the RabbitMQ and Galera cluster configurations inconsistent, and it may be the root cause of the problem Tatyana ran into.
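The failure mode described in the comment can be illustrated with a toy model of cluster membership.

- When a node is deleted and wiped but the survivors are not reconfigured, the survivors keep listing a member that no longer exists.
- If the survivors are also updated, no stale entries remain.

This is a hedged sketch only. The function names and node names are hypothetical, and the model does not reflect how Nailgun, RabbitMQ, or Galera actually store membership.

```python
from typing import Dict, Set

Views = Dict[str, Set[str]]  # node -> set of nodes it believes are members


def delete_without_reconfigure(views: Views, node: str) -> Views:
    """Model of the reported behaviour: the deleted node disappears
    (rebooted into bootstrap), but survivors still list it as a member."""
    return {n: set(v) for n, v in views.items() if n != node}


def delete_with_reconfigure(views: Views, node: str) -> Views:
    """Hypothetical alternative: survivors also drop the node from membership."""
    return {n: v - {node} for n, v in views.items() if n != node}


def stale_members(views: Views) -> Dict[str, Set[str]]:
    """For each surviving node, the members it lists that no longer exist."""
    alive = set(views)
    return {n: v - alive for n, v in views.items() if v - alive}
```

In this model, a node added later that tries to join the survivors sees a membership list containing a node that is gone. This is the same kind of disagreement that surfaces as `inconsistent_cluster` in the RabbitMQ log.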

summary:

- On CentOS in HA mode on vcenter's machine,on primary controller deploy
- of openstack is crashed due to reason that rabbitmq can't connect to
- primary controller
+ rabbitMQ became broken after 1 controller node was deleted and 1 added
+ at the same time

Andrey Danin (gcon-monolake) on 2014-11-18

tags:

removed: vcenter

Vladimir Kuklin (vkuklin) wrote on 2014-11-24:

This bug looks like a duplicate of https://bugs.launchpad.net/fuel/+bug/1394188, so I am marking it as such. Please check whether the issue is solved; otherwise, reopen the bug.

no longer affects:	fuel/6.0.x
Changed in fuel:
milestone:	5.1 → 6.0
status:	Won't Fix → Confirmed
