On CentOS in HA mode with vCenter, cluster deployment finishes successfully, but the nova-network service on the primary controller fails.

Bug #1371638 reported by Tatyana Dubyk
Affects            | Status    | Importance | Assigned to                   | Milestone
Fuel for OpenStack | Confirmed | High       | Fuel Partner Integration Team |
5.1.x              | Won't Fix | High       | Fuel Partner Integration Team |
6.0.x              | Won't Fix | High       | Fuel Partner Integration Team |

Bug Description

==============vcenter settings===========================
export VCENTER_IP='172.16.0.254'
export <email address hidden>'
export VCENTER_PASSWORD='Qwer!1234'
export VCENTER_CLUSTERS='Cluster1,Cluster2'
=====================================================
Configuration:
===================================================
Steps to reproduce:
1. Set up a lab on the vCenter machine from the 5.1-11 (RC5) ISO.
2. Create an environment:
   OS: CentOS (HA mode)
   nodes and roles: 1st, 2nd - controller,
                    3rd - controller + cinder (VMDK)
3. Start deployment.
4. Check that the deployment finished successfully,
   check that node-8 is the primary controller,
   check that the services on the slaves are enabled and show the ':-)' state,
   verify network connectivity,
   run the OSTF tests.
5. Delete the node that is the primary controller,
   add a new node with the controller role via the Fuel UI,
   and re-deploy the cluster.
6. Check that the deployment finished successfully,
   verify network connectivity,
   run the OSTF tests.
7. Check that the OSTF test 'Check that required services are running' fails.
8. On node-11 run 'nova-manage service list' manually
   and check that the nova-network service has failed and is marked 'XXX'
   (a verification sketch follows these steps).
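A minimal sketch of how the checks in steps 4 and 8 can be run from the shell. This is an illustration only, not part of the original test run; the /etc/astute.yaml location is where Fuel normally writes the node's role, but verify it on your environment:

# On a controller: check whether this node was deployed as the primary controller
# (Fuel stores the node role in /etc/astute.yaml -- assumed standard location)
grep '^role:' /etc/astute.yaml

# On any controller: list nova services and show only the failed ones;
# failed services are printed with the 'XXX' state marker
nova-manage service list 2>/dev/null | grep XXX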

Expected result: the OpenStack deployment finishes successfully on every node
                 and all services report the ':-)' (smiley) state.
Actual result: the deployment finishes successfully on every node,
               but the nova-network service on the primary controller has failed (marked 'XXX').

--------------------------Logs------------------------------------
---------------------fuel-version---------------------------------

node-10 is the primary controller.

[root@nailgun ~]# fuel --fuel-version
api: '1.0'
astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
auth_required: true
build_id: 2014-09-17_21-40-34
build_number: '11'
feature_groups:
- mirantis
fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
nailgun_sha: eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d
ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
production: docker
release: '5.1'
release_versions:
  2014.1.1-5.1:
    VERSION:
      api: '1.0'
      astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
      build_id: 2014-09-17_21-40-34
      build_number: '11'
      feature_groups:
      - mirantis
      fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
      fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
      nailgun_sha: eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d
      ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
      production: docker
      release: '5.1'

[root@nailgun ~]# fuel nodes list
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|----------|------------------|---------|--------------|-------------------|--------------------|---------------|-------
12 | discover | Untitled (7c:04) | None | 10.108.0.213 | 64:25:92:2c:7c:04 | | | True
9 | ready | Untitled (d8:d8) | 3 | 10.108.0.4 | 64:fe:f2:3e:d8:d8 | controller | | True
10 | ready | Untitled (85:67) | 3 | 10.108.0.5 | 64:80:1d:c5:85:67 | cinder, controller | | True
13 | discover | Untitled (50:6b) | None | 10.108.0.152 | 64:c5:3e:a7:50:6b | | | True
11 | ready | Untitled (67:d2) | 3 | 10.108.0.6 | 64:a6:20:06:67:d2 | controller | | True

[root@node-11 ~]# nova-manage service list
Binary Host Zone Status State Updated_At
nova-consoleauth node-8.test.domain.local internal enabled XXX 2014-09-19 13:13:25
nova-scheduler node-8.test.domain.local internal enabled XXX 2014-09-19 13:13:25
nova-conductor node-8.test.domain.local internal enabled XXX 2014-09-19 13:13:28
nova-compute node-8.test.domain.local nova enabled XXX 2014-09-19 13:13:32
nova-network node-8.test.domain.local internal enabled XXX 2014-09-19 13:13:31
nova-cert node-8.test.domain.local internal enabled XXX 2014-09-19 13:13:28
nova-consoleauth node-10.test.domain.local internal enabled :-) 2014-09-19 14:06:12
nova-scheduler node-10.test.domain.local internal enabled :-) 2014-09-19 14:06:17
nova-conductor node-10.test.domain.local internal enabled :-) 2014-09-19 14:06:19
nova-consoleauth node-9.test.domain.local internal enabled :-) 2014-09-19 14:06:18
nova-scheduler node-9.test.domain.local internal enabled :-) 2014-09-19 14:06:13
nova-conductor node-9.test.domain.local internal enabled :-) 2014-09-19 14:06:15
nova-cert node-10.test.domain.local internal enabled :-) 2014-09-19 14:06:18
nova-cert node-9.test.domain.local internal enabled :-) 2014-09-19 14:06:16
nova-compute node-9.test.domain.local nova enabled :-) 2014-09-19 14:06:17
nova-network node-10.test.domain.local internal enabled XXX 2014-09-19 13:48:54
nova-consoleauth node-11.test.domain.local internal enabled :-) 2014-09-19 14:06:18
nova-scheduler node-11.test.domain.local internal enabled :-) 2014-09-19 14:06:20
nova-conductor node-11.test.domain.local internal enabled :-) 2014-09-19 14:06:10
nova-network node-11.test.domain.local internal enabled :-) 2014-09-19 14:06:13
nova-cert node-11.test.domain.local internal
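For completeness, a sketch of how the failed nova-network on the primary controller and the stale node-8 records could be inspected. The commands are illustrative; the openstack-nova-network service name and the log path are the usual CentOS/RDO defaults and may differ on a given environment:

# Check whether the nova-network process is actually running on node-10
service openstack-nova-network status   # assumed CentOS/RDO init script name
ps aux | grep [n]ova-network

# Look at the end of the nova-network log for the reason it stopped reporting
tail -n 50 /var/log/nova/*network*.log   # assumed default log location

# The node-8 entries above are stale service records left in the nova database
# after the node was deleted; they keep showing up as 'XXX'
nova-manage service list | grep node-8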

Tags: nailgun
Tatyana Dubyk (tdubyk) wrote :
Changed in fuel:
milestone: 5.1 → 6.0
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Andrey Danin (gcon-monolake) wrote :

The problem is in Nailgun's node deletion logic. It simply reboots the deleted node back into Bootstrap and does not reconfigure the other nodes in the cluster. In the case of this bug, we end up with four controller nodes, one of which is shut down (actually rebooted into Bootstrap). This can (and eventually does) cause problems with HA, up to total inoperability. So this bug is not about the VMware integration but about environment management.

As far as I can see, there is no quick and easy way to fix this. We would need to redesign the role deletion feature almost completely, and that work should be tracked as a blueprint, not a bug.
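To illustrate the effect described above, a sketch of how the stale (deleted) controller can still be seen from a live controller, assuming the standard Fuel HA stack with Pacemaker and RabbitMQ on the controllers (my own example, not from the original report):

# Check whether the removed controller is still listed as an (offline) Pacemaker member
crm_mon -1 | grep -i offline

# Check whether it is still part of the RabbitMQ cluster view
rabbitmqctl cluster_status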

tags: added: nailgun
removed: vcenter
Mike Scherbakov (mihgen) wrote :

Reassigned 5.1 -> 5.1.1, as 5.1 is already released.

Evgeniya Shumakher (eshumakher) wrote :

Let's propose a blueprint then.
Andrey, please add it to the Engineering suggestion for 6.1 doc.

Stepan Rogov (srogov) wrote : Re: [Bug 1371638] Re: On CentOS in HA mode with vCenter, cluster deployment finishes successfully, but the nova-network service on the primary controller fails.

This is a duplicate bug.
https://bugs.launchpad.net/fuel/+bug/1394188
> Let's propose a blueprint then.
> Andrey, please add it to the Engineering suggestion for 6.1 doc.
>
> ** Changed in: fuel/5.1.x
> Status: Confirmed => Won't Fix
>
> ** Changed in: fuel/6.0.x
> Status: Confirmed => Won't Fix
>

Dmitry Pyzhov (dpyzhov) wrote :

Node deletion logic depends on the granular deployment feature, so it will not be fixed in the near future. AFAIK all other components have their own logic for removing nodes as part of HA.

In any case, it is wrong behaviour for anything to break in HA when one node goes down. So the current issue is not about node removal in general; it is about broken HA for vCenter.
