When a node is removed from the cluster, its OS services are not unregistered properly

Bug #1457515 reported by Olesia Tsvigun
This bug affects 1 person
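Affects            | Status    | Importance | Assigned to               | Milestone
-------------------|-----------|------------|---------------------------|----------
Fuel for OpenStack | Won't Fix | Wishlist   | Fuel Library (Deprecated) |
6.0.x              | Won't Fix | Wishlist   | Fuel Library (Deprecated) |
6.1.x              | Won't Fix | Wishlist   | Fuel Documentation Team   |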

Bug Description

Fuel ISO#443

OS Ubuntu, CentOS

Steps to reproduce:
1. Create cluster with vCenter support
2. Add 4 nodes with Controller roles
3. Add 2 nodes with compute role
4. Set Nova-Network VlanManager as a network backend.
5. Deploy the cluster
6. Run OSTF.
7. Remove 1 node with controller role and redeploy cluster.
8. Run OSTF.

Expected result:
All OSTF test cases pass.

Actual result:
OSTF test 'Check that required services are running' failed with error 'Some nova services have not been started.. Please refer to OpenStack logs for more details.'

__________________________________________________________________________

The problem is that when we remove a node from the cluster, the OpenStack APIs are not notified properly, i.e. we don't unregister the services running on the node being removed. The OpenStack APIs track the availability of such services, so `nova service-list` (or the equivalent command for other OpenStack APIs) reports the daemons on the deleted node as down.

The solution is to unregister each service explicitly *right after* removing a node from the cluster. This needs to be done *at least* for:

1) Nova services (`nova service-list`)
2) Cinder services (`cinder service-list`)
3) Neutron agents (`neutron agent-list`)
...
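
As a rough illustration of the check described above, here is a minimal Python sketch that picks out registrations still pointing at a removed host across the three APIs. The record layout (dicts with `binary` and `host` keys) and the function names are hypothetical; only the three list commands come from this report. It does not perform the unregistering itself.

```python
# Minimal sketch: find service/agent registrations left behind by a removed node.
# The record layout below is an assumption made for illustration only.

# List commands named in the bug description, keyed by API.
LIST_COMMANDS = {
    "nova": "nova service-list",
    "cinder": "cinder service-list",
    "neutron": "neutron agent-list",
}


def stale_registrations(records, removed_host):
    """Return the records that still reference the removed host."""
    return [r for r in records if r["host"] == removed_host]


def report(records_by_api, removed_host):
    """Build one human-readable line per stale registration."""
    lines = []
    for api in sorted(records_by_api):
        for rec in stale_registrations(records_by_api[api], removed_host):
            lines.append(
                "%s: %s on %s (verify with `%s`)"
                % (api, rec["binary"], rec["host"], LIST_COMMANDS[api])
            )
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real deployment.
    sample = {
        "nova": [
            {"binary": "nova-cert", "host": "node-5.test.domain.local"},
            {"binary": "nova-cert", "host": "node-2.test.domain.local"},
        ],
        "neutron": [],
    }
    for line in report(sample, "node-5.test.domain.local"):
        print(line)
```

Feeding real data into such a check would require collecting the output of each list command, which is left out here.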

Changed in fuel:
milestone: none → 6.1
importance: Undecided → High
assignee: nobody → Fuel Partner Integration Team (fuel-partner)
Revision history for this message
Olesia Tsvigun (otsvigun) wrote :
Andrian Noga (anoga)
Changed in fuel:
assignee: Fuel Partner Integration Team (fuel-partner) → Igor Gajsin (igajsin)
description: updated
Tatyanka (tatyana-leontovich) wrote :

If `nova-manage service list` shows XXX for the controller that was removed, the issue is related to MOS (we reported such issues some time ago). Please add the output of `nova-manage service list` from a controller and the ID of the removed controller.

Tatyanka (tatyana-leontovich) wrote :

Also, is there a snapshot?

Olesia Tsvigun (otsvigun) wrote :

Yes, the snapshot fuel_nova_serv_error.tar.gz is attached. I will add the output of `nova-manage service list` as soon as possible.

Tatyanka (tatyana-leontovich) wrote :

Yep, sorry, I missed it the first time. I looked at the test output in the snapshot, and this is what I see:
packages/nova/servicegroup/drivers/db.py:75
nova-cert node-5.test.domain.local internal enabled XXX 2015-05-21 10:31:40
nova-consoleauth node-2.test.domain.local internal enabled :-) 2015-05-21 12:32:26
nova-scheduler node-2.test.domain.local internal enabled :-) 2015-05-21 12:32:29
nova-conductor node-2.test.domain.local internal enabled :-) 2015-05-21 12:32:31
nova-cert node-2.test.domain.local internal enabled :-) 2015-05-21 12:32:32
nova-consoleauth node-3.test.domain.local internal enabled :-) 2015-05-21 12:32:36
nova-scheduler node-3.test.domain.local internal enabled :-) 2015-05-21 12:32:39
nova-conductor node-3.test.domain.local internal enabled :-) 2015-05-21 12:32:41
nova-cert node-3.test.domain.local internal enabled :-) 2015-05-21 12:32:43
2015-05-21 12:32:48.538 23307 DEBUG nova.servicegroup.drivers.db [req-6525241a-228f-496b-a424-f8c5fb50160e None] Seems service is down. Last heartbeat was 2015-05-21 12:28:29. Elapsed time is 259.538842 is_up /usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py:75
nova-compute node-7.test.domain.local nova enabled XXX 2015-05-21 12:28:29
2015-05-21 12:32:48.540 23307 DEBUG nova.servicegroup.drivers.db [req-6525241a-228f-496b-a424-f8c5fb50160e None] Seems service is down. Last heartbeat was 2015-05-21 12:29:21. Elapsed time is 207.540535 is_up /usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py:75
nova-compute node-4.test.domain.local nova enabled XXX 2015-05-21 12:29:21

As you can see, the nodes running compute were last updated at 12:29:21, while the controllers show 12:32:29, so this may be related to time sync issues.
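
To make the log lines above easier to read, here is a small Python sketch of the liveness check they reflect: a service counts as up when the time since its last heartbeat is within a threshold. The timestamps are taken from the log excerpt; the 60-second threshold is an assumption based on the usual default of nova's `service_down_time` option, and the function names are illustrative rather than nova's actual code.

```python
from datetime import datetime

# Assumed threshold in seconds (nova's service_down_time option; 60 is the
# commonly documented default, but the deployed value is not shown in this report).
SERVICE_DOWN_TIME = 60


def elapsed_seconds(now, last_heartbeat):
    """Seconds between the check time and the last recorded heartbeat."""
    return (now - last_heartbeat).total_seconds()


def is_up(now, last_heartbeat, down_time=SERVICE_DOWN_TIME):
    """A service is considered up if its heartbeat is recent enough."""
    return abs(elapsed_seconds(now, last_heartbeat)) <= down_time


# Check times and heartbeats copied from the log excerpt (millisecond precision).
CHECK_NODE7 = datetime(2015, 5, 21, 12, 32, 48, 538000)
CHECK_NODE4 = datetime(2015, 5, 21, 12, 32, 48, 540000)
HEARTBEAT_NODE7 = datetime(2015, 5, 21, 12, 28, 29)
HEARTBEAT_NODE4 = datetime(2015, 5, 21, 12, 29, 21)
HEARTBEAT_NODE2_SCHEDULER = datetime(2015, 5, 21, 12, 32, 29)

if __name__ == "__main__":
    for name, now, beat in [
        ("nova-compute node-7", CHECK_NODE7, HEARTBEAT_NODE7),
        ("nova-compute node-4", CHECK_NODE4, HEARTBEAT_NODE4),
        ("nova-scheduler node-2", CHECK_NODE7, HEARTBEAT_NODE2_SCHEDULER),
    ]:
        print(name, round(elapsed_seconds(now, beat), 3), is_up(now, beat))
```

The computed gaps for node-7 and node-4 agree with the "Elapsed time" values in the log to millisecond precision. The sketch only shows why those rows are marked XXX; it does not tell whether the cause is clock skew or services that stopped reporting.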

Andrian Noga (anoga)
Changed in fuel:
status: New → In Progress
Olesia Tsvigun (otsvigun) wrote :

The node-5 controller was deleted, but the status of its nova services is still displayed on the controllers.

fuel node list
[root@nailgun ~]# fuel node list
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|----------|---------------------|---------|-------------|-------------------|------------|---------------|--------|---------
6 | ready | slave-06_compute | 1 | 10.109.0.8 | 64:89:c0:8a:c7:a2 | compute | | True | 1
2 | ready | slave-02_controller | 1 | 10.109.0.4 | 64:7e:e0:52:c6:5d | controller | | True | 1
3 | ready | slave-05_compute | 1 | 10.109.0.7 | 64:a7:0a:99:f1:88 | compute | | True | 1
4 | ready | slave-03_controller | 1 | 10.109.0.5 | 64:e0:6d:dc:e1:f3 | controller | | True | 1
1 | ready | slave-01_controller | 1 | 10.109.0.3 | 64:d0:a8:fd:e6:49 | controller | | True | 1
8 | discover | Untitled (fd:5d) | None | 10.109.0.9 | 64:d9:b7:8b:fd:5d | | | True | None
9 | discover | Untitled (06:77) | None | 10.109.0.6 | 64:22:df:ea:06:77 | | | True | None
7 | discover | Untitled (47:93) | None | 10.109.0.10 | 64:51:d0:a8:47:93 | | | True | None

Output from the controller:

[root@node-1 ~]# nova-manage service list
2015-05-22 18:37:14.514 16606 DEBUG nova.servicegroup.api [-] ServiceGroup driver defined as an instance of db __new__ /usr/lib/python2.6/site-packages/nova/servicegroup/api.py:65
2015-05-22 18:37:15.309 16606 DEBUG oslo.db.sqlalchemy.session [req-6c4ab87a-4184-45ab-808f-69c5be4407da ] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.6/site-packages/oslo/db/sqlalchemy/session.py:482
Binary Host Zone Status State Updated_At
nova-consoleauth node-1.test.domain.local internal enabled :-) 2015-05-22 18:36:56
nova-scheduler node-1.test.domain.local internal enabled :-) 2015-05-22 18:37:00
nova-conductor node-1.test.domain.local internal enabled :-) 2015-05-22 18:37:01
nova-cert node-1.test.domain.local internal enabled :-) 2015-05-22 18:37:13
nova-compute vcenter-vmcluster1 vcenter enabled :-) 2015-05-22 18:37:07
nova-network nova-network-ha internal enabled :-) 2015-05-22 18:36:45
2015-05-22 18:37:15.666 16606 DEBUG nova.servicegroup.drivers.db [req-6c4ab87a-4184-45ab-808f-69c5be4407da None] Seems service is down. Last heartbeat was 2015-05...
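
Because the `nova-manage service list` output is interleaved with DEBUG log lines, it can be hard to spot the dead entries by eye. The following Python sketch filters such output and lists the hosts that have at least one service in the XXX state. It assumes the seven whitespace-separated fields shown in the header above (the timestamp counts as two); the function names are hypothetical and the sample rows are copied from earlier in this report.

```python
import re

# Lines that begin with a timestamp followed by fractional seconds are log
# output, not table rows.
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ ")


def parse_service_list(text):
    """Parse `nova-manage service list` output into a list of dicts."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or LOG_LINE.match(line) or line.startswith("Binary"):
            continue
        parts = line.split()
        if len(parts) != 7:
            continue  # not a row in the assumed layout; skip it
        binary, host, zone, status, state, day, clock = parts
        rows.append({
            "binary": binary, "host": host, "zone": zone,
            "status": status, "state": state,
            "updated_at": day + " " + clock,
        })
    return rows


def down_hosts(rows):
    """Hosts that have at least one service reported as XXX."""
    return sorted({r["host"] for r in rows if r["state"] == "XXX"})


SAMPLE = """\
Binary Host Zone Status State Updated_At
nova-cert node-5.test.domain.local internal enabled XXX 2015-05-21 10:31:40
nova-consoleauth node-2.test.domain.local internal enabled :-) 2015-05-21 12:32:26
2015-05-21 12:32:48.538 23307 DEBUG nova.servicegroup.drivers.db [req-6525241a-228f-496b-a424-f8c5fb50160e None] Seems service is down.
nova-compute node-7.test.domain.local nova enabled XXX 2015-05-21 12:28:29
"""

if __name__ == "__main__":
    print(down_hosts(parse_service_list(SAMPLE)))
```

Comparing the resulting host names with the nodes that remain in `fuel node list` would show which XXX entries belong to a removed node; that comparison relies on the node-to-host naming and is not done here.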


Olesia Tsvigun (otsvigun) wrote :

Fuel version

[root@nailgun ~]# fuel --fuel-version
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: 795f8a045400fe82ccc30ae018e85324b3fa1de5
auth_required: true
build_id: 2015-05-21_04-04-09
build_number: '446'
feature_groups:
- mirantis
fuel-library_sha: a03efb582b06bfe8d9776dce244d4a2f2e2ba886
fuel-ostf_sha: 3dd25a018f2a5c47ec6c885436b3ba69690ef1b9
fuelmain_sha: 5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93
nailgun_sha: 403c6b7ea3c62bb4fda27eb9cedee37f7144558c
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: e19f1b65792f84c4a18b5a9473f85ef3ba172fce
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 795f8a045400fe82ccc30ae018e85324b3fa1de5
      build_id: 2015-05-21_04-04-09
      build_number: '446'
      feature_groups:
      - mirantis
      fuel-library_sha: a03efb582b06bfe8d9776dce244d4a2f2e2ba886
      fuel-ostf_sha: 3dd25a018f2a5c47ec6c885436b3ba69690ef1b9
      fuelmain_sha: 5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93
      nailgun_sha: 403c6b7ea3c62bb4fda27eb9cedee37f7144558c
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: e19f1b65792f84c4a18b5a9473f85ef3ba172fce
      release: '6.1'

Changed in fuel:
assignee: Igor Gajsin (igajsin) → MOS Nova (mos-nova)
Olesia Tsvigun (otsvigun) wrote :
summary: - Some nova services have not been started after delete controler and
- redeploy cluster with Vcenter.
+ When a node is removed from the cluster, OS services are not deleted
+ properly
description: updated
Roman Podoliaka (rpodolyaka) wrote : Re: When a node is removed from the cluster, OS services are not deleted properly

Updated the description and decreased the bug priority - this is not a blocker for 6.1.

description: updated
summary: - When a node is removed from the cluster, OS services are not deleted
- properly
+ When a node is removed from the cluster, OS services are not
+ unregistered properly
summary: - When a node is removed from the cluster, OS services are not
+ When a node is removed from the cluster, its OS services are not
unregistered properly
Changed in fuel:
milestone: 6.1 → 7.0
Igor Zinovik (izinovik) wrote :

Olesia, please try the following commands to work around the problem:

controller# . openrc
controller# nova service-list 2>&1 | grep down
# Then use the service IDs found in the grep output.
# Note: use `nova service-delete`, not `nova delete` (which removes instances).
controller# nova service-delete failed_service_ID1
controller# nova service-delete failed_service_ID2
...
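
The manual workaround can be partly scripted. Below is a minimal Python sketch that parses a `nova service-list` style ASCII table and prints the `nova service-delete` command (the command named elsewhere in this thread) for each service that is down on a given host. The column names (`Id`, `Host`, `State`) and the sample table are assumptions about the client's output format, and the helper names are hypothetical. The sketch only builds command strings and does not run them.

```python
def parse_table(text):
    """Parse a '|'-delimited ASCII table into a list of dicts keyed by header."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip +----+ borders, blank lines, and anything else
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first '|' line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def delete_commands(rows, removed_host):
    """Build delete commands for services that are down on the removed host."""
    return [
        "nova service-delete %s" % row["Id"]
        for row in rows
        if row["Host"] == removed_host and row["State"] == "down"
    ]


# Illustrative sample only; IDs and layout are assumed, not captured output.
SAMPLE = """\
+----+----------------+--------------------------+----------+---------+-------+
| Id | Binary         | Host                     | Zone     | Status  | State |
+----+----------------+--------------------------+----------+---------+-------+
| 1  | nova-scheduler | node-2.test.domain.local | internal | enabled | up    |
| 3  | nova-cert      | node-5.test.domain.local | internal | enabled | down  |
| 7  | nova-compute   | node-7.test.domain.local | nova     | enabled | down  |
+----+----------------+--------------------------+----------+---------+-------+
"""

if __name__ == "__main__":
    for command in delete_commands(parse_table(SAMPLE), "node-5.test.domain.local"):
        print(command)
```

Filtering on the host as well as the state avoids deleting services that are down for unrelated reasons, such as node-7 in the sample.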

Igor Gajsin (igajsin) wrote :

I have the same environment, which was deployed manually. The environment initially had 4 controllers. After the deployment finished, I removed one of them (node-4) and redeployed the environment.

As a result, I can see the nova services for node-4 in a down state: http://paste.mirantis.net/show/482/
They can be deleted with the `nova service-delete` command: http://paste.mirantis.net/show/483/

tags: added: release-notes
no longer affects: fuel/7.0.x
Changed in fuel:
assignee: Fuel for Openstack (fuel) → Fuel Library Team (fuel-library)
Bogdan Dobrelya (bogdando) wrote :

This should be addressed as part of the life cycle management feature.

Vladimir Kuklin (vkuklin) wrote :

This issue is related to the limitations of our orchestrator engine. It cannot be reliably fixed without altering our architecture. It is more of a feature request than a bug. Marking as wishlist.

Changed in fuel:
importance: Medium → Wishlist
status: Confirmed → Won't Fix
tags: added: life-cycle-management