Distributed Cloud – swact failed due to dcorch-engine not being enabled

Bug #1855791 reported by Tao Liu
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tao Liu

Bug Description

Brief Description
-----------------
While performing swact testing on a System Controller installed and configured in a virtual environment, I observed that the swact operation failed because the dcorch-engine process was not enabled.

At startup, the dcorch-engine starts the QuotaManager, which tries to open a keystone client session against the keystone-api-proxy. At that point, the keystone-api-proxy is not yet running.

The current issues:
1. The QuotaManager is no longer needed; it will be removed as part of story 2006588.
2. The EndpointCache class opens the keystone client session with the region name set to consts.VIRTUAL_MASTER_CLOUD, where it should be set to consts.CLOUD_0. As a result, it attempts to open connections to the keystone-api-proxy (see the sketch below).

A quick scan of the dc code shows multiple other places that should use consts.CLOUD_0 as the region name instead of consts.VIRTUAL_MASTER_CLOUD.
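For context, here is a minimal, hypothetical sketch of the pattern at play. The helper name make_keystone_client, the credential parameters, and the literal region strings are illustrative assumptions, not the actual code in dcorch/common/endpoint_cache.py:

# Hypothetical sketch -- simplified stand-ins for dcorch's consts module and
# the keystone session setup done inside EndpointCache. Not the actual code.
from keystoneauth1 import loading, session
from keystoneclient.v3 import client as ks_client

# Assumed string values for illustration; the two regions resolve to
# different keystone endpoints in the service catalog.
CLOUD_0 = "RegionOne"                       # regular keystone, enabled before dcorch-engine
VIRTUAL_MASTER_CLOUD = "SystemController"   # keystone-api-proxy (port 25000), not yet running at swact


def make_keystone_client(region_name, auth_url, username, password, project_name):
    """Build a keystone client whose catalog endpoint is selected by region_name."""
    auth = loading.get_plugin_loader("password").load_from_options(
        auth_url=auth_url,
        username=username,
        password=password,
        project_name=project_name,
        user_domain_name="Default",
        project_domain_name="Default",
    )
    sess = session.Session(auth=auth)
    # region_name decides which endpoint a later services.list() call hits.
    # Passing VIRTUAL_MASTER_CLOUD routes it to the keystone-api-proxy, which
    # is what produced the ECONNREFUSED traceback in the logs below; CLOUD_0
    # routes it to the RegionOne keystone that is already up.
    return ks_client.Client(session=sess, region_name=region_name)


# Usage (values are placeholders):
# keystone = make_keystone_client(CLOUD_0, "http://keystone.example:5000/v3",
#                                 "admin", "secret", "admin")
# keystone.services.list()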

Severity
--------
Medium

Steps to Reproduce
---------
Install and configure a Distributed Cloud system with an AIO system controller, then perform a host-swact.

Expected Behavior
---------
The host-swact operation should be completed successfully.

Actual Behavior
----------------
The host-swact operation did not complete successfully.

Reproducibility
---------
Yes

System Configuration
---------
Distributed Cloud system w/ AIO system controller

Branch/Pull Time/Commit
---------
stx master 2019-12-05 21:14:11

Last Pass
---------
N/A

Timestamp/Logs
--------------
ERROR oslo_service.service [req-e4bb4f7c-fc06-4fa6-b002-ea58f029bb08 - - - - -] Error starting thread.: ConnectFailure: Unable to establish connection to http://192.168.204.2:25000/v3/services?: HTTPConnectionPool(host='192.168.204.2', port=25000): Max retries exceeded with url: /v3/services (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb9ce561410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service Traceback (most recent call last):
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 796, in run_service
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service service.start()
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/dcorch/engine/service.py", line 103, in start
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service self.init_qm()
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/dcorch/engine/service.py", line 87, in init_qm
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service self.qm = QuotaManager()
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/dcorch/engine/quota_manager.py", line 73, in __init__
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service self.endpoints = endpoint_cache.EndpointCache()
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/dcorch/common/endpoint_cache.py", line 46, in __init__
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service self._update_endpoints()
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/dcorch/common/endpoint_cache.py", line 136, in _update_endpoints
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service endpoint_map = EndpointCache._get_endpoint_from_keystone(self)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/dcorch/common/endpoint_cache.py", line 107, in _get_endpoint_from_keystone
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service for service in self.keystone_client.services.list():
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneclient/v3/services.py", line 93, in list
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service **kwargs)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneclient/base.py", line 86, in func
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service return f(*args, **new_kwargs)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneclient/base.py", line 448, in list
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service list_resp = self._list(url_query, self.collection_key)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneclient/base.py", line 141, in _list
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service resp, body = self.client.get(url, **kwargs)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 375, in get
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service return self.request(url, 'GET', **kwargs)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 534, in request
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 237, in request
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service return self.session.request(url, method, **kwargs)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 835, in request
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service resp = send(**kwargs)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 942, in _send_request
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service raise exceptions.ConnectFailure(msg)
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service ConnectFailure: Unable to establish connection to http://192.168.204.2:25000/v3/services?: HTTPConnectionPool(host='192.168.204.2', port=25000): Max retries exceeded with url: /v3/services (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb9ce561410>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
2019-12-08 18:53:52.907 206380 ERROR oslo_service.service
2019-12-08 18:53:53.061 206380 ERROR dcorch.engine.service [-] Failed to stop engine service: 'NoneType' object has no attribute 'stop'
2019-12-08 18:53:53.062 206380 INFO dcorch.engine.service [-] All threads were gone, terminating engine

Test Activity
-------------
Design test for Story 2006980

Tao Liu (tliu88)
summary: - Distributed Cloud – swact failed due to dcorch-engine not being enable
+ Distributed Cloud – swact failed due to dcorch-engine not being enabled
Ghada Khalil (gkhalil)
tags: added: stx.distcloud
Changed in starlingx:
assignee: nobody → Tao Liu (tliu88)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Requested info from the test team on whether they see this race condition on baremetal distributed cloud systems.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/698605

Changed in starlingx:
status: New → In Progress
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Based on information from Yang Liu (WR test lead), the test team has not seen this issue when executing 30 swacts on a baremetal DC system. This appears to be a race condition seen mostly on vbox and perhaps on slower machines.

Marking for stx.4.0 as this doesn't appear serious enough to be included in an stx.3.0 mtce release.

tags: added: stx.4.0
Changed in starlingx:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/698605
Committed: https://git.openstack.org/cgit/starlingx/distcloud/commit/?id=111cdcc2d8c82345f252703b7b82a72cff124a52
Submitter: Zuul
Branch: master

commit 111cdcc2d8c82345f252703b7b82a72cff124a52
Author: Tao Liu <email address hidden>
Date: Wed Dec 11 18:01:23 2019 -0500

    Fix a swact issue on system controller

    The swact operation failed a few times because dcorch-engine was
    not enabled. At startup, the dcorch-engine starts the QuotaManager,
    which tries to open a keystone client session against the
    keystone-api-proxy. At this time, the keystone-api-proxy is not
    yet running.

    This dependency is introduced by an incorrect region name used in
    the EndpointCache class. The default keystone client session
    should be opened against the keystone in RegionOne; the
    dcorch-engine enabling process already depends on keystone, which
    ensures keystone is enabled first.

    This update corrects the region name in EndpointCache.

    Change-Id: I1da34ed489c2a7bd6cf43889bb9e173f070b3fa8
    Closes-Bug: 1855791
    Signed-off-by: Tao Liu <email address hidden>
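For illustration only, a minimal sketch of the kind of change the commit message describes, assuming a hypothetical class name, constructor signature, and region strings; the real EndpointCache in dcorch/common/endpoint_cache.py and the actual patch at the review link above may differ:

# Hypothetical sketch of the fix described above -- not the merged patch.
CLOUD_0 = "RegionOne"                      # keystone enabled before dcorch-engine
VIRTUAL_MASTER_CLOUD = "SystemController"  # keystone-api-proxy; assumed string value


class EndpointCacheSketch(object):
    # Before (problematic): defaulting to the virtual master cloud region made
    # dcorch-engine depend on the keystone-api-proxy at startup:
    #   def __init__(self, region_name=VIRTUAL_MASTER_CLOUD): ...
    #
    # After: default to the RegionOne keystone, which service management
    # enables before dcorch-engine, so the cache can be built during startup.
    def __init__(self, region_name=CLOUD_0):
        self.region_name = region_name
        self.endpoint_map = {}  # filled in lazily from the keystone catalog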

Changed in starlingx:
status: In Progress → Fix Released