Distributed Cloud Ipv6: subcloud out-of-sync after initial setup

Bug #1859242 reported by Peng Peng
This bug affects 1 person

Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Yang Liu

Bug Description

Brief Description
-----------------
After the Distributed Cloud system controller was installed successfully, subclouds were added. Each subcloud was added successfully and shows online, but some show out-of-sync.
After an unmanage/manage cycle, the subcloud comes back in-sync.

Severity
--------
Major

Steps to Reproduce
------------------
Install the DC system controller.
Install the subclouds.
Check the subcloud status.

TC-name: DC install

Expected Behavior
------------------
Subcloud shows in-sync.

Actual Behavior
----------------
Subcloud shows out-of-sync.

Reproducibility
---------------
Reproducible
2/4
After 4 subclouds were installed, 2 of them had this issue.

System Configuration
--------------------
Multi-node system

Lab-name: WCP_80-91

Branch/Pull Time/Commit
-----------------------
"2020-01-06_08-04-16"

Last Pass
---------
2019-12-13_19-03-42

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+-------------+
| id | name      | management | availability | deploy status | sync        |
+----+-----------+------------+--------------+---------------+-------------+
| 2  | subcloud1 | managed    | online       | complete      | out-of-sync |
| 3  | subcloud4 | managed    | online       | complete      | in-sync     |
| 4  | subcloud5 | managed    | online       | complete      | in-sync     |
| 5  | subcloud6 | managed    | online       | complete      | out-of-sync |
+----+-----------+------------+--------------+---------------+-------------+
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud show 2
+-----------------------------+----------------------------+
| Field                       | Value                      |
+-----------------------------+----------------------------+
| id                          | 2                          |
| name                        | subcloud1                  |
| description                 | None                       |
| location                    | None                       |
| software_version            | 20.01                      |
| management                  | managed                    |
| availability                | online                     |
| deploy_status               | complete                   |
| management_subnet           | fd01:2::0/64               |
| management_start_ip         | fd01:2::2                  |
| management_end_ip           | fd01:2::11                 |
| management_gateway_ip       | fd01:2::1                  |
| systemcontroller_gateway_ip | fd01:1::1                  |
| created_at                  | 2020-01-09 16:23:50.672338 |
| updated_at                  | 2020-01-09 17:29:37.566852 |
| identity_sync_status        | unknown                    |
| patching_sync_status        | in-sync                    |
| platform_sync_status        | unknown                    |
+-----------------------------+----------------------------+
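Checking for this symptom across many subclouds can be scripted. The following is a minimal Python sketch, assuming the `dcmanager subcloud list` output keeps the column layout shown above (`name` and `sync` headers). The function names are hypothetical and not part of dcmanager. It also builds, without running, the unmanage/manage command pairs for the workaround described in this report; verify the exact subcommand syntax against your StarlingX release before using them.

```python
"""Hypothetical helper (not part of dcmanager): find out-of-sync subclouds."""
from typing import Dict, List


def _cells(line: str) -> List[str]:
    # "| a | b |" -> ["a", "b"]
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_subcloud_table(text: str) -> List[Dict[str, str]]:
    """Turn an ASCII table like the one `dcmanager subcloud list` prints into row dicts."""
    # Keep header/data lines only; skip "+----+" borders and the shell prompt.
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    header = _cells(rows[0])
    return [dict(zip(header, _cells(line))) for line in rows[1:]]


def out_of_sync(rows: List[Dict[str, str]]) -> List[str]:
    """Names of subclouds whose overall sync column reads 'out-of-sync'."""
    return [row["name"] for row in rows if row.get("sync") == "out-of-sync"]


def workaround_commands(names: List[str]) -> List[List[str]]:
    """Unmanage/manage command pairs (built only, never executed here)."""
    commands = []
    for name in names:
        commands.append(["dcmanager", "subcloud", "unmanage", name])
        commands.append(["dcmanager", "subcloud", "manage", name])
    return commands


if __name__ == "__main__":
    sample = """\
| id | name      | management | availability | deploy status | sync        |
| 2  | subcloud1 | managed    | online       | complete      | out-of-sync |
| 3  | subcloud4 | managed    | online       | complete      | in-sync     |
| 5  | subcloud6 | managed    | online       | complete      | out-of-sync |"""
    stale = out_of_sync(parse_subcloud_table(sample))
    print(stale)
    for cmd in workaround_commands(stale):
        print(" ".join(cmd))
```

Fed the list output from this report, it flags subcloud1 and subcloud6. Only the overall `sync` column is inspected; per-endpoint values such as `identity_sync_status` appear only in `dcmanager subcloud show` output.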

Test Activity
-------------
install

Peng Peng (ppeng)
description: updated
description: updated
Tao Liu (tliu88) wrote :

I took a look at the lab and found that subcloud1 and subcloud6 were both out-of-sync because of the same error.

The dcmanager failed to notify the dcorch of the subcloud state change because the dcorch raised an exception. The exception originated in the dbsyncclient, which rejected the request with ‘Unauthorized request’.

There is no retry for the RPC notification, and no further subcloud state change occurred to trigger a new notification. As a result, subcloud1 and subcloud6 remained offline in the dcorch, and their sync states were not updated.

After unmanaging and re-managing subcloud1 and subcloud6, the system recovered.
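The failure mode described above is an edge-triggered notification with no retry: once the single RPC call fails, nothing re-sends it until some other state change occurs. The toy Python model below illustrates that under stated assumptions. All classes and functions are hypothetical; `update_subcloud_states` only borrows its name from the traceback, with an invented body. The transient auth failure is an assumption consistent with the unmanage/manage recovery, and the retry variant is one possible mitigation, not necessarily what changed upstream.

```python
"""Toy model of the lost notification; none of this is dcmanager or dcorch code."""
from typing import Dict


class Unauthorized(Exception):
    """Stands in for the dbsyncclient 'Unauthorized request.' error."""


class ToyOrchestrator:
    """Hypothetical stand-in for dcorch: remembers the availability it was told."""

    def __init__(self, failures_before_success: int) -> None:
        self.failures_left = failures_before_success
        self.availability: Dict[str, str] = {}

    def update_subcloud_states(self, name: str, availability: str) -> None:
        # Assumption: the auth failure is transient and clears after N calls.
        if self.failures_left > 0:
            self.failures_left -= 1
            raise Unauthorized("Unauthorized request.")
        self.availability[name] = availability


def notify_once(orch: ToyOrchestrator, name: str, availability: str) -> bool:
    """One attempt per state change; a failure is only logged, never resent."""
    try:
        orch.update_subcloud_states(name, availability)
        return True
    except Unauthorized:
        return False


def notify_with_retry(orch: ToyOrchestrator, name: str, availability: str,
                      attempts: int = 3) -> bool:
    """One possible mitigation: retry the same notification a bounded number of times."""
    for _ in range(attempts):
        if notify_once(orch, name, availability):
            return True
    return False


if __name__ == "__main__":
    orch = ToyOrchestrator(failures_before_success=1)
    notify_once(orch, "subcloud1", "online")              # audit sees "online"; call fails
    print(orch.availability.get("subcloud1", "offline"))  # stays "offline" in the toy dcorch
    notify_once(orch, "subcloud1", "online")              # a later trigger, e.g. manage
    print(orch.availability["subcloud1"])                 # now "online"
```

With one-shot delivery, the toy orchestrator keeps its stale "offline" view until a second trigger arrives, which mirrors why the unmanage/manage cycle recovered the subclouds. A bounded retry closes that window only if the underlying error clears within the retry budget.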

2020-01-09 17:29:37.444 115738 INFO dcmanager.manager.subcloud_audit_manager [-] Setting new availability status: online on subcloud: subcloud1
2020-01-09 17:29:37.970 115738 ERROR dcmanager.manager.subcloud_audit_manager [-] Remote error: Unauthorized Unauthorized request.
2020-01-09 17:29:37.973 115738 WARNING dcmanager.manager.subcloud_audit_manager [-] Problem informing dcorch of subcloud state change, subcloud: subcloud1: RemoteError: Remote error: Unauthorized Unauthorized request.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
    res = self.dispatcher.dispatch(message)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/service.py", line 48, in wrapped
    return func(self, ctx, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/service.py", line 204, in update_subcloud_states
    self.gsm.initial_sync(ctxt, subcloud_name)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/generic_sync_manager.py", line 116, in initial_sync
    subcloud_engine.initial_sync()
  File "/usr/lib/python2.7/site-packages/dcorch/engine/subcloud.py", line 138, in initial_sync
    thread.initial_sync()
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 239, in initial_sync
    consts.RESOURCE_TYPE_IDENTITY_USERS)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 1525, in get_subcloud_resources
    self.sc_dbs_client)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 1095, in _get_resource_audit_handler
    return self._get_users_resource(client.identity_manager)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 1136, in _get_users_resource
    users = client.list_users()
  File "/usr/lib/python2.7/site-packages/dcdbsync/dbsyncclient/v1/identity/identity_manager.py", line 192, in list_users
    return self.users_list(url)
  File "/usr/lib/python2.7/site-packages/dcdbsync/dbsyncclient/v1/identity/identity_manager.py", line 108, in users_list
    raise exceptions.Unauthorized('Unauthorized request.')
Unauthorized: Unauthorized request.
.: RemoteError: Rem...


Ghada Khalil (gkhalil)
tags: added: stx.distcloud
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority as there is a workaround for this issue

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.4.0
Changed in starlingx:
assignee: nobody → Dariush Eslimi (deslimi)
Ghada Khalil (gkhalil) wrote :

Assigning to DC PL for next steps

Yang Liu (yliu12)
tags: added: stx.retestneeded
Ghada Khalil (gkhalil)
summary: - Distributed Cloud Ipv6: subcloud out-of-sync after initial steup
+ Distributed Cloud Ipv6: subcloud out-of-sync after initial setup
Dariush Eslimi (deslimi) wrote :

There have been many improvements in this area; please retest.

Changed in starlingx:
status: Triaged → Fix Released
assignee: Dariush Eslimi (deslimi) → Yang Liu (yliu12)
Peng Peng (ppeng) wrote :

We did not see this issue on:
Lab: DC-3
Load: 2020-03-25_21-02-05

tags: removed: stx.retestneeded