Distributed Cloud Ipv6: subcloud out-of-sync after initial setup

Bug #1859242 reported by Peng Peng on 2020-01-10
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: Dariush Eslimi

Bug Description

Brief Description
-----------------
After the distributed cloud system controller was installed successfully, a subcloud was added. The subcloud was added successfully and shows as online, but out-of-sync.
After unmanage/manage, the subcloud comes back in-sync.

Severity
--------
Major

Steps to Reproduce
------------------
install DC system controller
install subcloud
check subcloud status

TC-name: DC install

Expected Behavior
------------------
subcloud shows in-sync

Actual Behavior
----------------
shows out-of-sync

Reproducibility
---------------
Reproducible
2/4
After 4 subclouds were installed, 2 of them had this issue.

System Configuration
--------------------
Multi-node system

Lab-name: WCP_80-91

Branch/Pull Time/Commit
-----------------------
"2020-01-06_08-04-16"

Last Pass
---------
2019-12-13_19-03-42

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+-------------+
| id | name      | management | availability | deploy status | sync        |
+----+-----------+------------+--------------+---------------+-------------+
| 2  | subcloud1 | managed    | online       | complete      | out-of-sync |
| 3  | subcloud4 | managed    | online       | complete      | in-sync     |
| 4  | subcloud5 | managed    | online       | complete      | in-sync     |
| 5  | subcloud6 | managed    | online       | complete      | out-of-sync |
+----+-----------+------------+--------------+---------------+-------------+
[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud show 2
+-----------------------------+----------------------------+
| Field                       | Value                      |
+-----------------------------+----------------------------+
| id                          | 2                          |
| name                        | subcloud1                  |
| description                 | None                       |
| location                    | None                       |
| software_version            | 20.01                      |
| management                  | managed                    |
| availability                | online                     |
| deploy_status               | complete                   |
| management_subnet           | fd01:2::0/64               |
| management_start_ip         | fd01:2::2                  |
| management_end_ip           | fd01:2::11                 |
| management_gateway_ip       | fd01:2::1                  |
| systemcontroller_gateway_ip | fd01:1::1                  |
| created_at                  | 2020-01-09 16:23:50.672338 |
| updated_at                  | 2020-01-09 17:29:37.566852 |
| identity_sync_status        | unknown                    |
| patching_sync_status        | in-sync                    |
| platform_sync_status        | unknown                    |
+-----------------------------+----------------------------+
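
To spot affected subclouds without reading the table by eye, the `dcmanager subcloud list` output above can be parsed with a few lines of Python. This is only a sketch: `parse_cli_table` and `out_of_sync` are hypothetical helper names, and the sample text is copied from the log in this report rather than produced by calling dcmanager.

```python
def parse_cli_table(text):
    """Parse a '+---+' bordered CLI table into a list of dicts keyed by header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def out_of_sync(rows):
    """Return the names of subclouds whose sync column is not 'in-sync'."""
    return [row["name"] for row in rows if row.get("sync") != "in-sync"]


# Sample copied from the session output in this report.
SAMPLE = """\
+----+-----------+------------+--------------+---------------+-------------+
| id | name | management | availability | deploy status | sync |
+----+-----------+------------+--------------+---------------+-------------+
| 2 | subcloud1 | managed | online | complete | out-of-sync |
| 3 | subcloud4 | managed | online | complete | in-sync |
| 4 | subcloud5 | managed | online | complete | in-sync |
| 5 | subcloud6 | managed | online | complete | out-of-sync |
+----+-----------+------------+--------------+---------------+-------------+
"""

if __name__ == "__main__":
    print(out_of_sync(parse_cli_table(SAMPLE)))
```

The subclouds it reports are the candidates for the unmanage/manage workaround described in the Brief Description.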

Test Activity
-------------
install

Peng Peng (ppeng) on 2020-01-10
description: updated
Tao Liu (tliu88) wrote :

I took a look at the lab and found that the out-of-sync state of both subcloud 1 and subcloud 6 was caused by the same error.

dcmanager failed to notify dcorch about the subcloud state change because dcorch raised an exception. The exception originated in the dbsyncclient with an 'Unauthorized request' error.

There is no retry for the RPC notification, and no further subcloud state change occurred to trigger a new notification. As a result, subclouds 1 and 6 remained offline in dcorch and their sync states were not updated.

After unmanaging and re-managing subclouds 1 and 6, the system recovered.

2020-01-09 17:29:37.444 115738 INFO dcmanager.manager.subcloud_audit_manager [-] Setting new availability status: online on subcloud: subcloud1
2020-01-09 17:29:37.970 115738 ERROR dcmanager.manager.subcloud_audit_manager [-] Remote error: Unauthorized Unauthorized request.
2020-01-09 17:29:37.973 115738 WARNING dcmanager.manager.subcloud_audit_manager [-] Problem informing dcorch of subcloud state change, subcloud: subcloud1: RemoteError: Remote error: Unauthorized Unauthorized request.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
    res = self.dispatcher.dispatch(message)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/service.py", line 48, in wrapped
    return func(self, ctx, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/service.py", line 204, in update_subcloud_states
    self.gsm.initial_sync(ctxt, subcloud_name)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/generic_sync_manager.py", line 116, in initial_sync
    subcloud_engine.initial_sync()
  File "/usr/lib/python2.7/site-packages/dcorch/engine/subcloud.py", line 138, in initial_sync
    thread.initial_sync()
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 239, in initial_sync
    consts.RESOURCE_TYPE_IDENTITY_USERS)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 1525, in get_subcloud_resources
    self.sc_dbs_client)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 1095, in _get_resource_audit_handler
    return self._get_users_resource(client.identity_manager)
  File "/usr/lib/python2.7/site-packages/dcorch/engine/sync_services/identity.py", line 1136, in _get_users_resource
    users = client.list_users()
  File "/usr/lib/python2.7/site-packages/dcdbsync/dbsyncclient/v1/identity/identity_manager.py", line 192, in list_users
    return self.users_list(url)
  File "/usr/lib/python2.7/site-packages/dcdbsync/dbsyncclient/v1/identity/identity_manager.py", line 108, in users_list
    raise exceptions.Unauthorized('Unauthorized request.')
Unauthorized: Unauthorized request.
.: RemoteError: Rem...
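
For illustration only, the missing retry could look something like the sketch below. It is not the dcmanager code and not a proposed patch: `notify_with_retry` and the `notify` callable are hypothetical stand-ins for the dcmanager-to-dcorch state-change RPC, and the attempt count and delays are arbitrary assumptions.

```python
import time


def notify_with_retry(notify, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call notify() until it succeeds or the attempts run out.

    notify is a hypothetical zero-argument callable standing in for the
    state-change notification; it is not a real dcmanager API.
    Returns True on success, False if every attempt raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            notify()
            return True
        except Exception:  # real code would catch only the RPC error type
            if attempt == attempts:
                return False  # give up; the caller decides what to do next
            sleep(delay)      # wait before the next attempt
            delay *= backoff  # grow the wait each time (exponential backoff)
    return False
```

With something along these lines, a transient 'Unauthorized request' during initial sync would be retried instead of leaving the subcloud offline in dcorch until the next state change. Whether a retry, a periodic re-audit, or a fix to the authorization failure itself is the right approach is not settled in this report.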


Ghada Khalil (gkhalil) on 2020-01-13
tags: added: stx.distcloud
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority as there is a workaround for this issue

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.4.0
Changed in starlingx:
assignee: nobody → Dariush Eslimi (deslimi)
Ghada Khalil (gkhalil) wrote :

Assigning to DC PL for next steps

Yang Liu (yliu12) on 2020-01-17
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) 18 hours ago
summary: - Distributed Cloud Ipv6: subcloud out-of-sync after initial steup
+ Distributed Cloud Ipv6: subcloud out-of-sync after initial setup