Distributed Cloud: Initial keystone sync does not handle failures correctly
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Dan Voiculeasa |
Bug Description
Brief Description
-----------------
The initial sync code for keystone (identity) does not handle failures properly. For example, if an initial sync is triggered but the request to the subcloud to get the users fails with “Unauthorized request”, the initial sync is reported as passed without any data actually being synced. This delays the availability of the subcloud, which may go offline due to unsynced identity data.
Severity
--------
Major: Subcloud seems to recover after some time.
Steps to Reproduce
------------------
This happens regularly when a subcloud is installed and managed.
Expected Behavior
------------------
If the initial keystone sync fails for any reason, the sync should indicate the failure so the dcorch code can re-attempt the initial sync until it is successful.
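The expected behavior can be sketched roughly as follows. This is a minimal illustration, not the dcorch implementation: `InitialSyncError`, `run_initial_sync`, and the `sync_once` callable are hypothetical names, and the attempt limit is an assumption added so the example terminates (the report asks for retries until success).

```python
import time


class InitialSyncError(Exception):
    """Hypothetical error raised when the initial sync cannot complete."""


def run_initial_sync(sync_once, max_attempts=5, delay_seconds=0.0):
    """Call sync_once() until it succeeds and return the attempts used.

    sync_once is a hypothetical callable that raises InitialSyncError
    on any failure instead of returning normally.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            sync_once()
            return attempt
        except InitialSyncError:
            # Re-raise on the last attempt so the caller still sees the failure.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

Because the failure is raised rather than swallowed, the caller can decide whether to retry, back off, or leave the subcloud marked as not yet synced.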
Actual Behavior
----------------
Failures in the initial keystone sync are silent.
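To illustrate how a failure of this kind can stay silent, here is a small sketch of the general anti-pattern. It is not taken from the dcorch code; `Unauthorized`, `silent_initial_sync`, and `caller` are hypothetical names chosen for the example.

```python
class Unauthorized(Exception):
    """Hypothetical stand-in for an 'Unauthorized request' error."""


def fetch_users_unauthorized():
    """Simulate a subcloud request that is rejected."""
    raise Unauthorized("Unauthorized request")


def silent_initial_sync(fetch_users):
    """Return the fetched users, hiding fetch errors (the anti-pattern)."""
    try:
        users = fetch_users()
    except Unauthorized:
        # The failure is swallowed: to the caller this looks like
        # "nothing to sync" rather than "sync failed".
        return []
    return list(users)


def caller(fetch_users):
    """Treat any normal return as success, as a naive caller would."""
    silent_initial_sync(fetch_users)
    return "initial sync complete"
```

With this shape, `caller(fetch_users_unauthorized)` reports completion even though no identity data was read from the subcloud.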
Reproducibility
---------------
Reproducible
System Configuration
-------
Distributed cloud
Branch/Pull Time/Commit
-------
Designer load built from master on February 4, 2020.
Last Pass
---------
Unknown
Timestamp/Logs
--------------
The following logs show the initial sync passing when it should have failed:
2020-02-11 15:24:26.036 1779411 INFO dcorch.
2020-02-11 15:24:27.023 1779411 INFO dcorch.
2020-02-11 15:24:28.018 1779411 INFO dcorch.
2020-02-11 15:24:28.727 1779411 INFO dcorch.
2020-02-11 15:24:30.042 1779411 INFO dcorch.
2020-02-11 15:24:30.181 1779411 INFO dcorch.
2020-02-11 15:24:30.182 1779411 INFO dcorch.
2020-02-11 15:24:31.358 1779411 INFO dcorch.
2020-02-11 15:24:31.454 1779411 WARNING cgtsclient.
2020-02-11 15:24:31.455 1779411 INFO dcorch.
2020-02-11 15:24:31.475 1779411 INFO dcorch.
2020-02-11 15:24:31.476 1779411 INFO dcorch.
2020-02-11 15:24:31.611 1779411 INFO dcorch.
2020-02-11 15:24:31.611 1779411 INFO dcorch.
2020-02-11 15:24:32.680 1779411 INFO dcorch.
2020-02-11 15:24:33.193 1779411 INFO dcorch.
2020-02-11 15:24:33.196 1779411 INFO dcorch.
2020-02-11 15:24:33.697 1779411 INFO dcorch.
2020-02-11 15:24:33.699 1779411 INFO dcorch.
2020-02-11 15:24:33.712 1779411 INFO dcorch.
In this case, I think the bug is in get_subcloud_
Test Activity
-------------
Developer Testing
Workaround
----------
Wait for audit to correctly sync keystone data. I am not sure that this will resolve the issue in all cases.
Changed in starlingx:
assignee: Tao Liu (tliu88) → Dan Voiculeasa (dvoicule)
Changed in starlingx:
status: In Progress → Fix Released
stx.4.0 / medium priority - issue with handling of failure condition