DC subcloud group restore: Keystone token expired for subclouds that were in the queue for more than an hour
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Victor Romano |
Bug Description
Brief Description
DC subcloud group restore: the Keystone token expired for subclouds that were still waiting to be restored after the first 100 subclouds had been restored.
CMD: 2023-11-
Number of subclouds under testing:
dcmanager subcloud-group list-subclouds upgrade-group | grep subcloud | wc -l
106
Severity
<Major: System/Feature is usable but degraded>
Note:
Only subcloud group restores with more than 100 subclouds are affected.
The issue is only reproducible if restoring the first 100 subclouds takes more than one hour. It is more likely to reproduce for subclouds running the previous release, 21.12, than for 22.12, because the restore operation for 21.12 is much slower.
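The failure mode in this note can be modeled with a small scheduling simulation. This is a sketch under stated assumptions, not dcmanager code: it assumes that at most 100 subcloud restores run at once (inferred from the "first 100" behavior described above), that a single token is obtained when the group restore starts, and that the token has Keystone's default one-hour lifetime. `expired_at_start` is a hypothetical name.

```python
import heapq
from typing import List, Sequence

# Assumptions for illustration only (not taken from dcmanager):
TOKEN_LIFETIME_S = 3600  # Keystone's default [token] expiration, in seconds
MAX_PARALLEL = 100       # assumed number of subclouds restored concurrently


def expired_at_start(durations: Sequence[float],
                     token_lifetime: float = TOKEN_LIFETIME_S,
                     max_parallel: int = MAX_PARALLEL) -> List[int]:
    """Return the queue positions of subclouds whose restore would start
    after a token issued at t=0 has already expired.

    durations: restore time in seconds of each subcloud, in queue order.
    """
    finish_times: List[float] = []  # min-heap of finish times of running restores
    expired: List[int] = []
    for index, duration in enumerate(durations):
        if len(finish_times) < max_parallel:
            start = 0.0  # a free slot is available immediately
        else:
            # Queued: wait until the earliest running restore finishes.
            start = heapq.heappop(finish_times)
        if start >= token_lifetime:
            expired.append(index)
        heapq.heappush(finish_times, start + duration)
    return expired
```

With 106 subclouds that each take 70 minutes, `expired_at_start([4200] * 106)` returns `[100, 101, 102, 103, 104, 105]`: the last six only get a slot at 4200 s, after the token has expired. With 50-minute restores, `expired_at_start([3000] * 106)` returns `[]`, because a slot frees up at 3000 s. Under these assumptions, the queued subclouds fail only when every one of the first 100 restores runs longer than the token lifetime.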
The issue was reproduced twice. Here are the log timestamps from the other test run:
restore command timestamp:
2023-11-
Keystone-related exception:
2023-11-25 00:09:29.257 151476 INFO dcmanager.
2023-11-25 00:09:29.270 151476 INFO dccommon.
2023-11-25 00:09:31.193 151476 INFO dccommon.
2023-11-25 00:09:31.308 151476 ERROR dcmanager.
2023-11-25 00:09:31.323 151476 ERROR dcmanager.
Steps to Reproduce
Deploy the system controller and subclouds with 21.12
Upgrade the system controller
Create backups of the subclouds
Upgrade the subclouds to 22.12
Restore the subclouds back to 21.12
Expected Behavior
All subclouds should be restored successfully
Actual Behavior
The subclouds beyond the first 100 failed because their Keystone token could not be validated (it had expired).
Reproducibility
100%
2 out of 2 attempts
System Configuration
DC
Load info (eg: 2022-03-
cat /etc/build.info
SW_VERSION="22.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
SRC_BUILD_
BUILD_BY="jenkins"
BUILD_NUMBER="50"
BUILD_HOST=
BUILD_DATE=
Last Pass
This issue only reproduces when every one of the first 100 subclouds takes more than one hour to restore.
Alarms
no alarms
Test Activity
Regression Testing
Workaround
Limit each subcloud group used for the restore operation to a maximum of 100 subclouds.
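Applying the workaround comes down to partitioning the subcloud list before creating the groups. A minimal sketch, assuming the subcloud names are already available as a Python list (for example, collected from `dcmanager subcloud list` output). `split_into_groups` is a hypothetical helper; creating the groups and assigning subclouds to them is still done with dcmanager and is not shown.

```python
from typing import List, Sequence


def split_into_groups(subclouds: Sequence[str], max_size: int = 100) -> List[List[str]]:
    """Split subcloud names into consecutive groups of at most max_size.

    max_size defaults to 100, the limit suggested by the workaround.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [list(subclouds[i:i + max_size])
            for i in range(0, len(subclouds), max_size)]
```

For the 106 subclouds in this report, `split_into_groups(names)` yields one group of 100 subclouds and one group of 6.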
Changed in starlingx: | |
importance: | Undecided → Medium |
tags: | added: stx.9.0 stx.distcloud |
Changed in starlingx: | |
assignee: | nobody → Victor Romano (vgluzrom) |
Fix proposed to branch: master (review.opendev.org/c/starlingx/distcloud/+/902140)
Review: https:/
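The report does not describe what the proposed fix changes. For readers unfamiliar with this class of failure, a common mitigation is to request a new token shortly before the cached one expires, rather than reusing one token for the whole group operation. The sketch below is a hypothetical, library-independent helper (`Token`, `RefreshingTokenCache`, the `fetch` callable, and the 300-second margin are all assumptions), not the distcloud change in review 902140.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # absolute expiry time, in seconds on the cache's clock


class RefreshingTokenCache:
    """Hypothetical helper that fetches a new token when the cached one is
    missing or within margin_s seconds of expiring."""

    def __init__(self, fetch: Callable[[], Token], margin_s: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch        # e.g. a call that authenticates against Keystone
        self._margin_s = margin_s
        self._clock = clock
        self._token: Optional[Token] = None
        self._lock = threading.Lock()  # parallel restores share one cache safely

    def get(self) -> str:
        with self._lock:
            now = self._clock()
            if self._token is None or now >= self._token.expires_at - self._margin_s:
                self._token = self._fetch()
            return self._token.value
```

Each queued subcloud would call `get()` right before it starts, so a subcloud that waits more than an hour still receives a valid token. Refreshing a few minutes early avoids a token expiring in the middle of a request; the exact margin is a tuning choice.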