Orchestrated subcloud upgrade failed at unlocking host step
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Gabriel Silva Trevisan |
Bug Description
Brief Description
-----------------
While performing the unlock_simplex step during a subcloud upgrade, the subcloud platform endpoints became unavailable, causing the step to fail.
Retrying the requests on the host unlock step may be required to work around the flakiness of platform services post data migration.
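
The retry idea described above can be sketched as follows. This is a minimal illustration, not the code from the actual fix: `retry_call`, its parameters, and the default values are assumptions chosen for the example, and the callable passed in stands for whatever request the unlock step makes against the subcloud.

```python
import time


def retry_call(func, max_attempts=5, delay_seconds=1.0,
               retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Return immediately on the first successful call.
            return func()
        except retry_on as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                sleep(delay_seconds)

    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A caller would wrap the request that retrieves host data, for example `retry_call(lambda: client.get_host(name), retry_on=(ConnectionError,))`, where `client.get_host` is a placeholder for the real call. Restricting `retry_on` to transient error types keeps genuine failures from being retried.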
Severity
--------
Major
Steps to Reproduce
------------------
Prepare a distributed cloud system for orchestrated subcloud upgrade
Create and apply an upgrade strategy for a large number of simplex subclouds in parallel
Expected Behavior
------------------
The upgrade strategy completes successfully.
Actual Behavior
----------------
The upgrade failed for one subcloud at the unlocking-host step because host data could not be retrieved from the subcloud.
Reproducibility
---------------
Seen once
System Configuration
-------
Distributed cloud
Branch/Pull Time/Commit
-------
2022-01-19 18:42:01
Last Pass
---------
Not sure if an upgrade test with a large number of subclouds was previously performed for the stx/6.0 load.
Timestamp/Logs
--------------
2022-03-10 21:10:16.655 3236316 INFO dcmanager.
2022-03-10 21:10:16.667 3236316 INFO dcmanager.
2022-03-10 21:10:17.167 3236316 ERROR dccommon.
2022-03-10 21:10:17.170 3236316 WARNING dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
2022-03-10 21:10:17.171 3236316 ERROR dcmanager.
Test Activity
-------------
Feature Testing
Workaround
----------
Delete the failed strategy, then create a new one and apply it.
description: | updated |
Changed in starlingx: | |
assignee: | nobody → Gabriel Silva Trevisan (g-trevisan) |
Changed in starlingx: | |
importance: | Undecided → Medium |
tags: | added: stx.7.0 |
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/834067