2020-07-01 00:28:09 |
Tee Ngo |
description |
Brief Description
-----------------
During a subcloud upgrade, a failure can occur in the remote install or data migration step for reasons such as misconfiguration or a temporary network glitch. If this occurs, the subcloud deploy_status is set to install-failed or data-migration-failed, respectively. As a result of this status change, the subcloud audit is skipped and the subcloud can never be deleted.
Severity
--------
Critical
Steps to Reproduce
------------------
With the system controller running N+1 load and at least one subcloud running N load
- Create a subcloud upgrade strategy using the command "dcmanager upgrade-strategy create <subcloud-name>"
- Apply the upgrade strategy using the command "dcmanager upgrade-strategy apply"
- Induce a failure during the upgrade simplex step by temporarily removing the route to the subcloud bootstrap IP
- View the subcloud detail using the command "dcmanager subcloud show <subcloud-name>"
Expected Behavior
------------------
The affected subcloud should be listed as "offline".
Actual Behavior
----------------
The affected subcloud is always listed as "online" because the dcmanager subcloud audit skips any subcloud whose deploy_status is not 'deploy-failed', 'deploying' or 'complete'.
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Distributed Cloud
Branch/Pull Time/Commit
-----------------------
Jun 30th master build
Last Pass
---------
I don't think there is an existing test case for this specific scenario.
Timestamp/Logs
--------------
N/A, there are no error logs. This is a design oversight.
Test Activity
-------------
Developer Testing
Workaround
----------
Change the deploy_status of the affected subcloud to 'complete' in the database, then wait up to 20s for the subcloud audit to resume auditing the affected subcloud. |
Brief Description
-----------------
During a subcloud upgrade, a failure can occur in the remote install or data migration step for reasons such as misconfiguration or a temporary network glitch. If this occurs, the subcloud deploy_status is set to install-failed or data-migration-failed, respectively. As a result of this status change, the subcloud audit is skipped and the subcloud can never be deleted.
Severity
--------
Major
Steps to Reproduce
------------------
With the system controller running N+1 load and at least one subcloud running N load
- Create a subcloud upgrade strategy using the command "dcmanager upgrade-strategy create <subcloud-name>"
- Apply the upgrade strategy using the command "dcmanager upgrade-strategy apply"
- Induce a failure during the upgrade simplex step by temporarily removing the route to the subcloud bootstrap IP
- View the subcloud detail using the command "dcmanager subcloud show <subcloud-name>"
Expected Behavior
------------------
The affected subcloud should be listed as "offline".
Actual Behavior
----------------
The affected subcloud is always listed as "online" because the dcmanager subcloud audit skips any subcloud whose deploy_status is not 'deploy-failed', 'deploying' or 'complete'.
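To illustrate the behavior described above, here is a minimal sketch of an audit pass that skips subclouds by deploy_status. It is not the dcmanager source: the names AUDITED_DEPLOY_STATUSES, is_audited and audit_pass, and the dictionary layout, are hypothetical and chosen only for illustration. The three audited statuses are the ones listed in this report.

```python
# Illustrative sketch only. All names here are hypothetical and are
# not taken from the dcmanager code base.

# Statuses that the audit is reported to process (from this bug report).
AUDITED_DEPLOY_STATUSES = {"deploy-failed", "deploying", "complete"}


def is_audited(deploy_status):
    """Return True if a subcloud with this deploy_status would be audited."""
    return deploy_status in AUDITED_DEPLOY_STATUSES


def audit_pass(subclouds, reachable):
    """Return the availability of each subcloud after one audit pass.

    subclouds: dict of name -> {"deploy_status": str, "availability": str}
    reachable: set of subcloud names that currently respond.
    A skipped subcloud keeps its previous (possibly stale) availability.
    """
    result = {}
    for name, info in subclouds.items():
        if not is_audited(info["deploy_status"]):
            # Skipped: availability is never refreshed.
            result[name] = info["availability"]
            continue
        result[name] = "online" if name in reachable else "offline"
    return result


subclouds = {
    "subcloud1": {"deploy_status": "install-failed", "availability": "online"},
    "subcloud2": {"deploy_status": "complete", "availability": "online"},
}
# Neither subcloud is reachable, but only subcloud2 is marked offline.
after = audit_pass(subclouds, reachable=set())
```

In this sketch, subcloud1 stays "online" because its install-failed status removes it from the audit, which mirrors the stale state reported here. Under the same assumption, adding the new failure statuses to the audited set would let the audit mark it offline.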
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Distributed Cloud
Branch/Pull Time/Commit
-----------------------
Jun 30th master build
Last Pass
---------
I don't think there is an existing test case for this specific scenario.
Timestamp/Logs
--------------
N/A, there are no error logs. This is a design oversight.
Test Activity
-------------
Developer Testing
Workaround
----------
Change the deploy_status of the affected subcloud to 'complete' in the database, then wait up to 20s for the subcloud audit to resume auditing the affected subcloud. |
|