Activity log for bug #1885817

Date Who What changed Old value New value Message
2020-07-01 00:22:53 Tee Ngo bug added bug
2020-07-01 00:28:09 Tee Ngo description
Old value:

Brief Description
-----------------
During subcloud upgrade, a failure can occur during the remote install or data migration step for reasons such as misconfiguration or a temporary network glitch. If this occurs, the subcloud deploy_status is set to install-failed or data-migration-failed respectively. This status change results in subcloud audit being skipped and the subcloud can never be deleted.

Severity
--------
Critical

Steps to Reproduce
------------------
With the system controller running N+1 load and at least one subcloud running N load
- Create a subcloud upgrade strategy using the command "dcmanager upgrade-strategy create <subcloud-name>"
- Apply the upgrade strategy using the command "dcmanager upgrade-strategy apply"
- Induce a failure during upgrade simplex step by temporarily removing route to the subcloud bootstrap IP
- View the subcloud detail using the command "dcmanager subcloud show <subcloud-name>"

Expected Behavior
------------------
The affected subcloud should be listed as "offline".

Actual Behavior
----------------
The affected subcloud would always be listed as "online" as dcmanager subcloud audit skips auditing any subclouds with deploy_status not equal to 'deploy-failed', 'deploying' or 'complete' status.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Jun 30th master build

Last Pass
---------
I don't think there is an existing test case for this specific scenario.

Timestamp/Logs
--------------
N/A, there are no error logs. This is a design oversight.

Test Activity
-------------
Developer Testing

Workaround
----------
Change the deploy_status of the affected subcloud to 'complete' in the database, wait up to 20s for the subcloud audit to resume auditing the affected subcloud.

New value:

Brief Description
-----------------
During subcloud upgrade, a failure can occur during the remote install or data migration step for reasons such as misconfiguration or a temporary network glitch. If this occurs, the subcloud deploy_status is set to install-failed or data-migration-failed respectively. This status change results in subcloud audit being skipped and the subcloud can never be deleted.

Severity
--------
Major

Steps to Reproduce
------------------
With the system controller running N+1 load and at least one subcloud running N load
- Create a subcloud upgrade strategy using the command "dcmanager upgrade-strategy create <subcloud-name>"
- Apply the upgrade strategy using the command "dcmanager upgrade-strategy apply"
- Induce a failure during upgrade simplex step by temporarily removing route to the subcloud bootstrap IP
- View the subcloud detail using the command "dcmanager subcloud show <subcloud-name>"

Expected Behavior
------------------
The affected subcloud should be listed as "offline".

Actual Behavior
----------------
The affected subcloud would always be listed as "online" as dcmanager subcloud audit skips auditing any subclouds with deploy_status not equal to 'deploy-failed', 'deploying' or 'complete' status.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Jun 30th master build

Last Pass
---------
I don't think there is an existing test case for this specific scenario.

Timestamp/Logs
--------------
N/A, there are no error logs. This is a design oversight.

Test Activity
-------------
Developer Testing

Workaround
----------
Change the deploy_status of the affected subcloud to 'complete' in the database, wait up to 20s for the subcloud audit to resume auditing the affected subcloud.
2020-07-01 00:52:03 Ghada Khalil tags stx.distcloud stx.update
2020-07-02 11:08:32 Bart Wensley bug added subscriber Bart Wensley
2020-07-06 14:24:23 Bart Wensley starlingx: assignee Al Bailey (albailey1974)
2020-07-07 17:25:10 OpenStack Infra starlingx: status New In Progress
2020-07-08 15:36:23 Ghada Khalil bug added subscriber Allain Legacy
2020-07-08 21:19:58 OpenStack Infra starlingx: status In Progress Fix Released
2020-07-13 22:08:38 Ghada Khalil tags stx.distcloud stx.update stx.5.0 stx.distcloud stx.update
2020-07-13 22:09:20 Ghada Khalil starlingx: importance Undecided Medium
2021-06-16 12:25:49 OpenStack Infra tags stx.5.0 stx.distcloud stx.update in-f-centos8 stx.5.0 stx.distcloud stx.update
2021-06-16 12:25:50 OpenStack Infra bug watch added https://github.com/kubernetes-client/python/issues/765
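
Illustration of the audit filter described in the bug description. This is a minimal sketch of the reported behavior, not the dcmanager source: the names AUDITED_DEPLOY_STATUSES, is_audited and skipped_subclouds are hypothetical, and the subcloud names are made up. Only the deploy_status values are taken from the description.

```python
# Hypothetical model of the audit filter described in the bug report.
# Only these deploy_status values are audited, per the description.
AUDITED_DEPLOY_STATUSES = frozenset({"deploy-failed", "deploying", "complete"})


def is_audited(deploy_status):
    """Return True if a subcloud with this deploy_status would be audited."""
    return deploy_status in AUDITED_DEPLOY_STATUSES


def skipped_subclouds(subclouds):
    """Return the names of subclouds the audit would skip.

    `subclouds` maps a subcloud name to its deploy_status.
    """
    return [name for name, status in subclouds.items()
            if not is_audited(status)]


# Illustrative data: two subclouds left in upgrade failure states.
subclouds = {
    "subcloud1": "complete",
    "subcloud2": "install-failed",
    "subcloud3": "data-migration-failed",
}
print(skipped_subclouds(subclouds))

# The workaround from the description: resetting deploy_status to
# 'complete' makes the subcloud eligible for auditing again.
subclouds["subcloud2"] = "complete"
print(skipped_subclouds(subclouds))
```

Under this model, a subcloud left in install-failed or data-migration-failed is never re-audited, so its availability is never refreshed, which matches the "online" status reported under Actual Behavior.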