StarlingX

Comment 1 for bug 1976109

OpenStack Infra (hudson-openstack) wrote on 2022-05-30: Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/840216
Committed: https://opendev.org/starlingx/distcloud/commit/116c119541b2b7ac85b475d5b0e1c7c4292ea2fa
Submitter: "Zuul (22348)"
Branch: master

commit 116c119541b2b7ac85b475d5b0e1c7c4292ea2fa
Author: BoYuan Chang <email address hidden>
Date: Mon May 2 11:18:41 2022 -0500

    Ensure one patching worker thread per subcloud

    Remove the worker thread creation in the
    STRATEGY_STATE_UPDATING_PATCHES state, since the thread is already
    created in the STRATEGY_STATE_INITIAL state. This flaw was revealed
    by testing patch orchestration with a large max_parallel_subclouds
    value. The patch orchestration thread loops through the subclouds of
    the current step and processes each of them every 10s. When the
    batch is large, the subcloud state retrieved at the beginning of the
    loop and held in memory may be stale by the time that subcloud is
    processed, which could result in a second worker thread for the same
    subcloud.

    A try/except block is also added to pre_check so that the state does
    not get stuck indefinitely when the health report cannot be obtained
    from sysinv.
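
The invariant in the commit title (at most one worker thread per subcloud) can be illustrated with a small Python sketch. This is not the distcloud code: the commit fixes the problem by removing the second creation site, whereas the sketch shows a different, defensive technique that refuses to start a worker while one is still alive. The class `PatchOrchSketch` and its methods and attributes are hypothetical names chosen for illustration.

```python
import threading


class PatchOrchSketch:
    """Hypothetical orchestrator that keeps at most one worker per subcloud."""

    def __init__(self):
        self._workers = {}               # subcloud name -> threading.Thread
        self._lock = threading.Lock()    # guards the _workers mapping
        self.release = threading.Event() # lets the demo workers finish
        self.started = []                # records which workers actually ran

    def _apply_patches(self, subcloud):
        # Stand-in for the real patching work (assumption for this sketch).
        self.started.append(subcloud)
        self.release.wait()

    def ensure_worker(self, subcloud):
        """Start a worker unless one is already alive for this subcloud.

        Returns True if a new worker was started, False otherwise.
        """
        with self._lock:
            worker = self._workers.get(subcloud)
            if worker is not None and worker.is_alive():
                return False
            worker = threading.Thread(
                target=self._apply_patches, args=(subcloud,)
            )
            self._workers[subcloud] = worker
            worker.start()
            return True

    def join_all(self):
        """Release the demo workers and wait for them to exit."""
        self.release.set()
        for worker in list(self._workers.values()):
            worker.join()
```

With such a guard, a loop acting on a stale in-memory state would get `False` from a second `ensure_worker("subcloud1")` call instead of spawning a duplicate thread.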

    Test Plan:

    1. Ensure all subclouds are free of management-affecting alarms.
       Execute patch orchestration with a large max_parallel_subclouds
       value and verify that it completes without any failure(s) due to
       the "Patch in-progress" alarm.

    2. Verify that when the request for the system health report times
       out, the subcloud fails rather than hanging indefinitely.
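
The behaviour checked in step 2 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the actual pre_check from distcloud: the state constants, the `get_health_report` callable, and the return shape are all hypothetical.

```python
# Hypothetical state names for this sketch only.
STATE_FAILED = "failed"
STATE_UPDATING_PATCHES = "updating patches"


def pre_check(subcloud, get_health_report):
    """Return (next_state, details) for a subcloud.

    get_health_report is a caller-supplied callable (an assumption of this
    sketch) that returns the health report or raises on error or timeout.
    """
    try:
        get_health_report(subcloud)
    except Exception as exc:
        # Without this handler the exception would escape and the step
        # would never leave its current state.
        details = "Failed to get health report for %s: %s" % (subcloud, exc)
        return STATE_FAILED, details
    return STATE_UPDATING_PATCHES, ""
```

Passing a callable that raises `TimeoutError` makes the sketch return the failed state with an explanatory message, which is the outcome step 2 asks the tester to confirm.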

    Closes-Bug: 1971172
    Closes-Bug: 1976109
    Signed-off-by: BoYuan Chang <email address hidden>
    Change-Id: Ib70f00228ebd181d175c09a462b95077b0a8218b