Remove the worker thread creation in the STRATEGY_STATE_UPDATING_PATCHES
state, since it is already done in the STRATEGY_STATE_INITIAL state. This
code flaw was revealed by testing patch orchestration with a large
max_parallel_subclouds size. The patch orch thread loops through the
list of subclouds in the current step and processes each of them
every 10s. When the batch size is large, the subcloud state
retrieved at the beginning of the loop and stored in memory may be
stale for a subcloud by the time the thread processes that subcloud.
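The invariant named in the commit title (one patching worker thread per subcloud) can be illustrated with a small guard. This is only a sketch under stated assumptions, not the distcloud code: the merged change removes the duplicate thread creation rather than adding a guard, and WorkerRegistry, start_once and join_all are hypothetical names introduced here for illustration.

```python
import threading


class WorkerRegistry:
    """Hypothetical helper: tracks at most one live worker per subcloud."""

    def __init__(self):
        self._lock = threading.Lock()
        self._workers = {}  # subcloud id -> threading.Thread

    def start_once(self, subcloud_id, target):
        """Start a worker for subcloud_id unless one is still alive.

        Returns True if a new worker was started, False otherwise.
        """
        with self._lock:
            existing = self._workers.get(subcloud_id)
            if existing is not None and existing.is_alive():
                # A worker already owns this subcloud; do not start another,
                # even if the caller is acting on a stale view of its state.
                return False
            worker = threading.Thread(
                target=target, args=(subcloud_id,), daemon=True
            )
            self._workers[subcloud_id] = worker
            worker.start()
            return True

    def join_all(self, timeout=None):
        """Wait for every registered worker to finish."""
        with self._lock:
            workers = list(self._workers.values())
        for worker in workers:
            worker.join(timeout)
```

With such a guard, a second start_once call for the same subcloud returns False while the first worker is still running, so a loop acting on state cached at the start of a long iteration cannot start a duplicate.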
A try/except statement is also added to pre_check to prevent the state from
being stuck indefinitely when the health report cannot be obtained from sysinv.
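The pre_check behaviour described above can be sketched as follows. This is a minimal illustration, not the actual distcloud implementation: the function signature, the get_health_report callable (standing in for the sysinv query), and the STATE_FAILED / STATE_CONTINUE constants are all assumptions made for this example.

```python
STATE_FAILED = "failed"      # hypothetical state names for this sketch
STATE_CONTINUE = "continue"


def pre_check(subcloud_name, get_health_report):
    """Return (next_state, details) for a subcloud.

    get_health_report is a hypothetical callable that takes the subcloud
    name and returns the health report text, or raises on error/timeout.
    """
    try:
        report = get_health_report(subcloud_name)
    except Exception as exc:
        # Fail this subcloud explicitly so the strategy step reaches a
        # terminal state instead of waiting forever on a missing report.
        return STATE_FAILED, "failed to get health report: %s" % exc
    return STATE_CONTINUE, report
```

Calling pre_check with a callable that raises TimeoutError returns the failed state with the error text, which corresponds to item 2 of the test plan.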
Test Plan:
1. Ensure all subclouds are free of mgmt affecting alarms. Execute patch
orchestration with a large max_parallel_subclouds size and verify that it
works without any failure(s) due to the "Patch in-progress" alarm.
2. Check that when the get system health report request times out, the
subcloud fails rather than hanging indefinitely.
Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/840216
Committed: https://opendev.org/starlingx/distcloud/commit/116c119541b2b7ac85b475d5b0e1c7c4292ea2fa
Submitter: "Zuul (22348)"
Branch: master
commit 116c119541b2b7ac85b475d5b0e1c7c4292ea2fa
Author: BoYuan Chang <email address hidden>
Date: Mon May 2 11:18:41 2022 -0500
Ensure one patching worker thread per subcloud
Closes-Bug: 1971172
Closes-Bug: 1976109
Signed-off-by: BoYuan Chang <email address hidden>
Change-Id: Ib70f00228ebd181d175c09a462b95077b0a8218b