Subclouds not going out-of-sync timely after applying patch on the System Controller

Bug #2047852 reported by Christopher de Oliveira Souza
This bug affects 1 person
Affects    Status        Importance  Assigned to                    Milestone
StarlingX  Fix Released  Medium      Christopher de Oliveira Souza

Bug Description

Brief Description
-------------------------
Subclouds do not transition to out-of-sync in a timely manner after a patch is applied on the System Controller.

The current behavior is that the patching endpoint initially goes into an "unknown" state, goes back to "in-sync" and then it eventually transitions to an "out-of-sync" state.

However, the expected behavior is that the patching endpoint should promptly move to "unknown" and then "out-of-sync" right after the patch is applied on the System Controller.

Severity
-------------------------
Major

Steps to Reproduce
-------------------------
1. Upload the test patch to the system controller
$ sw-patch --os-region-name SystemController upload <patch>

2. Use the sw-patch CLI command to apply the patch to the system controller:
$ sw-patch --os-region-name SystemController apply <patch>

// At this point, the subcloud patching endpoint should go out-of-sync

Expected Behavior
-------------------------
Apply test patch on the System Controller
Subcloud Patching endpoint "unknown"
Subcloud Patching endpoint "out-of-sync"

Actual Behavior
-------------------------
Apply test patch on the System Controller
Subcloud Patching endpoint "unknown"
Subcloud Patching endpoint "in-sync"
Around 30 minutes later
Subcloud Patching endpoint "out-of-sync"
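
The difference between the expected and actual sequences above can be expressed as a small check over the observed endpoint states. This is only an illustrative sketch: the function name `went_back_in_sync` is hypothetical, and the state lists are assumed to be collected by hand or by whatever tooling is used to watch the subcloud.

```python
def went_back_in_sync(states):
    """Return True if "in-sync" is reported after "unknown" but before "out-of-sync".

    `states` is the ordered list of patching endpoint states observed
    after the patch was applied (hypothetical input format).
    """
    seen_unknown = False
    for state in states:
        if state == "unknown":
            seen_unknown = True
        elif state == "out-of-sync":
            # Reached the expected final state without bouncing back.
            return False
        elif state == "in-sync" and seen_unknown:
            # The endpoint bounced back to in-sync: the reported problem.
            return True
    return False


expected = ["unknown", "out-of-sync"]
actual = ["unknown", "in-sync", "out-of-sync"]
print(went_back_in_sync(expected))
print(went_back_in_sync(actual))
```

Only the "actual" sequence is flagged, since it is the one that returns to "in-sync" before reaching "out-of-sync".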

Reproducibility
-------------------------
3 out of 5.

System Configuration
-------------------------
DC

Load info (eg: 2022-03-10_20-00-07)
--------------------------
22.12

Last Pass
--------------------------
N/A

Timestamp/Logs
--------------------------
N/A

Alarms
--------------------------
N/A

Test Activity
--------------------------
DC System Test

Workaround
--------------------------
None - Eventually, subclouds will transition to out-of-sync.

Changed in starlingx:
assignee: nobody → Christopher de Oliveira Souza (cdeolive)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/904523

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/904523
Committed: https://opendev.org/starlingx/distcloud/commit/153d6ba105b2ccb274f447db44aeaae78ffa757e
Submitter: "Zuul (22348)"
Branch: master

commit 153d6ba105b2ccb274f447db44aeaae78ffa757e
Author: Christopher de Oliveira Souza <email address hidden>
Date: Tue Jan 2 14:41:33 2024 -0300

    Increase the number of audit worker threads

    In this commit, the number of audit worker threads was increased
    from 100 to 150 in order to prevent the audit to fall behind.
    If the audit doesn't have enough threads to handle all the audit
    requests, it will cause a queue of audit requests waiting for a free
    thread to start auditing a subcloud. If this behavior continues,
    it will increase the size of the queue and the audit will fall behind,
    taking hours to empty the queue and stabilize again.

    Test Plan:
    PASS: Leave the audit running for one day and verify that the audit
    didn't fall behind.
    PASS: Upload and Apply a patch to systemcontroller and verify
    that the subclouds went to out-of-sync.

    Closes-bug: 2047852

    Change-Id: I4e129b56ca0abfaea1e9998e466a439c76b33187
    Signed-off-by: Christopher de Oliveira Souza <email address hidden>
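
The queueing effect described in the commit message can be illustrated with a toy model. This is a simplified sketch, not the dcmanager audit code: the function `simulate_backlog`, the subcloud count, the per-audit duration, and the audit interval are all assumptions chosen for illustration. Only the thread counts (100 and 150) come from the commit.

```python
def simulate_backlog(num_subclouds, threads, audit_seconds, interval_seconds, cycles):
    """Return the number of queued audit requests at the end of each cycle.

    Toy model: every subcloud is queued once per cycle, and each worker
    thread completes interval_seconds // audit_seconds audits per cycle.
    """
    capacity = threads * (interval_seconds // audit_seconds)
    backlog = 0
    history = []
    for _ in range(cycles):
        backlog += num_subclouds           # new audit requests for this cycle
        backlog -= min(backlog, capacity)  # workers drain what they can
        history.append(backlog)
    return history


# Hypothetical numbers: 1000 subclouds, 30 s per audit, 240 s audit interval.
print(simulate_backlog(1000, 100, 30, 240, 3))  # [200, 400, 600]
print(simulate_backlog(1000, 150, 30, 240, 3))  # [0, 0, 0]
```

Under these assumed numbers, 100 threads can complete 800 audits per cycle, so the queue grows by 200 requests every cycle, while 150 threads can complete 1200 and the queue stays empty. The real thresholds depend on the actual audit duration and number of subclouds.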

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.10.0 stx.distcloud