After subcloud initial install, alarm 250.001 "controller-0 Configuration is out-of-date" is raised

Bug #1955744 reported by Jerry Sun
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jerry Sun
Milestone: (none listed)

Bug Description

Brief Description
-----------------
After a subcloud's initial install, alarm 250.001 "controller-0 Configuration is out-of-date" is raised and never cleared. Investigation showed that the alarm was not cleared because kube-apiserver was restarting at the moment it was needed.
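
To illustrate the race described above, here is a minimal sketch of a caller that polls kube-apiserver for readiness before issuing commands. It is not the change that was made for this bug. The helper names `wait_until_ready` and `apiserver_ready` are hypothetical, and using `kubectl get --raw=/readyz` as the probe is an assumption, not something taken from the StarlingX code.

```python
import subprocess
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     timeout: float = 60.0,
                     interval: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll probe() until it returns True or `timeout` seconds elapse.

    Returns True as soon as the probe succeeds, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def apiserver_ready() -> bool:
    """Hypothetical probe: query the API server's /readyz endpoint via kubectl.

    Any failure to run kubectl, a timeout, or a non-zero exit status is
    treated as "not ready" (for example while kube-apiserver is restarting).
    """
    try:
        result = subprocess.run(
            ["kubectl", "get", "--raw=/readyz"],
            capture_output=True, timeout=10, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A caller would run `wait_until_ready(apiserver_ready)` and only proceed if it returns True. Polling like this narrows the window but does not close it, since the server can still restart between the probe and the next command.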

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
The issue is intermittent. It has been observed when installing a subcloud.

Expected Behavior
-----------------
No alarm 250.001 present

Actual Behavior
---------------
Alarm 250.001 is present

Reproducibility
---------------
Intermittent

System Configuration
--------------------
One-node system

Branch/Pull Time/Commit
-----------------------
2021-12-25

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/822922

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

screening: stx.7.0 / medium - intermittent alarm issue

tags: added: stx.7.0 stx.config
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Jerry Sun (jerry-sun-u)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by "Jerry Sun <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/config/+/822922
Reason: abandoning due to concerns raised during review

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/823863

Ghada Khalil (gkhalil)
tags: added: stx.6.0
tags: added: stx.cherrypickneeded
Ghada Khalil (gkhalil) wrote :

Agreed with Jerry to include the fix in the r/stx.6.0 branch as well, since the issue occurs more often than initially thought: 20% of subclouds on a large system report the alarm.

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/823863
Committed: https://opendev.org/starlingx/config/commit/f671ec5ab64aec1317f326c1da7686b21df4513a
Submitter: "Zuul (22348)"
Branch: master

commit f671ec5ab64aec1317f326c1da7686b21df4513a
Author: Jerry Sun <email address hidden>
Date: Fri Jan 7 13:05:12 2022 -0500

    Revert "Apply runtime class for kube-apiserver when installing ca cert"

    This reverts commit 3bf25e423c9408d59874fd49d833ae26ba8162f5.
    There is an issue when installing subclouds where kubeadm tries
    to execute kubernetes commands while kube-apiserver is restarting.
    The cause of the restart is executing the runtime class introduced
    in the reverted commit. The issue is intermittent.

    Change-Id: I332ea9257de3fdfa327c5405cb45bb414b422c23
    Closes-Bug: 1955744
    Signed-off-by: Jerry Sun <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (r/stx.6.0)

Fix proposed to branch: r/stx.6.0
Review: https://review.opendev.org/c/starlingx/config/+/824011

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.6.0)

Reviewed: https://review.opendev.org/c/starlingx/config/+/824011
Committed: https://opendev.org/starlingx/config/commit/aeec7da4310307f21af848360f2c958c74eadb70
Submitter: "Zuul (22348)"
Branch: r/stx.6.0

commit aeec7da4310307f21af848360f2c958c74eadb70
Author: Jerry Sun <email address hidden>
Date: Fri Jan 7 13:05:12 2022 -0500

    Revert "Apply runtime class for kube-apiserver when installing ca cert"

    This reverts commit 3bf25e423c9408d59874fd49d833ae26ba8162f5.
    There is an issue when installing subclouds where kubeadm tries
    to execute kubernetes commands while kube-apiserver is restarting.
    The cause of the restart is executing the runtime class introduced
    in the reverted commit. The issue is intermittent.

    Change-Id: I332ea9257de3fdfa327c5405cb45bb414b422c23
    Closes-Bug: 1955744
    Signed-off-by: Jerry Sun <email address hidden>
    (cherry picked from commit f671ec5ab64aec1317f326c1da7686b21df4513a)

Ghada Khalil (gkhalil)
tags: added: in-r-stx60
removed: stx.cherrypickneeded