System Controller Controller-0 in failed state after K8s upgrade

Bug #1998629 reported by Ramesh kumar Sivanandam
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Ramesh kumar Sivanandam

Bug Description

Brief Description:
Controller-0 entered a failed state after the K8s upgrade from 1.23.1 to 1.24.4.
According to Chris F., K8s itself was fine: kubelet was successfully upgraded to 1.24. We therefore need to understand why Controller-0 was not able to come up properly.

Severity:
Major

Steps to Reproduce:
1. Verify the System Controller is healthy and running kubelet version 1.23.1. Controller-0 was initially the active controller.
2. Create and apply the kube upgrade strategy:
   sw-manager kube-upgrade-strategy create --to-version v1.24.4
   sw-manager kube-upgrade-strategy apply
3. Watch progress:
   sw-manager kube-upgrade-strategy show
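
Watching progress by hand can be replaced with a small polling loop. The following is only a sketch: `poll_until` is a hypothetical helper, not part of sw-manager, and the match pattern in the commented usage line is an assumption about the text printed by the show command, so adjust it to the actual output.

```shell
#!/bin/sh
# poll_until PATTERN TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until its output matches PATTERN (extended regex)
# or until TIMEOUT_SECONDS have elapsed. Prints the last output seen.
# Returns 0 on a match, 1 on timeout.
poll_until() {
    pattern="$1"; timeout="$2"; interval="$3"
    shift 3
    elapsed=0
    while :; do
        # Capture stdout and stderr; a failing command is not fatal here.
        output=$("$@" 2>&1) || true
        if printf '%s\n' "$output" | grep -Eq "$pattern"; then
            printf '%s\n' "$output"
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && break
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    printf '%s\n' "$output"
    return 1
}

# Hypothetical usage (pattern and timings are assumptions):
# poll_until 'applied|failed|aborted' 7200 60 sw-manager kube-upgrade-strategy show
```

A non-zero return means the pattern never appeared within the timeout, which is the point at which kubelet on the affected controller is worth inspecting.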

Expected Behavior:
The K8s upgrade completes successfully.

Actual Behavior:
The K8s upgrade timed out, likely because Controller-0 was in a bad state.

Reproducibility:
1 out of 1

Test Activity:
Scalability Testing

OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/866494

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by "Ramesh kumar Sivanandam <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/866494

OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/866498

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/866498
Committed: https://opendev.org/starlingx/integ/commit/b4af71310946f0096b2dc2eed0671c532a9c8f75
Submitter: "Zuul (22348)"
Branch: master

commit b4af71310946f0096b2dc2eed0671c532a9c8f75
Author: Ramesh Kumar Sivanandam <email address hidden>
Date: Fri Dec 2 13:25:15 2022 -0500

    Remove KUBE_ALLOW_PRIV from kubelet.service

    KUBE_ALLOW_PRIV causes kubelet to be started with the
    "--allow-privileged=true" flag, which kubelet has not supported
    since K8s 1.15, so kubelet errors out.

    The default kubelet.service contains the invalid KUBE_ALLOW_PRIV
    setting because the upstream kubernetes-contrib package hasn't been
    updated in years.

    This change removes KUBE_ALLOW_PRIV from kubelet.service in the
    kubernetes-unversioned package.

    Closes-Bug: 1998629

    Test-plan:
    PASS - Install AIO-SX and ensure that
           /lib/systemd/system/kubelet.service doesn't contain
           "$KUBE_ALLOW_PRIV"

    Signed-off-by: Ramesh Kumar Sivanandam <email address hidden>
    Change-Id: Ide0f9c8db180908cc9c6528f474214966655be95
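
The test plan in the commit message can be checked with a short script. This is a minimal sketch, assuming the unit file path quoted in the test plan; `unit_lacks_allow_priv` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Report whether a systemd unit file still references KUBE_ALLOW_PRIV.
# Returns 0 when the reference is absent, 1 when it is present,
# and 2 when the file cannot be read.
unit_lacks_allow_priv() {
    unit_file="$1"
    [ -r "$unit_file" ] || return 2
    if grep -q 'KUBE_ALLOW_PRIV' "$unit_file"; then
        return 1
    fi
    return 0
}

# Path taken from the test plan in the commit message.
UNIT=/lib/systemd/system/kubelet.service
if unit_lacks_allow_priv "$UNIT"; then
    echo "OK: $UNIT does not reference KUBE_ALLOW_PRIV"
else
    echo "CHECK: $UNIT references KUBE_ALLOW_PRIV or is not readable"
fi
```

The check only inspects the unit file on disk; it does not confirm that a running kubelet was started without the flag.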

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0 stx.containers stx.update
Changed in starlingx:
assignee: nobody → Ramesh kumar Sivanandam (rsivanan)