kubelet-config custom parameters are missing after k8s upgrade

Bug #2012975 reported by Jim Gauld
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Jim Gauld

Bug Description

Brief Description
-----------------
kubelet-config custom parameters are missing after k8s upgrade.

The custom kubelet-config garbage collection parameters are missing after a Kubernetes version upgrade. The 'system kube-config-kubelet' command, which should reconfigure the kubelet-config parameters, also has no effect after the upgrade. As a result, nodes are incorrectly configured and could run into critical filesystem usage issues when thresholds are exceeded.

Root cause:
During the k8s upgrade procedure, the old kubelet-config-<version> ConfigMap is left behind.
In historical releases we also had a kubelet_override.yaml file used to override kubelet parameters. That override mechanism had a significant problem: it required every parameter to be specified in full. Neither the override mechanism nor the old version of the kubelet-config ConfigMap is needed any longer. Note that there was also an upstream change to remove the version from the name kubelet-config-<version>.

The upgrade code that updates the kubelet-config ConfigMap was failing because it did not expect multiple versions, and this breaks the 'system kube-config-kubelet' command. Even though kubelet_override.yaml is no longer used in Debian, some packaging cruft remains, the kube-config-kubelet mechanism is broken after an upgrade, and the values would be incorrect when upgrading from 21.12.

Solution:
- remove usage of --config kubelet_override.yaml from 'kubeadm upgrade apply'
- remove the old kubelet-config ConfigMap after the upgrade apply

Severity
--------
Major. System is usable, but there is risk of becoming unstable due to exceeding resource limits.

Steps to Reproduce
------------------
Perform a Kubernetes version upgrade.

Example steps for AIO-SX (abridged):

# perform upgrade from k8s 1.19.13 to 1.20.9
fm alarm-list --mgmt_affecting
system health-query-kube-upgrade
system kube-upgrade-start v1.20.9 --force
system kube-upgrade-show
system kube-upgrade-download-images
system kube-upgrade-show
system kube-upgrade-networking
system kube-upgrade-show
system kube-host-upgrade controller-0 control-plane
system kube-upgrade-show
system kube-host-upgrade controller-0 kubelet
system kube-upgrade-complete
system kube-upgrade-delete

Expected Behavior
-----------------
kubectl -n kube-system get configmaps -oname | grep -e kubelet-config
kubelet-config-1.20

All custom parameters are present in both config.yaml and the ConfigMap.
grep -rs -e evictionHard: -e available -e nodefs -e imageGC /var/lib/kubelet/config.yaml
evictionHard:
  imagefs.available: 2Gi
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
imageGCHighThresholdPercent: 79
imageGCLowThresholdPercent: 75

kubectl -n kube-system get configmaps -oname --sort-by=.metadata.creationTimestamp | \
grep -e kubelet-config | xargs -r -I {} kubectl -n kube-system get {} -oyaml | \
grep -e evictionHard: -e available -e nodefs -e imageGC
    evictionHard:
      imagefs.available: 2Gi
      memory.available: 100Mi
      nodefs.inodesFree: 5%
      nodefs.available: 10%
    imageGCHighThresholdPercent: 79
    imageGCLowThresholdPercent: 75
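
The check above can be turned into a small helper that names exactly which parameters are absent. This is only a sketch: the function name missing_kubelet_params is hypothetical, and the key list is simply the set of parameters shown in this report.

```shell
# Sketch only: print each expected kubelet parameter that is absent from a
# kubelet config file. The function name is hypothetical; the key list is
# taken from the parameters shown in this bug report.
missing_kubelet_params() {
    config_file="${1:-/var/lib/kubelet/config.yaml}"
    for key in evictionHard imagefs.available memory.available \
               nodefs.available nodefs.inodesFree \
               imageGCHighThresholdPercent imageGCLowThresholdPercent; do
        # -F: match the key literally (dots are not wildcards)
        # -q: no output; -s: stay silent if the file is missing
        grep -qsF -e "${key}:" "$config_file" || echo "$key"
    done
}

# Usage: prints nothing when every parameter is present.
# missing_kubelet_params /var/lib/kubelet/config.yaml
```

An empty result corresponds to the expected behavior; a full list of keys corresponds to the broken state described under Actual Behavior.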

Actual Behavior
---------------
grep -rs -e evictionHard: -e available -e nodefs -e imageGC /var/lib/kubelet/config.yaml
<when broken, this gives empty result, i.e., missing parameters>

kubectl -n kube-system get configmaps -oname | grep -e kubelet-config
kubelet-config-1.19
kubelet-config-1.20
<this has the older versions and the newly upgraded version>
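
To detect this state without reading the list by eye, the ConfigMap names can be counted. A minimal sketch, assuming the names arrive on stdin in the 'kubectl ... -oname' form shown above; the function name is hypothetical.

```shell
# Sketch only: succeed (exit 0) when more than one kubelet-config ConfigMap
# name is present on stdin. The function name is hypothetical.
kubelet_configmaps_are_stale() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so '|| true' keeps this safe under 'set -e'.
    count=$(grep -c -e 'kubelet-config' || true)
    [ "$count" -gt 1 ]
}

# Usage against a live system:
# kubectl -n kube-system get configmaps -oname | kubelet_configmaps_are_stale \
#     && echo "old kubelet-config ConfigMap still present"
```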

Reproducibility
---------------
100%

System Configuration
--------------------
All configs.
- Standard, AIO-DX, AIO-SX
- System Controller and Subcloud nodes.

Branch/Pull Time/Commit
-----------------------
N/A.

Last Pass
---------
Never.
Day one bug with k8s upgrades.
Noticed now because the kubelet-config parameters are customized to differ from the upstream defaults.

Timestamp/Logs
--------------
N/A.

Test Activity
-------------
Platform upgrade test. Evaluation.

Workaround
----------
Manually purge the old kubelet-config ConfigMap, then re-issue the command 'system kube-config-kubelet'.
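
One way to carry out the manual purge is sketched below. It is not the stx-puppet change itself, only an illustration of keeping the most recent kubelet-config ConfigMap and removing the rest. The function name is hypothetical, and the sketch assumes the names are fed in oldest first, as produced by the --sort-by=.metadata.creationTimestamp option used earlier in this report.

```shell
# Sketch only: from a list of ConfigMap names on stdin (oldest first), print
# every kubelet-config entry except the newest. The function name is
# hypothetical.
select_stale_kubelet_configmaps() {
    # sed '$d' drops the last line, i.e. the most recently created ConfigMap.
    grep -e 'kubelet-config' | sed '$d'
}

# Usage against a live system (review the selected names before deleting):
# kubectl -n kube-system get configmaps -oname \
#         --sort-by=.metadata.creationTimestamp | \
#     select_stale_kubelet_configmaps | \
#     xargs -r kubectl -n kube-system delete
# system kube-config-kubelet
```

When only one kubelet-config ConfigMap exists, the function prints nothing, and xargs -r runs no delete command.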

Jim Gauld (jgauld)
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/881153

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/881154

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/881158

OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by "Jim Gauld <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/881158
Reason: Will port this verbatim in older release. No longer making CentOS changes.

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/881153
Committed: https://opendev.org/starlingx/stx-puppet/commit/de57231375ef01646a62ae449db1bf6e1d31db76
Submitter: "Zuul (22348)"
Branch: master

commit de57231375ef01646a62ae449db1bf6e1d31db76
Author: Jim Gauld <email address hidden>
Date: Thu Apr 20 18:16:36 2023 -0400

    kubelet-config custom parameters are missing after k8s upgrade

    The kubernetes.pp class platform::kubernetes::upgrade_first_control_plane
    which does 'kubeadm upgrade apply' resulted in versioned kubelet-config
    ConfigMap. The pre-upgrade ConfigMap was left behind.

    Having multiple ConfigMap causes 'system kube-config-kubelet' to fail,
    so reconfiguration was broken.

    In historical releases, we had specified '--config
    /etc/kubernetes/kubelet_override.yaml', so the kubelet garbage
    collection eviction parameters became incorrect post k8s upgrade,
    without a way to reconfigure.

    This update will purge all kubelet-config ConfigMap except the most
    recent. This occurs immediately following 'kubeadm upgrade apply' step.

    Testplan:
    PASS: AIO-SX perform k8s upgrade, run 'system kube-config-kubelet'.
          Verify only current version kubelet-config ConfigMap exists.

    Closes-Bug: 2012975
    Change-Id: I5e34299616690628267c07a744dc9923144e606d
    Signed-off-by: Jim Gauld <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/881154
Committed: https://opendev.org/starlingx/integ/commit/725580bdd4747497906d77a987ce1d7696fc2f14
Submitter: "Zuul (22348)"
Branch: master

commit 725580bdd4747497906d77a987ce1d7696fc2f14
Author: Jim Gauld <email address hidden>
Date: Thu Apr 20 17:38:47 2023 -0400

    kubelet-config custom parameters are missing after k8s upgrade

    The kubelet_override.yaml file is removed since not needed. The option
    '--config /etc/kubernetes/kubelet_override.yaml' was removed from
    kubernetes.pp class platform::kubernetes::upgrade_first_control_plane.

    Testplan:
    PASS: Install AIO-SX, verify kubelet_override.yaml not present.

    Partial-Bug: 2012975
    Change-Id: I20baa84b88cc6dc6f738314628c79392a05f2c27
    Signed-off-by: Jim Gauld <email address hidden>

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
Ghada Khalil (gkhalil)
tags: added: stx.9.0 stx.containers stx.update