during K8s upgrade it's possible for needed images to be garbage collected
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Chris Friesen |
Bug Description
Brief Description
K8s upgrade of an SX subcloud timed out in control plane 1.22.5.
A retry worked fine. The issue was found in only one subcloud; another subcloud in the same DC lab passed.
Severity
Major
Steps to Reproduce
dcmanager kube-upgrade-
dcmanager kube-upgrade-
Expected Behavior
K8s upgrade should be successful
Actual Behavior
[sysadmin@
subcloud2 3 failed kube applying vim kube upgrade strategy: (kube-upgrade) Vim strategy apply failed. Unexpected State: abort-failed. 2023-11-16 19:19:32.947260 2023-11-16 19:40:32.497926
Reproducibility
Intermittent
System Configuration
Distributed Cloud system controller
Bob Church analyzed the logs and found that some of the control plane images used by the static pods had been garbage-collected by kubelet just before they were actually needed. Because static pods can't use Secrets, we were unable to set the imagePullSecrets field on these pods.
The solution is to disable image garbage collection before pulling the new images, and to re-enable it after the upgrade is complete.
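A minimal sketch of the disable/re-enable idea, assuming image garbage collection is controlled through the kubelet `imageGCHighThresholdPercent` setting (the kubelet documentation describes a value of 100 as disabling image garbage collection). The helper names and the dict-based representation of the kubelet config are hypothetical and are not taken from the actual StarlingX fix.

```python
from copy import deepcopy

# KubeletConfiguration field that triggers image GC; 100 is documented as "disabled".
GC_KEY = "imageGCHighThresholdPercent"
GC_DISABLED = 100


def disable_image_gc(config):
    """Return (new_config, saved) where new_config has image GC disabled.

    `config` is a dict holding parsed kubelet configuration (assumption).
    `saved` remembers the previous value; None means the key was absent,
    so the kubelet default applied.
    """
    new_config = deepcopy(config)
    saved = {GC_KEY: config.get(GC_KEY)}
    new_config[GC_KEY] = GC_DISABLED
    return new_config, saved


def restore_image_gc(config, saved):
    """Return a copy of `config` with the pre-upgrade GC threshold restored."""
    new_config = deepcopy(config)
    if saved[GC_KEY] is None:
        # The key was not set originally: remove it to fall back to the default.
        new_config.pop(GC_KEY, None)
    else:
        new_config[GC_KEY] = saved[GC_KEY]
    return new_config


# Hypothetical usage around an upgrade:
before = {GC_KEY: 85, "imageGCLowThresholdPercent": 80}
during, saved = disable_image_gc(before)   # apply before pulling new images
after = restore_image_gc(during, saved)    # apply once the upgrade completes
```

In a real deployment the modified configuration would still have to be written back and picked up by the kubelet (typically via a restart) at both points; that part is left out here.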
Changed in starlingx:
status: New → In Progress
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.containers
Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/901816