Restore Operation failed with Backup done after K8s upgrade to 1.24
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Chris Friesen |
Bug Description
System Restore failed on AIO-DX (ip_31-32_k8s) when using a Backup taken after upgrading K8s from 1.23.1 to 1.24.4.
According to Chris F.,
/var/log/
2022-11-
Severity
Major
Steps to Reproduce
Verify the System Controller is healthy and running kubelet version 1.23.1.
Controller-0 was the Active Controller initially
Create and apply the kube upgrade strategy
sw-manager kube-upgrade-
sw-manager kube-upgrade-
Watch progress - "sw-manager kube-upgrade-
After K8s Upgrade completed, perform a Backup.
Re-Initialize Debian with the same load, and attempt to Restore from the Platform_Backup
Expected Behavior
System is Restored with K8s Upgraded, and All Pods are running
Actual Behavior
[wait-
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition (likely kubeapi-server exited)
Changed in starlingx: | |
assignee: | nobody → Chris Friesen (cbf123) |
Changed in starlingx: | |
status: | New → In Progress |
Changed in starlingx: | |
importance: | Undecided → Medium |
tags: | added: stx.8.0 stx.containers stx.update |
Reviewed: https://review.opendev.org/c/starlingx/integ/+/866804
Committed: https://opendev.org/starlingx/integ/commit/15db2d6990a717f50cb7611b1e4ee76f3c626af7
Submitter: "Zuul (22348)"
Branch: master
commit 15db2d6990a717f50cb7611b1e4ee76f3c626af7
Author: Chris Friesen <email address hidden>
Date: Tue Dec 6 14:33:08 2022 -0600
clean up feature gates on k8s upgrade
During a K8s upgrade from 1.23 to 1.24 we need to remove
the "RemoveSelfLink=false" feature gate from kube-apiserver.
We had previously handled updating the kubeadm configmap, which
was sufficient to handle the running system. However, in order
to properly handle backup and restore after the K8s upgrade to
1.24 (and just for general tidiness) we need to also remove the
feature gate from the saved service parameters and from the
last_kube_extra_config_bootstrap.yaml file.
It's possible that there are other kube-apiserver feature gates
specified by the end user, this adds a bit of complexity to the
code.
Test Plan:
PASS: Test python script and bash script in isolation.
PASS: End-to-end test with k8s upgrade and backup/restore with
manual modification of service parameters and yaml file.
Tested with AIO-DX, AIO-SX unoptimised restore, and
AIO-SX optimised restore.
PASS: K8s upgrade using the new code, ensure service parameter
and last_kube_extra_config_bootstrap.yaml have been
updated with "RemoveSelfLink=false" feature gate removed.
Closes-Bug: 1999095
Signed-off-by: Chris Friesen <email address hidden>
Change-Id: I82ecd821d4e1745ab0f480f9f9c0178757521038
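The commit message notes that other kube-apiserver feature gates specified by the end user may be present, so the "RemoveSelfLink" entry has to be dropped without disturbing the rest. The following is only a rough sketch of that idea, not the code from the actual fix. It assumes the gates are stored as a comma-separated string of NAME=VALUE pairs (the form kube-apiserver accepts for --feature-gates); the function name remove_feature_gate and the gate names ExampleGateA and ExampleGateB are hypothetical.

```python
def remove_feature_gate(feature_gates: str, gate_name: str) -> str:
    """Return feature_gates with every entry for gate_name dropped.

    feature_gates is assumed to be a comma-separated list of NAME=VALUE
    pairs. Entries for other gates are kept in their original order.
    """
    kept = []
    for entry in feature_gates.split(","):
        entry = entry.strip()
        if not entry:
            # Skip empty fragments left by stray or trailing commas.
            continue
        name = entry.split("=", 1)[0].strip()
        if name != gate_name:
            kept.append(entry)
    return ",".join(kept)


# Hypothetical user-specified gates alongside the one to be removed.
gates = "ExampleGateA=true,RemoveSelfLink=false,ExampleGateB=false"
print(remove_feature_gate(gates, "RemoveSelfLink"))
# prints: ExampleGateA=true,ExampleGateB=false
```

Note that this sketch removes the gate regardless of its value, whereas the commit message refers specifically to "RemoveSelfLink=false". An empty result means no gates remain, in which case a caller would presumably drop the feature-gates setting altogether rather than store an empty string.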