[1.29+] test_service_cidr_expansion leaves pods in CrashLoopBackOff

Bug #2045696 reported by George Kraft
This bug affects 1 person
Affects                          Status   Importance   Assigned to   Milestone
Charmed Kubernetes Testing       New      Undecided    Unassigned
Kubernetes Control Plane Charm   New      Undecided    Unassigned

Bug Description

After running test_service_cidr_expansion with 1.29 / Ops charms, pods that talk to the Kubernetes API are unable to run:

$ kubectl get po -n kube-system
NAME                                       READY   STATUS             RESTARTS        AGE
calico-kube-controllers-69486b4665-shsd6   0/1     CrashLoopBackOff   24 (3s ago)     3h56m
calico-node-4sl6s                          1/1     Running            0               3h56m
calico-node-lkldc                          1/1     Running            2 (3h52m ago)   3h55m
calico-node-wswdt                          1/1     Running            3 (3h51m ago)   3h52m
calico-node-xtdnf                          1/1     Running            3 (3h44m ago)   3h46m
calico-node-zh68f                          1/1     Running            2 (3h52m ago)   3h53m
coredns-bddfd76d7-tgm7g                    1/1     Running            0               3h56m
kube-state-metrics-78c475f58b-6p4tq        1/1     Running            2 (3h51m ago)   3h56m
metrics-server-v0.6.3-69d7fbfdf8-z4p69     2/2     Running            0               3h56m

$ kubectl logs -n kube-system calico-kube-controllers-69486b4665-shsd6
...
2023-12-05 21:20:56.222 [ERROR][1] main.go 297: Received bad status code from apiserver error=Get "https://10.152.183.1:443/healthz": x509: certificate is valid for 127.0.0.1, 10.246.154.14, 10.152.182.1, not 10.152.183.1 status=0
2023-12-05 21:20:56.222 [INFO][1] main.go 313: Health check is not ready, retrying in 2 seconds with new timeout: 16s

The test expanded the service CIDR, and the kubernetes-control-plane charm updated its certificate to list 10.152.182.1 as the kubernetes service IP. However, the kubernetes service is still using its original IP, 10.152.183.1, which is no longer included in the certificate. Any pod that reaches the API server through that IP therefore fails x509 validation.
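The mismatch can be reproduced with a little arithmetic. The sketch below assumes the test expands the service CIDR from 10.152.183.0/24 to 10.152.182.0/23 (these CIDR values are not stated in this report and are inferred from the IPs in the log), and relies on the kubernetes service normally being given the first usable address of the service range. The helper name service_ip is hypothetical.

```python
import ipaddress


def service_ip(cidr: str) -> str:
    """Return the first usable address of a service CIDR.

    The `kubernetes` service is normally assigned this address.
    """
    network = ipaddress.ip_network(cidr)
    return str(network[1])


# Assumed values: the report does not state the CIDRs used by the test.
old_cidr = "10.152.183.0/24"
new_cidr = "10.152.182.0/23"

# SANs taken from the error message in the calico-kube-controllers log.
cert_sans = {"127.0.0.1", "10.246.154.14", service_ip(new_cidr)}

# The service keeps the IP it was created with under the old CIDR.
in_use_ip = service_ip(old_cidr)

print("IP in certificate:", service_ip(new_cidr))
print("IP still in use:  ", in_use_ip)
print("in-use IP covered by certificate:", in_use_ip in cert_sans)
```

Under these assumptions the old service IP is still inside the expanded range, so the service keeps working at the network level. Only the certificate check fails.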

The old reactive code handled service CIDR expansions by deleting the kubernetes service[1] and restarting the impacted services[2]. The new ops code does neither.

[1]: https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/8769db394bf377a03ce94066307ecf831b88ad17/reactive/kubernetes_control_plane.py#L2510
[2]: https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/8769db394bf377a03ce94066307ecf831b88ad17/reactive/kubernetes_control_plane.py#L2530-L2547
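As a rough illustration of that two-step recovery (not a port of the reactive code linked above), the sketch below builds the kubectl commands to delete the kubernetes service, so that the API server can recreate it with an IP from the current range, and then restart an affected workload. The function names, the choice of `kubectl rollout restart`, and the default list of deployments are assumptions made for this example. The reactive charm may restart a different set of services in a different way.

```python
import subprocess
from typing import Callable, List, Sequence


def recovery_commands(
    namespace: str = "kube-system",
    deployments: Sequence[str] = ("calico-kube-controllers",),
) -> List[List[str]]:
    """Build the kubectl commands for a manual recovery.

    The deployment list is an assumption; adjust it to whatever
    workloads are failing x509 validation in your cluster.
    """
    # Step 1: delete the kubernetes service so it is recreated
    # with the first IP of the expanded service CIDR.
    commands = [
        ["kubectl", "delete", "service", "kubernetes", "--namespace", "default"]
    ]
    # Step 2: restart workloads that cached the old service IP.
    for name in deployments:
        commands.append(
            [
                "kubectl", "rollout", "restart",
                f"deployment/{name}", "--namespace", namespace,
            ]
        )
    return commands


def recover(
    run: Callable[[List[str]], object] = subprocess.check_call,
    **kwargs,
) -> None:
    """Run each recovery command in order, stopping on the first failure."""
    for command in recovery_commands(**kwargs):
        run(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    recover(run=lambda command: print(" ".join(command)))
```

Passing a custom run callable keeps the sketch safe to try as a dry run before pointing it at a real cluster.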
