Comment 1 for bug 2009515

George Kraft (cynerva) wrote:

This is a race condition between build_kubeconfig, start_control_plane, and configure_apiserver.

In build_kubeconfig, a new client kubeconfig was written[1] with the new CA. Later, build_kubeconfig tried to fetch kube-scheduler's token from a secret[2]. Fetching the secret failed:

2023-03-04 02:53:50 INFO unit.kubernetes-control-plane/0.juju-log server.go:316 certificates:55: Executing ['kubectl', '--kubeconfig=/root/.kube/config', 'get', 'secrets', '-n', 'kube-system', '--field-selector', 'type=juju.is/token-auth', '-o', 'json']
2023-03-04 02:53:50 WARNING unit.kubernetes-control-plane/0.certificates-relation-changed logger.go:60 E0304 02:53:50.359454 135532 memcache.go:238] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": x509: certificate signed by unknown authority
2023-03-04 02:53:50 WARNING unit.kubernetes-control-plane/0.certificates-relation-changed logger.go:60 E0304 02:53:50.365873 135532 memcache.go:238] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": x509: certificate signed by unknown authority
2023-03-04 02:53:50 WARNING unit.kubernetes-control-plane/0.certificates-relation-changed logger.go:60 E0304 02:53:50.369305 135532 memcache.go:238] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": x509: certificate signed by unknown authority
2023-03-04 02:53:50 WARNING unit.kubernetes-control-plane/0.certificates-relation-changed logger.go:60 Unable to connect to the server: x509: certificate signed by unknown authority

This is because the client kubeconfig had the new CA, but kube-apiserver had not been restarted yet, so it was still serving a certificate signed by the old CA. Since build_kubeconfig could not obtain the secret, it skipped writing a new kubeconfig for kube-scheduler.
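For reference, the secret lookup from the log and the skip that followed it can be sketched roughly as below. This is a minimal illustration, not the charm's code: only the kubectl command is copied from the log, while the helper names (find_token, get_token, update_scheduler_kubeconfig), the "user"/"password" data keys, and the system:kube-scheduler username are assumptions.

```python
import base64
import json
import subprocess
from typing import Callable, Optional

# Command copied from the log above.
KUBECTL_CMD = [
    "kubectl", "--kubeconfig=/root/.kube/config",
    "get", "secrets", "-n", "kube-system",
    "--field-selector", "type=juju.is/token-auth", "-o", "json",
]


def find_token(secrets_json: str, user: str) -> Optional[str]:
    """Return the decoded token for `user` from `kubectl get secrets -o json` output.

    Assumption: each token secret stores base64-encoded "user" and "password" keys.
    """
    for item in json.loads(secrets_json).get("items", []):
        data = item.get("data", {})
        if "user" in data and "password" in data:
            if base64.b64decode(data["user"]).decode() == user:
                return base64.b64decode(data["password"]).decode()
    return None


def get_token(user: str) -> Optional[str]:
    """Hypothetical helper: any kubectl failure (e.g. the x509 error) yields None."""
    try:
        output = subprocess.check_output(KUBECTL_CMD, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
    return find_token(output.decode(), user)


def update_scheduler_kubeconfig(
    fetch_token: Callable[[str], Optional[str]],
    write_kubeconfig: Callable[[str], None],
) -> bool:
    """Write kube-scheduler's kubeconfig only if its token could be fetched."""
    token = fetch_token("system:kube-scheduler")  # hypothetical username
    if token is None:
        # Skipped: the kubeconfig on disk still references the old CA.
        return False
    write_kubeconfig(token)
    return True
```

Called as update_scheduler_kubeconfig(get_token, write) while kube-apiserver still serves the old certificate, the lookup returns None and nothing is written, which is the silent skip described above.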

During start_control_plane, the charm restarted kube-scheduler to pick up the new CA. However, since no new kubeconfig had been written for kube-scheduler, it started with the old kubeconfig instead, still using the old CA.

Later, configure_apiserver ran and restarted kube-apiserver with the new server certificate. This restored the charm's ability to get secrets, but the damage had already been done: kube-scheduler was not restarted again, so it kept running with the old CA.
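The ordering problem can be reproduced with a toy model in which each piece of state only records which CA it uses. This is a simplification, not charm code: the Unit fields and the three step functions are hypothetical stand-ins named after the handlers above, and TLS verification is reduced to comparing two CA labels.

```python
from dataclasses import dataclass

OLD_CA, NEW_CA = "old-ca", "new-ca"


@dataclass
class Unit:
    """Toy model: which CA each piece of state on the unit currently uses."""
    client_kubeconfig: str = OLD_CA     # /root/.kube/config, used by the charm's kubectl
    apiserver_serving: str = OLD_CA     # CA that signed kube-apiserver's serving cert
    scheduler_kubeconfig: str = OLD_CA  # kube-scheduler's kubeconfig on disk
    scheduler_running: str = OLD_CA     # CA loaded by the running kube-scheduler


def build_kubeconfig(unit: Unit) -> None:
    unit.client_kubeconfig = NEW_CA
    # The secret lookup only succeeds if the client trusts the serving certificate.
    if unit.client_kubeconfig == unit.apiserver_serving:
        unit.scheduler_kubeconfig = NEW_CA
    # Otherwise the scheduler kubeconfig is skipped and stays on OLD_CA.


def start_control_plane(unit: Unit) -> None:
    # Restarting kube-scheduler loads whatever kubeconfig is on disk.
    unit.scheduler_running = unit.scheduler_kubeconfig


def configure_apiserver(unit: Unit) -> None:
    # Restart kube-apiserver with the server certificate from the new CA.
    unit.apiserver_serving = NEW_CA


def run(steps) -> Unit:
    unit = Unit()
    for step in steps:
        step(unit)
    return unit


observed = run([build_kubeconfig, start_control_plane, configure_apiserver])
print(observed.scheduler_running)  # old-ca
```

In this model, running configure_apiserver before build_kubeconfig and start_control_plane leaves the running kube-scheduler on new-ca, which illustrates that the outcome depends only on the order of the three steps.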

[1]: https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/d9f276f1e54c22f3f5d739c82f1a3b5894d140c7/reactive/kubernetes_control_plane.py#L2151-L2157
[2]: https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/d9f276f1e54c22f3f5d739c82f1a3b5894d140c7/reactive/kubernetes_control_plane.py#L2198-L2206