Comment 2 for bug 1906732

Kevin W Monroe (kwmonroe) wrote: Re: kubernetes-master charm versions newer than 850 not registering all worker nodes

Your first pastebin has some data that seems weird to me. The node list includes:

node02ob100 Ready <none> 37m v1.17.14

From the same output, that node is machine 5:

5 started 172.27.100.106 node02ob100 bionic

But machine 5 is k8s-master/1:

kubernetes-master/1* active idle 5 172.27.100.106

I'm confused why a k8s-master would show up as a cluster node since kubelet doesn't run on masters by default. A few questions for you:

- Is the failing deployment re-using old machines? I ask because your second pastebin (the successful one) shows 172.27.100.106 as the IP address for k8s-worker-ref/0, which would have been a valid node for a different deployment.

- Did you see this failure during an upgrade or new deployment of the latest stable charms? If the former, can you provide the charm revs that you started with?

- Are you attempting to run kubelet on master units? If so, what is your process for installing/configuring kubelet?

- If you have a failed env available, ssh to the available nodes (3/9 in your original failure case) and check the "server:" entry in /root/cdk/kubeconfig (example command after this list). Is it pointing to the expected load balancer / master address?

- In the failed env, where did you run "kubectl get no" from -- within the cluster or from a separate management workstation? The current stable charms do change the ~/.kube/config file, so it's possible you have an old kubeconfig that is pointing to the wrong cluster and/or auth mechanism (a quick client-side check is sketched after this list).
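
For the kubeconfig check, something like this from your juju client should do it (kubernetes-worker/0 is just an example unit name; substitute any unit on a reachable node):

juju ssh kubernetes-worker/0 "sudo grep 'server:' /root/cdk/kubeconfig"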
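
For the client-side check, something like this will show which API server your current kubeconfig context points at:

kubectl config view --minify | grep server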

Fwiw, I deployed the stable charms with 1.19 and tainted/labelled the workers, and that had no effect on them being recognized as cluster nodes. I'll keep poking through the crashdump for more clues, but answers to the above would help narrow this down.
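
By tainting/labelling I just mean the standard kubectl operations, roughly along these lines (node name and key/value are placeholders):

kubectl taint nodes <node-name> example-key=example-value:NoSchedule
kubectl label nodes <node-name> example-key=example-value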