1.25 k8s CP sometimes stays stuck waiting for auth-webhook tokens
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Kubernetes Control Plane Charm | New | Undecided | Unassigned |
Bug Description
Occasionally in the SQA lab we see k8s deployments get stuck where:
k8s-cp - Waiting for auth-webhook tokens
k8s-worker - Waiting for cluster credentials.
calico - Waiting to retry Calico node configuration
Poking about in the logs, I see that the k8s-cp "cdk.master.
testrun: https:/
crashdump: https:/
I'm not sure why var/log/syslog is missing from the crashdumps. I've seen that sporadically. I was able to find the kube-apiserver journal in kubernetes-control-plane_0/debug-20220830170735.tar.gz/kubernetes-master-services/kube-apiserver-journal. In there, I see kube-apiserver repeatedly crashed with:
kube-apiserver.daemon[184714]: Error: invalid argument "CSIMigrationAWS=false" for "--feature-gates" flag: cannot set feature gate CSIMigrationAWS to false, feature is locked to true
systemd[1]: snap.kube-apiserver.daemon.service: Main process exited, code=exited, status=1/FAILURE
The waiting status messages on kubernetes-control-plane, kubernetes-worker, and calico are all symptoms of the Kubernetes API being unavailable, because kubernetes-control-plane tried to disable a feature gate that can no longer be disabled.
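For anyone triaging a similar crashdump, here is a minimal Python sketch that scans kube-apiserver journal lines for this kind of locked feature gate error. The regular expression is modelled only on the error text quoted above; the function name `find_locked_gates` and the sample list are hypothetical and are not part of the charm or of any Kubernetes tooling.

```python
import re

# Pattern modelled on the kube-apiserver error quoted in this bug.
# Assumption: other locked-gate errors use the same wording.
LOCKED_GATE_RE = re.compile(
    r"cannot set feature gate (?P<gate>\w+) to (?P<value>true|false), "
    r"feature is locked to (?P<locked>true|false)"
)


def find_locked_gates(journal_lines):
    """Return {gate_name: locked_value} for every locked-gate error seen."""
    locked = {}
    for line in journal_lines:
        match = LOCKED_GATE_RE.search(line)
        if match:
            # Store the value the gate is locked to as a bool.
            locked[match.group("gate")] = match.group("locked") == "true"
    return locked


# Hypothetical sample input: the error line from this report plus an
# unrelated placeholder line that should be ignored.
sample = [
    'kube-apiserver.daemon[184714]: Error: invalid argument '
    '"CSIMigrationAWS=false" for "--feature-gates" flag: cannot set '
    'feature gate CSIMigrationAWS to false, feature is locked to true',
    "some unrelated journal line",
]

print(find_locked_gates(sample))
```

In practice the lines would come from the extracted kube-apiserver journal file rather than from a hard-coded list. Any gate reported this way has to be dropped from the `--feature-gates` arguments before the API server will start.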
Looks like this is the same issue as https://bugs.launchpad.net/bugs/1988186. I'm marking this one as a duplicate since we're already actively working the bug there.