1.25 k8s CP sometimes stays stuck waiting for auth-webhook tokens

Bug #1988206 reported by Alexander Balderson
This bug affects 1 person
Affects: Kubernetes Control Plane Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Occasionally in the SQA lab we see k8s deployments get stuck where:

k8s-cp - Waiting for auth-webhook tokens
k8s-worker - Waiting for cluster credentials.
calico - Waiting to retry Calico node configuration

Poking about in the logs, I see from the k8s-cp "cdk.master.auth-webhook.log" that it is in a loop waiting for kube-apiserver to start, but I don't see any logs for kube-apiserver itself. Interestingly, the collected logs also don't include a syslog. Could this be a result of running on AWS?

testrun: https://solutions.qa.canonical.com/testruns/testRun/0356316a-5ac7-4afa-a7fc-50240f34fa94
crashdump: https://oil-jenkins.canonical.com/artifacts/0356316a-5ac7-4afa-a7fc-50240f34fa94/generated/generated/kubernetes-aws/juju-crashdump-kubernetes-aws-2022-08-30-17.06.28.tar.gz

George Kraft (cynerva) wrote :

I'm not sure why var/log/syslog is missing from the crashdumps. I've seen that sporadically. I was able to find the kube-apiserver journal in kubernetes-control-plane_0/debug-20220830170735.tar.gz/kubernetes-master-services/kube-apiserver-journal. In there, I see kube-apiserver repeatedly crashed with:

kube-apiserver.daemon[184714]: Error: invalid argument "CSIMigrationAWS=false" for "--feature-gates" flag: cannot set feature gate CSIMigrationAWS to false, feature is locked to true
systemd[1]: snap.kube-apiserver.daemon.service: Main process exited, code=exited, status=1/FAILURE

The waiting status messages on kubernetes-control-plane, kubernetes-worker, and calico are all symptoms of the Kubernetes API being unavailable. The API never comes up because kubernetes-control-plane tried to disable a feature gate (CSIMigrationAWS) that can no longer be disabled, so kube-apiserver exits on every start.
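
For anyone triaging a similar crashdump, here is a minimal sketch of how the kube-apiserver journal could be scanned for this class of error. The regular expression is modelled only on the single error line quoted above, so other wordings would not match; the names `LOCKED_GATE_RE` and `find_locked_gate_errors` are made up for illustration and are not part of the charm or of juju-crashdump.

```python
import re

# Pattern modelled on the kube-apiserver error line quoted in this comment.
# Assumption: other locked feature gates produce the same wording.
LOCKED_GATE_RE = re.compile(
    r'invalid argument "(?P<setting>[^"]+)" for "--feature-gates" flag: '
    r"cannot set feature gate (?P<gate>\S+) to (?P<value>true|false), "
    r"feature is locked to (?P<locked>true|false)"
)


def find_locked_gate_errors(journal_lines):
    """Return sorted, de-duplicated (gate, requested, locked_to) tuples."""
    found = set()
    for line in journal_lines:
        match = LOCKED_GATE_RE.search(line)
        if match:
            found.add((match["gate"], match["value"], match["locked"]))
    return sorted(found)


# The two journal lines quoted above, used here as sample input.
sample = [
    'kube-apiserver.daemon[184714]: Error: invalid argument '
    '"CSIMigrationAWS=false" for "--feature-gates" flag: cannot set feature '
    "gate CSIMigrationAWS to false, feature is locked to true",
    "systemd[1]: snap.kube-apiserver.daemon.service: Main process exited, "
    "code=exited, status=1/FAILURE",
]

for gate, requested, locked in find_locked_gate_errors(sample):
    print(f"{gate}: requested {requested}, locked to {locked}")
```

To run it against a real journal, pass the lines of the extracted kube-apiserver-journal file instead of `sample`. Because the journal repeats the same crash many times, the results are de-duplicated.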

Looks like this is the same issue as https://bugs.launchpad.net/bugs/1988186. I'm marking this one as a duplicate since we're already actively working the bug there.
