Stale auth-webhook pid file causing the control plane to go down

Bug #2019070 reported by Diko Parvanov
This bug affects 1 person
Affects: Kubernetes Control Plane Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

juju logs

unit-xxx-kubernetes-master-1: 05:32:15 WARNING unit.xxx-kubernetes-master/1.update-status error: You must be logged in to the server (Unauthorized)
unit-xxx-kubernetes-master-1: 05:32:15 INFO unit.xxx-kubernetes-master/1.juju-log Executing ['kubectl', '--kubeconfig=/root/.kube/config', 'get', 'secrets', '-n', 'kube-system', '--field-selector', 'type=juju.is/token-auth', '-o', 'json']

journalctl -u snap.kube-controller-manager.daemon
Feb 01 07:37:19 juju-9f67a1-k8s-1-25 kube-controller-manager.daemon[47112]: I0201 07:37:19.516792 47112 dynamic_serving_content.go:111] Loaded a new cert/key pair for "serving-cert::/root/cdk/server.crt::/roo>
Feb 01 07:37:20 juju-9f67a1-k8s-1-25 kube-controller-manager.daemon[47112]: unable to load configmap based request-header-client-ca-file: Unauthorized
Feb 01 07:37:20 juju-9f67a1-k8s-1-25 systemd[1]: snap.kube-controller-manager.daemon.service: Main process exited, code=exited, status=1/FAILURE
Feb 01 07:37:20 juju-9f67a1-k8s-1-25 systemd[1]: snap.kube-controller-manager.daemon.service: Failed with result 'exit-code'.

In: /root/cdk/auth-webhook/auth-webhook.log
Error: Already running on PID 758 (or pid file 'auth-webhook.pid' is stale)

Killing the process whose PID is recorded in /root/cdk/auth-webhook/auth-webhook.pid makes everything start working again: the update-status hook then configures everything properly.
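
Before killing anything, it may help to check what the PID in the pid file actually refers to. The following is a minimal diagnostic sketch, not part of the charm: the helper names (read_pid, pid_is_alive, describe_pid_file) are hypothetical, and it assumes a Linux host where /proc/<pid>/comm is readable. It only reports whether the recorded PID is alive and which command owns it.

```python
import os
from pathlib import Path


def read_pid(pid_file):
    """Return the integer PID stored in pid_file, or None if missing or unparsable."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def describe_pid_file(pid_file):
    """Summarise the state of a pid file as a short human-readable string."""
    pid = read_pid(pid_file)
    if pid is None:
        return "no usable pid file"
    if not pid_is_alive(pid):
        return f"stale: PID {pid} is not running"
    try:
        # Linux-specific: the short command name of the process holding the PID.
        name = Path(f"/proc/{pid}/comm").read_text().strip()
    except OSError:
        name = "unknown"
    return f"PID {pid} is running as '{name}'"


if __name__ == "__main__":
    # Path taken from the report above.
    print(describe_pid_file("/root/cdk/auth-webhook/auth-webhook.pid"))
```

If the reported command name is not the auth-webhook server, the PID has probably been reused by an unrelated process, and removing the pid file may be safer than killing that process.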

George Kraft (cynerva) wrote :

Thanks for the report. I believe this error comes from gunicorn, which we run with the --pid argument here: https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/54e02bbe6fc9bd37406574b9f1658f9def26d095/templates/cdk.master.auth-webhook.service#L17

Were you doing anything with the cluster when this happened? Do you have any idea how it got into a state with a stale pidfile?

Changed in charm-kubernetes-master:
importance: Undecided → Medium
status: New → Triaged
summary: - Stale auth-webhook causingthe control plane to go down
+ Stale auth-webhook pid file causing the control plane to go down