One Kubernetes master of two failed to get the proper certs from Vault

Bug #1867645 reported by Alexander Balderson
This bug affects 2 people
Affects: Kubernetes Control Plane Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

One Kubernetes master unit failed to start the kube-controller-manager service; journalctl shows an Unauthorized error:

kube-controller-manager.daemon[10232]: unable to load configmap based request-header-client-ca-file: Unauthorized
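For reference, a message like this can be pulled from the unit's journal with something along these lines (the systemd unit name is assumed from the `snap.<name>.daemon` convention the log line above suggests, and may differ on other installs):

```shell
# On the affected master unit; assumed systemd unit name for the
# snap-packaged kube-controller-manager service.
journalctl -u snap.kube-controller-manager.daemon | grep request-header-client-ca-file
```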

Attaching the crashdump for the deploy.

Revision history for this message
Alexander Balderson (asbalderson) wrote :
Revision history for this message
George Kraft (cynerva) wrote :

Please provide reproduction steps.

Changed in charm-kubernetes-master:
status: New → Incomplete
Revision history for this message
Km olsen (km-phones) wrote :

Same problem on kubernetes-master/1:

unable to load configmap based request-header-client-ca-file: Unauthorized

I cannot reproduce it, as it appears to happen during day-to-day running of Juju-deployed CDK, perhaps as part of routine snap upgrades?

(I am running Vault instead of EasyRSA.)

juju status

--snipped--
kubernetes-master/0* active idle 8 192.168.70.25 6443/tcp Kubernetes master running.

kubernetes-master/1 blocked idle 9 192.168.70.12 6443/tcp Stopped services: kube-controller-manager

On kubernetes-master/1:
snap list
Name Version Rev Tracking Publisher Notes
cdk-addons 1.17.7 2655 1.17/stable canonical✓ in-cohort
core 16-2.45 9289 latest/stable canonical✓ core
kube-apiserver 1.17.7 1683 1.17/stable canonical✓ in-cohort
kube-controller-manager 1.17.7 1587 1.17/stable canonical✓ in-cohort
kube-proxy 1.17.7 1579 1.17/stable canonical✓ classic,in-cohort
kube-scheduler 1.17.7 1558 1.17/stable canonical✓ in-cohort
kubectl 1.17.7 1544 1.17/stable canonical✓ classic,in-cohort

juju debug-log -i unit-kubernetes-master-1 --replay --tail
--snipped--
unit-kubernetes-master-1: 09:16:05 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: reactive/kubernetes_master.py:2229:send_cluster_tag
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: reactive/kubernetes_master.py:2450:setup_keystone_user
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: reactive/kubernetes_master.py:2470:keystone_config
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: reactive/vault_kv.py:40:clear_ready
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations/openstack-integration/requires.py:84:remove_ready:openstack
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations/http/provides.py:11:joined:kube-api-endpoint
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations/aws-integration/requires.py:106:remove_ready:aws
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations/vault-kv/requires.py:32:broken:vault-kv
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations/azure-integration/requires.py:114:remove_ready:azure
unit-kubernetes-master-1: 09:16:06 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations/kubernetes-cni/provides.py:10:changed:cni
unit-kubernetes-master-1: 09:16:07 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations/gcp-integration/requires.py:116:remove_ready:gcp
unit-kubernetes-master-1: 09:16:07 INFO unit.kubernetes-master/1.juju-log Invoking reactive handler: hooks/relations...


Revision history for this message
Km olsen (km-phones) wrote :

This seemed to fix the problem on the juju server running kubernetes-master/1:

I am not sure whether all of the steps (such as the upgrades) are related.

1. Noticed that juju models were in 'suspended' mode
2. updated model credentials so models were 'available'

kubernetes-master/1 was still blocked, so:

3. juju ssh kubernetes-master/1
apt update
apt upgrade
reboot

Fixed!

Since the rest of the servers in my deployment have not been updated, I think the reboot alone fixed the problem; it probably has nothing to do with the model credentials either.
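For anyone hitting the same state, the recovery steps described above amount to roughly the following (the unit name is from this deployment and will differ elsewhere; this is a sketch of the workaround, not a confirmed fix for the underlying cert issue):

```shell
# Update and reboot the affected unit (mirrors the steps above).
juju ssh kubernetes-master/1 -- sudo apt update
juju ssh kubernetes-master/1 -- sudo apt upgrade -y
juju ssh kubernetes-master/1 -- sudo reboot
# Once the unit is back, confirm the stopped service actually started:
juju ssh kubernetes-master/1 -- systemctl status snap.kube-controller-manager.daemon
```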

Revision history for this message
George Kraft (cynerva) wrote :

What revision of the kubernetes-master charm are you running?

This appears very similar to a bug that we believe we fixed in Charmed Kubernetes 1.18+ck1 / kubernetes-master rev 850. That bug is here: https://bugs.launchpad.net/charm-kubernetes-master/+bug/1869388

Revision history for this message
Peter Jose De Sousa (pjds) wrote :

Seeing this issue after upgrading an EasyRSA cluster; strange that the reboot/upgrade seems to resolve it.

I encountered this issue after upgrading the controller/model from 2.8 to 2.9.

Kubernetes master revision: 1079

Revision history for this message
Peter Jose De Sousa (pjds) wrote :

Attaching a small crashdump; the model appid-5795-qua-01 has the broken units. kubernetes-master/0 was rebooted; the rest of the masters were not yet "fixed".

George Kraft (cynerva)
Changed in charm-kubernetes-master:
status: Incomplete → New