when waiting for certificates relation, status messages are unhelpful

Bug #1868541 reported by Alvaro Uria
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Kubernetes Control Plane Charm | Fix Released | Medium | Martin Kalcok |
Kubernetes Worker Charm | Fix Released | Medium | Martin Kalcok |

Bug Description

== Quick summary:
* Juju 2.7.5 (from 2.7/candidate channel)
* k8s deployed on top of bionic-stein
* Kubernetes channel: 1.17/stable
* All k8s services deployed except vault
* Several minutes later, vault application is also deployed
* Hours later, vault is unsealed
* After the weekend, the kubernetes-master machines were removed and the bundle was redeployed, ending up in the same status
* "juju status --format yaml": https://pastebin.canonical.com/p/3j4NTqz5Y2/

== Longer explanation
Status showed that kubernetes-master was: Waiting for master components to start.

I tried to remove the relation between vault<->kubernetes-master, and I got the following error. I ran "juju resolve --no-retry" and tried to add the relation again:
https://pastebin.canonical.com/p/xxkyQ3FWw7/

I removed the following applications (had to "juju resolve --no-retry" on each -relation-departed, -broken, stop hook, etc.) and redeployed them again (all of them running on 2 nova instances):
* 2x units of kubernetes-master
* 1x easyrsa/0
* 1x openstack-integrator/0

After redeployment (nova instances were recreated), "juju status" looks the same:
https://pastebin.canonical.com/p/jKtxssRGmW/

"journalctl -xe" shows:
Mar 23 09:57:24 juju-f5d4f1-kubernetes-18 kube-apiserver.daemon[30876]: Error: open /root/cdk/server.crt: no such file or directory

~# ls /root/cdk/
audit basic_auth.csv known_tokens.csv serviceaccount.key
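
The check above can be repeated across units with a small script. This is only a sketch: `server.crt` is the one file name taken from the log line above, while `server.key` and `ca.crt` are assumed names added for illustration, and `missing_cert_files` is a hypothetical helper, not part of any charm.

```python
import os

# Only "server.crt" comes from the kube-apiserver error above.
# "server.key" and "ca.crt" are assumptions for illustration.
EXPECTED_FILES = ("server.crt", "server.key", "ca.crt")


def missing_cert_files(cert_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from cert_dir, in order."""
    # os.path.isfile returns False for missing or unreadable paths
    # instead of raising, so this is safe to run as any user.
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(cert_dir, name))
    ]
```

Calling `missing_cert_files("/root/cdk")` on an affected unit would be expected to list `server.crt` among the missing files, which points at the certificates relation rather than at the services themselves.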

Alvaro Uria (aluria) wrote :

I initially escalated this to ~field-critical but have now de-escalated it after getting some help in the #k8s IRC channel.

Several vault:certificates relations were missing. I followed the shared doc [1] by cynerva, and things started working.

FWIW, both easyrsa:client<->kubernetes-master:certificates and vault:certificates<->kubernetes-master:certificates were missing. I went with the second option, and also related vault to kubernetes-worker, the LB and etcd.

1. https://ubuntu.com/kubernetes/docs/using-vault

George Kraft (cynerva) wrote :

Thanks for the detailed report, and for the follow-up.

The "Waiting for master components to start" status is very unclear here. It should say something like "Missing certificates relation" instead.

> I tried to remove the relation between vault<->kubernetes-master, and I got the following error.

This is a known issue: https://bugs.launchpad.net/charm-kubernetes-master/+bug/1844103

> I removed the following applications (had to "juju resolve --no-retry" on each -relation-departed, -broken, stop hook, etc.)

Did you have to run `juju resolve --no-retry` for all the applications you listed? I would expect it for kubernetes-master, given the issue I linked above, but not for the others.

George Kraft (cynerva)
summary: - Fresh k8s deployment, and later Vault deployment + unseal: Waiting for
- master components to start
+ when certificates relation is missing, "Waiting for master components to
+ start" status message is unclear
summary: when certificates relation is missing, "Waiting for master components to
- start" status message is unclear
+ start" status message is unhelpful
Changed in charm-kubernetes-master:
importance: Undecided → Medium
status: New → Triaged
George Kraft (cynerva)
summary: - when certificates relation is missing, "Waiting for master components to
- start" status message is unhelpful
+ when certificates relation is missing or stalled, "Waiting for master
+ components to start" status message is unhelpful
summary: - when certificates relation is missing or stalled, "Waiting for master
- components to start" status message is unhelpful
+ when waiting for certificates relation, "Waiting for master components
+ to start" status message is unhelpful
George Kraft (cynerva) wrote :

Added kubernetes-worker since it also needs a better status message when tls_client.certs.saved is missing. The "Waiting for kubelet,kube-proxy to start." message is not helpful.
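
To illustrate the kind of status selection being asked for, here is a minimal sketch that checks certificate-related flags before falling back to the generic message. It is not the charm's actual code or the eventual fix: `tls_client.certs.saved` and the "Waiting for kubelet,kube-proxy to start." text come from this report, while the other flag names, the remaining messages and the `worker_status` function are hypothetical.

```python
# Flag named in this bug report.
CERTS_SAVED_FLAG = "tls_client.certs.saved"
# Hypothetical flag names, used only for this illustration.
CERT_RELATION_FLAG = "certificates.available"
SERVICES_STARTED_FLAG = "services.started"


def worker_status(flags):
    """Map the set of active flags to a (state, message) pair.

    Certificate problems are reported first, so the generic
    "waiting for services" message only appears once they are ruled out.
    """
    if CERT_RELATION_FLAG not in flags:
        # Hypothetical message: nothing provides certificates yet.
        return ("blocked", "Missing relation to certificate authority.")
    if CERTS_SAVED_FLAG not in flags:
        # Hypothetical message: relation exists, certificates not written yet.
        return ("waiting", "Waiting for certificates.")
    if SERVICES_STARTED_FLAG not in flags:
        # Message quoted in this report.
        return ("waiting", "Waiting for kubelet,kube-proxy to start.")
    return ("active", "Kubernetes worker running.")
```

With an ordering like this, a deployment that lacks the certificates relation would show a blocked status naming the relation, and the same approach would apply to the "Waiting for master components to start" message on kubernetes-master.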

summary: - when waiting for certificates relation, "Waiting for master components
- to start" status message is unhelpful
+ when waiting for certificates relation, status messages are unhelpful
Changed in charm-kubernetes-worker:
importance: Undecided → Medium
status: New → Triaged
Changed in charm-kubernetes-master:
assignee: nobody → Martin Kalcok (martin-kalcok)
Changed in charm-kubernetes-worker:
assignee: nobody → Martin Kalcok (martin-kalcok)
Martin Kalcok (martin-kalcok) wrote :
tags: added: review-needed
George Kraft (cynerva)
tags: added: backport-needed
removed: review-needed
Changed in charm-kubernetes-master:
status: Triaged → Fix Committed
Changed in charm-kubernetes-worker:
status: Triaged → Fix Committed
Changed in charm-kubernetes-master:
milestone: none → 1.20+ck1
Changed in charm-kubernetes-worker:
milestone: none → 1.20+ck1
George Kraft (cynerva) wrote :
tags: removed: backport-needed
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released