kubernetes-control-plane stuck in "Waiting for certificates"

Bug #2064130 reported by Jeffrey Chang
This bug affects 3 people
Affects                         Status        Importance  Assigned to  Milestone
Kubernetes API Load Balancer    Fix Released  High        Adam Dyess
Kubernetes Control Plane Charm  Invalid       Undecided   Unassigned

Bug Description

Deploying Charmed Kubernetes on top of bare metal with focal,
all 3 kubernetes-control-plane units are stuck in "Waiting for certificates"
in this SQA test run: https://solutions.qa.canonical.com/testruns/5f00eba4-5312-4850-8f7c-8383676ec390

Error Logs

2024-04-28 00:35:56 ERROR unit.kubernetes-control-plane/2.juju-log server.go:325 certificates relation data not yet available.
2024-04-28 00:35:56 ERROR unit.kubernetes-control-plane/2.juju-log server.go:325 certificates relation data not yet available.
2024-04-28 00:35:56 WARNING unit.kubernetes-control-plane/2.juju-log server.go:325 Relation certificates is not yet available.
2024-04-28 00:35:56 ERROR unit.kubernetes-control-plane/2.juju-log server.go:325 'NoneType' object has no attribute 'data'
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-2/charm/venv/charms/reconciler.py", line 34, in reconcile
    result = self.reconcile_function(event)
  File "./src/charm.py", line 489, in reconcile
    self.write_service_account_key()
  File "./src/charm.py", line 593, in write_service_account_key
    key = peer_relation.data[self.app].get("service-account-key")
AttributeError: 'NoneType' object has no attribute 'data'
2024-04-28 00:35:56 INFO unit.kubernetes-control-plane/2.juju-log server.go:325 Status context closed with: [BlockedStatus('Missing relation to certificate authority'), BlockedStatus('Missing relation to etcd'), BlockedStatus('Failed to reconcile, see debug-log'), WaitingStatus('Waiting for certificates')]

2024-04-28 00:45:50 ERROR unit.kubernetes-control-plane/2.juju-log server.go:325 certificates:51: certificates relation data not yet valid. (3 validation errors for Data
ca
  field required (type=value_error.missing)
client.cert
  field required (type=value_error.missing)
client.key
  field required (type=value_error.missing)
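
The AttributeError in the traceback above comes from reading `.data` on a peer relation that is still None. A minimal sketch of a guard for that case, not the charm's actual code: `FakeRelation` and `read_service_account_key` are hypothetical names, and the relation is modelled as a plain object with a `data` dict.

```python
from typing import Optional


class FakeRelation:
    """Hypothetical stand-in for a relation object that exposes per-entity data."""

    def __init__(self, data):
        self.data = data


def read_service_account_key(peer_relation: Optional[FakeRelation], app: str) -> Optional[str]:
    """Return the stored key, or None when the relation or the key is not there yet."""
    if peer_relation is None:
        # The peer relation has not been established; a later event can retry.
        return None
    return peer_relation.data.get(app, {}).get("service-account-key")
```

With a guard like this, a missing relation yields None instead of raising, so the caller can report a waiting status and try again on the next event.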

Jeffrey Chang (modern911) wrote:

Noticed the same error on kubeapi-load-balancer in this run: https://solutions.qa.canonical.com/testruns/3c230c34-949f-4323-869a-495dc4bca10e
The leader instance of kubeapi-load-balancer is stuck in 'hook failed: "leader-elected"'.

2024-04-27 08:58:07 DEBUG juju.worker.uniter.operation executor.go:135 preparing operation "run relation-joined (25; unit: vault/0) hook" for kubeapi-load-balancer/1
2024-04-27 08:58:07 DEBUG juju.worker.uniter.operation executor.go:135 executing operation "run relation-joined (25; unit: vault/0) hook" for kubeapi-load-balancer/1
2024-04-27 08:58:07 DEBUG juju.worker.uniter agent.go:22 [AGENT-STATUS] executing: running certificates-relation-joined hook for vault/0
2024-04-27 08:58:07 DEBUG juju.worker.uniter.runner runner.go:719 starting jujuc server {unix @/var/lib/juju/agents/unit-kubeapi-load-balancer-1/agent.socket <nil>}
2024-04-27 08:58:07 DEBUG unit.kubeapi-load-balancer/1.juju-log server.go:325 certificates:25: ops 2.12.0 up and running.
2024-04-27 08:58:09 DEBUG juju.worker.uniter.remotestate watcher.go:803 got a relation units change for kubeapi-load-balancer/1 : {27 {map[kubernetes-control-plane/0:{0}] map[] []}}
2024-04-27 08:58:10 DEBUG unit.kubeapi-load-balancer/1.juju-log server.go:325 certificates:25: Emitting Juju event certificates_relation_joined.
2024-04-27 08:58:10 ERROR unit.kubeapi-load-balancer/1.juju-log server.go:325 certificates:25: certificates relation data not yet valid. (3 validation errors for Data
ca
  field required (type=value_error.missing)
client.cert
  field required (type=value_error.missing)
client.key
  field required (type=value_error.missing)
2024-04-27 08:58:10 INFO unit.kubeapi-load-balancer/1.juju-log server.go:325 certificates:25: Certificates evaluation: Waiting for certificates
2024-04-27 08:58:10 WARNING unit.kubeapi-load-balancer/1.juju-log server.go:325 certificates:25: Relation certificates has yet to set 'common_name'.
2024-04-27 08:58:10 INFO unit.kubeapi-load-balancer/1.juju-log server.go:325 certificates:25: Waiting for certificate
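
The validation errors above name three keys that the certificates relation had not yet provided. A minimal sketch of that kind of check, assuming the relation data is available as a plain dict; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the charm's actual validation (the `Data` model named in the log) may differ.

```python
# Keys reported as missing in the log above; treated here as the required set (assumption).
REQUIRED_FIELDS = ("ca", "client.cert", "client.key")


def missing_fields(relation_data: dict) -> list:
    """Return the required keys that are absent or empty in the relation data."""
    return [name for name in REQUIRED_FIELDS if not relation_data.get(name)]
```

An empty result would mean the certificate data is complete; a non-empty one corresponds to the "Waiting for certificates" state.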

Jeffrey Chang (modern911) wrote :

All occurrences can be found at https://solutions.qa.canonical.com/bugs/2064130
We see this on jammy as well.

Changed in charm-kubernetes-master:
milestone: none → 1.30
status: New → Triaged
importance: Undecided → High
Adam Dyess (addyess) wrote :

Jeffrey, I believe I have made some adjustments to the kubeapi-load-balancer for what I'm seeing in some of these runs, where it will go into error.

But the stack traces you are identifying are not actually a failure; instead, the charm code is merely printing a handled stack trace:

https://github.com/charmed-kubernetes/charm-lib-reconciler/blob/release_1.29/charms/reconciler.py#L41-L42
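
To illustrate what a handled stack trace means here, a minimal sketch of the pattern: the wrapper catches the exception, logs the traceback, and reports a status instead of letting the hook fail. This is not the code at the link above; `run_reconcile` and its return value are hypothetical.

```python
import logging
import traceback

log = logging.getLogger("reconciler-sketch")


def run_reconcile(reconcile_function, event):
    """Call reconcile_function; on failure, log the traceback and return a status message."""
    try:
        reconcile_function(event)
    except Exception as exc:
        # The traceback lands in the debug log, but the exception does not propagate.
        log.error("%s\n%s", exc, traceback.format_exc())
        return "Failed to reconcile, see debug-log"
    return None
```

Under this pattern, a traceback in the log only shows that one reconcile attempt failed and was handled; the unit does not enter a hook error state because of it.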

In some of the SOLQA runs mentioned in this bug, I was able to find ACTUAL issues in kubeapi-load-balancer (unrelated to "Waiting for certificates") and have PRs up for those. The charms were actually in error.

1.29 - https://github.com/charmed-kubernetes/charm-kubeapi-load-balancer/pull/35
1.30 - https://github.com/charmed-kubernetes/charm-kubeapi-load-balancer/pull/36
main - https://github.com/charmed-kubernetes/charm-kubeapi-load-balancer/pull/37

Changed in charm-kubernetes-master:
status: Triaged → In Progress
Kevin W Monroe (kwmonroe) wrote :

Re-targeting this to the kubeapi-load-balancer (k-l-b) fixes, since the noted stack traces are expected in those debug logs.

Changed in charm-kubeapi-load-balancer:
status: New → Fix Committed
milestone: none → 1.30
assignee: nobody → Adam Dyess (addyess)
importance: Undecided → High
Changed in charm-kubernetes-master:
milestone: 1.30 → none
importance: High → Undecided
status: In Progress → Invalid
Jeffrey Chang (modern911) wrote :

Is it possible to backport this to 1.29?

Adam Dyess (addyess)
Changed in charm-kubeapi-load-balancer:
status: Fix Committed → Fix Released