kubernetes-worker units stuck awaiting tokens
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Kubernetes Worker Charm | Fix Released | High | Mateo Florido | | |
Bug Description
Hello,
We've added 2 worker nodes to a CDK 1.29 cluster for one of our customers.
The nodes got added successfully but the newly added k8s-worker units are stuck `waiting/idle` for tokens for cos user from kubernetes-
```
kubernetes-worker 1.28.9 waiting 5 kubernetes-worker 1.29/beta 183 no Token request for users system:
kubernetes-worker/0 active idle 3 172.25.105.35
kubernetes-
kubernetes-worker/2 active idle 5 172.25.105.36
kubernetes-worker/3 waiting idle 6 172.25.105.40 Token request for users system:
kubernetes-worker/4 waiting idle 7 172.25.105.41 Token request for users system:
```
The same happened on another cluster in the same customer environment.
I checked the `tokens` relation details from the requirer end (i.e. the k8s-worker end), and the tokens for the new nodes are only updated in the relation data of the leader k-c-p unit:
```
{
"relation-id": 20,
"endpoint": "tokens",
"related-
"application-
"local-unit": {
"in-scope": false,
"data": null
},
"related-units": {
"kubernetes
"in-scope": true,
"data": {
"tokens": "{\"system:
}
},
"kubernetes
"in-scope": true,
"data": {
"tokens": "{\"system:
}
},
"kubernetes
"in-scope": true,
"data": {
"tokens": "{\"system:
}
}
}
}
```
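The check described above can be repeated mechanically. The following is a minimal sketch, assuming the relation JSON has the shape shown above (a `related-units` map whose `data.tokens` value is a JSON-encoded string keyed by user name). The function name, the unit names, and the user names in the sample are hypothetical and only illustrate the idea. They are not taken from the affected cluster.

```python
import json


def units_publishing_token(relation: dict, user: str) -> dict:
    """For each related unit, report whether its 'tokens' data has an entry for `user`."""
    result = {}
    for unit, info in relation.get("related-units", {}).items():
        data = info.get("data") or {}      # "data" may be null for out-of-scope units
        raw = data.get("tokens")           # JSON-encoded string, as in the output above
        tokens = json.loads(raw) if raw else {}
        result[unit] = user in tokens
    return result


# Hypothetical sample mirroring the structure above: only the first
# (leader) unit has published a token for the newly added worker.
sample = {
    "relation-id": 20,
    "endpoint": "tokens",
    "related-units": {
        "kubernetes-control-plane/0": {
            "in-scope": True,
            "data": {"tokens": json.dumps({
                "system:node:old-worker": "token-a",
                "system:node:new-worker": "token-b",
            })},
        },
        "kubernetes-control-plane/1": {
            "in-scope": True,
            "data": {"tokens": json.dumps({"system:node:old-worker": "token-a"})},
        },
    },
}

print(units_publishing_token(sample, "system:node:new-worker"))
```

Any unit reported as `False` has not published a token for that user, which matches the symptom described here for the non-leader k-c-p units.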
```
hacluster-
kubernetes-
kubernetes-
hacluster-
kubernetes-
hacluster-
kubernetes-
hacluster-
```
This cluster is deployed on a vSphere cloud that has already been added to Juju:
```
Clouds available on the client:
Cloud Regions Default Type Credentials Source Description
juju-context 0 k8s 0 built-in A local Kubernetes context
localhost 1 localhost lxd 0 built-in LXD Container Hypervisor
vsphere-fstk 1 TAM-FSTK-vDC vsphere 1 local
```
Expectations:
1. A workaround to update the "tokens" relation data properly to fix the state of the units.
2. A fix to ensure the relation data is also updated for non-leader k-c-p units in "tokens" relation.
A juju-crashdump has been collected from the cluster and is being uploaded.
Please let me know if any further details are needed.
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
importance: Undecided → High
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released
PR to main: https://github.com/charmed-kubernetes/charm-kubernetes-worker/pull/172