kubernetes-worker units stuck awaiting tokens

Bug #2065251 reported by Kamal Bhaskar
This bug affects 2 people
Affects: Kubernetes Worker Charm
Status: In Progress
Importance: Undecided
Assigned to: Mateo Florido

Bug Description

Hello,

We've added 2 worker nodes to a CDK 1.29 cluster for one of our customers.

The nodes were added successfully, but the new kubernetes-worker units are stuck in `waiting/idle`, waiting for the kubernetes-control-plane charm to issue tokens for the cos user, as shown below:

```
kubernetes-worker 1.28.9 waiting 5 kubernetes-worker 1.29/beta 183 no Token request for users system:cos:juju-45eb4c-6 is not yet fulfilled.
kubernetes-worker/0 active idle 3 172.25.105.35
kubernetes-worker/1* active idle 4 172.25.105.34
kubernetes-worker/2 active idle 5 172.25.105.36
kubernetes-worker/3 waiting idle 6 172.25.105.40 Token request for users system:cos:juju-45eb4c-6 is not yet fulfilled.
kubernetes-worker/4 waiting idle 7 172.25.105.41 Token request for users system:cos:juju-45eb4c-7 is not yet fulfilled.

```

The same thing happened on another cluster in the same customer environment.

I checked the `tokens` relation data from the requirer end (i.e. the kubernetes-worker end), and the tokens for the new nodes appear only in the relation data of the leader k-c-p unit (kubernetes-control-plane/0):

```
{
  "relation-id": 20,
  "endpoint": "tokens",
  "related-endpoint": "tokens",
  "application-data": {},
  "local-unit": {
    "in-scope": false,
    "data": null
  },
  "related-units": {
    "kubernetes-control-plane/0": {
      "in-scope": true,
      "data": {
        "egress-subnets": "172.25.105.31/32",
        "ingress-address": "172.25.105.31",
        "private-address": "172.25.105.31",
        "tokens": "{\"system:cos:juju-45eb4c-7\": \"kubernetes-worker/4::redacted\", \"system:cos:juju-45eb4c-6\": \"kubernetes-worker/3::redacted\", \"system:cos:juju-45eb4c-3\": \"kubernetes-worker/0::redacted\", \"system:cos:juju-45eb4c-4\": \"kubernetes-worker/1::redacted\", \"system:cos:juju-45eb4c-5\": \"kubernetes-worker/2::redacted\"}"
      }
    },
    "kubernetes-control-plane/1": {
      "in-scope": true,
      "data": {
        "egress-subnets": "172.25.105.32/32",
        "ingress-address": "172.25.105.32",
        "private-address": "172.25.105.32",
        "tokens": "{\"system:cos:juju-45eb4c-3\": \"kubernetes-worker/0::redacted\", \"system:cos:juju-45eb4c-5\": \"kubernetes-worker/2::redacted\", \"system:cos:juju-45eb4c-4\": \"kubernetes-worker/1::redacted\"}"
      }
    },
    "kubernetes-control-plane/2": {
      "in-scope": true,
      "data": {
        "egress-subnets": "172.25.105.33/32",
        "ingress-address": "172.25.105.33",
        "private-address": "172.25.105.33",
        "tokens": "{\"system:cos:juju-45eb4c-4\": \"kubernetes-worker/1::redacted\", \"system:cos:juju-45eb4c-3\": \"kubernetes-worker/0::redacted\", \"system:cos:juju-45eb4c-5\": \"kubernetes-worker/2::redacted\"}"
      }
    }
  }
}
```
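
To make the gap easier to spot, here is a minimal Python sketch that compares the `tokens` entries published by each related unit. It assumes the relation data has been saved as JSON in the same shape as the dump above; the function name `missing_tokens` and the trimmed `sample` data are only illustrative, not part of any charm or Juju API.

```python
import json


def missing_tokens(relation: dict) -> dict[str, list[str]]:
    """Map each related unit to the users it has not published a token for.

    A user counts as "missing" on a unit if at least one other related
    unit has published a token for that user.
    """
    published = {}
    for unit, info in relation.get("related-units", {}).items():
        # The "tokens" value is itself a JSON-encoded string.
        raw = (info.get("data") or {}).get("tokens", "{}")
        published[unit] = set(json.loads(raw))

    all_users = set().union(*published.values())
    return {
        unit: sorted(all_users - users)
        for unit, users in published.items()
        if all_users - users
    }


# Trimmed-down sample in the same shape as the relation dump (hypothetical data).
sample = {
    "related-units": {
        "kubernetes-control-plane/0": {
            "data": {
                "tokens": json.dumps({
                    "system:cos:juju-45eb4c-7": "kubernetes-worker/4::redacted",
                    "system:cos:juju-45eb4c-6": "kubernetes-worker/3::redacted",
                    "system:cos:juju-45eb4c-3": "kubernetes-worker/0::redacted",
                })
            }
        },
        "kubernetes-control-plane/1": {
            "data": {
                "tokens": json.dumps({
                    "system:cos:juju-45eb4c-3": "kubernetes-worker/0::redacted",
                })
            }
        },
    }
}

for unit, users in missing_tokens(sample).items():
    print(f"{unit} is missing tokens for: {', '.join(users)}")
```

Run against the full dump above, a check like this should flag kubernetes-control-plane/1 and kubernetes-control-plane/2 as missing the entries for `system:cos:juju-45eb4c-6` and `system:cos:juju-45eb4c-7`.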

```
hacluster-kubernetes-control-plane 2.1.2 active 3 hacluster 2.4/stable 131 no Unit is ready and clustered
kubernetes-control-plane 1.28.9 active 3 kubernetes-control-plane 1.29/beta 377 no
kubernetes-control-plane/0* active idle 0 172.25.105.31
  hacluster-kubernetes-control-plane/1* active idle 172.25.105.31 Unit is ready and clustered
kubernetes-control-plane/1 active idle 1 172.25.105.32
  hacluster-kubernetes-control-plane/2 active idle 172.25.105.32 Unit is ready and clustered
kubernetes-control-plane/2 active idle 2 172.25.105.33
  hacluster-kubernetes-control-plane/0 active idle 172.25.105.33 Unit is ready and clustered
```

This cluster is deployed on a vSphere cloud that has already been added to Juju:
```
Clouds available on the client:
Cloud Regions Default Type Credentials Source Description
juju-context 0 k8s 0 built-in A local Kubernetes context
localhost 1 localhost lxd 0 built-in LXD Container Hypervisor
vsphere-fstk 1 TAM-FSTK-vDC vsphere 1 local
```

Expectations:
1. A workaround to update the "tokens" relation data properly to fix the state of the units.
2. A fix to ensure the relation data is also updated for non-leader k-c-p units in "tokens" relation.

A juju-crashdump has been collected from the cluster and is being uploaded.

Please let me know if any further details are needed.

Tags: backport
Adam Dyess (addyess) wrote :
Changed in charm-kubernetes-worker:
milestone: none → 1.29+ck2
assignee: nobody → Mateo Florido (mateoflorido)
status: New → In Progress
tags: added: backport