kubelet ID collisions when using multiple kubernetes-worker apps

Bug #1906094 reported by Paul Goins
This bug affects 1 person
Affects: Kubernetes Control Plane Charm
Status: New
Importance: Undecided
Assigned to: Kevin W Monroe
Milestone: (none)

Bug Description

Hi,

As part of some exploratory work, I tried to see whether it's possible to deploy multiple kubernetes-worker apps and relate them to the same kubernetes-master. Unfortunately, it looks like this is not presently possible.

I believe that this line of code in kubernetes_master.py (function create_tokens_and_sign_auth_requests) is preventing this from working:

  userid = "kubelet-{}".format(request[0].split('/')[1])

In plain English for those lacking immediate context: each request is a 2-item tuple of (unit_name, data_mapping) for a worker related to the kubernetes-master app. The code takes the unit_name (e.g. kubernetes-worker/0), discards the app name entirely, keeps only the numeric suffix (0), and adds a "kubelet-" prefix to come up with the userid ("kubelet-0").
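
To make the collision concrete, here is a minimal sketch (the helper name is mine, not the charm's) of what that line does for units from two different worker apps:

  def kubelet_userid(unit_name):
      # Keep only the numeric suffix of the unit name, as the quoted line does.
      return "kubelet-{}".format(unit_name.split('/')[1])

  print(kubelet_userid("kubernetes-worker/0"))     # kubelet-0
  print(kubelet_userid("kubernetes-worker-vm/0"))  # kubelet-0 (collision)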

userid is a field used in the known_tokens.csv file. The problem is that if a second kubernetes-worker app is deployed, the kubernetes-master will hit this code when setting up auth for the new worker - and if that unit is e.g. kubernetes-worker-vm/0, its userid will also resolve to kubelet-0.
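
For context, each record in known_tokens.csv follows the Kubernetes static token file layout of token,user,uid,"groups"; an illustrative (made-up) kubelet entry would look roughly like:

  abc123tokenvalue,system:node:worker-host-0,kubelet-0,"system:nodes"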

The rest of the code will detect that the new worker doesn't have a token (since that is looked up via username, e.g. system:node:<hostname>), but when it tries to write a new token, it will find the existing userid record for kubelet-0, replace the username/group/token in that existing record (clobbering whatever was there before), and then write the data back out to the CSV.

Likewise, if the existing worker's request is processed afterwards, the code will see that its record doesn't exist either, and a new token will be created for it as well, clobbering the new worker's record.

The end result: tokens may end up being replaced for both conflicting units (not just the new one), only one of the tokens is ultimately retained, and thus only one of the conflicting workers is able to communicate with the kubernetes-master.
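
A small Python sketch of that sequence (an in-memory dict standing in for the real CSV handling; helper and value names are made up) shows the clobbering:

  # Existence is checked by username, but the record is written keyed by userid.
  records = {}  # userid -> (username, token)

  def ensure_token(unit_name, hostname):
      userid = "kubelet-{}".format(unit_name.split('/')[1])
      username = "system:node:{}".format(hostname)
      # The lookup by username says this worker has no token yet...
      if not any(user == username for user, _ in records.values()):
          # ...but the write lands on the shared userid key, replacing
          # whatever record another worker already had there.
          records[userid] = (username, "token-for-" + unit_name)

  ensure_token("kubernetes-worker/0", "metal-host-0")
  ensure_token("kubernetes-worker-vm/0", "vm-host-0")
  print(records)
  # {'kubelet-0': ('system:node:vm-host-0', 'token-for-kubernetes-worker-vm/0')}
  # Only one of the two workers ends up with a retained token.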

Why a fix is desired: while it may be a niche use case, there may be cases where different classes of kubernetes-workers need to be deployed, e.g. a mix of bare-metal machines and KVMs, or a mix of machines with different types of network access. The current code more or less prevents this, since it creates a conflict that can only be avoided by ensuring the unit name suffixes never collide, e.g. that there is never a kubernetes-worker/0 and a kubernetes-worker-vm/0 at the same time.

Changed in charm-kubernetes-master:
assignee: nobody → Kevin W Monroe (kwmonroe)
Revision history for this message
Kevin W Monroe (kwmonroe) wrote:

@Paul, thanks for the report and triage! This was recently hit again and opened as 1906732 with a few more logs detailing the issue. I'm going to dupe this to that; we should have a fix out in the first bugfix release of CK 1.20.
