Failed to deploy when kubeapi-load-balancer and kubernetes-control-plane are on the same host

Bug #2042393 reported by Yoshi Kadokawa
Affects: Kubernetes Worker Charm
Status: New
Importance: Undecided
Assigned to: Rafael Lopez
Milestone: (none)

Bug Description

When deploying kubernetes-control-plane and kubeapi-load-balancer on the same host, juju status shows the kubernetes-control-plane unit in error with the message: hook failed: "kube-control-relation-changed"

The following error message was found in the juju logs.

2023-11-01 04:12:33 ERROR unit.kubernetes-control-plane/1.juju-log server.go:325 kube-control:6: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/reactive/kubernetes_control_plane.py", line 1245, in start_control_plane
    hookenv.open_port(6443)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charmhelpers/core/hookenv.py", line 832, in open_port
    _port_op('open-port', port, protocol)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charmhelpers/core/hookenv.py", line 822, in _port_op
    subprocess.check_call(_args)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['open-port', '6443/TCP']' returned non-zero exit status 1.
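
To dig out the underlying open-port failure (presumably a conflict on port 6443 with the colocated kubeapi-load-balancer unit; the load balancer unit name below is an assumption based on this deployment), something like the following can be used:

juju debug-log --replay --include unit-kubernetes-control-plane-1 | grep -i 'open-port'
juju exec --unit kubeapi-load-balancer/0 -- opened-ports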

This can be easily reproduced with the charmed-kubernetes bundle using the following overlay.
I used Juju v3.1.6.

juju deploy charmed-kubernetes --channel 1.28/stable --overlay ./overlay-k8s-placement.yaml

machines:
  # k8s-control-plane
  '0': {constraints: instance-type=c6a.xlarge root-disk=100G}
  '1': {constraints: instance-type=c6a.xlarge root-disk=100G}
  '2': {constraints: instance-type=c6a.xlarge root-disk=100G}
  # k8s-worker
  '3': {constraints: instance-type=c6a.xlarge root-disk=100G}
  '4': {constraints: instance-type=c6a.xlarge root-disk=100G}
  '5': {constraints: instance-type=c6a.xlarge root-disk=100G}

applications:
  easyrsa:
    num_units: 1
    to:
    - 0
  etcd:
    num_units: 3
    to:
    - 0
    - 1
    - 2
  kubeapi-load-balancer:
    num_units: 1
    to:
    - 1
  kubernetes-control-plane:
    num_units: 3
    to:
    - 0
    - 1
    - 2
  kubernetes-worker:
    num_units: 3
    to:
    - 3
    - 4
    - 5

This placement used to work with a previous bundle revision.
I believe this started happening after the bundle switched to the following relations (see the check sketched after this list):

- ['kubernetes-control-plane:loadbalancer-external', 'kubeapi-load-balancer:lb-consumers']
- ['kubernetes-control-plane:loadbalancer-internal', 'kubeapi-load-balancer:lb-consumers']
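
To double-check which of these relations a given deployment actually ended up with, the model can be inspected, for example:

juju status --relations | grep -i kubeapi-load-balancer
juju export-bundle | grep -A 20 'relations:'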

Revision history for this message
Yoshi Kadokawa (yoshikadokawa) wrote :

BTW, I get the same error when deploying kubeapi-load-balancer on the same host as kubernetes-worker.

Revision history for this message
Rafael Lopez (rafael.lopez) wrote :

Hi Yoshi, which exact bundle revision can you get this working with? Was it an earlier revision of 1.28?

I can actually reproduce this on 1.28/stable, 1.27/stable, and 1.24/stable. I haven't tried specific revisions within any channel.

Revision history for this message
Yoshi Kadokawa (yoshikadokawa) wrote (last edit ):

I believe it worked before the bundle switched to the current relations:

- ['kubernetes-control-plane:loadbalancer-external', 'kubeapi-load-balancer:lb-consumers']
- ['kubernetes-control-plane:loadbalancer-internal', 'kubeapi-load-balancer:lb-consumers']

According to https://github.com/charmed-kubernetes/bundle, I believe 1.21 is the last bundle that used the previous relations.

https://github.com/charmed-kubernetes/bundle/blob/main/releases/1.21/bundle.yaml

So basically, replacing the above two relations with the following three relations allows deploying kubeapi-load-balancer and kubernetes-control-plane on the same host (rough commands for an existing model are sketched after the list).

- ['kubeapi-load-balancer:apiserver', 'kubernetes-control-plane:kube-api-endpoint']
- ['kubeapi-load-balancer:loadbalancer', 'kubernetes-control-plane:loadbalancer']
- ['kubeapi-load-balancer:website', 'kubernetes-worker:kube-api-endpoint']
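
On an already-deployed model, that swap would look roughly like this with the Juju 3.x CLI (endpoint names are taken verbatim from the two lists above):

juju remove-relation kubernetes-control-plane:loadbalancer-external kubeapi-load-balancer:lb-consumers
juju remove-relation kubernetes-control-plane:loadbalancer-internal kubeapi-load-balancer:lb-consumers
juju integrate kubeapi-load-balancer:apiserver kubernetes-control-plane:kube-api-endpoint
juju integrate kubeapi-load-balancer:loadbalancer kubernetes-control-plane:loadbalancer
juju integrate kubeapi-load-balancer:website kubernetes-worker:kube-api-endpoint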

But I believe the `kube-api-endpoint` relation is more or less deprecated, and as far as I can see from the ops branch of kubernetes-control-plane, this relation has been removed.

Revision history for this message
Rafael Lopez (rafael.lopez) wrote :

The simple answer is that, in its current state, the placement you selected is unsupported because kubeapi-load-balancer cannot be deployed on any machine that already has applications listening on ports 6443 or 443.

The kubernetes-control-plane charm is coded to tell the load balancer to serve on the same port it uses itself (6443) for the loadbalancer-internal relation, and on 443 for the loadbalancer-external relation. So on control plane nodes, both kube-apiserver and the load balancer end up trying to listen on 6443. The situation is slightly different for kubernetes-worker, where 6443 is fine but another service is using 443, namely the ingress controller, a container running in Kubernetes that is part of CDK.
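
For reference, the conflict can be seen on one of the colocated machines (machine number 1 here follows the overlay above and may differ elsewhere) by checking what is already listening on those ports:

juju ssh 1 "sudo ss -ltnp | grep -E ':(6443|443)\b'"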

Although the 3 relations you mentioned may result in a deployed bundle, they do not actually configure load balancing as intended. The intention is that there is load balancing for both internal and external clients that contact the k8s API, with internal and external traffic separated by port, hence the two relations. This is what happens if, for example, you deploy the load balancer on its own server: the LB serves on 6443 and 443, fronting the control plane nodes (also on 6443), and the resulting relations set the 'kube-control' endpoint for each kube-worker to point to the LB on 6443 rather than using the external port 443.

I will see if I can put together a patch with a config option to allow adjusting either the lb serving ports or the kube-apiserver internal port to avoid this conflict.
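
Purely as a hypothetical illustration of how such an option might be used once a patch exists (the option name internal-api-port below is invented for this sketch and is not a current config of either charm):

juju config kubernetes-control-plane internal-api-port=16443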

Changed in charm-kubernetes-worker:
assignee: nobody → Rafael Lopez (rafael.lopez)
Revision history for this message
Rafael Lopez (rafael.lopez) wrote :
tags: added: review-needed