kubernetes-control-plane installation hook failed: "coordinator-relation-changed"

Bug #2021513 reported by Bas de Bruijne
This bug affects 1 person
Affects                         Status   Importance  Assigned to  Milestone
Kubernetes Control Plane Charm  Triaged  Medium      Unassigned
Kubernetes Worker Charm         Triaged  Medium      Unassigned

Bug Description

In test run https://solutions.qa.canonical.com/v2/testruns/7b1ed314-cb5e-4b46-9071-0d39d9a3e009/, which deploys ck8s 1.27 on jammy on MAAS with kube-ovn, the control-plane installation fails with the following status:

======================
kubernetes-control-plane/0 active idle 3/kvm/0 10.246.167.159 6443/tcp Kubernetes control-plane running.
  containerd/7 active idle 10.246.167.159 Container runtime available
  filebeat/23 blocked idle 10.246.167.159 filebeat service not running
  kube-ovn/7 waiting idle 10.246.167.159 Waiting to retry configuring Kube-OVN
  nrpe/29 active idle 10.246.167.159 icmp,5666/tcp Ready
  ntp/10 active idle 10.246.167.159 123/udp chrony: Ready, OK: offset is 0.000046
  telegraf/23 active idle 10.246.167.159 9103/tcp Monitoring kubernetes-control-plane/0 (source version/commit 23.01-8-...)
kubernetes-control-plane/1* error idle 4/kvm/0 10.246.164.138 6443/tcp hook failed: "coordinator-relation-changed"
  containerd/6 active idle 10.246.164.138 Container runtime available
  filebeat/21 blocked idle 10.246.164.138 filebeat service not running
  kube-ovn/6 active idle 10.246.164.138
  nrpe/27 active idle 10.246.164.138 icmp,5666/tcp Ready
  ntp/9 active idle 10.246.164.138 123/udp chrony: Ready, OK: offset is 0.000018
  telegraf/20 active idle 10.246.164.138 9103/tcp Monitoring kubernetes-control-plane/1 (source version/commit 23.01-8-...)
kubernetes-control-plane/2 active executing 5/kvm/0 10.246.167.181 6443/tcp Kubernetes control-plane running.
  containerd/8 active idle 10.246.167.181 Container runtime available
  filebeat/24 blocked idle 10.246.167.181 filebeat service not running
  kube-ovn/8 active idle 10.246.167.181
  nrpe/30 active idle 10.246.167.181 icmp,5666/tcp Ready
  ntp/11 active idle 10.246.167.181 123/udp chrony: Ready, OK: offset is 0.000023
  telegraf/24 active idle 10.246.167.181 9103/tcp Monitoring kubernetes-control-plane/2 (source version/commit 23.01-8-...)
======================

The logs show:
======================
unit-kubernetes-control-plane-1: 16:06:07 DEBUG unit.kubernetes-control-plane/1.coordinator-relation-changed Cluster "juju-cluster" set.
unit-kubernetes-control-plane-1: 16:06:07 WARNING unit.kubernetes-control-plane/1.coordinator-relation-changed internal error, please report: running "kubectl" failed: transient scope could not be started, job /org/freedesktop/systemd1/job/22029 finished with result failed
unit-kubernetes-control-plane-1: 16:06:07 ERROR unit.kubernetes-control-plane/1.juju-log coordinator:5: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/reactive/kubernetes_control_plane.py", line 2180, in build_kubeconfig
    create_kubeconfig(
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/lib/charms/layer/kubernetes_common.py", line 355, in create_kubeconfig
    check_call(split(cmd.format(new_kubeconfig)))
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['kubectl', 'config', '--kubeconfig=/home/ubuntu/config.new', 'unset', 'users']' returned non-zero exit status 46.
======================

Crashdumps and configs can be found here:
https://oil-jenkins.canonical.com/artifacts/7b1ed314-cb5e-4b46-9071-0d39d9a3e009/index.html

tags: added: cdo-qa foundations-engine
George Kraft (cynerva) wrote:

Looks like an internal error coming from systemd. Not much we can do to prevent it, but we can make the charm handle it better.

This failed kubectl call occurred in create_kubeconfig. We could perhaps avoid the kubectl call entirely and just render our own kubeconfig via yaml.safe_dump and file writes. Failing that, we need to handle failed kubectl calls and retry appropriately.
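
For illustration only, a minimal sketch of that yaml.safe_dump approach; the function name, parameters, and kubeconfig fields below are assumptions for the sketch, not the charm's actual create_kubeconfig implementation:
======================
# Hypothetical sketch: render a single-cluster kubeconfig directly with
# yaml.safe_dump instead of shelling out to `kubectl config`.
import os
import yaml


def render_kubeconfig(dest, ca_path, server, user, token):
    config = {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [
            {"name": "juju-cluster",
             "cluster": {"certificate-authority": ca_path, "server": server}},
        ],
        "contexts": [
            {"name": "juju-context",
             "context": {"cluster": "juju-cluster", "user": user}},
        ],
        "current-context": "juju-context",
        "users": [
            {"name": user, "user": {"token": token}},
        ],
    }
    # Write to a temporary file first, then rename, so a partial write
    # never replaces an existing good kubeconfig.
    tmp = dest + ".new"
    with open(tmp, "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False)
    os.rename(tmp, dest)
======================
If the kubectl calls stay, the simpler interim fix would be to wrap the check_call in a bounded retry loop so that a transient systemd failure like the one above does not abort the whole hook.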

Changed in charm-kubernetes-master:
importance: Undecided → Medium
status: New → Triaged
Changed in charm-kubernetes-worker:
importance: Undecided → Medium
status: New → Triaged