kubernetes-control-plane stuck in maintenance with message: Restarting snap.kube-apiserver.daemon service

Bug #1981604 reported by Konstantinos Kaskavelis
This bug affects 1 person
Affects: Kubernetes Control Plane Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

A test run timed out with the kubernetes-control-plane unit stuck in maintenance status with the message: "Restarting snap.kube-apiserver.daemon service"

Relevant logs (./7/baremetal/var/log/juju/unit-kubernetes-control-plane-0.log)

2022-07-12 17:34:27 INFO unit.kubernetes-control-plane/0.juju-log server.go:319 aws:10: Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/reactive/kubernetes_control_plane.py", line 1927, in apply_system_monitoring_rbac_role
    kubectl("apply", "-f", path)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/kubernetes_common.py", line 259, in kubectl
    return check_output(command)
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['kubectl', '--kubeconfig=/root/.kube/config', 'apply', '-f', '/root/cdk/system-monitoring-rbac-role.yaml']' returned non-zero exit status 1.

2022-07-12 17:34:27 INFO unit.kubernetes-control-plane/0.juju-log server.go:319 aws:10: Waiting to retry applying system:monitoring RBAC role
2022-07-12 17:34:27 INFO unit.kubernetes-control-plane/0.juju-log server.go:319 aws:10: Invoking reactive handler: reactive/kubernetes_control_plane.py:2265:configure_apiserver
2022-07-12 17:34:27 INFO unit.kubernetes-control-plane/0.juju-log server.go:319 aws:10: Executing ['kubectl', '--kubeconfig=/root/.kube/config', 'get', 'service', '--namespace', 'kube-system', 'k8s-keystone-auth-service', '--output', 'json']
2022-07-12 17:34:27 WARNING unit.kubernetes-control-plane/0.aws-relation-changed logger.go:60 The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
2022-07-12 17:34:27 INFO unit.kubernetes-control-plane/0.juju-log server.go:319 aws:10: Unable to find k8s-keystone-auth-service. Will retry
2022-07-12 17:34:28 INFO unit.kubernetes-control-plane/0.juju-log server.go:319 aws:10: status-set: maintenance: Restarting snap.kube-apiserver.daemon service
2022-07-12 17:34:28 DEBUG unit.kubernetes-control-plane/0.juju-log server.go:319 aws:10: tracer>
tracer: set flag kubernetes-control-plane.apiserver.configured
tracer: ++ queue handler reactive/kubernetes_control_plane.py:2471:check_apiserver
tracer: ++ queue handler reactive/kubernetes_control_plane.py:3087:generate_keystone_configmap
tracer: ++ queue handler reactive/kubernetes_control_plane.py:3571:configure_kubelet
tracer: -- dequeue handler reactive/kubernetes_control_plane.py:2265:configure_apiserver
2022-07-12 17:34:28 WARNING unit.kubernetes-control-plane/0.aws-relation-changed logger.go:60 Failed to get unit file state for kube-apiserver.service: No such file or directory
2022-07-12 17:34:28 WARNING unit.kubernetes-control-plane/0.aws-relation-changed logger.go:60 Failed to get unit file state for snap.kube-apiserver.service: No such file or directory
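
For triage, the "connection to the server 127.0.0.1:6443 was refused" line above suggests checking whether the kube-apiserver port ever starts accepting connections after the restart. The following is a minimal sketch for doing that by hand on the affected unit. The helper name wait_for_port and its timeout and interval parameters are hypothetical and are not part of the charm; only the address 127.0.0.1:6443 comes from the log.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection is accepted, or False if the
    port is still unreachable once `timeout` seconds have elapsed.
    Hypothetical triage helper, not code from the charm.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("127.0.0.1", 6443, timeout=30.0) on the unit would return False if the apiserver is still refusing connections after 30 seconds, which would be consistent with the service never coming back up rather than a short restart window.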

Test run:

https://solutions.qa.canonical.com/testruns/testRun/d04c6917-a25d-4b7a-b709-db2d589e71a2

Moises Emilio Benzan Mora (moisesbenzan) wrote :
tags: added: cdo-qa foundations-engine
Moises Emilio Benzan Mora (moisesbenzan) wrote :

Seen once more, this time with the unit stuck restarting the snap.kube-controller-manager.daemon service.

kubernetes-control-plane/2 maintenance executing 7/kvm/0 10.246.200.183 6443/tcp Restarting snap.kube-controller-manager.daemon service
  cilium/10 active idle 10.246.200.183 Ready
  containerd/10 active idle 10.246.200.183 Container runtime available
  ntp/13 active idle 10.246.200.183 123/udp chrony: Ready

Run: https://solutions.qa.canonical.com/testruns/0f2d4d07-8cd8-4836-925c-3e2ffab25759
Artifacts: https://oil-jenkins.canonical.com/artifacts/0f2d4d07-8cd8-4836-925c-3e2ffab25759/index.html
