[1.29 candidate] kubernetes-control-plane charm stuck waiting for 6 kube-system pods to start on aws

Bug #2055337 reported by Jeffrey Chang
This bug affects 1 person
Affects                         Status        Importance  Assigned to  Milestone
Kubernetes Control Plane Charm  Fix Released  Medium      Adam Dyess
Kubernetes Worker Charm         Fix Released  Medium      Adam Dyess

Bug Description

SolQA has seen this in 3 test runs:
https://solutions.qa.canonical.com/testruns/dfdcfec7-e351-4804-8d26-cdfd1199f067
https://solutions.qa.canonical.com/testruns/63742612-0359-4699-b74c-b0088e5cb52f
https://solutions.qa.canonical.com/testruns/8ed828e1-e7dc-46bf-8069-172fd50d2b0b

The kubernetes-control-plane units get stuck waiting for kube-system pods to start. The testrun times out after ~4h of waiting:

kubernetes-control-plane/0 waiting idle 6 35.175.239.254 6443/tcp Waiting for 6 kube-system pods to start
  calico/4 active idle 35.175.239.254 Ready
  containerd/4 active idle 35.175.239.254 Container runtime available
  ntp/5 active idle 35.175.239.254 123/udp chrony: Ready
  ubuntu-advantage/5 active idle 35.175.239.254 Attached (esm-apps,esm-infra,livepatch)
kubernetes-control-plane/1* waiting idle 7 18.232.116.222 6443/tcp Waiting for 6 kube-system pods to start
  calico/1 active idle 18.232.116.222 Ready
  containerd/1 active idle 18.232.116.222 Container runtime available
  ntp/2 active idle 18.232.116.222 123/udp chrony: Ready
  ubuntu-advantage/2 active idle 18.232.116.222 Attached (esm-apps,esm-infra,livepatch)

2024-02-28 08:07:57 INFO unit.kubernetes-control-plane/0.juju-log server.go:325 Checking system pods status: aws-cloud-controller-manager-bzd7z=Running, aws-cloud-controller-manager-qtm9m=Running, calico-kube-controllers-56d9dd65b4-bvv8p=Pending, calico-node-57wrn=Running, calico-node-7vz28=Running, calico-node-h2dq7=Running, calico-node-r65h2=Running, calico-node-wh82d=Running, coredns-bddfd76d7-l4px9=Pending, ebs-csi-controller-bdf755867-ppr56=Pending, ebs-csi-controller-bdf755867-rmvjs=Pending, ebs-csi-node-26gpk=Running, ebs-csi-node-2jxtk=Running, ebs-csi-node-rgmjr=Running, ebs-csi-node-rgs98=Running, ebs-csi-node-svt99=Running, kube-state-metrics-78c475f58b-2l276=Pending, metrics-server-v0.6.3-69d7fbfdf8-lbqtm=Pending
2024-02-28 08:07:57 INFO unit.kubernetes-control-plane/0.juju-log server.go:325 Status context closed with: [WaitingStatus('Waiting for 6 kube-system pods to start')]
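The "Checking system pods status" line above is a comma-separated list of pod=phase pairs, so the six blocking pods can be picked out mechanically. The following is only a rough sketch for triaging such a log line; the helper name pending_pods and the SAMPLE constant are hypothetical and are not part of the charm.

```python
from typing import List

# Pod list copied from the juju-log line quoted in this report.
SAMPLE = (
    "Checking system pods status: "
    "aws-cloud-controller-manager-bzd7z=Running, "
    "aws-cloud-controller-manager-qtm9m=Running, "
    "calico-kube-controllers-56d9dd65b4-bvv8p=Pending, "
    "calico-node-57wrn=Running, calico-node-7vz28=Running, "
    "calico-node-h2dq7=Running, calico-node-r65h2=Running, "
    "calico-node-wh82d=Running, coredns-bddfd76d7-l4px9=Pending, "
    "ebs-csi-controller-bdf755867-ppr56=Pending, "
    "ebs-csi-controller-bdf755867-rmvjs=Pending, "
    "ebs-csi-node-26gpk=Running, ebs-csi-node-2jxtk=Running, "
    "ebs-csi-node-rgmjr=Running, ebs-csi-node-rgs98=Running, "
    "ebs-csi-node-svt99=Running, "
    "kube-state-metrics-78c475f58b-2l276=Pending, "
    "metrics-server-v0.6.3-69d7fbfdf8-lbqtm=Pending"
)


def pending_pods(status_line: str) -> List[str]:
    """Return the names of pods whose reported phase is not Running.

    Hypothetical helper: it only understands the 'name=Phase, name=Phase'
    layout seen in the log line quoted in this report.
    """
    marker = "Checking system pods status:"
    # Everything after the marker is the pod list; empty if the marker is absent.
    _, _, tail = status_line.partition(marker)
    result = []
    for entry in tail.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last '=' so the phase is always the final field.
        name, _, phase = entry.rpartition("=")
        if phase != "Running":
            result.append(name)
    return result


if __name__ == "__main__":
    for pod in pending_pods(SAMPLE):
        print(pod)
```

Run against the line quoted above, this lists the calico-kube-controllers, coredns, two ebs-csi-controller, kube-state-metrics and metrics-server pods, which matches the "Waiting for 6 kube-system pods to start" status.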

I could not find any logs for the pending pods (calico-kube-controllers, coredns, ebs-csi-controller, ...).

From cdk.master.auth-webhook.log:

[2024-02-28 03:50:38 +0000] [18080] [INFO] Starting gunicorn 20.1.0
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.25.8) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/venv/gunicorn/__main__.py", line 7, in <module>
    run()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/venv/gunicorn/app/wsgiapp.py", line 67, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/venv/gunicorn/app/base.py", line 231, in run
    super().run()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/venv/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/venv/gunicorn/arbiter.py", line 198, in run
    self.start()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/venv/gunicorn/arbiter.py", line 155, in start
    self.LISTENERS = sock.create_sockets(self.cfg, self.log, fds)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/venv/gunicorn/sock.py", line 162, in create_sockets
    raise ValueError('certfile "%s" does not exist' % conf.certfile)
ValueError: certfile "/root/cdk/server.crt" does not exist
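The traceback shows gunicorn refusing to start because the configured certfile is absent. When triaging a unit in this state, a quick existence check can confirm which files are missing. This is a minimal diagnostic sketch, not the fix that was applied to the charm; the helper name missing_files is hypothetical, and the only path used is the one reported in the traceback.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the given paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Path taken from the ValueError above; extend the list as needed.
    for path in missing_files(["/root/cdk/server.crt"]):
        print(f"missing: {path}")
```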

Adam Dyess (addyess) wrote :
Changed in charm-kubernetes-master:
milestone: none → 1.29+ck1
status: New → In Progress
tags: added: backport-needed
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Changed in charm-kubernetes-worker:
milestone: none → 1.29+ck1
Changed in charm-kubernetes-master:
assignee: nobody → Adam Dyess (addyess)
importance: Undecided → Medium
Changed in charm-kubernetes-worker:
importance: Undecided → Medium
assignee: nobody → Adam Dyess (addyess)
status: New → In Progress
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released