devstack build error for k8s v1.25.2

Bug #1991757 reported by Ai Hamano
Affects: kuryr-kubernetes
Status: Fix Released
Importance: Undecided
Assigned to: Roman Dobosz

Bug Description

When I specified "KURYR_KUBERNETES_VERSION=1.25.2" in local.conf and ran the devstack installation, the build failed with the following error.
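For reference, a minimal local.conf excerpt for this setup might look as follows (the plugin line is illustrative; only the KURYR_KUBERNETES_VERSION setting is relevant to this bug):

```
[[local|localrc]]
# Enable the kuryr-kubernetes devstack plugin (repository URL shown for illustration).
enable_plugin kuryr-kubernetes https://opendev.org/openstack/kuryr-kubernetes
# Requesting Kubernetes >= 1.24 triggers the failure below.
KURYR_KUBERNETES_VERSION=1.25.2
```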

```
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

Unfortunately, an error has occurred:
 timed out waiting for the condition

This error is likely caused by:
 - The kubelet is not running
 - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
 - 'systemctl status kubelet'
 - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
 - 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
 Once you have found the failing container, you can inspect its logs with:
 - 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
+/opt/stack/kuryr-kubernetes/devstack/lib/kubernetes:kubeadm_init:1 exit_trap
+./stack.sh:exit_trap:516 local r=1
++./stack.sh:exit_trap:517 jobs -p
+./stack.sh:exit_trap:517 jobs=
+./stack.sh:exit_trap:520 [[ -n '' ]]
+./stack.sh:exit_trap:526 '[' -f /tmp/tmp.PQsJHZDDGJ ']'
+./stack.sh:exit_trap:527 rm /tmp/tmp.PQsJHZDDGJ
+./stack.sh:exit_trap:531 kill_spinner
+./stack.sh:kill_spinner:426 '[' '!' -z '' ']'
+./stack.sh:exit_trap:533 [[ 1 -ne 0 ]]
+./stack.sh:exit_trap:534 echo 'Error on exit'
Error on exit
+./stack.sh:exit_trap:536 type -p generate-subunit
+./stack.sh:exit_trap:537 generate-subunit 1664524389 1882 fail
```

The kubelet was not active:
```
$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Mon 2022-10-03 02:06:10 UTC; 8s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 255151 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 255151 (code=exited, status=1/FAILURE)
```

The following was output to the journal:
```
$ sudo journalctl -xeu kubelet | less
...
-- A start job for unit kubelet.service has finished successfully.
--
-- The job identifier is 36037.
Oct 03 01:35:06 vagrant kubelet[248251]: E1003 01:35:06.240987 248251 run.go:74] "command failed" err="failed to parse kubelet flag: unknown flag: --cni-bin-dir"
Oct 03 01:35:06 vagrant systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit kubelet.service has exited.
...
```
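Since the journal points at an unknown flag, a quick way to confirm on the affected node that the installed kubelet no longer accepts it (assuming kubelet is on PATH) is:

```
# On v1.24+ this prints nothing: the dockershim-era CNI flags were removed.
kubelet --help 2>&1 | grep -- '--cni-bin-dir'
```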

The cause seems to be the use of kubelet flags that were removed in v1.24:
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md
```
Kubelet: the following dockershim related flags are also removed along with dockershim --experimental-dockershim-root-directory, --docker-endpoint, --image-pull-progress-deadline, --network-plugin, --cni-conf-dir, --cni-bin-dir, --cni-cache-dir, --network-plugin-mtu. (#106907, @cyclinder)
```

If v1.24 or a later version is specified, a fix is required so that the --cni-bin-dir and --cni-conf-dir flags are no longer passed to the kubelet; a sketch of such a version gate follows the links below.
The relevant source code is here:
* https://opendev.org/openstack/kuryr-kubernetes/src/branch/stable/zed/devstack/lib/kubernetes#L109-L110
* https://opendev.org/openstack/kuryr-kubernetes/src/branch/stable/zed/devstack/lib/kubernetes#L179-L180
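A minimal sketch of such a version gate, assuming plain shell and illustrative variable names (this is not the merged patch):

```
# Only pass the dockershim-era CNI flags to pre-1.24 kubelets.
kubelet_cni_args=""
if [[ "$(printf '%s\n' "${KURYR_KUBERNETES_VERSION}" 1.24 | sort -V | head -n1)" != "1.24" ]]; then
    # The requested version sorts before 1.24, so the flags are still accepted.
    kubelet_cni_args="--cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d"
fi
```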

Changed in kuryr-kubernetes:
assignee: nobody → Roman Dobosz (roman-dobosz)
Changed in kuryr-kubernetes:
status: New → Confirmed
Changed in kuryr-kubernetes:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote: Fix merged to kuryr-kubernetes (master)

Reviewed: https://review.opendev.org/c/openstack/kuryr-kubernetes/+/861630
Committed: https://opendev.org/openstack/kuryr-kubernetes/commit/45d8b5fbad6b5b22e90302adc392ffaa1a540158
Submitter: "Zuul (22348)"
Branch: master

commit 45d8b5fbad6b5b22e90302adc392ffaa1a540158
Author: Roman Dobosz <email address hidden>
Date: Fri Oct 14 11:42:51 2022 +0200

    Support for kubernetes version >1.24.

    Starting from 1.24, Kubernetes started to use a different registry for its
    images. That results in Kuryr being unable to use the newer versions.
    In this commit, support for both registries is added.

    Closes-Bug: #1991757
    Change-Id: I3576159e5afbeb788369519fee12788260b0555f
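For context, the registry change the commit refers to is the move from k8s.gcr.io to registry.k8s.io. A hedged sketch of version-based registry selection (again illustrative, not the merged patch), whose result could be passed to kubeadm init via its --image-repository flag:

```
# Pick the image registry based on the requested Kubernetes version.
if [[ "$(printf '%s\n' "${KURYR_KUBERNETES_VERSION}" 1.24 | sort -V | head -n1)" == "1.24" ]]; then
    image_repository="registry.k8s.io"  # 1.24 and newer
else
    image_repository="k8s.gcr.io"       # pre-1.24 images
fi
```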

Changed in kuryr-kubernetes:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/kuryr-kubernetes 8.0.0.0rc1

This issue was fixed in the openstack/kuryr-kubernetes 8.0.0.0rc1 release candidate.
