kubelet unable to start because of missing cgroup path

Bug #1828270 reported by Allain Legacy
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Jim Gauld

Bug Description

Brief Description
-----------------
After a controller reboot, the kubelet process is unable to start because a required k8s cgroup sysfs path is missing.

This was on a 2+2 system that was configured using the Ansible playbook method rather than config_controller.

Severity
--------
Critical

Steps to Reproduce
------------------
This was observed after a manual reboot of the active controller (i.e., sudo reboot) on a 2+2 system. The standby controller never took over control (it is not clear why), but when the active controller came back up, kubelet did not start. The logs indicated a missing cgroup directory (see logs below).

Expected Behavior
------------------
kubelet should start automatically after a node reboot.

Actual Behavior
----------------
The kubelet process did not start following the reboot.

Reproducibility
---------------
100%

System Configuration
--------------------
2+2

Branch/Pull Time/Commit
-----------------------
Private build based on the May 6th rebase plus some Ansible- and networking-related fixes.

Last Pass
---------
Unknown

Timestamp/Logs
--------------
2019-05-08T11:35:44.076 controller-0 kubelet[61186]: info I0508 11:35:44.076454 61186 server.go:407] Version: v1.13.5
2019-05-08T11:35:44.076 controller-0 kubelet[61186]: info I0508 11:35:44.076685 61186 plugins.go:103] No cloud provider specified.
2019-05-08T11:35:44.080 controller-0 kubelet[61186]: info I0508 11:35:44.080185 61186 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
2019-05-08T11:35:44.121 controller-0 kubelet[61186]: info F0508 11:35:44.121197 61186 server.go:261] failed to run Kubelet: invalid configuration: cgroup-root ["k8s-infra"] doesn't exist: <nil>
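
For context, the fatal message corresponds to kubelet's configured cgroup root. A quick way to confirm the path is actually missing, as a sketch assuming a cgroup v1 layout (the controller names and config file locations below are assumptions, not the exact StarlingX set):

 # The error means kubelet's cgroup root ("k8s-infra") has no matching
 # directory under the mounted v1 controllers:
 for c in cpuset memory cpu,cpuacct; do
     [ -d "/sys/fs/cgroup/${c}/k8s-infra" ] || echo "missing: /sys/fs/cgroup/${c}/k8s-infra"
 done
 # Where the root is set (the --cgroup-root flag or cgroupRoot in the kubelet
 # config file) depends on how the service is launched:
 systemctl cat kubelet 2>/dev/null | grep -i cgroup
 grep -i cgroupRoot /var/lib/kubelet/config.yaml 2>/dev/null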

Test Activity
-------------
Developer testing

Changed in starlingx:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Jim Gauld (jgauld)
tags: added: stx.2.0
Ghada Khalil (gkhalil)
tags: added: stx.containers
Jim Gauld (jgauld)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/658823

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/658823
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=6bd45c96dd05e819d70140f6fe4d27e073b3988a
Submitter: Zuul
Branch: master

commit 6bd45c96dd05e819d70140f6fe4d27e073b3988a
Author: Jim Gauld <email address hidden>
Date: Fri May 10 12:57:37 2019 -0400

    Create k8s-infra cgroup path before kubelet launch

    This adds a kubelet ExecStartPre script to ensure the cgroup is set up
    prior to kubelet launch. It creates the k8s-infra cgroup for a minimal
    set of resource controllers and configures cpuset attributes to span
    all online cpus and nodes. It does nothing if the k8s-infra cgroup
    already exists (i.e., it assumes the cgroup is already configured).

    NOTE: The creation of directories under /sys/fs/cgroup is volatile and
    does not persist across reboots. The cpuset.mems and cpuset.cpus are
    later updated by the puppet kubernetes.pp manifest.

    Tests performed:
    Standard system: system install, lock/unlock controller & computes,
    forced reboot: active/standby controller, computes.

    Change-Id: I6a7aad5c40fe8225e9e16c8d8b40a0cffd76715d
    Closes-Bug: 1828270
    Signed-off-by: Jim Gauld <email address hidden>
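
For readers without access to the review above, a minimal sketch of what such an ExecStartPre setup step can look like (the controller list, file paths, and names here are illustrative assumptions, not the exact contents of the merged kubelet-cgroup-setup.sh):

 #!/bin/bash
 # Sketch: create the k8s-infra cgroup under a minimal set of v1 controllers
 # and make cpuset span all online CPUs and memory nodes; do nothing if the
 # cgroup already exists. Volatile: must be redone on every boot.
 CGROUP=/sys/fs/cgroup
 for c in cpuset memory cpu,cpuacct pids; do
     d="${CGROUP}/${c}/k8s-infra"
     [ -d "${d}" ] && continue           # already configured
     mkdir -p "${d}"
     if [ "${c}" = "cpuset" ]; then
         # span all online cpus and nodes (later refined by puppet)
         cat "${CGROUP}/cpuset/cpuset.cpus" > "${d}/cpuset.cpus"
         cat "${CGROUP}/cpuset/cpuset.mems" > "${d}/cpuset.mems"
     fi
 done

Such a script would then be wired into the unit with a systemd drop-in along the lines of ExecStartPre=/usr/local/bin/kubelet-cgroup-setup.sh under kubelet.service (the install path and drop-in name are assumed here).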

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Yatindra Shashi (yshashi) wrote :

Hi Team,

I am trying to add my own Yocto build machine (also on Ubuntu) to the STX 4.0 K8s cluster, but the kubelet service fails on my machine with the issue below. I also tried to run the setup file (details further down).

 systemd[1]: Started Kubernetes systemd probe.
 kubelet[8723]: I0216 11:07:06.581479 8723 server.go:417] Version: v1.18.12
 kubelet[8723]: I0216 11:07:06.582547 8723 plugins.go:100] No cloud provider specified.
 kubelet[8723]: I0216 11:07:06.582637 8723 server.go:838] Client rotation is on, will bootstrap in background
 systemd[1]: run-r5abec12f758b45a98daf496b33d9964d.scope: Succeeded.
 kubelet[8723]: I0216 11:07:06.600482 8723 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
 kubelet[8723]: I0216 11:07:06.602408 8723 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
 kubelet[8723]: F0216 11:07:06.647233 8723 server.go:274] failed to run Kubelet: invalid configuration: cgroup-root ["k8s-infra"] doesn't exist
 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
 systemd[1]: kubelet.service: Failed with result 'exit-code'.

- I ran the script "https://review.opendev.org/c/starlingx/integ/+/658823/4/kubernetes/kubernetes/centos/files/kubelet-cgroup-setup.sh#1",
but it still shows the same error after restarting the kubelet service.

Revision history for this message
Yatindra Shashi (yshashi) wrote :

Hi team,

It was solved by enabling the kernel config options needed for the k8s-infra cgroup and restarting the machine; the Yocto machine's kernel does not have these config options enabled by default.
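
For anyone hitting this on a custom kernel, a quick way to check whether the needed cgroup controllers are enabled before re-running the setup script (this assumes cgroup v1 and that the kernel exposes /proc/config.gz; the option names are the usual upstream ones):

 # Controllers the running kernel knows about (need at least cpuset, cpu, memory):
 cat /proc/cgroups
 # Kernel config, if /proc/config.gz is available (CONFIG_IKCONFIG_PROC):
 zcat /proc/config.gz | grep -E 'CONFIG_CGROUPS|CONFIG_CPUSETS|CONFIG_MEMCG|CONFIG_CGROUP_PIDS'
 # Confirm the controllers are actually mounted under /sys/fs/cgroup:
 mount -t cgroup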
