On AIO hosts, kubernetes is starting before key resources are initialized
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Bin Qian |
Bug Description
Brief Description
-----------------
On AIO hosts, kubernetes (i.e. the kubelet) is started by the controller manifests. As a result, pods are launched before the worker manifests have been applied, and therefore before those manifests have configured key resources that the pods may require. Some examples: SRIOV, huge pages, cgroups, PTP, FPGA and more. Any of these items can impact the startup of the application pods.
This results in pods being launched in a broken state (e.g. LP1896631). We have done some terrible workarounds (e.g. restarting pods after they come up) to deal with this, but we need a proper fix to ensure that all the necessary platform configuration has been completed before kubernetes is started.
Severity
--------
Major: System/Feature is usable but degraded
Steps to Reproduce
------------------
Install an AIO system and create pods that use SRIOV, PTP, huge pages, etc.
Reboot the controller(s)
Expected Behavior
------------------
All platform resources are initialized before kubernetes is started (e.g. SRIOV, huge pages, cgroups, PTP, FPGA). Pods using these resources are not started until the resources have been configured.
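As a rough illustration of the ordering described above (not the actual fix), the kubelet start could be gated on a set of readiness checks. This is a minimal sketch: the marker-file convention, the check names, and the `start` callable are all hypothetical and do not come from the StarlingX code base.

```python
import os
import time
from typing import Callable, Dict, List

Check = Callable[[], bool]


def flag_exists(path: str) -> Check:
    """Build a check that passes once a marker file exists.

    The marker-file convention is an assumption made for this sketch.
    """
    return lambda: os.path.exists(path)


def pending_checks(checks: Dict[str, Check]) -> List[str]:
    """Return the names of the checks that do not pass yet."""
    return [name for name, is_ready in checks.items() if not is_ready()]


def wait_for_platform(checks: Dict[str, Check],
                      timeout_s: float = 300.0,
                      poll_s: float = 1.0) -> List[str]:
    """Poll until every check passes or the timeout expires.

    Returns the names still pending; an empty list means the platform is ready.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        pending = pending_checks(checks)
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(poll_s)


def start_when_ready(checks: Dict[str, Check],
                     start: Callable[[], None],
                     timeout_s: float = 300.0,
                     poll_s: float = 1.0) -> bool:
    """Call start() only after all checks pass; return False on timeout."""
    if wait_for_platform(checks, timeout_s, poll_s):
        return False
    start()  # e.g. a hypothetical wrapper that starts the kubelet service
    return True
```

In this sketch, each resource that the worker manifests configure (SRIOV, huge pages, and so on) would get its own entry in `checks`, so a timeout reports exactly which resource was not ready instead of letting pods start in a broken state.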
Actual Behavior
----------------
See above
Reproducibility
---------------
Intermittent - even with the workarounds, pods will occasionally fail to come up properly
System Configuration
-------
AIO-SX and AIO-DX
Branch/Pull Time/Commit
-------
stx.4.0
Last Pass
---------
Never - day one issue
Timestamp/Logs
--------------
N/A
Test Activity
-------------
Other
Workaround
----------
Reboot the host and hope for better results
Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
tags: added: stx.config
tags: added: stx.5.0
Changed in starlingx:
status: Triaged → In Progress
Changed in starlingx:
status: Fix Released → In Progress
stx.5.0 / medium - robustness fixes to better handle startup sequence on AIO