CPU isolation doesn't work in StarlingX 5.0
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | In Progress | Undecided | Unassigned | | |
Bug Description
_______
From: Gaur, Shubham <Shubham.Gaur at commscope.com>
Sent: Wednesday, October 20, 2021 11:05 AM
To: starlingx-discuss at lists.starlingx.io <starlingx-discuss at lists.starlingx.io>
Subject: Re: CPU Isolation over AIO Controller Nodes
Hi All,
CPU isolation is not working in StarlingX 5.0. The static CPU manager policy has been enabled on all the nodes and the Isol-CPU resource plugin is up and running, but no isolated CPU resource pool is visible. I could not see any isolated CPU annotations (windriver.
[sysadmin at controller-0 ~(keystone_admin)]$ system host-cpu-list controller-0 | grep Application-
| bb7e63db-
| 94f9f44e-
| 51eb3110-
| 7073bc5a-
| 609c1870-
| 47208322-
| 79da3786-
| 42d5a6d0-
-------
controller-0:~$ kubectl get node controller-0 -o yaml | grep -E " allocatable:" -A 15
allocatable:
cpu: "64"
ephemeral-
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 106113496Ki
pods: "110"
capacity:
cpu: "64"
ephemeral-
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 131303896Ki
pods: "110"
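The check above can also be done programmatically. The following is a minimal sketch, assuming that a device plugin's pool would show up in `allocatable` as an extended resource, i.e. a domain-prefixed name containing a `/`. The helper name `extended_resources` is hypothetical, and the sample dictionary copies only the complete values quoted in the output above (truncated keys are left out).

```python
# Sketch: list extended (domain-prefixed) resources in a node's allocatable map.
# Assumption: a device-plugin resource name contains a '/', unlike the
# built-in resources (cpu, memory, pods, hugepages-*).

def extended_resources(allocatable):
    """Return only the entries whose resource name contains a '/'."""
    return {name: qty for name, qty in allocatable.items() if "/" in name}


# Values copied from the report; truncated entries are omitted.
allocatable = {
    "cpu": "64",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "106113496Ki",
    "pods": "110",
}

found = extended_resources(allocatable)
print(found if found else "no extended resources advertised")
```

With the values from this report the helper finds nothing, which matches the observation that no isolated CPU pool is advertised by the node.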
=======
SYSTEM: edgecloud
=======
controller-0:~$ cat /etc/build.info
###
### StarlingX
### Release 21.05
###
OS="centos"
SW_VERSION="21.05"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
JOB="STX_
BUILD_BY=
BUILD_NUMBER="37"
BUILD_HOST=
BUILD_DATE=
FLOCK_OS="centos"
FLOCK_JOB=
FLOCK_BUILD_
FLOCK_BUILD_
FLOCK_BUILD_
FLOCK_BUILD_
DISTRO_OS="centos"
DISTRO_
DISTRO_
DISTRO_
DISTRO_
DISTRO_
COMPILER_
COMPILER_
COMPILER_
COMPILER_
COMPILER_
COMPILER_
Regards,
Shubham Gaur
_______
From: Gaur, Shubham
Sent: Friday, October 8, 2021 2:07 PM
To: starlingx-discuss at lists.starlingx.io <starlingx-discuss at lists.starlingx.io>
Subject: CPU Isolation over AIO Controller Nodes
Does the CPU isolation feature work over AIO controller nodes in an edge distributed cloud environment?
=======
SYSTEM: edgecloud
=======
controller-0:~$ cat /etc/build.info
###
### StarlingX
### Release 21.05
###
OS="centos"
SW_VERSION="21.05"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
JOB="STX_
BUILD_BY=
BUILD_NUMBER="37"
BUILD_HOST=
BUILD_DATE=
FLOCK_OS="centos"
FLOCK_JOB=
FLOCK_BUILD_
FLOCK_BUILD_
FLOCK_BUILD_
FLOCK_BUILD_
DISTRO_OS="centos"
DISTRO_
DISTRO_
DISTRO_
DISTRO_
DISTRO_
COMPILER_
COMPILER_
COMPILER_
COMPILER_
COMPILER_
COMPILER_
Regards,
Shubham Gaur
Brief Description
-----------------
CPU isolation is not working in StarlingX 5.0. The static CPU manager policy has been enabled on all the nodes and the Isol-CPU resource plugin is up and running, but no isolated CPU resource pool is visible. I could not see any isolated CPU annotations (windriver.
Severity
--------
Critical: System/Feature is not usable due to the defect
Steps to Reproduce
------------------
system host-cpu-list controller-0 | grep Application-
| bb7e63db-
| 94f9f44e-
| 51eb3110-
| 7073bc5a-
| 609c1870-
| 47208322-
| 79da3786-
| 42d5a6d0-
-------
controller-0:~$ kubectl get node controller-0 -o yaml | grep -E " allocatable:" -A 15
allocatable:
cpu: "64"
ephemeral-
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 106113496Ki
pods: "110"
capacity:
cpu: "64"
ephemeral-
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 131303896Ki
pods: "110"
Expected Behavior
------------------
windriver.
Actual Behavior
----------------
windriver.
Reproducibility
---------------
Reproducible. The issue is 100% reproducible.
System Configuration
-------
AIO Duplex distributed cloud
Branch/Pull Time/Commit
-------
Branch and the time when code was pulled or git commit or cengn load info
Last Pass
---------
Did this test scenario pass previously? If so, please indicate the load/pull time info of the last pass.
Use this section to also indicate if this is a new test scenario.
No
Timestamp/Logs
--------------
Attach the logs for debugging (use attachments in Launchpad or for large collect files use: https:/
Provide a snippet of logs here and the timestamp when issue was seen.
Please indicate the unique identifier in the logs to highlight the problem
Test Activity
-------------
[Sanity, Feature Testing, Regression Testing, Developer Testing, Evaluation, Other - Please specify]
Workaround:
Hi,
After a clean install, no isolated CPUs are listed under /sys/devices/
controller-0:~$ cat /proc/cmdline
BOOT_IMAGE=
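One way to compare the isolation-related kernel parameters is to parse the command line and expand each CPU list into a set. This is a rough sketch only: the helper names `parse_cpu_list` and `isolation_params` are hypothetical, the sample command line is invented for illustration (it is not the truncated one above), and the parser assumes plain numbers and `a-b` ranges.

```python
# Sketch: compare isolcpus, rcu_nocbs and nohz_full on a kernel command line.
# Assumption: CPU lists use only plain ids and 'a-b' ranges (no strides).

def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '2-5,8' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        # Skip empty parts and flag words (e.g. 'domain') that isolcpus may carry.
        if not part or not part[0].isdigit():
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolation_params(cmdline):
    """Collect isolcpus, rcu_nocbs and nohz_full from a kernel command line."""
    wanted = ("isolcpus", "rcu_nocbs", "nohz_full")
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            found[key] = parse_cpu_list(value)
    return found


# Hypothetical command line, not taken from the report.
sample = "ro isolcpus=2-5 rcu_nocbs=1-7 nohz_full=1-7"
params = isolation_params(sample)
for name in ("rcu_nocbs", "nohz_full"):
    print(name, "matches isolcpus:", params.get(name) == params.get("isolcpus"))
```

On a live host the input would come from reading `/proc/cmdline` instead of the `sample` string.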
After further debugging, we found that keeping rcu_nocbs and nohz_full equal to isolcpus resolved this problem. Looking at the scripts, those parameters are calculated separately.
* /usr/lib64/
582,583c582,583
< rcu_nocbs_cpuset = host_cpuset - platform_cpuset
< rcu_nocbs_ranges = utils.format_
---
> # rcu_nocbs_cpuset = host_cpuset - platform_cpuset
> # rcu_nocbs_ranges = utils.format_
589a590,593
> # non-platform logical cpus
> rcu_nocbs_cpuset = host_cpuset - platform_cpuset
> #rcu_nocbs_ranges = utils.format_
> rcu_nocbs_ranges = utils.format_
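To illustrate why a range derived from all non-platform CPUs can differ from the isolated range, here is a small sketch of the set arithmetic. It is not the code from platform.py: the formatting helper in the diff is truncated, so `format_ranges` below is a hypothetical stand-in, and the CPU numbers describe an invented 8-CPU host.

```python
# Sketch: non-platform CPU range versus isolated CPU range.
# format_ranges is a hypothetical stand-in for the truncated utils helper.

def format_ranges(cpus):
    """Render a set of CPU ids as a compact list, e.g. {1, 2, 3, 6} -> '1-3,6'."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current run while the next id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


# Hypothetical 8-CPU host: CPUs 0-1 are platform, CPUs 6-7 are isolated.
host_cpuset = set(range(8))
platform_cpuset = {0, 1}
isolated_cpuset = {6, 7}

# Every non-platform CPU, as in the expression shown in the diff.
non_platform = host_cpuset - platform_cpuset
print(format_ranges(non_platform))     # 2-7
print(format_ranges(isolated_cpuset))  # 6-7
```

In this invented layout the two ranges differ whenever some non-platform CPUs are left as ordinary application CPUs.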
Is there any specific reason these parameters are calculated separately? Would there be any side effects if the above modification were introduced in platform.py?
Thanks and regards,
Shubham
Changed in starlingx:
status: New → In Progress
screening: This should be looked at by the containers subproject team