Large cpu discrepancy between the schedstat platform cores and cgroup cpuacct based measurements
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Jim Gauld | | |
Bug Description
Brief Description
-----------------
There's a large discrepancy between the CPU "Usage" reading and the CPU "Platform" reading on worker nodes in a Standard system. Here's a sample:
2019-10-
2019-10-
2019-10-
2019-10-
2019-10-
Severity
--------
Major
Steps to Reproduce
------------------
Install Oct. 24th build in a standard lab
Check host cpu usage in daemon.log
Expected Behavior
------------------
The overall "Usage" reading from schedstats should be in close agreement with, or higher than, the "Platform" reading from cgroup cpuacct, depending on whether k8s-addon is included in or excluded from "Platform", respectively.
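The expected relationship can be sketched as a check over two samples taken some seconds apart. This is only an illustration, not the collector's actual logic: the function names, the tolerance, and the normalization to a percentage of the platform cores are assumptions. It also assumes the 7th per-CPU counter in /proc/schedstat is the cumulative task run time in nanoseconds, and that the cgroup value is a cumulative cpuacct.usage reading in nanoseconds.

```python
def parse_schedstat_runtime(text):
    """Return {cpu_index: cumulative run time in ns} parsed from /proc/schedstat text."""
    runtimes = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 <counters...>"; skip version, timestamp, domain lines.
        if len(fields) >= 8 and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            # Assumption: the 7th counter is the time tasks spent running on this CPU.
            runtimes[int(fields[0][3:])] = int(fields[7])
    return runtimes


def occupancy_percent(delta_ns, elapsed_s, n_cpus):
    """Convert a run-time delta in ns into a percentage of n_cpus over elapsed_s seconds."""
    return 100.0 * delta_ns / (elapsed_s * 1e9 * n_cpus)


def compare_readings(sched_before, sched_after, cgroup_before_ns, cgroup_after_ns,
                     elapsed_s, platform_cores, tolerance_pct=5.0):
    """Return (usage_pct, platform_pct, consistent) for one sampling interval.

    'consistent' is True when the cgroup-based "Platform" value does not exceed
    the schedstat-based "Usage" value by more than tolerance_pct (hypothetical threshold).
    """
    before = parse_schedstat_runtime(sched_before)
    after = parse_schedstat_runtime(sched_after)
    sched_delta = sum(after[c] - before[c] for c in platform_cores)
    n = len(platform_cores)
    usage_pct = occupancy_percent(sched_delta, elapsed_s, n)
    platform_pct = occupancy_percent(cgroup_after_ns - cgroup_before_ns, elapsed_s, n)
    return usage_pct, platform_pct, platform_pct <= usage_pct + tolerance_pct
```

If cgroup tasks also run on cores outside platform_cores, their time is counted in the cgroup total but not in the schedstat sum, so platform_pct can exceed usage_pct and the check returns False.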
Actual Behavior
----------------
The "Platform" CPU reading is considerably higher than the overall "Usage" reading. The issue appears to stem from a logic flaw in CPU usage accounting when --cpu-manager-
Reproducibility
---------------
100% reproducible
System Configuration
-------
Standard
Branch/Pull Time/Commit
-------
Oct. 24th or newer
Last Pass
---------
N/A
Timestamp/Logs
--------------
N/A
Test Activity
-------------
Evaluation
Changed in starlingx: | |
assignee: | nobody → Jim Gauld (jgauld) |
Changed in starlingx: | |
importance: | Undecided → Medium |
status: | New → Triaged |
tags: | added: stx.3.0 stx.config |
tags: | added: in-r-stx30 |
When --cpu-manager-policy is set to none, it appears that some k8s platform processes are floating across all cores and not exclusively running on just the platform cores.
As such, marking this as medium priority and stx.3.0 gating.
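One way to spot the floating processes described above is to compare each process's allowed CPU set against the platform cores. The sketch below is an assumption-laden illustration: the helper names and the way the platform core list is obtained are hypothetical. An affinity mask wider than the platform cores only shows that a process is permitted to run elsewhere, not that it actually did.

```python
import os


def find_floating(affinities, platform_cores):
    """Return {pid: sorted non-platform CPUs} for processes allowed off the platform cores.

    affinities: mapping of pid -> iterable of allowed CPU indices.
    platform_cores: iterable of CPU indices reserved for the platform (assumed input).
    """
    platform = set(platform_cores)
    floating = {}
    for pid, cpus in affinities.items():
        extra = set(cpus) - platform
        if extra:
            floating[pid] = sorted(extra)
    return floating


def read_affinities(pids):
    """Collect the allowed CPU set of each pid via os.sched_getaffinity (Linux only)."""
    result = {}
    for pid in pids:
        try:
            result[pid] = os.sched_getaffinity(pid)
        except (ProcessLookupError, PermissionError):
            # The process exited or is not accessible; skip it.
            continue
    return result
```

For example, find_floating(read_affinities(pids), [0, 1]) lists every process in pids whose affinity mask includes a core other than 0 or 1.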