Large cpu discrepancy between the schedstat platform cores and cgroup cpuacct based measurements

Bug #1850242 reported by Tee Ngo on 2019-10-29
Affects: StarlingX
Importance: Medium
Assigned to: Jim Gauld

Bug Description

Brief Description
-----------------
There is a large discrepancy between the overall cpu "Usage" reading and the "Platform" cpu reading on worker nodes in a Standard system: Platform is consistently higher than Usage. Here is a sample:

2019-10-28T01:50:58.550 compute-0 collectd[79959]: info platform cpu usage plugin Usage: 19.3% (avg per cpu); cpus: 1, Platform: 28.0% (Base: 17.3, k8s-system: 8.1, k8s-addon: 2.6)
2019-10-28T01:51:08.550 compute-0 collectd[79959]: info platform cpu usage plugin Usage: 21.7% (avg per cpu); cpus: 1, Platform: 33.2% (Base: 18.1, k8s-system: 8.5, k8s-addon: 6.7)
2019-10-28T01:51:18.550 compute-0 collectd[79959]: info platform cpu usage plugin Usage: 14.8% (avg per cpu); cpus: 1, Platform: 22.5% (Base: 11.5, k8s-system: 8.5, k8s-addon: 2.5)
2019-10-28T01:51:28.550 compute-0 collectd[79959]: info platform cpu usage plugin Usage: 12.7% (avg per cpu); cpus: 1, Platform: 22.0% (Base: 10.7, k8s-system: 7.4, k8s-addon: 3.9)
2019-10-28T01:51:38.550 compute-0 collectd[79959]: info platform cpu usage plugin Usage: 21.0% (avg per cpu); cpus: 1, Platform: 27.3% (Base: 17.6, k8s-system: 7.6, k8s-addon: 2.2)
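
To quantify the gap in samples like the ones above, a minimal Python sketch follows. It assumes the log line layout shown in this report; the names LINE_RE, parse_line and excess are hypothetical helpers written for illustration and are not part of the collectd plugin. It compares Platform directly against Usage, which is only meaningful when both are expressed in the same per-cpu terms, as in the "cpus: 1" samples above.

```python
import re

# Pattern for the "platform cpu usage plugin" log line format shown in this
# report. The layout is taken from the samples above; other builds may differ.
LINE_RE = re.compile(
    r"Usage:\s*(?P<usage>[\d.]+)%.*?cpus:\s*(?P<cpus>\d+),\s*"
    r"Platform:\s*(?P<platform>[\d.]+)%\s*"
    r"\(Base:\s*(?P<base>[\d.]+),\s*k8s-system:\s*(?P<k8s_system>[\d.]+),\s*"
    r"k8s-addon:\s*(?P<k8s_addon>[\d.]+)\)"
)

SAMPLE = (
    "2019-10-28T01:50:58.550 compute-0 collectd[79959]: info platform cpu "
    "usage plugin Usage: 19.3% (avg per cpu); cpus: 1, Platform: 28.0% "
    "(Base: 17.3, k8s-system: 8.1, k8s-addon: 2.6)"
)


def parse_line(line):
    """Return the readings from one log line as a dict, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    record = {key: float(value) for key, value in match.groupdict().items()}
    record["cpus"] = int(record["cpus"])
    return record


def excess(record):
    """Percentage points by which Platform exceeds Usage (positive = suspect)."""
    return round(record["platform"] - record["usage"], 1)


if __name__ == "__main__":
    rec = parse_line(SAMPLE)
    print(rec, "excess:", excess(rec))
```

A positive excess on every sample, as seen here, is the symptom described in this report.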

Severity
--------
Major

Steps to Reproduce
------------------
Install the Oct. 24th build in a standard lab
Check the host cpu usage in daemon.log

Expected Behavior
------------------
The overall "Usage" reading from schedstats should be in close agreement with, or higher than, the "Platform" reading from cgroup cpuacct, depending on whether k8s-addon is included in or excluded from "Platform", respectively.

Actual Behavior
----------------
The Platform cpu reading is considerably higher than the overall Usage reading. The issue appears to stem from a logic flaw in cpu usage accounting when --cpu-manager-policy is set to none.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Standard

Branch/Pull Time/Commit
-----------------------
Oct. 24th or newer

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Evaluation

Frank Miller (sensfan22) on 2019-10-29
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
Ghada Khalil (gkhalil) on 2019-11-01
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.3.0 stx.config
Frank Miller (sensfan22) wrote :

When --cpu-manager-policy is set to none, it appears that some k8s platform processes are floating across all cores and not exclusively running on just the platform cores.
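
One way to check for floating processes is to compare a process's allowed cpu set with the platform cores. The Python sketch below is an illustration only, not the change under review: cpus_outside_platform and process_floats are hypothetical helpers, and the platform core set must be supplied by the caller. It relies on os.sched_getaffinity, which is available on Linux, and it reports where a process is allowed to run, not where it actually ran.

```python
import os


def cpus_outside_platform(allowed_cpus, platform_cpus):
    """Return the cpus a process may run on that are not platform cores."""
    return set(allowed_cpus) - set(platform_cpus)


def process_floats(pid, platform_cpus):
    """True if the process's affinity mask extends beyond the platform cores.

    Uses os.sched_getaffinity (Linux only). The platform core set is an
    input here; how it is obtained on a given host is outside this sketch.
    """
    allowed = os.sched_getaffinity(pid)
    return bool(cpus_outside_platform(allowed, platform_cpus))


if __name__ == "__main__":
    # Hypothetical example: treat cpu 0 as the only platform core and
    # inspect the current process (pid 0 means "the calling process").
    print(process_floats(0, {0}))
```

A process for which process_floats returns True can accrue cpu time on cores that the schedstat-based "Usage" reading does not cover, which would be consistent with Platform exceeding Usage.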

As such, marking this as medium priority and stx.3.0 gating.

Fix proposed to branch: master
Review: https://review.opendev.org/697718

Changed in starlingx:
status: Triaged → In Progress