Add C-state info to the collect tool

Bug #1996238 reported by Andrew Tan
This bug affects 1 person
Affects    | Status       | Importance | Assigned to | Milestone
StarlingX  | Fix Released | Low        | Andrew Tan  |

Bug Description

Brief Description
-----------------
Add the output of the 'cpupower monitor' command to the collect tool's host.info to help debug CPU performance-related issues.

Certain zt-proteus servers in our labs had the C6 state enabled, which led to very poor performance and to timeouts during the bootstrap process. The added information will help troubleshooting efforts.
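The requested behaviour amounts to "run a command and append its output, under a label, to host.info". The sketch below illustrates that idea in Python only; the real collect_host is a shell script, and the helper name append_command_output and the separator layout are assumptions, not the collect tool's actual code.

```python
import subprocess
from typing import List


def append_command_output(cmd: List[str], path: str) -> int:
    """Run cmd and append a labelled copy of its output to path.

    Hypothetical helper for illustration; not the collect tool's code.
    Returns the command's exit status (127 if it cannot be found).
    """
    separator = "-" * 60
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        output = result.stdout + result.stderr
        rc = result.returncode
    except FileNotFoundError:
        # e.g. the cpupower utility is not installed on this host
        output = f"{cmd[0]}: command not found\n"
        rc = 127
    if output and not output.endswith("\n"):
        output += "\n"  # keep the next section's separator on its own line
    # Append (never truncate) so earlier sections of the file are preserved.
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{separator}\n{' '.join(cmd)}\n{separator}\n")
        f.write(output)
    return rc
```

For example, append_command_output(["cpupower", "monitor"], "/var/extra/host.info") would add the statistics described in this report; as the sudo in the sample session suggests, cpupower monitor is typically run with root privileges. Using check=False means a failing command still leaves its error text in the file instead of aborting the rest of the collection.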

Severity
--------
Minor

Steps to Reproduce
------------------
NA

Expected Behavior
------------------
Executing 'collect all' runs 'cpupower monitor', and its CPU statistics are saved to /var/extra/host.info.

Actual Behavior
----------------
NA

Reproducibility
---------------
NA

System Configuration
--------------------
All

Branch/Pull Time/Commit
-----------------------
NA

Last Pass
---------
New enhancement

Timestamp/Logs
--------------
Example from an AIO-SX system:
model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

sysadmin@controller-0:~$ sudo cpupower monitor
Password:
              | Nehalem || SandyBridge || Mperf || Idle_Stats
 PKG|CORE| CPU| C3 | C6 | PC3 | PC6 || C7 | PC2 | PC7 || C0 | Cx | Freq || POLL | C1
   0| 0| 0| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 78.07| 21.93| 2293|| 0.06| 21.99
   0| 1| 1| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 58.60| 41.40| 2294|| 0.06| 41.54
   0| 2| 2| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 1.27| 98.73| 2295|| 0.00| 98.74
   0| 3| 3| 0.00| 0.00| 0.00| 0.00|| 0.00| 0.00| 0.00|| 0.55| 99.45| 2297|| 0.00| 99.46

Idle_Stats shows the statistics of the cpuidle kernel subsystem. The kernel updates these values every time an idle state is entered or left. Therefore, there can be some inaccuracy when a core is already in an idle state for some time when the measurement starts, or is still in one when it ends.
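To scan a collected host.info for unexpected deep C-state residency, the table can be parsed mechanically. The sketch below assumes the column layout of the sample above ('|' between columns, '||' between monitors, a header row starting with PKG|CORE|CPU); the function names are hypothetical and the parsing rules are inferred from this one sample, not from a cpupower output specification.

```python
from typing import Dict, List


def parse_cpupower_monitor(text: str) -> List[Dict[str, float]]:
    """Parse 'cpupower monitor' table output into one dict per CPU row.

    Lines before the PKG|CORE|CPU header (prompt, monitor names) are
    skipped. If two monitors report a column with the same name, the
    later one wins in this sketch.
    """
    header: List[str] = []
    rows: List[Dict[str, float]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Treat the '||' monitor boundary like an ordinary column separator.
        cells = [c.strip() for c in line.replace("||", "|").split("|")]
        if not header:
            if cells[:3] == ["PKG", "CORE", "CPU"]:
                header = cells
            continue
        if len(cells) != len(header):
            continue  # not a data row
        try:
            rows.append({name: float(v) for name, v in zip(header, cells)})
        except ValueError:
            continue  # skip rows containing non-numeric cells
    return rows


def cpus_in_state(rows: List[Dict[str, float]], state: str = "C6",
                  threshold: float = 0.0) -> List[int]:
    """Return the CPU ids whose residency (%) in `state` exceeds threshold."""
    return [int(r["CPU"]) for r in rows if r.get(state, 0.0) > threshold]
```

Fed the AIO-SX sample above, every CPU reports 0.00 in the C6 column, so cpus_in_state returns an empty list; a host with C6 enabled and in use would show up as a non-empty list.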

Test Activity
-------------
System Engineering

Workaround
----------
NA

Andrew Tan (atan2)
Changed in starlingx:
assignee: nobody → Andrew Tan (atan2)
Andrew Tan (atan2)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to utilities (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/utilities/+/864584

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/864584
Committed: https://opendev.org/starlingx/utilities/commit/072ec781e82ff84cb89146eae8609272573a67d7
Submitter: "Zuul (22348)"
Branch: master

commit 072ec781e82ff84cb89146eae8609272573a67d7
Author: Andrew Tan <email address hidden>
Date: Wed Nov 9 17:39:07 2022 +0000

    Added CPU C-state to collect output

    Added 'cpupower monitor' to collect_host.
    Output is saved to /var/extra/host.info.
    Certain zt-proteus servers in our labs had the C6 state enabled,
    which led to poor performance and timeouts during the bootstrap
    process. The added information will help troubleshooting efforts.

    Test Plan:
    PASS: Installed and bootstrapped AIO-SX, ran 'collect all'.
    PASS: Verified /var/extra/host.info file is updated on controller nodes.
    PASS: Verified 'cpupower monitor' command on zt-proteus servers.

    Closes-Bug: 1996238
    Signed-off-by: Andrew Tan <email address hidden>
    Change-Id: I46970ee0ed032b508ced820feb5df142bd954d60

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.8.0 stx.tools