collectd: monitoring incorrect CPU list after AIO-DX install
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Bin Qian |
Bug Description
Brief Description
-----------------
After an AIO-DX system is installed, collectd on controller-1 monitors the wrong CPU list. As a result, CPU usage alarms for controller-1 are not raised when they should be.
Severity
--------
Major: the user will not see CPU usage alarms for controller-1
Steps to Reproduce
------------------
Install an AIO-DX system. Cause high CPU usage on controller-1.
Expected Behavior
------------------
collectd on controller-1 should monitor only the platform CPUs.
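To illustrate why monitoring the wrong CPU set can suppress alarms, here is a minimal sketch. It assumes (this is not confirmed by the report) that the CPU list is written in the Linux cpulist format (for example `0-1,24-25`) and that the monitor averages per-CPU utilisation over the CPUs it watches. The helper names `parse_cpulist` and `average_usage` are hypothetical, as are the example numbers.

```python
from typing import Dict, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a Linux-style CPU list such as "0-1,24-25" into a set of CPU ids.

    Assumption: the reserved CPU list uses this cpulist syntax.
    """
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or a trailing comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def average_usage(per_cpu: Dict[int, float], cpus: Set[int]) -> float:
    """Average utilisation (percent) over the selected CPUs.

    Assumption: the monitor reports the mean over the monitored CPU set.
    """
    if not cpus:
        raise ValueError("no CPUs selected")
    return sum(per_cpu[c] for c in cpus) / len(cpus)
```

With a hypothetical 24-CPU host where platform CPUs `0-1` run at 95% and the other 22 CPUs run at 10%, the platform-only average is 95%, but the all-CPU average is (2 × 95 + 22 × 10) / 24 ≈ 17.1%. An alarm threshold of, say, 90% is crossed in the first case and never reached in the second.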
Actual Behavior
----------------
collectd on controller-1 is monitoring all CPUs:
2019-07-
2019-07-
2019-07-
I suspect this might be because the /etc/platform/
2019-07-
But it looks like the file was updated after that:
[root@controller-1 ~(keystone_admin)]# stat /etc/platform/
File: ‘/
Size: 3229 Blocks: 8 IO Block: 4096 regular file
Device: 823h/2083d Inode: 796754 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-07-21 19:17:28.806585159 +0000
Modify: 2019-07-19 19:16:15.893321091 +0000
Change: 2019-07-19 19:16:15.895321091 +0000
Birth: -
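The diagnosis above compares the reserved file's modification time with the moment collectd read it. The following minimal sketch performs the same check automatically. It is Linux-only, because it reads `/proc`. It approximates "when collectd read the file" by the collectd process start time, which is an assumption. The function names are hypothetical. The path is passed in as a parameter because it is truncated in this report, and the collectd PID must be supplied by the caller.

```python
import os


def process_start_time(pid: int) -> float:
    """Approximate start time of a process as a Unix timestamp (Linux /proc only)."""
    # btime in /proc/stat is the system boot time in seconds since the epoch.
    with open("/proc/stat") as f:
        btime = next(int(line.split()[1]) for line in f if line.startswith("btime "))
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name (field 2) may contain spaces, so split after its closing ')'.
    # fields[0] is then field 3 (state), which makes field 22 (starttime) index 19.
    fields = stat[stat.rindex(")") + 2:].split()
    start_ticks = int(fields[19])  # clock ticks since boot
    return btime + start_ticks / os.sysconf("SC_CLK_TCK")


def modified_after(path: str, started_at: float) -> bool:
    """True if `path` was last modified after `started_at` (Unix timestamp)."""
    return os.stat(path).st_mtime > started_at
```

If `modified_after(reserved_file, process_start_time(collectd_pid))` returns `True`, collectd is likely still using the CPU list it read at startup, not the current one.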
This may be related to recent changes that update the list of platform CPUs on the fly.
Eric MacDonald (collectd SME) indicated two options to fix the problem:
1. Force a restart of collectd when the reserved file changes
2. Have collectd re-read the reserved file on every monitoring interval
Option 1 is preferred, because option 2 might require further enhancements to the collectd plugin to handle on-the-fly core allocation changes and contention over a file that may change while it is being read.
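Option 1 can be sketched as a small polling watcher that triggers a restart whenever the reserved file's modification time changes. This is not the fix that was actually released. The function name, the polling interval, and the `restart` callable are all hypothetical. The restart action is injected because this report does not say how collectd is restarted on the target. The `sleep` and `max_checks` parameters exist only to make the loop testable.

```python
import os
import time
from typing import Callable, Optional


def watch_and_restart(
    path: str,
    restart: Callable[[], None],
    interval: float = 30.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `path` and call `restart` whenever its modification time changes.

    Runs forever when max_checks is None; otherwise returns the number of
    restarts triggered after max_checks polls.
    """
    last_mtime = os.stat(path).st_mtime_ns
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)
        checks += 1
        mtime = os.stat(path).st_mtime_ns
        if mtime != last_mtime:
            # The reserved CPU list changed: restart so collectd re-reads it at startup.
            last_mtime = mtime
            restart()
            restarts += 1
    return restarts
```

Polling keeps the sketch portable. On Linux, an event-driven mechanism such as inotify, or a systemd path unit with `PathChanged=`, could replace the loop if the platform manages collectd that way.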
Reproducibility
---------------
Unsure
System Configuration
--------------------
AIO-DX (two node system)
Branch/Pull Time/Commit
-----------------------
Designer built load:
BUILD_DATE=
Last Pass
---------
Unsure
Timestamp/Logs
--------------
Collect logs will be attached
Test Activity
-------------
Developer testing
Marking as stx.2.0: CPU alarms are not monitored as expected on controller-1.