After DOR test on AIO-DX, alarm for CPU level above 90% was not cleared for more than 5 mins
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Bin Qian |
Bug Description
After the DOR test, the CPU level on both controllers rises above 90%. On the standby controller (controller-0), the CPU-level alarm did not clear for more than 5 minutes. This was observed on an AIO-duplex system.
: Executing command...
[2018-10-09 21:10:29,207] 262 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2018-10-09 21:10:30,628] 382 DEBUG MainThread ssh.expect :: Output:
+------
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 84bc0544-
| 6d631cbd-
+------
controller-1:~$
[2018-10-09 21:14:24,098] 419 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2018-10-09 21:14:24,098] 262 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
[2018-10-09 21:14:25,684] 382 DEBUG MainThread ssh.expect :: Output:
+------
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 6d631cbd-
+------
controller-1:~$
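The two alarm listings above can be compared mechanically once the table is parsed. The following is a minimal sketch, assuming the full (untruncated) command output is available as a string and uses the `|`-delimited layout with the header row shown in the log. The function name `parse_alarm_table` is hypothetical and is not part of the test framework.

```python
from typing import Dict, List


def parse_alarm_table(output: str) -> List[Dict[str, str]]:
    """Parse a '|'-delimited ASCII table into one dict per data row.

    Assumption: the first '|' line is the header, e.g.
    | UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
    Border lines ('+----'), shell prompts, and rows whose cell count does
    not match the header (such as truncated rows) are skipped.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in output.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # border, prompt, log prefix, or truncated row
        # Drop the empty strings produced by the leading and trailing '|'.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if not header:
            header = cells
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows
```

Parsing both listings and comparing their `UUID` columns would show which alarm was cleared between 21:10 and 21:14 and which one remained.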
Severity
--------
Major
Steps to Reproduce
------------------
1. Launch VMs
2. Power off all the hosts
3. Power on all the hosts
4. Wait for all the hosts to become enabled and active
5. Verify the alarms. CPU usage on both controllers was 90-95%
6. The CPU-level alarm on controller-0 was not cleared for more than 5 minutes
Expected Behavior
------------------
The alarm clears within 5 minutes.
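The expected behavior can be expressed as a polling check with a 5-minute deadline. The following is a minimal sketch, not the test framework's actual implementation: `wait_for_alarm_clear`, `fetch_alarms`, and `is_target` are hypothetical names, and the 10-second polling interval is an assumption. The caller supplies `fetch_alarms` (for example, a wrapper that runs the alarm listing command over SSH and parses the output) and `is_target` (a predicate that identifies the CPU alarm on controller-0).

```python
import time
from typing import Callable, Iterable, Mapping

Alarm = Mapping[str, str]


def wait_for_alarm_clear(
    fetch_alarms: Callable[[], Iterable[Alarm]],
    is_target: Callable[[Alarm], bool],
    timeout_s: float = 300.0,   # 5 minutes, per the expected behavior
    interval_s: float = 10.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if no matching alarm is active before the deadline.

    The alarm list is fetched at least once, and one final time when the
    deadline has been reached, so a clear at the last moment still counts.
    """
    deadline = clock() + timeout_s
    while True:
        if not any(is_target(alarm) for alarm in fetch_alarms()):
            return True   # alarm cleared in time
        if clock() >= deadline:
            return False  # still active after the timeout
        sleep(interval_s)
```

The `clock` and `sleep` parameters are injectable so that the check can be exercised with a fake clock instead of waiting for real time to pass.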
Actual Behavior
----------------
As per description
Reproducibility
---------------
50% reproducible. In the failing runs, the alarm does not clear within 5 minutes.
System Configuration
-------
Duplex system
Branch/Pull Time/Commit
-------
2018-10-08_01-52-01
Timestamp/Logs
--------------
2018-10-09 21:10:29,207
summary:
- STX: After DOR test on AIO-DX alarm for CPU level above 90% was not cleared for more than 5 mins
+ After DOR test on AIO-DX, alarm for CPU level above 90% was not cleared for more than 5 mins
Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
Changed in starlingx:
status: Triaged → In Progress
tags: added: stx.1.0 removed: stx.2018.10
Suspecting this is a result of an SM CPU hog introduced in the last few weeks. Gating for stx.2018.10