Platform CPU threshold exceeded on a compute after lock/unlock of a different compute host (Storage System)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Invalid | High | ChenjieXu | |
Bug Description
Brief Description
-----------------
Platform CPU threshold exceeded 95% during compute node lock & unlock in a storage system. This is reproduced multiple times in daily sanity.
Severity
--------
Provide the severity of the defect.
Critical: The compute remains in the degraded state after unlock
Steps to Reproduce
------------------
1. system host-lock compute
2. system host-unlock compute
This alarm is reproduced only when a lock/unlock of the controller is executed prior to this test case.
PASS test_horizon_
PASS test_lock_
PASS test_lock_
FAIL test_lock_
Expected Behavior
------------------
The compute is expected to be unlocked and in the available state, with no CPU threshold alarm raised.
Actual Behavior
----------------
Platform CPU threshold exceeded; threshold 95.00%, actual 99.99%
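When triaging sanity logs for this alarm, it can help to pull the threshold and actual values out of the reason text automatically. The following is only a minimal sketch in Python, assuming the reason text always has the shape quoted in this report; the names `ALARM_RE`, `parse_cpu_alarm` and `is_over_threshold` are hypothetical helpers and not part of any StarlingX tool.

```python
import re

# Pattern for the alarm reason text quoted in this report (assumed format).
ALARM_RE = re.compile(
    r"Platform CPU threshold exceeded\s*;\s*"
    r"threshold (?P<threshold>\d+(?:\.\d+)?)%, "
    r"actual (?P<actual>\d+(?:\.\d+)?)%"
)


def parse_cpu_alarm(reason):
    """Return (threshold, actual) as floats, or None if the text does not match."""
    match = ALARM_RE.search(reason)
    if match is None:
        return None
    return float(match.group("threshold")), float(match.group("actual"))


def is_over_threshold(reason):
    """True when the text parses and the actual value is above the threshold."""
    parsed = parse_cpu_alarm(reason)
    return parsed is not None and parsed[1] > parsed[0]


if __name__ == "__main__":
    line = "Platform CPU threshold exceeded; threshold 95.00%, actual 99.99%"
    print(parse_cpu_alarm(line))
    print(is_over_threshold(line))
```

The pattern uses `search` rather than `match`, so it tolerates leading table characters or timestamps in front of the reason text.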
Reproducibility
---------------
CPU threshold issue is reproduced multiple times in daily sanity. Also seen in load 201907290421.
System Configuration
-------
Storage System
Branch/Pull Time/Commit
-------
20190804T233000Z
Last Pass
---------
20190728T013000Z
Timestamp/Logs
--------------
=======
[2019-08-05 08:30:30,888] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
=======
[2019-08-05 08:31:15,904] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
Test Activity
-------------
Sanity
description: updated
tags: added: stx.retestneeded
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Lin Shuicheng (shuicheng)
Changed in starlingx:
importance: Medium → High
Changed in starlingx:
assignee: Lin Shuicheng (shuicheng) → ChenjieXu (midone)
tags: added: stx.networking removed: stx.storage
tags: removed: stx.retestneeded
Some characteristics of this issue:
- It has been seen 5/5 times since the 20190728T233000Z load.
- The host that was locked/unlocked was compute-0, but the alarm was raised against compute-1.
- The alarm seems to be stale: no VM was hosted on compute-1, yet the alarm stayed uncleared.
- The alarm on compute-1 was eventually cleared after another lock/unlock of compute-0.
The title has been updated based on the above observations.