Failure of rabbit results in long delay of recovery
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Bin Qian | | |
Bug Description
Brief Description
-----------------
When rabbit fails to disable, it takes a long time before SM initiates a reboot to recover the system.
Severity
--------
Provide the severity of the defect.
Major: the system could be out of service for a long time
Steps to Reproduce
------------------
The root cause is that the system runs slowly, with significant scheduling delay. Reproducing the issue requires reproducing that condition, in which case the rabbit disable action times out repeatedly.
Expected Behavior
------------------
As a last resort, SM should reboot the impacted controller to recover the system within a reasonably short period of time.
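A minimal sketch of such a last-resort escalation is shown here for illustration only. It is not SM's actual code: the class name `EscalationPolicy`, the action strings, and both thresholds are hypothetical values chosen for the example. The idea is to cap both the number of consecutive disable timeouts and the total time spent retrying, so that a slow system cannot stretch recovery indefinitely.

```python
from dataclasses import dataclass


@dataclass
class EscalationPolicy:
    """Hypothetical policy: stop retrying a failing disable action and reboot."""

    max_timeouts: int = 3               # assumed cap on consecutive disable timeouts
    max_disable_time_s: float = 300.0   # assumed overall time budget, in seconds

    def next_action(self, consecutive_timeouts: int, elapsed_s: float) -> str:
        # Escalate as soon as either budget is exhausted; checking elapsed time
        # as well as the count bounds recovery even when each attempt is slow.
        if consecutive_timeouts >= self.max_timeouts:
            return "reboot"
        if elapsed_s >= self.max_disable_time_s:
            return "reboot"
        return "retry-disable"


policy = EscalationPolicy()
print(policy.next_action(consecutive_timeouts=1, elapsed_s=60.0))
print(policy.next_action(consecutive_timeouts=3, elapsed_s=200.0))
```

With these assumed limits, the first call is still within both budgets and the second has exhausted the timeout count, so the worst-case delay before a reboot is bounded by the time budget rather than by how long each timed-out attempt happens to take.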
Actual Behavior
----------------
It takes a very long time (40+ minutes) before SM reboots the impacted controller.
Reproducibility
---------------
This issue is always reproducible when the system runs very slowly with significant scheduling delay.
System Configuration
-------
DX
tags: | added: stx.ha |
Changed in starlingx: | |
assignee: | nobody → Bin Qian (bqian20) |
importance: | Undecided → Medium |
tags: | added: stx.9.0 |
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ha/+/880343