In rare cases, when the system is running slowly with significant
scheduling delay, the rabbit disable action times out repeatedly. As a
last resort, sm reboots the impacted controller for recovery once the
failure count reaches MAX_TRANSITION_FAILURES. Because the rabbit service
disable timeout is set to 60 seconds, this results in a significant delay
before the recovery reboot.

This change lowers MAX_TRANSITION_FAILURES for the rabbit service from
16 to 5 to shorten recovery from a rabbit failure.
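To put a rough number on the delay, the sketch below multiplies the failure limit by the 60-second disable timeout. It assumes every failed disable attempt runs for the full timeout and ignores any wait between retries, so the results are approximations, not measured values. The function and constant names are illustrative and are not taken from the sm source.

```python
# Illustrative arithmetic only; names here are hypothetical, not sm identifiers.
DISABLE_TIMEOUT_SECS = 60  # rabbit disable timeout stated in the commit message


def worst_case_delay_secs(max_transition_failures, timeout_secs=DISABLE_TIMEOUT_SECS):
    """Approximate time spent in disable timeouts before escalation to reboot.

    Assumes each attempt consumes the whole timeout and that there is
    no extra wait between attempts.
    """
    return max_transition_failures * timeout_secs


old_delay = worst_case_delay_secs(16)  # limit before this change
new_delay = worst_case_delay_secs(5)   # limit after this change

print(f"old: {old_delay} s (~{old_delay // 60} min)")
print(f"new: {new_delay} s (~{new_delay // 60} min)")
```

Under these assumptions the time spent waiting on timeouts drops from about 16 minutes to about 5 minutes.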
TCs passed:
Install a DX system
Observed service group recovery escalate to reboot after 5 forced
rabbit disable failures.
Closes-bug: 2016168
Signed-off-by: Bin Qian <email address hidden>
Change-Id: I660a64f0e78b6564456eb26245b672d2549f9a3b
Reviewed: https://review.opendev.org/c/starlingx/ha/+/880343
Committed: https://opendev.org/starlingx/ha/commit/a85ffc695ed7a1f42f39ecfe0f76e54db958389a
Submitter: "Zuul (22348)"
Branch: master
commit a85ffc695ed7a1f42f39ecfe0f76e54db958389a
Author: Bin Qian <email address hidden>
Date: Thu Apr 13 16:44:05 2023 +0000
Shorten rabbit failure recovery delay
In rare cases, when the system is running slowly with significant
scheduling delay, the rabbit disable action times out repeatedly. As a
last resort, sm reboots the impacted controller for recovery once the
failure count reaches MAX_TRANSITION_FAILURES. Because the rabbit service
disable timeout is set to 60 seconds, this results in a significant delay
before the recovery reboot.

This change lowers MAX_TRANSITION_FAILURES for the rabbit service from
16 to 5 to shorten recovery from a rabbit failure.
TCs passed:
Install a DX system
Observed service group recovery escalate to reboot after 5 forced
rabbit disable failures.
Closes-bug: 2016168
Signed-off-by: Bin Qian <email address hidden>
Change-Id: I660a64f0e78b6564456eb26245b672d2549f9a3b