Debian: Host unlock sometimes fails with heartbeat loss
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Eric MacDonald | | |
Bug Description
Brief Description
-----------------
Unlocking a system node sometimes fails with a heartbeat loss.
Initial analysis shows that the slower shutdown sequence in Debian can
trick the offline handler into seeing the node as recovered when it
is in fact still shutting down.
This early recovery leads to the enable heartbeat soak test failure.
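The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual maintenance code: it assumes a handler that declares a node offline after `threshold` consecutive unanswered probes and treats the next answered probe as a recovery. The function names, the probe timeline, and the threshold values are all hypothetical.

```python
# Toy model only: NOT the real offline handler. All names and numbers
# here are hypothetical and chosen just to illustrate a false recovery.

def offline_declared_at(probes, threshold):
    """Return the index of the probe at which `threshold` consecutive
    misses have been seen, or None. True means the node answered."""
    misses = 0
    for i, answered in enumerate(probes):
        misses = 0 if answered else misses + 1
        if misses >= threshold:
            return i
    return None


def recovered_at(probes, threshold):
    """Return the index of the first answered probe after the node was
    declared offline, or None if offline or recovery never happens."""
    off = offline_declared_at(probes, threshold)
    if off is None:
        return None
    for i in range(off + 1, len(probes)):
        if probes[i]:
            return i
    return None


# Hypothetical slow shutdown: the node goes quiet briefly (indices 2-3),
# answers again while still shutting down (4-5), is really down (6-11),
# and only comes back after the reboot (12-13).
T, F = True, False
slow_shutdown = [T, T, F, F, T, T, F, F, F, F, F, F, T, T]
REAL_BOOT_INDEX = 12

early = recovered_at(slow_shutdown, threshold=2)
late = recovered_at(slow_shutdown, threshold=4)
print("threshold=2 sees recovery at probe", early)
print("threshold=4 sees recovery at probe", late)
```

In this model a short threshold reports recovery at probe 4, well before the real boot at probe 12, while a longer threshold rides through the brief gap and reports recovery only once the node has actually rebooted. Whether the real handler behaves this way is an assumption of the sketch.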
Severity
--------
Major: Unlock takes longer than it should due to the retries.
Steps to Reproduce
------------------
On an installed and provisioned Debian system ...
system host-lock <hostname>
sleep 60
system host-unlock <hostname>
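Because the failure is rare, the steps above may need to be repeated many times. A minimal sketch of a repeat harness follows. It issues only the `system host-lock` and `system host-unlock` commands listed above; `unlock_failed` is a hypothetical caller-supplied callback that waits for the unlock to complete and reports whether it failed, since this report does not say how the failure was detected. The function names and the `run`/`sleep` parameters are also assumptions, added so the harness can be exercised without a real system.

```python
import subprocess
import time


def lock_unlock_cycle(hostname, run=subprocess.run, sleep=time.sleep,
                      settle_secs=60):
    """Run one lock / wait / unlock sequence for `hostname`."""
    run(["system", "host-lock", hostname], check=True)
    sleep(settle_secs)  # matches the 'sleep 60' in the steps
    run(["system", "host-unlock", hostname], check=True)


def repeat_cycles(hostname, cycles, unlock_failed, **kwargs):
    """Repeat the cycle `cycles` times and count reported failures.

    `unlock_failed(hostname)` is a hypothetical callback: it should
    block until the unlock attempt has finished and return True if the
    host did not go enabled on the first reboot.
    """
    failures = 0
    for _ in range(cycles):
        lock_unlock_cycle(hostname, **kwargs)
        if unlock_failed(hostname):
            failures += 1
    return failures
```

With a reproducibility below 5%, a few dozen cycles would likely be needed before a failure is observed.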
Expected Behavior
------------------
Host goes enabled after a single unlock reboot.
Actual Behavior
----------------
Host unlock fails and is retried. This leads to 2 more reboots, and possibly, but rarely, more.
Reproducibility
---------------
less than 5% reproducible
System Configuration
-------
All-In-One IPv4 and IPv6, possibly Standard as well.
Branch/Pull Time/Commit
-------
2022-09-01_18-00-06
Last Pass
---------
CentOS
Timestamp/Logs
--------------
2022-09-
Test Activity
-------------
Automated Regression Testing
Workaround
----------
The system auto-recovers after the first failure.
Lock and unlock the node if it enters the Auto Recovery Disable state due to 2 back-to-back unlock enable failures.
Changed in starlingx: | |
assignee: | nobody → Eric MacDonald (rocksolidmtce) |
Changed in starlingx: | |
importance: | Undecided → Medium |
tags: | added: stx.8.0 stx.metal |
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/metal/+/862161