Replacing OSD hard disk on controller node fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Paul-Ionut Vaduva |
Bug Description
Brief Description
-----------------
Testing of the HDD replacement feature failed. After a simulated HDD failure and the subsequent replacement of the HDD used for the OSD, the controller node failed to unlock.
Severity
--------
Major
Steps to Reproduce
------------------
1. The controller-1 node was locked and shut down.
2. The HDD was removed and replaced with a new one.
3. The node was booted.
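The reproduction steps above can be written down as a dry-run checklist, which makes it easier to rerun the test consistently. This is a minimal sketch, not part of the original report: the `replacement_plan` helper is hypothetical, nothing is executed, and the `system host-disk-list` / `system host-stor-list` checks after boot are an assumed way to confirm that sysinv picked up the new disk before attempting the unlock.

```python
"""Dry-run checklist for the OSD disk replacement reproduction (hypothetical helper)."""

from typing import List, Tuple

# Each step is (kind, text): "cli" steps are StarlingX `system` commands,
# "manual" steps are physical actions performed by an operator.
Step = Tuple[str, str]


def replacement_plan(host: str = "controller-1") -> List[Step]:
    """Return the reproduction steps for `host` without executing anything."""
    return [
        ("cli", f"system host-lock {host}"),
        ("manual", f"shut down {host}"),
        ("manual", f"remove the OSD HDD from {host} and insert a new one"),
        ("manual", f"power on {host} and wait for it to boot"),
        # Assumed checks: after boot, sysinv should list the new disk
        # and show the OSD storage function mapped to it.
        ("cli", f"system host-disk-list {host}"),
        ("cli", f"system host-stor-list {host}"),
        ("cli", f"system host-unlock {host}"),
    ]


if __name__ == "__main__":
    # Print CLI steps as shell prompts and manual steps as comments.
    for kind, text in replacement_plan():
        prefix = "$" if kind == "cli" else "#"
        print(f"{prefix} {text}")
```

Keeping the plan as data rather than running commands directly means it can be reviewed or adapted to another host (for example `replacement_plan("controller-0")`) without touching a live system.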
Expected Behavior
------------------
After a reboot, sysinv is expected to update the node inventory and assign the OSD usage to the replacement disk.
Actual Behavior
----------------
When the node booted with the new disk, udevd began to segfault every minute and the system inventory was broken, preventing a correct update. Unlocking the node also failed, and the node entered a reboot loop.
First segfault message:
2019-10-
I was unable to retrieve core dumps because they were not present in /var/crash. Core dump handling for segfaulting systemd components appears to be complicated on CentOS-based systems.
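/var/crash is only one possible destination for core dumps: the kernel decides where a crashing process's core goes based on the `kernel.core_pattern` setting, and a pattern starting with `|` pipes the core to a user-space helper instead of writing a file. The sketch below is an illustrative check, not something from the original report; the `core_dump_report` helper is hypothetical, and it assumes a Linux host where `/proc/sys/kernel/core_pattern` is readable.

```python
"""Report where the kernel sends core dumps (a sketch, Linux only)."""

import os
from typing import List, Tuple


def core_dump_report(
    pattern_path: str = "/proc/sys/kernel/core_pattern",
    crash_dir: str = "/var/crash",
) -> Tuple[str, bool, List[str]]:
    """Return (core_pattern, piped_to_helper, files_in_crash_dir)."""
    with open(pattern_path, encoding="utf-8") as f:
        pattern = f.read().strip()
    # A leading "|" means cores are piped to a user-space helper
    # (for example abrt or systemd-coredump) rather than written as files.
    piped = pattern.startswith("|")
    # An absent crash directory simply yields an empty listing.
    files = sorted(os.listdir(crash_dir)) if os.path.isdir(crash_dir) else []
    return pattern, piped, files


if __name__ == "__main__":
    try:
        pattern, piped, files = core_dump_report()
    except OSError as exc:
        print(f"could not read core_pattern: {exc}")
    else:
        print(f"core_pattern: {pattern}")
        print("cores are piped to a helper" if piped else "cores are written as files")
        print(f"files in /var/crash: {files or 'none'}")
```

If the report shows a piped pattern, the missing files in /var/crash are expected, and the cores (if kept at all) would have to be retrieved through whichever helper the pattern names.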
System Configuration
-------
Multi-node system
Branch/Pull Time/Commit
-------
BUILD_ID=
Last Pass
---------
No
Timestamp/Logs
--------------
First boot after disk replacement occurred around 2019-10-
First segfault observed at 2019-10-30T11:42:21
Test Activity
-------------
Feature Testing
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.retestneeded
Changed in starlingx:
assignee: Ovidiu Poncea (ovidiu.poncea) → Paul-Ionut Vaduva (pvaduva)
tags: removed: stx.retestneeded
Marking as gating for stx.4.0. Users should be able to replace OSD disks.