Controller failing to mount /opt/platform
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Takamasa Takenaka | | |
Bug Description
Brief Description
-----------------
When controller-0 is removed from the system and controller-1 is rebooted,
controller-1 fails to become active: it is unable to mount /opt/platform.
Severity
--------
<Critical: System/Feature is not usable due to the defect>
Steps to Reproduce
------------------
0. Needs 2 controllers
1. Swact controller-0
system host-swact controller-0
2. Lock controller-0
system host-lock controller-0
3. Remove controller-0
system host-delete controller-0
4. Reboot controller-1
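The steps above can be collected into a small script. This is only a sketch: the three `system` commands are the ones listed in this report, while the `run` helper and the `DRY_RUN` variable are hypothetical additions so the sequence can be previewed without touching a live system. The report does not say whether to wait for each step to finish before running the next, so in practice some waiting between commands may be needed. Step 4 (rebooting controller-1) is left as a manual action.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-3 of the reproduction, in the order given in the report.
remove_controller_0() {
    run system host-swact controller-0
    run system host-lock controller-0
    run system host-delete controller-0
}
```

Calling `remove_controller_0` prints the three commands; setting `DRY_RUN=0` before the call executes them instead.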
Expected Behavior
------------------
controller-1 should start as normal.
Actual Behavior
----------------
controller-1 fails to become active.
We can log in to controller-1 using the controller-1 IP address (not the
OAM address), but the system is unusable (for example, system commands
fail with errors).
Reproducibility
---------------
<Reproducible:100%>
System Configuration
-------
<Two node system (any system which has two controllers)>
Branch/Pull Time/Commit
-------
master
Last Pass
---------
N/A
Timestamp/Logs
--------------
/var/log/daemon.log
2022-03-
Test Activity
-------------
[Evaluation]
Workaround
----------
No workaround is available
Changed in starlingx: | |
assignee: | nobody → Takamasa Takenaka (ttakenak) |
Changed in starlingx: | |
status: | New → In Progress |
Changed in starlingx: | |
importance: | Undecided → Medium |
tags: | added: stx.7.0 stx.config |
Reviewed: https://review.opendev.org/c/starlingx/config/+/831980
Committed: https://opendev.org/starlingx/config/commit/bedfb054faa3213351018c806a0a844d74ca4a3e
Submitter: "Zuul (22348)"
Branch: master
commit bedfb054faa3213351018c806a0a844d74ca4a3e
Author: Takamasa Takenaka <email address hidden>
Date: Fri Mar 4 13:51:27 2022 -0300
Add retrying mount manually on /opt/platform
There are two issues:
- When controller-1 is locked/unlocked with controller-0 removed,
controller-1 does not start properly. No drbd related disks
are mounted. The message "mount: can't find /opt/platform in
/etc/fstab" appears in /var/log/daemon.log
- The error message "Device is held open by someone" appears in
/var/log/daemon.log
The first issue occurred because there is no entry for /opt/platform
in /etc/fstab, as the message says.
The second issue occurred because "drbdadm primary drbd-platform"
and "drbdadm secondary drbd-platform" are called one after another.
(This happens when the platform mount fails.)
The fix is:
- Add a mount call with explicit parameters, used when the mount command fails
- Add a sleep before "drbdadm secondary drbd-platform" is called
Closes-bug: 1966110
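The two changes described above could look roughly like the following. This is a sketch under stated assumptions, not the code from the actual commit: the function names, the `MOUNT_CMD`/`DRBDADM_CMD`/`SLEEP_CMD` override variables, and any device, filesystem type, or delay passed in are hypothetical. Only `mount`, `sleep`, and `drbdadm secondary <resource>` are real commands.

```shell
#!/bin/sh
# Commands can be overridden for testing; defaults are the real tools.
MOUNT_CMD=${MOUNT_CMD:-mount}
DRBDADM_CMD=${DRBDADM_CMD:-drbdadm}
SLEEP_CMD=${SLEEP_CMD:-sleep}

# Try the fstab-based mount first. If it fails (for example because
# /etc/fstab has no entry for the mount point), retry with the device
# and filesystem type given explicitly.
mount_with_fallback() {
    mountpoint=$1
    device=$2
    fstype=$3
    if "$MOUNT_CMD" "$mountpoint" 2>/dev/null; then
        return 0
    fi
    "$MOUNT_CMD" -t "$fstype" "$device" "$mountpoint"
}

# Wait before demoting the DRBD resource, so that it is not demoted
# immediately after being promoted.
demote_after_delay() {
    resource=$1
    delay=$2
    "$SLEEP_CMD" "$delay"
    "$DRBDADM_CMD" secondary "$resource"
}
```

A call might look like `mount_with_fallback /opt/platform "$PLATFORM_DEV" "$PLATFORM_FS"` followed, on failure, by `demote_after_delay drbd-platform "$DELAY"`, where the three variables are placeholders for values this report does not specify.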
TEST PLAN:
PASS: Fresh install SX and DX
PASS: Path test for patch in SX
1. Lock/Unlock controller-0
2. Confirm controller-0 started properly and no error.
Additional message is not shown in /var/log/daemon.log
PASS: Path test for patch in DX
1. Delete controller-0
2. Reboot controller-1
3. Confirm controller-1 started properly and no error
message is observed in /var/log/daemon.log
Signed-off-by: Takamasa Takenaka <email address hidden>
Change-Id: I238b9c13c9b66ebbc33207f7dba70cd0a45eca93
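The log check in the test plan can be automated along these lines. This is a minimal sketch: the function name `daemon_log_is_clean` is hypothetical, and it only searches for the two error messages quoted in the commit message above. Any other errors in the log are not detected.

```shell
#!/bin/sh
# Return 0 when neither error message from this bug appears in the log,
# 1 when at least one is found, and 2 when the log cannot be read.
daemon_log_is_clean() {
    logfile=${1:-/var/log/daemon.log}
    [ -r "$logfile" ] || return 2
    # -F treats the patterns as fixed strings; -q suppresses output.
    if grep -F -q \
        -e "mount: can't find /opt/platform in /etc/fstab" \
        -e "Device is held open by someone" \
        "$logfile"; then
        return 1
    fi
    return 0
}
```

After rebooting controller-1, running `daemon_log_is_clean && echo clean` on the controller gives a quick pass/fail signal for the messages this fix targets.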