Controller failing to mount /opt/platform

Bug #1966110 reported by Takamasa Takenaka
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Takamasa Takenaka

Bug Description

Brief Description
-----------------
When controller-0 is removed from the system and controller-1 is rebooted,
controller-1 fails to become active because it is unable to mount /opt/platform.

Severity
--------
<Critical: System/Feature is not usable due to the defect>

Steps to Reproduce
------------------
0. Needs 2 controllers
1. Swact controller-0
   system host-swact controller-0
2. Lock controller-0
   system host-lock controller-0
3. Remove controller-0
   system host-delete controller-0
4. Reboot controller-1
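
Steps 1 to 3 can be scripted. The following is only a sketch: it assumes the `system` CLI is on the PATH with credentials already loaded, and it omits the waits a real run needs between steps (for example, until the swact has completed). The function names are illustrative and not part of StarlingX. Step 4 (rebooting controller-1) is left as a manual action.

```python
import subprocess


def repro_commands(host="controller-0"):
    """Return the CLI calls from steps 1-3, in the order they are listed."""
    return [
        ["system", "host-swact", host],
        ["system", "host-lock", host],
        ["system", "host-delete", host],
    ]


def run_repro(run=subprocess.run, host="controller-0"):
    """Run each step and stop at the first failure.

    `run` is injectable so the sequence can be dry-run without a live system.
    """
    for cmd in repro_commands(host):
        run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_repro(run=lambda cmd, check: print(" ".join(cmd)))
```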

Expected Behavior
------------------
controller-1 should start as normal.

Actual Behavior
----------------
controller-1 fails to become active.
We can log in to controller-1 using the controller-1 IP (not the OAM IP),
but the system is unusable (for example, system commands fail
with errors).

Reproducibility
---------------
<Reproducible:100%>

System Configuration
--------------------
<Two node system (any system which has two controllers)>

Branch/Pull Time/Commit
-----------------------
master

Last Pass
---------
N/A

Timestamp/Logs
--------------

/var/log/daemon.log
2022-03-22T14:53:42.714 controller-1 controller_config[9141]: info Configuring controller node...
2022-03-22T14:53:44.711 controller-1 controller_config[9141]: info mount: can't find /opt/platform in /etc/fstab
2022-03-22T14:53:44.738 controller-1 controller_config[9141]: info 2: State change failed: (-12) Device is held open by someone
2022-03-22T14:53:44.739 controller-1 controller_config[9141]: info Command 'drbdsetup-84 secondary 2' terminated with exit code 11
2022-03-22T14:53:44.779 controller-1 controller_config[9141]: info drbd-platform: State change failed: (-12) Device is held open by someone
2022-03-22T14:53:44.780 controller-1 controller_config[9141]: info additional info from kernel:
2022-03-22T14:53:44.787 controller-1 controller_config[9141]: info failed to demote
2022-03-22T14:53:44.791 controller-1 controller_config[9141]: info Command 'drbdsetup-84 down drbd-platform' terminated with exit code 11
2022-03-22T14:53:47.042 controller-1 controller_config[9141]: info *****************************************************
2022-03-22T14:53:47.043 controller-1 controller_config[9141]: info *****************************************************
2022-03-22T14:53:47.044 controller-1 controller_config[9141]: info Unable to mount /opt/platform
2022-03-22T14:53:47.045 controller-1 controller_config[9141]: info *****************************************************
2022-03-22T14:53:47.045 controller-1 controller_config[9141]: info *****************************************************
2022-03-22T14:53:47.046 controller-1 controller_config[9141]: info Pausing for 5 seconds...
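
To check whether another system hits the same failure, the relevant lines can be picked out of /var/log/daemon.log. The sketch below assumes the line layout seen in the excerpt above (timestamp, host, process[pid], level, message); the regular expression and the marker list are illustrative and not part of any StarlingX tool.

```python
import re

# Layout assumed from the excerpt: "<timestamp> <host> <proc>[<pid>]: <level> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<proc>[\w-]+)\[(?P<pid>\d+)\]: "
    r"(?P<level>\w+) (?P<msg>.*)$"
)

# Messages quoted in this report.
FAILURE_MARKERS = (
    "can't find /opt/platform in /etc/fstab",
    "Device is held open by someone",
    "Unable to mount /opt/platform",
)


def find_failures(lines):
    """Return (timestamp, message) pairs for lines that contain a known marker."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(marker in m["msg"] for marker in FAILURE_MARKERS):
            hits.append((m["ts"], m["msg"]))
    return hits


sample = [
    "2022-03-22T14:53:42.714 controller-1 controller_config[9141]: info Configuring controller node...",
    "2022-03-22T14:53:44.711 controller-1 controller_config[9141]: info mount: can't find /opt/platform in /etc/fstab",
    "2022-03-22T14:53:44.738 controller-1 controller_config[9141]: info 2: State change failed: (-12) Device is held open by someone",
    "2022-03-22T14:53:47.044 controller-1 controller_config[9141]: info Unable to mount /opt/platform",
]

for ts, msg in find_failures(sample):
    print(ts, msg)
```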

Test Activity
-------------
[Evaluation]

Workaround
----------
No workaround is available

Changed in starlingx:
assignee: nobody → Takamasa Takenaka (ttakenak)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/831980
Committed: https://opendev.org/starlingx/config/commit/bedfb054faa3213351018c806a0a844d74ca4a3e
Submitter: "Zuul (22348)"
Branch: master

commit bedfb054faa3213351018c806a0a844d74ca4a3e
Author: Takamasa Takenaka <email address hidden>
Date: Fri Mar 4 13:51:27 2022 -0300

    Add retrying mount manually on /opt/platform

    There are two issues:
    - When controller-1 is locked/unlocked with controller-0 removed,
      controller-1 does not start properly. No drbd related disks
      are mounted. The message "mount: can't find /opt/platform in
      /etc/fstab" appears in /var/log/daemon.log
    - The error message "Device is held open by someone" appears in
      /var/log/daemon.log
    The first issue occurred because there is no entry for
    /opt/platform in /etc/fstab, as the message says.
    The second issue occurred because "drbdadm primary drbd-platform"
    and "drbdadm secondary drbd-platform" are called one right after
    the other. (This happens when the platform mount fails)

    This fix is:
    - Add a mount call with explicit parameters and use it when the
      plain mount command fails
    - Add a sleep before "drbdadm secondary drbd-platform" is called

    Closes-bug: 1966110

    TEST PLAN:
    PASS: Fresh install SX and DX
    PASS: Path test for patch in SX
          1. Lock/Unlock controller-0
          2. Confirm controller-0 started properly and no error
             Additional message is not shown in /var/log/daemon.log
    PASS: Path test for patch in DX
          1. Delete controller-0
          2. Reboot controller-1
          3. Confirm controller-1 started properly and no error
             message is observed in /var/log/daemon.log

    Signed-off-by: Takamasa Takenaka <email address hidden>
    Change-Id: I238b9c13c9b66ebbc33207f7dba70cd0a45eca93
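
The two changes described in the commit message can be illustrated as follows. This is a sketch of the idea only, not the merged change itself. The device path, filesystem type, and delay are placeholders chosen for illustration (the log's "drbdsetup-84 secondary 2" suggests DRBD minor 2, but that is an inference), and the helper names are hypothetical.

```python
import subprocess
import time


def mount_with_fallback(run, mount_point, device, fstype):
    """Try the fstab-based mount first; if it fails, mount with explicit parameters."""
    if run(["mount", mount_point]) == 0:
        return True
    # No usable fstab entry: name the filesystem type and device explicitly.
    return run(["mount", "-t", fstype, device, mount_point]) == 0


def demote_after_delay(run, resource, delay, sleep=time.sleep):
    """Wait before demoting so the device is less likely to still be held open."""
    sleep(delay)
    return run(["drbdadm", "secondary", resource]) == 0


def shell(cmd):
    """Run a command for real and return its exit code."""
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Placeholder values (assumptions): adjust to the actual DRBD device and filesystem.
    if not mount_with_fallback(shell, "/opt/platform", "/dev/drbd2", "ext4"):
        demote_after_delay(shell, "drbd-platform", delay=2)
```

Passing the command runner in as `run` keeps the logic testable without touching a real DRBD device.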

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.config