controller-1 show offline during system initial on multi-node config

Bug #1889427 reported by Peng Peng
Affects    Status        Importance  Assigned to    Milestone
StarlingX  Fix Released  High        Ovidiu Poncea

Bug Description

Brief Description
-----------------
During initial system installation, controller-1 does not come online after it boots.
The controller-1 console log shows that, after the install completes, it is stuck at
"Performing post-installation setup tasks".

Severity
--------
Critical

Steps to Reproduce
------------------
Perform the initial system installation.

TC-name: system install

Expected Behavior
------------------
controller-1 shows online after boot-up

Actual Behavior
----------------
controller-1 shows offline

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Multi-node system
Lab-name: WCP_71-75, WP_8-12

The issue does not occur on AIO systems.

Branch/Pull Time/Commit
-----------------------
2020-07-29_00-00-00

Last Pass
---------
2020-07-28_00-00-00

Timestamp/Logs
--------------

console log shows:
Installing qemu-kvm-tools-ev (1180/1181)
Installing python-vswitchclient-bash-completion (1181/1181)
Performing post-installation setup tasks
Installing boot loader
.
Performing post-installation setup tasks
.

Configuring installed system
.
Writing network configuration
.
Creating users
.
Configuring addons
.
Generating initramfs
.
Running post-installation scripts
Mirroring software repository (may take several minutes)...
Done

Test Activity
-------------
Sanity

Peng Peng (ppeng)
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

stx.5.0 / high priority - issue introduced in stx master by recent code changes: https://review.opendev.org/#/c/743246/

tags: added: stx.5.0 stx.config
Changed in starlingx:
status: New → Triaged
summary: - controller-1 show offline during system initial
+ controller-1 show offline during system initial on standard config
description: updated
summary: - controller-1 show offline during system initial on standard config
+ controller-1 show offline during system initial on multi-node config
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/743948

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/743948
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=7a0a2dac1a566d4a1efd0ff875f77c34a553522a
Submitter: Zuul
Branch: master

commit 7a0a2dac1a566d4a1efd0ff875f77c34a553522a
Author: Ovidiu Poncea <email address hidden>
Date: Thu Jul 30 13:25:41 2020 +0300

    Fix issues with controller node Anaconda hang

    On some deployments install fails as we keep one FD open
    during install. This leads to hangs when Anaconda
    'post' stage returns.

    On other deployments install fails as udev sometimes creates
    multiple links to the same devices in /dev/disk/by-path.
    We iterate through this list and, because they are not unique,
    we try to run flocks multiple times for the same device.
    Locking a device multiple times doesn't work, the second
    flock waits for first lock to release.

    This commit:
     o removes 'exec {stdout}>&1' from ks-functions.sh so it no
       longer opens FDs in 'post' stage. For the pre stage we open
       it only when needed;
     o makes sure that list of storage devices is unique;
     o increases timeout of udevadm settle from its default of 180s
       to 300s, the value used throughout Anaconda. This helps
       with slower hardware.

    Closes-Bug: 1889427
    Change-Id: I348f10d96a78ea2c1c25fe6cf48462b0bc31fb84
    Signed-off-by: Ovidiu Poncea <email address hidden>
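
For illustration only, the duplicate-link problem described in the commit message can be sketched in shell. This is not the code from ks-functions.sh: the function names `unique_devices` and `second_lock_conflicts` are hypothetical, and the sketch assumes `readlink -f` and the util-linux `flock` command are available. The first function resolves each link to its target and keeps each target once; the second shows that a lock taken through one file descriptor blocks a lock on the same file through another descriptor.

```shell
#!/bin/sh
# Hypothetical helper: resolve each given link (for example, entries under
# /dev/disk/by-path) to its real target and print every distinct target once.
unique_devices() (
    for link in "$@"; do
        readlink -f "$link"
    done | sort -u
)

# Hypothetical demo: open the same file twice and try to lock it twice.
# -n makes flock fail immediately instead of waiting, so the conflict is
# reported rather than causing a hang.
# Returns 0 when the second lock is refused while the first is held.
second_lock_conflicts() (
    target=$1
    exec 8<"$target" 9<"$target"
    flock -n 8 || exit 2    # the first lock is expected to succeed
    ! flock -n 9            # the second lock on the same file is refused
)
```

A caller could then loop over `$(unique_devices /dev/disk/by-path/*)` so that each device is locked only once, which is the behavior the commit message describes as making the list of storage devices unique.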

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

We have not seen this issue recently.

tags: removed: stx.retestneeded