AIO-SX install failing in virtual box (non-kubernetes)

Bug #1816764 reported by Bart Wensley
This bug affects 1 person
Affects      Status         Importance   Assigned to   Milestone
StarlingX    Fix Released   High         Bin Qian

Bug Description

Title
-----
AIO-SX install failing in virtual box (non-kubernetes)

Brief Description
-----------------
My AIO-SX install on VirtualBox is failing. This is with a load built from master on February 19th. It is a regular (non-Kubernetes) installation. None of the SM-controlled services are started after the host is unlocked.

The first sign of trouble is here:
2019-02-20T01:36:25.000 controller-0 sm: debug time[487.042] log<15> ERROR: sm[31727]: sm_hw.c(238): Failed to find thread information.
2019-02-20T01:36:25.000 controller-0 sm: debug time[487.042] log<16> ERROR: sm[31727]: sm_service_domain_interface_unknown_state.c(75): Failed to audit hardware state of interface (lo), error=FAILED
2019-02-20T01:36:25.000 controller-0 sm: debug time[487.042] log<17> ERROR: sm[31727]: sm_service_domain_interface_fsm.c(421): Service domain (controller) interface (management-interface) unable to handle event (not-in-use) in state (unknown), error=FAILED.
2019-02-20T01:36:25.000 controller-0 sm: debug time[487.042] log<18> ERROR: sm[31727]: sm_service_domain_interface_api.c(186): Event (not-in-use) not handled for service domain (controller) interface (management-interface), error=FAILED.

These are followed by log entries like the following, which repeat indefinitely:

2019-02-20T01:36:27.000 controller-0 sm: debug time[488.879] log<3> INFO: sm_service_hb[31768]: sm_timer.c(288): Not scheduling on time, elapsed=499 ms.
2019-02-20T01:36:27.000 controller-0 sm: debug time[488.879] log<47> INFO: sm[31727]: sm_timer.c(288): Not scheduling on time, elapsed=408 ms.
2019-02-20T01:36:31.000 controller-0 sm: debug time[493.000] log<48> INFO: sm[31727]: sm_timer.c(300): Now scheduling on time.
2019-02-20T01:36:32.000 controller-0 sm: debug time[493.460] log<49> INFO: sm[31727]: sm_timer.c(288): Not scheduling on time, elapsed=460 ms.
2019-02-20T01:36:32.000 controller-0 sm: debug time[493.461] log<50> INFO: sm[31727]: sm_service_domain_waiting_state.c(60): Not scheduling on time in the last 5010 ms, waiting another 5010 ms for service domain (controller).
2019-02-20T01:36:34.000 controller-0 sm: debug time[495.875] log<51> INFO: sm[31727]: sm_timer.c(300): Now scheduling on time.
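To gauge how far behind the sm timers are running, the "Not scheduling on time, elapsed=N ms" entries can be summarized per process. This is a rough sketch that assumes the message wording shown in the excerpt above; late_timer_stats is a hypothetical helper name.

```python
import re

# Matches only the sm_timer.c "Not scheduling on time, elapsed=N ms" entries;
# the wording is taken from the excerpt in this report.
LATE_RE = re.compile(
    r"(?P<proc>[\w-]+)\[\d+\]: sm_timer\.c\(\d+\): "
    r"Not scheduling on time, elapsed=(?P<ms>\d+) ms"
)


def late_timer_stats(lines):
    """Return {process: {"count": n, "max_ms": m}} for late-timer log entries."""
    stats = {}
    for line in lines:
        match = LATE_RE.search(line)
        if not match:
            continue
        elapsed_ms = int(match.group("ms"))
        entry = stats.setdefault(match.group("proc"), {"count": 0, "max_ms": 0})
        entry["count"] += 1
        entry["max_ms"] = max(entry["max_ms"], elapsed_ms)
    return stats
```

Entries from other source files, such as the sm_service_domain_waiting_state.c line above, are deliberately ignored so that only the timer delays themselves are counted.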

Severity
--------
Critical - although I'm not sure if this issue is virtual box specific

Steps to Reproduce
------------------
Attempt to install an AIO-SX system in virtual box.

Expected Behavior
------------------
Installation succeeds.

Actual Behavior
----------------
After running config_controller and then unlocking the controller (once the appropriate configuration is done), the SM-controlled services do not start.

Reproducibility
---------------
Reproducible (at least in my vbox environment)

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
Load built from master branch from pull on February 19 (morning).

Timestamp/Logs
--------------
See above

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; issue introduced by recent commit

Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
importance: Undecided → High
status: New → Triaged
tags: added: stx.2019.05 stx.ha
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-ha (master)

Fix proposed to branch: master
Review: https://review.openstack.org/638664

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/638665

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-ha (master)

Fix proposed to branch: master
Review: https://review.openstack.org/638669

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-ha (master)

Reviewed: https://review.openstack.org/638669
Committed: https://git.openstack.org/cgit/openstack/stx-ha/commit/?id=f86e8160dde50df864094d917dc9900c53aebf9a
Submitter: Zuul
Branch: master

commit f86e8160dde50df864094d917dc9900c53aebf9a
Author: Bin Qian <email address hidden>
Date: Fri Feb 22 09:15:49 2019 -0500

    Initialize sm_hw earlier

    The sm_hw is initialized too late to cause a few error log messages:

    Failed to find thread information.
    Failed to audit hardware state of interface (lo), error=FAILED

    Change-Id: Ie7f813ff9a7900785e6d2af0ad5a75edc0cbf7c0
    Partial-Bug: 1816764
    Signed-off-by: Bin Qian <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-ha (master)

Reviewed: https://review.openstack.org/638664
Committed: https://git.openstack.org/cgit/openstack/stx-ha/commit/?id=720232befe69f9e1524a2507e51ed07855d571b3
Submitter: Zuul
Branch: master

commit 720232befe69f9e1524a2507e51ed07855d571b3
Author: Bin Qian <email address hidden>
Date: Thu Feb 21 14:43:32 2019 -0500

    Enable configurable sm process priority through sm-configure

    In some cases sm will need to adjust its process priority. This
    change enables the configuring sm priority as part of sm at runtime

    Partial-Bug: 1816764
    Change-Id: I860759621c0d1389ca5a3c947d7973c185274bdd
    Signed-off-by: Bin Qian <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/638665
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=a6934ac9d27e0357d0025018077441d989679409
Submitter: Zuul
Branch: master

commit a6934ac9d27e0357d0025018077441d989679409
Author: Bin Qian <email address hidden>
Date: Thu Feb 21 14:46:34 2019 -0500

    Boost sm process priority in VBox environment

    There is an instance that sm claimed its main thread ran sluggish
    as some critical timer run behind the scheuled timing.
    The issue could prevent the sm from scheduling services.
    As the result, the controller could fail to enable.

    The issue was found only on vbox labs on AIO-SX, the fix is to boost
    sm process priority to nice value -10 from current -2.

    Closes-Bug: 1816764
    Depends-On: https://review.openstack.org/638664
    Change-Id: Iafa17b1c47d65cc7394552ea1c8e7a78398e4869
    Signed-off-by: Bin Qian <email address hidden>
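For readers unfamiliar with nice values: a lower value means a higher scheduling priority, so moving from -2 to -10 gives the process more favorable scheduling. The sketch below only illustrates the mechanism using Python's standard os.setpriority and os.getpriority on Unix; it is not how sm applies its priority (per the commits above, that is configured through sm-configure), and try_set_nice is a hypothetical name.

```python
import os


def try_set_nice(target):
    """Try to set this process's nice value; return the value in effect afterwards.

    Lowering the nice value (raising priority) normally requires root or
    CAP_SYS_NICE, so an unprivileged caller keeps its current value.
    """
    try:
        # who=0 means "the calling process" for PRIO_PROCESS.
        os.setpriority(os.PRIO_PROCESS, 0, target)
    except PermissionError:
        pass  # not privileged enough; leave the priority unchanged
    return os.getpriority(os.PRIO_PROCESS, 0)


if __name__ == "__main__":
    before = os.getpriority(os.PRIO_PROCESS, 0)
    after = try_set_nice(-10)  # -10 is the value named in the commit message
    print(f"nice before={before}, after={after}")
```

Run as an unprivileged user, the request for -10 is typically refused and the nice value stays where it was; run as root, it takes effect.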

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (f/stein)

Reviewed: https://review.openstack.org/639130
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=7471ef852b7c37c742ef273f0df6b8ccce3bd928
Submitter: Zuul
Branch: f/stein

commit 7471ef852b7c37c742ef273f0df6b8ccce3bd928
Author: Bin Qian <email address hidden>
Date: Thu Feb 21 14:46:34 2019 -0500

    Boost sm process priority in VBox environment

    There is an instance that sm claimed its main thread ran sluggish
    as some critical timer run behind the scheuled timing.
    The issue could prevent the sm from scheduling services.
    As the result, the controller could fail to enable.

    The issue was found only on vbox labs on AIO-SX, the fix is to boost
    sm process priority to nice value -10 from current -2.

    Closes-Bug: 1816764
    Depends-On: https://review.openstack.org/638664
    Change-Id: Iafa17b1c47d65cc7394552ea1c8e7a78398e4869
    Signed-off-by: Bin Qian <email address hidden>
    (cherry picked from commit a6934ac9d27e0357d0025018077441d989679409)

tags: added: in-f-stein
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (f/stein)

Fix proposed to branch: f/stein
Review: https://review.openstack.org/639397

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (f/stein)

Reviewed: https://review.openstack.org/639397
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=bf0aa2c78d2397073baa12c2efcf3bcf2cc9d84b
Submitter: Zuul
Branch: f/stein

commit 611a68a96ab915dc4e97d39dffa5c379bbffef3d
Author: Mingyuan Qi <email address hidden>
Date: Wed Jan 30 09:41:27 2019 +0800

    Allow user specified registries for config_controller

    Currently docker images were pulled from public registries during
    config_controller. For some users, the connection to the public
    docker registry may be slow such that installing the containerized
    services images may timeout or the system simply does not have
    access to the public internet.

    This change allows users to specify alternative public/private
    registries to replace k8s.gcr.io, gcr.io, quay.io and docker.io.
    Insecure registry is supported if all default registries were
    replaced by one unified registry. It lowers the complexity for
    those who build his own registry without internet access.

    Docker doesn't support ipv6 addr as registry name, instead
    hostname or domain name in ipv6 network is allowed.

    Test:
    AIO-SX/AIO-DX/Standard(2+2):
      Alternative public registry (ipv4/domain) with proxy
        - config_controller pass
      Private registry (ipv4/ipv6/domain) without internet
        - config_controller pass
      Default registry with/without proxy
        - config_controller pass

    Story: 2004711
    Task: 28742

    Change-Id: I4fee3f4e0637863b9b5ef4ef556082ac75f62a1d
    Signed-off-by: Mingyuan Qi <email address hidden>

commit cb4b30bf56195456ac6b8bd11abf7e23f90f81a4
Author: Angie Wang <email address hidden>
Date: Fri Feb 22 01:21:07 2019 -0500

    Solve the stx-openstack reapply issue on controller-1

    After stx-openstack applied, the stx-openstack reapply shouldn't
    trigger the charts reinstallation if there has no overrides changed
    for charts. However, the reinstallation happens after swacting active
    controller to controller-1 due to the generated images overrides on
    controller-1 are different from before. The images overrides generation
    requires walking through the stx-openstack charts stored under
    /scratch, but charts do not exist on controller-1's /scratch as it's
    an unreplicated filesystem. This causes the images overrides to differ
    between controller-1 and controller-0.

    This commit updates to walk through charts and get the images for
    charts during application-upload, then save the images list for each
    chart into the existing images file under aramda directory
    /opt/platform/armada. The images file would be used for retrieving
    the images for charts to generate images overrides.

    Closes-Bug: 1816173
    Change-Id: I4f00c3031decb063f8f126d0c837acd4dde56fc3
    Signed-off-by: Angie Wang <email address hidden>

commit a6934ac9d27e0357d0025018077441d989679409
Author: Bin Qian <email address hidden>
Date: Thu Feb 21 14:46:34 2019 -0500

    Boost sm process priority in VBox environment

    There is an instance that sm claimed its main thread ran sluggish
    as some crit...


OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/640464

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/640464
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=1b22b5313d0618792732066a8fe47460d8ef06de
Submitter: Zuul
Branch: master

commit 654c05df0e45aa47d18ce72e5ba003195872790f
Author: Al Bailey <email address hidden>
Date: Fri Feb 22 16:35:12 2019 -0600

    The --kubernetes flag no longer has an effect.

    kubernetes mode is always enabled, the flag cannot be used to
    enable or disable it.

    The option in the CLI will be removed completely once the wiki
    and any test tools are updated.

    The code that handles the "else" will also be updated in a
    later commit

    Story: 2004751
    Task: 29756
    Change-Id: I75a81ab852252ee108fefeca5682e5b1a9d7374e
    Signed-off-by: Al Bailey <email address hidden>

commit 03b08b9722e83597797de93abef54f787b93bab5
Author: Mingyuan Qi <email address hidden>
Date: Wed Jan 30 09:41:27 2019 +0800

    Allow user specified registries for config_controller

    Currently docker images were pulled from public registries during
    config_controller. For some users, the connection to the public
    docker registry may be slow such that installing the containerized
    services images may timeout or the system simply does not have
    access to the public internet.

    This change allows users to specify alternative public/private
    registries to replace k8s.gcr.io, gcr.io, quay.io and docker.io.
    Insecure registry is supported if all default registries were
    replaced by one unified registry. It lowers the complexity for
    those who build his own registry without internet access.

    Docker doesn't support ipv6 addr as registry name, instead
    hostname or domain name in ipv6 network is allowed.

    Test:
    AIO-SX/AIO-DX/Standard(2+2):
      Alternative public registry (ipv4/domain) with proxy
        - config_controller pass
      Private registry (ipv4/ipv6/domain) without internet
        - config_controller pass
      Default registry with/without proxy
        - config_controller pass

    Story: 2004711
    Task: 28742

    Change-Id: I4fee3f4e0637863b9b5ef4ef556082ac75f62a1d
    Signed-off-by: Mingyuan Qi <email address hidden>
    (cherry picked from commit 611a68a96ab915dc4e97d39dffa5c379bbffef3d)

commit 7471ef852b7c37c742ef273f0df6b8ccce3bd928
Author: Bin Qian <email address hidden>
Date: Thu Feb 21 14:46:34 2019 -0500

    Boost sm process priority in VBox environment

    There is an instance that sm claimed its main thread ran sluggish
    as some critical timer run behind the scheuled timing.
    The issue could prevent the sm from scheduling services.
    As the result, the controller could fail to enable.

    The issue was found only on vbox labs on AIO-SX, the fix is to boost
    sm process priority to nice value -10 from current -2.

    Closes-Bug: 1816764
    Depends-On: https://review.openstack.org/638664
    Change-Id: Iafa17b1c47d65cc7394552ea1c8e7a78398e4869
    Signed-off-by: Bin Qian <email address hidden>
    (cherry picked from commit a6934ac9d27e0357d0025018077441d989679409)

commit 5e61519ac92822b959dffe63b76956cf0...

Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05