in Virtualbox, sm services taking excessively long to start up after unlock

Bug #1986702 reported by Iago Filipe
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
Low
Unassigned

Bug Description

Brief Description

On an AIO-SX installed manually in virtualbox, after the initial unlock it took 24 minutes for sm to enable all the services even though mtcClient knew that "goenabled" had passed at 7 minutes after startup.

Severity

Major

Steps to Reproduce

Install the latest developer load in virtualbox, run the ansible playbook to do the initial configuration, unlock the host.

Expected Behavior

Host should come up and all services should be started automatically.

Actual Behavior

"sm" services are not started until ~24 minutes after bootup.

Reproducibility

Reproducible.

System Configuration

One node system

Load info (eg: 2022-03-10_20-00-07)

StarlingX master, CentOS version.

SW_VERSION="22.06"
BUILD_HOST="yow-cgts4-lx.wrs.com"
BUILD_DATE="2022-06-02 12:57:40 -0400"

Last Pass

Haven't tested it in vbox in a while. Seems to be working in the hardware labs.

Timestamp/Logs

controller-0:~$ sudo sm-dump

Service_Groups-----------------------------------------------------------------------
oam-services initial initial
controller-services initial initial
cloud-services initial initial
patching-services initial initial
<snip>

The following "uptime" command was executed shortly after the above "sm-dump".

controller-0:~$ uptime
 17:33:09 up 22 min, 1 user, load average: 0.64, 0.62, 0.65

controller-0:~$ sudo sm-dump
Password:

Service_Groups-----------------------------------------------------------------------
oam-services active active
controller-services active active
cloud-services active active
patching-services active active

<snip>

The following "uptime" command was executed shortly after the above "sm-dump".

controller-0:~$ uptime
 17:38:39 up 27 min, 2 users, load average: 1.75, 1.38, 0.99

controller-0:/home/sysadmin# tail /var/log/puppet/latest/puppet.log
2022-06-03T17:17:10.787 Debug: 2022-06-03 17:17:10 +0000 Finishing transaction 73012680
2022-06-03T17:17:10.825 Debug: 2022-06-03 17:17:10 +0000 Storing state
2022-06-03T17:17:10.982 Debug: 2022-06-03 17:17:10 +0000 Stored state in 0.20 seconds
2022-06-03T17:17:10.984 Notice: 2022-06-03 17:17:10 +0000 Applied catalog in 58327.98 seconds
2022-06-03T17:17:11.064 Debug: 2022-06-03 17:17:11 +0000 Applying settings catalog for sections reporting, metrics
2022-06-03T17:17:11.067 Debug: 2022-06-03 17:17:11 +0000 Finishing transaction 134454460
2022-06-03T17:17:11.069 Debug: 2022-06-03 17:17:11 +0000 Received report to process from controller-0
2022-06-03T17:17:11.076 Debug: 2022-06-03 17:17:11 +0000 Evicting cache entry for environment 'production'
2022-06-03T17:17:11.078 Debug: 2022-06-03 17:17:11 +0000 Caching environment 'production' (ttl = 0 sec)
2022-06-03T17:17:11.312 Debug: 2022-06-03 17:17:11 +0000 Processing report from controller-0 with processor Puppet::Reports::Store

controller-0:/home/sysadmin# grep goenabled /var/log/mtcClient.log
2022-06-03T17:17:24.237 [95471.00024] controller-0 mtcClient — mtcNodeComp.cpp (1858) run_goenabled_scripts : Info : GoEnabled Scripts : Main-Function Context
2022-06-03T17:17:24.237 [95471.00026] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/config_goenabled_check.sh start
2022-06-03T17:17:24.237 [95471.00027] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/patch_check_goenabled.sh start
2022-06-03T17:17:24.237 [95471.00028] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/sysinv_goenabled_check.sh start
2022-06-03T17:17:24.237 [95471.00029] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/virt-support-goenabled.sh start
2022-06-03T17:17:24.237 [95471.00030] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/worker-goenabled.sh start
2022-06-03T17:17:24.273 [95471.00032] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/config_goenabled_check.sh (0.000 secs)
2022-06-03T17:17:24.281 [95471.00033] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/patch_check_goenabled.sh (0.000 secs)
2022-06-03T17:17:24.288 [95471.00034] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/worker-goenabled.sh (0.000 secs)
2022-06-03T17:17:25.582 [95471.00036] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/virt-support-goenabled.sh (1.000 secs)
2022-06-03T17:18:14.174 [95471.00037] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/sysinv_goenabled_check.sh (49.000 secs)
2022-06-03T17:18:14.360 [95471.00038] controller-0 mtcClient — mtcNodeComp.cpp ( 906) _manage_goenabled_tests : Info : GoEnabled Testing Complete ; all tests passed

Alarms

[root@controller-0 sysadmin(keystone_admin)]# fm alarm-list
-------{}{}----------------------{}{}{}{}{}--------------------------{}{}{}{}----{}{}{}{}{}--------------------

Alarm ID Reason Text Entity ID Severity Time Stamp
-------{}{}----------------------{}{}{}{}{}--------------------------{}{}{}{}----{}{}{}{}{}--------------------

750.001 Application Upload Failure k8s_application=rook-ceph-apps warning 2022-06-03T17:37:11.
                                                                                176214

-------{}{}----------------------{}{}{}{}{}--------------------------{}{}{}{}----{}{}{}{}{}--------------------

From sysinv.log:

sysinv 2022-06-03 17:37:11.139 102690 ERROR sysinv.conductor.kube_app [-] Upload of application rook-ceph-apps (1.0-17) failed: Failed to validate application manifest.: KubeAppUploadFailure: Upload of application rook-ceph-apps (1.0-17) failed: Failed to validate application manifest.

Test Activity

Developer Testing

Workaround

Manually start etcd after unlock.

or

system host-label-assign <host> disable-nohz-full=enabled (prior to the first host unlock)

Tags: stx.ha
Iago Filipe (ifest1)
description: updated
description: updated
Ghada Khalil (gkhalil)
tags: added: stx.ha
Revision history for this message
John Kung (john-kung) wrote :
Revision history for this message
Ramaswamy Subramanian (rsubrama) wrote :

Fixed by 1890323

Revision history for this message
Ramaswamy Subramanian (rsubrama) wrote :
Changed in starlingx:
status: New → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.