Brief Description
On an AIO-SX installed manually in virtualbox, after the initial unlock it took 24 minutes for sm to enable all the services even though mtcClient knew that "goenabled" had passed at 7 minutes after startup.
Severity
Major
Steps to Reproduce
Install the latest developer load in virtualbox, run the ansible playbook to do the initial configuration, unlock the host.
Expected Behavior
Host should come up and all services should be started automatically.
Actual Behavior
"sm" services are not started until ~24 minutes after bootup.
Reproducibility
Reproducible.
System Configuration
One node system
Load info (eg: 2022-03-10_20-00-07)
StarlingX master, CentOS version.
SW_VERSION="22.06"
BUILD_HOST="yow-cgts4-lx.wrs.com"
BUILD_DATE="2022-06-02 12:57:40 -0400"
Last Pass
Haven't tested it in vbox in a while. Seems to be working in the hardware labs.
Timestamp/Logs
controller-0:~$ sudo sm-dump
Service_Groups-----------------------------------------------------------------------
oam-services initial initial
controller-services initial initial
cloud-services initial initial
patching-services initial initial
<snip>
The following "uptime" command was executed shortly after the above "sm-dump".
controller-0:~$ uptime
17:33:09 up 22 min, 1 user, load average: 0.64, 0.62, 0.65
controller-0:~$ sudo sm-dump
Password:
Service_Groups-----------------------------------------------------------------------
oam-services active active
controller-services active active
cloud-services active active
patching-services active active
<snip>
The following "uptime" command was executed shortly after the above "sm-dump".
controller-0:~$ uptime
17:38:39 up 27 min, 2 users, load average: 1.75, 1.38, 0.99
controller-0:/home/sysadmin# tail /var/log/puppet/latest/puppet.log
2022-06-03T17:17:10.787 Debug: 2022-06-03 17:17:10 +0000 Finishing transaction 73012680
2022-06-03T17:17:10.825 Debug: 2022-06-03 17:17:10 +0000 Storing state
2022-06-03T17:17:10.982 Debug: 2022-06-03 17:17:10 +0000 Stored state in 0.20 seconds
2022-06-03T17:17:10.984 Notice: 2022-06-03 17:17:10 +0000 Applied catalog in 58327.98 seconds
2022-06-03T17:17:11.064 Debug: 2022-06-03 17:17:11 +0000 Applying settings catalog for sections reporting, metrics
2022-06-03T17:17:11.067 Debug: 2022-06-03 17:17:11 +0000 Finishing transaction 134454460
2022-06-03T17:17:11.069 Debug: 2022-06-03 17:17:11 +0000 Received report to process from controller-0
2022-06-03T17:17:11.076 Debug: 2022-06-03 17:17:11 +0000 Evicting cache entry for environment 'production'
2022-06-03T17:17:11.078 Debug: 2022-06-03 17:17:11 +0000 Caching environment 'production' (ttl = 0 sec)
2022-06-03T17:17:11.312 Debug: 2022-06-03 17:17:11 +0000 Processing report from controller-0 with processor Puppet::Reports::Store
controller-0:/home/sysadmin# grep goenabled /var/log/mtcClient.log
2022-06-03T17:17:24.237 [95471.00024] controller-0 mtcClient — mtcNodeComp.cpp (1858) run_goenabled_scripts : Info : GoEnabled Scripts : Main-Function Context
2022-06-03T17:17:24.237 [95471.00026] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/config_goenabled_check.sh start
2022-06-03T17:17:24.237 [95471.00027] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/patch_check_goenabled.sh start
2022-06-03T17:17:24.237 [95471.00028] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/sysinv_goenabled_check.sh start
2022-06-03T17:17:24.237 [95471.00029] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/virt-support-goenabled.sh start
2022-06-03T17:17:24.237 [95471.00030] controller-0 mtcClient — mtcNodeComp.cpp (1583) _launch_all_scripts : Info : ... /etc/goenabled.d/worker-goenabled.sh start
2022-06-03T17:17:24.273 [95471.00032] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/config_goenabled_check.sh (0.000 secs)
2022-06-03T17:17:24.281 [95471.00033] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/patch_check_goenabled.sh (0.000 secs)
2022-06-03T17:17:24.288 [95471.00034] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/worker-goenabled.sh (0.000 secs)
2022-06-03T17:17:25.582 [95471.00036] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/virt-support-goenabled.sh (1.000 secs)
2022-06-03T17:18:14.174 [95471.00037] controller-0 mtcClient — mtcNodeComp.cpp (1954) daemon_sigchld_hdlr : Info : PASSED: /etc/goenabled.d/sysinv_goenabled_check.sh (49.000 secs)
2022-06-03T17:18:14.360 [95471.00038] controller-0 mtcClient — mtcNodeComp.cpp ( 906) _manage_goenabled_tests : Info : GoEnabled Testing Complete ; all tests passed
Alarms
[root@controller-0 sysadmin(keystone_admin)]# fm alarm-list
-------{}{}----------------------{}{}{}{}{}--------------------------{}{}{}{}----{}{}{}{}{}--------------------
Alarm ID Reason Text Entity ID Severity Time Stamp
-------{}{}----------------------{}{}{}{}{}--------------------------{}{}{}{}----{}{}{}{}{}--------------------
750.001 Application Upload Failure k8s_application=rook-ceph-apps warning 2022-06-03T17:37:11.
176214
-------{}{}----------------------{}{}{}{}{}--------------------------{}{}{}{}----{}{}{}{}{}--------------------
From sysinv.log:
sysinv 2022-06-03 17:37:11.139 102690 ERROR sysinv.conductor.kube_app [-] Upload of application rook-ceph-apps (1.0-17) failed: Failed to validate application manifest.: KubeAppUploadFailure: Upload of application rook-ceph-apps (1.0-17) failed: Failed to validate application manifest.
Test Activity
Developer Testing
Workaround
Manually start etcd after unlock.
or
system host-label-assign <host> disable-nohz-full=enabled (prior to the first host unlock)
https:/ /opendev. org/starlingx/ config/ commit/ 1a29a9ea728ee41 1df8c60ac2041bc e6fcba25e7
"Disable nohz_full in Virtual Env"