StarlingX

Comment 4 for bug 1841670

Bin Qian (bqian20) wrote on 2019-09-04:

The reboot was triggered because a mismatch in the kernel arguments configured for the initial boot was detected during the worker manifest:
2019-08-27T07:36:25.556 Notice: 2019-08-27 07:36:24 +0000 Scope(Class[Platform::Compute::Grub::Audit]): Kernel Boot Argument Mismatch
2019-08-27T07:36:35.389 Debug: 2019-08-27 07:36:35 +0000 Class[Platform::Compute::Grub::Update]: The container Stage[main] will propagate my refresh event
2019-08-27T07:36:35.390 Debug: 2019-08-27 07:36:35 +0000 Exec[reboot-recovery](provider=posix): Executing 'reboot'
2019-08-27T07:36:35.392 Debug: 2019-08-27 07:36:35 +0000 Executing: 'reboot'

The initial boot kernel args:
BOOT_IMAGE=/vmlinuz-3.10.0-957.12.2.el7.3.tis.x86_64 root=UUID=920d8268-2ea6-4f60-acb7-627eba080fb6 ro security_profile=standard module_blacklist=integrity,ima audit=0 tboot=false crashkernel=auto biosdevname=0 console=ttyS0,115200n8 iommu=pt usbcore.autosuspend=-1 hugepagesz=2M hugepages=0 default_hugepagesz=2M isolcpus=2,22,3,23 rcu_nocbs=2-19,22-39 kthread_cpus=0,20,1,21 irqaffinity=0,20,1,21 selinux=0 enforcing=0 nmi_watchdog=panic,1 softlockup_panic=1 intel_iommu=on user_namespace.enable=1 nopti nospectre_v2

The final boot kernel args:
BOOT_IMAGE=/vmlinuz-3.10.0-957.12.2.el7.3.tis.x86_64 root=UUID=920d8268-2ea6-4f60-acb7-627eba080fb6 ro security_profile=standard module_blacklist=integrity,ima audit=0 tboot=false crashkernel=auto biosdevname=0 console=ttyS0,115200n8 iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=panic,1 softlockup_panic=1 intel_iommu=on user_namespace.enable=1 hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=0 default_hugepagesz=2M irqaffinity=0-1,20-21 rcu_nocbs=2-19,22-39 isolcpus=2-3,22-23 kthread_cpus=0-1,20-21 nopti nospectre_v2

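It appears that hugepagesz=1G hugepages=2 was added to the kernel arguments during the worker manifest. The remaining differences between the two command lines are only in argument order and in how the CPU lists are written (for example, isolcpus=2,22,3,23 and isolcpus=2-3,22-23 name the same CPUs). The hugepage change requires a reboot to take effect.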
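
For anyone who wants to check this kind of mismatch by hand, here is a minimal sketch that compares two kernel command lines while ignoring argument order and CPU-list notation. It is not the code the Puppet grub audit class uses. The helper names (expand_cpu_list, normalize, diff_cmdlines) are hypothetical, and it assumes the CPU-list arguments contain only plain numbers and ranges, as they do in the two command lines above.

```python
from collections import Counter

# Arguments whose values are CPU lists; "2,22,3,23" and "2-3,22-23" are equivalent.
CPU_LIST_KEYS = {"isolcpus", "rcu_nocbs", "kthread_cpus", "irqaffinity"}


def expand_cpu_list(value):
    """Expand a CPU list such as '0-1,20-21' into a frozenset of CPU numbers."""
    cpus = set()
    for part in value.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def normalize(cmdline):
    """Turn a command line into a multiset of (key, value) pairs.

    A Counter is used instead of a dict because some keys (hugepagesz,
    hugepages) legitimately appear more than once.
    """
    tokens = []
    for tok in cmdline.split():
        key, sep, value = tok.partition("=")
        if sep and key in CPU_LIST_KEYS:
            tokens.append((key, expand_cpu_list(value)))
        else:
            tokens.append((key, value if sep else None))
    return Counter(tokens)


def diff_cmdlines(old, new):
    """Return (removed, added) arguments between two command lines."""
    a, b = normalize(old), normalize(new)
    removed = sorted((a - b).elements(), key=str)
    added = sorted((b - a).elements(), key=str)
    return removed, added


if __name__ == "__main__":
    # Abbreviated excerpts of the initial and final command lines above.
    initial = "ro hugepagesz=2M hugepages=0 default_hugepagesz=2M isolcpus=2,22,3,23"
    final = ("ro hugepagesz=1G hugepages=2 hugepagesz=2M hugepages=0 "
             "default_hugepagesz=2M isolcpus=2-3,22-23")
    removed, added = diff_cmdlines(initial, final)
    print("removed:", removed)
    print("added:", added)
```

Note that the pairing of each hugepages= value with the preceding hugepagesz= is positional on a real command line; this sketch only counts tokens, so it reports what was added or removed but not which page size a count belongs to.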

Because the CPUs in the lab do support 1G huge pages, the final config is correct.
The 1G hugepage arg is updated only when the vswitch type is configured, which happened in the controller manifest after the initial unlock reboot. Therefore, a 2nd reboot is required to update the kernel command line args after the initial unlock reboot.
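
As a side note, on x86_64 Linux a CPU that supports 1G pages advertises the pdpe1gb flag in /proc/cpuinfo. The sketch below checks for that flag. It is only an illustration of one way to verify hardware support, not necessarily how StarlingX makes the decision, and the function name supports_1g_hugepages is hypothetical.

```python
def supports_1g_hugepages(cpuinfo_text):
    """Return True if the first 'flags' line in cpuinfo text lists pdpe1gb."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return "pdpe1gb" in value.split()
    return False


if __name__ == "__main__":
    # Assumes an x86_64 Linux host where /proc/cpuinfo is readable.
    with open("/proc/cpuinfo") as f:
        print(supports_1g_hugepages(f.read()))
```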

This is not a bug; the behavior is by design.

To avoid the 2nd reboot, the vswitch type needs to be configured after the initial unlock but before the reboot starts.
