Hot-adding a CPU causes a VM running ubuntu-16.04.4-desktop-64bit to hang in ESXi
Reproduce:
---------------------
1. Create a VM with EFI firmware in ESXi.
2. Install the guest OS from the ubuntu-16.04.4-desktop-64bit image.
3. Reboot the VM after the installation finishes.
4. Edit the VM settings and enable CPU and memory hot-add.
5. Edit the VM settings and increase the vCPU count from the default of 1 to 2.
6. Open a terminal and run the script that brings vCPU 1 online: "sudo ~/rescanCpu.sh"
7. Check the CPU count by running "cat /proc/cpuinfo" in the terminal. The command prints no output, and clicking anywhere on the VM desktop gets no response. The VM appears to be hung.
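The contents of rescanCpu.sh are not included in this report. As a rough sketch only, and assuming the script does nothing more than bring hot-added CPUs online through the standard sysfs interface, step 6 would amount to something like the following. The function name and the configurable root directory are hypothetical and exist only for illustration.

```python
from pathlib import Path


def online_offline_cpus(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Write "1" to the 'online' file of every CPU that currently reads "0".

    Returns the directory names (e.g. "cpu1") of the CPUs that were switched.
    ASSUMPTION: this mirrors what rescanCpu.sh does; the real script is not shown.
    """
    brought_online = []
    root = Path(sysfs_cpu_root)
    # The pattern skips directories such as cpufreq and cpuidle, and also any
    # CPU that exposes no 'online' file (commonly cpu0 on x86).
    for online_file in sorted(root.glob("cpu[0-9]*/online")):
        if online_file.read_text().strip() == "0":
            online_file.write_text("1")
            brought_online.append(online_file.parent.name)
    return brought_online
```

Calling online_offline_cpus() with the default path needs root privileges, matching the "sudo" in step 6. Passing a different directory lets the logic be exercised without touching the running system.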
Analysis from a VMware developer:
------------------------------------
This looks to me like an Ubuntu problem. The kernel noticed at 566 seconds after boot that CPU1 had been hot-added. Then presumably you ran the code to bring the CPU online, and doing so produced a warning at blk-mq.c:
Mar 6 17:41:33 vmware-virtual-machine kernel: [ 566.583896] CPU1 has been hot-added
Mar 6 17:42:17 vmware-virtual-machine CommAmqpListener[2376]: Initializing CommAmqpListener
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.431990] SMP alternatives: switching to SMP code
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.487612] x86: Booting SMP configuration:
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.487616] smpboot: Booting Node 0 Processor 1 APIC 0x2
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.489517] Disabled fast string operations
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.492082] smpboot: CPU 1 Converting physical 2 to logical package 1
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.493162] Will online and init hotplugged CPU: 1
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524713] ------------[ cut here ]------------
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524726] WARNING: CPU: 1 PID: 2402 at /build/linux-hwe-4GXcua/linux-hwe-4.13.0/block/blk-mq.c:1106 __blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524727] Modules linked in: vmw_vsock_vmci_transport vsock nls_iso8859_1 vmw_balloon sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_rapl_perf joydev input_leds serio_raw shpchp vmw_vmci i2c_piix4 nfit tpm_crb mac_hid parport_pc ppdev lp parport autofs4 vmw_pvscsi vmwgfx ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi mptscsih drm mptbase nvme nvme_core vmxnet3 scsi_transport_spi ahci libahci pata_acpi floppy
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525006] CPU: 1 PID: 2402 Comm: kworker/1:0H Not tainted 4.13.0-36-generic #40~16.04.1-Ubuntu
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525008] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.7915097.B64.1802282254 02/28/2018
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525013] Workqueue: kblockd blk_mq_run_work_fn
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525016] task: ffff8ab5326b0000 task.stack: ffffa694441d8000
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525021] RIP: 0010:__blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525023] RSP: 0018:ffffa694441dbe38 EFLAGS: 00010202
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525026] RAX: 0000000000000001 RBX: ffff8ab50b4edc00 RCX: ffff8ab53c662760
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525028] RDX: ffff8ab50b9cba60 RSI: ffff8ab50b4edc40 RDI: ffff8ab50b4edc00
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525029] RBP: ffffa694441dbe50 R08: 0000000000000000 R09: 0000000000000001
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525031] R10: 00000000000002e2 R11: 000000000000029b R12: ffff8ab5310d69c0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525032] R13: ffff8ab53c662740 R14: ffff8ab53c668300 R15: 0000000000000000
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525036] FS: 0000000000000000(0000) GS:ffff8ab53c640000(0000) knlGS:0000000000000000
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525038] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525039] CR2: 00007ff14dd7ab00 CR3: 000000001180a002 CR4: 00000000001606e0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525140] Call Trace:
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525148] blk_mq_run_work_fn+0x2c/0x30
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525156] process_one_work+0x15b/0x410
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525161] worker_thread+0x4b/0x460
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525164] kthread+0x10c/0x140
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525168] ? process_one_work+0x410/0x410
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525172] ? kthread_create_on_node+0x70/0x70
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525178] ret_from_fork+0x35/0x40
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525181] Code: 00 e8 aa fc 4d 00 4c 89 e7 e8 92 a4 cb ff 48 89 df 41 89 c5 e8 07 5c 00 00 44 89 ee 4c 89 e7 e8 ac a4 cb ff 5b 41 5c 41 5d 5d c3 <0f> ff f6 83 b0 00 00 00 20 75 c4 48 89 df e8 e2 5b 00 00 5b 41
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525233] ---[ end trace d41f096b2f6750c5 ]---
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089082] ------------[ cut here ]------------
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089096] WARNING: CPU: 1 PID: 2407 at /build/linux-hwe-4GXcua/linux-hwe-4.13.0/block/blk-mq.c:1106 __blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089097] Modules linked in: vmw_vsock_vmci_transport vsock nls_iso8859_1 vmw_balloon sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_rapl_perf joydev input_leds serio_raw shpchp vmw_vmci i2c_piix4 nfit tpm_crb mac_hid parport_pc ppdev lp parport autofs4 vmw_pvscsi vmwgfx ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi mptscsih drm mptbase nvme nvme_core vmxnet3 scsi_transport_spi ahci libahci pata_acpi floppy
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089234] CPU: 1 PID: 2407 Comm: kworker/1:1H Tainted: G W 4.13.0-36-generic #40~16.04.1-Ubuntu
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089236] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.7915097.B64.1802282254 02/28/2018
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089241] Workqueue: kblockd blk_mq_run_work_fn
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089244] task: ffff8ab535c71740 task.stack: ffffa694401b8000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089249] RIP: 0010:__blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089251] RSP: 0000:ffffa694401bbe38 EFLAGS: 00010202
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089254] RAX: 0000000000000001 RBX: ffff8ab50b4edc00 RCX: ffff8ab53c662760
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089255] RDX: ffff8ab50b9cba60 RSI: ffff8ab50b4edc40 RDI: ffff8ab50b4edc00
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089257] RBP: ffffa694401bbe50 R08: 0000000000000000 R09: 0000000000000000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089259] R10: 0000000000000289 R11: 0000000000000217 R12: ffff8ab53119e000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089260] R13: ffff8ab53c662740 R14: ffff8ab53c668300 R15: 0000000000000000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089263] FS: 0000000000000000(0000) GS:ffff8ab53c640000(0000) knlGS:0000000000000000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089265] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089267] CR2: 0000000000436690 CR3: 000000001180a004 CR4: 00000000001606e0
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089374] Call Trace:
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089383] blk_mq_run_work_fn+0x2c/0x30
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089391] process_one_work+0x15b/0x410
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089395] worker_thread+0x4b/0x460
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089399] kthread+0x10c/0x140
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089403] ? process_one_work+0x410/0x410
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089406] ? kthread_create_on_node+0x70/0x70
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089412] ret_from_fork+0x35/0x40
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089415] Code: 00 e8 aa fc 4d 00 4c 89 e7 e8 92 a4 cb ff 48 89 df 41 89 c5 e8 07 5c 00 00 44 89 ee 4c 89 e7 e8 ac a4 cb ff 5b 41 5c 41 5d 5d c3 <0f> ff f6 83 b0 00 00 00 20 75 c4 48 89 df e8 e2 5b 00 00 5b 41
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089467] ---[ end trace d41f096b2f6750c6 ]---
After this, no userspace code ever runs again. Apparently the kernel somehow ended up believing that CPU1 is in interrupt context, which breaks everything.
The same issue also occurs with Ubuntu 18.04 desktop 64-bit.
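To compare guests, it may help to pull just the WARNING sites out of a saved kernel log. The sketch below is one possible way to do that. The function name and the regular expression are illustrative assumptions based on the line format seen in the log above, not part of any existing tool.

```python
import re

# Matches lines such as:
#   ... kernel: [ 610.524726] WARNING: CPU: 1 PID: 2402 at <file>:<line> <symbol>
WARNING_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) "
    r"PID: (?P<pid>\d+) at (?P<site>\S+)"
)


def find_kernel_warnings(log_text):
    """Return (uptime_seconds, cpu, pid, source_site) for each WARNING line."""
    hits = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            hits.append((
                float(match["uptime"]),
                int(match["cpu"]),
                int(match["pid"]),
                match["site"],
            ))
    return hits


# One line copied from the log in this report, used as sample input.
SAMPLE = (
    "Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524726] "
    "WARNING: CPU: 1 PID: 2402 at "
    "/build/linux-hwe-4GXcua/linux-hwe-4.13.0/block/blk-mq.c:1106 "
    "__blk_mq_run_hw_queue+0x7b/0xa0"
)

for uptime, cpu, pid, site in find_kernel_warnings(SAMPLE):
    print(uptime, cpu, pid, site)
```

On an affected guest the input would typically be the text of /var/log/syslog or saved dmesg output, read before the hang or from a copy of the disk afterwards.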