hot-add CPU cause VM with ubuntu-16.04.4-desktop-64bit and ubuntu-18.04-desktop hang

Bug #1755393 reported by vmware-gos-Yuhua on 2018-03-13
This bug affects 1 person
Affects          Importance  Assigned to
linux (Ubuntu)   High        Joseph Salisbury
Bionic           High        Joseph Salisbury

Bug Description

Hot-adding a CPU causes a VM running ubuntu-16.04.4-desktop-64bit to hang in ESXi.

Reproduce:
---------------------
1. Create a VM with EFI firmware in ESXi.

2. Install the guest OS from the ubuntu-16.04.4-desktop-64bit image.

3. Reboot the VM after the installation finishes.

4. Edit the VM settings and enable CPU and memory hot-add.

5. Edit the VM settings and raise the vCPU count from the default of 1 to 2.

6. Open a terminal and run the script that enables vcpu1: "sudo ~/rescanCpu.sh"

7. Check the CPU count with the command "cat /proc/cpuinfo" in the terminal. The command produces no output, and clicking anywhere on the VM desktop gets no response. The VM appears to be hung.
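
The contents of rescanCpu.sh are not included in this report. As a rough illustration of what step 6 presumably does, here is a minimal Python sketch that brings hot-added CPUs online through the standard Linux sysfs interface (/sys/devices/system/cpu/cpuN/online). The function name and its parameter are hypothetical and chosen for this example; the real script may work differently.

```python
from pathlib import Path


def online_offline_cpus(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Write '1' to the 'online' file of every CPU that currently reads '0'.

    Assumption: this mirrors what rescanCpu.sh does; the report does not
    show that script. Returns the names of the CPUs that were switched on.
    """
    brought_online = []
    # cpu0 often has no 'online' file, so the glob simply skips it.
    for online_file in sorted(Path(sysfs_cpu_dir).glob("cpu[0-9]*/online")):
        if online_file.read_text().strip() == "0":
            online_file.write_text("1")
            brought_online.append(online_file.parent.name)
    return brought_online


if __name__ == "__main__":
    # Writing to sysfs requires root, e.g. run under sudo.
    print(online_offline_cpus())
```

On an affected kernel, running something like this after step 5 would be expected to trigger the hang described in step 7.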

Analysis from a VMware developer:
------------------------------------

This looks to me like an Ubuntu problem. The kernel noticed at 566 seconds after boot that CPU1 was hot-added. Then you presumably ran code to bring the CPU online, and doing so ended with a warning in blk-mq.c:

Mar 6 17:41:33 vmware-virtual-machine kernel: [ 566.583896] CPU1 has been hot-added
Mar 6 17:42:17 vmware-virtual-machine CommAmqpListener[2376]: Initializing CommAmqpListener
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.431990] SMP alternatives: switching to SMP code
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.487612] x86: Booting SMP configuration:
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.487616] smpboot: Booting Node 0 Processor 1 APIC 0x2
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.489517] Disabled fast string operations
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.492082] smpboot: CPU 1 Converting physical 2 to logical package 1
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.493162] Will online and init hotplugged CPU: 1
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524713] ------------[ cut here ]------------
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524726] WARNING: CPU: 1 PID: 2402 at /build/linux-hwe-4GXcua/linux-hwe-4.13.0/block/blk-mq.c:1106 __blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524727] Modules linked in: vmw_vsock_vmci_transport vsock nls_iso8859_1 vmw_balloon sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_rapl_perf joydev input_leds serio_raw shpchp vmw_vmci i2c_piix4 nfit tpm_crb mac_hid parport_pc ppdev lp parport autofs4 vmw_pvscsi vmwgfx ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi mptscsih drm mptbase nvme nvme_core vmxnet3 scsi_transport_spi ahci libahci pata_acpi floppy
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525006] CPU: 1 PID: 2402 Comm: kworker/1:0H Not tainted 4.13.0-36-generic #40~16.04.1-Ubuntu
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525008] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.7915097.B64.1802282254 02/28/2018
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525013] Workqueue: kblockd blk_mq_run_work_fn
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525016] task: ffff8ab5326b0000 task.stack: ffffa694441d8000
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525021] RIP: 0010:__blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525023] RSP: 0018:ffffa694441dbe38 EFLAGS: 00010202
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525026] RAX: 0000000000000001 RBX: ffff8ab50b4edc00 RCX: ffff8ab53c662760
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525028] RDX: ffff8ab50b9cba60 RSI: ffff8ab50b4edc40 RDI: ffff8ab50b4edc00
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525029] RBP: ffffa694441dbe50 R08: 0000000000000000 R09: 0000000000000001
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525031] R10: 00000000000002e2 R11: 000000000000029b R12: ffff8ab5310d69c0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525032] R13: ffff8ab53c662740 R14: ffff8ab53c668300 R15: 0000000000000000
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525036] FS: 0000000000000000(0000) GS:ffff8ab53c640000(0000) knlGS:0000000000000000
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525038] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525039] CR2: 00007ff14dd7ab00 CR3: 000000001180a002 CR4: 00000000001606e0
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525140] Call Trace:
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525148] blk_mq_run_work_fn+0x2c/0x30
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525156] process_one_work+0x15b/0x410
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525161] worker_thread+0x4b/0x460
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525164] kthread+0x10c/0x140
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525168] ? process_one_work+0x410/0x410
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525172] ? kthread_create_on_node+0x70/0x70
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525178] ret_from_fork+0x35/0x40
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525181] Code: 00 e8 aa fc 4d 00 4c 89 e7 e8 92 a4 cb ff 48 89 df 41 89 c5 e8 07 5c 00 00 44 89 ee 4c 89 e7 e8 ac a4 cb ff 5b 41 5c 41 5d 5d c3 <0f> ff f6 83 b0 00 00 00 20 75 c4 48 89 df e8 e2 5b 00 00 5b 41
Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.525233] ---[ end trace d41f096b2f6750c5 ]---
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089082] ------------[ cut here ]------------
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089096] WARNING: CPU: 1 PID: 2407 at /build/linux-hwe-4GXcua/linux-hwe-4.13.0/block/blk-mq.c:1106 __blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089097] Modules linked in: vmw_vsock_vmci_transport vsock nls_iso8859_1 vmw_balloon sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_rapl_perf joydev input_leds serio_raw shpchp vmw_vmci i2c_piix4 nfit tpm_crb mac_hid parport_pc ppdev lp parport autofs4 vmw_pvscsi vmwgfx ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi mptscsih drm mptbase nvme nvme_core vmxnet3 scsi_transport_spi ahci libahci pata_acpi floppy
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089234] CPU: 1 PID: 2407 Comm: kworker/1:1H Tainted: G W 4.13.0-36-generic #40~16.04.1-Ubuntu
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089236] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.7915097.B64.1802282254 02/28/2018
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089241] Workqueue: kblockd blk_mq_run_work_fn
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089244] task: ffff8ab535c71740 task.stack: ffffa694401b8000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089249] RIP: 0010:__blk_mq_run_hw_queue+0x7b/0xa0
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089251] RSP: 0000:ffffa694401bbe38 EFLAGS: 00010202
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089254] RAX: 0000000000000001 RBX: ffff8ab50b4edc00 RCX: ffff8ab53c662760
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089255] RDX: ffff8ab50b9cba60 RSI: ffff8ab50b4edc40 RDI: ffff8ab50b4edc00
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089257] RBP: ffffa694401bbe50 R08: 0000000000000000 R09: 0000000000000000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089259] R10: 0000000000000289 R11: 0000000000000217 R12: ffff8ab53119e000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089260] R13: ffff8ab53c662740 R14: ffff8ab53c668300 R15: 0000000000000000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089263] FS: 0000000000000000(0000) GS:ffff8ab53c640000(0000) knlGS:0000000000000000
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089265] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089267] CR2: 0000000000436690 CR3: 000000001180a004 CR4: 00000000001606e0
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089374] Call Trace:
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089383] blk_mq_run_work_fn+0x2c/0x30
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089391] process_one_work+0x15b/0x410
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089395] worker_thread+0x4b/0x460
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089399] kthread+0x10c/0x140
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089403] ? process_one_work+0x410/0x410
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089406] ? kthread_create_on_node+0x70/0x70
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089412] ret_from_fork+0x35/0x40
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089415] Code: 00 e8 aa fc 4d 00 4c 89 e7 e8 92 a4 cb ff 48 89 df 41 89 c5 e8 07 5c 00 00 44 89 ee 4c 89 e7 e8 ac a4 cb ff 5b 41 5c 41 5d 5d c3 <0f> ff f6 83 b0 00 00 00 20 75 c4 48 89 df e8 e2 5b 00 00 5b 41
Mar 6 17:42:22 vmware-virtual-machine kernel: [ 615.089467] ---[ end trace d41f096b2f6750c6 ]---

After this, userspace never runs again. Apparently the kernel somehow ended up believing that CPU1 is in an interrupt, which breaks everything.
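
The timing in the analysis above (hot-add noticed at about 566 seconds, first warning at about 610 seconds) can be read directly from the bracketed kernel timestamps. The following is a minimal sketch, assuming syslog lines in the same format as the excerpt; the function name and the regular expression are illustrative and not part of any tool mentioned in this report.

```python
import re

# Matches 'kernel: [ 566.583896] message' as it appears in the excerpt.
STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s*(.*)")


def seconds_from_hot_add_to_warning(lines):
    """Return seconds between the first 'hot-added' message and the next WARNING.

    Returns None if either message is missing from the given lines.
    """
    hot_added_at = None
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        if hot_added_at is None and "has been hot-added" in message:
            hot_added_at = stamp
        elif hot_added_at is not None and message.startswith("WARNING:"):
            return stamp - hot_added_at
    return None


if __name__ == "__main__":
    sample = [
        "Mar 6 17:41:33 vmware-virtual-machine kernel: [ 566.583896] CPU1 has been hot-added",
        "Mar 6 17:42:17 vmware-virtual-machine kernel: [ 610.524726] WARNING: CPU: 1 PID: 2402 at "
        "/build/linux-hwe-4GXcua/linux-hwe-4.13.0/block/blk-mq.c:1106 __blk_mq_run_hw_queue+0x7b/0xa0",
    ]
    print(seconds_from_hot_add_to_warning(sample))
```

For the excerpt above the gap is roughly 44 seconds, which is consistent with the CPU being hot-added first and brought online manually later.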

vmware-gos-Yuhua (yhzou) on 2018-04-24
Changed in kernel-package (Ubuntu):
status: New → Confirmed

I hit the same issue with Ubuntu 18.04 desktop 64-bit.

summary: - hot-add CPU cause VM with ubuntu-16.04.4-desktop-64bit hang
+ hot-add CPU cause VM with ubuntu-16.04.4-desktop-64bit / ubuntu 18.04
+ desktop hang
summary: - hot-add CPU cause VM with ubuntu-16.04.4-desktop-64bit / ubuntu 18.04
- desktop hang
+ hot-add CPU cause VM with ubuntu-16.04.4-desktop-64bit /
+ ubuntu-18.04-desktop hang

Comments from VMware's developers after analyzing Ubuntu 18.04 desktop:

  Linux keeps complaining about "run queue from wrong CPU". According to https://lkml.org/lkml/2018/4/25/520 this is harmless and was recently demoted from a WARN_ON to a printk.

no longer affects: kernel-package (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1755393

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful

This bug was initially opened for Artful, which is now EOL, as is the 4.13 kernel. Can you confirm that this bug still exists in Bionic (18.04)?

Changed in linux (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → High
tags: added: bionic
removed: artful
Changed in linux (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
vmware-gos-Yuhua (yhzou) on 2018-08-21
summary: - hot-add CPU cause VM with ubuntu-16.04.4-desktop-64bit /
+ hot-add CPU cause VM with ubuntu-16.04.4-desktop-64bit and
ubuntu-18.04-desktop hang
Joseph Salisbury (jsalisbury) wrote :

The Bionic Ubuntu-4.15.0-14 kernel has the following commit:

540111689a84 blk-mq: turn WARN_ON in __blk_mq_run_hw_queue into printk

Does this bug exist with that kernel version or newer?

Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
vmware-gos-Yuhua (yhzou) wrote :

This issue no longer occurs when I check ubuntu18.04.1-desktop64.
Ubuntu-18.04.1-desktop has kernel 4.15.0-32-generic.
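
To check whether a given guest is at or past the Ubuntu-4.15.0-14 kernel mentioned above, the release string reported by "uname -r" can be compared numerically. This is a minimal sketch, not an official check: the function names are hypothetical, and it assumes that later Ubuntu kernels keep the commit, which this report does not state.

```python
import re

# First Bionic kernel said in this report to carry commit 540111689a84.
FIRST_KERNEL_WITH_COMMIT = (4, 15, 0, 14)


def parse_ubuntu_kernel(release):
    """Turn a release string such as '4.15.0-32-generic' into (4, 15, 0, 32)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_printk_commit(release):
    """Heuristic: True if the release is 4.15.0-14 or newer.

    Assumption: newer Ubuntu kernels retain the commit; verify against the
    kernel changelog when it matters.
    """
    return parse_ubuntu_kernel(release) >= FIRST_KERNEL_WITH_COMMIT


if __name__ == "__main__":
    # Release strings taken from this report.
    for release in ("4.13.0-36-generic", "4.15.0-32-generic"):
        print(release, has_printk_commit(release))
```

Under this heuristic, the 4.13.0-36 kernel from the original log predates the commit, while the 4.15.0-32 kernel in ubuntu-18.04.1-desktop includes it.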

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released