[Impact]
It has been brought to my attention that VMware guests[1] crash randomly after the VMs were moved from a VMware 5.5 environment to a VMware 6.5 environment.
The crashes started after the above move (5.5->6.5).
Notes:
* The crashes weren't present on VMware 5.5 (with the same VMs). They only started to happen on VMware 6.5.
* The Trusty HWE kernel (Ubuntu-4.4.0-X) doesn't exhibit the problem on VMware 6.5.
Here's the stack trace taken from the .vmss file, converted to be readable by a Linux debugger:
[17007961.187411] BUG: scheduling while atomic: swapper/3/0/0x00000100
[17007961.189794] Modules linked in: arc4 md4 nls_utf8 cifs nfsv3 nfs_acl nfsv4 nfs lockd sunrpc fscache veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs vmw_vsock_vmci_transport vsock ppdev vmwgfx serio_raw coretemp ttm drm vmw_balloon vmw_vmci shpchp i2c_piix4 parport_pc mac_hid xfs lp libcrc32c parport psmouse floppy vmw_pvscsi vmxnet3 pata_acpi
[17007961.189856] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-135-generic #184-Ubuntu
[17007961.189862] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 10/22/2013
[17007961.189867] 0000000000000000 ffff88042f263b90 ffffffff8172d959 ffff88042f263d30
[17007961.189874] ffff88042f273180 ffff88042f263ba0 ffffffff81726d8c ffff88042f263c00
[17007961.189879] ffffffff81731c8f ffff880428c29800 0000000000013180 ffff880428c25fd8
[17007961.189885] Call Trace:
[17007961.189889] <IRQ> [<ffffffff8172d959>] dump_stack+0x64/0x82
[17007961.189913] [<ffffffff81726d8c>] __schedule_bug+0x4c/0x5a
[17007961.189922] [<ffffffff81731c8f>] __schedule+0x6af/0x7f0
[17007961.189929] [<ffffffff81731df9>] schedule+0x29/0x70
[17007961.189935] [<ffffffff81731049>] schedule_timeout+0x279/0x310
[17007961.189947] [<ffffffff810a357b>] ? select_task_rq_fair+0x56b/0x6f0
[17007961.189955] [<ffffffff810a9852>] ? enqueue_task_fair+0x422/0x6d0
[17007961.189962] [<ffffffff810a0de5>] ? sched_clock_cpu+0xb5/0x100
[17007961.189971] [<ffffffff81732906>] wait_for_completion+0xa6/0x150
[17007961.189977] [<ffffffff8109e2a0>] ? wake_up_state+0x20/0x20
[17007961.189987] [<ffffffff810ccce0>] ? __call_rcu+0x2d0/0x2d0
[17007961.189993] [<ffffffff810ca2eb>] wait_rcu_gp+0x4b/0x60
[17007961.189999] [<ffffffff810ca280>] ? ftrace_raw_output_rcu_utilization+0x50/0x50
[17007961.190006] [<ffffffff810cc45a>] synchronize_sched+0x3a/0x50
[17007961.190047] [<ffffffffa01a8936>] vmci_event_unsubscribe+0x76/0xb0 [vmw_vmci]
[17007961.190063] [<ffffffffa01895f1>] vmci_transport_destruct+0x21/0xe0 [vmw_vsock_vmci_transport]
[17007961.190078] [<ffffffffa017f837>] vsock_sk_destruct+0x17/0x60 [vsock]
[17007961.190087] [<ffffffff8161a9df>] __sk_free+0x1f/0x180
[17007961.190092] [<ffffffff8161ab59>] sk_free+0x19/0x20
[17007961.190102] [<ffffffffa018a2c0>] vmci_transport_recv_stream_cb+0x200/0x2f0 [vmw_vsock_vmci_transport]
[17007961.190114] [<ffffffffa01a7efc>] vmci_datagram_invoke_guest_handler+0xbc/0xf0 [vmw_vmci]
[17007961.190126] [<ffffffffa01a8dbf>] vmci_dispatch_dgs+0xcf/0x230 [vmw_vmci]
[17007961.190138] [<ffffffff8106f8ee>] tasklet_action+0x11e/0x130
[17007961.190145] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[17007961.190153] [<ffffffff81070315>] irq_exit+0x105/0x110
[17007961.190161] [<ffffffff817407e6>] do_IRQ+0x56/0xc0
[17007961.190170] [<ffffffff81735e6d>] common_interrupt+0x6d/0x6d
[17007961.190173] <EOI> [<ffffffff81051586>] ? native_safe_halt+0x6/0x10
[17007961.190190] [<ffffffff8101db7f>] default_idle+0x1f/0x100
[17007961.190197] [<ffffffff8101e496>] arch_cpu_idle+0x26/0x30
[17007961.190205] [<ffffffff810c2b91>] cpu_startup_entry+0xc1/0x2b0
[17007961.190214] [<ffffffff810427fd>] start_secondary+0x21d/0x2d0
[17007961.190221] bad: scheduling from the idle thread!
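As a reading aid for the first line of the trace: the kernel prints it as comm/pid/preempt_count, so swapper/3/0/0x00000100 is task "swapper/3", PID 0, with a preempt_count of 0x100. The sketch below is not part of the original analysis. The helper names are hypothetical, and the bit layout is an assumption (preemption depth in bits 0-7, softirq count in bits 8-15, with the lowest softirq bit indicating that a softirq is being served). Under those assumptions the value decodes to "serving a softirq", which would match the tasklet_action and __do_softirq frames in the call trace, a context where sleeping in synchronize_sched is not allowed.

```python
# Assumed preempt_count layout (not taken from the bug report):
# bits 0-7 hold the preemption-disable depth, bits 8-15 the softirq count.
PREEMPT_MASK = 0x000000FF
SOFTIRQ_MASK = 0x0000FF00
SOFTIRQ_SHIFT = 8
SOFTIRQ_OFFSET = 1 << SOFTIRQ_SHIFT  # set while a softirq handler is running


def decode_preempt_count(value):
    """Split a preempt_count value into the two fields assumed above."""
    return {
        "preempt_depth": value & PREEMPT_MASK,
        "softirq_count": (value & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT,
        "serving_softirq": bool(value & SOFTIRQ_OFFSET),
    }


def parse_bug_line(line):
    """Parse 'BUG: scheduling while atomic: <comm>/<pid>/<preempt_count>'.

    The task name itself may contain '/', as in 'swapper/3', so the
    string is split from the right.
    """
    _, _, tail = line.partition("scheduling while atomic: ")
    comm, pid, count = tail.strip().rsplit("/", 2)
    return comm, int(pid), int(count, 16)


line = "BUG: scheduling while atomic: swapper/3/0/0x00000100"
comm, pid, count = parse_bug_line(line)
print(comm, pid, decode_preempt_count(count))
```
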
[Other info]
I have identified a patch series[2] which seems to fix the exact same situation.
A full discussion can be found on patchwork[3], suggesting a certain patch series[2] authored by VMware.
[1] - VM details:
Release: Trusty
Kernel: Ubuntu-3.13.0-135
[2] - Upstream patch series:
8ab18d7 VSOCK: Detach QP check should filter out non matching QPs.
8566b86 VSOCK: Fix lockdep issue.
4ef7ea9 VSOCK: sock_put wasn't safe to call in interrupt context
[3] - https://patchwork.kernel.org/patch/9948741/