Activity log for bug #1780470

Date Who What changed Old value New value Message
2018-07-06 17:56:52 Eric Desrochers bug added bug
2018-07-06 18:00:05 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2018-07-06 18:00:08 Ubuntu Kernel Bot tags trusty
2018-07-06 18:04:30 Eric Desrochers nominated for series Ubuntu Trusty
2018-07-06 18:04:30 Eric Desrochers bug task added linux (Ubuntu Trusty)
2018-07-06 18:04:40 Eric Desrochers linux (Ubuntu): status Incomplete Fix Released
2018-07-06 18:04:45 Eric Desrochers linux (Ubuntu Trusty): importance Undecided Medium
2018-07-06 18:04:54 Eric Desrochers linux (Ubuntu Trusty): assignee Eric Desrochers (slashd)
2018-07-06 18:04:56 Eric Desrochers linux (Ubuntu Trusty): status New In Progress
2018-07-06 18:05:21 Eric Desrochers summary "BUG: scheduling while atomic (v3.13 + VMware 6.0 and late)" "BUG: scheduling while atomic (Kernel : Ubuntu-3.13 + VMware: 6.0 and late)"
2018-07-06 18:11:06 eschwab bug added subscriber oedstero
2018-07-06 18:11:56 Eric Desrochers description updated. The original value was the initial bug description:

[Impact]

It has been brought to my attention that VMware guests[1] randomly crash after the VMs were moved from a VMware 5.5 environment to a VMware 6.5 environment. The crashes started after the above move (5.5->6.5).

Notes:
* The crashes were not present on VMware 5.5 (with the same VMs); they only started happening on VMware 6.5.
* The Trusty HWE kernel (Ubuntu-4.4.0-X) does not exhibit the problem on VMware 6.5.

Here is the stack trace, taken from the .vmss file after converting it into a form readable by a Linux debugger:

[17007961.187411] BUG: scheduling while atomic: swapper/3/0/0x00000100
[17007961.189794] Modules linked in: arc4 md4 nls_utf8 cifs nfsv3 nfs_acl nfsv4 nfs lockd sunrpc fscache veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs vmw_vsock_vmci_transport vsock ppdev vmwgfx serio_raw coretemp ttm drm vmw_balloon vmw_vmci shpchp i2c_piix4 parport_pc mac_hid xfs lp libcrc32c parport psmouse floppy vmw_pvscsi vmxnet3 pata_acpi
[17007961.189856] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-135-generic #184-Ubuntu
[17007961.189862] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 10/22/2013
[17007961.189867] 0000000000000000 ffff88042f263b90 ffffffff8172d959 ffff88042f263d30
[17007961.189874] ffff88042f273180 ffff88042f263ba0 ffffffff81726d8c ffff88042f263c00
[17007961.189879] ffffffff81731c8f ffff880428c29800 0000000000013180 ffff880428c25fd8
[17007961.189885] Call Trace:
[17007961.189889] <IRQ> [<ffffffff8172d959>] dump_stack+0x64/0x82
[17007961.189913] [<ffffffff81726d8c>] __schedule_bug+0x4c/0x5a
[17007961.189922] [<ffffffff81731c8f>] __schedule+0x6af/0x7f0
[17007961.189929] [<ffffffff81731df9>] schedule+0x29/0x70
[17007961.189935] [<ffffffff81731049>] schedule_timeout+0x279/0x310
[17007961.189947] [<ffffffff810a357b>] ? select_task_rq_fair+0x56b/0x6f0
[17007961.189955] [<ffffffff810a9852>] ? enqueue_task_fair+0x422/0x6d0
[17007961.189962] [<ffffffff810a0de5>] ? sched_clock_cpu+0xb5/0x100
[17007961.189971] [<ffffffff81732906>] wait_for_completion+0xa6/0x150
[17007961.189977] [<ffffffff8109e2a0>] ? wake_up_state+0x20/0x20
[17007961.189987] [<ffffffff810ccce0>] ? __call_rcu+0x2d0/0x2d0
[17007961.189993] [<ffffffff810ca2eb>] wait_rcu_gp+0x4b/0x60
[17007961.189999] [<ffffffff810ca280>] ? ftrace_raw_output_rcu_utilization+0x50/0x50
[17007961.190006] [<ffffffff810cc45a>] synchronize_sched+0x3a/0x50
[17007961.190047] [<ffffffffa01a8936>] vmci_event_unsubscribe+0x76/0xb0 [vmw_vmci]
[17007961.190063] [<ffffffffa01895f1>] vmci_transport_destruct+0x21/0xe0 [vmw_vsock_vmci_transport]
[17007961.190078] [<ffffffffa017f837>] vsock_sk_destruct+0x17/0x60 [vsock]
[17007961.190087] [<ffffffff8161a9df>] __sk_free+0x1f/0x180
[17007961.190092] [<ffffffff8161ab59>] sk_free+0x19/0x20
[17007961.190102] [<ffffffffa018a2c0>] vmci_transport_recv_stream_cb+0x200/0x2f0 [vmw_vsock_vmci_transport]
[17007961.190114] [<ffffffffa01a7efc>] vmci_datagram_invoke_guest_handler+0xbc/0xf0 [vmw_vmci]
[17007961.190126] [<ffffffffa01a8dbf>] vmci_dispatch_dgs+0xcf/0x230 [vmw_vmci]
[17007961.190138] [<ffffffff8106f8ee>] tasklet_action+0x11e/0x130
[17007961.190145] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[17007961.190153] [<ffffffff81070315>] irq_exit+0x105/0x110
[17007961.190161] [<ffffffff817407e6>] do_IRQ+0x56/0xc0
[17007961.190170] [<ffffffff81735e6d>] common_interrupt+0x6d/0x6d
[17007961.190173] <EOI> [<ffffffff81051586>] ? native_safe_halt+0x6/0x10
[17007961.190190] [<ffffffff8101db7f>] default_idle+0x1f/0x100
[17007961.190197] [<ffffffff8101e496>] arch_cpu_idle+0x26/0x30
[17007961.190205] [<ffffffff810c2b91>] cpu_startup_entry+0xc1/0x2b0
[17007961.190214] [<ffffffff810427fd>] start_secondary+0x21d/0x2d0
[17007961.190221] bad: scheduling from the idle thread!

[Other info]

I have identified an upstream patch series[2] that appears to fix exactly this situation. The full discussion can be found on Patchwork[3], which suggests the patch series[2] authored by VMware.

[1] VM details:
    Release: Trusty
    Kernel: Ubuntu-3.13.0-135

[2] Upstream patch series:
    8ab18d7 VSOCK: Detach QP check should filter out non matching QPs.
    8566b86 VSOCK: Fix lockdep issue.
    4ef7ea9 VSOCK: sock_put wasn't safe to call in interrupt context

[3] https://patchwork.kernel.org/patch/9948741/

The new value kept this text and appended a commit excerpt to section [2]:

----------------
commit 586fbeba3fa97614d9124f8d56b4000c81368ff4
Author: Jorgen Hansen <jhansen@vmware.com>
Date: Wed Oct 21 04:53:56 2015 -0700

    VSOCK: sock_put wasn't safe to call in interrupt context
    ...
    Multiple customers have been hitting this issue when using
    VMware tools on vSphere 2015.
    ...
----------------

together with the note: vSphere 2015 == VMware 6.0 (released in 2015) and later.
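For readers of this log, what the trace above shows, in short: a VMCI datagram is handled in a tasklet (vmci_dispatch_dgs -> tasklet_action, i.e. softirq context); the handler drops the last reference to a vsock socket, so the destructor chain vsock_sk_destruct -> vmci_transport_destruct -> vmci_event_unsubscribe runs right there and ends in synchronize_sched(), which sleeps; sleeping in softirq context is exactly what the kernel reports as "scheduling while atomic". Below is a minimal sketch of that pattern, added here for illustration only; the names are hypothetical and the APIs are 3.13-era, this is not the actual vmw_vmci/vmw_vsock_vmci_transport source:

    /* Sketch of the failure pattern seen in the trace (hypothetical names). */
    #include <linux/interrupt.h>
    #include <linux/rcupdate.h>

    static void event_unsubscribe_sketch(void)
    {
        /*
         * synchronize_sched() blocks until an RCU-sched grace period
         * completes, so it may sleep. Sleeping is forbidden in softirq
         * (atomic) context; reaching this from a tasklet produces
         * "BUG: scheduling while atomic".
         */
        synchronize_sched();
    }

    static void recv_tasklet_fn(unsigned long data)
    {
        /*
         * Tasklets run in softirq context (tasklet_action -> __do_softirq
         * in the trace). If the final socket reference is dropped here,
         * the destructor chain runs here as well ...
         */
        event_unsubscribe_sketch();    /* ... and may sleep: BUG */
    }

    static DECLARE_TASKLET(recv_tasklet, recv_tasklet_fn, 0);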
2018-07-06 18:12:42 Eric Desrochers description updated: in the commit excerpt under [2], the full hash "commit 586fbeba3fa97614d9124f8d56b4000c81368ff4" was shortened to the abbreviated "commit 4ef7ea9" used in the series listing, and the excerpt's message lines were indented. The description is otherwise unchanged.
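The commit subject ("sock_put wasn't safe to call in interrupt context") points at the standard remedy: do not run sleeping teardown in atomic context, defer it to process context instead. Here is a hedged sketch of that deferral via a workqueue; it is illustrative only, with hypothetical names, and the upstream series may structure the cleanup differently:

    /* Sketch of deferring sleeping teardown out of softirq context. */
    #include <linux/interrupt.h>
    #include <linux/rcupdate.h>
    #include <linux/workqueue.h>

    static void transport_cleanup_fn(struct work_struct *work)
    {
        /* Runs in process context: sleeping is allowed here. */
        synchronize_sched();
        /* ... release the transport and socket resources ... */
    }

    static DECLARE_WORK(transport_cleanup_work, transport_cleanup_fn);

    static void recv_tasklet_fn(unsigned long data)
    {
        /*
         * Softirq context: rather than dropping the final socket
         * reference (and running its destructor) inline, hand the
         * cleanup off to the work item.
         */
        schedule_work(&transport_cleanup_work);
    }

With the teardown moved out of the tasklet, the final sock_put()/sk_free() no longer executes in interrupt context, which would also be consistent with the Trusty HWE 4.4 kernel (recent enough to carry this 2015 fix) not showing the problem.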
2018-07-11 01:24:19 Eric Desrochers tags "trusty" "sts trusty"
2018-07-11 01:25:15 Eric Desrochers description updated: the sentence "The crashes started after the above move (5.5->6.5)." was dropped from [Impact]. The description is otherwise unchanged.
2018-07-11 01:25:42 Eric Desrochers description updated: formatting-only change; no wording difference is visible between the old and new values in this log entry.
2018-07-11 20:16:54 Eric Desrochers description updated: added [Test Case] and [Regression Potential] sections between the stack trace and [Other info]:

[Test Case]

- Run Trusty (3.13-series) kernels in VMware guests hosted on VMware 6.0 or later, with VMCI (Virtual Machine Communication Interface) turned on.
- Wait for the crash to happen. It occurs randomly; for now there are no clear steps to reproduce it on demand.

[Regression Potential]

A test kernel is now being tested on VMware 5.5 (where the problem does not happen, but we are testing anyway to look for potential regressions on earlier VMware versions) and on VMware 6.5 (where the problem happens). We basically want to make sure the test package works well in both VMware environments before proposing the patch series to the Ubuntu kernel ML.
2018-07-30 17:51:25 Eric Desrochers description [Impact] It has been brought to my attention that VMware Guest[1] randomly crashes after moving the VMs from a VMware "5.5" env to VMware 6.5 env. Notes: * The crashes wasn't present in VMware 5.5 (with the same VMs). Only started to happens with Vmware 6.5 * The Trusty HWE kernel (Ubuntu-4.4.0-X) doesn't exhibit the situation on VMware 6.5. Here's the stack trace took from the .vmss converted to be readable by Linux debugger: [17007961.187411] BUG: scheduling while atomic: swapper/3/0/0x00000100 [17007961.189794] Modules linked in: arc4 md4 nls_utf8 cifs nfsv3 nfs_acl nfsv4 nfs lockd sunrpc fscache veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs vmw_vsock_vmci_transport vsock ppdev vmwgfx serio_raw coretemp ttm drm vmw_balloon vmw_vmci shpchp i2c_piix4 parport_pc mac_hid xfs lp libcrc32c parport psmouse floppy vmw_pvscsi vmxnet3 pata_acpi [17007961.189856] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-135-generic #184-Ubuntu [17007961.189862] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 10/22/2013 [17007961.189867] 0000000000000000 ffff88042f263b90 ffffffff8172d959 ffff88042f263d30 [17007961.189874] ffff88042f273180 ffff88042f263ba0 ffffffff81726d8c ffff88042f263c00 [17007961.189879] ffffffff81731c8f ffff880428c29800 0000000000013180 ffff880428c25fd8 [17007961.189885] Call Trace: [17007961.189889] <IRQ> [<ffffffff8172d959>] dump_stack+0x64/0x82 [17007961.189913] [<ffffffff81726d8c>] __schedule_bug+0x4c/0x5a [17007961.189922] [<ffffffff81731c8f>] __schedule+0x6af/0x7f0 [17007961.189929] [<ffffffff81731df9>] schedule+0x29/0x70 [17007961.189935] [<ffffffff81731049>] schedule_timeout+0x279/0x310 [17007961.189947] [<ffffffff810a357b>] ? select_task_rq_fair+0x56b/0x6f0 [17007961.189955] [<ffffffff810a9852>] ? enqueue_task_fair+0x422/0x6d0 [17007961.189962] [<ffffffff810a0de5>] ? sched_clock_cpu+0xb5/0x100 [17007961.189971] [<ffffffff81732906>] wait_for_completion+0xa6/0x150 [17007961.189977] [<ffffffff8109e2a0>] ? wake_up_state+0x20/0x20 [17007961.189987] [<ffffffff810ccce0>] ? __call_rcu+0x2d0/0x2d0 [17007961.189993] [<ffffffff810ca2eb>] wait_rcu_gp+0x4b/0x60 [17007961.189999] [<ffffffff810ca280>] ? ftrace_raw_output_rcu_utilization+0x50/0x50 [17007961.190006] [<ffffffff810cc45a>] synchronize_sched+0x3a/0x50 [17007961.190047] [<ffffffffa01a8936>] vmci_event_unsubscribe+0x76/0xb0 [vmw_vmci] [17007961.190063] [<ffffffffa01895f1>] vmci_transport_destruct+0x21/0xe0 [vmw_vsock_vmci_transport] [17007961.190078] [<ffffffffa017f837>] vsock_sk_destruct+0x17/0x60 [vsock] [17007961.190087] [<ffffffff8161a9df>] __sk_free+0x1f/0x180 [17007961.190092] [<ffffffff8161ab59>] sk_free+0x19/0x20 [17007961.190102] [<ffffffffa018a2c0>] vmci_transport_recv_stream_cb+0x200/0x2f0 [vmw_vsock_vmci_transport] [17007961.190114] [<ffffffffa01a7efc>] vmci_datagram_invoke_guest_handler+0xbc/0xf0 [vmw_vmci] [17007961.190126] [<ffffffffa01a8dbf>] vmci_dispatch_dgs+0xcf/0x230 [vmw_vmci] [17007961.190138] [<ffffffff8106f8ee>] tasklet_action+0x11e/0x130 [17007961.190145] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310 [17007961.190153] [<ffffffff81070315>] irq_exit+0x105/0x110 [17007961.190161] [<ffffffff817407e6>] do_IRQ+0x56/0xc0 [17007961.190170] [<ffffffff81735e6d>] common_interrupt+0x6d/0x6d [17007961.190173] <EOI> [<ffffffff81051586>] ? 
native_safe_halt+0x6/0x10 [17007961.190190] [<ffffffff8101db7f>] default_idle+0x1f/0x100 [17007961.190197] [<ffffffff8101e496>] arch_cpu_idle+0x26/0x30 [17007961.190205] [<ffffffff810c2b91>] cpu_startup_entry+0xc1/0x2b0 [17007961.190214] [<ffffffff810427fd>] start_secondary+0x21d/0x2d0 [17007961.190221] bad: scheduling from the idle thread! [Test Case] - Run a Trusty kernel (3.13 kernel series) VMware guests hosted on VMware 6.0 and later (with VMCI (Virtual Machine Communication Interface) turn on) - Wait for the crash to happen. It is happening randomly with no clear reproducible steps on how to trigger the crash for now. [Regression Potential] A test kernel is now being tested on VMware 5.5 (where the situation doesn't happen but we are testing anyway in order to look for potential regression in earlier VMware environment version) and Vmware 6.5 (where the situation happens). We basically want to make sure the test package will work good on both VMware env before thinking of proposing the patch series to the Ubuntu kernel ML. [Other infos] I have identified a patch series[2] which seems to fix the exact same situation. A full discussion can be found on patchworks[3], suggesting a certain patch series[2] authored by Vmware. [1] - VM details : Release: Trusty Kernel: Ubuntu-3.13.0-135 [2] Upstream patch series 8ab18d7 VSOCK: Detach QP check should filter out non matching QPs. 8566b86 VSOCK: Fix lockdep issue. 4ef7ea9 VSOCK: sock_put wasn't safe to call in interrupt context ---------------- commit 4ef7ea9 Author: Jorgen Hansen <jhansen@vmware.com> Date: Wed Oct 21 04:53:56 2015 -0700     VSOCK: sock_put wasn't safe to call in interrupt context     ...     Multiple customers have been hitting this issue when using     VMware tools on vSphere 2015.     ... ---------------- VSphere 2015 == VMware 6.0 (release in 2015) and late. [3] - https://patchwork.kernel.org/patch/9948741/ [Impact] It has been brought to my attention that VMware Guest[1] randomly crashes after moving the VMs from a VMware "5.5" env to VMware 6.5 env. Notes: * The crashes wasn't present in VMware 5.5 (with the same VMs). Only started to happens with Vmware 6.5 * The Trusty HWE kernel (Ubuntu-4.4.0-X) doesn't exhibit the situation on VMware 6.5. Here's the stack trace took from the .vmss converted to be readable by Linux debugger: [17007961.187411] BUG: scheduling while atomic: swapper/3/0/0x00000100 [17007961.189794] Modules linked in: arc4 md4 nls_utf8 cifs nfsv3 nfs_acl nfsv4 nfs lockd sunrpc fscache veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs vmw_vsock_vmci_transport vsock ppdev vmwgfx serio_raw coretemp ttm drm vmw_balloon vmw_vmci shpchp i2c_piix4 parport_pc mac_hid xfs lp libcrc32c parport psmouse floppy vmw_pvscsi vmxnet3 pata_acpi [17007961.189856] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-135-generic #184-Ubuntu [17007961.189862] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 10/22/2013 [17007961.189867] 0000000000000000 ffff88042f263b90 ffffffff8172d959 ffff88042f263d30 [17007961.189874] ffff88042f273180 ffff88042f263ba0 ffffffff81726d8c ffff88042f263c00 [17007961.189879] ffffffff81731c8f ffff880428c29800 0000000000013180 ffff880428c25fd8 [17007961.189885] Call Trace: [17007961.189889] <IRQ> [<ffffffff8172d959>] dump_stack+0x64/0x82 [17007961.189913] [<ffffffff81726d8c>] __schedule_bug+0x4c/0x5a [17007961.189922] [<ffffffff81731c8f>] __schedule+0x6af/0x7f0 [17007961.189929] [<ffffffff81731df9>] schedule+0x29/0x70 [17007961.189935] [<ffffffff81731049>] schedule_timeout+0x279/0x310 [17007961.189947] [<ffffffff810a357b>] ? select_task_rq_fair+0x56b/0x6f0 [17007961.189955] [<ffffffff810a9852>] ? enqueue_task_fair+0x422/0x6d0 [17007961.189962] [<ffffffff810a0de5>] ? sched_clock_cpu+0xb5/0x100 [17007961.189971] [<ffffffff81732906>] wait_for_completion+0xa6/0x150 [17007961.189977] [<ffffffff8109e2a0>] ? wake_up_state+0x20/0x20 [17007961.189987] [<ffffffff810ccce0>] ? __call_rcu+0x2d0/0x2d0 [17007961.189993] [<ffffffff810ca2eb>] wait_rcu_gp+0x4b/0x60 [17007961.189999] [<ffffffff810ca280>] ? ftrace_raw_output_rcu_utilization+0x50/0x50 [17007961.190006] [<ffffffff810cc45a>] synchronize_sched+0x3a/0x50 [17007961.190047] [<ffffffffa01a8936>] vmci_event_unsubscribe+0x76/0xb0 [vmw_vmci] [17007961.190063] [<ffffffffa01895f1>] vmci_transport_destruct+0x21/0xe0 [vmw_vsock_vmci_transport] [17007961.190078] [<ffffffffa017f837>] vsock_sk_destruct+0x17/0x60 [vsock] [17007961.190087] [<ffffffff8161a9df>] __sk_free+0x1f/0x180 [17007961.190092] [<ffffffff8161ab59>] sk_free+0x19/0x20 [17007961.190102] [<ffffffffa018a2c0>] vmci_transport_recv_stream_cb+0x200/0x2f0 [vmw_vsock_vmci_transport] [17007961.190114] [<ffffffffa01a7efc>] vmci_datagram_invoke_guest_handler+0xbc/0xf0 [vmw_vmci] [17007961.190126] [<ffffffffa01a8dbf>] vmci_dispatch_dgs+0xcf/0x230 [vmw_vmci] [17007961.190138] [<ffffffff8106f8ee>] tasklet_action+0x11e/0x130 [17007961.190145] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310 [17007961.190153] [<ffffffff81070315>] irq_exit+0x105/0x110 [17007961.190161] [<ffffffff817407e6>] do_IRQ+0x56/0xc0 [17007961.190170] [<ffffffff81735e6d>] common_interrupt+0x6d/0x6d [17007961.190173] <EOI> [<ffffffff81051586>] ? native_safe_halt+0x6/0x10 [17007961.190190] [<ffffffff8101db7f>] default_idle+0x1f/0x100 [17007961.190197] [<ffffffff8101e496>] arch_cpu_idle+0x26/0x30 [17007961.190205] [<ffffffff810c2b91>] cpu_startup_entry+0xc1/0x2b0 [17007961.190214] [<ffffffff810427fd>] start_secondary+0x21d/0x2d0 [17007961.190221] bad: scheduling from the idle thread! [Test Case] - Run a Trusty kernel (3.13 kernel series) VMware guests hosted on VMware 6.0 and later (with VMCI (Virtual Machine Communication Interface) turn on) - Wait for the crash to happen. It is happening randomly with no clear reproducible steps on how to trigger the crash for now. [Regression Potential] A test kernel has been tested on VMware 5.5 (where the situation doesn't happen but we are testing anyway in order to look for potential regression in earlier VMware environment version) and Vmware 6.5 (where the situation happens). I basically wanted to make sure the test package will work good on both VMware env before thinking of proposing the patch series to the Ubuntu kernel ML. And it does so far, see Comment #5 from an affected user. [Other infos] I have identified a patch series[2] which seems to fix the exact same situation. 
A full discussion can be found on Patchwork[3], suggesting a patch series[2] authored by VMware. [1] - VM details: Release: Trusty Kernel: Ubuntu-3.13.0-135 [2] Upstream patch series 8ab18d7 VSOCK: Detach QP check should filter out non matching QPs. 8566b86 VSOCK: Fix lockdep issue. 4ef7ea9 VSOCK: sock_put wasn't safe to call in interrupt context ---------------- commit 4ef7ea9 Author: Jorgen Hansen <jhansen@vmware.com> Date: Wed Oct 21 04:53:56 2015 -0700 VSOCK: sock_put wasn't safe to call in interrupt context ... Multiple customers have been hitting this issue when using VMware tools on vSphere 2015. ... ---------------- vSphere 2015 == VMware 6.0 (released in 2015) and later. [3] - https://patchwork.kernel.org/patch/9948741/
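The failure mode recorded in the trace above is worth spelling out: vmci_dispatch_dgs() runs as a tasklet, i.e. in softirq (atomic) context, yet the sk_free() it triggers reaches vmci_event_unsubscribe(), which calls synchronize_sched(), a primitive that blocks until an RCU grace period elapses. Sleeping in atomic context is exactly what produces "BUG: scheduling while atomic". The following minimal, hypothetical demo module (3.13-era kernel API assumed; not taken from the bug report) reproduces the same class of bug by calling the same sleeping primitive from a tasklet:
----------------
/*
 * Hypothetical demo module (not from the bug report): calling a
 * sleeping primitive from a tasklet triggers the same
 * "BUG: scheduling while atomic" seen in the trace above.
 * 3.13-era kernel APIs are assumed.
 */
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/rcupdate.h>

static void bad_tasklet_fn(unsigned long data)
{
	/*
	 * Tasklets run in softirq (atomic) context. synchronize_sched()
	 * blocks until an RCU grace period elapses, so scheduling away
	 * from here makes the kernel print "scheduling while atomic",
	 * just as vmci_event_unsubscribe() does in the trace.
	 */
	synchronize_sched();
}

static DECLARE_TASKLET(bad_tasklet, bad_tasklet_fn, 0);

static int __init demo_init(void)
{
	tasklet_schedule(&bad_tasklet); /* fires on the next softirq pass */
	return 0;
}

static void __exit demo_exit(void)
{
	tasklet_kill(&bad_tasklet);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
----------------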
2018-08-17 14:33:31 Eric Desrochers description [Impact] It has been brought to my attention that VMware guests[1] randomly crash after moving the VMs from a VMware 5.5 environment to a VMware 6.5 environment. Notes: * The crashes weren't present in VMware 5.5 (with the same VMs); they only started to happen with VMware 6.5. * The Trusty HWE kernel (Ubuntu-4.4.0-X) doesn't exhibit the situation on VMware 6.5. Here's the stack trace, taken from the .vmss file converted to be readable by a Linux debugger: [17007961.187411] BUG: scheduling while atomic: swapper/3/0/0x00000100 [17007961.189794] Modules linked in: arc4 md4 nls_utf8 cifs nfsv3 nfs_acl nfsv4 nfs lockd sunrpc fscache veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs vmw_vsock_vmci_transport vsock ppdev vmwgfx serio_raw coretemp ttm drm vmw_balloon vmw_vmci shpchp i2c_piix4 parport_pc mac_hid xfs lp libcrc32c parport psmouse floppy vmw_pvscsi vmxnet3 pata_acpi [17007961.189856] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-135-generic #184-Ubuntu [17007961.189862] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 10/22/2013 [17007961.189867] 0000000000000000 ffff88042f263b90 ffffffff8172d959 ffff88042f263d30 [17007961.189874] ffff88042f273180 ffff88042f263ba0 ffffffff81726d8c ffff88042f263c00 [17007961.189879] ffffffff81731c8f ffff880428c29800 0000000000013180 ffff880428c25fd8 [17007961.189885] Call Trace: [17007961.189889] <IRQ> [<ffffffff8172d959>] dump_stack+0x64/0x82 [17007961.189913] [<ffffffff81726d8c>] __schedule_bug+0x4c/0x5a [17007961.189922] [<ffffffff81731c8f>] __schedule+0x6af/0x7f0 [17007961.189929] [<ffffffff81731df9>] schedule+0x29/0x70 [17007961.189935] [<ffffffff81731049>] schedule_timeout+0x279/0x310 [17007961.189947] [<ffffffff810a357b>] ? select_task_rq_fair+0x56b/0x6f0 [17007961.189955] [<ffffffff810a9852>] ? enqueue_task_fair+0x422/0x6d0 [17007961.189962] [<ffffffff810a0de5>] ? sched_clock_cpu+0xb5/0x100 [17007961.189971] [<ffffffff81732906>] wait_for_completion+0xa6/0x150 [17007961.189977] [<ffffffff8109e2a0>] ? wake_up_state+0x20/0x20 [17007961.189987] [<ffffffff810ccce0>] ? __call_rcu+0x2d0/0x2d0 [17007961.189993] [<ffffffff810ca2eb>] wait_rcu_gp+0x4b/0x60 [17007961.189999] [<ffffffff810ca280>] ? ftrace_raw_output_rcu_utilization+0x50/0x50 [17007961.190006] [<ffffffff810cc45a>] synchronize_sched+0x3a/0x50 [17007961.190047] [<ffffffffa01a8936>] vmci_event_unsubscribe+0x76/0xb0 [vmw_vmci] [17007961.190063] [<ffffffffa01895f1>] vmci_transport_destruct+0x21/0xe0 [vmw_vsock_vmci_transport] [17007961.190078] [<ffffffffa017f837>] vsock_sk_destruct+0x17/0x60 [vsock] [17007961.190087] [<ffffffff8161a9df>] __sk_free+0x1f/0x180 [17007961.190092] [<ffffffff8161ab59>] sk_free+0x19/0x20 [17007961.190102] [<ffffffffa018a2c0>] vmci_transport_recv_stream_cb+0x200/0x2f0 [vmw_vsock_vmci_transport] [17007961.190114] [<ffffffffa01a7efc>] vmci_datagram_invoke_guest_handler+0xbc/0xf0 [vmw_vmci] [17007961.190126] [<ffffffffa01a8dbf>] vmci_dispatch_dgs+0xcf/0x230 [vmw_vmci] [17007961.190138] [<ffffffff8106f8ee>] tasklet_action+0x11e/0x130 [17007961.190145] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310 [17007961.190153] [<ffffffff81070315>] irq_exit+0x105/0x110 [17007961.190161] [<ffffffff817407e6>] do_IRQ+0x56/0xc0 [17007961.190170] [<ffffffff81735e6d>] common_interrupt+0x6d/0x6d [17007961.190173] <EOI> [<ffffffff81051586>] ?
native_safe_halt+0x6/0x10 [17007961.190190] [<ffffffff8101db7f>] default_idle+0x1f/0x100 [17007961.190197] [<ffffffff8101e496>] arch_cpu_idle+0x26/0x30 [17007961.190205] [<ffffffff810c2b91>] cpu_startup_entry+0xc1/0x2b0 [17007961.190214] [<ffffffff810427fd>] start_secondary+0x21d/0x2d0 [17007961.190221] bad: scheduling from the idle thread! [Test Case] - Run VMware guests with a Trusty kernel (3.13 kernel series) hosted on VMware 6.0 or later (with VMCI (Virtual Machine Communication Interface) turned on). - Wait for the crash to happen. It occurs randomly; for now there are no clear steps to reproduce it. [Regression Potential] A test kernel has been tested on VMware 5.5 (where the situation doesn't happen, but we tested anyway to look for potential regressions on earlier VMware environment versions) and VMware 6.5 (where the situation happens). I basically wanted to make sure the test package works well on both VMware environments before proposing the patch series to the Ubuntu kernel ML. So far it does; see Comment #5 from an affected user. [Other Info] I have identified a patch series[2] which seems to fix exactly this situation. A full discussion can be found on Patchwork[3], suggesting a patch series[2] authored by VMware. [1] - VM details: Release: Trusty Kernel: Ubuntu-3.13.0-135 [2] Upstream patch series 8ab18d7 VSOCK: Detach QP check should filter out non matching QPs. 8566b86 VSOCK: Fix lockdep issue. 4ef7ea9 VSOCK: sock_put wasn't safe to call in interrupt context ---------------- commit 4ef7ea9 Author: Jorgen Hansen <jhansen@vmware.com> Date: Wed Oct 21 04:53:56 2015 -0700 VSOCK: sock_put wasn't safe to call in interrupt context ... Multiple customers have been hitting this issue when using VMware tools on vSphere 2015. ... ---------------- vSphere 2015 == VMware 6.0 (released in 2015) and later. [3] - https://patchwork.kernel.org/patch/9948741/ [Impact] It has been brought to my attention that VMware guests[1] randomly crash after moving the VMs from a VMware 5.5 environment to a VMware 6.5 environment. Notes: * The crashes weren't present in VMware 5.5 (with the same VMs); they only started to happen with VMware 6.5. * The Trusty HWE kernel (Ubuntu-4.4.0-X) doesn't exhibit the situation on VMware 6.5. Here's the stack trace, taken from the .vmss file converted to be readable by a Linux debugger: [17007961.187411] BUG: scheduling while atomic: swapper/3/0/0x00000100 [17007961.189794] Modules linked in: arc4 md4 nls_utf8 cifs nfsv3 nfs_acl nfsv4 nfs lockd sunrpc fscache veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack bridge stp llc aufs vmw_vsock_vmci_transport vsock ppdev vmwgfx serio_raw coretemp ttm drm vmw_balloon vmw_vmci shpchp i2c_piix4 parport_pc mac_hid xfs lp libcrc32c parport psmouse floppy vmw_pvscsi vmxnet3 pata_acpi [17007961.189856] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-135-generic #184-Ubuntu [17007961.189862] Hardware name: VMware, Inc.
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 10/22/2013 [17007961.189867] 0000000000000000 ffff88042f263b90 ffffffff8172d959 ffff88042f263d30 [17007961.189874] ffff88042f273180 ffff88042f263ba0 ffffffff81726d8c ffff88042f263c00 [17007961.189879] ffffffff81731c8f ffff880428c29800 0000000000013180 ffff880428c25fd8 [17007961.189885] Call Trace: [17007961.189889] <IRQ> [<ffffffff8172d959>] dump_stack+0x64/0x82 [17007961.189913] [<ffffffff81726d8c>] __schedule_bug+0x4c/0x5a [17007961.189922] [<ffffffff81731c8f>] __schedule+0x6af/0x7f0 [17007961.189929] [<ffffffff81731df9>] schedule+0x29/0x70 [17007961.189935] [<ffffffff81731049>] schedule_timeout+0x279/0x310 [17007961.189947] [<ffffffff810a357b>] ? select_task_rq_fair+0x56b/0x6f0 [17007961.189955] [<ffffffff810a9852>] ? enqueue_task_fair+0x422/0x6d0 [17007961.189962] [<ffffffff810a0de5>] ? sched_clock_cpu+0xb5/0x100 [17007961.189971] [<ffffffff81732906>] wait_for_completion+0xa6/0x150 [17007961.189977] [<ffffffff8109e2a0>] ? wake_up_state+0x20/0x20 [17007961.189987] [<ffffffff810ccce0>] ? __call_rcu+0x2d0/0x2d0 [17007961.189993] [<ffffffff810ca2eb>] wait_rcu_gp+0x4b/0x60 [17007961.189999] [<ffffffff810ca280>] ? ftrace_raw_output_rcu_utilization+0x50/0x50 [17007961.190006] [<ffffffff810cc45a>] synchronize_sched+0x3a/0x50 [17007961.190047] [<ffffffffa01a8936>] vmci_event_unsubscribe+0x76/0xb0 [vmw_vmci] [17007961.190063] [<ffffffffa01895f1>] vmci_transport_destruct+0x21/0xe0 [vmw_vsock_vmci_transport] [17007961.190078] [<ffffffffa017f837>] vsock_sk_destruct+0x17/0x60 [vsock] [17007961.190087] [<ffffffff8161a9df>] __sk_free+0x1f/0x180 [17007961.190092] [<ffffffff8161ab59>] sk_free+0x19/0x20 [17007961.190102] [<ffffffffa018a2c0>] vmci_transport_recv_stream_cb+0x200/0x2f0 [vmw_vsock_vmci_transport] [17007961.190114] [<ffffffffa01a7efc>] vmci_datagram_invoke_guest_handler+0xbc/0xf0 [vmw_vmci] [17007961.190126] [<ffffffffa01a8dbf>] vmci_dispatch_dgs+0xcf/0x230 [vmw_vmci] [17007961.190138] [<ffffffff8106f8ee>] tasklet_action+0x11e/0x130 [17007961.190145] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310 [17007961.190153] [<ffffffff81070315>] irq_exit+0x105/0x110 [17007961.190161] [<ffffffff817407e6>] do_IRQ+0x56/0xc0 [17007961.190170] [<ffffffff81735e6d>] common_interrupt+0x6d/0x6d [17007961.190173] <EOI> [<ffffffff81051586>] ? native_safe_halt+0x6/0x10 [17007961.190190] [<ffffffff8101db7f>] default_idle+0x1f/0x100 [17007961.190197] [<ffffffff8101e496>] arch_cpu_idle+0x26/0x30 [17007961.190205] [<ffffffff810c2b91>] cpu_startup_entry+0xc1/0x2b0 [17007961.190214] [<ffffffff810427fd>] start_secondary+0x21d/0x2d0 [17007961.190221] bad: scheduling from the idle thread! [Test Case] - Run VMware guests with a Trusty kernel (3.13 kernel series) hosted on VMware 6.0 or later (with VMCI (Virtual Machine Communication Interface) turned on). - Wait for the crash to happen. It occurs randomly; for now there are no clear steps to reproduce it. [Regression Potential] * A test kernel has been tested on VMware 5.5 (where the situation doesn't happen, but we tested anyway to look for potential regressions on earlier VMware environment versions) and VMware 6.5 (where the situation happens). I basically wanted to make sure the test package works well on both VMware environments before proposing the patch series to the Ubuntu kernel ML. So far it does; see Comment #5 from an affected user. * Limited risk, as the changes are confined to one very specific driver.
[Other Info] I have identified a patch series[2] which seems to fix exactly this situation. A full discussion can be found on Patchwork[3], suggesting a patch series[2] authored by VMware. [1] - VM details: Release: Trusty Kernel: Ubuntu-3.13.0-135 [2] Upstream patch series 8ab18d7 VSOCK: Detach QP check should filter out non matching QPs. 8566b86 VSOCK: Fix lockdep issue. 4ef7ea9 VSOCK: sock_put wasn't safe to call in interrupt context ---------------- commit 4ef7ea9 Author: Jorgen Hansen <jhansen@vmware.com> Date: Wed Oct 21 04:53:56 2015 -0700 VSOCK: sock_put wasn't safe to call in interrupt context ... Multiple customers have been hitting this issue when using VMware tools on vSphere 2015. ... ---------------- vSphere 2015 == VMware 6.0 (released in 2015) and later. [3] - https://patchwork.kernel.org/patch/9948741/
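Conceptually, the upstream series in [2] resolves the crash by making sure the sleeping cleanup never runs in interrupt context: the final socket release is deferred from the VMCI tasklet to process context. The sketch below illustrates that general deferral technique with hypothetical names; it is not the actual patch, only the pattern it applies:
----------------
/*
 * Sketch of the deferral technique (hypothetical names, not the
 * actual upstream patch): instead of dropping the last socket
 * reference directly from tasklet context, hand it to a workqueue
 * so the destructor (and its synchronize_sched()) runs in process
 * context, where sleeping is allowed.
 */
#include <linux/slab.h>
#include <linux/workqueue.h>
#include <net/sock.h>

struct deferred_put {
	struct work_struct work;
	struct sock *sk;
};

static void deferred_put_fn(struct work_struct *work)
{
	struct deferred_put *dp = container_of(work, struct deferred_put, work);

	/* Process context: the destructor chain may now sleep safely. */
	sock_put(dp->sk);
	kfree(dp);
}

/* Call this from tasklet/interrupt context instead of sock_put(sk). */
static void sock_put_deferred(struct sock *sk)
{
	struct deferred_put *dp = kzalloc(sizeof(*dp), GFP_ATOMIC);

	if (!dp)
		return; /* error handling elided in this sketch */

	dp->sk = sk;
	INIT_WORK(&dp->work, deferred_put_fn);
	schedule_work(&dp->work);
}
----------------
With this kind of deferral in place, a callback such as vmci_transport_recv_stream_cb() no longer frees the socket synchronously from softirq context, which is consistent with the symptom disappearing in the test kernel.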
2018-08-24 15:13:04 Kleber Sacilotto de Souza linux (Ubuntu Trusty): status In Progress Fix Committed
2018-08-27 11:05:02 Brad Figg tags sts trusty sts trusty verification-needed-trusty
2018-08-29 14:48:42 Eric Desrochers tags sts trusty verification-needed-trusty sts trusty verification-done-trusty
2018-09-10 17:53:42 Launchpad Janitor linux (Ubuntu Trusty): status Fix Committed Fix Released
2018-09-10 17:53:42 Launchpad Janitor cve linked 2018-3620
2018-09-10 17:53:42 Launchpad Janitor cve linked 2018-3646