We just upgraded the OpenStack compute hosts in our public cloud (qemu-kvm via libvirt) from Precise to Trusty (14.04.1). They now run kernel 3.13.0-36-generic with qemu-kvm 2.0.0+dfsg-2ubuntu1.5.
Since the upgrade, whenever we start an SMP (multi-core) Trusty guest, existing or new, the guest panics [1] just towards the end of boot. This happens consistently for SMP guests running the Trusty kernel, so it also affects earlier Ubuntu releases using the HWE kernel from Trusty, but not those releases on their native kernels. I didn't have images of any other distro with a 3.13.x kernel to hand, but none of the other guests I tested (kernels in the 3.2 to 3.16 range) were affected.
Similar reports are scarce, but the one we did find pointed to a CPU feature as the trigger. We were running these hosts with the libvirt cpu mode set to "host-passthrough" (so qemu starts with "-cpu host") on AMD Opteron 6200 and 6300 hardware. Switching the guest domains to cpu mode "host-model" works around the issue and is perfectly acceptable for most of our users.
We also have various Intel compute hosts, and they don't seem to be affected.
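For anyone wanting to try the same workaround on a single guest, here is a rough sketch of the change in terms of the libvirt domain XML. It only flips the mode attribute on the <cpu> element. The helper name set_cpu_mode and the sample domain (including its name) are made up for illustration, and the sketch assumes a plain XML dump such as the output of "virsh dumpxml".

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition; a real dump is much longer.
SAMPLE_DOMAIN = """<domain type='kvm'>
  <name>instance-00000001</name>
  <vcpu>2</vcpu>
  <cpu mode='host-passthrough'/>
</domain>"""


def set_cpu_mode(domain_xml, mode="host-model"):
    """Return a copy of the domain XML with the <cpu> mode attribute set to `mode`."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        # No <cpu> element yet: add one directly under <domain>.
        cpu = ET.SubElement(root, "cpu")
    # Only the mode attribute is changed; child elements such as <topology>
    # are left untouched.
    cpu.set("mode", mode)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_cpu_mode(SAMPLE_DOMAIN))
```

On Nova-managed hosts the domain XML is regenerated by Nova, so in practice the setting probably belongs in nova.conf (as far as we know, the cpu_mode option in the [libvirt] section on Icehouse) rather than in hand-edited XML.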
[1]
[ 11.256924] divide error: 0000 [#1] SMP
[ 11.258133] Modules linked in: kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw lp parport psmouse floppy
[ 11.260228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-36-generic #63-Ubuntu
[ 11.260228] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[ 11.260228] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 11.260228] RIP: 0010:[<ffffffff8104ed58>] [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[ 11.260228] RSP: 0018:ffff88023fc03c98 EFLAGS: 00010046
[ 11.260228] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000001
[ 11.260228] RDX: ffffffff81eaf408 RSI: 0000000000000000 RDI: 0000000000000000
[ 11.260228] RBP: ffff88023fc03cb8 R08: ffffffff81eaf400 R09: 00000000ffffffff
[ 11.260228] R10: ffff880037612cc0 R11: ffffea0002eb0a00 R12: ffff8800374a33c0
[ 11.260228] R13: 0000000000000020 R14: 0000000000000001 R15: 0000000000000286
[ 11.260228] FS: 00007f1e8b538740(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
[ 11.260228] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 11.260228] CR2: 00007f1e8ae09d50 CR3: 0000000001c0e000 CR4: 00000000000406f0
[ 11.260228] Stack:
[ 11.260228] 0000000000000286 0000000000000001 0000000000000001 00000000000000c3
[ 11.260228] ffff88023fc03cc8 ffffffff81717ed6 ffff88023fc03ce0 ffffffff8172641a
[ 11.260228] ffff8800374a33c0 ffff88023fc03d18 ffffffff810aaeb0 ffff88023295e000
[ 11.260228] Call Trace:
[ 11.260228] <IRQ>
[ 11.260228] [<ffffffff81717ed6>] __ticket_unlock_slowpath+0x24/0x34
[ 11.260228] [<ffffffff8172641a>] _raw_spin_unlock_irqrestore+0x3a/0x40
[ 11.260228] [<ffffffff810aaeb0>] __wake_up_sync_key+0x50/0x60
[ 11.260228] [<ffffffff8160ca5a>] sock_def_readable+0x3a/0x70
[ 11.260228] [<ffffffff816fda0a>] packet_rcv+0x2fa/0x430
[ 11.260228] [<ffffffff816228b0>] __netif_receive_skb_core+0x360/0x840
[ 11.260228] [<ffffffff81622da8>] __netif_receive_skb+0x18/0x60
[ 11.260228] [<ffffffff81622e13>] netif_receive_skb+0x23/0x90
[ 11.260228] [<ffffffff815288d4>] virtnet_poll+0x4d4/0x850
[ 11.260228] [<ffffffff81623192>] net_rx_action+0x152/0x250
[ 11.260228] [<ffffffff8106cbac>] __do_softirq+0xec/0x2c0
[ 11.260228] [<ffffffff8106d0f5>] irq_exit+0x105/0x110
[ 11.260228] [<ffffffff817312d6>] do_IRQ+0x56/0xc0
[ 11.260228] [<ffffffff81726a6d>] common_interrupt+0x6d/0x6d
[ 11.260228] <EOI>
[ 11.260228] [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10
[ 11.260228] [<ffffffff8101c62f>] default_idle+0x1f/0xc0
[ 11.260228] [<ffffffff8101cef6>] arch_cpu_idle+0x26/0x30
[ 11.260228] [<ffffffff810bed95>] cpu_startup_entry+0xc5/0x290
[ 11.260228] [<ffffffff8170ca77>] rest_init+0x77/0x80
[ 11.260228] [<ffffffff81d35f6b>] start_kernel+0x433/0x43e
[ 11.260228] [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[ 11.260228] [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[ 11.260228] [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[ 11.260228] [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[ 11.260228] Code: 66 44 39 e8 75 bd 0f b6 35 f6 06 e6 00 40 84 f6 75 2a 83 05 06 07 e6 00 01 48 c7 c0 6a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 <0f> 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9
[ 11.260228] RIP [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[ 11.260228] RSP <ffff88023fc03c98>
[ 11.260228] ---[ end trace f1c26ff24745b331 ]---
[ 11.260228] Kernel panic - not syncing: Fatal exception in interrupt
[ 11.260228] Shutting down cpus with NMI
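
In case it helps with triage, the bytes at the faulting RIP can be pulled out of the "Code:" line of the trace mechanically. The sketch below assumes the usual oops convention that the byte in angle brackets is the first byte of the faulting instruction; the helper name bytes_at_rip is made up, and three bytes is just a guess at the instruction length.

```python
# The "Code:" line from the trace, without the timestamp prefix.
CODE_LINE = (
    "Code: 66 44 39 e8 75 bd 0f b6 35 f6 06 e6 00 40 84 f6 75 2a 83 05 "
    "06 07 e6 00 01 48 c7 c0 6a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 "
    "<0f> 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9"
)


def bytes_at_rip(code_line, count=3):
    """Return `count` hex bytes starting at the <..>-marked byte of a Code: line."""
    tokens = code_line.split(":", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            # Strip the marker from the first byte, then take the bytes after it.
            return [tok.strip("<>")] + tokens[i + 1:i + count]
    raise ValueError("no <..> marker found in Code: line")


if __name__ == "__main__":
    print(bytes_at_rip(CODE_LINE))  # ['0f', '01', 'c1']
```

As far as we can tell, 0f 01 c1 is the encoding of VMCALL, which would fit a hypercall issued from kvm_unlock_kick, but that reading is our assumption and not something we have confirmed.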