qemu-kvm guest panic for AMD smp trusty guests

Bug #1379340 reported by Blair Bethwaite
This bug affects 4 people
Affects          Status         Importance   Assigned to      Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
  Trusty         Fix Released   Medium       Chris J Arges
  Utopic         Fix Released   Medium       Chris J Arges

Bug Description

[Impact]
When using KVM on an AMD host with a kernel that has CONFIG_DEBUG_RODATA enabled, a guest that has multiple vCPUs and is exposed to host CPU features such as tsc_adjust can hit a divide error in kvm_unlock_kick while booting.

This impacts kernels 3.12+.

[Test Case]
1) Create a VM on an AMD host with appropriate features (Opteron 6xxx for example)
2) Edit virsh xml to have <cpu mode='host-passthrough'></cpu> and multiple vCPUs.
3) Boot VM with VGA console using virt-manager (I couldn't reproduce strictly monitoring via virsh console).
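
The two configuration conditions from the test case can be checked mechanically. The following is a minimal sketch, assuming the domain XML has been dumped to a string (for example with virsh dumpxml); the function name and the sample domain are hypothetical and only the <cpu mode='host-passthrough'> element and the vCPU count come from this report.

```python
import xml.etree.ElementTree as ET


def matches_test_case(domain_xml: str) -> bool:
    """Return True if the domain uses host-passthrough and more than one vCPU."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    vcpu = root.find("vcpu")
    # Compare against None explicitly: an Element without children is falsy.
    passthrough = cpu is not None and cpu.get("mode") == "host-passthrough"
    multi_vcpu = vcpu is not None and int((vcpu.text or "0").strip()) > 1
    return passthrough and multi_vcpu


# Hypothetical domain definition used only for illustration.
SAMPLE = """<domain type='kvm'>
  <name>repro-guest</name>
  <vcpu>4</vcpu>
  <cpu mode='host-passthrough'></cpu>
</domain>"""

print(matches_test_case(SAMPLE))
```

This only inspects configuration; it does not tell you whether the host is AMD or whether the guest kernel carries the fix.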

[Fix]
commit c1118b3602c2329671ad5ec8bdf8e374323d6343 upstream

--

Just upgraded OpenStack compute hosts in our public cloud (using qemu-kvm via libvirt) from Precise to Trusty (14.04.1), now on kernel 3.13.0-36-generic with qemu-kvm 2.0.0+dfsg-2ubuntu1.5.

Following the upgrade, whenever we try to start an smp/multicore Trusty guest (existing or new), we run into this panic [1] inside the guest just towards the end of boot. This happens consistently for smp guests using the Trusty kernel (i.e., it also affects earlier Ubuntus using the HWE kernel from Trusty but not their native versions). I didn't have any other distro images to hand with 3.13.x kernels, but none of the others I tested were affected (in the 3.2 - 3.16 kernel range).

Similar reports are scarce, but the one we did find pointed to a CPU feature as the trigger. We were running these hosts with libvirt cpu mode set to "host-passthrough" (so qemu starts with "-cpu host"), on AMD 6200 & 6300 Opteron hardware. Switching the guest domains to use cpu mode "host-model" instead works around the issue and is perfectly acceptable for most of our users.
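
For a hand-managed libvirt domain, the workaround amounts to changing one attribute in the domain XML. Below is a minimal sketch of that rewrite, assuming the XML is available as a string; the function name and sample XML are hypothetical. For Nova-managed guests the XML is generated, so the mode would be set through the libvirt_cpu_mode option mentioned later in this report rather than by editing XML.

```python
import xml.etree.ElementTree as ET


def use_host_model(domain_xml: str) -> str:
    """Return the domain XML with cpu mode host-passthrough changed to host-model."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is not None and cpu.get("mode") == "host-passthrough":
        cpu.set("mode", "host-model")
    return ET.tostring(root, encoding="unicode")


# Hypothetical domain definition used only for illustration.
BEFORE = "<domain type='kvm'><vcpu>4</vcpu><cpu mode='host-passthrough'/></domain>"
AFTER = use_host_model(BEFORE)
print(AFTER)
```

Domains that already use another cpu mode are returned with that mode unchanged.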

We have various other Intel compute hosts and they don't seem to be affected.

[1]
[ 11.256924] divide error: 0000 [#1] SMP
[ 11.258133] Modules linked in: kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw lp parport psmouse floppy
[ 11.260228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-36-generic #63-Ubuntu
[ 11.260228] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[ 11.260228] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 11.260228] RIP: 0010:[<ffffffff8104ed58>] [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[ 11.260228] RSP: 0018:ffff88023fc03c98 EFLAGS: 00010046
[ 11.260228] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000001
[ 11.260228] RDX: ffffffff81eaf408 RSI: 0000000000000000 RDI: 0000000000000000
[ 11.260228] RBP: ffff88023fc03cb8 R08: ffffffff81eaf400 R09: 00000000ffffffff
[ 11.260228] R10: ffff880037612cc0 R11: ffffea0002eb0a00 R12: ffff8800374a33c0
[ 11.260228] R13: 0000000000000020 R14: 0000000000000001 R15: 0000000000000286
[ 11.260228] FS: 00007f1e8b538740(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
[ 11.260228] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 11.260228] CR2: 00007f1e8ae09d50 CR3: 0000000001c0e000 CR4: 00000000000406f0
[ 11.260228] Stack:
[ 11.260228] 0000000000000286 0000000000000001 0000000000000001 00000000000000c3
[ 11.260228] ffff88023fc03cc8 ffffffff81717ed6 ffff88023fc03ce0 ffffffff8172641a
[ 11.260228] ffff8800374a33c0 ffff88023fc03d18 ffffffff810aaeb0 ffff88023295e000
[ 11.260228] Call Trace:
[ 11.260228] <IRQ>
[ 11.260228] [<ffffffff81717ed6>] __ticket_unlock_slowpath+0x24/0x34
[ 11.260228] [<ffffffff8172641a>] _raw_spin_unlock_irqrestore+0x3a/0x40
[ 11.260228] [<ffffffff810aaeb0>] __wake_up_sync_key+0x50/0x60
[ 11.260228] [<ffffffff8160ca5a>] sock_def_readable+0x3a/0x70
[ 11.260228] [<ffffffff816fda0a>] packet_rcv+0x2fa/0x430
[ 11.260228] [<ffffffff816228b0>] __netif_receive_skb_core+0x360/0x840
[ 11.260228] [<ffffffff81622da8>] __netif_receive_skb+0x18/0x60
[ 11.260228] [<ffffffff81622e13>] netif_receive_skb+0x23/0x90
[ 11.260228] [<ffffffff815288d4>] virtnet_poll+0x4d4/0x850
[ 11.260228] [<ffffffff81623192>] net_rx_action+0x152/0x250
[ 11.260228] [<ffffffff8106cbac>] __do_softirq+0xec/0x2c0
[ 11.260228] [<ffffffff8106d0f5>] irq_exit+0x105/0x110
[ 11.260228] [<ffffffff817312d6>] do_IRQ+0x56/0xc0
[ 11.260228] [<ffffffff81726a6d>] common_interrupt+0x6d/0x6d
[ 11.260228] <EOI>
[ 11.260228] [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10
[ 11.260228] [<ffffffff8101c62f>] default_idle+0x1f/0xc0
[ 11.260228] [<ffffffff8101cef6>] arch_cpu_idle+0x26/0x30
[ 11.260228] [<ffffffff810bed95>] cpu_startup_entry+0xc5/0x290
[ 11.260228] [<ffffffff8170ca77>] rest_init+0x77/0x80
[ 11.260228] [<ffffffff81d35f6b>] start_kernel+0x433/0x43e
[ 11.260228] [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[ 11.260228] [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[ 11.260228] [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[ 11.260228] [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[ 11.260228] Code: 66 44 39 e8 75 bd 0f b6 35 f6 06 e6 00 40 84 f6 75 2a 83 05 06 07 e6 00 01 48 c7 c0 6a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 <0f> 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9
[ 11.260228] RIP [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[ 11.260228] RSP <ffff88023fc03c98>
[ 11.260228] ---[ end trace f1c26ff24745b331 ]---
[ 11.260228] Kernel panic - not syncing: Fatal exception in interrupt
[ 11.260228] Shutting down cpus with NMI

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu-kvm (Ubuntu):
status: New → Confirmed
Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1379340] [NEW] qemu-kvm guest panic for smp trusty guests

 affects: ubuntu/qemu
 importance: high
 affects: qemu
 importance: high

no longer affects: qemu-kvm (Ubuntu)
summary: - qemu-kvm guest panic for smp trusty guests
+ qemu-kvm guest panic for AMD smp trusty guests
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug.

Would it be possible for you to test with the daily upstream kernel builds, to rule out any temporary kernel regressions? Instructions for that are here: https://wiki.ubuntu.com/Kernel/MainlineBuilds

I'll also build an up-to-date daily qemu build so we can test whether it has been fixed there.

I don't currently have an AMD box available (but should in a few weeks) to try to reproduce this and then, hopefully, bisect.

Revision history for this message
new23d (dhruvahuja) wrote :

I have more-or-less the same underlying hardware and tested with [1]. Same results. Keen to help in any possible way.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11.11-trusty/linux-image-3.13.11-03131111-generic_3.13.11-03131111.201411111336_amd64.deb

Revision history for this message
new23d (dhruvahuja) wrote :

Update: after testing the 'daily' build as suggested, I found that [1] appears to work stably for me. I'll update this thread if I find it crashing in the near future.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/linux-image-3.18.0-999-generic_3.18.0-999.201411152105_amd64.deb

Paolo Bonzini (bonzini)
no longer affects: qemu
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Marked as affecting the kernel given the last few commits - thank you for the information.

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1379340

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Chris J Arges (arges)
Changed in qemu (Ubuntu):
assignee: nobody → Chris J Arges (arges)
status: New → In Progress
tags: added: fixed-upstream
Revision history for this message
Chris J Arges (arges) wrote :

Hi, could you provide the exact command used to launch qemu to produce this error? If you are using libvirt (via virsh, for example), you can find it in /var/log/libvirt/qemu/<name>.log.
Thanks,
--chris

Revision history for this message
Chris J Arges (arges) wrote :

I'm able to reproduce this, and it currently looks like it's fixed in 3.18-rc1. The main things that need to be enabled are 'cpu mode=host-passthrough' and > 1 CPU. With 'cpu mode=host-model' and > 1 CPU, things work fine.

The differences in cpu features were:
npt nrip_save tsc_adjust
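
A feature difference like this can be obtained by diffing the CPU flags seen by a guest in each mode. The sketch below assumes the two flag strings were copied from the "flags" line of /proc/cpuinfo inside a host-passthrough guest and a host-model guest; the sample strings are made up for illustration, apart from the three flag names listed above.

```python
def extra_flags(passthrough_flags: str, model_flags: str) -> list:
    """Return flags present in the host-passthrough guest but not in the host-model guest."""
    return sorted(set(passthrough_flags.split()) - set(model_flags.split()))


# Made-up, shortened flag lists; only the three differing names come from the report.
PASSTHROUGH = "fpu sse2 lm npt nrip_save tsc_adjust"
HOST_MODEL = "fpu sse2 lm"

print(extra_flags(PASSTHROUGH, HOST_MODEL))
```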

I've also tried this on Intel platforms and was unable to reproduce it; I was able to reproduce it on an AMD Opteron 6276 processor.

Working on finding the patch to backport.

Revision history for this message
Blair Bethwaite (blair-bethwaite) wrote :

Not keen on providing apport info as this is a production system with multiple tenants and we have no agreement with Canonical. The info here and verification from others indicates this is a simple bug to reproduce and not specific to our systems.

Revision history for this message
Blair Bethwaite (blair-bethwaite) wrote :

Here's an example of an affected qemu command. As already pointed out, the relevant parts are "-cpu host" and "-smp n" where n>1.

qemu-system-x86_64 -enable-kvm -name instance-000173d4 -S -machine pc-1.0,accel=kvm,usb=off -cpu host -m 16384 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid e3ec1854-0b89-4384-*** -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=1-2013.2.2-a101-g1a048c5-1,serial=****,uuid=e3ec1854-0b89-4384-*** -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-000173d4.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/e3ec1854-0b89-4384-9771-88c26764eef9/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/e3ec1854-0b89-4384-9771-88c26764eef9/disk.local,if=none,id=drive-virtio-disk1,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=rbd:nova/volume-439f230a-12fa-4538-ab26-01c3f6617ec7:id=nova:key=***==:auth_supported=cephx\;none:mon_host=118.138***,if=none,id=drive-virtio-disk2,format=raw,serial=***,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:***,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/e3ec1854-0b89-4384-9771-88c26764eef9/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
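
To pick out the two relevant options from a long command line such as the one above, something like the following may help. It is a rough sketch: the function name is hypothetical, the sample is a shortened version of the command above, and it only understands "-smp N,..." and "-smp cpus=N,..." (a topology-only -smp value is treated as a single CPU).

```python
import shlex


def is_affected_cmdline(cmdline: str) -> bool:
    """Return True if the qemu command line uses '-cpu host' and more than one vCPU."""
    args = shlex.split(cmdline)
    opts = {}
    for i, arg in enumerate(args[:-1]):
        if arg in ("-cpu", "-smp"):
            opts[arg] = args[i + 1]
    cpu_model = opts.get("-cpu", "").split(",")[0]
    smp_first = opts.get("-smp", "1").split(",")[0]
    if smp_first.startswith("cpus="):
        smp_first = smp_first[len("cpus="):]
    vcpus = int(smp_first) if smp_first.isdigit() else 1
    return cpu_model == "host" and vcpus > 1


# Shortened version of the command line quoted above.
SAMPLE_CMD = "qemu-system-x86_64 -enable-kvm -cpu host -m 16384 -smp 4,sockets=4,cores=1,threads=1"

print(is_affected_cmdline(SAMPLE_CMD))
```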

tags: added: amd
Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: nobody → Chris J Arges (arges)
status: Incomplete → In Progress
no longer affects: qemu (Ubuntu)
no longer affects: qemu (Ubuntu Trusty)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Chris J Arges (arges)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Chris J Arges (arges)
Revision history for this message
new23d (dhruvahuja) wrote :

I am happy to run apport-collect if anybody still needs it. FYI, my hypervisor is CentOS 7 KVM and the guest is Ubuntu Trusty. At which kernel version and with how many CPUs would you like that command run?

Revision history for this message
Chris J Arges (arges) wrote :

@dhruvahuja
Thanks, but I can reproduce this issue, so no need to collect any more information.
Testing once I get a fix will be appreciated. : )

Revision history for this message
Chris J Arges (arges) wrote :

Marking vivid task Fix Released as I believe this is fixed in 3.18 series kernels (and Vivid will be there soon...)

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Chris J Arges (arges) wrote :

I think this patch fixes the issue:

https://lkml.org/lkml/2014/9/22/211

Looking at the stacktrace:

[ 4.690909] divide error: 0000 [#1] SMP
[ 4.690909] Modules linked in: dm_crypt kvm_amd kvm serio_raw isofs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy
[ 4.690909] CPU: 0 PID: 663 Comm: cloud-init Not tainted 3.13.0-40-generic #69-Ubuntu
[ 4.690909] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 4.690909] task: ffff88001f373000 ti: ffff88001460a000 task.ti: ffff88001460a000
[ 4.690909] RIP: 0010:[<ffffffff8104ed58>] [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[ 4.690909] RSP: 0000:ffff88001fc03df0 EFLAGS: 00010046
[ 4.690909] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000001
[ 4.690909] RDX: ffffffff81eb1448 RSI: 0000000000000000 RDI: 0000000000000000
[ 4.690909] RBP: ffff88001fc03e10 R08: ffffffff81eb1440 R09: ffff880016000000
[ 4.690909] R10: 0000000000000006 R11: 561488f3089a6867 R12: ffffffff81fc66c0
[ 4.690909] R13: 0000000000000802 R14: 0000000000000001 R15: 00000000000000c2
[ 4.690909] FS: 00007fc269f46740(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[ 4.690909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.690909] CR2: 00007fc2665de050 CR3: 000000001f50f000 CR4: 00000000000406f0
[ 4.690909] Stack:
[ 4.690909] 0000000000000046 0000000000000060 0000000000000046 0000000000000020
[ 4.690909] ffff88001fc03e20 ffffffff81718b53 ffff88001fc03e38 ffffffff817270da
[ 4.690909] ffffffff81fc66c0 ffff88001fc03e70 ffffffff8146de04 ffffffff81fc66c0
[ 4.690909] Call Trace:
[ 4.690909] <IRQ>
[ 4.690909] [<ffffffff81718b53>] __ticket_unlock_slowpath+0x24/0x34
[ 4.690909] [<ffffffff817270da>] _raw_spin_unlock_irqrestore+0x3a/0x40
[ 4.690909] [<ffffffff8146de04>] serial8250_handle_irq.part.14+0x84/0xb0
[ 4.690909] [<ffffffff8146de77>] serial8250_default_handle_irq+0x27/0x30
[ 4.690909] [<ffffffff8146ce73>] serial8250_interrupt+0x63/0xe0
[ 4.690909] [<ffffffff810bf97e>] handle_irq_event_percpu+0x3e/0x1d0
[ 4.690909] [<ffffffff810bfb4d>] handle_irq_event+0x3d/0x60
[ 4.690909] [<ffffffff810c25d7>] handle_edge_irq+0x77/0x130
[ 4.690909] [<ffffffff81015dbe>] handle_irq+0x1e/0x30
[ 4.690909] [<ffffffff8173205d>] do_IRQ+0x4d/0xc0
[ 4.690909] [<ffffffff8172772d>] common_interrupt+0x6d/0x6d
[ 4.690909] <EOI>
[ 4.690909] Code: 66 44 39 e8 75 bd 0f b6 35 36 27 e6 00 40 84 f6 75 2a 83 05 46 27 e6 00 01 48 c7 c0 8a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 <0f> 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9
[ 4.690909] RIP [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[ 4.690909] RSP <ffff88001fc03df0>

Looking at the objdump, we see that the divide error is raised on a vmcall instruction.
In addition, we build our kernels with CONFIG_DEBUG_RODATA and PV locking.

static void kvm_kick_cpu(int cpu)
{
        int apicid;
        unsigned long flags = 0;

        apicid = per_cpu(x86_cpu_to_apicid, cpu);
ffffffff8104ed46: 48 c7 c0 8a b0 00 00 mov $0xb08a,%rax

static inl...
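
The faulting instruction can also be read straight from the "Code:" line of the oops, where the byte at the faulting RIP is wrapped in angle brackets. The sketch below extracts those bytes and compares them with the two hypercall encodings: 0f 01 c1 is VMCALL (Intel) and 0f 01 d9 is VMMCALL (AMD). The function name is hypothetical and the sample is the tail of the Code line from the trace above.

```python
VMCALL = bytes.fromhex("0f01c1")   # Intel VT-x hypercall instruction
VMMCALL = bytes.fromhex("0f01d9")  # AMD SVM hypercall instruction


def faulting_bytes(code_line: str, length: int = 3) -> bytes:
    """Return `length` bytes starting at the <xx> marker in an oops 'Code:' line."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    raw = [tok.strip("<>") for tok in tokens[start:start + length]]
    return bytes(int(tok, 16) for tok in raw)


# Tail of the "Code:" line from the trace above.
CODE_LINE = "Code: b8 05 00 00 00 <0f> 01 c1 0f 1f 44 00 00 5b 41 5c"

insn = faulting_bytes(CODE_LINE)
print(insn.hex(), "VMCALL" if insn == VMCALL else "VMMCALL" if insn == VMMCALL else "other")
```

Seeing the Intel encoding fault on an AMD host is consistent with the title of the fix listed in the changelogs in this report, "x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only".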


Chris J Arges (arges)
description: updated
Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: Chris J Arges (arges) → nobody
description: updated
Changed in linux (Ubuntu Trusty):
status: New → In Progress
Changed in linux (Ubuntu Utopic):
status: New → In Progress
Changed in linux (Ubuntu Trusty):
importance: Undecided → Medium
Changed in linux (Ubuntu Utopic):
importance: Undecided → Medium
Andy Whitcroft (apw)
Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Revision history for this message
Daniele Viganò (daniele-vigano) wrote :

I have this issue running an Ubuntu 12.04.5 guest (running kernel 3.13.0-43-generic) on a KVM Hypervisor with the same configuration (Ubuntu 12.04.5, 3.13.0-43, CPU 4x AMD Opteron(TM) Processor 6274, Dell R915).

The guest fails to boot in 80% of attempts. The error is reproducible even with a specific guest CPU configured or with the generic QEMU one. This is the output from a brand new Ubuntu 12.04.5 installation:

[ 5.136174] divide error: 0000 [#1] SMP
[ 5.139614] Modules linked in: floppy
[ 5.143686] CPU: 1 PID: 36 Comm: migration/1 Not tainted 3.13.0-43-generic #72~precise1-Ubuntu
[ 5.144868] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 5.144868] task: ffff880409699800 ti: ffff8804096a0000 task.ti: ffff8804096a0000
[ 5.144868] RIP: 0010:[<ffffffff81051e64>] [<ffffffff81051e64>] kvm_unlock_kick+0xa4/0x100
[ 5.144868] RSP: 0018:ffff8804096a1cf8 EFLAGS: 00010046
[ 5.144868] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000002
[ 5.144868] RDX: 0000000000000000 RSI: ffff88041fc53480 RDI: 0000000000000100
[ 5.144868] RBP: ffff8804096a1d18 R08: ffffffff81eb54e8 R09: 0000000000000000
[ 5.144868] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88041fc53480
[ 5.144868] R13: 000000000000166c R14: 0000000000000002 R15: ffff88041fc53480
[ 5.144868] FS: 00007f8557f1f700(0000) GS:ffff88041fc20000(0000) knlGS:0000000000000000
[ 5.144868] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 5.144868] CR2: 00007f8556b2f570 CR3: 0000000404ee2000 CR4: 00000000000006e0
[ 5.144868] Stack:
[ 5.144868] ffff880404dd0000 0000000000000002 ffff880404dd0654 ffff88041fc33480
[ 5.144868] ffff8804096a1d28 ffffffff8174b8ba ffff8804096a1d38 ffffffff817645fa
[ 5.144868] ffff8804096a1d98 ffffffff8109d37b ffff8804096a1d78 0000000000000002
[ 5.144868] Call Trace:
[ 5.144868] [<ffffffff8174b8ba>] __ticket_unlock_slowpath+0x2e/0x32
[ 5.144868] [<ffffffff817645fa>] _raw_spin_unlock+0x2a/0x30
[ 5.144868] [<ffffffff8109d37b>] __migrate_task+0xcb/0x180
[ 5.144868] [<ffffffff8109d430>] ? __migrate_task+0x180/0x180
[ 5.144868] [<ffffffff8109d453>] migration_cpu_stop+0x23/0x30
[ 5.144868] [<ffffffff810fb8d3>] cpu_stopper_thread+0x83/0x150
[ 5.144868] [<ffffffff817606be>] ? __schedule+0x38e/0x700
[ 5.144868] [<ffffffff8109704d>] smpboot_thread_fn+0xfd/0x180
[ 5.144868] [<ffffffff81096f50>] ? SyS_setgroups+0x170/0x170
[ 5.144868] [<ffffffff8108fb59>] kthread+0xc9/0xe0
[ 5.144868] [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0
[ 5.144868] [<ffffffff8176d5bc>] ret_from_fork+0x7c/0xb0
[ 5.144868] [<ffffffff8108fa90>] ? flush_kthread_worker+0xb0/0xb0
[ 5.144868] Code: 08 66 44 39 ea 75 c0 0f b6 15 e9 35 e6 00 84 d2 75 2e 83 05 fa 35 e6 00 01 48 c7 c0 8a a0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 <0f> 01 c1 66 0f 1f 84 00 00 00 00 00 5b 41 5c 41 5d 41 5e 5d c3
[ 5.144868] RIP [<ffffffff81051e64>] kvm_unlock_kick+0xa4/0x100
[ 5.144868] RSP <ffff8804096a1cf8>
[ 5.144868] ---[ end trace 9f5442e8ee6f35f7 ]---

Attached you can find the full NMI trace (12 vCPU).

Revision history for this message
Chris J Arges (arges) wrote :

@daniele-vigano:

The fix has not yet been released, but I believe it will be in 3.13.0-44. Thanks for the report,
--chris

Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
tags: added: verification-needed-utopic
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Revision history for this message
Daniele Viganò (daniele-vigano) wrote :

I can confirm that the issue looks fixed using 3.13.0-44.73~precise1 from precise-proposed.

I rebooted VMs with the new kernel installed more than ten times and every boot was successful. Then I switched back to the previous 3.13.0-43.72~precise1 and the VMs failed to boot on the first attempt. After switching back to the proposed 3.13.0-44.73~precise1, the VMs booted fine again.

I'm not using Trusty on these AMD machines, so I cannot test it there. However, the kernel is the same as on Precise, so the issue should also be fixed on Trusty.

tags: added: verification-done-precise
Revision history for this message
Daniele Viganò (daniele-vigano) wrote :

During the weekend I was able to reproduce the issue using a Trusty VM with kernel 3.13.0-43.72.
Kernel 3.13.0-44.73 fixes the issue.

tags: added: verification-done-trusty
removed: verification-needed-trusty
Revision history for this message
Chris J Arges (arges) wrote :

I've verified with a Utopic guest running 3.16.0-29.39.

tags: added: verification-done-utopic
removed: verification-needed-utopic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-29.39

---------------
linux (3.16.0-29.39) utopic; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1402822

  [ AceLan Kao ]

  * SAUCE: Add use_native_backlight quirk for HP ProBook 6570b
    - LP: #1359010

  [ Andy Whitcroft ]

  * Revert "SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
    assembly code"
    - LP: #1398596
  * [Config] updateconfigs to balance CONFIG_SCOM_DEBUGFS

  [ Paolo Pisati ]

  * [Config] armhf: VIRTIO_[BALLOON|MMIO]=y

  [ Upstream Kernel Changes ]

  * Revert "arm64: Make default dma_ops to be noncoherent"
    - LP: #1386490
  * Revert "percpu: free percpu allocation info for uniprocessor system"
    - LP: #1401079
  * ath3k: Add support of MCI 13d3:3408 bt device
    - LP: #1395465
  * x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is
    read-only
    - LP: #1379340
  * cpufreq: Allow stop CPU callback to be used by all cpufreq drivers
    - LP: #1397928
  * cpufreq: powernv: Set the pstate of the last hotplugged out cpu in
    policy->cpus to minimum
    - LP: #1397928
  * cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec
    - LP: #1397928
  * xen-netfront: Remove BUGs on paged skb data which crosses a page
    boundary
    - LP: #1275879
  * ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546
    - LP: #1383589
  * iwlwifi: add device / firmware to fw-error-dump file
    - LP: #1399440
  * iwlwifi: rename iwl_mvm_fw_error_next_data
    - LP: #1399440
  * iwlwifi: pcie: add firmware monitor capabilities
    - LP: #1399440
  * iwlwifi: remove wrong comment about alignment in iwl-fw-error-dump.h
    - LP: #1399440
  * iwlwifi: mvm: don't collect logs in the interrupt thread
    - LP: #1399440
  * iwlwifi: mvm: kill iwl_mvm_fw_error_rxf_dump
    - LP: #1399440
  * iwlwifi: mvm: update layout of firmware error dump
    - LP: #1399440
  * powerpc/pseries: Fix endiannes issue in RTAS call from xmon
    - LP: #1396235
  * mmc: sdhci-pci-o2micro: Fix Dell E5440 issue
    - LP: #1346067
  * mfd: rtsx: Fix PM suspend for 5227 & 5249
    - LP: #1359052
  * samsung-laptop: Add broken-acpi-video quirk for NC210/NC110
    - LP: #1401079
  * acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
    - LP: #1401079
  * pinctrl: baytrail: show output gpio state correctly on Intel Baytrail
    - LP: #1401079
  * ALSA: hda - Add dock support for Thinkpad T440 (17aa:2212)
    - LP: #1401079
  * ALSA: hda - Add ultra dock support for Thinkpad X240.
    - LP: #1401079
  * rbd: Fix error recovery in rbd_obj_read_sync()
    - LP: #1401079
  * ds3000: fix LNB supply voltage on Tevii S480 on initialization
    - LP: #1401079
  * powerpc: do_notify_resume can be called with bad thread_info flags
    argument
    - LP: #1401079
  * powerpc/powernv: Properly fix LPC debugfs endianness
    - LP: #1401079
  * irqchip: armada-370-xp: Fix MSI interrupt handling
    - LP: #1401079
  * irqchip: armada-370-xp: Fix MPIC interrupt handling
    - LP: #1401079
  * USB: kobil_sct: fix non-atomic allocation in write path
    - LP: #1401079
  * USB: opticon: fix non-atomic allocation in write path
    - LP: #14010...

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-44.73

---------------
linux (3.13.0-44.73) trusty; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1402872

  [ AceLan Kao ]

  * SAUCE: Add use_native_backlight quirk for HP ProBook 6570b
    - LP: #1359010

  [ Andy Whitcroft ]

  * Revert "SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
    assembly code"
    - LP: #1398596
  * [Config] updateconfigs to balance CONFIG_SCOM_DEBUGFS

  [ Upstream Kernel Changes ]

  * iwlwifi: mvm: fix merge damage
    - LP: #1393317
  * iwlwifi: remove IWL_UCODE_TLV_FLAGS_SCHED_SCAN flag
    - LP: #1393317
  * iwlwifi: mvm: disable scheduled scan to prevent firmware crash
    - LP: #1393317
  * iwlwifi: mvm: enable scheduled scan on newest firmware
    - LP: #1393317
  * x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is
    read-only
    - LP: #1379340
  * phylib: introduce PHY_INTERFACE_MODE_XGMII for 10G PHY
    - LP: #1381084
  * of: make of_get_phy_mode parse 'phy-connection-type'
    - LP: #1381084
  * xen-netfront: Remove BUGs on paged skb data which crosses a page
    boundary
    - LP: #1275879
  * ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546
    - LP: #1383589
  * powerpc/pseries: Fix endiannes issue in RTAS call from xmon
    - LP: #1396235
  * mmc: sdhci-pci-o2micro: Fix Dell E5440 issue
    - LP: #1346067
  * mfd: rtsx: Fix PM suspend for 5227 & 5249
    - LP: #1359052
  * drivers:scsi:storvsc: Fix a bug in handling ring buffer failures that
    may result in I/O freeze
    - LP: #1400289
  * arm64: optimized copy_to_user and copy_from_user assembly code
    - LP: #1400349
  * net:socket: set msg_namelen to 0 if msg_name is passed as NULL in
    msghdr struct from userland.
    - LP: #1335478
  * drm/radeon: initialize sadb to NULL in the audio code
    - LP: #1402714
  * powerpc/vphn: NUMA node code expects big-endian
    - LP: #1401150
  * ALSA: usb-audio: Fix device_del() sysfs warnings at disconnect
    - LP: #1402853
  * ALSA: hda - Add mute LED pin quirk for HP 15 touchsmart
    - LP: #1334950, #1402853
  * rcu: Make callers awaken grace-period kthread
    - LP: #1402853
  * rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads
    - LP: #1402853
  * net: sctp: fix NULL pointer dereference in af->from_addr_param on
    malformed packet
    - LP: #1402853
  * KVM: x86: Don't report guest userspace emulation error to userspace
    - LP: #1402853
  * [media] ttusb-dec: buffer overflow in ioctl
    - LP: #1402853
  * arm64: __clear_user: handle exceptions on strb
    - LP: #1402853
  * ARM: pxa: fix hang on startup with DEBUG_LL
    - LP: #1402853
  * samsung-laptop: Add broken-acpi-video quirk for NC210/NC110
    - LP: #1402853
  * acer-wmi: Add Aspire 5741 to video_vendor_dmi_table
    - LP: #1402853
  * acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
    - LP: #1402853
  * rbd: Fix error recovery in rbd_obj_read_sync()
    - LP: #1402853
  * [media] ds3000: fix LNB supply voltage on Tevii S480 on initialization
    - LP: #1402853
  * powerpc: do_notify_resume can be called with bad thread_info flags
    argument
    - LP: #1402853
  * USB: kobil_sct: f...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Lei Zhang (lei-zhang-s) wrote :

Has this bug been fixed in 3.13.0-44.73?
I recently upgraded some of our Precise compute nodes (Dell C6145, AMD Opteron(tm) Processor 6376) to Trusty with the latest kernel, 3.13.0-45-generic #74-Ubuntu, but I am still seeing this kernel panic in the VM's console log.

Using libvirt_cpu_mode=host-model works around it, but it would be good to know whether this bug is still present in the latest kernel.

Thanks,

Revision history for this message
Chris J Arges (arges) wrote :

@lei-zhang-s

This fix needs to be applied to the VM's kernel. So ensure those images are updated as well.
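
Since the fix lives in the guest kernel, one way to audit images is to compare the guest's kernel release string with the fixed versions named in this report (3.13.0-44 for Trusty and 3.16.0-29 for Utopic). This is a minimal sketch under that assumption; the function name is hypothetical, and it returns None for any series not covered by this report rather than guessing.

```python
import re

# First fixed ABI number per kernel series, taken from the changelogs in this report.
FIXED_ABI = {(3, 13, 0): 44, (3, 16, 0): 29}


def has_fix(release: str):
    """Return True/False for known series, or None if the release is not covered here."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not match:
        return None
    major, minor, patch, abi = map(int, match.groups())
    fixed = FIXED_ABI.get((major, minor, patch))
    if fixed is None:
        return None
    return abi >= fixed


for rel in ("3.13.0-43-generic", "3.13.0-45-generic", "3.16.0-29-generic"):
    print(rel, has_fix(rel))
```

Inside a guest, the release string could be taken from uname -r or from Python's platform.release().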
Thanks!

Revision history for this message
Lei Zhang (lei-zhang-s) wrote :

Hi Chris,
I can confirm that VMs with the latest kernel have no issues. Thanks a lot!
