kernel crash (Oops) with kvm

Bug #476332 reported by pingou67
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
Linux              Fix Released  Medium
linux (Ubuntu)     Fix Released  Undecided   Unassigned
qemu-kvm (Ubuntu)  Invalid       High        Unassigned

Bug Description

Last night the kernel crashed on a KVM host (64-bit kernel, Karmic):

Nov 6 05:39:43 bes12 kernel: [1500910.747443] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Nov 6 05:39:43 bes12 kernel: [1500910.747500] IP: [<ffffffffa01cae56>] kpit_elapsed+0x46/0x80 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.747546] PGD 6114de067 PUD 610077067 PMD 0
Nov 6 05:39:43 bes12 kernel: [1500910.747579] Oops: 0000 [#1] SMP
Nov 6 05:39:43 bes12 kernel: [1500910.747607] last sysfs file: /sys/devices/virtual/block/dm-3/uevent
Nov 6 05:39:43 bes12 kernel: [1500910.747637] CPU 6
Nov 6 05:39:43 bes12 kernel: [1500910.747659] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs tun ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp kvm_intel kvm 8021q garp radeon ttm iptable_filter bridge ip_tables drm i2c_algo_bit stp x_tables ipmi_si psmouse lp ipmi_msghandler hpilo serio_raw bnx2 parport usbhid cciss
Nov 6 05:39:43 bes12 kernel: [1500910.747906] Pid: 12831, comm: kvm Not tainted 2.6.31-14-server #48-Ubuntu ProLiant DL360 G6
Nov 6 05:39:43 bes12 kernel: [1500910.747952] RIP: 0010:[<ffffffffa01cae56>] [<ffffffffa01cae56>] kpit_elapsed+0x46/0x80 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.748031] RSP: 0018:ffff880571bafb18 EFLAGS: 00010202
Nov 6 05:39:43 bes12 kernel: [1500910.748058] RAX: 0000000000000000 RBX: ffff880610915400 RCX: 0000000000000000
Nov 6 05:39:43 bes12 kernel: [1500910.748103] RDX: 0000000000000000 RSI: ffff880610915460 RDI: ffff880610114000
Nov 6 05:39:43 bes12 kernel: [1500910.748147] RBP: ffff880571bafb38 R08: 0000000000000043 R09: 0000000000000020
Nov 6 05:39:43 bes12 kernel: [1500910.748191] R10: 0000000000000000 R11: 0000000000000000 R12: 0005589e6e49f33e
Nov 6 05:39:43 bes12 kernel: [1500910.748236] R13: ffff880610915400 R14: ffff880610114000 R15: ffff880610915400
Nov 6 05:39:43 bes12 kernel: [1500910.748281] FS: 00007fa1b18b3910(0000) GS:ffffc90000c00000(0000) knlGS:0000000000000000
Nov 6 05:39:43 bes12 kernel: [1500910.748327] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Nov 6 05:39:43 bes12 kernel: [1500910.748355] CR2: 0000000000000028 CR3: 000000060f1ae000 CR4: 00000000000026e0
Nov 6 05:39:43 bes12 kernel: [1500910.748399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 6 05:39:43 bes12 kernel: [1500910.748444] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 6 05:39:43 bes12 kernel: [1500910.748518] Process kvm (pid: 12831, threadinfo ffff880571bae000, task ffff880570144410)
Nov 6 05:39:43 bes12 kernel: [1500910.748563] Stack:
Nov 6 05:39:43 bes12 kernel: [1500910.748584] 00000000dff88ff8 ffffffffff5fd0b0 ffff880610915400 0000000000000000
Nov 6 05:39:43 bes12 kernel: [1500910.748622] <0> ffff880571bafb78 ffffffffa01cb02f ffff880571bafb88 ffffffffa01b8f20
Nov 6 05:39:43 bes12 kernel: [1500910.748677] <0> ffff880610915460 0000000000000000 ffff880610114000 0000000000000000
Nov 6 05:39:43 bes12 kernel: [1500910.748748] Call Trace:
Nov 6 05:39:43 bes12 kernel: [1500910.748781] [<ffffffffa01cb02f>] pit_get_count+0x4f/0xf0 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.748820] [<ffffffffa01b8f20>] ? emulator_write_emulated+0x70/0x90 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.748860] [<ffffffffa01cb120>] pit_latch_count+0x50/0x90 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.748898] [<ffffffffa01cb768>] pit_ioport_write+0x1c8/0x280 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.748937] [<ffffffffa01c52c5>] ? x86_decode_insn+0x8a5/0xb20 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.748976] [<ffffffffa01af1b3>] ? kvm_io_bus_find_dev+0x53/0x80 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.749016] [<ffffffffa01bc83a>] kvm_emulate_pio+0x13a/0x240 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.749050] [<ffffffffa01e0ea3>] ? vmx_set_rflags+0x23/0x30 [kvm_intel]
Nov 6 05:39:43 bes12 kernel: [1500910.749089] [<ffffffffa01baff7>] ? emulate_instruction+0x2d7/0x340 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.749124] [<ffffffffa01e4257>] handle_io+0x77/0x80 [kvm_intel]
Nov 6 05:39:43 bes12 kernel: [1500910.749156] [<ffffffffa01e19ff>] vmx_handle_exit+0x9f/0x270 [kvm_intel]
Nov 6 05:39:43 bes12 kernel: [1500910.749189] [<ffffffff815268eb>] ? __down_read+0xbb/0xc6
Nov 6 05:39:43 bes12 kernel: [1500910.749220] [<ffffffffa01e54ec>] ? vmx_vcpu_run+0x21c/0x360 [kvm_intel]
Nov 6 05:39:43 bes12 kernel: [1500910.749260] [<ffffffffa01c97ab>] ? kvm_apic_has_interrupt+0x5b/0x80 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.749301] [<ffffffffa01b6aa2>] vcpu_enter_guest+0x2b2/0x5c0 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.749331] [<ffffffff815268eb>] ? __down_read+0xbb/0xc6
Nov 6 05:39:43 bes12 kernel: [1500910.750956] [<ffffffff81078620>] ? autoremove_wake_function+0x0/0x40
Nov 6 05:39:43 bes12 kernel: [1500910.750996] [<ffffffffa01b6e13>] __vcpu_run+0x63/0x320 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.751034] [<ffffffffa01bcd92>] kvm_arch_vcpu_ioctl_run+0x82/0x1c0 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.751074] [<ffffffffa01b25d3>] kvm_vcpu_ioctl+0x473/0x5c0 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.751104] [<ffffffff81526aa9>] ? _spin_lock+0x9/0x10
Nov 6 05:39:43 bes12 kernel: [1500910.751132] [<ffffffff810888b2>] ? futex_wake+0x102/0x120
Nov 6 05:39:43 bes12 kernel: [1500910.751163] [<ffffffff8112d3fd>] vfs_ioctl+0x1d/0xa0
Nov 6 05:39:43 bes12 kernel: [1500910.751191] [<ffffffff8112d589>] do_vfs_ioctl+0x79/0x400
Nov 6 05:39:43 bes12 kernel: [1500910.751219] [<ffffffff8108afe6>] ? sys_futex+0xc6/0x170
Nov 6 05:39:43 bes12 kernel: [1500910.751248] [<ffffffff8112d991>] sys_ioctl+0x81/0xa0
Nov 6 05:39:43 bes12 kernel: [1500910.751277] [<ffffffff81011fc2>] system_call_fastpath+0x16/0x1b
Nov 6 05:39:43 bes12 kernel: [1500910.751305] Code: 31 d2 48 83 bb 18 01 00 00 00 75 11 48 89 d0 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 40 00 48 8b 83 d8 00 00 00 4c 8b a3 c0 00 00 00 <ff> 50 28 48 8b 8b 18 01 00 00 49 29 c4 48 89 ca 4c 29 e2 48 89
Nov 6 05:39:43 bes12 kernel: [1500910.751554] RIP [<ffffffffa01cae56>] kpit_elapsed+0x46/0x80 [kvm]
Nov 6 05:39:43 bes12 kernel: [1500910.751595] RSP <ffff880571bafb18>
Nov 6 05:39:43 bes12 kernel: [1500910.751618] CR2: 0000000000000028
Nov 6 05:39:43 bes12 kernel: [1500910.751928] ---[ end trace 1a2c52aa5ab52148 ]---
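The Code: line marks the faulting byte with angle brackets. The minimal Python sketch below is only an illustration, not a general x86 decoder, and its helper names are hypothetical. It decodes the marked bytes ff 50 28 as call *0x28(%rax). Because RAX is 0 in the register dump, the CPU tried to load a function pointer from address 0x28. That address matches CR2 and the reported "NULL pointer dereference at 0000000000000028". This suggests that kpit_elapsed called through a function-pointer field of a structure whose pointer was NULL. The sketch does not identify which structure.

```python
# Values copied from the oops above.
CODE_LINE = (
    "Code: 31 d2 48 83 bb 18 01 00 00 00 75 11 48 89 d0 48 8b 5d f0 4c 8b 65 f8 "
    "c9 c3 0f 1f 40 00 48 8b 83 d8 00 00 00 4c 8b a3 c0 00 00 00 <ff> 50 28 48 8b "
    "8b 18 01 00 00 49 29 c4 48 89 ca 4c 29 e2 48 89"
)
REG_VALUES = {
    "rax": 0x0000000000000000,
    "rcx": 0x0000000000000000,
    "rdx": 0x0000000000000000,
    "rbx": 0xFFFF880610915400,
    "rsp": 0xFFFF880571BAFB18,
    "rbp": 0xFFFF880571BAFB38,
    "rsi": 0xFFFF880610915460,
    "rdi": 0xFFFF880610114000,
}
CR2 = 0x0000000000000028

# ModRM r/m encodings 0..7 for 64-bit registers (no REX.B prefix).
RM_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes starting at the <..>-marked faulting byte."""
    tokens = code_line.split(":", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [int(t.strip("<>"), 16) for t in tokens[i:]]
    raise ValueError("no <..> marker in Code: line")


def decode_call_disp8(insn):
    """Decode only 'FF /2' with mod=01, i.e. call *disp8(%reg).

    Returns (register name, signed displacement), or None for any other form.
    Prefixes, SIB bytes and other addressing modes are not handled.
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0xFF or reg != 2 or mod != 1 or rm == 4:  # rm=4 means a SIB byte follows
        return None
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return RM_REGS[rm], disp


insn = faulting_bytes(CODE_LINE)
base, disp = decode_call_disp8(insn)
target = (REG_VALUES[base] + disp) & 0xFFFFFFFFFFFFFFFF
print(f"call *{disp:#x}(%{base}) loads a pointer from {target:#x}")
print("matches CR2:", target == CR2)
```

Only the single instruction form seen in this oops is handled. For anything else, disassemble the Code: bytes with a real disassembler, for example the kernel's scripts/decodecode.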

pingou67 (pingou67)
affects: linux (Ubuntu) → qemu-kvm (Ubuntu)
affects: qemu-kvm (Ubuntu) → linux (Ubuntu)
pingou67 (pingou67) wrote :

Same crash again today. Afterwards all VMs were down and the host had to be rebooted.

pingou67 (pingou67)
affects: linux (Ubuntu) → qemu-kvm (Ubuntu)
pingou67 (pingou67)
summary: - kernel crash with kvm
+ kernel crash (Oops) with kvm
pingou67 (pingou67) wrote :

This upstream bug looks like the same issue (even though the trace doesn't match exactly):

http://bugzilla.kernel.org/show_bug.cgi?id=14376

It appears to be a regression between 2.6.30.6 and 2.6.31.2; unfortunately, no fix is available yet.

Changed in linux:
status: Unknown → Confirmed
Mathias Gug (mathiaz) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Are you able to reliably reproduce the crash using a specific VM? What kind of guests are running on the system?

Changed in qemu-kvm (Ubuntu):
importance: Undecided → High
status: New → Incomplete
pingou67 (pingou67) wrote :

We had 5 VMs on this host:
- 1 Debian Lenny 64-bit (virtio for the network and the block device, qcow2 format, 256 MB RAM allocated)
- 2 Ubuntu Jaunty 64-bit (virtio for the network and the block device, qcow2 format, 512 MB RAM allocated)
- 1 Red Hat 5 64-bit (virtio for the network and the block device, qcow2 format, 12 GB RAM allocated)
- 1 Windows 2003 32-bit (virtio for the network, IDE for the block device, qcow2 format, 512 MB RAM allocated)

The host runs Karmic Koala with 24 GB of physical RAM.

Last week we moved each VM to a separate host (same configuration and same hardware) to find the culprit. We are now waiting for the Oops to happen again...

Changed in linux:
status: Confirmed → Fix Released
Dustin Kirkland  (kirkland) wrote :

This bug is fixed in Lucid's 2.6.32 kernel.

Changed in linux (Ubuntu):
status: New → Fix Released
Changed in qemu-kvm (Ubuntu):
status: Incomplete → Invalid
Changed in linux:
importance: Unknown → Medium