qemu-kvm 0.12.4+dfsg-1 from Debian squeeze crashes "BUG: unable to handle kernel NULL pointer" (sym53c8xx)

Bug #587993 reported by Maciej Gałkiewicz
This bug affects 2 people
Affects   Status      Importance   Assigned to   Milestone
QEMU      Won't Fix   Undecided    Unassigned

Bug Description

I use Eucalyptus (1.6.2) on Debian squeeze with kvm 0.12.4+dfsg-1 (the same happened with 0.11.1+dfsg-1). The kernel is 2.6.32-3-amd64. After a few days, the virtual machines crash. There are no logs on the host system. The guest runs the same kernel and OS as the host. The kvm process uses 100% of the CPU, and I cannot even ping the guest. Everything works fine with kernels 2.6.30-2-amd64 and 2.6.32-trunk-amd64; the problem occurs only with 2.6.32-3-amd64 and 2.6.32-5-amd64. Here is the log from the virtual machine:

[ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
[ 3582.816047] sd 0:0:0:0: ABORT operation timed-out.
[ 3582.816781] sd 0:0:0:0: [sda] ABORT operation started
[ 3587.816649] sd 0:0:0:0: ABORT operation timed-out.
[ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started
[ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
[ 3592.816882] sd 0:0:0:0: [sda] BUS RESET operation started
[ 3592.820056] sym0: SCSI BUS reset detected.
[ 3592.831538] sym0: SCSI BUS has been reset.
[ 3592.831968] BUG: unable to handle kernel NULL pointer dereference at 0000000000000358
[ 3592.832003] IP: [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
[ 3592.832003] Oops: 0000 [#1] SMP
[ 3592.832003] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/host0/target0:0:0/0:0:0:0/vendor
[ 3592.832003] CPU 0
[ 3592.832003] Modules linked in: dm_mod openafs(P) ext2 snd_pcsp snd_pcm snd_timer serio_raw i2c_piix4 snd virtio_balloon evdev i2c_core soundcore psmouse button processor snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif ata_generic libata ide_pci_generic sym53c8xx scsi_transport_spi thermal piix uhci_hcd ehci_hcd floppy thermal_sys scsi_mod virtio_pci virtio_ring virtio e1000 ide_core usbcore nls_base [last unloaded: scsi_wait_scan]
[ 3592.832003] Pid: 193, comm: scsi_eh_0 Tainted: P 2.6.32-3-amd64 #1 Bochs
[ 3592.832003] RIP: 0010:[<ffffffffa01147c4>] [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP: 0018:ffff880001803cb0 EFLAGS: 00010287
[ 3592.832003] RAX: 000000000000000a RBX: 000000000000000b RCX: 000000005f410090
[ 3592.832003] RDX: 0000000000000000 RSI: ffff88005c450800 RDI: ffffc90000a5e006
[ 3592.832003] RBP: ffff88005f410000 R08: 0000000000000000 R09: 0000000000000000
[ 3592.832003] R10: 000000000000003a R11: ffffffff813b871e R12: ffff88005f410090
[ 3592.832003] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000001
[ 3592.832003] FS: 0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
[ 3592.832003] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 3592.832003] CR2: 0000000000000358 CR3: 000000005e269000 CR4: 00000000000006f0
[ 3592.832003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3592.832003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3592.832003] Process scsi_eh_0 (pid: 193, threadinfo ffff88005f6fa000, task ffff88005f697880)
[ 3592.832003] Stack:
[ 3592.832003] ffff88005f3fd000 0000000000000000 0000000000000130 0000000000000000
[ 3592.832003] <0> ffff88005f407710 ffffc90000a64710 ffffffffffffff10 ffffffff81195301
[ 3592.832003] <0> 0000000000000010 0000000000010212 ffff880001803d18 0000000000000018
[ 3592.832003] Call Trace:
[ 3592.832003] <IRQ>
[ 3592.832003] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.832003] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.832003] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.832003] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.832003] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.832003] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.832003] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.832003] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.832003] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.832003] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.832003] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.832003] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.832003] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.832003] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.832003] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.832003] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.832003] <EOI>
[ 3592.832003] [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.832003] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.832003] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.832003] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.832003] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.832003] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.832003] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.832003] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.832003] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.832003] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
[ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8 28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b 96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
[ 3592.832003] RIP [<ffffffffa01147c4>] sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.832003] RSP <ffff880001803cb0>
[ 3592.832003] CR2: 0000000000000358
[ 3592.867935] ---[ end trace 06f90ebbdbd172ee ]---
[ 3592.868360] Kernel panic - not syncing: Fatal exception in interrupt
[ 3592.868906] Pid: 193, comm: scsi_eh_0 Tainted: P D 2.6.32-3-amd64 #1
[ 3592.869511] Call Trace:
[ 3592.869727] <IRQ> [<ffffffff812ed349>] ? panic+0x86/0x141
[ 3592.870225] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.870778] [<ffffffff811afbdc>] ? dummycon_dummy+0x0/0x3
[ 3592.871250] [<ffffffff81014a37>] ? oops_end+0x64/0xb4
[ 3592.871694] [<ffffffff81014a7a>] ? oops_end+0xa7/0xb4
[ 3592.872150] [<ffffffff810322b8>] ? no_context+0x1e9/0x1f8
[ 3592.872626] [<ffffffff8103246d>] ? __bad_area_nosemaphore+0x1a6/0x1ca
[ 3592.873185] [<ffffffff8106807c>] ? up+0xe/0x36
[ 3592.873576] [<ffffffff8104e219>] ? release_console_sem+0x17e/0x1af
[ 3592.874125] [<ffffffff81024d72>] ? lapic_next_event+0x18/0x1d
[ 3592.874642] [<ffffffff812ef595>] ? page_fault+0x25/0x30
[ 3592.875103] [<ffffffffa01147c4>] ? sym_int_sir+0x62f/0x14e0 [sym53c8xx]
[ 3592.875678] [<ffffffff81195301>] ? __memcpy_toio+0x9/0x19
[ 3592.876162] [<ffffffffa01164ed>] ? sym_interrupt+0x46c/0x6a3 [sym53c8xx]
[ 3592.876748] [<ffffffff8103fea0>] ? update_curr+0xa6/0x147
[ 3592.877224] [<ffffffffa010fbde>] ? sym53c8xx_intr+0x43/0x6a [sym53c8xx]
[ 3592.877800] [<ffffffff81093bfc>] ? handle_IRQ_event+0x58/0x126
[ 3592.878319] [<ffffffff810954e2>] ? handle_fasteoi_irq+0x7d/0xb5
[ 3592.878848] [<ffffffff81013957>] ? handle_irq+0x17/0x1d
[ 3592.879305] [<ffffffff81012fb1>] ? do_IRQ+0x57/0xb6
[ 3592.879744] [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
[ 3592.880237] [<ffffffff81053903>] ? __do_softirq+0x6e/0x19f
[ 3592.880723] [<ffffffff8106fa87>] ? tick_dev_program_event+0x2d/0x95
[ 3592.881284] [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[ 3592.881762] [<ffffffff81013903>] ? do_softirq+0x3f/0x7c
[ 3592.882230] [<ffffffff810537e1>] ? irq_exit+0x36/0x76
[ 3592.882691] [<ffffffff81025837>] ? smp_apic_timer_interrupt+0x87/0x95
[ 3592.883258] [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
[ 3592.883795] <EOI> [<ffffffff8118e009>] ? delay_tsc+0x0/0x73
[ 3592.884319] [<ffffffffa010f900>] ? sym_eh_handler+0x22e/0x2e2 [sym53c8xx]
[ 3592.884917] [<ffffffffa008e5de>] ? scsi_try_bus_reset+0x50/0xd9 [scsi_mod]
[ 3592.885522] [<ffffffffa008f565>] ? scsi_eh_ready_devs+0x50c/0x781 [scsi_mod]
[ 3592.886152] [<ffffffffa008fd6b>] ? scsi_error_handler+0x3c1/0x5b5 [scsi_mod]
[ 3592.886789] [<ffffffffa008f9aa>] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
[ 3592.887398] [<ffffffff81064789>] ? kthread+0x79/0x81
[ 3592.887836] [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 3592.888290] [<ffffffff81064710>] ? kthread+0x0/0x81
[ 3592.888721] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
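The bytes marked <49> in the Code: line are the faulting instruction, and decoding them explains the CR2 value. A minimal Python sketch that handles only the one x86-64 encoding occurring here (REX.W prefix, opcode 8B, ModR/M with a 32-bit displacement and no SIB byte) shows the instruction is mov rdx, [r14+0x358]. With R14 = 0 in the register dump, the load address is exactly 0x358, which is the reported fault address. This is a hand-written decoder for illustration, not a general disassembler.

```python
import struct

# 64-bit register names indexed by their 4-bit encoding (REX bit + 3-bit field).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes):
    """Decode REX.W 8B /r with mod=10, i.e. mov reg64, [base + disp32].

    Only this single form is handled; anything else raises ValueError.
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if (rex & 0xF8) != 0x48:
        raise ValueError("expected a REX prefix with W=1")
    if opcode != 0x8B:
        raise ValueError("expected opcode 8B (MOV r64, r/m64)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [base + disp32] without a SIB byte is handled")
    reg |= ((rex >> 2) & 1) << 3  # REX.R extends the destination register
    rm |= (rex & 1) << 3          # REX.B extends the base register
    (disp,) = struct.unpack_from("<i", code, 3)  # little-endian signed disp32
    return REG64[reg], REG64[rm], disp


# "<49> 8b 96 58 03 00 00" from the Code: line.
dst, base, disp = decode_mov_load(bytes.fromhex("498b9658030000"))
regs = {"r14": 0x0}  # R14 value from the register dump
fault_addr = regs[base] + disp
print(f"mov {dst}, [{base}+{disp:#x}] reads {fault_addr:#018x}")

# The two instructions that follow it in the Code: line.
for hexbytes in ("488b8280000000", "488ba8b0000000"):
    d, b, off = decode_mov_load(bytes.fromhex(hexbytes))
    print(f"mov {d}, [{b}+{off:#x}]")
```

The following two instructions decode to mov rax, [rdx+0x80] and mov rbp, [rax+0xb0], so sym_int_sir appears to be walking a chain of pointers whose first link (R14) is NULL. Which driver structure that pointer belongs to cannot be told from the oops alone; it would need the module's debug symbols.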

Unfortunately, I have no idea how to reproduce the problem.

From the log in /var/log/libvirt/qemu/:
lsi_scsi: error: Unimplemented message 0x0c
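The 0x0c in this line is a SCSI message byte sent by the guest's driver. A small Python lookup, listing only a few codes from the SCSI-2 message table, shows it is BUS DEVICE RESET (later renamed TARGET RESET), which matches the DEVICE RESET step in the guest logs. The guest's "control msgout: c." and "M_REJECT received" lines below are consistent with the emulated controller rejecting that message (0x07 is MESSAGE REJECT). The helper name is hypothetical.

```python
import re

# A few single-byte SCSI message codes (SCSI-2 message table).
# 0x0C was later renamed TARGET RESET. The table is deliberately incomplete.
SCSI_MESSAGES = {
    0x00: "COMMAND COMPLETE",
    0x04: "DISCONNECT",
    0x06: "ABORT",
    0x07: "MESSAGE REJECT",
    0x0C: "BUS DEVICE RESET",
    0x0D: "ABORT TAG",
}


def explain_lsi_error(line: str) -> str:
    """Name the message code in an 'Unimplemented message 0x..' log line."""
    m = re.search(r"Unimplemented message 0x([0-9a-fA-F]{2})", line)
    if m is None:
        return "no message code found"
    code = int(m.group(1), 16)
    return f"0x{code:02x} = {SCSI_MESSAGES.get(code, 'not in this table')}"


print(explain_lsi_error("lsi_scsi: error: Unimplemented message 0x0c"))
```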

What is more, I had 7 VMs running. Today four of them crashed at the same time. The rest survived, with something like this in syslog:

[651330.816043] sd 0:0:0:0: [sda] ABORT operation started
[651335.860027] sd 0:0:0:0: ABORT operation timed-out.
[651335.860600] sd 0:0:0:0: [sda] ABORT operation started
[651337.019355] sd 0:0:0:0: ABORT operation complete.
[651337.038506] sd 0:0:0:0: [sda] ABORT operation started
[651337.039100] sd 0:0:0:0: ABORT operation failed.
[651337.039624] sd 0:0:0:0: [sda] ABORT operation started
[651337.040303] sd 0:0:0:0: ABORT operation failed.
[651337.040834] sd 0:0:0:0: [sda] ABORT operation started
[651337.041417] sd 0:0:0:0: ABORT operation failed.
[651337.041949] sd 0:0:0:0: [sda] ABORT operation started
[651337.042534] sd 0:0:0:0: ABORT operation failed.
[651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
[651337.043834] scsi target0:0:0: control msgout: c.
[651337.520075] scsi target0:0:0: has been reset
[651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
[651337.522495] sd 0:0:0:0: M_REJECT received (0:0).
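Both logs show the guest's SCSI error handling (sym_eh_handler in the call trace) escalating from ABORT to DEVICE RESET and, in the crashed guest, to BUS RESET. A minimal Python sketch, assuming only the log wording shown above, can scan a guest syslog for this pattern and tally each step with its outcome; that may help spot the precursor on other guests. The function name and regular expression are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the error-handling lines seen in the guest logs above, e.g.
#   "[651337.039100] sd 0:0:0:0: ABORT operation failed."
#   "[ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started"
EH_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sd \S+:\s+(?:\[\w+\]\s+)?"
    r"(?P<op>ABORT|DEVICE RESET|BUS RESET) operation "
    r"(?P<result>started|complete|failed|timed-out)"
)


def summarize_eh(lines):
    """Count (operation, result) pairs and return the time span they cover."""
    counts = Counter()
    stamps = []
    for line in lines:
        m = EH_LINE.search(line)
        if m:
            counts[(m["op"], m["result"])] += 1
            stamps.append(float(m["ts"]))
    span = stamps[-1] - stamps[0] if stamps else 0.0
    return counts, span


# Excerpt from the surviving guest's syslog above.
SAMPLE = """\
[651330.816043] sd 0:0:0:0: [sda] ABORT operation started
[651335.860027] sd 0:0:0:0: ABORT operation timed-out.
[651335.860600] sd 0:0:0:0: [sda] ABORT operation started
[651337.019355] sd 0:0:0:0: ABORT operation complete.
[651337.038506] sd 0:0:0:0: [sda] ABORT operation started
[651337.039100] sd 0:0:0:0: ABORT operation failed.
[651337.039624] sd 0:0:0:0: [sda] ABORT operation started
[651337.040303] sd 0:0:0:0: ABORT operation failed.
[651337.040834] sd 0:0:0:0: [sda] ABORT operation started
[651337.041417] sd 0:0:0:0: ABORT operation failed.
[651337.041949] sd 0:0:0:0: [sda] ABORT operation started
[651337.042534] sd 0:0:0:0: ABORT operation failed.
[651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
[651337.043834] scsi target0:0:0: control msgout: c.
[651337.520075] scsi target0:0:0: has been reset
[651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
[651337.522495] sd 0:0:0:0: M_REJECT received (0:0).
"""

counts, span = summarize_eh(SAMPLE.splitlines())
for (op, result), n in sorted(counts.items()):
    print(f"{op:<13} {result:<10} x{n}")
print(f"error handling spanned {span:.1f} s")
```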

It looks like the problem is in the host system and affects all machines at the same time. In the syslog of the machines that crashed, I found the same pattern 3 days before the crash. There is no information in the host log files at all. Is it possible that Eucalyptus (1.6.2) caused this? With 1.6.1 I didn't have these problems. Eucalyptus runs kvm (0.12 and 0.11) with these commands:

/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin HOME=/root USER=root LOGNAME=root /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name i-35B80630 -uuid 7e9b2fc1-9a9d-7114-3cb4-f4fdb3d51a3a -nographic -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/i-35B80630.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -kernel /var/lib/eucalyptus/instances/winnie/i-35B80630/kernel -initrd /var/lib/eucalyptus/instances/winnie/i-35B80630/ramdisk -append root=/dev/sda1 console=ttyS0 -device lsi,id=scsi0,bus=pci.0,addr=0x5 -drive file=/var/lib/eucalyptus/instances/winnie/i-35B80630/disk,if=none,id=drive-scsi0-0-0,boot=on -device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0 -device e1000,vlan=0,id=net0,mac=d0:0d:35:b8:06:30,bus=pci.0,addr=0x4 -net tap,fd=43,vlan=0,name=hostnet0 -chardev file,id=serial0,path=/var/lib/eucalyptus/instances/winnie/i-35B80630/console.log -device isa-serial,chardev=serial0 -usb -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 -smp 1 -name i-492407F3 -uuid b2dc266e-a62a-4e13-3847-f9104eba4135 -nographic -monitor unix:/var/lib/libvirt/qemu/i-492407F3.monitor,server,nowait -boot c -kernel /var/lib/eucalyptus/instances/admin/i-492407F3/kernel -initrd /var/lib/eucalyptus/instances/admin/i-492407F3/ramdisk -append root=/dev/sda1 console=ttyS0 -drive file=/var/lib/eucalyptus/instances/admin/i-492407F3/disk,if=scsi,bus=0,unit=0,boot=on -net nic,macaddr=d0:0d:49:24:07:f3,vlan=0,model=e1000,name=net0 -net tap,fd=118,vlan=0,name=hostnet0 -serial file:/var/lib/eucalyptus/instances/admin/i-492407F3/console.log -parallel none -usb -vga none -balloon virtio
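Although the two invocations use different syntax, both attach the disk to QEMU's emulated LSI SCSI controller: the first explicitly with -device lsi plus -device scsi-disk, the second implicitly with -drive ...,if=scsi, which in QEMU of that era created an LSI controller on the PC machine. The guest drives that controller with sym53c8xx, the module in the oops. A minimal Python check, assuming only the option spellings used above (the function name is hypothetical and the sample paths are shortened):

```python
import shlex

# Device spellings exactly as used in the two command lines above.
SCSI_DEVICES = {"lsi", "scsi-disk"}


def scsi_disk_options(cmdline: str):
    """Return the options that attach storage through an emulated SCSI HBA."""
    args = shlex.split(cmdline)
    hits = []
    for opt, val in zip(args, args[1:]):
        props = val.split(",")
        if opt == "-device" and props[0] in SCSI_DEVICES:
            hits.append(f"{opt} {val}")
        elif opt == "-drive" and "if=scsi" in props:
            hits.append(f"{opt} {val}")  # if=scsi implies an emulated SCSI HBA
    return hits


# Trimmed versions of the 0.12-style and 0.11-style invocations above.
CMD_012 = ("/usr/bin/kvm -M pc-0.12 -device lsi,id=scsi0,bus=pci.0,addr=0x5 "
           "-drive file=disk,if=none,id=drive-scsi0-0-0,boot=on "
           "-device scsi-disk,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0")
CMD_011 = ("/usr/bin/kvm -M pc-0.11 -drive file=disk,if=scsi,bus=0,unit=0,boot=on "
           "-net nic,macaddr=d0:0d:49:24:07:f3,vlan=0,model=e1000,name=net0")

for name, cmd in (("0.12", CMD_012), ("0.11", CMD_011)):
    print(name, scsi_disk_options(cmd))
```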

I can provide access to the VM.

Anthony Liguori (anthony-codemonkey) wrote :

If you can recreate the issue, please update the bug report with information about how to recreate the problem.

Changed in qemu:
status: New → Incomplete
description: updated

Jes Sorensen (jes-sorensen) wrote :

Looks like the SCSI driver is causing problems. QEMU's SCSI emulation is known to be broken; please use IDE or virtio-blk.

Jes
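As a sketch of that workaround, assuming the instance is defined by libvirt domain XML (the -S flag and the /var/lib/libvirt/qemu/ paths above suggest libvirt launched kvm), SCSI disks can be moved to virtio-blk by editing each disk's <target> element. The sample XML below is a hypothetical, trimmed domain definition, not the one Eucalyptus actually generates, and the function name is illustrative. The guest would then see the disk as /dev/vda, so root=/dev/sda1 on the kernel command line would also have to change, and the guest ramdisk would need the virtio_blk driver.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed libvirt domain XML with the disk on the emulated SCSI bus.
DOMAIN_XML = """<domain type='kvm'>
  <name>i-35B80630</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/eucalyptus/instances/winnie/i-35B80630/disk'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
  </devices>
</domain>"""


def scsi_disks_to_virtio(xml_text: str) -> str:
    """Switch every disk whose <target> uses bus='scsi' to virtio (sdX -> vdX)."""
    root = ET.fromstring(xml_text)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None or target.get("bus") != "scsi":
            continue
        target.set("bus", "virtio")
        dev = target.get("dev", "")
        if dev.startswith("sd"):
            target.set("dev", "vd" + dev[2:])
        # A drive-type <address> points at the SCSI controller; drop it so
        # libvirt can pick a suitable address for the virtio disk.
        for addr in disk.findall("address"):
            if addr.get("type") == "drive":
                disk.remove(addr)
    return ET.tostring(root, encoding="unicode")


print(scsi_disks_to_virtio(DOMAIN_XML))
```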

Jes Sorensen (jes-sorensen) wrote :

Looks like a duplicate of https://sourceforge.net/tracker/index.php?func=detail&aid=2042889&group_id=180599&atid=893831

Closed the SF bug; let's focus on this issue here.

Jes

Thomas Huth (th-huth) wrote :

QEMU 0.12 is very outdated nowadays, so I assume this problem has been fixed at some point in the last few years, and I'm closing this ticket now. If you still have problems with the latest version of QEMU, please feel free to reopen this bug.

Changed in qemu:
status: Incomplete → Won't Fix