qemu-kvm 0.12.4+dfsg-1 from debian squeeze crashes "BUG: unable to handle kernel NULL pointer" (sym53c8xx)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
QEMU | Won't Fix | Undecided | Unassigned |
Bug Description
I use Eucalyptus (1.6.2) on Debian squeeze with kvm 0.12.4+dfsg-1 (the same happened with 0.11.1+dfsg-1) and kernel 2.6.32-3-amd64. After a few days the guest machines crash. There are no logs on the host system. The guest runs the same kernel and OS as the host. The kvm process uses 100% of CPU time, and I cannot even ping the guest. Everything works fine with 2.6.30-2-amd64 and 2.6.32-trunk-amd64; the problem occurs only with 2.6.32-3-amd64 and 2.6.32-5-amd64. Here is the log from the virtual machine:
[ 3577.816666] sd 0:0:0:0: [sda] ABORT operation started
[ 3582.816047] sd 0:0:0:0: ABORT operation timed-out.
[ 3582.816781] sd 0:0:0:0: [sda] ABORT operation started
[ 3587.816649] sd 0:0:0:0: ABORT operation timed-out.
[ 3587.817379] sd 0:0:0:0: [sda] DEVICE RESET operation started
[ 3592.816062] sd 0:0:0:0: DEVICE RESET operation timed-out.
[ 3592.816882] sd 0:0:0:0: [sda] BUS RESET operation started
[ 3592.820056] sym0: SCSI BUS reset detected.
[ 3592.831538] sym0: SCSI BUS has been reset.
[ 3592.831968] BUG: unable to handle kernel NULL pointer dereference at 0000000000000358
[ 3592.832003] IP: [<ffffffffa0114
[ 3592.832003] PGD 5f73e067 PUD 5fa53067 PMD 0
[ 3592.832003] Oops: 0000 [#1] SMP
[ 3592.832003] last sysfs file: /sys/devices/
[ 3592.832003] CPU 0
[ 3592.832003] Modules linked in: dm_mod openafs(P) ext2 snd_pcsp snd_pcm snd_timer serio_raw i2c_piix4 snd virtio_balloon evdev i2c_core soundcore psmouse button processor snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif ata_generic libata ide_pci_generic sym53c8xx scsi_transport_spi thermal piix uhci_hcd ehci_hcd floppy thermal_sys scsi_mod virtio_pci virtio_ring virtio e1000 ide_core usbcore nls_base [last unloaded: scsi_wait_scan]
[ 3592.832003] Pid: 193, comm: scsi_eh_0 Tainted: P 2.6.32-3-amd64 #1 Bochs
[ 3592.832003] RIP: 0010:[<
[ 3592.832003] RSP: 0018:ffff880001
[ 3592.832003] RAX: 000000000000000a RBX: 000000000000000b RCX: 000000005f410090
[ 3592.832003] RDX: 0000000000000000 RSI: ffff88005c450800 RDI: ffffc90000a5e006
[ 3592.832003] RBP: ffff88005f410000 R08: 0000000000000000 R09: 0000000000000000
[ 3592.832003] R10: 000000000000003a R11: ffffffff813b871e R12: ffff88005f410090
[ 3592.832003] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000001
[ 3592.832003] FS: 000000000000000
[ 3592.832003] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 3592.832003] CR2: 0000000000000358 CR3: 000000005e269000 CR4: 00000000000006f0
[ 3592.832003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3592.832003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3592.832003] Process scsi_eh_0 (pid: 193, threadinfo ffff88005f6fa000, task ffff88005f697880)
[ 3592.832003] Stack:
[ 3592.832003] ffff88005f3fd000 0000000000000000 0000000000000130 0000000000000000
[ 3592.832003] <0> ffff88005f407710 ffffc90000a64710 ffffffffffffff10 ffffffff81195301
[ 3592.832003] <0> 0000000000000010 0000000000010212 ffff880001803d18 0000000000000018
[ 3592.832003] Call Trace:
[ 3592.832003] <IRQ>
[ 3592.832003] [<ffffffff81195
[ 3592.832003] [<ffffffffa0116
[ 3592.832003] [<ffffffff8103f
[ 3592.832003] [<ffffffffa010f
[ 3592.832003] [<ffffffff81093
[ 3592.832003] [<ffffffff81095
[ 3592.832003] [<ffffffff81013
[ 3592.832003] [<ffffffff81012
[ 3592.832003] [<ffffffff81011
[ 3592.832003] [<ffffffff81053
[ 3592.832003] [<ffffffff8106f
[ 3592.832003] [<ffffffff81011
[ 3592.832003] [<ffffffff81013
[ 3592.832003] [<ffffffff81053
[ 3592.832003] [<ffffffff81025
[ 3592.832003] [<ffffffff81011
[ 3592.832003] <EOI>
[ 3592.832003] [<ffffffff8118e
[ 3592.832003] [<ffffffffa010f
[ 3592.832003] [<ffffffffa008e
[ 3592.832003] [<ffffffffa008f
[ 3592.832003] [<ffffffffa008f
[ 3592.832003] [<ffffffffa008f
[ 3592.832003] [<ffffffff81064
[ 3592.832003] [<ffffffff81011
[ 3592.832003] [<ffffffff81064
[ 3592.832003] [<ffffffff81011
[ 3592.832003] Code: 48 c7 c7 82 92 11 a0 eb 63 48 8b 98 38 01 00 00 48 8d b8 28 01 00 00 e8 df d5 0f e1 48 89 da 48 89 c6 48 c7 c7 bc 92 11 a0 eb 6d <49> 8b 96 58 03 00 00 48 8b 82 80 00 00 00 48 8b a8 b0 00 00 00
[ 3592.832003] RIP [<ffffffffa0114
[ 3592.832003] RSP <ffff880001803cb0>
[ 3592.832003] CR2: 0000000000000358
[ 3592.867935] ---[ end trace 06f90ebbdbd172ee ]---
[ 3592.868360] Kernel panic - not syncing: Fatal exception in interrupt
[ 3592.868906] Pid: 193, comm: scsi_eh_0 Tainted: P D 2.6.32-3-amd64 #1
[ 3592.869511] Call Trace:
[ 3592.869727] <IRQ> [<ffffffff812ed
[ 3592.870225] [<ffffffff81011
[ 3592.870778] [<ffffffff811af
[ 3592.871250] [<ffffffff81014
[ 3592.871694] [<ffffffff81014
[ 3592.872150] [<ffffffff81032
[ 3592.872626] [<ffffffff81032
[ 3592.873185] [<ffffffff81068
[ 3592.873576] [<ffffffff8104e
[ 3592.874125] [<ffffffff81024
[ 3592.874642] [<ffffffff812ef
[ 3592.875103] [<ffffffffa0114
[ 3592.875678] [<ffffffff81195
[ 3592.876162] [<ffffffffa0116
[ 3592.876748] [<ffffffff8103f
[ 3592.877224] [<ffffffffa010f
[ 3592.877800] [<ffffffff81093
[ 3592.878319] [<ffffffff81095
[ 3592.878848] [<ffffffff81013
[ 3592.879305] [<ffffffff81012
[ 3592.879744] [<ffffffff81011
[ 3592.880237] [<ffffffff81053
[ 3592.880723] [<ffffffff8106f
[ 3592.881284] [<ffffffff81011
[ 3592.881762] [<ffffffff81013
[ 3592.882230] [<ffffffff81053
[ 3592.882691] [<ffffffff81025
[ 3592.883258] [<ffffffff81011
[ 3592.883795] <EOI> [<ffffffff8118e
[ 3592.884319] [<ffffffffa010f
[ 3592.884917] [<ffffffffa008e
[ 3592.885522] [<ffffffffa008f
[ 3592.886152] [<ffffffffa008f
[ 3592.886789] [<ffffffffa008f
[ 3592.887398] [<ffffffff81064
[ 3592.887836] [<ffffffff81011
[ 3592.888290] [<ffffffff81064
[ 3592.888721] [<ffffffff81011
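The oops above can be cross-checked by hand: in the `Code:` line the byte marked `<49>` starts the faulting instruction, and decoding it shows which register held the NULL pointer. The following is a minimal sketch, not a general disassembler. It handles only the one encoding that appears here (`REX.W 8B /r` with a 32-bit displacement and no SIB byte), and the function name `decode_mov_load` is made up for illustration.

```python
import struct

# x86-64 general-purpose registers in encoding order.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code):
    """Decode 'REX.W 8B /r' (mov r64, [r64 + disp32]); no SIB support.

    Returns (destination register, base register, displacement).
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08) or opcode != 0x8B:
        raise ValueError("not a REX.W 'mov r64, r/m64' instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the disp32, no-SIB form is handled")
    reg |= (rex & 0x04) << 1  # REX.R extends the destination register
    rm |= (rex & 0x01) << 3   # REX.B extends the base register
    (disp,) = struct.unpack_from("<i", code, 3)  # little-endian disp32
    return REGS64[reg], REGS64[rm], disp


# Bytes taken from the Code: line, starting at the <49> marker.
dest, base, disp = decode_mov_load(bytes.fromhex("498b9658030000"))
print(f"mov {dest}, [{base} + {disp:#x}]")
```

The decoded instruction loads from `r14 + 0x358`. The register dump shows R14 as 0 and CR2 as 0x358, so the fault is a read of a field at offset 0x358 through a NULL pointer held in R14.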
Unfortunately, I have no idea how to reproduce the problem.
Log from /var/log/
lsi_scsi: error: Unimplemented message 0x0c
What is more, I had 7 VMs running. Today four of them crashed at the same time. The rest survived, with something like this in syslog:
[651330.816043] sd 0:0:0:0: [sda] ABORT operation started
[651335.860027] sd 0:0:0:0: ABORT operation timed-out.
[651335.860600] sd 0:0:0:0: [sda] ABORT operation started
[651337.019355] sd 0:0:0:0: ABORT operation complete.
[651337.038506] sd 0:0:0:0: [sda] ABORT operation started
[651337.039100] sd 0:0:0:0: ABORT operation failed.
[651337.039624] sd 0:0:0:0: [sda] ABORT operation started
[651337.040303] sd 0:0:0:0: ABORT operation failed.
[651337.040834] sd 0:0:0:0: [sda] ABORT operation started
[651337.041417] sd 0:0:0:0: ABORT operation failed.
[651337.041949] sd 0:0:0:0: [sda] ABORT operation started
[651337.042534] sd 0:0:0:0: ABORT operation failed.
[651337.043072] sd 0:0:0:0: [sda] DEVICE RESET operation started
[651337.043834] scsi target0:0:0: control msgout: c.
[651337.520075] scsi target0:0:0: has been reset
[651337.521726] sd 0:0:0:0: DEVICE RESET operation complete.
[651337.522495] sd 0:0:0:0: M_REJECT received (0:0).
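To check other guests for the same sequence of SCSI error-handler messages, a small script can count them in a saved kernel log. This is only a sketch: the regular expression is written against the message format shown in the excerpts in this report, the function names are hypothetical, and treating any timed-out or failed step as a warning sign is an assumption, not a confirmed diagnostic.

```python
import re
import sys
from collections import Counter

# Matches lines such as
# "[651335.860027] sd 0:0:0:0: ABORT operation timed-out."
EH_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sd\s+(?P<dev>[\d:]+):\s+"
    r"(?:\[\w+\]\s+)?"
    r"(?P<op>ABORT|DEVICE RESET|BUS RESET) operation "
    r"(?P<outcome>started|timed-out|complete|failed)"
)


def summarize_scsi_eh(lines):
    """Count (operation, outcome) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = EH_LINE.search(line)
        if match:
            counts[(match.group("op"), match.group("outcome"))] += 1
    return counts


def looks_affected(counts):
    """Assumed heuristic: any recovery step that timed out or failed."""
    return any(outcome in ("timed-out", "failed") for _, outcome in counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the caller, e.g. a copy of the guest syslog.
    with open(sys.argv[1], errors="replace") as log:
        result = summarize_scsi_eh(log)
    for (op, outcome), n in sorted(result.items()):
        print(f"{op} {outcome}: {n}")
    print("affected" if looks_affected(result) else "no failed recovery steps")
```

Run against each guest's syslog, this shows which machines already logged timed-out or failed ABORT and reset operations.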
It looks like the problem is in the host system and affects all machines at the same time. I found the same pattern in syslog on the machines that crashed; it appeared 3 days before the crash. There is no information in the host log files at all. Is it possible that Eucalyptus (1.6.2) caused this? With 1.6.1 I didn't have these problems. Eucalyptus runs kvm (0.12 and 0.11) with these commands:
/usr/local/
/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 512 -smp 1 -name i-492407F3 -uuid b2dc266e-
I can provide access to the VM.
If you can recreate the issue, please update the bug report with information about how to recreate the problem.