qemu-kvm

e1000 irq problems after live migration with qemu-kvm 0.12.4

Bug #584510 reported by Peter Lieven on 2010-05-23

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	qemu-kvm	New	Undecided	Unassigned

Bug Description

After live migrating ubuntu 9.10 server (2.6.31-14-server) and suse linux 10.1 (2.6.16.13-4-smp)
it happens sometimes that the guest runs into irq problems. i mention these 2 guest oss
since i have seen the error there. there are likely others around with the same problem.

on the host i run 2.6.33.3 (kernel+mod) and qemu-kvm 0.12.4.

i started a vm with:
/usr/bin/qemu-kvm-0.12.4 -net tap,vlan=141,script=no,downscript=no,ifname=tap0 -net nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:72 -drive file=/dev/sdb,if=ide,boot=on,cache=none,aio=native -m 1024 -cpu qemu64,model_id='Intel(R) Xeon(R) CPU E5430 @ 2.66GHz' -monitor tcp:0:4001,server,nowait -vnc :1 -name 'migration-test-9-10' -boot order=dc,menu=on -k de -incoming tcp:172.21.55.22:5001 -pidfile /var/run/qemu/vm-155.pid -mem-path /hugepages -mem-prealloc -rtc base=utc,clock=host -usb -usbdevice tablet

for testing i have a clean ubuntu 9.10 server 64-bit install and created a small script with fetches a dvd iso from a local server and checking md5sum in an endless loop.

the download performance is approx. 50MB/s on that vm.

to trigger the error i did several migrations of the vm throughout the last days. finally I ended up in the following oops in the guest:

[64442.298521] irq 10: nobody cared (try booting with the "irqpoll" option)
[64442.299175] Pid: 0, comm: swapper Not tainted 2.6.31-14-server #48-Ubuntu
[64442.299179] Call Trace:
[64442.299185] <IRQ> [<ffffffff810b4b96>] __report_bad_irq+0x26/0xa0
[64442.299227] [<ffffffff810b4d9c>] note_interrupt+0x18c/0x1d0
[64442.299232] [<ffffffff810b5415>] handle_fasteoi_irq+0xd5/0x100
[64442.299244] [<ffffffff81014bdd>] handle_irq+0x1d/0x30
[64442.299246] [<ffffffff810140b7>] do_IRQ+0x67/0xe0
[64442.299249] [<ffffffff810129d3>] ret_from_intr+0x0/0x11
[64442.299266] [<ffffffff810b3234>] ? handle_IRQ_event+0x24/0x160
[64442.299269] [<ffffffff810b529f>] ? handle_edge_irq+0xcf/0x170
[64442.299271] [<ffffffff81014bdd>] ? handle_irq+0x1d/0x30
[64442.299273] [<ffffffff810140b7>] ? do_IRQ+0x67/0xe0
[64442.299275] [<ffffffff810129d3>] ? ret_from_intr+0x0/0x11
[64442.299290] [<ffffffff81526b14>] ? _spin_unlock_irqrestore+0x14/0x20
[64442.299302] [<ffffffff8133257c>] ? scsi_dispatch_cmd+0x16c/0x2d0
[64442.299307] [<ffffffff8133963a>] ? scsi_request_fn+0x3aa/0x500
[64442.299322] [<ffffffff8125fafc>] ? __blk_run_queue+0x6c/0x150
[64442.299324] [<ffffffff8125fcbb>] ? blk_run_queue+0x2b/0x50
[64442.299327] [<ffffffff8133899f>] ? scsi_run_queue+0xcf/0x2a0
[64442.299336] [<ffffffff81339a0d>] ? scsi_next_command+0x3d/0x60
[64442.299338] [<ffffffff8133a21b>] ? scsi_end_request+0xab/0xb0
[64442.299340] [<ffffffff8133a50e>] ? scsi_io_completion+0x9e/0x4d0
[64442.299348] [<ffffffff81036419>] ? default_spin_lock_flags+0x9/0x10
[64442.299351] [<ffffffff8133224d>] ? scsi_finish_command+0xbd/0x130
[64442.299353] [<ffffffff8133aa95>] ? scsi_softirq_done+0x145/0x170
[64442.299356] [<ffffffff81264e6d>] ? blk_done_softirq+0x7d/0x90
[64442.299368] [<ffffffff810651fd>] ? __do_softirq+0xbd/0x200
[64442.299370] [<ffffffff810131ac>] ? call_softirq+0x1c/0x30
[64442.299372] [<ffffffff81014b85>] ? do_softirq+0x55/0x90
[64442.299374] [<ffffffff81064f65>] ? irq_exit+0x85/0x90
[64442.299376] [<ffffffff810140c0>] ? do_IRQ+0x70/0xe0
[64442.299379] [<ffffffff810129d3>] ? ret_from_intr+0x0/0x11
[64442.299380] <EOI> [<ffffffff810356f6>] ? native_safe_halt+0x6/0x10
[64442.299390] [<ffffffff8101a20c>] ? default_idle+0x4c/0xe0
[64442.299395] [<ffffffff815298f5>] ? atomic_notifier_call_chain+0x15/0x20
[64442.299398] [<ffffffff81010e02>] ? cpu_idle+0xb2/0x100
[64442.299406] [<ffffffff815123c6>] ? rest_init+0x66/0x70
[64442.299424] [<ffffffff81838047>] ? start_kernel+0x352/0x35b
[64442.299427] [<ffffffff8183759a>] ? x86_64_start_reservations+0x125/0x129
[64442.299429] [<ffffffff81837698>] ? x86_64_start_kernel+0xfa/0x109
[64442.299433] handlers:
[64442.299840] [<ffffffffa0000b80>] (e1000_intr+0x0/0x190 [e1000])
[64442.300046] Disabling IRQ #10

After this the guest is still allive, but download performance is down to approx. 500KB/s

This error is definetly not triggerable with option -no-kvm-irqchip. I have seen this error occasionally
since my first experiments with qemu-kvm-88 and also without hugetablefs.

Help appreciated.

Tags:

Revision history for this message

Peter Lieven (plieven) wrote on 2010-05-24:

I did 2 additional tests

1) Stop VM, Live Migrate, Continue -> Triggers BUG
2) Stop VM, Continue -> Does NOT trigger BUG.

My guess it seems that pending interrupts are incorrectly transferred with kernel irqchip.
As said earlier userspace irqchip does not trigger the bug.

affects:

qemu → qemu-kvm

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.