e1000 irq problems after live migration with qemu-kvm 0.12.4

Bug #584510 reported by Peter Lieven
This bug affects 1 person
Affects   Status  Importance  Assigned to  Milestone
qemu-kvm  New     Undecided   Unassigned

Bug Description

After live migrating Ubuntu 9.10 server (2.6.31-14-server) and SUSE Linux 10.1 (2.6.16.13-4-smp)
guests, the guest sometimes runs into IRQ problems. I mention these two guest OSes
because I have seen the error there; other guests are likely affected as well.

On the host I run kernel 2.6.33.3 (kernel and modules) and qemu-kvm 0.12.4.

I started a VM with:
/usr/bin/qemu-kvm-0.12.4 -net tap,vlan=141,script=no,downscript=no,ifname=tap0 -net nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:72 -drive file=/dev/sdb,if=ide,boot=on,cache=none,aio=native -m 1024 -cpu qemu64,model_id='Intel(R) Xeon(R) CPU E5430 @ 2.66GHz' -monitor tcp:0:4001,server,nowait -vnc :1 -name 'migration-test-9-10' -boot order=dc,menu=on -k de -incoming tcp:172.21.55.22:5001 -pidfile /var/run/qemu/vm-155.pid -mem-path /hugepages -mem-prealloc -rtc base=utc,clock=host -usb -usbdevice tablet

For testing I have a clean Ubuntu 9.10 server 64-bit install and created a small script which fetches a DVD ISO from a local server and checks its md5sum in an endless loop.

The download performance is approx. 50 MB/s on that VM.
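
The load script itself is not included in this report. A minimal Python sketch of such a fetch-and-verify loop, assuming the ISO is reachable by URL and its expected MD5 is known in advance, could look like this. The names `md5_of_url` and `run_load_loop` are hypothetical, and the `max_iterations` parameter exists only so the loop can be bounded for a dry run.

```python
import hashlib
import time
import urllib.request


def md5_of_url(url, chunk_size=1 << 20):
    """Stream the resource at url and return (md5 hex digest, bytes read)."""
    digest = hashlib.md5()
    total = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total


def run_load_loop(url, expected_md5, max_iterations=None):
    """Download url repeatedly and compare its MD5 against expected_md5.

    Returns False on the first mismatch. With max_iterations=None the loop
    is endless (as in the report); a number bounds it and returns True if
    every download matched.
    """
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        start = time.monotonic()
        actual, size = md5_of_url(url)
        elapsed = max(time.monotonic() - start, 1e-9)
        # Throughput is printed so a drop after migration is easy to spot.
        print(f"run {iteration}: {size / elapsed / 1e6:.1f} MB/s, md5 {actual}")
        if actual != expected_md5:
            return False
    return True
```

Printing the throughput on every pass makes the slowdown visible without looking at the guest's kernel log.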

To trigger the error I migrated the VM several times over the last few days. Eventually I ended up with the following oops in the guest:

[64442.298521] irq 10: nobody cared (try booting with the "irqpoll" option)
[64442.299175] Pid: 0, comm: swapper Not tainted 2.6.31-14-server #48-Ubuntu
[64442.299179] Call Trace:
[64442.299185] <IRQ> [<ffffffff810b4b96>] __report_bad_irq+0x26/0xa0
[64442.299227] [<ffffffff810b4d9c>] note_interrupt+0x18c/0x1d0
[64442.299232] [<ffffffff810b5415>] handle_fasteoi_irq+0xd5/0x100
[64442.299244] [<ffffffff81014bdd>] handle_irq+0x1d/0x30
[64442.299246] [<ffffffff810140b7>] do_IRQ+0x67/0xe0
[64442.299249] [<ffffffff810129d3>] ret_from_intr+0x0/0x11
[64442.299266] [<ffffffff810b3234>] ? handle_IRQ_event+0x24/0x160
[64442.299269] [<ffffffff810b529f>] ? handle_edge_irq+0xcf/0x170
[64442.299271] [<ffffffff81014bdd>] ? handle_irq+0x1d/0x30
[64442.299273] [<ffffffff810140b7>] ? do_IRQ+0x67/0xe0
[64442.299275] [<ffffffff810129d3>] ? ret_from_intr+0x0/0x11
[64442.299290] [<ffffffff81526b14>] ? _spin_unlock_irqrestore+0x14/0x20
[64442.299302] [<ffffffff8133257c>] ? scsi_dispatch_cmd+0x16c/0x2d0
[64442.299307] [<ffffffff8133963a>] ? scsi_request_fn+0x3aa/0x500
[64442.299322] [<ffffffff8125fafc>] ? __blk_run_queue+0x6c/0x150
[64442.299324] [<ffffffff8125fcbb>] ? blk_run_queue+0x2b/0x50
[64442.299327] [<ffffffff8133899f>] ? scsi_run_queue+0xcf/0x2a0
[64442.299336] [<ffffffff81339a0d>] ? scsi_next_command+0x3d/0x60
[64442.299338] [<ffffffff8133a21b>] ? scsi_end_request+0xab/0xb0
[64442.299340] [<ffffffff8133a50e>] ? scsi_io_completion+0x9e/0x4d0
[64442.299348] [<ffffffff81036419>] ? default_spin_lock_flags+0x9/0x10
[64442.299351] [<ffffffff8133224d>] ? scsi_finish_command+0xbd/0x130
[64442.299353] [<ffffffff8133aa95>] ? scsi_softirq_done+0x145/0x170
[64442.299356] [<ffffffff81264e6d>] ? blk_done_softirq+0x7d/0x90
[64442.299368] [<ffffffff810651fd>] ? __do_softirq+0xbd/0x200
[64442.299370] [<ffffffff810131ac>] ? call_softirq+0x1c/0x30
[64442.299372] [<ffffffff81014b85>] ? do_softirq+0x55/0x90
[64442.299374] [<ffffffff81064f65>] ? irq_exit+0x85/0x90
[64442.299376] [<ffffffff810140c0>] ? do_IRQ+0x70/0xe0
[64442.299379] [<ffffffff810129d3>] ? ret_from_intr+0x0/0x11
[64442.299380] <EOI> [<ffffffff810356f6>] ? native_safe_halt+0x6/0x10
[64442.299390] [<ffffffff8101a20c>] ? default_idle+0x4c/0xe0
[64442.299395] [<ffffffff815298f5>] ? atomic_notifier_call_chain+0x15/0x20
[64442.299398] [<ffffffff81010e02>] ? cpu_idle+0xb2/0x100
[64442.299406] [<ffffffff815123c6>] ? rest_init+0x66/0x70
[64442.299424] [<ffffffff81838047>] ? start_kernel+0x352/0x35b
[64442.299427] [<ffffffff8183759a>] ? x86_64_start_reservations+0x125/0x129
[64442.299429] [<ffffffff81837698>] ? x86_64_start_kernel+0xfa/0x109
[64442.299433] handlers:
[64442.299840] [<ffffffffa0000b80>] (e1000_intr+0x0/0x190 [e1000])
[64442.300046] Disabling IRQ #10

After this the guest is still alive, but download performance is down to approx. 500 KB/s.
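
The signature in the oops above (an "irq N: nobody cared" line, a "handlers:" list, then "Disabling IRQ #N") can be detected mechanically, for example when checking guest dmesg output after each migration. The following is only a sketch based on the log format shown in this report. The function name `scan_irq_oops` is hypothetical, and the handler pattern assumes the handler lives in a module, as `e1000_intr` does here.

```python
import re

# Patterns taken from the log lines quoted in this report.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
HANDLER = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def scan_irq_oops(log_text):
    """Return {irq: {"handlers": [(function, module), ...], "disabled": bool}}."""
    report = {}
    current = None        # IRQ number of the oops being parsed
    in_handlers = False   # True between "handlers:" and "Disabling IRQ"
    for line in log_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            current = int(match.group(1))
            report.setdefault(current, {"handlers": [], "disabled": False})
            in_handlers = False
            continue
        if current is not None and line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        match = DISABLED.search(line)
        if match:
            irq = int(match.group(1))
            entry = report.setdefault(irq, {"handlers": [], "disabled": False})
            entry["disabled"] = True
            in_handlers = False
            current = None
            continue
        if in_handlers:
            match = HANDLER.search(line)
            if match:
                report[current]["handlers"].append(
                    (match.group(1), match.group(2))
                )
    return report
```

Run against the trace in this report, it would associate IRQ 10 with the `e1000_intr` handler from the `e1000` module and mark the IRQ as disabled.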

This error is definitely not triggerable with the -no-kvm-irqchip option. I have seen it occasionally
since my first experiments with qemu-kvm-88, also without hugetlbfs.

Help appreciated.

Peter Lieven (plieven) wrote :

I did 2 additional tests:

1) Stop VM, Live Migrate, Continue -> Triggers BUG
2) Stop VM, Continue -> Does NOT trigger BUG.

My guess is that pending interrupts are transferred incorrectly when the kernel irqchip is used.
As said earlier, the userspace irqchip does not trigger the bug.
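
For anyone who wants to repeat the two tests, the steps map onto the monitor commands `stop`, `migrate` and `cont`. The sketch below only builds the command sequence. It is an assumption, not something stated in this comment, that `cont` is issued on the destination monitor after a migration and on the source monitor otherwise. The function name `migration_test_plan` is hypothetical, and the example URI is taken from the `-incoming` option in the report.

```python
def migration_test_plan(migrate_uri=None):
    """Return the monitor commands for one test as (monitor, command) pairs.

    migrate_uri=None gives test 2 (stop, continue); a URI such as
    "tcp:172.21.55.22:5001" gives test 1 (stop, migrate, continue).
    """
    steps = [("source", "stop")]
    if migrate_uri is None:
        # Test 2: the VM never leaves the source host.
        steps.append(("source", "cont"))
    else:
        # Test 1: plain "migrate" (without -d) waits for completion.
        steps.append(("source", f"migrate {migrate_uri}"))
        # Assumption: the guest is resumed on the destination side.
        steps.append(("destination", "cont"))
    return steps


def format_plan(steps):
    """Render a plan as one 'monitor: command' line per step."""
    return "\n".join(f"{monitor}: {command}" for monitor, command in steps)
```

For example, `print(format_plan(migration_test_plan("tcp:172.21.55.22:5001")))` lists the three steps of test 1 in order.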

affects: qemu → qemu-kvm