2k3/xp guests w/virtio-net randomly DHCP fail on boot

Bug #1736655 reported by Patrick Schweiger
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned
qemu (Ubuntu)
Expired
Undecided
Unassigned

Bug Description

Host:
Debian GNU/Linux 9 with Linux 4.13.0-0.bpo.1-amd64
QEMU 2.10.1

Guest:
Windows 2003 Standard SP2 (x64)
Windows XP SP3 (i386)

QEMU command line:
http://cfp.vim-cn.com/cbdF3

Description:
After upgrading from QEMU 2.8 to 2.10.1, my Windows 2003 x64 and Windows XP guests with "virtio-net-pci" NIC would randomly fail to aquire DHCP address on boot. When it fails, cycle disable/enable the connection in Control Panel could make it connect successfully. As an immediate workaround, I switched to the RTL8139 NIC which works fine. Further investigation showed that manually reverting commit '4a3f03ba8dbf53fce36d0c1dd5d0cc0f340fe5f3' on top of 2.10.1 "fixed" the problem.

Here are the bisect test results:
git branch 97cd965: 17 boots, 5 failures (Feb 18)
git branch 77424a4: 27 boots, 0 failure (Jan 09)
git branch c25d97c: 7 boots, 2 failures (Feb 01)
git branch 4a3f03b: 25 boots, 8 failures (Jan 19)
git branch 39f099e: 60 boots, 0 failure (Jan 18)
2.10.1 with revert: 194 boots, 0 failure
2.10.1 without revert: 30 boots, 4 failures

description: updated
description: updated
description: updated
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Patrick,
thank you so much for the report and your help to make Ubuntu better.
Also for all the bisecting that is very helpful.

While backporting the change itself would be very trivial we need to find what the final solution has to be first.

I checked latest upstream which is about to release 2.11 these days and there was not related change.
- no revert to this commit
- no other major disabling/fixes for ioeventfd in these cases

That would imply that this is an issue unknown to upstream - and that means that we need to make it known there to not end up carrying a Delta forever and deriving more and more.
So we should bring it up there and find a solution for everyone.

First of all let me try if I can reproduce it over here, then we can decide on next steps.
Very likely one of them would be to report it to upstream as well - which since you have a great bug description already they track Launchpad I can easily do for you by adding a bug Task.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Well I see you already opened it against upstream - sorry (too early for me) - great.
Adding an Ubuntu task then to track potential fixes to take eventually.

Changed in qemu (Ubuntu):
status: New → Incomplete
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I wanted an XP guest testbed anyway for some time, so this was the perfect reason to create one.
# virtio drivers from [1] and otherwise a "normal" virt-manager setup of a winxp sp3 (32bit) guest

- Modified to use virtio for net
- Modified to have the virtio drivers as floppy
- Installed virtio drivers for network afer base install

So after that I assume I'm at the same stage you are.
- WinXP Guest
- Virtio Driver for network
- Host provides DHCP to a bridged network

I set that to a rebot loop and checked the internet connection through the dhcp virtiop net after reboot. This worked through 10 reboots without an issue, but then IIRC with vhost ioeventfd was on all the time.

I realized the change identified by you should only make a differenc on non-vhost virtio-net.
But libvirt will make use of vhost automatically if available, so I set the interface to use driver qemu explicitly.

Another 10 reboots without issue - the virtio version is of 7/19/2017 version 51.75.104.14100
There must be something more to your case, so please if you have more info how to trigger the case let us know.

I really think I have the same case you were describing as affected by the change you identified:
$ virsh qemu-monitor-command --hmp winxp 'info network'
net0: index=0,type=nic,model=virtio-net-pci,macaddr=52:54:00:ce:ca:72
 \ hostnet0: index=0,type=tap,fd=26
Here from qtree on the device [2] ioeventfd is on.

Could you try and run it with vhost and see if this changes behavior it for you?

[1]: https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win_amd64.vfd
[2]: http://paste.ubuntu.com/26123654/

Changed in qemu:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for qemu (Ubuntu) because there has been no activity for 60 days.]

Changed in qemu (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.