libvirt-bin on latest lucid: heavy packet loss

Bug #571408 reported by Fionn
This bug affects 1 person
Affects: libvirt (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I have to report a very strange bug here which I came upon today:

I am running a quad-core server with 4 KVM guests. The host and all guests run the latest lucid. The guests use a bridged setup for outside communication as well as the default KVM network for guest-to-guest communication.
This setup was complete and already worked while I set up the guests, until yesterday, when the host and all guests started to show heavy packet loss on their network interfaces. The host did not show any packet loss if libvirt was not started on boot; once libvirt was started, packets began to fall into oblivion. Overall, the loss rate was about 75-95%.
I first thought that one of the latest libvirt or kernel updates might be the culprit, but downgrading all libvirt and kvm packages as well as going back one kernel version did not change the situation.

Then I began to experiment with iptables and network settings and after a couple of hours found out:
When the packet loss occurs, doing
sysctl -w net.ipv4.ip_forward=0

stops the packet loss problem immediately. But even more interestingly, setting ip_forward back to 1 does NOT cause the packet loss problem to reappear! The system now runs as expected again, and I made a little checker script that executes the two sysctls a few seconds apart should packet loss be detected again.
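A checker of the kind described above might look roughly like the following. This is only a minimal sketch, not the script actually used on the affected server: the ping target `192.0.2.1`, the 50% threshold, the pause lengths, and all function names are assumptions made for illustration. It has to run as root, since it writes to `net.ipv4.ip_forward`.

```python
import re
import subprocess
import time

# Matches the summary line printed by ping, e.g. "75% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Return the loss percentage from ping's summary, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def measure_loss(host, count=20):
    """Ping the host and return the loss percentage.

    If ping prints no summary at all, treat that as total loss.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-q", host],
        capture_output=True,
        text=True,
    )
    loss = parse_loss(result.stdout)
    return 100.0 if loss is None else loss


def toggle_ip_forward(pause=5):
    """Switch IPv4 forwarding off, wait a few seconds, switch it on again."""
    subprocess.run(["sysctl", "-w", "net.ipv4.ip_forward=0"], check=True)
    time.sleep(pause)
    subprocess.run(["sysctl", "-w", "net.ipv4.ip_forward=1"], check=True)


def main(host="192.0.2.1", threshold=50.0, interval=60):
    """Loop forever: measure loss and toggle forwarding when it is too high.

    The host, threshold, and interval defaults are placeholders.
    """
    while True:
        if measure_loss(host) >= threshold:
            toggle_ip_forward()
        time.sleep(interval)
```

Calling `main()` with the address of a reliably reachable machine outside the host (the gateway, for instance) starts the loop. Note that guests lose forwarded connectivity for the few seconds that forwarding is switched off.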

Nevertheless, I thought this might be worth reporting and maybe one of you developers wants to have additional information.

Package information:
linux-image-2.6.32-21-server 2.6.32-21.32
qemu-kvm 0.12.3+noroms-0ubuntu9
libvirt-bin 0.7.5-5ubuntu27
libvirt0 0.7.5-5ubuntu27
python-libvirt 0.7.5-5ubuntu27
ubuntu-virt-server 1.2
qemu-common 0.12.3+noroms-0ubuntu9

C de-Avillez (hggdh2) wrote :

Thank you for opening this bug and helping make Ubuntu better. This is interesting... are there log entries for libvirt? In fact, any log entries related to this packet loss?

Changed in libvirt (Ubuntu):
status: New → Incomplete
Fionn (fbe) wrote :

On Monday, 03.05.2010, at 14:09 +0000, C de-Avillez wrote:

> Thank you for opening this bug and helping make Ubuntu better. This is
> interesting... are there log entries for libvirt? In fact, any log
> entries related to this packet loss?

Actually not. I tried syslog, dmesg, then xtail on /var/log but no
relevant entry seemed to appear. The packets just fell off the wagon. It
looks like some in-kernel issue to me, especially regarding the "fix" I
found. I just wonder why it did not happen earlier...

Philipp A. Baer (phbaer) wrote :

Same strange bug here (Lucid, 2.6.32-22 | 2.6.32.13, libvirt-bin 0.7.5, lxc | Xen, amd64, Intel i7) but unfortunately the trick doesn't work for me, i.e. disabling and enabling ip forwarding. As soon as ip forwarding is enabled, packets are "dropped" again arbitrarily.

Absolutely no supportive traces in syslog, dmesg, etc. Running tcpdump on icmp packets only revealed that no icmp packets are received when the packet loss occurs... I'll try to downgrade to Karmic or switch over to Debian.

Changed in libvirt (Ubuntu):
importance: Undecided → Medium
Philipp A. Baer (phbaer) wrote :

I'm still not quite sure what the exact reason for this behaviour is. After testing several kernels -- 2.6.26 (Lenny), 2.6.31 (self built), 2.6.32 (Lucid, Squeeze, and self built) --, distributions -- Lenny, Squeeze, Karmic, Lucid --, and two NICs -- RT, Intel -- I can't make head or tail of it; the behaviour is always the same.

It really seems somehow related to the hardware without any indication of one specific cause. I installed Lenny, Xen, and libvirt-bin earlier today. So far, there seems to be no packet loss any more -- on a system that actually idles all the time. I'll do some testing next week.

Thank you so far, I'll keep you in the loop.

Fionn (fbe) wrote :

I've got the problem back on the same server. This time I can't switch it off, whatever I do.

However, this forced me to strip down the issue to the following fact:

Switching on ip_forward starts the packet loss. Apparently it was only by accident that launching libvirt-bin switched it on. And I totally fail to see how switching it off and on again could have made it work before. As it is now, forwarding on means packet loss, forwarding off means none.
Even using a kernel that works flawlessly on a similar computer with a similar network card in the very same computing center fails to solve the issue.
The hoster's netboot rescue system, however, does NOT show the problem. I am totally lost on this and will probably have to reinstall the whole system.
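Since nothing shows up in syslog or dmesg, one way to narrow this down might be to compare the kernel's own IP counters with forwarding on and off. The following is only a sketch under assumptions: it reads the `Ip:` section of `/proc/net/snmp`, and the helper names as well as the choice of counters are illustrative. Whether the drops seen in this bug are counted there at all is not known.

```python
def parse_snmp(text, proto="Ip"):
    """Parse one protocol section of /proc/net/snmp into a dict.

    Each section consists of two lines with the same prefix: a line of
    counter names followed by a line of values.
    """
    prefix = proto + ":"
    rows = [
        line.split()[1:]
        for line in text.splitlines()
        if line.startswith(prefix)
    ]
    if len(rows) < 2:
        return {}
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def read_ip_counters(path="/proc/net/snmp"):
    """Read the current IP counters from the kernel."""
    with open(path) as handle:
        return parse_snmp(handle.read())


def counter_delta(before, after,
                  keys=("InReceives", "ForwDatagrams", "InDiscards",
                        "InAddrErrors", "OutDiscards", "OutNoRoutes")):
    """Return how much each selected counter grew between two snapshots."""
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}
```

Taking one snapshot with `read_ip_counters()`, running a ping test from another machine, taking a second snapshot, and passing both to `counter_delta` would show whether the kernel registers the missing packets as received, forwarded, or discarded.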

So, just to clarify, it seems not to be a libvirt-bin problem but some sort of general networking/hardware problem.

Launchpad Janitor (janitor) wrote :

[Expired for libvirt (Ubuntu) because there has been no activity for 60 days.]

Changed in libvirt (Ubuntu):
status: Incomplete → Expired