kvm network instability

Bug #228163 reported by Paolo Losi
Affects | Status | Importance | Assigned to | Milestone
kvm (Ubuntu) | Invalid | Undecided | Unassigned |

Bug Description

Binary package hint: kvm

Info:
Stock Ubuntu 8.04 amd64 server.

Symptom:

The network periodically goes down or becomes slow.
It seems to happen regularly, with a 60-minute period.

The host kernel reports:

[848228.108418] Call Trace:
[848228.108425] [<ffffffff882bee69>] :kvm:gfn_to_memslot+0x9/0x20
[848228.108433] [<ffffffff882c72df>] :kvm:kvm_mmu_get_page+0x41f/0x490
[848228.108443] [<ffffffff882c87fe>] :kvm:kvm_mmu_load+0x10e/0x220
[848228.108452] [<ffffffff882c48cf>] :kvm:kvm_arch_vcpu_ioctl_run+0x2df/0x620
[848228.108461] [<ffffffff882bfddd>] :kvm:kvm_vcpu_ioctl+0x33d/0x350
[848228.108465] [<ffffffff803ef2b9>] netif_rx_ni+0x19/0x20
[848228.108469] [<ffffffff8832ffd2>] :tun:tun_chr_aio_write+0x142/0x250
[848228.108472] [<ffffffff8025eef0>] futex_wake+0x50/0xf0
[848228.108477] [<ffffffff8025fde4>] do_futex+0x134/0xbc0
[848228.108480] [<ffffffff802499fd>] __dequeue_signal+0x2d/0x1e0
[848228.108489] [<ffffffff882bf715>] :kvm:kvm_vm_ioctl+0x85/0x200
[848228.108491] [<ffffffff802496ee>] recalc_sigpending+0xe/0x40
[848228.108494] [<ffffffff8024af49>] dequeue_signal+0x59/0x150
[848228.108498] [<ffffffff8024bf7f>] sys_rt_sigtimedwait+0x11f/0x2c0
[848228.108501] [<ffffffff802c2a9f>] do_ioctl+0x2f/0xa0
[848228.108504] [<ffffffff802c2d30>] vfs_ioctl+0x220/0x2c0
[848228.108506] [<ffffffff802b558e>] vfs_write+0x14e/0x190
[848228.108510] [<ffffffff802c2e61>] sys_ioctl+0x91/0xb0
[848228.108513] [<ffffffff8020c37e>] system_call+0x7e/0x83

Neal McBurnett (nealmcb) wrote :

Thank you for reporting your experiences with Ubuntu. Please provide more information on your hardware and networking configuration. Where did the traceback come from and when does it happen in relation to the slowdowns?
See also e.g. https://wiki.ubuntu.com/DebuggingHardwareDetection

Changed in kvm:
status: New → Incomplete
Soren Hansen (soren) wrote :

What is your guest OS?

Could you paste the entire error from dmesg? Including all the registers and all that? Thanks.
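
One way to capture the whole error rather than just the stack is to pull each trace out of the kernel log together with the lines around it. The following is only a sketch: the helper name `extract_traces`, the context sizes, and the output file name are assumptions chosen for illustration, not something taken from this report.

```shell
# Print every "Call Trace:" line found on stdin together with the lines
# around it, so that any register dump before the trace and the stack
# after it stay together. The context sizes (20 before, 40 after) are
# arbitrary choices and may need adjusting.
extract_traces() {
    grep -B 20 -A 40 'Call Trace:'
}

# Typical use on the host (reading dmesg may require root on some systems):
#   dmesg | extract_traces > kvm-trace.txt
```

The resulting file can then be attached to the bug report as-is.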

Michele Cella (mcella82) wrote :

Hi Neal and Soren,

I'm replying on Paolo's behalf. First of all, thanks for your attention, and sorry for our late reply; we've been trying different things to resolve or pin down the real problem we're experiencing.

We are running hardy with a libvirt + kvm setup on an 8-CPU machine with 16 GB of RAM. We're using several guest operating systems: some newly installed (hardy jeos), two ported directly from our old qemu setup (running dapper), a winxppro machine, and a centos 5 guest ported from a vmware image.

Shortly after porting our setup to kvm we started noticing issues with the old dapper machine running the zabbix monitoring system: periodically (every 60 minutes or so) the network went down. We noticed this from the zabbix CPU utilization graphs, which showed periodic gaps of about 15 minutes. We also noticed network problems with the other machines, but we have not been able to reproduce them reliably. The centos machine (running trixbox) sometimes requires a networking restart to start working again.

Our problems seem pretty similar (but not identical) to the ones described here:

https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/194304

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/13537

At the moment we are running kvm-68 on hardy; I backported Soren's intrepid package to hardy. Basically nothing changed with kvm-68, except that we no longer see the dmesg error (so I'm wondering whether the dmesg error and the problem are really related). Today I finally found a working combination for the zabbix guest: I ported it to a jeos machine with virtio as the network device, and it seems to be working properly (so far).

I'm going to attach the information Neal requested. Regarding the entire dmesg error and the registers: the trace posted by Paolo is everything we found, repeated, in our dmesg; kvm reported nothing else there.

We'll promptly report back with additional information as we find it.

Thanks again for your attention.

Michele Cella (mcella82) wrote :
Michele Cella (mcella82) wrote :
Michele Cella (mcella82) wrote :
Paolo Losi (paolo-enuan) wrote :

We confirm that the problem has been solved using virtio.

There appears to be a problem in the qemu/kvm network emulation and/or the Ubuntu guest network drivers.
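
Since the outcome depends on which NIC model each guest uses, it can help to check what libvirt has configured. The sketch below assumes the guests are libvirt domains whose XML uses the usual `<model type='...'/>` element inside `<interface>`. The helper name `nic_models` and the domain name in the usage comment are hypothetical.

```shell
# Print the NIC model of every <interface> in a libvirt domain XML read
# from stdin. Interfaces without an explicit <model> element print nothing,
# which means the hypervisor default model is in use.
nic_models() {
    sed -n '/<interface/,/<\/interface>/p' |
        grep -o "<model type='[^']*'" |
        cut -d "'" -f 2
}

# Typical use on the host; "zabbix-guest" is a hypothetical domain name:
#   virsh dumpxml zabbix-guest | nic_models
```

A guest switched to virtio should print `virtio` here.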

Soren Hansen (soren) wrote :

Just to clarify: So with the kvm version shipped in Hardy on the host, and with virtio_net in the guest, it works?

Michele Cella (mcella82) wrote :

Hi Soren,

No, at the moment we're using a backported deb of your kvm-68 intrepid package on our host. We've been using it in production and everything has been running smoothly for two weeks. We want to try again with the hardy package, but we don't know yet when that will be; we will report back our findings.

Thanks again for your attention.

Soren Hansen (soren) wrote :

Did you recompile the kernel modules or are you using the ones from Hardy?

Michele Cella (mcella82) wrote :

Hi Soren,

Good news...

We're back to kvm-62 (stock hardy setup) and everything seems to be working fine this time: we're running seven guests and we haven't noticed any kernel trace or network problem so far.

This leaves us with two variables that could explain the problems we've been experiencing originally:

- zabbix server > 1.4.2
- dapper as guest

We were running a compiled version of zabbix 1.4.5 (latest release) on a dapper guest; now we're running zabbix 1.4.2 (from deb) on a hardy/jeos guest.

Please note that we've also removed virtio as the network card model and everything still works right...

At this point we think the problem was (is) caused by zabbix, the dapper guest, or a combination of the two.

Thanks for your attention, feel free to invalidate this bug report if the problems come back we will eventually reopen it and add more details.

PS
Regarding kvm-68: I recompiled the modules using "module-assistant auto-install --force kvm" right after installing all the debs generated from your intrepid sources. This is what was installed:

# dpkg -l | grep kvm
ii kvm 1:68+dfsg-0ubuntu1 Full virtualization on x86 hardware
ii kvm-data 1:68+dfsg-0ubuntu1 Data files for the KVM package
ii kvm-modules-2.6.24-16-server 1:68+dfsg-0ubuntu1+2.6.24-16.30 kvm modules for Linux (kernel 2.6.24-16-serv
ii kvm-source 1:68+dfsg-0ubuntu1 Source for the KVM driver

Note that I had to manually put kvm*.ko modules inside "/lib/modules/2.6.24-16-server/kernel/arch/x86/kvm" since the generated package puts them inside "/lib/modules/2.6.24-16-server/kernel/misc" and modprobe ignores them.
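
The manual relocation described above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper name `copy_kvm_modules` is hypothetical, and running `depmod` afterwards is an assumption about what is needed for modprobe to pick up the copied files; the report does not say whether it was run.

```shell
# Copy kvm*.ko from one module directory to another.
# Both paths are passed as arguments so nothing is hard-coded.
copy_kvm_modules() {
    src=$1
    dst=$2
    mkdir -p "$dst" && cp "$src"/kvm*.ko "$dst"/
}

# Typical use on the host as root, with the paths mentioned above:
#   base=/lib/modules/2.6.24-16-server/kernel
#   copy_kvm_modules "$base/misc" "$base/arch/x86/kvm"
#   depmod -a 2.6.24-16-server     # assumption: rebuild modules.dep afterwards
#   modinfo kvm | grep '^version'  # confirm which kvm.ko modprobe will load
```

Note that this overwrites the stock modules in the destination directory, so keeping a backup of the originals is advisable.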

That was the modinfo output:

# modinfo kvm
filename: /lib/modules/2.6.24-16-server/kernel/arch/x86/kvm/kvm.ko
license: GPL
author: Qumranet
version: kvm-68
srcversion: 864F52BC7981C1E04A13D0E
depends:
vermagic: 2.6.24-16-server SMP mod_unload

# modinfo kvm-intel
filename: /lib/modules/2.6.24-16-server/kernel/arch/x86/kvm/kvm-intel.ko
license: GPL
author: Qumranet
version: kvm-68
srcversion: C45B26FD80918EB572A6BEF
depends: kvm
vermagic: 2.6.24-16-server SMP mod_unload
parm: bypass_guest_pf:bool
parm: enable_vpid:bool
parm: flexpriority_enabled:bool
parm: enable_ept:bool

Just out of curiosity, what's the intended way to generate the modules package? Correct me if I'm wrong, but it seems there is no official way to update the modules without updating the whole kernel package. Back then I asked on IRC, but someone told me you were on vacation... ;-)

Yann Hamon (yannh) wrote :

I can confirm the problems with kvm-68: the network drops under heavy load.

On KVM62 it is fine, *but* only with virtio; if I don't use virtio I get network errors which eventually cause big problems (md5 checksum mismatch, ssh connection drops).

Dustin Kirkland (kirkland) wrote :

Hi Yann-

I think your issue sounds like a different one. The previous posters found that kvm-68 "solved" the issue for them, and that it was kvm62 that had problems.

Would you mind filing a new bug with some additional information, such as the guest os and host os, exact kernel version, loaded modules, kvm version, nic you're emulating, etc?
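
Most of the host-side details requested above can be gathered in one pass. The following is a minimal sketch, assuming a Debian/Ubuntu host. The helper names `section` and `collect_host_info` are hypothetical, and the guest OS and the emulated NIC model still have to be stated separately.

```shell
# Run a command and print its output under a header.
# A missing or failing command is noted instead of aborting the script.
section() {
    echo "== $* =="
    "$@" 2>&1 || echo "(command failed or not available)"
}

collect_host_info() {
    section uname -a                                   # exact host kernel version
    section lsb_release -a                             # host OS release
    section sh -c 'lsmod | grep -E "kvm|tun|bridge"'   # loaded modules of interest
    section sh -c 'dpkg -l | grep kvm'                 # installed kvm packages and versions
}

# Typical use on the host:
#   collect_host_info > host-info.txt
```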

:-Dustin

Dustin Kirkland (kirkland) wrote :

Hello, per Michele Cella's comments on 2008-05-30: "Thanks for your attention, feel free to invalidate this bug report if the problems come back we will eventually reopen it and add more details.", I'm going to close this bug as "Invalid."

Yann- If you're still experiencing a similar problem, of KVM without virtio dropping network under heavy load, please open a new bug with the pertinent details I requested on 2008-12-04.

Thanks,
:-Dustin

Changed in kvm:
status: Incomplete → Invalid