Major network performance problems on AMD hardware

Bug #1036363 reported by Ziemowit Pierzycki
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Won't Fix
Undecided
Unassigned
qemu-kvm
New
Undecided
Unassigned

Bug Description

Hi,

I am experiencing some major performance problems with all of our beefy AMD Opteron 6274 servers running Fedora 17 (kernel 3.4.4-5.fc17.x86_64, qemu 1.0-17). The network performance between host and the virtual machine is terrible:

# iperf -c 10.10.11.22 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.10.11.22, TCP port 5001
TCP window size: 197 KByte (default)
------------------------------------------------------------
[ 5] local 10.10.11.199 port 44192 connected with 10.10.11.22 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 2.45 GBytes 2.11 Gbits/sec
[ 4] local 10.10.11.199 port 5001 connected with 10.10.11.22 port 42601
[ 4] 0.0-10.0 sec 8.97 GBytes 7.71 Gbits/sec

So the VM's receive is super slow. I would be happy with 7.71 Gbps because it's closer to matching the speed of the 10G ethernet adapters but the iSCSI drive's write performance is few times faster than read. Now running a similar test on the slowest machine I have, Intel core i3 I see this:

# iperf -c 192.168.7.60 -r
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.7.60, TCP port 5001
TCP window size: 306 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.7.98 port 53992 connected with 192.168.7.60 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 22.5 GBytes 19.3 Gbits/sec
[ 4] local 192.168.7.98 port 5001 connected with 192.168.7.60 port 53339
[ 4] 0.0-10.0 sec 25.1 GBytes 21.5 Gbits/sec

As you can image this is a huge difference in network IO. Most setups are identical down to the same versions. Vhost-net is enabled and it appears to use MSI-X on the VM. I've tried all kinds of settings and while they improve performance a little I feel it's just masking a bigger problem. All 12 of my AMD servers have this issue and it appears I'm not the only one complaining. Any help would be appreciated. Thanks.

Tags: kvm qemu virtio
Revision history for this message
Ziemowit Pierzycki (ziemowit) wrote :

I would also like to add that a VM running Windows 2008 R2 is having the same identical problem too.

Revision history for this message
Ziemowit Pierzycki (ziemowit) wrote :

Well this is embarrasing. Other Intel KVMs are having the same problem. The test where I got 20 gbps was actually on a Fedora 16 VM running on Fedora 16 KVM. I reinstalled both KVM and VM with Fedora 17 and best I got was 4 gbps.

Revision history for this message
Ziemowit Pierzycki (ziemowit) wrote :

I ran another test and here is a recap:

F16 KVM <-- 20gbps --> F16 VM
F17 KVM <-- 4 gbps --> 16 VM
F17 KVM <-- 4 gbps --> 17 VM

I'll check F16 KVM with both F16 and F17 VMs. This is all done on my Intel core i3.

Revision history for this message
Ziemowit Pierzycki (ziemowit) wrote :

Executed another test:

F16 KVM <-- 15 gbps --> F17 VM

So why is F16 much faster?

Revision history for this message
Stefan Hajnoczi (stefanha) wrote : Re: [Qemu-devel] [Bug 1036363] Re: Major network performance problems on AMD hardware

On Sun, Aug 26, 2012 at 1:08 AM, Ziemowit Pierzycki
<email address hidden> wrote:
> Executed another test:
>
> F16 KVM <-- 15 gbps --> F17 VM
>
> So why is F16 much faster?

You could try "bisecting" to find the change that slowed down networking:

$ git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
$ cd qemu-kvm

# Test qemu-kvm 0.15.0, which F16 qemu-kvm is based on
$ git checkout qemu-kvm-0.15.0
$ ./configure && make
$ ...run test...

# Test qemu-kvm 1.0, which F17 qemu-kvm is based on
$ git checkout qemu-kvm-1.0
$ ./configure && make
$ ...run test...

If the 0.15.0 vs 1.0 results show the same change as F16 vs F17, then
you can use git-bisect(1):

$ git bisect start qemu-kvm-1.0 qemu-kvm-0.15.0

http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
http://git-scm.com/book/en/Git-Tools-Debugging-with-Git#Binary-Search

The bisect will leave you at a commit where the performance regression
was introduced.

Stefan

Revision history for this message
Ziemowit Pierzycki (ziemowit) wrote :

Thank Stefan,

I compiled both 0.15 and 1.0 and they do not have that problem... but Fedora package does. Perhaps the way Fedora package was compiled? I'm going to grab a source package and attempt to compile from that.

Revision history for this message
Ziemowit Pierzycki (ziemowit) wrote :

Okay, looks like the performance issue started with introduction of pc-0.15 machine profile in version 1.0.1. I'll narrow the problem down further.

Revision history for this message
Thomas Huth (th-huth) wrote :

Looking through old bug tickets... can you still reproduce this issue with the latest version of QEMU? Or could we close this ticket nowadays?

Changed in qemu:
status: New → Incomplete
Thomas Huth (th-huth)
Changed in qemu:
status: Incomplete → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.