Network Performance Problem with GRE using Open vSwitch
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Wishlist | Unassigned |
neutron | Fix Released | Medium | Unassigned |
Bug Description
We are having GRE performance issues with a Juno installation: from a VM to the network node, we can only get 3 Gbit/s on a 10 Gbit/s interface. I eventually tracked down and solved the issue, but the fix requires patches to both nova and neutron.
The issue is caused by the MTU setting and the lack of multiqueue (MQ) virtio-net support in KVM. As the official OpenStack documentation suggests, the MTU is 1500 by default. This creates a bottleneck in the VMs: with a 1500-byte MTU and without MQ support enabled in KVM, a VM can only process about 3 Gbit/s of network traffic.
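Throughput figures like these can be reproduced with iperf3 between a VM and the network node; a minimal sketch (the server address is a placeholder, not from the report):

```shell
# On the network node: start an iperf3 server
iperf3 -s

# Inside the VM: run a 30-second test with 4 parallel streams
# (192.0.2.10 is a placeholder for the network node's IP)
iperf3 -c 192.0.2.10 -t 30 -P 4
```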
What I did to solve the issue:
1- Set the physical interface (em1) MTU to 9000
2- Set network_device_mtu = 8950 in nova.conf and neutron.conf (on both compute and network nodes)
3- Set the br-int MTU to 8950 manually
4- Set the br-tun MTU to 8976 manually
5- Set the VM MTU to 8950 via dnsmasq
6- Patch the nova config code to add a <driver name='vhost' queues='4'/> element to libvirt.xml
7- Run "ethtool -L eth0 combined 4" inside the VMs
With the network_device_mtu setting, the tap/qvo/qvb devices on the compute nodes and the internal legs of the router/DHCP namespaces on the network node get their MTU set automatically. However, that only solves half of the problem: I still had to set the MTU on the br-int and br-tun interfaces myself.
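The manual MTU changes in steps 1, 3, and 4 can be sketched with ip link; device names are from the report, and the commands assume they run as root on the relevant node:

```shell
# Step 1: jumbo frames on the physical interface
ip link set dev em1 mtu 9000

# Step 3: integration bridge, matching the 8950 MTU pushed to the VMs
ip link set dev br-int mtu 8950

# Step 4: tunnel bridge, slightly larger to leave room for GRE encapsulation
ip link set dev br-tun mtu 8976
```

Step 2 is a plain config line, `network_device_mtu = 8950`, in nova.conf and neutron.conf (typically in the `[DEFAULT]` section).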
To enable MQ support in KVM, I had to patch nova: there is currently no way to set queues in libvirt.xml. Without MQ, even with jumbo frames enabled, VMs are limited to about 5 Gbit/s, because each [vhost-xxxx] process is bound to a single CPU and the network load cannot be distributed to other CPUs. With MQ enabled, the [vhost-xxxx] threads can be spread across other cores, which gives 9.3 Gbit/s.
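For context, libvirt itself models multiqueue virtio-net with a `queues` attribute on the `<driver>` sub-element of `<interface>` in the domain XML; a minimal sketch of the fragment a patched nova would need to emit (the bridge name here is illustrative, not from the report):

```xml
<!-- Sketch of a multiqueue virtio-net interface in libvirt domain XML -->
<interface type='bridge'>
  <source bridge='qbrXXXX'/>
  <model type='virtio'/>
  <!-- 4 queues: vhost work can be spread across up to 4 vCPUs -->
  <driver name='vhost' queues='4'/>
</interface>
```

The guest still has to activate the extra queues, which is what step 7's `ethtool -L eth0 combined 4` does.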
I am attaching my ugly hacks just to give an idea of the code change. I know this is not the right way to do it; let's discuss how to properly address this issue.
Should I open a separate bug against nova, since this issue needs a change in the nova code as well?
Note: this is a different bug than https:/
Changed in nova:
status: New → Confirmed
importance: Undecided → Wishlist
For the full history of how I came up with a solution, please take a look at this thread on the openstack mailing list: http://lists.openstack.org/pipermail/openstack/2015-January/011207.html
This issue is solved by "ethtool -K em1 tx off" (disabling TX checksum offload on the physical interface), and the performance problem is resolved as I explained in the bug report.
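For anyone hitting the same symptom, it is worth inspecting the current offload state before toggling it; a sketch, assuming the physical NIC is em1:

```shell
# Show current offload settings (tx-checksumming, TSO, GSO, ...)
ethtool -k em1

# Disable TX checksum offload, as in the workaround above
ethtool -K em1 tx off
```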