DPDK: Inter-VM iperf3 TCP throughput on the same host is very low compared to non-DPDK throughput

Bug #1651727 reported by Rajalakshmi Prabhakar
Affects               Status       Importance   Assigned to   Milestone
DPDK                  New          Undecided    Unassigned
devstack              Invalid      Undecided    Unassigned
networking-ovs-dpdk   Incomplete   Medium       Unassigned

Bug Description

Host: Ubuntu 16.04
devstack: stable/newton, which installs DPDK 16.07 and OVS 2.6

With the DPDK plugin and the following DPDK configuration:

Grub changes

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash default_hugepagesz=1G hugepagesz=1G hugepages=8 iommu=pt intel_iommu=on"

local.conf - changes for DPDK

enable_plugin networking-ovs-dpdk https://git.openstack.org/openstack/networking-ovs-dpdk master
OVS_DPDK_MODE=controller_ovs_dpdk
OVS_NUM_HUGEPAGES=8
OVS_CORE_MASK=2
OVS_PMD_CORE_MASK=4
OVS_DPDK_BIND_PORT=False
OVS_SOCKET_MEM=2048
OVS_DPDK_VHOST_USER_DEBUG=n
OVS_ALLOCATE_HUGEPAGES=True
OVS_HUGEPAGE_MOUNT_PAGESIZE=1G
MULTI_HOST=1
OVS_DATAPATH_TYPE=netdev

Before VM creation:

#nova flavor-key m1.small set hw:mem_page_size=1048576

Able to create two Ubuntu instances with flavor m1.small.

Achieved iperf3 TCP throughput of ~7.5 Gbps.
Ensured that the vhost-user ports are created and that hugepages are consumed once the 2 VMs are created: 2 GB each, i.e. 4 GB for the VMs plus 2 GB for the OVS socket memory, 6 GB in total.

$ sudo cat /proc/meminfo |grep Huge
AnonHugePages: 0 kB
HugePages_Total: 8
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB

The same scenario was carried out for the non-DPDK OpenStack case and achieved a higher throughput of ~19 Gbps, which contradicts the expected result. Kindly suggest what additional DPDK configuration should be done for high throughput. I also tried CPU pinning and multiqueue for OpenStack DPDK, but there was no improvement in the result.

Revision history for this message
Sergey Matov (smatov) wrote :

Hello. Just a few questions about the setup you are running.

1 - What is the NUMA topology on your host?
2 - If there are multiple NUMA nodes, how are the PMD threads spread across them?

The best performance is likely to be achieved when all PMD threads (as well as all vCPUs pinned to the VMs) are located on the same NUMA node.
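For illustration, a few commands that can be used to check this (the PMD core mask and instance name below are placeholder assumptions, not values taken from this report):

# show which cores the PMD threads and their rx queues are assigned to (OVS 2.6)
$ sudo ovs-appctl dpif-netdev/pmd-rxq-show

# example: pin the PMD threads to cores 2 and 3 (hex mask 0xC)
$ sudo ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC

# show which host CPUs the vCPUs of a given instance are running on
$ sudo virsh vcpuinfo <instance-name>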

For more detailed information about performance tuning please refer to https://github.com/openvswitch/ovs/blob/branch-2.6/INSTALL.DPDK-ADVANCED.md

Revision history for this message
Rajalakshmi Prabhakar (raji2009) wrote :

Hello Matov,
Thank you. The test PC has a single NUMA node only. I am not doing NIC binding, as I am only trying to validate inter-VM communication on the same host. Please find my PC configuration below.

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 1212.000
CPU max MHz: 2400.0000
CPU min MHz: 1200.0000
BogoMIPS: 4794.08
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

I am following INSTALL.DPDK-ADVANCED.md but have no clue about the low throughput.

Revision history for this message
Sergey Matov (smatov) wrote :

Hello. Thanks for the information.

Let's try the following scenario:
- Enable CPU pinning for nova-compute and specify the CPU list in nova.conf
- Isolate those CPUs for Nova in /etc/default/grub (reboot required)
- Create a flavor with 2 vCPUs and hugepages
- Spawn 2 VMs

Then you can use the taskset command to run the iperf3 client/server on a specific vCPU; see the sketch below.
Also make sure that the OVS PMD threads are not affected by the Nova vCPU pinning.
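A rough sketch of these steps, assuming host cores 6-11 are set aside for the VMs (the core numbers, flavor name and addresses are placeholders, not values from this report; the chosen cores just must not overlap with OVS_CORE_MASK/OVS_PMD_CORE_MASK, i.e. cores 1 and 2 here):

# /etc/default/grub - keep the kernel scheduler off the VM cores (reboot required)
GRUB_CMDLINE_LINUX_DEFAULT="... isolcpus=6-11"

# nova.conf - restrict nova-compute to the isolated cores
[DEFAULT]
vcpu_pin_set = 6-11

# flavor with dedicated (pinned) vCPUs and 1G hugepages
$ nova flavor-key m1.small set hw:cpu_policy=dedicated hw:mem_page_size=1048576

# inside the VMs, pin iperf3 to a specific vCPU
$ taskset -c 1 iperf3 -s                    # server VM
$ taskset -c 1 iperf3 -c <server-ip> -t 60  # client VM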

But generally speaking, I would assume that this issue is not related to networking-ovs-dpdk. For more useful information you should probably take the discussion to the dpdk-users or ovs-discuss mailing lists.

Revision history for this message
Rajalakshmi Prabhakar (raji2009) wrote :

Hello Matov,
Thanks for the suggestion; I will share the results of TCP iperf3 with taskset soon. The intention of posting in networking-ovs-dpdk is to get input on all the DPDK configuration in local.conf and to check its correctness.

Changed in devstack:
status: New → Invalid
Revision history for this message
sean mooney (sean-k-mooney) wrote :

Removing devstack, as all DPDK logic for devstack is provided by the networking-ovs-dpdk plugin.

As far as I am aware the DPDK Launchpad project is not used, so this should be reported to the dpdk mailing list.

networking-ovs-dpdk installs a wrapper file around qemu to work around limitations in old Nova and qemu versions.

https://github.com/openstack/networking-ovs-dpdk/blob/d20da35a272e937802680ce8084d97c60ef629cc/devstack/libs/ovs-dpdk#L79-L128

With Mitaka and qemu 2.3+ (certainly 2.5+) there is no reason that I am aware of to continue to use this wrapper.

We keep it in the repo to support older software revisions.

One of the things that was broken with older versions of qemu was offload negotiation, so we forcibly disable all offloads:
VIRTIO_OPTIONS="csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off"
Can you try removing
https://github.com/openstack/networking-ovs-dpdk/blob/d20da35a272e937802680ce8084d97c60ef629cc/devstack/libs/ovs-dpdk#L98-L103
which will allow qemu to negotiate what offloads are available, such as TSO or checksum offloads.
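Once those lines are removed and the VM is recreated, the negotiated offloads can be checked from inside the guest, for example (the interface name is just a placeholder):

$ ethtool -k eth0 | grep -E 'tcp-segmentation-offload|checksumming'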

Changed in networking-ovs-dpdk:
status: New → Triaged
importance: Undecided → Medium
Changed in networking-ovs-dpdk:
status: Triaged → Incomplete
Revision history for this message
Michael Qiu (qdy220091330) wrote :

The root cause of the issue is that ovs-dpdk currently does not support TSO offload. With the offload enabled, the throughput would not be lower than on the traditional path.

Revision history for this message
Rajalakshmi Prabhakar (raji2009) wrote :

Hello all,
Thank you. Now I have clarity on the throughput difference. Yes, by default TOS is enabled in the test VM for the non-DPDK case, and with DPDK TOS is disabled, so my test setup for comparing throughput with and without DPDK is not the same.

Revision history for this message
Rajalakshmi Prabhakar (raji2009) wrote :

It's TCP Segmentation Offload (TSO), not TOS.

Revision history for this message
Andreas Karis (akaris) wrote :

Hi,

What is the current state of the ovs-dpdk TSO offload feature? Is it planned any time soon?

The only things that I could find so far are the following: a Request For Comments was created by Intel and the TSO feature was showcased, but the feature is not yet implemented:
https://mail.openvswitch.org/pipermail/ovs-dev/2016-June/316414.html
https://ovs2016.sched.com/event/8aZf/optimizing-communications-grade-tcp-workloads-in-an-ovs-based-nfv-deployment-mark-kavanagh-intel
https://www.youtube.com/watch?v=hEP0_Bd3wrA

Thanks!
