MOS 9.2 ovs-dpdk performance test

Bug #1705435 reported by Xiwen Deng
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Xiwen Deng
Milestone: 9.x-updates

Bug Description

In MOS 9.2, when running the RFC 2544 zero frame loss test, the DPDK performance result is low.

The environment has two DPDK interfaces and a VM. The VM has two NICs, each with two queues, and the DPDK interfaces are configured with two queues as well.
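
Roughly how such a two-queue setup is usually configured on OVS-DPDK and in the guest (a minimal sketch, assuming the standard n_rxq interface option and the usual image property for guest multiqueue):

ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-vsctl set Interface dpdk1 options:n_rxq=2

and for the VM NICs: hw_vif_multiqueue_enabled=true on the image, then ethtool -L eth0 combined 2 inside the guest.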

Environment configuration is shown below:
top -p `pidof ovs-vswitchd` -H -d1
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15924 root 10 -10 27.159g 296408 9956 R 99.9 0.1 34:34.80 pmd222
15925 root 10 -10 27.159g 296408 9956 R 99.9 0.1 34:34.80 pmd219
15922 root 10 -10 27.159g 296408 9956 R 99.8 0.1 34:34.79 pmd223
15923 root 10 -10 27.159g 296408 9956 R 99.8 0.1 34:34.79 pmd224
15928 root 10 -10 27.159g 296408 9956 R 99.8 0.1 34:34.80 pmd218
15929 root 10 -10 27.159g 296408 9956 R 99.8 0.1 34:34.79 pmd221
15930 root 10 -10 27.159g 296408 9956 R 99.8 0.1 34:34.79 pmd220
15931 root 10 -10 27.159g 296408 9956 R 99.8 0.1 34:34.80 pmd217

root@compute-3:~# ovs-vsctl get open_vswitch . other_config
{dpdk-extra="-n 2 --vhost-owner libvirt-qemu:kvm --vhost-perm 0664", dpdk-init="true", dpdk-lcore-mask="0x400", dpdk-socket-mem="8192,1", max-idle="50000", pmd-cpu-mask="0x1e0001e000"}
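
For reference, these values are typically applied with ovs-vsctl, e.g. (a sketch using the values shown above):

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e0001e000
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="8192,1"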

root@compute-3:~# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 1 core_id 35:
 isolated : true
 port: vhu82a81743-88 queue-id: 1
pmd thread numa_id 1 core_id 33:
 isolated : true
 port: dpdk0 queue-id: 1
pmd thread numa_id 1 core_id 14:
 isolated : true
 port: dpdk1 queue-id: 0
pmd thread numa_id 1 core_id 15:
 isolated : true
 port: vhu82a81743-88 queue-id: 0
pmd thread numa_id 1 core_id 16:
 isolated : true
 port: vhueee4b3fb-32 queue-id: 0
pmd thread numa_id 1 core_id 34:
 isolated : true
 port: dpdk1 queue-id: 1
pmd thread numa_id 1 core_id 13:
 isolated : true
 port: dpdk0 queue-id: 0
pmd thread numa_id 1 core_id 36:
 isolated : true
 port: vhueee4b3fb-32 queue-id: 1

root@compute-3:~# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 1 core_id 35:
 emc hits:0
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:0
 lost:0
 polling cycles:183391851052 (100.00%)
 processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 33:
 emc hits:5169955
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:500
 lost:0
 polling cycles:123787408521 (80.77%)
 processing cycles:29463516747 (19.23%)
 avg cycles per packet: 29639.74 (153250925268/5170455)
 avg processing cycles per packet: 5698.44 (29463516747/5170455)
main thread:
 emc hits:3
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:0
 lost:0
 polling cycles:21534522 (99.88%)
 processing cycles:25472 (0.12%)
 avg cycles per packet: 7186664.67 (21559994/3)
 avg processing cycles per packet: 8490.67 (25472/3)
pmd thread numa_id 1 core_id 14:
 emc hits:5160183
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:502
 lost:2
 polling cycles:125545461034 (80.41%)
 processing cycles:30583715341 (19.59%)
 avg cycles per packet: 30253.58 (156129176375/5160685)
 avg processing cycles per packet: 5926.29 (30583715341/5160685)
pmd thread numa_id 1 core_id 15:
 emc hits:0
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:0
 lost:0
 polling cycles:182558896290 (100.00%)
 processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 16:
 emc hits:0
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:0
 lost:0
 polling cycles:182211680516 (100.00%)
 processing cycles:0 (0.00%)
pmd thread numa_id 1 core_id 34:
 emc hits:5162238
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:501
 lost:1
 polling cycles:123743090602 (80.82%)
 processing cycles:29366438623 (19.18%)
 avg cycles per packet: 29656.65 (153109529225/5162739)
 avg processing cycles per packet: 5688.15 (29366438623/5162739)
pmd thread numa_id 1 core_id 13:
 emc hits:5167918
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:500
 lost:0
 polling cycles:125398694242 (80.39%)
 processing cycles:30587837876 (19.61%)
 avg cycles per packet: 30180.71 (155986532118/5168418)
 avg processing cycles per packet: 5918.22 (30587837876/5168418)
pmd thread numa_id 1 core_id 36:
 emc hits:0
 megaflow hits:0
 avg. subtable lookups per hit:0.00
 miss:0
 lost:0
 polling cycles:182655421216 (100.00%)
 processing cycles:0 (0.00%)

From the pmd-stats-show output we can see that some of the PMD cores (15, 16, 35, 36) are polling but never processing packets. Only four PMD cores actually process packets.

Why do only four PMD cores process packets?

Denis Meltsaykin (dmeltsaykin) wrote :

Xiwen, could you please provide an example of the expected behavior? How low is the DPDK performance? Do you experience packet loss, heavy jitter, or maybe increased latency? What are the throughput numbers? Are you measuring forwarding inside a VM, or is it bridging? Are there any iptables/ebtables rules inside this VM? It would also be great to show some kind of deployment scheme. Please answer all the questions above.

Changed in fuel:
assignee: nobody → Xiwen Deng (deng-xiwen)
status: New → Incomplete
importance: Undecided → High
milestone: none → 9.x-updates
Xiwen Deng (deng-xiwen) wrote :

Hello Denis,

My DPDK performance test setup consists of a physical traffic generator and a compute node. The compute node has two 10G NICs. A VM with 5 cores and 8 GB of RAM runs on the compute node, and testpmd is running inside the VM.

We run the RFC 2544 test with packet sizes of 64, 128, 256, 512, and 1024 bytes.

The test results are below:
https://drive.google.com/file/d/0By7pI-rd3Q4hNjVwV21HaW1vRkU/view?usp=sharing

From the test results we can see that the frame loss for 64-byte packets is about 4.867% in the zero-frame-loss test. Most of the packet loss happens at the DPDK interfaces.

The VM doesn't have any iptables rules.

I don't understand why there are 8 PMD cores but only 4 of them process packets.
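
One way to confirm which queues actually receive traffic is to clear the PMD statistics, run the traffic for a while, and dump them again (standard ovs-appctl commands):

ovs-appctl dpif-netdev/pmd-stats-clear
ovs-appctl dpif-netdev/pmd-stats-show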

Dmitry Teselkin (teselkin-d) wrote :

Hello,

According to the documentation [1], RX/TX queues can't be shared among multiple logical cores; each queue may only be processed by one core. So if you have 4 queues in total, only 4 cores will process packets.

[1] http://dpdk.org/doc/guides-16.04/prog_guide/poll_mode_drv.html#generalities
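
In practice this means that to spread the NIC load over more PMD cores, more RX queues have to be configured on the dpdk ports, for example (a sketch using the port names from this report; the right queue count depends on the NIC and the traffic profile):

ovs-vsctl set Interface dpdk0 options:n_rxq=4
ovs-vsctl set Interface dpdk1 options:n_rxq=4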

Dmitry Teselkin (teselkin-d) wrote :

Avoiding lock contention is a key issue in a multi-core environment. To address this issue, PMDs are designed to work with per-core private resources as much as possible. For example, a PMD maintains a separate transmit queue per-core, per-port, if the PMD is not DEV_TX_OFFLOAD_MT_LOCKFREE capable. In the same way, every receive queue of a port is assigned to and polled by a single logical core (lcore).

[1] http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html

Changed in fuel:
status: Incomplete → Invalid