[OVN] QoS gives different bandwidth limit measures than ml2/ovs

Bug #1866039 reported by Maciej Jozefczyk
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Maciej Jozefczyk

Bug Description

There is a difference in QoS tempest tests results between ml2/ovs and ml2/ovn.

In the change [1] that enables QoS tempest tests for OVN, the test neutron_tempest_plugin.scenario.test_qos.QoSTest.test_qos_basic_and_update
fails on the last check [2], after the policy is updated with the following values:

max_kbps=constants.LIMIT_KILO_BITS_PER_SECOND * 3
max_burst_kbps=constants.LIMIT_KILO_BITS_PER_SECOND * 3,

Which means:
max_kbps = 3000
max_burst_kbps = 3000

Previous QoS validations in this test pass with the values (max_kbps, max_burst_kbps): (1000, 1000) and (2000, 2000).

I added more debug logging to the tempest test here [3] so that we can compare the expected and measured values. The numbers below are taken from gate test runs.

-----------------------------------------------------------------------
Expected is calculated as:
TOLERANCE_FACTOR = 1.5
constants.LIMIT_KILO_BITS_PER_SECOND = 1000
MULTIPLEXING_FACTOR = 1, 2, or 3, depending on the stage of the test

    LIMIT_BYTES_SEC = (constants.LIMIT_KILO_BITS_PER_SECOND * 1024 *
                       TOLERANCE_FACTOR / 8.0) * MULTIPLEXING_FACTOR
-----------------------------------------------------------------------
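For reference, plugging these constants into the formula reproduces the expected thresholds used below (a quick sanity check in Python, not part of the test itself):

    TOLERANCE_FACTOR = 1.5
    LIMIT_KILO_BITS_PER_SECOND = 1000

    for multiplexing_factor in (1, 2, 3):
        limit_bytes_sec = (LIMIT_KILO_BITS_PER_SECOND * 1024 *
                           TOLERANCE_FACTOR / 8.0) * multiplexing_factor
        print(multiplexing_factor, limit_bytes_sec)
    # -> 192000.0, 384000.0 and 576000.0 bytes/sec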
Results:
If measured <= expected, the test passes.

| max_kbps, max_burst_kbps | expected (bytes/s) | ovs (bytes/s) | ovn (bytes/s) | linux_bridge (bytes/s) |
| (1000, 1000) | 192000 | 112613 | 141250 | 129124 |
| (2000, 2000) | 384000 | 311978 | 408886, 411005, 385152, 422114, 352903 | 300163 |
| (3000, 3000) | 576000 | 523677 | 820522, ... (failed) | 459569 |

As we can see, the OVN test failed only for (3000, 3000). For (2000, 2000) it passed only on the fifth attempt.
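Checking the measured values from the table against the (3000, 3000) threshold confirms the outcome (a simplified check, not the exact tempest assertion from [2]):

    expected = 576000  # bytes/sec for (3000, 3000), MULTIPLEXING_FACTOR = 3
    print(523677 <= expected)  # ovs -> True (passes)
    print(820522 <= expected)  # ovn -> False (fails)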

-----------------------------------------------------------------------

Let's look at how QoS is currently configured with OVN:

stack@mjozefcz-devstack-qos-2:~/logs$ neutron qos-bandwidth-limit-rule-list 047f7a8c-e143-471f-979c-4a4d95cefa5e
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-----------+--------------------------------------+----------------+----------+
| direction | id | max_burst_kbps | max_kbps |
+-----------+--------------------------------------+----------------+----------+
| egress | 9dd84dc7-f216-432f-b1aa-ec17eb488720 | 3000 | 3000 |
+-----------+--------------------------------------+----------------+----------+

Configured OVN NBDB:
stack@mjozefcz-devstack-qos-2:~/logs$ ovn-nbctl list qos
_uuid : 1176fe8f-695d-4f79-a99f-f0df8a7b8652
action : {}
bandwidth : {burst=3000, rate=3000}
direction : from-lport
external_ids : {}
match : "inport == \"4521ef05-d139-4d84-a100-efb83fde2b47\""
priority : 2002

Configured meter on bridge:
stack@mjozefcz-devstack-qos-2:~/logs$ sudo ovs-ofctl -O OpenFlow13 dump-meters br-int
OFPST_METER_CONFIG reply (OF1.3) (xid=0x2):
meter=1 kbps burst stats bands=
type=drop rate=3000 burst_size=3000

Flow in bridge:
stack@mjozefcz-devstack-qos-2:~/logs$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int | grep meter
 cookie=0x398f0e17, duration=71156.273s, table=16, n_packets=136127, n_bytes=41572857, priority=2002,reg14=0x4,metadata=0x1 actions=meter:1,resubmit(,17)

--------------------------------------------------------------------------

Questions:
* Why do the test results differ from ml2/OVS?
* Should the burst values be configured differently?

[1] https://review.opendev.org/#/c/704833/
[2] https://github.com/openstack/neutron-tempest-plugin/blob/328edc882a3debf4f1b39687dfb559d7c5c385f3/neutron_tempest_plugin/scenario/test_qos.py#L271
[3] https://review.opendev.org/#/c/711048/

Tags: ovn qos
Changed in neutron:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
summary: - [OVN] QoS gives different burst limit values
+ [OVN] QoS gives different bandwidth limit values
summary: - [OVN] QoS gives different bandwidth limit values
+ [OVN] QoS gives different bandwidth limit measures than ml2/ovs
Revision history for this message
Maciej Jozefczyk (maciejjozefczyk) wrote :

OK, we figured out what's wrong.

The download speed of the first batch of data can be higher than expected, which makes the measured values slightly different from the expected ones. See the example below, where the limit is set to 5 Mbit/s:

root@mjozefcz-devstack-qos-2:~# iperf3 -c 172.24.5.99 -R
Connecting to host 172.24.5.99, port 5201
Reverse mode, remote host 172.24.5.99 is sending
[ 4] local 172.24.5.1 port 59692 connected to 172.24.5.99 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 1.11 MBytes 9.32 Mbits/sec
[ 4] 1.00-2.00 sec 628 KBytes 5.15 Mbits/sec
[ 4] 2.00-3.00 sec 530 KBytes 4.34 Mbits/sec
[ 4] 3.00-4.00 sec 620 KBytes 5.08 Mbits/sec
[ 4] 4.00-5.00 sec 646 KBytes 5.30 Mbits/sec
[ 4] 5.00-6.00 sec 531 KBytes 4.35 Mbits/sec
[ 4] 6.00-7.00 sec 619 KBytes 5.07 Mbits/sec
[ 4] 7.00-8.00 sec 547 KBytes 4.48 Mbits/sec
[ 4] 8.00-9.00 sec 669 KBytes 5.48 Mbits/sec
[ 4] 9.00-10.00 sec 632 KBytes 5.18 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 6.61 MBytes 5.54 Mbits/sec 933 sender
[ 4] 0.00-10.00 sec 6.41 MBytes 5.37 Mbits/sec receiver

iperf Done.
root@mjozefcz-devstack-qos-2:~#

The average bandwidth is 5.54 Mbits/sec, which is above the limit.

But if we omit the first batch of data (iperf3's -O 1 option):

root@mjozefcz-devstack-qos-2:~# iperf3 -O 1 -c 172.24.5.99 -R
Connecting to host 172.24.5.99, port 5201
Reverse mode, remote host 172.24.5.99 is sending
[ 4] local 172.24.5.1 port 59402 connected to 172.24.5.99 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 1.08 MBytes 9.04 Mbits/sec (omitted)
[ 4] 0.00-1.00 sec 554 KBytes 4.53 Mbits/sec
[ 4] 1.00-2.00 sec 644 KBytes 5.27 Mbits/sec
[ 4] 2.00-3.00 sec 616 KBytes 5.05 Mbits/sec
[ 4] 3.00-4.00 sec 648 KBytes 5.31 Mbits/sec
[ 4] 4.00-5.00 sec 495 KBytes 4.05 Mbits/sec
[ 4] 5.00-6.00 sec 626 KBytes 5.12 Mbits/sec
[ 4] 6.00-7.00 sec 650 KBytes 5.33 Mbits/sec
[ 4] 7.00-8.00 sec 523 KBytes 4.29 Mbits/sec
[ 4] 8.00-9.00 sec 650 KBytes 5.33 Mbits/sec
[ 4] 9.00-10.00 sec 520 KBytes 4.26 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 5.83 MBytes 4.89 Mbits/sec 809 sender
[ 4] 0.00-10.00 sec 5.79 MBytes 4.85 Mbits/sec receiver

Then the average is within the limit: 4.85 Mbits/sec.
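The effect is also visible if we average the client-reported per-second rates from the first run above (a quick back-of-the-envelope check in Python):

    # Per-second rates from the first iperf3 run, in Mbit/s:
    rates = [9.32, 5.15, 4.34, 5.08, 5.30, 4.35, 5.07, 4.48, 5.48, 5.18]

    print(sum(rates) / len(rates))            # ~5.37 -> above the 5 Mbit/s limit
    print(sum(rates[1:]) / (len(rates) - 1))  # ~4.94 -> within the limit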

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-tempest-plugin (master)

Reviewed: https://review.opendev.org/711048
Committed: https://git.openstack.org/cgit/openstack/neutron-tempest-plugin/commit/?id=41b8019c7b4b3921a077a032463b9d8c74957b4b
Submitter: Zuul
Branch: master

commit 41b8019c7b4b3921a077a032463b9d8c74957b4b
Author: Maciej Józefczyk <email address hidden>
Date: Tue Mar 3 17:10:57 2020 +0100

    QoS - Change the way we measure bw limits

    This patch introduces a new way of fetching the data.
    Instead of creating a file, it reads /dev/zero.
    /dev/zero is always very fast, so we also avoid the
    previous hard-disk bottleneck.

    The test time is limited to 5 seconds. After that we
    calculate the average bytes-per-second value and compare
    it to the expected one.

    Sometimes the first kilobytes of the test file are
    downloaded a little faster than the actual bandwidth
    limit allows, especially when testing with OVN as a
    backend. When that happens, the average bytes-per-second
    value measured in the test can be higher than the
    required limit.

    It is pretty easy to demonstrate this case while testing QoS with iperf3:

    Accepted connection from 172.24.5.1, port 59690
    [ 5] local 10.1.0.35 port 5201 connected to 172.24.5.1 port 59692
    [ ID] Interval Transfer Bandwidth Retr Cwnd
    [ 5] 0.00-1.00 sec 1.32 MBytes 11.0 Mbits/sec 139 2.62 KBytes
    [ 5] 1.00-2.00 sec 628 KBytes 5.15 Mbits/sec 96 10.5 KBytes
    [ 5] 2.00-3.00 sec 502 KBytes 4.12 Mbits/sec 84 7.85 KBytes
    [ 5] 3.00-4.00 sec 649 KBytes 5.32 Mbits/sec 83 10.5 KBytes
    [ 5] 4.00-5.00 sec 643 KBytes 5.26 Mbits/sec 84 3.93 KBytes
    [ 5] 5.00-6.00 sec 529 KBytes 4.33 Mbits/sec 73 5.23 KBytes
    [ 5] 6.00-7.00 sec 628 KBytes 5.15 Mbits/sec 92 20.9 KBytes
    [ 5] 7.00-8.00 sec 534 KBytes 4.37 Mbits/sec 82 18.3 KBytes
    [ 5] 8.00-9.00 sec 667 KBytes 5.47 Mbits/sec 110 7.85 KBytes
    [ 5] 9.00-10.00 sec 635 KBytes 5.20 Mbits/sec 90 11.8 KBytes
    [ 5] 10.00-10.02 sec 0.00 Bytes 0.00 bits/sec 0 11.8 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval Transfer Bandwidth Retr
    [ 5] 0.00-10.02 sec 6.61 MBytes 5.53 Mbits/sec 933 sender
    [ 5] 0.00-10.02 sec 6.41 MBytes 5.36 Mbits/sec receiver
    -----------------------------------------------------------

    We can see that during the first second of the test the
    bandwidth limit is exceeded, but after that the traffic is shaped.

    In our case, when we run the tempest QoS test, the measured
    average bytes-per-second value that we compare with the
    bandwidth limit is skewed by this initial burst.

    Closes-Bug: 1866039

    Change-Id: I0964464e709baf9958548384933bd000fdee979b
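For illustration, the measurement approach the commit message describes boils down to roughly the following sketch (hypothetical code, not the actual neutron-tempest-plugin implementation; it assumes the remote side streams /dev/zero over a TCP connection):

    import socket
    import time

    def measure_avg_bytes_per_sec(host, port, duration=5.0, bufsize=65536):
        # Read from the remote (assumed to stream /dev/zero) for `duration`
        # seconds and return the observed average rate in bytes/sec.
        total = 0
        with socket.create_connection((host, port)) as sock:
            start = time.monotonic()
            while time.monotonic() - start < duration:
                chunk = sock.recv(bufsize)
                if not chunk:
                    break
                total += len(chunk)
            elapsed = time.monotonic() - start
        return total / elapsed

    # The test then asserts, roughly: measured average <= LIMIT_BYTES_SEC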

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron-tempest-plugin 1.0.0

This issue was fixed in the openstack/neutron-tempest-plugin 1.0.0 release.
