ovn-octavia-provider is not using load balancing algorithm source-ip-port

Bug #1871239 reported by Michael Johnson
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Maciej Jozefczyk
Milestone: (none)

Bug Description

When using the ovn-octavia-provider, OVN does not honor the SOURCE_IP_PORT pool load balancing algorithm, which is the only load balancing algorithm the ovn-octavia-provider supports.

The following test was created for the SOURCE_IP_PORT algorithm in tempest:
octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_source_ip_port_tcp_traffic

Available in this patch: https://review.opendev.org/#/c/714004/

The test run shows that OVN is randomly distributing the connections from the same source IP and port across the backend member servers. One server is configured to return '1' and the other '5'.

Loadbalancer response totals: {'1': 12, '5': 8}

The expected result is:

Loadbalancer response totals: {'1': 20}
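
The pass/fail condition described above can be sketched as a tally over response bodies. This is only a minimal illustration, not the tempest test itself; `tally_responses` and `is_sticky` are hypothetical helper names, and the response lists are rebuilt from the totals quoted in this report.

```python
from collections import Counter


def tally_responses(responses):
    """Count how many times each backend member ID answered."""
    return dict(Counter(responses))


def is_sticky(totals):
    """SOURCE_IP_PORT is honored only if one member answered every request."""
    return len(totals) == 1


# Totals observed in the reported run: two members answered.
observed = tally_responses(['1'] * 12 + ['5'] * 8)
# Totals expected when the algorithm is honored: one member answers all 20.
expected = tally_responses(['1'] * 20)

print(observed, is_sticky(observed))
print(expected, is_sticky(expected))
```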

The attached files provide:

ovn-provider.pcap -- A pcap file capturing the test run.
ovn-tempest-output.txt -- The tempest console output.
tempest.log -- The tempest framework log from the test run.

Michael Johnson (johnsom) wrote :
Lajos Katona (lajos-katona) wrote :

Thanks for the report.
It sounds more like an RFE; am I wrong?
Another question: is this observed on master neutron with the migrated networking-ovn code?

Maciej Jozefczyk (maciejjozefczyk) wrote :

Hello,

I think it is an inaccuracy in the OVN Octavia provider/Networking-OVN and OVN.
We can observe the same behavior in both Networking-OVN and the dedicated OVN Octavia provider repository.

After checking the OVN and OVS code responsible for load balancing, I can say the following:

1) For a *new* TCP/UDP connection with the same IP version, protocol, source IP, destination IP, source port and destination port, the backend server is chosen based on a '5-tuple hash' match [1] [2] [3].
Based on the code [3], I do not understand why OVS chooses a different member in the proposed scenario:
octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_source_ip_port_tcp_traffic

Based on the code, the 5-tuple hash is computed from:
* Source IP
* Destination IP
* Protocol
* Source and destination port
* An arbitrary number (42)

So it should end up with the same *hash*. I'm going to ask on the ovs-discuss mailing list why it behaves like this, and why the same LB member is not chosen.

2) For a new packet within the same TCP/UDP connection (same IP version, protocol, source IP, destination IP, source port, destination port, and the same established connection), the traffic is NAT'ed to the same backend server as the previous packets of that connection (it uses conntrack).

That is why, in the previous implementation of the octavia-tempest-tests, when HTTP requests were sent within the same session, only one backend answered.
The tempest test was using not only the same source IP and source port, but also the same TCP connection. The condition [4] was met and it *always* chose the same member.

Potential solutions:
1. Find the potential issue in OVS/OVN and verify whether it can be fixed to work the way we thought it works.

2. If 1) is not possible: the SOURCE_IP_PORT algorithm was added to Octavia in Train, and it is documented as supported by the OVN Octavia provider only [5]. Maybe we can document how it behaves and its limitations, as it is dedicated to OVN, and write a tempest test that covers this behavior?

3. If neither 1) nor 2) is possible, I think we should find a clearer name for it, but this means we would need to somehow support stable/train and stable/ussuri deployments.
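
The two cases above can be sketched with a toy model. This is only an illustration under stated assumptions: `five_tuple_hash` is a stand-in built on SHA-256, not the OVS hash; `ToyLoadBalancer`, the member names, and the example flow are hypothetical; and the reported behavior is modeled as a random pick per new connection.

```python
import hashlib
import ipaddress
import random
import struct

HASH_BASIS = 42  # the arbitrary number mentioned in the analysis above


def five_tuple_hash(src_ip, dst_ip, proto, src_port, dst_port):
    """Illustrative stand-in for a 5-tuple hash; NOT the OVS implementation."""
    packed = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHHI", proto, src_port, dst_port, HASH_BASIS)
    )
    return int.from_bytes(hashlib.sha256(packed).digest()[:4], "big")


def select_by_hash(members, flow):
    """Case 1 as expected: the same 5-tuple always maps to the same member."""
    return members[five_tuple_hash(*flow) % len(members)]


class ToyLoadBalancer:
    """Case 2: packets of an established connection reuse the recorded member."""

    def __init__(self, members, select):
        self.members = members
        self.select = select   # policy applied to NEW connections only
        self.conntrack = {}    # connection id -> chosen member

    def forward(self, conn_id, flow):
        if conn_id not in self.conntrack:
            self.conntrack[conn_id] = self.select(self.members, flow)
        return self.conntrack[conn_id]


members = ["member-1", "member-5"]
flow = ("192.0.2.10", "203.0.113.5", 6, 40000, 80)  # made-up TCP flow

# Expected behavior: 20 new connections with one 5-tuple hit one member.
hashed = ToyLoadBalancer(members, select_by_hash)
hashed_hits = {hashed.forward(f"conn-{i}", flow) for i in range(20)}

# Reported behavior, modeled as a random pick per new connection:
# only packets within a single connection are guaranteed to stick.
rng = random.Random()
observed = ToyLoadBalancer(members, lambda m, _flow: rng.choice(m))
same_conn_hits = {observed.forward("conn-0", flow) for _ in range(20)}
new_conn_hits = {observed.forward(f"conn-{i}", flow) for i in range(1, 21)}

print(len(hashed_hits), len(same_conn_hits), sorted(new_conn_hits))
```

In this model, a test that reuses one TCP connection only exercises the conntrack path, so it cannot tell a deterministic hash from a random selection; opening a new connection per request is what exposes the difference.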

[1] https://github.com/openvswitch/ovs/blob/d58b59c17c70137aebdde37d3c01c26a26b28519/NEWS#L364-L371
[2] https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1059
[3] https://github.com/openvswitch/ovs/blob/74286173f4d7f51f78e9db09b07a6d4d65263252/lib/flow.c#L2217
[4] https://github.com/ovn-org/ovn/blob/master/lib/actions.c#L1022
[5] https://github.com/openstack/octavia/blob/master/releasenotes/notes/add-lb-algorithm-source-ip-port-ff86433143e43136.yaml

Daniel Alvarez (dalvarezs) wrote :

Thanks Michael for reporting and Maciej for the analysis.

Thanks Lajos, I see it more as a bug, but the RFE view may apply as well:

1) It's a bug, as the OVN Octavia provider claims to use an algorithm that it is not actually using.
At the very least we should document precisely what the underlying OVS action looks like, to set the users' expectations. Moreover, we should come up with a new algorithm name in the API that matches what the OVN provider offers. To me, this would be a way of fixing the reported bug.

This assumes OVS/OVN really works like this and that this is the algorithm it uses. If so, it is not a bug *in OVN*. It doesn't seem to honor what Octavia offers in its list of supported algorithms, but it's not a bug on the core OVN or OVS side IMHO.

2) If the OVN Octavia provider needs to comply with a minimum set of algorithms, then we need to open an RFE to core OVN to support those.

Michael Johnson (johnsom) wrote :

This is not an RFE.

The OVN team specifically asked the Octavia team to add "SOURCE-IP-PORT" as a new supported algorithm (none of the other drivers support it). It was significant work, as even the reference driver does not support it, so not only was it a new feature, but it also required changes to a number of tests to accommodate the fact that only one driver supported it. We specifically asked the OVN team to be sure this was needed, given that it is so unusual. By far the most commonly used and widely supported algorithm is round-robin, but we were told OVN would not be able to support it (pretty sure Daniel told us this) given the distributed design of OVN.

The Octavia team then noticed that the scenario test added for this algorithm was not actually testing it, so this issue was found when the test was corrected.

Scenario #2 above, with a maintained connection, is called session persistence, which is a different load balancing feature from the pool load balancing algorithm. If this is enabled by default in the OVN driver, we should also document this in the OVN driver release notes and documentation. It is not enabled by default in the other drivers.

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Slawek Kaplonski (slaweq) wrote :

I agree with the above comments. It doesn't seem like an RFE but a simple bug in ovn-octavia-provider, so let's treat it as such.

Maciej Jozefczyk (maciejjozefczyk) wrote :

We have a response from Numan from the core OVN team in the ovs-discuss thread [1].

OVN chooses dp_hash as the selection method, but OVS then ignores it because OVN uses OpenFlow < 1.5, which doesn't enforce the selection method. He is going to work on fixing this.

In the next message, Daniel found why we observed this working at the beginning: the selection method changed in OVS 2.10 [2].

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-April/049940.html
[2] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-April/049941.html

Changed in neutron:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/726010
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ba16d2fc742bbb9c80f1aec2cbb6de8033d051e5
Submitter: Zuul
Branch: master

commit ba16d2fc742bbb9c80f1aec2cbb6de8033d051e5
Author: Flavio Fernandes <email address hidden>
Date: Wed May 6 17:58:26 2020 -0400

    [ovn] devstack needs to support openflow15

    OVN has been changed to use Openflow15 [1].

    Instead of creating br-int and setting the openflow version
    via ovn_agent script, it is better to delegate the bridge
    creation to the ovn-controller. Thus, having ovn-controller
    creating br-int addresses the potential version mismatch.

    [1]: https://github.com/ovn-org/ovn/commit/6ec0b82038052866533f12823fe410308b3e457a

    Change-Id: I62e4e98556c71312a7cf85b6246ddbecbc59a039
    Related-Bug: #1871239
    Closes-Bug: #1877195

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-octavia-provider (master)

Fix proposed to branch: master
Review: https://review.opendev.org/726787

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-octavia-provider (master)

Reviewed: https://review.opendev.org/726787
Committed: https://git.openstack.org/cgit/openstack/ovn-octavia-provider/commit/?id=23d743a444a426379499193324723a9c7ab33734
Submitter: Zuul
Branch: master

commit 23d743a444a426379499193324723a9c7ab33734
Author: Maciej Józefczyk <email address hidden>
Date: Mon May 11 10:16:56 2020 +0000

    Add support for OVN LB selection fields

    Prior to this patch, the OVN Octavia provider driver used the
    default 5-tuple-hash algorithm, which is pretty similar to
    SOURCE_IP_PORT.

    Unfortunately, because of the bug described here [1], it
    was not clear how 5-tuple-hash works, and some inconsistencies
    between kernel and user space implementations were found.

    OVN recently added support for selective fields in OVN LB [2], to
    explicitly define which fields are hashed and so tackle this problem.

    This commit adds support for that kind of hashing. If the installation
    of OVN on which the OVN Octavia provider is running doesn't support
    selective fields, it will use the old behavior.

    [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-April/049896.html
    [2] https://github.com/ovn-org/ovn/commit/5af304e7478adcf5ac50ed41e96a55bebebff3e8

    Change-Id: I7b4ab99d1be2855e18b186557990c85f170ad548
    Closes-Bug: #1871239
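
The idea behind the commit above (hash only an explicit list of packet fields, and keep the old behavior when that feature is unavailable) can be sketched as follows. This is not the provider's code: `SELECTION_FIELDS`, `select_member`, the `supports_selection_fields` flag, and the field list chosen for SOURCE_IP_PORT are all assumptions made for illustration, and SHA-256 stands in for whatever hash the datapath uses.

```python
import hashlib

# Assumed mapping for illustration; not copied from the provider's constants.
SELECTION_FIELDS = {
    "SOURCE_IP_PORT": ("ip_src", "ip_dst", "tp_src", "tp_dst"),
}


def select_member(members, packet, algorithm, supports_selection_fields=True):
    """Pick a member by hashing only the configured packet fields.

    Returns None when selection fields are unsupported, signalling that the
    caller should keep the old (default) behavior.
    """
    if not supports_selection_fields:
        return None
    fields = SELECTION_FIELDS[algorithm]
    key = "|".join(f"{name}={packet[name]}" for name in fields)
    digest = hashlib.sha256(key.encode()).digest()
    return members[int.from_bytes(digest[:4], "big") % len(members)]


members = ["member-1", "member-5"]
base = {"ip_src": "192.0.2.10", "ip_dst": "203.0.113.5",
        "tp_src": 40000, "tp_dst": 80}

# Fields outside the selection list (here a made-up "ip_id") do not
# influence the choice, so every new connection lands on the same member.
picks = {
    select_member(members, dict(base, ip_id=i), "SOURCE_IP_PORT")
    for i in range(20)
}
fallback = select_member(members, base, "SOURCE_IP_PORT",
                         supports_selection_fields=False)

print(picks, fallback)
```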

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-octavia-provider (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/728087

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/728416

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/728416
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=60212a1934e6bf38a4cb0a24c96c5b1ccb32f812
Submitter: Zuul
Branch: stable/ussuri

commit 60212a1934e6bf38a4cb0a24c96c5b1ccb32f812
Author: Flavio Fernandes <email address hidden>
Date: Wed May 6 17:58:26 2020 -0400

    [ovn] devstack needs to support openflow15

    OVN has been changed to use Openflow15 [1].

    Instead of creating br-int and setting the openflow version
    via ovn_agent script, it is better to delegate the bridge
    creation to the ovn-controller. Thus, having ovn-controller
    creating br-int addresses the potential version mismatch.

    [1]: https://github.com/ovn-org/ovn/commit/6ec0b82038052866533f12823fe410308b3e457a

    Change-Id: I62e4e98556c71312a7cf85b6246ddbecbc59a039
    Related-Bug: #1871239
    Closes-Bug: #1877195
    (cherry picked from commit ba16d2fc742bbb9c80f1aec2cbb6de8033d051e5)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-octavia-provider (stable/ussuri)

Reviewed: https://review.opendev.org/728087
Committed: https://git.openstack.org/cgit/openstack/ovn-octavia-provider/commit/?id=fdc7776445d735905005a173e6b8a15b4b374a9d
Submitter: Zuul
Branch: stable/ussuri

commit fdc7776445d735905005a173e6b8a15b4b374a9d
Author: Maciej Józefczyk <email address hidden>
Date: Mon May 11 10:16:56 2020 +0000

    Add support for OVN LB selection fields

    Prior to this patch, the OVN Octavia provider driver used the
    default 5-tuple-hash algorithm, which is pretty similar to
    SOURCE_IP_PORT.

    Unfortunately, because of the bug described here [1], it
    was not clear how 5-tuple-hash works, and some inconsistencies
    between kernel and user space implementations were found.

    OVN recently added support for selective fields in OVN LB [2], to
    explicitly define which fields are hashed and so tackle this problem.

    This commit adds support for that kind of hashing. If the installation
    of OVN on which the OVN Octavia provider is running doesn't support
    selective fields, it will use the old behavior.

    [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-April/049896.html
    [2] https://github.com/ovn-org/ovn/commit/5af304e7478adcf5ac50ed41e96a55bebebff3e8

     Conflicts:
            ovn_octavia_provider/common/constants.py
            ovn_octavia_provider/helper.py
            ovn_octavia_provider/tests/functional/base.py
            ovn_octavia_provider/tests/unit/test_helper.py

    Change-Id: I7b4ab99d1be2855e18b186557990c85f170ad548
    Closes-Bug: #1871239
    (cherry picked from commit 23d743a444a426379499193324723a9c7ab33734)

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential