[OVN] SRIOV routing on VLAN Tenant networks

Bug #1875852 reported by Lucas Alvares Gomes
This bug affects 6 people
Affects   Status      Importance   Assigned to   Milestone
neutron   Confirmed   Wishlist     Unassigned

Bug Description

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1826364

<snipet>

Right now, the SRIOV support with ML2/OVN is limited to:

1) SRIOV ports on provider networks with external DHCP
2) SRIOV ports on provider networks with OVN DHCP and OVN Metadata service
3) SRIOV ports on VLAN tenant networks and E/W Neutron routing

This BZ is to track the implementation of a 4th scenario that covers:

4) SRIOV ports on VLAN tenant networks and N/S Neutron routing with and without FIPs

There are two ways of achieving this (possibly more) but let me explain why it doesn't work right now.

SRIOV ports are mapped into OVN 'external' ports that are all scheduled into one controller (or network node). Example:

CH1: compute node where SRIOV VM1 (192.168.1.10 - FIP: 10.0.0.10) is running
CH2: chassis where OVN external port is bound to
CH3: chassis where gateway port is bound to
CH4: chassis on the provider network - external

PING from CH4 to VM1:
CH4 -> CH3 -> CH2 -> CH1
When the external node CH4 pings the FIP of the VM, the traffic goes to CH3, which performs the NAT and routes the traffic to CH1, which delivers it to the SRIOV NIC.

Once the ICMP request is delivered to the VM, the VM tries to resolve the router interface IP (e.g. 192.168.1.1) and sends an ARP broadcast request on the VLAN tenant network.

Right now, this ARP packet will be unanswered because:

* There are flows that drop the ARP packet from the external port VM for the router IP on all chassis except the chassis claiming the external port, so ideally CH2 would reply. However,
* Router ports have the 'reside-on-redirect-chassis' option set, which centralizes the VLAN traffic [0], meaning that only the chassis hosting the gateway port (CH3 in our example) would reply to it.
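
The two conditions above can be sketched as a toy model. This is not OVN code: the function name `arp_responders` is hypothetical, and the chassis labels come from the example in this report. The model assumes a chassis answers the ARP request for the router IP only if it both claims the external port (otherwise the drop flows apply) and hosts the gateway port (because of 'reside-on-redirect-chassis').

```python
def arp_responders(chassis, external_port_chassis, gateway_chassis):
    """Return the chassis that would answer the VM's ARP for the router IP.

    Toy model of the two rules described above (not actual OVN logic):
      * every chassis except the one claiming the external port drops the ARP;
      * with 'reside-on-redirect-chassis', only the gateway chassis replies.
    """
    return [
        ch for ch in chassis
        if ch == external_port_chassis and ch == gateway_chassis
    ]


chassis = ["CH1", "CH2", "CH3"]

# Current situation: external port bound to CH2, gateway port bound to CH3.
print(arp_responders(chassis, "CH2", "CH3"))  # prints []

# Co-located ports: both bound to CH3.
print(arp_responders(chassis, "CH3", "CH3"))  # prints ['CH3']
```

Under this model the ARP request goes unanswered whenever the two ports sit on different chassis, which matches the behaviour reported here.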

In this context we have two possibilities to get this working:

1) Co-locating external and gateway ports. This is non-trivial, as it may require moving things around in a way that causes dataplane disruption.

For example: when the external port is first created, it'll be scheduled on CH1 (no gateways involved yet). However, if the network that it belongs to is later attached to a router with a gateway, it may require moving the external port to achieve that co-location with the gateway port. Moving the external port can create disruption as DHCP/metadata will be unavailable for a certain window of time until everything settles.
This time window is unknown and clearly depends on factors such as how many ports need to be moved.

In this scenario, the packet flow in the example above would go this way:

Echo request: CH4 -> CH3 (gateway & external port) -> CH1
Echo reply: CH1 -> CH3 (gateway & external port) -> CH4

2) Supporting distributed traffic on VLAN tenant networks: Tracked here [1]
In this case, there's no need to co-locate things, as routing can happen automatically where the external port is bound. This eliminates the burden described in 1).

Option number 2) seems the more reasonable and efficient way of achieving N/S routing for SRIOV ports on ML2/OVN. Hence I'm marking this bug as dependent on [1] and TestOnly for validation.

[0] https://opendev.org/openstack/networking-ovn/src/tag/7.1.0/networking_ovn/common/ovn_client.py#L1406
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1766930

</snipet>

Tags: ovn
Changed in neutron:
status: New → Confirmed
Akihiro Motoki (amotoki)
summary: - [OVN] SRIOV routing on VLAN Tenant networks
+ [RFE] [OVN] SRIOV routing on VLAN Tenant networks
Changed in neutron:
importance: Undecided → Wishlist
tags: removed: rfe
summary: - [RFE] [OVN] SRIOV routing on VLAN Tenant networks
+ [OVN] SRIOV routing on VLAN Tenant networks
Liu Xie (liushy) wrote :

From what you describe, I think all traffic related to the VLAN network would be dropped, including East-West routing:
3) SRIOV ports on VLAN tenant networks and E/W Neutron routing.
Does anyone else have any comments?

Liu Xie (liushy) wrote :

Hi,
I have been working on this BZ for many days and have done a lot of work.
1) I built new OVN RPM packages (version 21.03.0) following [1] and installed them in my environment.
2) Removed the 'reside-on-redirect-chassis' option in neutron [2] and added the 'redirect-type' option with the value 'bridged' [3], like this:
        if is_gw_port and network.get(pnet.NETWORK_TYPE) == const.TYPE_VLAN:
            options['redirect-type'] = 'bridged'

3) Set 'external_ids:ovn-chassis-mac-mappings' in the local OVSDB of every chassis.
4) Created two VLAN tenant networks, net1 and net2, and attached their subnets to two different routers.
5) Set the gateway of both routers on the same external VLAN network.
6) Created VM1 on net1 and associated a floating IP, fp1, with it (no distributed_floating_ip).
7) Created VM2 on net2.
8) Pinged fp1 from VM2: the traffic failed.
9) Changed the redirect-type value to 'overlay': the traffic passed.

I think OVN does not yet fully support centralized floating IPs in the VLAN-backed DVR scenario. This also affects SRIOV.

[1] https://docs.ovn.org/en/latest/intro/install/fedora.html
[2] https://github.com/openstack/neutron/blob/3cbe340846cb00e542afbad238207186cc22a858/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1340
[3] https://github.com/ovn-org/ovn/commit/03493b33c073887a81ba90c84b5e063140712719
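
As an illustration of step 3) above, here is a minimal sketch that builds the `ovs-vsctl` command for setting 'external_ids:ovn-chassis-mac-mappings' on one chassis. The helper name `chassis_mac_mappings_cmd`, the physnet name and the MAC address are hypothetical. The value format (comma-separated `physnet:mac` pairs) is taken from the ovn-controller documentation, and the sketch only prints the command rather than running it.

```python
import shlex


def chassis_mac_mappings_cmd(mappings):
    """Build the ovs-vsctl command that sets ovn-chassis-mac-mappings.

    `mappings` maps a physical network name to the MAC address this chassis
    should use on that network. The value is rendered as
    "physnet:mac[,physnet:mac...]".
    """
    value = ",".join(f"{physnet}:{mac}" for physnet, mac in mappings.items())
    argv = [
        "ovs-vsctl", "set", "Open_vSwitch", ".",
        f"external_ids:ovn-chassis-mac-mappings={value}",
    ]
    # shlex.join quotes arguments where needed so the line is shell-safe.
    return shlex.join(argv)


# Hypothetical values: each chassis needs its own MAC per physical network.
print(chassis_mac_mappings_cmd({"physnet1": "aa:bb:cc:dd:ee:ff"}))
```

The printed command would have to be run on every chassis, each with a different MAC address.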

Vasyl Saienko (vsaienko) wrote :

This bug is caused by https://bugs.launchpad.net/neutron/+bug/1995078 and is described in detail by Rodolfo here: https://rodolfo-alonso.com/ha_router_gateway_ports_in_ovn

We hit this issue when the network's external ports and the router port are scheduled to different chassis:

router 00efb63c-2862-4b68-a11d-5b14f59293cc (neutron-22b8386a-c778-47da-8c5d-f899546ab74f) (aka sriov-57de5898-54f6-46e9-b9c0-aeebd873d1eb-router-erbibzwvbovu)
    port lrp-76256f0e-e665-4b1d-8958-17f0e794f161
        mac: "fa:16:3e:2f:a1:90"
        networks: ["192.168.96.1/24"]
    port lrp-34ff4b9c-5953-4853-828d-59e29cc03889
        mac: "fa:16:3e:15:ee:f2"
        networks: ["172.16.41.234/26"]
        gateway chassis: [befee082-1beb-48dd-bab7-23fe82ac307a fede01c5-e6d8-4e45-9e01-38edf5f30b6f c0037a40-753d-443c-86f0-5e280235c6db]
    nat 6a84bcb5-afeb-4bfd-afe9-70039595f50c
        external ip: "172.16.41.237"
        logical ip: "192.168.96.170"
        type: "dnat_and_snat"
    nat aff24f1e-e603-4230-9014-552a0cde4404
        external ip: "172.16.41.253"
        logical ip: "192.168.96.185"
        type: "dnat_and_snat"
    nat f0751954-ccca-4c82-8324-eb49f09dea4f
        external ip: "172.16.41.234"
        logical ip: "192.168.96.0/24"
        type: "snat"
I have no name!@openvswitch-ovn-db-0:/$ ovn-nbctl --db tcp:127.0.0.1:6641 --no-leader-only list Gateway_Chassis |grep -A2 -B4 lrp-34ff4b9c-5953-4853-828d-59e29cc03889

_uuid : 6099ee58-8976-439a-8f9a-729d3d389c63
chassis_name : "fede01c5-e6d8-4e45-9e01-38edf5f30b6f"
external_ids : {}
name : lrp-34ff4b9c-5953-4853-828d-59e29cc03889_fede01c5-e6d8-4e45-9e01-38edf5f30b6f
options : {}
priority : 1
--

_uuid : 33849843-7b21-4e30-80d0-7eee9e06c045
chassis_name : "befee082-1beb-48dd-bab7-23fe82ac307a"
external_ids : {}
name : lrp-34ff4b9c-5953-4853-828d-59e29cc03889_befee082-1beb-48dd-bab7-23fe82ac307a
options : {}
priority : 3

_uuid : a887fb7f-6d71-4b2c-b740-bdbb924ce1f7
chassis_name : "c0037a40-753d-443c-86f0-5e280235c6db"
external_ids : {}
name : lrp-34ff4b9c-5953-4853-828d-59e29cc03889_c0037a40-753d-443c-86f0-5e280235c6db
options : {}
priority : 2

cd50dafd-58b9-4564-8d58-6aacdfea27ae (neutron-76b568de-a727-4942-a229-45aa7e4b99fe)
    56b42acf-3839-4f8d-a0db-1dadfe513754 (fede01c5-e6d8-4e45-9e01-38edf5f30b6f)
    priority 32767

    c3afb677-b9ec-4b47-9ae4-9d5bdc667d6b (befee082-1beb-48dd-bab7-23fe82ac307a)
    priority 32765

    f451e0b0-4711-42f6-9341-c36b4cf2e5d6 (c0037a40-753d-443c-86f0-5e280235c6db)
    priority 32766

As you can see here, the primary chassis for the network's external ports is
56b42acf-3839-4f8d-a0db-1dadfe513754 (fede01c5-e6d8-4e45-9e01-38edf5f30b6f)
    priority 32767

and the router port's primary chassis is

_uuid : 33849843-7b21-4e30-80d0-7eee9e06c045
chassis_name : "befee082-1beb-48dd-bab7-23fe82ac307a"
external_ids : {}
name : lrp-34ff4b9c-5953-4853-828d-59e29cc03889_befee082-1beb-48dd-bab7-23fe82ac307a
options : {}
priority : 3
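
The comparison made above can be reproduced with a short sketch. It assumes, as this comment does, that the chassis with the highest priority is the primary one. The priorities are copied from the listings above, and the helper name `primary_chassis` is hypothetical.

```python
def primary_chassis(priorities):
    """Return the chassis name with the highest priority."""
    return max(priorities, key=priorities.get)


# Gateway_Chassis priorities for lrp-34ff4b9c-5953-4853-828d-59e29cc03889.
gateway_priorities = {
    "fede01c5-e6d8-4e45-9e01-38edf5f30b6f": 1,
    "befee082-1beb-48dd-bab7-23fe82ac307a": 3,
    "c0037a40-753d-443c-86f0-5e280235c6db": 2,
}

# Priorities for the external ports of network
# neutron-76b568de-a727-4942-a229-45aa7e4b99fe.
external_port_priorities = {
    "fede01c5-e6d8-4e45-9e01-38edf5f30b6f": 32767,
    "befee082-1beb-48dd-bab7-23fe82ac307a": 32765,
    "c0037a40-753d-443c-86f0-5e280235c6db": 32766,
}

gateway_primary = primary_chassis(gateway_priorities)
external_primary = primary_chassis(external_port_priorities)

print("gateway primary:       ", gateway_primary)
print("external ports primary:", external_primary)
print("co-located:", gateway_primary == external_primary)  # prints False
```

With the values from this environment, the two primaries differ (befee082-... for the gateway, fede01c5-... for the external ports), which is the mismatch described here.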

As soon we change primary chassis connectivi...


OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/939961

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/943372

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/939961
Committed: https://opendev.org/openstack/neutron/commit/9b0f62b81a049e990e9dc6f109d525550d4f4548
Submitter: "Zuul (22348)"
Branch: master

commit 9b0f62b81a049e990e9dc6f109d525550d4f4548
Author: Vasyl Saienko <email address hidden>
Date: Thu Jan 23 15:07:50 2025 +0200

    Update OVN installation guide with tunings for VLAN + DVR

    Add steps which are required to be configured in ovn-controller on
    compute hosts to handle VLAN + Distributed Floating IPs use-case
    correctly.

    Related-Bug: #1875852
    Related-Bug: #1995078
    Change-Id: I6c13282135d19085cb8802551e27f67692f4689f

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/943372
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/953934

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Michal Nasiadka <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/953934
Reason: Continued in https://review.opendev.org/q/topic:%22bug/2092271%22
