[OVN] VIP port does not come up when its virtual-parents are trunk sub-ports

Bug #2080492 reported by Maximilian Sesterhenn
This bug affects 1 person
Affects         Status   Importance   Assigned to   Milestone
neutron         New      Undecided    Unassigned
ovn-bgp-agent   New      Undecided    Unassigned

Bug Description

OpenStack 2024.1
Neutron 25.0.0.0b2.dev189 (grabbed from master branch)
OVN 24.03.3

TLDR: VIP port does not come up when its virtual-parents are trunk sub-ports

Let's say I have two instances in an internal network that form a high-availability cluster using a VRRP-like protocol.
Each instance has one port with its own fixed IP, and both ports have the same VIP configured in Allowed Address Pairs.
That's enough to make this internal VIP reachable.

Now we want to make this VIP reachable from the external world using a FIP.
So we create a router with an external gateway and an internal interface.
We then create a dummy port in the internal network whose Fixed IPs contain the VIP from the Allowed Address Pairs of the other ports.
The FIP can then be associated with this port.
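
The setup above can be sketched as the request bodies one would send to the Neutron API. This is only an illustration: the helper names, the port name, and every ID and address are placeholders and do not come from the affected environment. The field names (fixed_ips, allowed_address_pairs, floating_network_id, port_id) are the ones used by the Neutron API.

```python
# Sketch of Neutron request bodies for the VIP + FIP setup described above.
# All IDs, names and addresses are hypothetical placeholders.

VIP = "192.0.2.100"  # placeholder VIP, not taken from the report


def instance_port(network_id, fixed_ip, vip=VIP):
    """Port of one cluster member: its own fixed IP, plus the shared VIP
    listed in Allowed Address Pairs."""
    return {
        "port": {
            "network_id": network_id,
            "fixed_ips": [{"ip_address": fixed_ip}],
            "allowed_address_pairs": [{"ip_address": vip}],
        }
    }


def vip_port(network_id, vip=VIP):
    """Unbound dummy port that carries the VIP as its fixed IP."""
    return {
        "port": {
            "network_id": network_id,
            "name": "vip-port",  # hypothetical name
            "fixed_ips": [{"ip_address": vip}],
        }
    }


def floating_ip(external_network_id, vip_port_id):
    """FIP on the external network, associated with the dummy VIP port."""
    return {
        "floatingip": {
            "floating_network_id": external_network_id,
            "port_id": vip_port_id,
        }
    }


port_a = instance_port("internal-net-id", "192.0.2.11")
port_b = instance_port("internal-net-id", "192.0.2.12")
dummy = vip_port("internal-net-id")
fip = floating_ip("external-net-id", "vip-port-id")
```

The important relationship is that the fixed IP of the dummy port equals the address listed in the Allowed Address Pairs of both instance ports.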

Looking at the OVN NB DB, specifically the Logical_Switch_Port table, there is an entry for each of these ports.
The VIP LSP is not associated with any instance directly, but either Neutron or OVN seems to be able to link this VIP LSP to the two instance LSPs, maybe because its Fixed IP is in their Allowed Address Pairs. Under options, we get virtual-ip and virtual-parents entries.
The virtual-parents are the two LSPs that are directly connected to the instances.
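
To make the relationship concrete, here is a minimal sketch that reads the virtual-parents of a VIP LSP. It assumes the Logical_Switch_Port row has already been loaded into a plain dict (for example from JSON output of the NB DB), that the VIP LSP has type "virtual", and that virtual-parents is a comma-separated list of LSP names. The row contents and the helper name are made up for illustration.

```python
# Sketch: extract the virtual-parents of a VIP LSP from a row that has been
# loaded into a dict. The sample row below is hypothetical.

def virtual_parents(lsp):
    """Return the parent LSP names of a virtual LSP, or [] if it is not virtual."""
    if lsp.get("type") != "virtual":
        return []
    raw = lsp.get("options", {}).get("virtual-parents", "")
    # Assumption: the option is a comma-separated list of LSP names.
    return [name.strip() for name in raw.split(",") if name.strip()]


vip_lsp = {
    "name": "vip-port-uuid",
    "type": "virtual",
    "options": {
        "virtual-ip": "192.0.2.100",
        "virtual-parents": "port-a-uuid,port-b-uuid",
    },
    "up": True,
}

parents = virtual_parents(vip_lsp)
```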

We use DVR, so normally FIPs are exposed where the instance is running.
The system seems to dynamically detect where the VIP is currently active and to update neutron:host_id accordingly.
Once any of the two LSPs connected directly to the instances is up, the VIP LSP comes up as well.

So far, so good: even the FIP for the VIP is exposed on the host running the instance that is currently active for the VIP.

Things break once the virtual-parent of a VIP LSP is not an LSP directly connected to an instance but a trunk sub-port.
These VIP LSP objects still have virtual-ip and virtual-parents entries, and even neutron:host_id set to the currently active host, but the VIP LSP stays down (up : false).
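
A rough way to spot affected ports is sketched below. It works on Logical_Switch_Port rows loaded into dicts and flags virtual LSPs that are down while at least one of their virtual-parents looks like a trunk sub-port. Treating a non-empty parent_name column as the marker of a sub-port is an assumption about how the sub-ports appear in the NB DB; the sample rows and the function name are hypothetical.

```python
# Sketch: list VIP LSPs that are down and have a trunk sub-port as a
# virtual-parent. Rows are hypothetical dicts, not real NB DB output.

def down_vips_with_subport_parents(lsps):
    by_name = {lsp["name"]: lsp for lsp in lsps}
    affected = []
    for lsp in lsps:
        # Only virtual LSPs that are currently down are of interest.
        if lsp.get("type") != "virtual" or lsp.get("up"):
            continue
        raw = lsp.get("options", {}).get("virtual-parents", "")
        parents = [p.strip() for p in raw.split(",") if p.strip()]
        # Assumption: a sub-port LSP has a non-empty parent_name.
        if any(by_name.get(p, {}).get("parent_name") for p in parents):
            affected.append(lsp["name"])
    return affected


rows = [
    {"name": "subport-a", "type": "", "parent_name": "trunk-a", "up": True},
    {"name": "subport-b", "type": "", "parent_name": "trunk-b", "up": True},
    {"name": "plain-a", "type": "", "up": True},
    {"name": "vip-trunk", "type": "virtual", "up": False,
     "options": {"virtual-ip": "192.0.2.100",
                 "virtual-parents": "subport-a,subport-b"}},
    {"name": "vip-plain", "type": "virtual", "up": True,
     "options": {"virtual-ip": "192.0.2.101",
                 "virtual-parents": "plain-a"}},
]

affected = down_vips_with_subport_parents(rows)
```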

Yesterday, I saw that traffic was forwarded to the gateway chassis instead of being exposed locally; today, without any further changes, traffic is indeed exposed locally on the compute node.
However, the VIP LSP is still down.

We're using ovn-bgp-agent, and the FIP is only exposed when the LSP is actually up.
The combination of the Neutron / OVN behavior and the ovn-bgp-agent behavior is what breaks this scenario for us and therefore blocks further development.
I think that if ovn-bgp-agent did not require the port to be up, communication would work even with the VIP LSP down.

Is this expected?
Shouldn't the VIP LSP come up like it does with regular LSPs even when used with a trunk sub-port?
I don't know enough about where this mechanism is triggered. Is it something in Neutron or in OVN code?

OVN output with virtual-parents directly attached to instances (VIP LSP up):
https://paste.openstack.org/show/bJALt4Bt9j628S8Ve9aH/

OVN output with virtual-parents that are trunk sub-ports (VIP LSP down):
https://paste.openstack.org/show/bS4G0mIo4If9rPIoaPYU/

Tags: ovn trunk
Maximilian Sesterhenn (msnatepg) wrote :

To isolate the issue even further, we removed ovn-bgp-agent from our setup.

We saw that once we removed the FIP and assigned IPs from a provider network directly to both the directly connected ports and the VIP ports, communication was possible while the VIP port was still down.

Therefore I assume that when VIP ports are used together with trunk sub-ports, these VIP ports will be down in the OVN NB DB. That is not an issue on its own (except in setups that use ovn-bgp-agent, afaik), but it becomes an issue once FIPs are used.
Maybe FIPs are not programmed when the destination port is down.
