ovn-bgp-agent

[RFE] The EVPN driver does not advertise floating IPs or router (SNAT) addresses

Bug #2017889 reported by Luis Tomas Bolivar on 2023-04-27

This bug affects 2 people

Affects          Status          Importance    Assigned to    Milestone
ovn-bgp-agent    Fix Released    Wishlist      Unassigned

Bug Description

The EVPN driver does not advertise the router's external gateway address nor floating IPs into EVPN. Tenant traffic that undergoes NAT (either router-based SNAT or floating-IP based SNAT+DNAT) will therefore not work, because the VRF in the external network does not have a return route to the IP in question.

The test setup is simple and consists of the following:
- An external network for allocation of router IPs and floating IPs. This is a VLAN-type provider network, but br-ex is not connected to any physical network device. Subnet prefixes are 87.238.54.0/23 and 2a02:c0:1:99::/64
- A Geneve-based tenant network inside OVN. Subnet prefixes are 10.42.42.0/24 and 2001:db8:42:42::/64
- A VM connected to the tenant network with IPs 10.42.42.123 and 2001:db8:42:42::344
- A floating IP in the external network associated with the VM: 87.238.54.105
- An OpenStack router connecting the external network with the tenant network.
- VRF 4041, which corresponds to the "internet" VRF in the upstream network.

The router/gateway chassis and the VM are co-located on the same compute node.

The router's dual-stack port on the external network and two single-stack ports on the internal network (one ipv4, one ipv6) have all had the "neutron_bgpvpn:as"="64999", "neutron_bgpvpn:vni"="4041" annotations added to the OVS database.
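As a rough illustration of how these two annotations could be consumed, here is a minimal sketch. The function names are hypothetical, and the device naming pattern (vrf-<vni>, br-<vni>, lo-<vni>, vlan-<vni>) is only inferred from the command output in this report, not taken from the agent's code.

```python
from typing import NamedTuple, Optional


class EvpnInfo(NamedTuple):
    bgp_as: int
    vni: int


def evpn_info_from_external_ids(external_ids: dict) -> Optional[EvpnInfo]:
    """Return (AS, VNI) if both annotations are present and numeric, else None."""
    try:
        return EvpnInfo(
            int(external_ids["neutron_bgpvpn:as"]),
            int(external_ids["neutron_bgpvpn:vni"]),
        )
    except (KeyError, ValueError):
        return None


def device_names(vni: int) -> dict:
    # Assumption: names follow the pattern visible in the outputs of this report.
    return {
        "vrf": f"vrf-{vni}",
        "bridge": f"br-{vni}",
        "loopback": f"lo-{vni}",
        "vlan": f"vlan-{vni}",
    }


info = evpn_info_from_external_ids(
    {"neutron_bgpvpn:as": "64999", "neutron_bgpvpn:vni": "4041"}
)
print(info, device_names(info.vni))
```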

This is the routing that ovn-bgp-agent created in the VRF:

[tore@node31-m11-osl4 ~]$ sudo vtysh -c 'show ip route vrf vrf-4041' -c 'show ipv6 route vrf vrf-4041'
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF vrf-4041:
K>* 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 2d18h41m
C>* 10.42.42.123/32 is directly connected, lo-4041, 2d18h40m
K>* 87.238.54.89/32 [0/0] is directly connected, vlan-4041, 2d18h41m
B>* 100.64.0.0/29 [20/0] via 87.238.63.33, br-4041 onlink, weight 1, 00:01:31
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF vrf-4041:
K>* ::/0 [255/8192] unreachable (ICMP unreachable) (vrf default), 2d18h41m
C>* 2001:db8:42:42::344/128 is directly connected, lo-4041, 2d18h40m
K>* 2a02:c0:1:99::2c4/128 [0/1024] is directly connected, vlan-4041, 2d18h41m
C * fe80::/64 is directly connected, vlan-4041, 2d18h41m
C * fe80::/64 is directly connected, br-4041, 2d18h41m
C>* fe80::/64 is directly connected, lo-4041, 2d18h41m
The IP addresses assigned to the VM (10.42.42.123 and 2001:db8:42:42::344) are being advertised and those routes are visible in the VRF in the upstream network. Great!

However, the router's IP addresses (87.238.54.89/32 and 2a02:c0:1:99::2c4/128) are not. This is due to the fact that they are added as routes, not as addresses. The VM's addresses are being assigned to lo-4041:

[tore@node31-m11-osl4 ~]$ ip address show dev lo-4041 scope global
11: lo-4041: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf-4041 state UNKNOWN group default qlen 1000
    link/ether 2e:c2:53:a6:ca:5f brd ff:ff:ff:ff:ff:ff
    inet 10.42.42.123/32 scope global lo-4041
       valid_lft forever preferred_lft forever
    inet6 2001:db8:42:42::344/128 scope global
       valid_lft forever preferred_lft forever
They get advertised into EVPN due to the redistribute connected stanza in the VRF template added to the FRR configuration.

The router addresses, on the other hand, are instead being added as routes:

[tore@node31-m11-osl4 ~]$ ip route show vrf vrf-4041 dev vlan-4041
10.42.42.0/24 via 87.238.54.89
87.238.54.89 scope link
[tore@node31-m11-osl4 ~]$ ip -6 route show vrf vrf-4041 dev vlan-4041
2001:db8:42:42::/64 via 2a02:c0:1:99::2c4 metric 1024 pref medium
2a02:c0:1:99::2c4 metric 1024 pref medium
This means that they are kernel routes (as opposed to connected routes, cf. the C> and K> prefixes in the FRR route output above), and they are therefore not being advertised.
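To make the distinction concrete, here is a minimal sketch that groups the prefixes from `show ip route vrf ...` output by their FRR route code. The helper name is hypothetical and the parsing is deliberately naive (it only looks at the first column and the first prefix-like field); it assumes that only routes with code C are picked up by redistribute connected.

```python
from collections import defaultdict


def routes_by_code(lines):
    """Group prefixes from vtysh route output by their one-letter route code."""
    routes = defaultdict(list)
    for line in lines:
        # Route lines start with a code letter followed by '>' or a space,
        # e.g. "K>* ..." or "C * ..."; legend and header lines do not.
        if len(line) < 4 or not line[0].isalpha() or line[1] not in "> ":
            continue
        fields = line[3:].split()
        if fields and "/" in fields[0]:
            routes[line[0]].append(fields[0])
    return dict(routes)


output = [
    "Codes: K - kernel route, C - connected, S - static, R - RIP,",
    "K>* 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 2d18h41m",
    "C>* 10.42.42.123/32 is directly connected, lo-4041, 2d18h40m",
    "K>* 87.238.54.89/32 [0/0] is directly connected, vlan-4041, 2d18h41m",
]
routes = routes_by_code(output)
print("redistributed (connected):", routes.get("C", []))
print("not redistributed (kernel):", routes.get("K", []))
```

With the output above, the VM address ends up under C while the router address ends up under K, which matches what is seen upstream.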

Furthermore, the floating IP address (87.238.54.105) is nowhere to be seen. It's not added as an address on lo-4041 (like the VM address is), nor as a route to vlan-4041 (like the router address is). So this will obviously not work either.

Now, before you object that it makes no sense to both advertise the internal tenant-net IPv4 address of the VM while at the same time using SNAT and floating IPs, I kind of agree. However, removing the neutron_bgpvpn annotations from the router's IPv4-only tenant network port makes no difference - the floating IP address and the router's external/SNAT address are still not being advertised. It does however remove the advertisement of the VM's IPv4 address (10.42.42.123) as it is removed from lo-4041, as expected. The route 10.42.42.0/24 via 87.238.54.89 on vlan-4041 also vanishes.

I note that this behaviour differs significantly from that of the BGP driver. With the BGP driver, when using the exact same setup, both the floating IP and the router IPs (both IPv4 and IPv6) are added to the bgp-nic interface (which is then imported into the underlay/default VRF thanks to import vrf bgp-vrf and advertised in BGP there).

Ideally, the EVPN driver would be capable of doing everything the BGP driver does, while additionally being VRF-aware so that the addresses in question could be advertised into a VRF instead of into the underlay/default VRF. The BGP driver does of course work very well if the operator does not use EVPN and VRFs (in that case, the default VRF is the internet VRF and there is no underlay), but when the operator uses EVPN, the default VRF is usually the underlay VRF, which most likely does not have any internet/external access whatsoever, so advertising SNAT/floating IPs there makes no sense at all.


Maximilian Sesterhenn (msnatepg) wrote on 2023-05-09:

Is someone already working on that?

We have started to do some refactoring to make the EVPN driver capable of the features that at the moment only the BGP driver has.

If there is ongoing effort we could stop that, otherwise I am more than happy to create a WIP change once we have a POC ready.

Or is the overall intention more to realize this in an upcoming reimplementation of the EVPN driver using the OVN NB DB?


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-10:

Hey Maximilian, please read what I wrote in here for some context [1]

Long story short, nobody is working on this (as far as I know) on the EVPN driver side, and there are no (current) plans to move the EVPN driver to use the NB DB. The idea is to have the BGP driver use the NB DB, and also to have the option to expose on different VRFs (one per provider network, instead of one per cr-lrp port as in the EVPN driver).

Question is, would that be sufficient for your use case? If so, it is perhaps worth joining efforts on the BGP driver (NB DB) to add this support. If not, then for sure go ahead with the PoC for the EVPN driver!

In fact, it is even OK to have both (not sure how much work it is, though). If you already have the EVPN modifications more or less ready, nothing blocks us from adding this support to the EVPN driver and also adding it to the BGP driver, as the use cases can be slightly different, I suppose.

[1] https://bugs.launchpad.net/ovn-bgp-agent/+bug/2018608/comments/1


Maximilian Sesterhenn (msnatepg) wrote on 2023-05-10:

Hey Luis, thanks!

I will add my answer in this RFE as I think it fits better.

Our work right now is targeting the EVPN driver as it already has logic HOW to announce into EVPN, it misses the WHAT to announce part (more precisely it misses the provider networks). To avoid reinventing the wheel we have used large parts of the BGP driver logic to differentiate between port types. I like the idea of making it modular using different drivers, but I agree that this is kind of redundant.

1)
2) When talking about l2vni, are we talking about EVPN type 2 routes? If so, that would not be a replacement for what the EVPN driver is doing today as it would announce MAC-Routes and not IP-Routes. However, I think it would be a nice feature addition to allow that too, so that someone can choose to either expose a network as L2 or L3. We've unlocked that in the BGPVPN service driver as well. I guess due to limitations in how OVN works we would have to limit the L2 feature to provider networks.
3) Is this about EVPN type 5 routes or "just" different VRFs in BGP? If the latter, we would need some vpn-ipv4/ipv6 address families in the BGP setup which to my knowledge usually requires some kind of dataplane isolation using MPLS or similar.
4)

I will have a look into the NB driver to get a feeling what would be necessary to add 3) (and maybe 2)) into that.


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-10:

Yes, for 2), it is a new addition, not a replacement of the EVPN/VRF driver. This is more to address https://bugs.launchpad.net/ovn-bgp-agent/+bug/2017890. Regarding the provider network limitations, not really, if we add this support to the BGP driver, we could expose tenant IPs (through the cr-lrp), but on the same vni as the provider they are associated to. This is the actual use case/testing on 2017890

For 3), yes, it is for EVPN type 5 (VRF + l3vni). Idea is to have exactly what we have in the EVPN driver, but instead of creating a VNI/VRF per cr-lrp, we would create one per provider network, and then expose both IPs on the provider and (optionally) IPs on the tenant networks associated to that provider on the same VRF/VNI

As for the NB driver, note that right now it only cares about IPs on the provider network, since we require an extra feature in core OVN so that the chassis associated with the cr-lrp can be checked from the NB. At the moment that information is only available in the SB. Once that is done we'll also add support for tenant networks (as in the BGP driver that uses the SB).


Maximilian Sesterhenn (msnatepg) wrote on 2023-05-19:

While working on this a minor problem occurred and I think it's best to discuss this first:

I've now reached a phase in development where I'm able to ping instances through an EVPN fabric on a public provider network.

However, the return path (from the VM to internet) is problematic.
As in the BGP driver, br-ex has proxy_arp / proxy_ndp activated to answer the ARP / NDP queries of the instances for their next-hop.
For proxy_arp this works fine as it answers all requests; proxy_ndp, however, seems to require explicit configuration for each IP.

Now this depends on the configuration on the instances:
If there is a default route which routes the traffic without a next-hop through eth0 of the instance, the instance will send neighbor solicitations for the destination IP.
It's not really rational to add each and every possible target into the proxy_ndp configuration and I wasn't successful finding some catch-all logic yet.

A different approach would be to configure a gateway into that subnet in neutron, that way we would have a known next hop that we could add into the proxy_ndp configuration.
My tests adding that manually were successful.
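For illustration, the manual step could be scripted roughly as follows. This is only a sketch: the helper name is hypothetical, the gateway address 2001:db8::1 is a documentation placeholder, and the function only builds `ip -6 neigh add proxy` argument lists without executing them.

```python
import ipaddress


def ndp_proxy_commands(addresses, device="br-ex"):
    """Build one 'ip -6 neigh add proxy' command per IPv6 address."""
    commands = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip.version != 6:
            # IPv4 needs no per-address entry; proxy_arp answers all requests.
            continue
        commands.append(
            ["ip", "-6", "neigh", "add", "proxy", str(ip), "dev", device]
        )
    return commands


# Hypothetical subnet gateway; in practice it would have to come from Neutron.
for cmd in ndp_proxy_commands(["2001:db8::1"]):
    print(" ".join(cmd))
```

Each resulting list could be handed to subprocess.run() with root privileges, but that part is left out on purpose.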

Unfortunately, we don't have that information in ovn-bgp-agent to my knowledge.
One solution I could think of is to add that information to the external_ids fields in networking-bgpvpn.

I guess that's not a problem today in the EVPN driver because it's routed first through the OpenStack router.
I wonder how that works today in the BGP driver, shouldn't it have the same problems?
Maybe it's just too late :)

What do you think? Maybe you have an idea for some kind of catch-all logic in proxy_ndp?


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-22:

I'm not completely sure it is exactly the same, but in the BGP driver we not only enable ndp_proxy on br-ex but also configure it (something like ip nei proxy add...)

For the cr-lrp ports the wire_provider_port is called with proxy_cidrs information, which forces the call to [1], which ends up calling this [2]

[1] https://opendev.org/openstack/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/drivers/openstack/utils/wire.py#L186
[2] https://opendev.org/openstack/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/privileged/linux_net.py#L251


Maximilian Sesterhenn (msnatepg) wrote on 2023-05-22:

Hey Tomas, thanks for your reply.

Actually I'm also adding the instance IPs into ndp_proxy.

However, this does not seem to be sufficient, as we also need NDP resolution for the reply from the instance to the outside world.
So we would need another entry in ndp_proxy for the next-hop of the instance.

I tested that with an instance on a provider network and the BGP driver (not modified) in use: while ARP for IPv4 works fine, NDP for IPv6 does not.

Furthermore, I saw in the code [1] that NDP entries are only created for specific port types, not for VM_VIF ports.

Do we require a special instance configuration to make this work?
If not, maybe you can confirm that behavior in your environment?

[1] https://opendev.org/openstack/ovn-bgp-agent/src/commit/e697e350af158de68fbc4f52e784d8d4a8c922ab/ovn_bgp_agent/drivers/openstack/ovn_bgp_driver.py#L567


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-22:

So you mean the BGP driver does not properly work for VM IPs on the provider network for IPv6? When pinging another VM on the provider network, or when pinging anything external? (I'm pretty sure we have some tests that cover the ping from external to VM on the provider with ipv6, but not for the other case, with both VMs on the provider network.)

Regarding the code in [1], if memory serves, that was needed on the BGP driver only for the cr-lrp ports (i.e., when accessing the tenant network (VM) IPs through the OVN gateway router) and for amphora VIPs, as those are accessed through the VM port.


Maximilian Sesterhenn (msnatepg) wrote on 2023-05-22:

ltomasbo:
So you mean the BGP driver does not properly work for VM IPs on the provider network for IPv6?

msnatepg:
Exactly. In my opinion we need L2 resolution in both directions for any kind of communication.
For incoming packets that's working, but we need the same for outgoing packets.
With IPv4, the ARP proxy solves this as it replies to each request with its own MAC, for IPv6 we have to configure each IP explicitly.

ltomasbo:
I'm pretty sure we have some tests that cover the ping from external to VM on the provider with ipv6

msnatepg:
I wonder where the ICMPv6 NS/NA messages get answered from in your environment. Can you verify that this is working for you?
It does not work in my lab and I would otherwise propose extending networking-bgpvpn to add the gateway of the network to external_ids so that we can add a proxy configuration for that.

I captured what's happening on my compute node (this is captured using the modified EVPN driver). You can see the packet traversing all the interfaces, but the NS messages are unanswered because proxy_ndp has no configuration to proxy the gateway (GW):

listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
13:24:09.567235 vxlan-12 In IP6 SRC > INSTANCE_IP: ICMP6, echo request, id 5, seq 1, length 64
13:24:09.567248 br-12 In IP6 SRC > INSTANCE_IP: ICMP6, echo request, id 5, seq 1, length 64
13:24:09.567262 veth-vrf-12 Out IP6 SRC > INSTANCE_IP: ICMP6, echo request, id 5, seq 1, length 64
13:24:09.567267 veth-ovs-12 P IP6 SRC > INSTANCE_IP: ICMP6, echo request, id 5, seq 1, length 64
13:24:09.567853 tapaa267af3-b2 Out IP6 SRC > INSTANCE_IP: ICMP6, echo request, id 5, seq 1, length 64
13:24:09.568361 tapaa267af3-b2 M IP6 INSTANCE_IP > ff02::1:ff00:1: ICMP6, neighbor solicitation, who has GW, length 32
13:24:09.568621 veth-ovs-12 Out IP6 INSTANCE_IP > ff02::1:ff00:1: ICMP6, neighbor solicitation, who has GW, length 32
13:24:09.568625 veth-vrf-12 In IP6 INSTANCE_IP > ff02::1:ff00:1: ICMP6, neighbor solicitation, who has GW, length 32
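The destination of these solicitations is the solicited-node multicast group of the target, which is derived from the low 24 bits of the target address. A minimal sketch of that derivation follows; the gateway address 2001:db8::1 is a hypothetical stand-in for GW, chosen so that it maps to the group seen in the capture.

```python
import ipaddress


def solicited_node_multicast(addr: str) -> str:
    """Return the solicited-node multicast group (ff02::1:ffXX:XXXX) for addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(base | low24))


# Hypothetical gateway address ending in ::1.
print(solicited_node_multicast("2001:db8::1"))  # ff02::1:ff00:1
```

Because the group is a multicast address, the solicitation reaches br-ex, but the kernel only answers on behalf of addresses that have an explicit proxy entry.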



Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-22:

#10

I checked (with the bgp driver) and works for me (to ping an external IP):
12:07:21.855116 tap4ffea1af-c4 M IP6 fe80::f816:3eff:fe6a:d580 > ff02::1:ff00:0: ICMP6, neighbor solicitation, who has 2001:db8::, length 32
12:07:21.855511 tap4bcf2878-b0 Out IP6 fe80::f816:3eff:fe6a:d580 > ff02::1:ff00:0: ICMP6, neighbor solicitation, who has 2001:db8::, length 32
12:07:21.855525 br-ex In IP6 fe80::f816:3eff:fe6a:d580 > ff02::1:ff00:0: ICMP6, neighbor solicitation, who has 2001:db8::, length 32
12:07:21.864405 tap4ffea1af-c4 P IP6 2001:db8::f816:3eff:fe6a:d580 > f00d:f00d:f00d:4:5054:ff:fe37:d55a: ICMP6, echo request, id 1, seq 1, length 64
12:07:21.864748 br-ex In IP6 2001:db8::f816:3eff:fe6a:d580 > f00d:f00d:f00d:4:5054:ff:fe37:d55a: ICMP6, echo request, id 1, seq 1, length 64
12:07:21.864767 enp3s0 Out IP6 2001:db8::f816:3eff:fe6a:d580 > f00d:f00d:f00d:4:5054:ff:fe37:d55a: ICMP6, echo request, id 1, seq 1, length 64
12:07:21.865370 enp3s0 In IP6 f00d:f00d:f00d:4:5054:ff:fe37:d55a > 2001:db8::f816:3eff:fe6a:d580: ICMP6, echo reply, id 1, seq 1, length 64
12:07:21.865380 br-ex Out IP6 f00d:f00d:f00d:4:5054:ff:fe37:d55a > 2001:db8::f816:3eff:fe6a:d580: ICMP6, echo reply, id 1, seq 1, length 64
12:07:21.865570 tap4ffea1af-c4 Out IP6 f00d:f00d:f00d:4:5054:ff:fe37:d55a > 2001:db8::f816:3eff:fe6a:d580: ICMP6, echo reply, id 1, seq 1, length 64

I have a (fake) ip added to br-ex though:
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 2a:d2:6f:5b:3b:4a brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.1/32 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fd53:d91e:400:7f17::1/128 scope global
       valid_lft forever preferred_lft forever

And with this kernel flag enabled:
sudo sysctl -a | grep proxy_ndp | grep br-ex
net.ipv6.conf.br-ex.proxy_ndp = 1

How did you test it with the bgp driver?


Maximilian Sesterhenn (msnatepg) wrote on 2023-05-22:

#11

That's interesting.

In your packet capture I cannot see any NA messages following the NS messages.
Which routes are configured in the instance itself?
Do you know where your instances get their L2 resolution from?
Can you share your proxy_ndp configuration?

In my scenario proxy_arp/ndp is enabled:

net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv4.conf.br-ex.proxy_arp = 1
net.ipv6.conf.br-ex.proxy_ndp = 1
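A minimal sketch for checking such per-interface flags from a script, assuming the usual /proc/sys layout on Linux. Both helper names are hypothetical, and the path is built from separate components rather than by splitting a dotted sysctl key, since interface names may themselves contain dots.

```python
from pathlib import Path


def sysctl_path(family, device, knob, root="/proc/sys"):
    """Path of a per-interface sysctl, e.g. net.ipv6.conf.br-ex.proxy_ndp."""
    return Path(root) / "net" / family / "conf" / device / knob


def read_flag(path):
    """Return True/False for a 0/1 sysctl file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None


for family, knob in (("ipv4", "proxy_arp"), ("ipv6", "proxy_ndp")):
    path = sysctl_path(family, "br-ex", knob)
    print(path, read_flag(path))
```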

br-ex has addresses assigned:

14: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 7e:e8:52:8b:44:4b brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.1/32 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fd53:d91e:400:7f17::1/128 scope global
       valid_lft forever preferred_lft forever

To test with the BGP driver I had reconfigured a single node, I will now try to reconfigure the whole cluster. Should not make any difference, but who knows...


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-22:

#12

I have the same sysctl configuration.

Perhaps you are missing the OVS flow in br-ex that rewrites the destination MAC of the outgoing traffic to the one of br-ex, so that br-ex can pick it up?

$ sudo ovs-ofctl dump-flows br-ex
cookie=0x3e7, duration=17401.889s, table=0, n_packets=299, n_bytes=28284, priority=900,ip,in_port="patch-provnet-0" actions=mod_dl_dst:2a:d2:6f:5b:3b:4a,NORMAL
cookie=0x3e7, duration=17401.879s, table=0, n_packets=500, n_bytes=50380, priority=900,ipv6,in_port="patch-provnet-0" actions=mod_dl_dst:2a:d2:6f:5b:3b:4a,NORMAL
cookie=0x0, duration=21009.799s, table=0, n_packets=901, n_bytes=176415, priority=0 actions=NORMAL
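For reference, the two priority-900 flows above could be expressed as flow specifications like this. It is only a sketch with a hypothetical helper name; the default cookie and priority are simply copied from the dump, and nothing here is taken from the agent's code.

```python
def mac_rewrite_flows(in_port, bridge_mac, cookie=0x3E7, priority=900):
    """Build flow specs that rewrite the destination MAC for IPv4 and IPv6."""
    return [
        f"cookie={cookie:#x},priority={priority},{proto},in_port={in_port},"
        f"actions=mod_dl_dst:{bridge_mac},NORMAL"
        for proto in ("ip", "ipv6")
    ]


for flow in mac_rewrite_flows("patch-provnet-0", "2a:d2:6f:5b:3b:4a"):
    print(flow)
```

Each string has the form accepted by `ovs-ofctl add-flow br-ex '<flow>'`.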


Maximilian Sesterhenn (msnatepg) wrote on 2023-05-22:

#13

I checked the flows, they are there:
# ovs-ofctl dump-flows br-ex
cookie=0x3e6, duration=3539.673s, table=0, n_packets=66, n_bytes=5676, priority=1000,ipv6,in_port="patch-provnet-f",dl_src=fa:16:3e:8b:50:7b,ipv6_src=SRCv6 actions=mod_dl_dst:0a:7a:cc:07:f5:bf,output:"veth-ovs-12"
cookie=0x3e6, duration=3539.480s, table=0, n_packets=2909, n_bytes=248847, priority=1000,ip,in_port="patch-provnet-f",dl_src=fa:16:3e:8b:50:7b,nw_src=SRCv4 actions=mod_dl_dst:0a:7a:cc:07:f5:bf,output:"veth-ovs-12"
cookie=0x0, duration=227838.466s, table=0, n_packets=11347, n_bytes=11704353, priority=0 actions=NORMAL

My packet capture above makes the issue more obvious:
My instance has no L2 resolution (no MAC for the IPv6 gateway) and all ICMP NS attempts are failing. That makes sense as proxy_ndp only works if it was configured beforehand, which is not the case for the IPv6 gateway.
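A small sketch of how such a capture could be checked mechanically: it collects the targets of all neighbor solicitations in tcpdump text output and reports those for which no neighbor advertisement was seen. The helper name is hypothetical and the regular expressions assume the one-line tcpdump format shown in this thread.

```python
import re

NS_RE = re.compile(r"neighbor solicitation, who has (\S+),")
NA_RE = re.compile(r"neighbor advertisement, tgt is (\S+),")


def unanswered_targets(capture_lines):
    """Return NS targets (in first-seen order) that never got an NA."""
    solicited, advertised = [], set()
    for line in capture_lines:
        ns = NS_RE.search(line)
        if ns and ns.group(1) not in solicited:
            solicited.append(ns.group(1))
        na = NA_RE.search(line)
        if na:
            advertised.add(na.group(1))
    return [target for target in solicited if target not in advertised]


capture = [
    "13:24:09.568361 tapaa267af3-b2 M IP6 INSTANCE_IP > ff02::1:ff00:1: "
    "ICMP6, neighbor solicitation, who has GW, length 32",
]
print(unanswered_targets(capture))  # ['GW']
```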

Maybe you could check using tcpdump -ni any INSTANCE_IPv6_ADDRESS from where the NA messages are coming?
You would have to clear the cache once the tcpdump is running: ip -s -s neigh flush all


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-23:

#14


This is what I get on the node where the VM is:
# ip -s -s neigh flush all
100.64.0.9 dev enp3s0 lladdr 52:54:00:b9:29:4d ref 1 used 95/0/95probes 4 REACHABLE
100.65.3.9 dev enp2s0 lladdr 52:54:00:00:c0:8d ref 1 used 95/0/95probes 4 REACHABLE
2001:db8::f816:3eff:fe9c:17b7 dev br-ex lladdr fa:16:3e:9c:17:b7 used 89/89/49probes 4 STALE
fe80::5054:ff:fe00:c08d dev enp2s0 lladdr 52:54:00:00:c0:8d router used 89/89/61probes 4 STALE
fe80::5054:ff:feb9:294d dev enp3s0 lladdr 52:54:00:b9:29:4d router used 88/148/88probes 0 STALE
fe80::f816:3eff:fe9c:17b7 dev br-ex used 76/143/75probes 3 FAILED

*** Round 1, deleting 6 entries ***
*** Flush is complete after 1 round ***

# tcpdump -ni any host 2001:db8::f816:3eff:fe9c:17b7 or host f00d:f00d:f00d:4:5054:ff:fea5:f6e8 (first IP is the VM IP, second the destination VM)
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
05:37:08.142009 tapbae685a6-66 M   IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                         
05:37:08.142578 tapbae685a6-66 Out IP6 fe80::f816:3eff:fe3c:b5f8 > 2001:db8::f816:3eff:fe9c:17b7: ICMP6, neighbor advertisement, tgt is fe80::f816:3eff:fe3c:b5f8, length 32                                                                                 
05:37:08.142664 tapbae685a6-66 P   IP6 2001:db8::f816:3eff:fe9c:17b7 > f00d:f00d:f00d:4:5054:ff:fea5:f6e8: ICMP6, echo request, id 3, seq 1, length 64                                                                                                       
05:37:08.142910 br-ex In  IP6 2001:db8::f816:3eff:fe9c:17b7 > f00d:f00d:f00d:4:5054:ff:fea5:f6e8: ICMP6, echo request, id 3, seq 1, length 64                                                                                                                
05:37:08.142933 enp2s0 Out IP6 2001:db8::f816:3eff:fe9c:17b7 > f00d:f00d:f00d:4:5054:ff:fea5:f6e8: ICMP6, echo request, id 3, seq 1, length 64                                                                                                               
05:37:08.143535 enp3s0 In  IP6 f00d:f00d:f00d:4:5054:ff:fea5:f6e8 > 2001:db8::f816:3eff:fe9c:17b7: ICMP6, echo reply, id 3, seq 1, length 64                                                                                                                 
05:37:08.143854 tapbae685a6-66 P   IP6 2001:db8::f816:3eff:fe9c:17b7 > fe80::2c21:43ff:fe1e:244e: ICMP6, neighbor advertisement, tgt is 2001:db8::f816:3eff:fe9c:17b7, length 32                                                                             
05:37:08.143959 br-ex In  IP6 2001:db8::f816:3eff:fe9c:17b7 > fe80::2c21:43ff:fe1e:244e: ICMP6, neighbor advertisement, tgt is 2001:db8::f816:3eff:fe9c:17b7, length 32                                                                                      
05:37:08.143975 br-ex Out IP6 f00d:f00d:f00d:4:5054:ff:fea5:f6e8 > 2001:db8::f816:3eff:fe9c:17b7: ICMP6, echo reply, id 3, seq 1, length 64                                                                                                                  
05:37:08.144122 tapbae685a6-66 Out IP6 f00d:f00d:f00d:4:5054:ff:fea5:f6e8 > 2001:db8::f816:3eff:fe9c:17b7: ICMP6, echo reply, id 3, seq 1, length 64

Then, inside the VM, with ip nei I see the following:
fe80::2c21:43ff:fe1e:244e dev eth0 lladdr 2e:21:43:1e:24:4e router STALE
fe80::f816:3eff:fe3c:b5f8 dev eth0 lladdr fa:16:3e:3c:b5:f8 router STALE

The first one is the link-local address of br-ex, together with its MAC, which is correct:
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 2e:21:43:1e:24:4e brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.1/32 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fd53:d91e:400:7f17::1/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::2c21:43ff:fe1e:244e/64 scope link

And the second one is the OVN router gateway port, which makes me realise that this is perhaps the difference in your setup. In OVN, with SLAAC mode (the one I'm using, slaac/slaac), it is the router that replies to that, so perhaps it is working for me because I have a router connected to that provider network.
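Both neighbour entries can be matched to their MACs because the link-local addresses follow the modified EUI-64 scheme. A minimal sketch of that mapping is below; the helper name is hypothetical, and it assumes EUI-64 address generation is in use, which is not the case for every interface or address generation mode.

```python
import ipaddress


def eui64_link_local(mac: str) -> str:
    """Derive the modified EUI-64 link-local address (fe80::/64) from a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    packed = bytes([0xFE, 0x80] + [0] * 6 + interface_id)
    return str(ipaddress.IPv6Address(packed))


print(eui64_link_local("2e:21:43:1e:24:4e"))  # fe80::2c21:43ff:fe1e:244e (br-ex)
print(eui64_link_local("fa:16:3e:3c:b5:f8"))  # fe80::f816:3eff:fe3c:b5f8
```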

When I remove the router this is what I get:
05:51:35.910334 tapbae685a6-66 M   IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                         
05:51:35.910533 tap775d5de2-70 Out IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                         
05:51:35.910545 br-ex In  IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                                  
05:51:36.965778 tapbae685a6-66 M   IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                         
05:51:36.965793 tap775d5de2-70 Out IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                         
05:51:36.965810 br-ex In  IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                                  
05:51:37.989829 tapbae685a6-66 M   IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                         
05:51:37.989844 tap775d5de2-70 Out IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32                                                                                         
05:51:37.989868 br-ex In  IP6 2001:db8::f816:3eff:fe9c:17b7 > ff02::1:ff3c:b5f8: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe3c:b5f8, length 32

And inside the VM:
[root@vm-provider ~]# ip nei
fe80::2c21:43ff:fe1e:244e dev eth0 lladdr 2e:21:43:1e:24:4e router REACHABLE
fe80::f816:3eff:fe3c:b5f8 dev eth0 FAILED

Can you confirm it works for you just by adding a router?


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-23:

#15

I've created https://bugs.launchpad.net/ovn-bgp-agent/+bug/2020410 for addressing this


Luis Tomas Bolivar (ltomasbo) wrote on 2023-05-23:

#16


Another interesting side effect, if you add on the host an ndp proxy:
$ ip nei show proxy
2001:db8:: dev br-ex proxy

The first ping fails:
[root@vm-provider ~]# ping f00d:f00d:f00d:4:5054:ff:fea5:f6e8 -c1
PING f00d:f00d:f00d:4:5054:ff:fea5:f6e8(f00d:f00d:f00d:4:5054:ff:fea5:f6e8) 56 data bytes
From 2001:db8::f816:3eff:fe9c:17b7 icmp_seq=1 Destination unreachable: Address unreachable
--- f00d:f00d:f00d:4:5054:ff:fea5:f6e8 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

[root@vm-provider ~]# ip nei
fe80::2c21:43ff:fe1e:244e dev eth0 lladdr 2e:21:43:1e:24:4e router REACHABLE
fe80::f816:3eff:fe3c:b5f8 dev eth0 FAILED

But the second succeeds:
[root@vm-provider ~]# ping f00d:f00d:f00d:4:5054:ff:fea5:f6e8 -c1
PING f00d:f00d:f00d:4:5054:ff:fea5:f6e8(f00d:f00d:f00d:4:5054:ff:fea5:f6e8) 56 data bytes
64 bytes from f00d:f00d:f00d:4:5054:ff:fea5:f6e8: icmp_seq=1 ttl=60 time=2.04 ms
--- f00d:f00d:f00d:4:5054:ff:fea5:f6e8 ping st...

Changed in ovn-bgp-agent:
importance:	Undecided → Wishlist