Comment 5 for bug 2020410

Tore Anderson (toreanderson) wrote :

Hi, ltomasbo asked me to share some thoughts here.

So the way I see it, problems such as these are a result of doing things in a rather non-standard way to begin with, instead of implementing EVPN support in a more normal way.

The fundamental problem is that ovn-bgp-agent does not implement L2VNI support (https://bugs.launchpad.net/ovn-bgp-agent/+bug/2017890). I'll try to explain:

Because of the lack of L2VNI support, there is no L2 connectivity between nodes on the same provider net residing on different hypervisors. However, the nodes (be it routers/cr-lrps or VMs) residing on the provider networks certainly do have an expectation that there should be L2 connectivity - to them, it's just a regular VLAN, after all.

So instead, ovn-bgp-agent ends up having to resort to various dirty tricks and hacks (like ARP/ND proxy), all in order to mask the lack of L2 connectivity and make it work somehow. Unfortunately, as often happens when relying on hacks like these, some kind of functionality isn't catered to correctly, requiring more hacks and tricks, and so on. Even if you make it all work somehow, the result gets really complicated and hard to debug.

So what I've proposed is a more fundamental rethink of how it all fits together. In a nutshell: get rid of all the hacks and replace them with regular L2VNIs. This restores L2 connectivity on provider networks between hypervisors (and also between hypervisors and devices external to OpenStack), allowing ARP/ND to work normally. That means no more need for proxy ARP/ND, ip rules, static host routes, or whatever other black magic ovn-evpn-agent has needed to do before.
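
To give a rough idea of what a "regular L2VNI" amounts to on a Linux hypervisor, here is a minimal sketch that only builds the iproute2 commands for one provider network. It is not taken from the lab or from ovn-bgp-agent: the function name, the interface naming scheme, the example VNI and VTEP address, and the `provider_port` argument (whatever interface carries the provider network's traffic on the hypervisor) are all assumptions for illustration.

```python
import ipaddress
from typing import List


def l2vni_commands(vni: int, vtep_ip: str, provider_port: str) -> List[str]:
    """Build iproute2 commands that bridge a provider network to a VXLAN L2VNI.

    Hypothetical helper: it returns the commands as strings and runs nothing.
    """
    if not 1 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits (1..16777215)")
    ipaddress.ip_address(vtep_ip)  # raises ValueError on a malformed address

    bridge = f"br-vni{vni}"  # assumed naming scheme, at most 14 characters
    vxlan = f"vxlan{vni}"    # assumed naming scheme, at most 13 characters

    return [
        # Plain Linux bridge that acts as the L2 segment on this hypervisor.
        f"ip link add {bridge} type bridge",
        # VXLAN device for the L2VNI; "nolearning" leaves MAC distribution
        # to the EVPN control plane instead of data-plane flooding.
        f"ip link add {vxlan} type vxlan id {vni} local {vtep_ip} "
        "dstport 4789 nolearning",
        # Attach both the VXLAN device and the provider-side port to the bridge.
        f"ip link set {vxlan} master {bridge}",
        f"ip link set {provider_port} master {bridge}",
        f"ip link set {bridge} up",
        f"ip link set {vxlan} up",
    ]


if __name__ == "__main__":
    for cmd in l2vni_commands(10100, "192.0.2.11", "veth-prov"):
        print(cmd)
```

With something like this in place, and assuming FRR is the BGP speaker with "advertise-all-vni" enabled under "address-family l2vpn evpn", MAC/IP reachability would be exchanged via EVPN and ARP/ND would just cross the bridge like on any other VLAN.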

I made a demo/lab for luis5tb that shows how it could be done, described in more detail here: https://drive.redpill-linpro.com/s/xs3WpLQmPTNAMMa

If you're interested in taking a look at the lab, msnatepg, just send me an SSH pubkey and I'll add it to the authorized_keys files on the nodes.