DVR router ARP traffic broken for networks containing multiple subnets

Bug #1913646 reported by Alexandre Perreault
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: LIU Yulong

Bug Description

Hi,
I am running openstack ussuri with ovs and DVR routers.

When there are multiple subnets in one network, Neutron does not consider the possibility that the subnets could be connected to different routers. This is a problem when a DVR router is expecting to receive an ARP reply. In OVS br-int, table 3 contains only one rule per network which applies to traffic destined to the DVR MAC. This rule translates the DVR MAC to the MAC of the newest router on the network but does not take into consideration that a network could have multiple subnets connected to different routers.

The use case where I am facing this issue is with manila. Manila defines one network object in the service project but each time a user creates a new "Share Network", the Manila service creates a new subnet within the network. So you can end up with many subnets and routers within a network.

It is a bit confusing so below are more details taking my use case with manila as an example.
Manila has a network called manila_service_network.
In manila.conf, a CIDR and a mask are configured; the subnets created by the Manila service are allocated within that CIDR using that mask.
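
To make the subnet allocation concrete, here is a small sketch using Python's ipaddress module. The parent CIDR 10.128.0.0/16 is an assumption (the report does not quote the value from manila.conf); the /20 prefix and the two subnet addresses are the ones that appear in this report.

```python
import ipaddress

# Assumed value: the report does not quote manila.conf, so this parent
# CIDR is hypothetical. The /20 prefix matches the subnets in the report.
SERVICE_CIDR = ipaddress.ip_network("10.128.0.0/16")
SUBNET_PREFIX = 20

# Every /20 that can be carved out of the parent CIDR.
candidates = list(SERVICE_CIDR.subnets(new_prefix=SUBNET_PREFIX))

manila_subnet1 = ipaddress.ip_network("10.128.16.0/20")
manila_subnet2 = ipaddress.ip_network("10.128.48.0/20")
share_instance_ip = ipaddress.ip_address("10.128.19.189")

print("possible /20 subnets:", len(candidates))
print("subnet1 is a candidate:", manila_subnet1 in candidates)
print("subnet2 is a candidate:", manila_subnet2 in candidates)
print("ShareInstance is in subnet1:", share_instance_ip in manila_subnet1)
```

All of these subnets live in the same Neutron network, so they share one local VLAN on br-int, which is what matters for the rest of this report.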

On a user project I create NetworkX/SubnetX (172.20.20.0/24) and connect it to routerX. I also have instanceX on this network.
Then I create a Share network. This creates a subnet within network manila_service_network with IP 10.128.16.0/20.
Manila_subnet1 (10.128.16.0/20) is connected to routerX which is already connected to SubnetX (172.20.20.0/24).
A ShareInstance is created on the manila_subnet1 and has IP 10.128.19.189.
It is important that the ShareInstance and InstanceX be located on different computes.

Now communication between InstanceX and the ShareInstance should work but it does not. Here's why.

InstanceX wants to communicate with the ShareInstance, so it sends a packet to its gateway, RouterX.
RouterX needs to route the packet to the ShareInstance but it does not have the MAC address in its ARP table.
RouterX sends an ARP request -> ARP, Request who-has 10.128.19.189 tell 10.128.16.1, length 28
RouterX never receives an ARP reply.
I followed the flows in br-int and br-tun.

Since traffic is coming from a DVR router, OVS br-tun changes the router's source MAC to the compute's DVR MAC. fa:16:3e:80:4c:3a is the MAC of the router with IP 10.128.16.1
 cookie=0x7027c9402a453a34, duration=411942.542s, table=1, n_packets=434, n_bytes=33812, idle_age=589, hard_age=65534, priority=1,dl_vlan=31,dl_src=fa:16:3e:80:4c:3a actions=mod_dl_src:fa:16:3f:67:83:30,resubmit(,2)

Then the packet reaches the ARP responder table 21. There is an entry in table 21 for the ShareInstance MAC so it modifies the packet and sends it back to br-int.
cookie=0x7027c9402a453a34, duration=11769.612s, table=21, n_packets=23, n_bytes=966, idle_age=2, priority=1,arp,dl_vlan=31,arp_tpa=10.128.19.189 actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163ee3273f->NXM_NX_ARP_SHA[],load:0xa8013bd->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:e3:27:3f,IN_PORT
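
The two load: actions in this flow carry the reply's sender MAC and sender IP as hexadecimal integers. As a quick check, this sketch decodes them with the Python standard library; mac_from_int is a helper name introduced here for illustration.

```python
import ipaddress


def mac_from_int(value: int) -> str:
    """Format a 48-bit integer as a colon-separated MAC address."""
    raw = value.to_bytes(6, "big")
    return ":".join(f"{byte:02x}" for byte in raw)


# Values copied from the table 21 flow above.
arp_sha = mac_from_int(0xFA163EE3273F)            # load:...->NXM_NX_ARP_SHA[]
arp_spa = str(ipaddress.ip_address(0x0A8013BD))   # load:...->NXM_OF_ARP_SPA[]

print(arp_sha)
print(arp_spa)
```

The decoded values are the ShareInstance MAC (the same one used in mod_dl_src) and the ShareInstance IP 10.128.19.189.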

But remember that the router source MAC became the DVR MAC and, because of table 21, it is now the destination MAC.
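
The two rewrites can be replayed on a packet modelled as a plain dict. This is only a sketch of the two flows quoted above, not of the whole br-tun pipeline; the field names and the shape of the ARP request (sender hardware address equal to the router MAC, broadcast destination) are assumptions based on how a normal ARP request looks.

```python
ROUTER_MAC = "fa:16:3e:80:4c:3a"    # router port with IP 10.128.16.1
DVR_HOST_MAC = "fa:16:3f:67:83:30"  # DVR MAC of the source compute
SHARE_MAC = "fa:16:3e:e3:27:3f"     # MAC of the ShareInstance
SHARE_IP = "10.128.19.189"


def br_tun_table_1(pkt):
    """mod_dl_src: replace the router source MAC with the DVR MAC."""
    out = dict(pkt)
    if out["eth_src"] == ROUTER_MAC:
        out["eth_src"] = DVR_HOST_MAC
    return out


def br_tun_table_21(pkt):
    """ARP responder: turn the request into a reply and send it back."""
    out = dict(pkt)
    out["arp_op"] = 2                 # load:0x2->NXM_OF_ARP_OP[]
    out["arp_tha"] = pkt["arp_sha"]   # move ARP_SHA -> ARP_THA
    out["arp_tpa"] = pkt["arp_spa"]   # move ARP_SPA -> ARP_TPA
    out["arp_sha"] = SHARE_MAC
    out["arp_spa"] = SHARE_IP
    out["eth_dst"] = pkt["eth_src"]   # move ETH_SRC -> ETH_DST
    out["eth_src"] = SHARE_MAC        # mod_dl_src
    return out


# Assumed shape of the ARP request leaving the router.
request = {
    "eth_src": ROUTER_MAC,
    "eth_dst": "ff:ff:ff:ff:ff:ff",
    "arp_op": 1,
    "arp_sha": ROUTER_MAC,
    "arp_spa": "10.128.16.1",
    "arp_tha": "00:00:00:00:00:00",
    "arp_tpa": SHARE_IP,
}

reply = br_tun_table_21(br_tun_table_1(request))
print("reply eth_dst:", reply["eth_dst"])
print("reply arp_tha:", reply["arp_tha"])
```

In this model the Ethernet destination of the reply is the DVR MAC, while the ARP target hardware address still holds the router MAC, because table 1 only rewrites the Ethernet header.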

This br-int table 0 rule sends the packet to table 3 because the destination MAC is the DVR MAC.
cookie=0xe728ac45412eb352, duration=4676042.487s, table=0, n_packets=10695122, n_bytes=449195124, idle_age=0, hard_age=65534, priority=5,in_port=2,dl_dst=fa:16:3f:67:83:30 actions=resubmit(,3)

In table 3 there is a rule that changes the destination MAC from the DVR MAC to the router MAC based on the VLAN (network). In our case vlan 31.
cookie=0xe728ac45412eb352, duration=23642.517s, table=3, n_packets=10626537, n_bytes=446314554, idle_age=0, priority=5,dl_vlan=31,dl_dst=fa:16:3f:67:83:30 actions=mod_dl_dst:fa:16:3e:4d:d0:f9,strip_vlan,output:725
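
To see exactly what this rule matches on, the flow line can be split into bookkeeping fields and match fields. The parser below is a minimal sketch written for the dump format pasted in this report; parse_flow and STATS_KEYS are names introduced here, and the parser is not meant to handle every ovs-ofctl output variant.

```python
# Fields that describe the flow entry itself rather than the packet match.
STATS_KEYS = {
    "cookie", "duration", "table", "n_packets", "n_bytes",
    "idle_age", "hard_age", "priority",
}


def parse_flow(line: str):
    """Return (table, match_fields, raw_actions) for one dumped flow line."""
    head, _, actions = line.strip().partition(" actions=")
    fields = {}
    for item in head.replace(" ", "").split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Bare tokens such as "arp" are protocol shorthands with no value.
        fields[key] = value if sep else True
    match = {k: v for k, v in fields.items() if k not in STATS_KEYS}
    return fields.get("table"), match, actions


TABLE3_FLOW = (
    "cookie=0xe728ac45412eb352, duration=23642.517s, table=3, "
    "n_packets=10626537, n_bytes=446314554, idle_age=0, priority=5,"
    "dl_vlan=31,dl_dst=fa:16:3f:67:83:30 "
    "actions=mod_dl_dst:fa:16:3e:4d:d0:f9,strip_vlan,output:725"
)

table, match, actions = parse_flow(TABLE3_FLOW)
print(table, match)
print(actions)
```

The match contains only dl_vlan and dl_dst, so nothing in it distinguishes one router on VLAN 31 from another.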

You can see that the mod_dl_dst MAC (fa:16:3e:4d:d0:f9) is not the original source MAC of my router (fa:16:3e:80:4c:3a).
Why?
Because there are multiple subnets in the network manila_service_network, each connected to a different router.
fa:16:3e:4d:d0:f9 belongs to a router connected to Manila_subnet2 (10.128.48.0/20) which is within manila_service_network.
This means the ARP reply is sent to the wrong router.
All the subnets in manila_service_network use vlan 31, so having only one rule in table 3 for vlan 31 causes all traffic to be sent to one router (usually the newest).

In my use case there are 6 subnets in manila_service_network and the ARP replies for all 6 subnets go to the same router (usually the newest). This means 5 subnets out of 6 are broken.
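
The overwrite can be modelled with a dict keyed on the rule's match fields. This is a toy model of the behaviour described above, not Neutron code: install_rule and deliver are hypothetical helpers, and only the two router MACs quoted in this report are used (the other four routers would behave the same way).

```python
DVR_HOST_MAC = "fa:16:3f:67:83:30"
VLAN = 31

# (subnet, router gateway MAC) pairs taken from this report, oldest first.
ROUTERS = [
    ("10.128.16.0/20", "fa:16:3e:80:4c:3a"),
    ("10.128.48.0/20", "fa:16:3e:4d:d0:f9"),
]


def install_rule(table3, vlan, dvr_mac, gateway_mac):
    """Model of a table 3 rule: the match is only (vlan, dvr_mac)."""
    table3[(vlan, dvr_mac)] = gateway_mac


def deliver(table3, vlan, eth_dst):
    """Return the MAC that mod_dl_dst would write, or None if no match."""
    return table3.get((vlan, eth_dst))


table3 = {}
for _subnet, gateway_mac in ROUTERS:
    install_rule(table3, VLAN, DVR_HOST_MAC, gateway_mac)

expected = ROUTERS[0][1]   # the router that sent the ARP request
actual = deliver(table3, VLAN, DVR_HOST_MAC)

print("rules in table 3:", len(table3))
print("reply reaches the requesting router:", actual == expected)
```

Because every router on the network produces the same key, each new router replaces the previous entry and only the last one installed can receive ARP replies.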

You don't need to use manila to recreate the problem.
You need networkA with subnetA and network1 with subnet1, subnet2, subnet3, etc...
Connect subnetA and subnet1 to the same router and create a couple instances on subnetA and subnet1 (they need to be on different computes).
Then connect subnet2 to a new router and subnet3 to another new router.
You should see that the instances on subnetA and subnet1 won't be able to communicate, as the traffic will stop at the router.

Table 3 is a fairly new addition to neutron/ovs so I believe that the possibility of having multiple subnets in one network was not considered when writing the code.
I think it is related to this commit
https://review.opendev.org/c/openstack/neutron/+/651905/12/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py

Though it is rare to have multiple subnets in one network, the manila service uses this architecture, making it important that it works correctly.

Revision history for this message
LIU Yulong (dragon889) wrote :

Could not reproduce this on a Rocky deployment; maybe there is a regression in Ussuri.

Here is my test topology:
Router-1 is connected to subnet-1 (from network-1), and Router-1 is also connected to subnet-2 (from network-2).
VM-1 from network-1 on host-1 can ping VM-2 from network-2 on host-2.

This should be the same as your case of "instanceX from NetworkX/SubnetX" and "ShareInstance from manila_service_network/Manila_subnet1", right?

Revision history for this message
LIU Yulong (dragon889) wrote :

Hi, please have a try to revert this patch locally:
https://review.opendev.org/c/openstack/neutron/+/651905
to see if it can solve your problem.

But this is still a bit strange to me, since I added an east-west traffic test to upstream neutron:
https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/659896/11/neutron_tempest_plugin/scenario/test_connectivity.py#114
If that patch had the issue, we should have noticed it.

Revision history for this message
Alexandre Perreault (alexperreault) wrote :

Hi,

Thanks for the comments.

Actually the use case is slightly different. The initial setup is fine, but then you need to add a second subnet to network-2 (we can call it subnet-2b) and connect subnet-2b to a different router (router-2). This is when my connectivity problem between VM-1 and VM-2 starts.

But two other things must be true as well.
- VM-1 and VM-2 have to be on separate compute servers
- The Router-1 namespace must not have a permanent ARP entry for VM-2 in its ARP table. If the qrouter namespace has a permanent entry, then the qrouter does not need to send an ARP request and will send traffic correctly.
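
The second condition can be checked by looking at the neighbour table of the qrouter namespace. The sketch below parses lines in the format printed by `ip neigh show` and reports whether a PERMANENT entry exists for a given IP. The sample lines are illustrative (the second MAC and IP are made up), and parse_neigh and has_permanent_entry are hypothetical helper names.

```python
def parse_neigh(line: str) -> dict:
    """Parse one `ip neigh show` line into ip, dev, lladdr and state."""
    tokens = line.split()
    entry = {"ip": tokens[0], "state": tokens[-1], "dev": None, "lladdr": None}
    for key in ("dev", "lladdr"):
        if key in tokens:
            entry[key] = tokens[tokens.index(key) + 1]
    return entry


def has_permanent_entry(lines, ip: str) -> bool:
    """True if the neighbour table holds a PERMANENT entry for ip."""
    for line in lines:
        if not line.strip():
            continue
        entry = parse_neigh(line)
        if entry["ip"] == ip and entry["state"] == "PERMANENT":
            return True
    return False


# Illustrative neighbour table; not captured from a real namespace.
SAMPLE = [
    "192.168.222.31 dev qr-5f571f90-3d lladdr fa:16:3e:ea:8a:d4 PERMANENT",
    "192.168.222.40 dev qr-5f571f90-3d lladdr fa:16:3e:00:00:01 STALE",
]

for ip in ("192.168.222.31", "192.168.222.40", "192.168.222.99"):
    print(ip, "permanent:", has_permanent_entry(SAMPLE, ip))
```

An IP without a PERMANENT entry is one the router has to resolve with a dynamic ARP request, which is the path affected by this bug.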

Actually, this issue would not occur if the qrouter namespace had permanent ARP entries for instances. I think that is another problem I faced:
https://bugs.launchpad.net/neutron/+bug/1913621

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

Is this bug a duplicate of https://bugs.launchpad.net/neutron/+bug/1913621?

Regards.

Revision history for this message
Alexandre Perreault (alexperreault) wrote :

Hi Rodolfo,

No, I consider them separate issues.
https://bugs.launchpad.net/neutron/+bug/1913621 is about permanent ARP entries in the DVR qrouter ARP table. I work more on the operations side and less with code, but from what I can tell, each time a new instance is created an ARP entry should be added to the qrouter. I noticed that this stops happening if two networks are connected to the same router. I'm not sure if this is normal. Anyway, that is to be discussed further in https://bugs.launchpad.net/neutron/+bug/1913621

This bug (1913646) is about the case where there are multiple subnets in ONE network and those subnets have gateways on different routers. This has an effect on learning MACs dynamically (ARP). I was just mentioning that I probably would not have discovered this if the other bug was not a problem. I will try to find where in the code I think the issue is.

regards,

Revision history for this message
LIU Yulong (dragon889) wrote :

OK, I see the bug. The behavior is similar to this bug:
https://bugs.launchpad.net/neutron/+bug/1859638
IMO, this is a known issue.

I have some comments here[1], allow me to quote:
"""
ARP request from qr-device (router internal gateway) go out compute node, the source MAC address of ethernet will be changed to dvr_host_mac. So this responder will finally have an ARP reply whose ethernet destination MAC is source compute node dvr_host_mac . So when this reply go back to qr-device, it will not match the MAC address.
"""

So, after manually removing the PERMANENT MAC entry in the qrouter namespace:

sudo ip netns exec qrouter-b247f145-569a-4d5a-bdd8-31a5213641ea ip neigh del 192.168.222.31 dev qr-5f571f90-3d

and manually running arping against this IP address, it is not reachable:
sudo ip netns exec qrouter-b247f145-569a-4d5a-bdd8-31a5213641ea arping -I qr-5f571f90-3d 192.168.222.31

This case is on the same host. The underlying problem is the same: the destination MAC address of the returned ARP reply is the dvr_host_mac, not the qrouter's internal gateway MAC.

So basically, IMO, bug 1913621 is a duplicate of this bug 1913646, and also of bug 1859638; maybe we can merge these 3 bugs.

But remember, the PERMANENT ARP entries are intentionally added for DVR, see related code [2][3].

Thank you for reporting this.

[1] https://review.opendev.org/c/openstack/neutron/+/601336/42/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_tun.py#289
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_local_router.py#L247
[3] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_local_router.py#L276

Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
LIU Yulong (dragon889)
Changed in neutron:
assignee: nobody → LIU Yulong (dragon889)
Revision history for this message
LIU Yulong (dragon889) wrote :

I've uploaded an old local patch for this; please give it a try to see if it solves your problem.
https://review.opendev.org/c/openstack/neutron/+/773597

My test steps are:
1. Manually remove the PERMANENT MAC entry in the qrouter namespace.
2. Run arping in the namespace, or ping east-west traffic from inside the VM.
3. A REACHABLE ARP entry is added:
192.168.222.31 dev qr-5f571f90-3d lladdr fa:16:3e:ea:8a:d4 REACHABLE
4. East-west ping from inside the VM succeeds.
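
The test steps above can be sketched as a small helper that builds the command lines without running them. The delete and arping commands mirror the ones quoted earlier in this thread; the `ip neigh show dev` check for step 3 is an assumption about how the entry was listed. The function names are hypothetical, and actually running the commands requires root on a compute node hosting the router.

```python
import shlex


def _in_namespace(router_id: str, *argv: str) -> list:
    """Prefix a command so it runs inside the router's qrouter namespace."""
    return ["sudo", "ip", "netns", "exec", f"qrouter-{router_id}", *argv]


def neigh_del_cmd(router_id: str, ip: str, device: str) -> list:
    """Step 1: remove the PERMANENT neighbour entry."""
    return _in_namespace(router_id, "ip", "neigh", "del", ip, "dev", device)


def arping_cmd(router_id: str, ip: str, device: str) -> list:
    """Step 2: trigger dynamic ARP resolution from the router port."""
    return _in_namespace(router_id, "arping", "-I", device, ip)


def neigh_show_cmd(router_id: str, device: str) -> list:
    """Step 3: list neighbour entries to look for a REACHABLE one."""
    return _in_namespace(router_id, "ip", "neigh", "show", "dev", device)


ROUTER_ID = "b247f145-569a-4d5a-bdd8-31a5213641ea"
DEVICE = "qr-5f571f90-3d"
TARGET_IP = "192.168.222.31"

for cmd in (
    neigh_del_cmd(ROUTER_ID, TARGET_IP, DEVICE),
    arping_cmd(ROUTER_ID, TARGET_IP, DEVICE),
    neigh_show_cmd(ROUTER_ID, DEVICE),
):
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run as-is once the router ID, device and IP are replaced with values from the environment under test.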

Revision history for this message
Alexandre Perreault (alexperreault) wrote :

Hi,
I apologize for lack of feedback this week, my focus was pulled elsewhere.
I will try to respond to everything that's been said.

It is similar to https://bugs.launchpad.net/neutron/+bug/1859638 and
https://bugs.launchpad.net/neutron/+bug/1774459

but it seems a fix was attempted with:
https://review.opendev.org/c/openstack/neutron/+/651905/12/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py

This adds table 3 in br-int, which rewrites the ethernet destination MAC from the dvr_host_mac to the router MAC.
My issue is that only one rule is created per network even if the network has multiple subnets and routers. The table 3 rule is overwritten by the newest router.

IMPORTANT: You say "But remember, the PERMANENT ARP entries are intentionally added for DVR, see related code [2][3]." BUT that's my problem in 1913621, there are no permanent ARP entries! There should be but they are not created. In your example, you deliberately delete the permanent ARP. In my case, they do not exist.

This is why I don't consider bugs 1913646 and 1913621 to be duplicates.
1913621 -> permanent ARP entries are not created.
1913646 -> if there are no permanent ARP entries (as with VIPs), dynamic ARP fails when there are multiple routers in a network.

My original idea was to change the match line in the function _arp_dvr_dst_mac_match(ofp, ofpp, vlan, dvr_mac) in
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py
I was thinking of adding a third match criterion, which would match on arp_tha=gateway_mac.
Right now the rule matches only on VLAN and dvr_mac, which means that each time a new subnet with a router is added, the rule gets overwritten.
By adding arp_tha=gateway_mac to the match statement we would make sure that each qrouter has its own rule.
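
A rough model of this idea, using plain dicts instead of the real OpenFlow match objects: the only difference between the two variants is whether the gateway MAC is part of the key. This illustrates the proposal in this comment only and is not the patch that was uploaded for review; build_rules and the key functions are hypothetical.

```python
DVR_HOST_MAC = "fa:16:3f:67:83:30"
VLAN = 31

# Router gateway MACs quoted in this report, oldest first.
GATEWAYS = ["fa:16:3e:80:4c:3a", "fa:16:3e:4d:d0:f9"]


def current_key(vlan, dvr_mac, gateway_mac):
    """Match used today: VLAN and DVR MAC only."""
    return (vlan, dvr_mac)


def proposed_key(vlan, dvr_mac, gateway_mac):
    """Proposed match: additionally arp_tha=gateway_mac."""
    return (vlan, dvr_mac, gateway_mac)


def build_rules(gateways, key_fn):
    """Install one rule per router; equal keys overwrite each other."""
    rules = {}
    for gateway_mac in gateways:
        rules[key_fn(VLAN, DVR_HOST_MAC, gateway_mac)] = gateway_mac
    return rules


current = build_rules(GATEWAYS, current_key)
proposed = build_rules(GATEWAYS, proposed_key)

# The ARP reply keeps the requesting router's MAC in arp_tha.
reply_arp_tha = GATEWAYS[0]
rewritten_dst = proposed[(VLAN, DVR_HOST_MAC, reply_arp_tha)]

print("rules with current match:", len(current))
print("rules with proposed match:", len(proposed))
print("reply rewritten to requesting router:", rewritten_dst == reply_arp_tha)
```

With the extra field each router keeps its own entry, at the cost of the rule applying only to ARP traffic, since arp_tha exists only in ARP packets.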

I will look at your patch. At first glance it looks like a different way to resolve the problem which is interesting.

Revision history for this message
Alexandre Perreault (alexperreault) wrote :

I have tested the patch and I have added it to a couple test environments. At the moment it seems to resolve the dynamic ARP problem. I'm still doing some testing over the course of this week to make sure.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.0.0.0rc1

This issue was fixed in the openstack/neutron 18.0.0.0rc1 release candidate.

Changed in neutron:
status: Confirmed → Fix Released