Permanent ARP entries not added to DVR qrouter when connected to two Networks

Bug #1913621 reported by Alexandre Perreault
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Hi,
I am running OpenStack Ussuri with OVS and DVR routers.

I'm facing a problem with communication between two networks connected to the same router. The cause is that no permanent ARP entries are added to the qrouter when a new instance is created on one of the networks. As a result, when traffic reaches the router, it does not know the destination MAC address of the new instance. Below is an example.

I created two Networks each with its own subnet.
NetworkA/SubnetA: 172.18.18.0/24
NetworkB/SubnetB: 172.19.19.0/24

I created one router and connected both networks to it.
The qrouter has a port with IP 172.18.18.1 and another port with IP 172.19.19.1

Then I created multiple instances on NetworkA, which were spawned on different computes.
Here is the ARP table from the DVR router on one of the computes:
root@compute004[SRV][PRD001][LAT]:~# ip netns exec qrouter-3fe791ef-8432-41c3-a4ac-28ae741b533f arp -a | grep 18.18
? (172.18.18.2) at fa:16:3e:13:7b:bd [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.78) at fa:16:3e:66:bf:8b [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.27) at fa:16:3e:85:bd:e2 [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.161) at fa:16:3e:43:07:b2 [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.66) at fa:16:3e:85:75:cb [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.3) at fa:16:3e:7b:32:0d [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.21) at fa:16:3e:05:c7:ef [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.4) at fa:16:3e:02:3d:1a [ether] PERM on qr-e68fe2ed-2a

The permanent ARPs exist for DHCP (.2, .3, .4), snat (.27) and 4 instances (.78, .161, .66, .21).
No problem for now.
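
To make listings like the one above easier to compare across computes, the `arp -a` output can be parsed into structured records. This is a minimal sketch, not part of the original report: the names `ArpEntry`, `ARP_LINE` and `parse_arp` are made up for illustration, and the regular expression assumes the net-tools `arp -a` line format shown above.

```python
import re
from typing import NamedTuple


class ArpEntry(NamedTuple):
    ip: str
    mac: str
    permanent: bool
    device: str


# Matches net-tools `arp -a` lines such as:
#   ? (172.18.18.2) at fa:16:3e:13:7b:bd [ether] PERM on qr-e68fe2ed-2a
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9a-f:]{17}) \[ether\]"
    r"(?P<flags>(?: \w+)*) on (?P<dev>\S+)",
    re.IGNORECASE,
)


def parse_arp(output: str) -> list[ArpEntry]:
    """Parse `arp -a` text; lines that do not match (e.g. incomplete entries) are skipped."""
    entries = []
    for line in output.splitlines():
        m = ARP_LINE.search(line)
        if m:
            flags = m.group("flags").split()
            entries.append(
                ArpEntry(m.group("ip"), m.group("mac"), "PERM" in flags, m.group("dev"))
            )
    return entries


# Two lines copied from the listing above.
sample = """\
? (172.18.18.2) at fa:16:3e:13:7b:bd [ether] PERM on qr-e68fe2ed-2a
? (172.18.18.78) at fa:16:3e:66:bf:8b [ether] PERM on qr-e68fe2ed-2a
"""
entries = parse_arp(sample)
perm_ips = [e.ip for e in entries if e.permanent]
print(perm_ips)
```

The text could be captured on each compute with the same `ip netns exec ... arp -a` command shown above and fed to `parse_arp`.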
Then I created an instance on NetworkB. When I check the ARP table, there is no permanent entry for my new instance.
root@compute004[SRV][PRD001][LAT]:~# ip netns exec qrouter-3fe791ef-8432-41c3-a4ac-28ae741b533f arp -a | grep 19.19
? (172.19.19.3) at fa:16:3e:b4:16:3e [ether] PERM on qr-6d2d939d-1e
? (172.19.19.138) at fa:16:3e:fa:f7:f1 [ether] PERM on qr-6d2d939d-1e
? (172.19.19.4) at fa:16:3e:0c:84:53 [ether] PERM on qr-6d2d939d-1e
? (172.19.19.2) at fa:16:3e:e4:44:e3 [ether] PERM on qr-6d2d939d-1e

The only entries are for DHCP (.2, .3, .4) and the SNAT (.138).
My instance IP on NetworkB is 172.19.19.56.
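
The check described here (which expected addresses have no permanent entry) can be sketched as a small script. This is only an illustration under stated assumptions: `permanent_ips` and `missing_permanent` are hypothetical helper names, and the list of expected IPs is supplied by hand rather than queried from Neutron.

```python
import re


def permanent_ips(arp_output: str) -> set[str]:
    """Collect the IPs that carry the PERM flag in `arp -a` output."""
    ips = set()
    for line in arp_output.splitlines():
        m = re.search(r"\(([\d.]+)\)", line)
        if m and "PERM" in line.split():
            ips.add(m.group(1))
    return ips


def missing_permanent(arp_output: str, expected_ips: list[str]) -> list[str]:
    """Return the expected IPs that have no permanent entry, in input order."""
    present = permanent_ips(arp_output)
    return [ip for ip in expected_ips if ip not in present]


# The NetworkB listing from the report.
arp_19 = """\
? (172.19.19.3) at fa:16:3e:b4:16:3e [ether] PERM on qr-6d2d939d-1e
? (172.19.19.138) at fa:16:3e:fa:f7:f1 [ether] PERM on qr-6d2d939d-1e
? (172.19.19.4) at fa:16:3e:0c:84:53 [ether] PERM on qr-6d2d939d-1e
? (172.19.19.2) at fa:16:3e:e4:44:e3 [ether] PERM on qr-6d2d939d-1e
"""

# Expected addresses: one DHCP port, the SNAT port, and the new instance.
missing = missing_permanent(arp_19, ["172.19.19.2", "172.19.19.138", "172.19.19.56"])
print(missing)  # ['172.19.19.56']
```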

Then I added a new instance but in NetworkA. The instance has IP 172.18.18.230.
This time no permanent ARP entry is added! The original instances' ARP entries exist, but there is none for the new instance.

So now, if I add any new instances on either NetworkA or NetworkB, no new permanent ARP entry is added to the DVR qrouter. It is the same on all computes on which this qrouter exists.
So it seems that as soon as there are instances that exist on both networks connected to the same router, permanent ARP entries cease to be created.

I don't believe this is normal, and it is affecting communication between both networks via the router. Can someone confirm this issue?

tags: added: l3-dvr-backlog
LIU Yulong (dragon889) wrote :

Since there has been no response in this bug, and according to comment [1], we are marking this as a duplicate of bug #1913646.
IMO, both bugs have been fixed by patch [2].

[1] https://bugs.launchpad.net/neutron/+bug/1913646/comments/6
[2] https://review.opendev.org/c/openstack/neutron/+/773597

Alexandre Perreault (alexperreault) wrote :

Hi,

I don't agree that it is a duplicate. What I am saying in this bug is that permanent ARP entries are not created by neutron in my qrouter.
To my knowledge they should be.

In my above example I am showing that there is no permanent ARP entry for my new instance with IP 172.19.19.56.

The issue seems to show up once you have an instance in each subnet. When you add a second or third instance in either subnet, no permanent ARP entries are created by the L3 agent inside my router.

LIU Yulong (dragon889) wrote :

Yes, I agree with you. The missing permanent ARP entries may be related to the DVR code.

Edward Hope-Morley (hopem) wrote :

I am hitting a related issue that might be helpful for you to know about.

I do:
  * ussuri with dvr_snat
  * create port P1 with address A1 and create vm on node C1 with this port
  * associate floating ip with P1 and ping it
  * observe REACHABLE arp entry for A1 in qrouter arp cache
  * so far so good
  * restart the neutron-l3-agent
  * observe REACHABLE arp entry for A1 is now PERMANENT
  * delete vm and port
  * create port P2 with address A1 and create vm on node C1 with this port
  * vm is unreachable since arp cache contains wrong mac address of old port P1

So for me the issue is that if you restart the l3-agent it *does* set arp entries to PERMANENT for bound ports, but it never deletes them, so if you reuse ip addresses with new ports you will eventually not be able to reach them.
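
The stale-entry condition in the steps above can be expressed as a comparison between what the router has cached and what the current ports say. The sketch below is illustrative only: `stale_entries` is a made-up helper, the address standing in for A1 and both MAC addresses are hypothetical, and in practice the two mappings would come from the qrouter neighbour table and from the Neutron port list.

```python
def stale_entries(
    cached: dict[str, str], ports: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return {ip: (cached_mac, port_mac)} where the cache disagrees with the current port."""
    return {
        ip: (cached[ip], mac)
        for ip, mac in ports.items()
        if ip in cached and cached[ip].lower() != mac.lower()
    }


# Hypothetical values: A1 was first used by port P1, then reused by port P2.
cached_neighbours = {"10.0.0.5": "fa:16:3e:aa:aa:aa"}  # PERMANENT entry left from P1
current_ports = {"10.0.0.5": "fa:16:3e:bb:bb:bb"}      # MAC of the new port P2

stale = stale_entries(cached_neighbours, current_ports)
print(stale)
```

A non-empty result means the router would send traffic for that IP to a MAC address that no longer exists, which matches the unreachable VM in the last step.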

Edward Hope-Morley (hopem) wrote :

Alexandre Perreault (alexperreault) wrote :

Hi Edward, thanks for the added comment.
I have not tried your exact workflow of creating a port and then assigning it to a VM.

I notice the lack of permanent ARP entries when there are two networks connected to one router. The initial instances, or ones that existed before, have permanent ARP entries in the router, but any new instances I create do not.

I did notice that after I restart the neutron services, the missing permanent ARP entries are created on the router. But again, any new instance created after does not get a permanent ARP in the router.

I had this happen to a client this week. He has three networks (netA, netB & netC) connected to one router, and all instances from all three networks can communicate with each other. This week the client created a new instance on netA, and communication fails between that new instance and anything on netB and netC. This is because no permanent ARP entry was added to the router during instance creation. All previous instances can still communicate because the permanent ARP entries exist for them.
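
For diagnosis, a missing entry can also be added by hand with the iproute2 `ip neigh replace ... nud permanent` command inside the router namespace. The sketch below only builds and prints that command. It is a stopgap under stated assumptions, not the fix from the linked patch: the helper name `neigh_replace_cmd` is made up, the namespace and `qr-` device names are copied from the listings in the bug description, the MAC address is a placeholder, and the L3 agent may overwrite or ignore manually added entries.

```python
import ipaddress
import shlex

# Namespace and qr- devices taken from the listings in the bug description.
ROUTER_NS = "qrouter-3fe791ef-8432-41c3-a4ac-28ae741b533f"
QR_DEVICES = {
    "172.18.18.0/24": "qr-e68fe2ed-2a",
    "172.19.19.0/24": "qr-6d2d939d-1e",
}


def neigh_replace_cmd(ip: str, mac: str) -> list[str]:
    """Build the command that sets a permanent neighbour entry on the matching qr- device."""
    addr = ipaddress.ip_address(ip)
    for cidr, dev in QR_DEVICES.items():
        if addr in ipaddress.ip_network(cidr):
            return [
                "ip", "netns", "exec", ROUTER_NS,
                "ip", "neigh", "replace", ip, "lladdr", mac,
                "dev", dev, "nud", "permanent",
            ]
    raise ValueError(f"{ip} is not on a subnet attached to the router")


# The MAC below is a placeholder; the real value is the MAC of the instance's port.
cmd = neigh_replace_cmd("172.19.19.56", "fa:16:3e:00:00:01")
print(shlex.join(cmd))
```

The command is printed rather than executed so it can be reviewed first; running it requires root on the compute node that hosts the qrouter namespace.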

Changed in neutron:
status: New → Fix Released