[DVR] ARP chaos in mixed cloud scenarios

Bug #1791268 reported by LIU Yulong
This bug affects 1 person
Affects    Status    Importance    Assigned to    Milestone
neutron    New       Undecided     Unassigned

Bug Description

Suppose the tenant network type is VLAN, and we have a neutron network whose VLAN ID is 1000 (CIDR: 192.168.111.0/24, gateway IP: 192.168.111.1).
We also have a physical switch (SWITCH-1), which connects compute nodes NODE-1 and NODE-2.
On these compute nodes, the L3 agent `agent_mode` is set to `dvr_no_external`.
There is also a network node, NODE-3, whose L3 agent `agent_mode` is `dvr_snat`.

We have one bare metal machine, NODE-4, whose switch port is configured with VLAN ID 1000.
Assume vm-1 runs on NODE-1 and vm-2 runs on NODE-2; the qrouter namespace is then created on both hosts.
For SNAT traffic, the qrouter namespace is also created on network node NODE-3.

The VMs and the bare metal machine can then reach each other.

However, the internal gateway ARP behaves strangely when the bare metal machine sends an ARP request for the MAC of the internal gateway IP (192.168.111.1).
It gets three ARP responses, from compute NODE-1, compute NODE-2, and network NODE-3, because each of them has a qrouter namespace with the same qr- device, the same IP (192.168.111.1), and the same MAC.

But from the point of view of the physical switch (SWITCH-1), the ARP responses are not identical.
NODE-1 sends its response with its own dvr_host MAC as the source MAC, while the ARP payload carries the correct MAC for 192.168.111.1.
NODE-2 and NODE-3 behave the same way.

This may cause the physical switch to flood the ARP request again and again, since it does not know on which physical port (maybe, in which FDB entry) the 192.168.111.1 MAC is located.
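
The mismatch described above can be illustrated with a small model of a learning switch. This is only a sketch under assumptions: the MAC addresses and port names are hypothetical, and the `LearningSwitch` class is a simplification that learns from the Ethernet source MAC only, which is how a typical L2 switch fills its FDB. It is not taken from neutron or from any real switch.

```python
from dataclasses import dataclass

GATEWAY_IP = "192.168.111.1"
GATEWAY_MAC = "fa:16:3e:00:00:01"  # hypothetical qr- device MAC

# Hypothetical switch ports and per-host dvr_host MACs.
DVR_HOST_MACS = {
    "port-node1": "fa:16:3f:00:00:01",
    "port-node2": "fa:16:3f:00:00:02",
    "port-node3": "fa:16:3f:00:00:03",
}


@dataclass(frozen=True)
class ArpReply:
    eth_src: str  # source MAC in the Ethernet header (what the switch sees)
    arp_sha: str  # sender hardware address in the ARP payload
    arp_spa: str  # sender protocol address in the ARP payload


class LearningSwitch:
    """Simplified L2 switch that learns MAC-to-port from Ethernet source MACs."""

    def __init__(self):
        self.fdb = {}

    def receive(self, port, eth_src):
        self.fdb[eth_src] = port

    def lookup(self, dst_mac):
        # None means "unknown unicast": the frame is flooded to all ports.
        return self.fdb.get(dst_mac)


def simulate():
    switch = LearningSwitch()
    replies = []
    for port, host_mac in DVR_HOST_MACS.items():
        # Each node answers with its own dvr_host MAC in the Ethernet
        # header, but with the shared gateway MAC in the ARP payload.
        reply = ArpReply(eth_src=host_mac, arp_sha=GATEWAY_MAC, arp_spa=GATEWAY_IP)
        switch.receive(port, reply.eth_src)
        replies.append(reply)
    # The bare metal host caches the MAC from the ARP payload.
    cached_mac = replies[-1].arp_sha
    return switch, cached_mac


switch, cached_mac = simulate()
print("host caches gateway MAC:", cached_mac)
print("switch FDB port for that MAC:", switch.lookup(cached_mac))
```

In this model the bare metal host caches the gateway MAC, while the switch has only learned the three dvr_host MACs, so frames addressed to the gateway MAC have no FDB entry and are treated as unknown unicast.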

So this bug is an attempt to find a solution for running DVR together with bare metal (ironic): can they work together today?

LIU Yulong (dragon889)
description: updated
LIU Yulong (dragon889) wrote :

I think a VM can resolve the internal gateway IP locally, since its ARP request never goes out to the physical network. But in this scenario, the ARP request comes from outside. So a potential solution may be:
add some flows or rules so that only the SNAT node processes ARP requests for the internal gateway IP.

Thanks
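
The rule proposed in this comment could be expressed roughly as the following decision function. This is a hypothetical sketch, not existing neutron code: the function name and the `from_local_vm` flag are assumptions made for illustration, and a real fix would most likely be implemented as OpenFlow flows or similar rules rather than in Python.

```python
def should_answer_gateway_arp(agent_mode, target_ip, gateway_ip, from_local_vm):
    """Decide whether this node should answer an ARP request.

    Hypothetical helper: agent_mode is the L3 agent mode of the node,
    from_local_vm says whether the request came from a VM on this host.
    """
    if target_ip != gateway_ip:
        # Not a request for the internal gateway; outside this rule's scope.
        return True
    if from_local_vm:
        # Local VMs keep resolving the gateway on their own host.
        return True
    # Requests arriving from the physical network are answered
    # only by the SNAT node.
    return agent_mode == "dvr_snat"


gw = "192.168.111.1"
print(should_answer_gateway_arp("dvr_no_external", gw, gw, from_local_vm=False))
print(should_answer_gateway_arp("dvr_snat", gw, gw, from_local_vm=False))
```

With such a rule, the bare metal machine would receive a single response for 192.168.111.1 instead of one per node hosting the qrouter namespace.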

tags: added: l3-dvr-backlog