ovs flooding packets, not learning MAC addresses

Bug #1825147 reported by Junien F
This bug affects 2 people
Affects           Status     Importance  Assigned to  Milestone
neutron           New        Undecided   Unassigned
neutron (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Hi,

I am running OpenStack Rocky on Ubuntu 18.04 with dvr_snat and L3HA, using the openvswitch firewall driver. The openvswitch version is 2.10.0-0ubuntu2~cloud0, and the cloud was deployed with juju.

I was load testing by creating a large number of instances, and noticed that the network throughput available to instances dropped dramatically as I created more VMs. With 2 VMs on my cloud I had pretty good bandwidth, but with 100 (idle) VMs, throughput became extremely low.

Investigating the problem, I noticed that ovs was flooding traffic: all instances on a hypervisor were receiving all the traffic destined for any VM on another hypervisor.

For example, with vmA1 and vmA2 on hypervisor A and vmB1 on hypervisor B, TCP traffic between vmA1 and vmB1 could be seen on vmA2.

Digging more into this, I think I located the problem in the ovs MAC learning process, more specifically on br-int (using "sudo ovs-appctl fdb/show br-int").
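For anyone who wants to check this on their own hypervisor, here is a minimal Python sketch that parses the output of "ovs-appctl fdb/show br-int" and reports whether a given MAC has been learned. It assumes the usual four-column layout (port, VLAN, MAC, Age); the helper names and the sample MAC addresses and port numbers are made up for illustration and are not taken from my deployment.

```python
from typing import Dict, NamedTuple


class FdbEntry(NamedTuple):
    port: str  # port as printed by OVS (kept as text; may not be numeric)
    vlan: int
    age: int   # seconds since the entry was last refreshed


def parse_fdb(text: str) -> Dict[str, FdbEntry]:
    """Parse `ovs-appctl fdb/show <bridge>` output into {mac: FdbEntry}."""
    entries: Dict[str, FdbEntry] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 4-column entry.
        if len(fields) != 4 or fields[0].lower() == "port":
            continue
        port, vlan, mac, age = fields
        try:
            entries[mac.lower()] = FdbEntry(port, int(vlan), int(age))
        except ValueError:
            continue  # unexpected row format, ignore it
    return entries


def is_learned(text: str, mac: str) -> bool:
    """True if `mac` appears in the forwarding database dump."""
    return mac.lower() in parse_fdb(text)


# Illustrative output only: MAC addresses and port numbers are invented.
SAMPLE = """\
 port  VLAN  MAC                Age
    5     1  fa:16:3e:11:22:33    2
    7     1  fa:16:3e:44:55:66   41
"""

if __name__ == "__main__":
    print(is_learned(SAMPLE, "fa:16:3e:11:22:33"))
    print(is_learned(SAMPLE, "fa:16:3e:99:99:99"))
```

In practice the text would come from running the command on the hypervisor (for example through subprocess) and the MAC to look up would be that of the remote VM.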

On hypervisor A, traffic from vmA1 to vmB1 takes this path: tap (on br-int), patch-tun (on br-int), patch-int (on br-tun), vxlan to hypervisor B.

So whenever traffic comes back (the other way around), the MAC address of vmB1 should be learned on br-int, on the patch-tun port, and that is not the case. As a result, whenever vmA1 sends traffic to vmB1 and it reaches the "NORMAL" action, the destination MAC has not been learned and the traffic is flooded: see the ofproto/trace output at https://pastebin.ubuntu.com/p/mbrrj4wPxY/ (look for "no learned MAC for destination, flooding").
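To make the reasoning concrete, here is a toy model of a MAC-learning bridge in Python. It is a deliberately simplified sketch and not how ovs implements the "NORMAL" action: the class, the port names and the `learn` flag are all hypothetical, and `learn=False` merely stands in for whatever prevents the return traffic from being learned in my setup.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy L2 switch: learn source MACs, flood unknown destinations."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = list(ports)
        self.fdb: Dict[str, str] = {}  # MAC -> port it was last seen on

    def receive(self, in_port: str, src: str, dst: str,
                learn: bool = True) -> List[str]:
        """Return the list of ports the frame is sent out of."""
        if learn:
            self.fdb[src] = in_port
        out = self.fdb.get(dst)
        if out is None:
            # Unknown destination: flood to every port except the ingress one.
            return [p for p in self.ports if p != in_port]
        # Known destination: forward to that port only (drop if same port).
        return [] if out == in_port else [out]


if __name__ == "__main__":
    sw = LearningSwitch(["tap-vmA1", "tap-vmA2", "patch-tun"])
    # vmA1 -> vmB1: destination unknown, so vmA2 also receives the frame.
    print(sw.receive("tap-vmA1", "mac-vmA1", "mac-vmB1"))
    # Reply arrives on patch-tun but is not learned (the behaviour reported).
    print(sw.receive("patch-tun", "mac-vmB1", "mac-vmA1", learn=False))
    # vmA1 -> vmB1 again: still flooded.
    print(sw.receive("tap-vmA1", "mac-vmA1", "mac-vmB1"))
    # If the reply is learned, later frames go to patch-tun only.
    sw.receive("patch-tun", "mac-vmB1", "mac-vmA1")
    print(sw.receive("tap-vmA1", "mac-vmA1", "mac-vmB1"))
```

The point of the model is only that a single learned reply on patch-tun is enough to stop vmA2 from seeing the traffic, which is why the missing learning step matters.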

Looking further, it would appear that ovs learns a MAC address only from broadcast ARP requests, and not from ARP requests sent to a unicast MAC address (which is what Linux uses after a successful broadcast ARP request): https://pastebin.ubuntu.com/p/Sfq775cX6V/.

Once the MAC is learned, there's no more flooding: https://pastebin.ubuntu.com/p/bBNHrRKndg/ (see "forwarding to learned port" instead of "no learned MAC for destination, flooding").
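The two trace messages quoted above can be used to tell the cases apart when checking many flows. A minimal Python sketch, assuming the ofproto/trace output has already been captured as text; the function name and return values are hypothetical, and only the two message strings come from the traces linked in this report.

```python
FLOOD_MARKER = "no learned MAC for destination, flooding"
LEARNED_MARKER = "forwarding to learned port"


def classify_trace(trace_text: str) -> str:
    """Classify captured ofproto/trace output by the NORMAL action's message."""
    if FLOOD_MARKER in trace_text:
        return "flooding"
    if LEARNED_MARKER in trace_text:
        return "learned"
    # Neither message present, e.g. the flow never reached NORMAL.
    return "unknown"


if __name__ == "__main__":
    print(classify_trace("... " + FLOOD_MARKER + " ..."))
    print(classify_trace("... " + LEARNED_MARKER + " ..."))
```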

Flooding has security consequences (VMs can see traffic not destined for them, although only traffic for VMs in the same neutron network) as well as performance consequences, so it should be avoided.

Thanks

Junien F (axino) wrote :

An additional data point: MAC learning appears to work fine for subnets not attached to a router. As soon as I attach the subnet to a router, the bad behaviour starts.

Hongbin Lu (hongbin.lu) wrote :

This sounds like the same issue as https://bugs.launchpad.net/neutron/+bug/1732067. Could you confirm? If so, I will mark this one as a duplicate.

Junien F (axino) wrote :

Hi, it is indeed a duplicate. Marking it as such, thanks!

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in neutron (Ubuntu):
status: New → Confirmed