Duplicate entries in FDB table

Bug #1531013 reported by James Denton
This bug affects 5 people
Affects: neutron
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

Posting here because I'm not sure of a better place at the moment.

Environment: Juno
OS: Ubuntu 14.04 LTS
Plugin: ML2/LinuxBridge

root@infra01_neutron_agents_container-4c850328:~# bridge -V
bridge utility, 0.0
root@infra01_neutron_agents_container-4c850328:~# ip -V
ip utility, iproute2-ss131122
root@infra01_neutron_agents_container-4c850328:~# uname -a
Linux infra01_neutron_agents_container-4c850328 3.13.0-46-generic #79-Ubuntu SMP Tue Mar 10 20:06:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

We recently discovered that across the environment (5 controllers, 50+ computes) there are (tens of) thousands of duplicate entries in the FDB table, but only for the 00:00:00:00:00:00 flooding entries. This is in an environment of ~1,600 instances, ~4,100 ports, and 80 networks.
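For anyone checking their own environment, a one-liner that summarizes the duplication (a sketch only, combining the same bridge/sort/uniq commands used elsewhere in this report):

# Count duplicate flooding entries, highest counts first
bridge fdb show | grep '^00:00:00:00:00:00' | sort | uniq -c | sort -nr | head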

In this example, the number of duplicate FDB entries for this particular VTEP fluctuates wildly between successive runs:

root@infra01_neutron_agents_container-4c850328:~# bridge fdb show | grep "00:00:00:00:00:00 dev vxlan-10 dst 172.29.243.157" | wc -l
1429
root@infra01_neutron_agents_container-4c850328:~# bridge fdb show | grep "00:00:00:00:00:00 dev vxlan-10 dst 172.29.243.157" | wc -l
81057
root@infra01_neutron_agents_container-4c850328:~# bridge fdb show | grep "00:00:00:00:00:00 dev vxlan-10 dst 172.29.243.157" | wc -l
25806
root@infra01_neutron_agents_container-4c850328:~# bridge fdb show | grep "00:00:00:00:00:00 dev vxlan-10 dst 172.29.243.157" | wc -l
473141
root@infra01_neutron_agents_container-4c850328:~# bridge fdb show | grep "00:00:00:00:00:00 dev vxlan-10 dst 172.29.243.157" | wc -l
225472

That behavior can be observed for all other VTEPs. We're seeing over 13 million total FDB entries on this node:

root@infra01_neutron_agents_container-4c850328:~# bridge fdb show >> james_fdb2.txt
root@infra01_neutron_agents_container-4c850328:~# cat james_fdb2.txt | wc -l
13554258

We're also seeing the wildly fluctuating counts on compute nodes. Each of these commands was run within one second of the previous one completing:

root@compute032:~# bridge fdb show | wc -l
898981
root@compute032:~# bridge fdb show | wc -l
734916
root@compute032:~# bridge fdb show | wc -l
1483081
root@compute032:~# bridge fdb show | wc -l
508811
root@compute032:~# bridge fdb show | wc -l
2349221

On this node, you can see over 28,000 duplicates for each of the entries:

root@compute032:~# bridge fdb show | sort | uniq -c | sort -nr
  28871 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.39 self permanent
  28871 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.38 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.243.252 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.243.157 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.243.133 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.242.66 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.242.193 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.60 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.59 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.58 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.57 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.55 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.54 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.53 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.51 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.50 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.49 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.48 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.47 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.46 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.45 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.44 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.43 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.42 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.40 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.37 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.36 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.35 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.34 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.33 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.32 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.31 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.30 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.29 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.28 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.27 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.26 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.25 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.24 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.23 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.22 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.21 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.137 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.132 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.131 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.130 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.129 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.128 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.127 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.107 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.106 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.105 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.104 self permanent
  28870 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.103 self permanent
  28869 00:00:00:00:00:00 dev vxlan-15 dst 172.29.240.136 self permanent

Entries for the other VXLAN networks on this node have 2 duplicates per VTEP, but it varies wildly across the environment.

Using the 'bridge monitor fdb' command, I am unable to see this behavior in action. Nor is there anything wild in the syslog other than messages like this:

2016-01-04T22:52:02.040435+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343454.013037] vxlan: non-ECT from 172.29.240.39 with TOS=0x2
2016-01-04T22:52:12.120434+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343464.105158] vxlan: non-ECT from 172.29.240.126 with TOS=0x2
2016-01-04T22:52:12.200251+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343464.185067] vxlan: non-ECT from 172.29.240.104 with TOS=0x2
2016-01-04T22:52:32.295703+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343484.298660] net_ratelimit: 689 callbacks suppressed
2016-01-04T22:52:32.355418+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343484.359395] vxlan: non-ECT from 172.29.240.133 with TOS=0x2
2016-01-04T22:52:37.352455+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343489.358137] vxlan: non-ECT from 172.29.240.60 with TOS=0x2
2016-01-04T22:52:37.494525+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343489.503365] vxlan: non-ECT from 172.29.240.125 with TOS=0x2
2016-01-04T22:52:37.526831+00:00 infra01_neutron_agents_container-4c850328 kernel: [25343489.535736] vxlan: non-ECT from 172.29.240.127 with TOS=0x2

If additional info is needed please let me know.

James Denton (james-denton) wrote :

Turned on debug for a brief moment and captured what I could. Last count was > 12 million FDB entries. Linked here is a gist with a subset of the log that I could capture:

https://gist.github.com/busterswt/f80db135400623d92919

Hope it helps. Please let me know if you need any other info.

Assaf Muller (amuller)
tags: added: linuxbridge
Sean M. Collins (scollins) wrote :

Going to mark this as Confirmed since the reporter has given us pretty detailed dumps.

Changed in neutron:
status: New → Confirmed
tags: added: needs-attention
Kevin Benton (kevinbenton) wrote :

@James. Sorry this has been sitting so long. Is this consistently reproducible on Juno? If so, what can we do to trigger the behavior?

Changed in neutron:
status: Confirmed → Incomplete
James Denton (james-denton) wrote :

Hi Kevin,

The issue exists across all nodes in the environment, and continues to be an issue even after rebooting a node. I have not yet come across another environment with similar behavior.

The duplicates appear to be limited to the flooding entries (00:00:00:00:00:00) only and we are unable to replicate manually. The counts fluctuate so much that any attempt to 'append' a duplicate flooding entry (same MAC, interface, and VTEP addr) would go undetected.

In my testing, I have attempted to 'append' duplicate flooding entries on a vxlan interface *not* managed by Neutron, and it resulted in no observable change. Meaning, either the existing entry was overwritten or the command was simply ignored. That seems like expected behavior. I don't really know the conditions that could cause the behavior observed in this bug, and unfortunately can't point you in any direction for duplicating it just yet. I will continue to dig into it.
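For illustration, a minimal version of that manual test (a sketch; the interface name, VNI, port, and VTEP address here are made up rather than taken from this environment):

# Create a vxlan interface not managed by Neutron
ip link add vxlan-test type vxlan id 999 dstport 4789
# Append the same flooding entry twice; the second append produced no observable change
bridge fdb append 00:00:00:00:00:00 dev vxlan-test dst 192.0.2.10 self permanent
bridge fdb append 00:00:00:00:00:00 dev vxlan-test dst 192.0.2.10 self permanent
bridge fdb show | grep vxlan-test
# Clean up afterwards
ip link del vxlan-test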

Sangeetha Srikanth (ssrikant) wrote :

Any updates to this defect?
I see similar symptoms in my deployment, which is Kilo-based.
The bridge FDB table has several entries for the same host (00:00:00:00:00:00 MAC address),
and the bridge process occupies close to 90% of system resources when the linuxbridge agent tries to dump the FDB table.

James Denton (james-denton) wrote :

Kevin,

I can replicate this issue to a certain extent by firing up hundreds of instances in dozens of networks across multiple (dozens/hundreds) of compute nodes. In most cases, I see duplicates of the flooding entries (about 2 per network per vtep). In the environment this bug was opened for, I see thousands of duplicates. That number fluctuates wildly though. Last test, there were ~15 million total FDB entries on a single network node (mainly duplicates).

I did try appending to the FDB table manually using an existing vxlan interface and some bogus VTEPs using the following:

for i in {1..10}; do for x in {1..20}; do bridge fdb append 00:00:00:00:00:00 dev vxlan-36 dst 172.29.$i.$x; done; done

That should create about 200 entries. At some point in that loop, the 'bridge fdb show' output would just begin looping forever. YMMV. This may not be a valid test, but it does cause some wonky behavior regardless, and it has been verified on CentOS and Ubuntu. This 'test' is probably more reproducible by bumping the numbers up from 200 to, say, 2,000.
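A scaled-up variant along those lines (a sketch; same bogus VTEP pattern, roughly 2,000 appends instead of 200):

for i in {1..10}; do for x in {1..200}; do bridge fdb append 00:00:00:00:00:00 dev vxlan-36 dst 172.29.$i.$x; done; done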

Sean M. Collins (scollins) wrote :

I'm setting this as "Triaged" for the time being since I don't want this to expire and the reporter has been responsive. My suspicion is that a similar bug is lurking in the OVS agent, since we see something similar in https://bugs.launchpad.net/neutron/+bug/1532338, and the cause may be related.

Changed in neutron:
status: Incomplete → Triaged
Changed in neutron:
assignee: nobody → John Perkins (john-d-perkins)
Darragh O'Reilly (darragh-oreilly) wrote :

It would be interesting to see if a quick patch to check and skip appending if the entry already exists makes any difference.
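In shell terms the idea would be roughly the following (a sketch only; the actual patch would live in the linuxbridge agent's Python code, and the device and VTEP values here are illustrative):

dev=vxlan-10; vtep=172.29.243.157
# Only append the flooding entry if it is not already present
if ! bridge fdb show | grep -q "00:00:00:00:00:00 dev $dev dst $vtep"; then
    bridge fdb append 00:00:00:00:00:00 dev "$dev" dst "$vtep" self permanent
fi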

Changed in neutron:
assignee: John Perkins (john-d-perkins) → nobody
Darragh O'Reilly (darragh-oreilly) wrote :

Attached a patch that checks for and skips appending flooding entries if they already exist. I haven't tried to reproduce the problem.

Dustin Lundquist (dlundquist) wrote :

Ran into this on Kilo w/ 3.13 kernel. So far I've been unable to reproduce this outside of Neutron, and it appears duplicate FDB entries for the same MAC and destination are rejected within the Linux kernel: https://github.com/torvalds/linux/blob/v4.5/drivers/net/vxlan.c#L497.

James Denton (james-denton) wrote :

Thanks, Darragh.

One of my concerns about passing curr_entries for an interface is that there may be hundreds (or thousands) of legitimate entries. I wonder what that could do to processing time. It only appears to be the flooding entries that are duplicated (in most cases). Normal 'append' behavior is to overwrite and not duplicate, which makes me think it is not a Neutron problem but rather a bridge utility issue. Given that it seems fairly easy to overwhelm using the for loop in an earlier comment, it may not be unreasonable to think it's a problem in busy environments.
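A rough way to gauge that dump cost on a busy node (a sketch):

# Time a full FDB dump; with millions of entries this gets expensive
time bridge fdb show > /dev/null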

Dustin Lundquist (dlundquist) wrote :

I've been able to reproduce this outside of Neutron on a single node with the following test script: https://gist.github.com/dlundquist/4691dfb16426e973a89b. It does not appear to occur on the 4.3.0 kernel in Debian jessie-backports.

Daniel (dlevy) wrote :

@Darragh,
I modified your patch a bit so that the function add_fdb_flooding_entry() uses more stringent measures for its comparison, and only adds the entry if both the mac and ip are not found. This fixed the issue for us. Will you be creating a PR for this?

@James
I don't see any increase in processing time; the FDB table is parsed in both cases. You're correct that this is a kernel bug, but it would still be valuable to have this fix in Neutron, since getting it fixed in the kernel will be a lengthy process.

Darragh O'Reilly (darragh-oreilly) wrote :

@Daniel, I'm still trying to get my head around this. You tried my patch but it didn't work without the extra changes you made?

Are you saying what if the flooding entry is absent for a remote_ip, but a unicast entry exists for that ip, then it is necessary to "append" the flooding entry instead of "add"?

Daniel (dlevy) wrote :

@Darragh
The patch as it was did not work.
I believe it had to do with the way you were comparing the variable 'entry'. You defined it like this: '%s dst %s self permanent', whereas entries can take a different form (ours look like this: '%s dev vxlan-2 dst %s self permanent'), so I parsed it to extract the comparison parameters (MAC and IP) and just compared those.

I was actually thinking that if there is not an exact match (on both MAC and IP), the entry should be added. I just added line 621 to align with your code and the original code, but I'm not sure it's needed.
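For what it's worth, extracting just the (MAC, IP) pair from an entry line, regardless of which form the line takes, could look like this in shell terms (a sketch; the actual comparison is done in the agent's Python code):

line='00:00:00:00:00:00 dev vxlan-2 dst 172.29.243.157 self permanent'
mac=$(awk '{print $1}' <<< "$line")
ip=$(awk '{for (i = 1; i < NF; i++) if ($i == "dst") print $(i + 1)}' <<< "$line")
echo "$mac $ip"    # prints: 00:00:00:00:00:00 172.29.243.157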

Darragh O'Reilly (darragh-oreilly) wrote :

Thanks Daniel, I understand now. What distro/kernel/iproute2 versions are you using?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/297296

Changed in neutron:
assignee: nobody → Darragh O'Reilly (darragh-oreilly)
status: Triaged → In Progress
Daniel (dlevy) wrote :

I found a way to cause this issue while adding only UNIQUE entries to the FDB table.
Running a script like the following, with a file listing at least 100 unique IPs, will cause the FDB table to grow by 10^6 entries instead of just 100.

PRIVATE_IPS=$(cat private_ip_list.txt)
for IP in $PRIVATE_IPS
do
    echo "Adding FDB entry for IP: $IP"
    bridge fdb append 00:00:00:00:00:00 dev {{VXLAN}} dst $IP self permanent
done

Linux version 3.13.0-74-generic (buildd@lcy01-07) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #118-Ubuntu SMP
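To watch the blow-up while the script above runs, the counts can be checked with the same commands used earlier in this thread:

# Total entries, then the most duplicated ones
bridge fdb show | wc -l
bridge fdb show | sort | uniq -c | sort -nr | head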

Daniel (dlevy) wrote :

I wrote a script that can install a virtual machine using vagrant and reproduce the bug. You can find it here:
https://github.com/dlevy-ibm/fdb_bug

Darragh O'Reilly (darragh-oreilly) wrote :

Thanks Daniel, I tried your script and it reproduced for me. So skipping the addition of entries that already exist is not a workaround. I don't know how Neutron can work around this Linux bug.

Changed in neutron:
assignee: Darragh O'Reilly (darragh-oreilly) → nobody
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Darragh O'Reilly (<email address hidden>) on branch: master
Review: https://review.openstack.org/297296
Reason: This is not a possible workaround. See the bug report.

Changed in neutron:
status: In Progress → Confirmed
Perry (panxia6679) wrote :

Adding more info that I identified:

1) When adding the 53rd entry for multicast ethernet frames, all entries for multicast ethernet frames are doubled. After adding the 55th entry, a huge number of entries is created and counting the lines no longer completes (see the hung 'wc -l' below). This also means we won't hit the problem in environments with 52 or fewer nodes.

2) After applying commit 9063e21fb026c4966fc93261c18322214f9835eb to v3.13, the problem was gone.

root@vmtotest:~/fdb_bug# sudo bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst 123.123.123.50 self permanent
root@vmtotest:~/fdb_bug# bridge fdb show | wc -l
55
root@vmtotest:~/fdb_bug# sudo bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst 123.123.123.51 self permanent
root@vmtotest:~/fdb_bug# bridge fdb show | wc -l
56
root@vmtotest:~/fdb_bug# sudo bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst 123.123.123.52 self permanent
root@vmtotest:~/fdb_bug# bridge fdb show | wc -l
57
root@vmtotest:~/fdb_bug# sudo bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst 123.123.123.53 self permanent
root@vmtotest:~/fdb_bug# bridge fdb show | wc -l
111
root@vmtotest:~/fdb_bug# sudo bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst 123.123.123.54 self permanent
root@vmtotest:~/fdb_bug# bridge fdb show | wc -l
112
root@vmtotest:~/fdb_bug# sudo bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst 123.123.123.55 self permanent
root@vmtotest:~/fdb_bug# bridge fdb show | wc -l #hung
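For reference, the same probe can be scripted (a sketch; it assumes roughly 52 entries already exist, as in the transcript above, and expects the count to double at the 53rd entry and the final count to hang):

for x in $(seq 50 56); do
    bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst 123.123.123.$x self permanent
    echo -n "after .$x: "
    bridge fdb show | wc -l
done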

Dustin Lundquist (dlundquist) wrote :

This has been fixed in the upstream Linux kernel and backported to the Ubuntu Trusty kernel: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1568969

Ihar Hrachyshka (ihar-hrachyshka) wrote :

Since it was fixed in the kernel, I moved the bug to Won't Fix for neutron, because no fix in Neutron is expected.

Changed in neutron:
status: Confirmed → Won't Fix
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/368993
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=411a1265dae4c8e77d84cdd66a6df702a81a34f8
Submitter: Jenkins
Branch: stable/mitaka

commit 411a1265dae4c8e77d84cdd66a6df702a81a34f8
Author: Dustin Lundquist <email address hidden>
Date: Tue Jun 14 13:34:57 2016 -0700

    ml2 lb: do not program arp responder when unused

    When arp_responder is not set, the proxy flag is not set on the VXLAN
    VTEP interface so no ARP/ND responses are sent. In this (default case)
    it is unnecessary to populate the neighbor table on each VxLAN VTEP
    interface.

    Related-Bug: #1531013
    Change-Id: I0fff2228b5b819829edac0bb6597ecb8e5a036ad
    (cherry picked from commit 57848f7ba789fcefe712b1b89026cf0cdaf03436)

tags: added: in-stable-mitaka