Boot of multiple nodes causes ARP storm

Bug #1720766 reported by Jakub Libosvar
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Jakub Libosvar

Bug Description

The fix for https://bugs.launchpad.net/os-net-config/+bug/1712517 introduced a regression when multiple nodes are booted at once. When a node has two bridges to physical networks, both get a patch port to br-int. Because the fix for bug 1712517 makes bridges that have a fail mode defined behave as learning switches, L2 broadcast packets (e.g. ARP requests) are passed from one bridge to the other via the br-int patch ports, and with a second node bridging the same two networks the path closes into a loop, e.g.

physical network 1 -> br-ex -> patch_port -> br-int -> patch_port -> br-isolated -> physical network 2 -> br-isolated (different node) -> patch_port -> br-int -> patch_port -> br-ex -> physical network 1
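
The path above is a cycle in the L2 topology, which is why a single broadcast frame keeps circulating. As a rough illustration only (not part of any fix), the sketch below models bridges and physical networks as an undirected graph and checks for a cycle. The node names, the labels physnet1/physnet2, and the functions `topology` and `has_l2_loop` are hypothetical; the model also assumes every node attaches br-ex to physical network 1 and br-isolated to physical network 2, and that no spanning tree protocol is running.

```python
def topology(nodes, with_patch_ports=True):
    """Build the list of L2 links for the given node names (hypothetical model)."""
    links = []
    for node in nodes:
        # Each provider bridge is attached to its physical network.
        links.append(("physnet1", f"{node}/br-ex"))
        links.append(("physnet2", f"{node}/br-isolated"))
        if with_patch_ports:
            # Patch ports connect both provider bridges to br-int.
            links.append((f"{node}/br-ex", f"{node}/br-int"))
            links.append((f"{node}/br-isolated", f"{node}/br-int"))
    return links


def has_l2_loop(links):
    """Return True if the undirected link list contains a cycle (union-find)."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left == root_right:
            # Both ends are already connected, so this link closes a loop.
            return True
        parent[root_left] = root_right
    return False


if __name__ == "__main__":
    print(has_l2_loop(topology(["node1"])))
    print(has_l2_loop(topology(["node1", "node2"])))
    print(has_l2_loop(topology(["node1", "node2"], with_patch_ports=False)))
```

In this model a single node with patch ports has no cycle, two nodes with patch ports do, and removing the patch ports breaks the cycle again, which is consistent with the regression only showing up when more than one node is up.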

We can destroy the patch ports between br-int and provider bridges on boot.

Changed in os-net-config:
assignee: nobody → Jakub Libosvar (libosvar)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-net-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/508918

Changed in os-net-config:
status: New → In Progress
Dan Sneddon (dsneddon) wrote :

My concept for a solution to this issue is to destroy the patch ports between br-int and the various bridges at boot time only. That way we won't have a disruption when the network is restarted or an ifdown/ifup is done for some reason. Ideally we would only remove the patch ports from br-int to the bridges actually used by Neutron.

I'm thinking we could use a systemd service file to do this. The workflow would be:

1) Determine which bridges are used by Neutron (Scan /etc/neutron.conf? Some other method?)

2) For each bridge used by Neutron, remove the patch between br-int and the bridge
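
A minimal sketch of the two steps above, under several assumptions that are not stated in this report: that the bridges come from a `bridge_mappings` option in an `[ovs]` section of the Open vSwitch agent configuration, and that the patch ports use the `int-<bridge>` / `phy-<bridge>` naming. The function names are hypothetical. The sketch only builds the `ovs-vsctl` command lines and leaves running them (for example from a boot-time service) to the caller.

```python
import configparser


def neutron_bridges(config_text):
    """Step 1: return the bridges named in [ovs] bridge_mappings (assumed location)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    raw = parser.get("ovs", "bridge_mappings", fallback="")
    bridges = []
    for item in raw.split(","):
        # Each entry is expected to look like "physnet:bridge".
        _physnet, _sep, bridge = item.strip().partition(":")
        bridge = bridge.strip()
        if bridge and bridge not in bridges:
            bridges.append(bridge)
    return bridges


def cleanup_commands(bridges, integration_bridge="br-int"):
    """Step 2: build commands that remove both ends of each patch.

    The port names int-<bridge> and phy-<bridge> are an assumption.
    """
    commands = []
    for bridge in bridges:
        commands.append(
            ["ovs-vsctl", "--if-exists", "del-port",
             integration_bridge, f"int-{bridge}"])
        commands.append(
            ["ovs-vsctl", "--if-exists", "del-port",
             bridge, f"phy-{bridge}"])
    return commands


if __name__ == "__main__":
    sample = "[ovs]\nbridge_mappings = datacentre:br-ex,tenant:br-isolated\n"
    for command in cleanup_commands(neutron_bridges(sample)):
        print(" ".join(command))
```

Using `--if-exists` keeps the cleanup harmless when a patch port is already absent, so the same commands can be issued on every boot.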

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-net-config (master)

Change abandoned by Jakub Libosvar (<email address hidden>) on branch: master
Review: https://review.openstack.org/508918
Reason: We'll go with cleanup script executed during boot time

Jakub Libosvar (libosvar) wrote :

This bug will be fixed at the packaging level in the RDO project. The relevant patch is here: https://review.rdoproject.org/r/#/c/10145/

Jakub Libosvar (libosvar) wrote :

I think this bug can be closed as "won't fix"

Ihar Hrachyshka (ihar-hrachyshka) wrote :

The fix will land in neutron dist-git for RDO. I am not sure whether we should just close the bug as won't fix or assume it's a tripleo bug. I leave it up to the tripleo team to decide.

affects: os-net-config → tripleo
Ihar Hrachyshka (ihar-hrachyshka) wrote :
Changed in tripleo:
importance: Undecided → High
milestone: none → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
status: In Progress → Fix Released