Boot of multiple nodes causes ARP storm

Bug #1720766 reported by Jakub Libosvar on 2017-10-02
Assigned to: Jakub Libosvar

Bug Description

The fix for bug 1712517 introduced a regression when multiple nodes are booted at once. When there are two bridges to physical networks, both get a patch port to br-int. Because that fix makes bridges with a fail mode defined behave as learning switches, L2 broadcast packets (e.g. ARP requests) are passed from one bridge to the other via the br-int patch ports, for example:

physical network 1 -> br-ex -> patch_port -> br-int -> patch_port -> br-isolated -> physical network 2 -> br-isolated (different node) -> patch_port -> br-int -> patch_port -> br-ex -> physical network 1
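
To make the loop in the path above easier to see, here is a minimal sketch that models the topology as a graph and checks whether a flooded broadcast frame can come back to the network it started on. The node names (node1, node2) and the function names are made up for illustration, and each physical network is simplified to a single graph element. MAC learning is ignored because broadcast frames are flooded regardless.

```python
from collections import deque

# Hypothetical two-node topology matching the path described above.
LINKS = [
    ("physnet1", "node1/br-ex"),
    ("node1/br-ex", "node1/br-int"),        # patch port
    ("node1/br-int", "node1/br-isolated"),  # patch port
    ("node1/br-isolated", "physnet2"),
    ("physnet2", "node2/br-isolated"),
    ("node2/br-isolated", "node2/br-int"),  # patch port
    ("node2/br-int", "node2/br-ex"),        # patch port
    ("node2/br-ex", "physnet1"),
]


def neighbours(links):
    """Build an undirected adjacency table from a list of links."""
    table = {}
    for a, b in links:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


def broadcast_returns(links, origin):
    """Return True if a frame flooded from origin can arrive back at origin."""
    table = neighbours(links)
    # Each entry is (element holding the frame, element it arrived from).
    queue = deque((n, origin) for n in table.get(origin, ()))
    seen = set(queue)
    while queue:
        here, came_from = queue.popleft()
        for nxt in table[here]:
            if nxt == came_from:
                continue  # a switch does not flood back out the ingress port
            if nxt == origin:
                return True
            if (nxt, here) not in seen:
                seen.add((nxt, here))
                queue.append((nxt, here))
    return False
```

Calling `broadcast_returns(LINKS, "physnet1")` on the full topology reports a loop, while dropping the links that touch a br-int (i.e. removing the patch ports) breaks it.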

We can destroy the patch ports between br-int and provider bridges on boot.

Changed in os-net-config:
assignee: nobody → Jakub Libosvar (libosvar)

Fix proposed to branch: master

Changed in os-net-config:
status: New → In Progress
Dan Sneddon (dsneddon) wrote :

My concept for a solution to this issue is to destroy the patch ports between br-int and the various bridges at boot time only. That way we won't have a disruption when the network is restarted or an ifdown/ifup is done for some reason. Ideally we would only remove the patch ports from br-int to the bridges actually used by Neutron.

I'm thinking we could use a systemd service file to do this. The workflow would be:

1) Determine which bridges are used by Neutron (Scan /etc/neutron.conf? Some other method?)

2) For each bridge used by Neutron, remove the patch between br-int and the bridge
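
The two steps above could be sketched roughly as follows. This is only an illustration, not the cleanup script that was eventually shipped. It assumes the Neutron bridges are available as a bridge_mappings style string ("physnet:bridge,..."), that patch port and interface names match, and that ovs-vsctl is on the path. All function names are hypothetical.

```python
import subprocess


def parse_bridge_mappings(value):
    """Return the bridge names from a 'physnet:bridge,...' string (assumed format)."""
    bridges = []
    for item in value.split(","):
        _physnet, sep, bridge = item.strip().partition(":")
        if sep and bridge.strip():
            bridges.append(bridge.strip())
    return bridges


def ovs_vsctl(*args):
    """Run ovs-vsctl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["ovs-vsctl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def patch_ports(bridge, run=ovs_vsctl):
    """Return (port, peer) pairs for every patch port on the bridge."""
    found = []
    for port in run("list-ports", bridge).splitlines():
        # ovs-vsctl may print string values with or without quotes.
        if run("get", "Interface", port, "type").strip('"') != "patch":
            continue
        peer = run("get", "Interface", port, "options:peer").strip('"')
        found.append((port, peer))
    return found


def remove_patches(int_bridge, neutron_bridges, run=ovs_vsctl):
    """Delete patch ports linking int_bridge to any of neutron_bridges."""
    wanted = set(neutron_bridges)
    removed = []
    for port, peer in patch_ports(int_bridge, run):
        peer_bridge = run("port-to-br", peer)
        if peer_bridge in wanted:
            # Remove both ends so no half-configured patch is left behind.
            run("--if-exists", "del-port", int_bridge, port)
            run("--if-exists", "del-port", peer_bridge, peer)
            removed.append((port, peer))
    return removed
```

A boot-time unit could then call something like `remove_patches("br-int", parse_bridge_mappings(mappings))`, where `mappings` is read from wherever the deployment stores it. Patch ports to bridges that are not in the list are left alone.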

Change abandoned by Jakub Libosvar (<email address hidden>) on branch: master
Reason: We'll go with cleanup script executed during boot time

Jakub Libosvar (libosvar) wrote :

This bug will be fixed at the packaging level in the RDO project. Relevant patch is here:

Jakub Libosvar (libosvar) wrote :

I think this bug can be closed as "won't fix"

The fix will land in neutron dist-git for RDO. I am not sure whether we should just close the bug as won't fix or assume it belongs to tripleo. I'll leave it up to the tripleo team to decide.

affects: os-net-config → tripleo
Changed in tripleo:
importance: Undecided → High
milestone: none → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
status: In Progress → Fix Released