explicitly_egress_direct prevents learning of local MACs and causes flooding of ingress packets
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| neutron | Fix Released | High | LIU Yulong | |
Bug Description
We took this bug fix: https:/
The latter is for the iptables-based firewall.
We have VLAN-based networks, and we are seeing ingress packets destined to local MACs being flooded. None of the local MACs are present in the output of ovs-appctl fdb/show br-int.
Consider the following example:
HOST 1:
MAC A = fa:16:3e:c1:01:43
MAC B = fa:16:3e:de:0b:8a
HOST 2:
MAC C = fa:16:3e:d6:3f:31
A is talking to C. Snooping on the qvo interface of B, we see all the traffic destined to MAC A (along with other unicast traffic not destined to or sourced from MAC B). Neither MAC A nor MAC B is present in the br-int FDB, despite heavy traffic being sent.
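To make the symptom concrete, here is a minimal sketch (my own illustration, not OVS code) of NORMAL-style forwarding: when the destination MAC is absent from the FDB, the frame is flooded to every port, which is exactly the unwanted traffic seen on B's qvo interface. The port numbers besides 8313 are hypothetical.

```python
def forward(fdb, dst_mac, ports):
    """NORMAL-style forwarding: unicast if dst is learned, else flood."""
    if dst_mac in fdb:
        return [fdb[dst_mac]]  # forward only to the learned port
    return list(ports)         # unknown unicast -> flood every port

fdb = {}                       # br-int FDB is empty: local MACs were never learned
ports = [8313, 8401, 1]        # qvo of A, qvo of B (hypothetical), patch port

# An ingress frame for MAC A is flooded everywhere, including B's qvo.
print(forward(fdb, "fa:16:3e:c1:01:43", ports))  # -> [8313, 8401, 1]

# Once the MAC is learned, the same frame goes only to A's port.
fdb["fa:16:3e:c1:01:43"] = 8313
print(forward(fdb, "fa:16:3e:c1:01:43", ports))  # -> [8313]
```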
Here is an ofproto trace for such a packet; in_port 8313 is the qvo of MAC A:
sudo ovs-appctl ofproto/trace br-int in_port=
Flow: tcp,in_
bridge("br-int")
----------------
0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
goto_table:25
25. in_port=
goto_table:60
60. in_port=
resubmit(,61)
61. in_port=
push_
set_
output:1
bridge("br-ext")
----------------
0. in_port=2, priority 2, cookie 0xab09adf2af892674
goto_table:1
1. priority 0, cookie 0xab09adf2af892674
goto_table:2
2. in_port=
set_
NORMAL
-> forwarding to learned port
bridge("br-vlan")
-----------------
0. priority 1, cookie 0x651552fc69601a2d
goto_table:3
3. priority 1, cookie 0x651552fc69601a2d
NORMAL
-> forwarding to learned port
Final flow: tcp,in_
Megaflow: recirc_
Datapath actions: push_vlan(
This happened because the packet took the output: action from table=61, added by the fix explicitly_
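My understanding of the mechanism (a sketch, not actual OVS internals): the NORMAL action learns the source MAC as a side effect of forwarding, while an explicit output: action does not touch the FDB at all. So when egress traffic always takes the table=61 output: action, MAC A is never learned on br-int:

```python
def normal_action(fdb, src_mac, in_port, out_ports):
    """NORMAL forwarding: learn the source MAC, then forward."""
    fdb[src_mac] = in_port
    return out_ports

def explicit_output(fdb, src_mac, in_port, out_port):
    """Explicit output: action: forward only, no learning."""
    return [out_port]

fdb = {}
explicit_output(fdb, "fa:16:3e:c1:01:43", 8313, 1)  # egress of MAC A via output:1
print(fdb)  # -> {} : the FDB stays empty, so return traffic will flood
```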
sudo ovs-appctl ofproto/trace br-int in_port=
Flow: in_port=
bridge("br-int")
----------------
0. in_port=
set_
goto_table:60
60. priority 3, cookie 0x9a67096130ac45c2
NORMAL
-> no learned MAC for destination, flooding
bridge("br-vlan")
-----------------
0. in_port=4, priority 2, cookie 0x651552fc69601a2d
1. priority 0, cookie 0x651552fc69601a2d
2. in_port=4, priority 2, cookie 0x651552fc69601a2d
drop
bridge("br-tun")
----------------
0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
goto_table:1
1. priority 0, cookie 0xf1baf24d000c6f7c
goto_table:2
2. dl_dst=
goto_table:20
20. priority 0, cookie 0xf1baf24d000c6f7c
goto_table:22
22. priority 0, cookie 0xf1baf24d000c6f7c
drop
Final flow: in_port=
Megaflow: recirc_
Datapath actions: pop_vlan,
dump-flows br-int indicates it first hits this rule:
cookie=
then at table=60, the only rule it matches is the final NORMAL rule:
cookie=
I tried both attaching and detaching the subnet to/from a DVR router. If I attach it to a DVR router, I *DO* see a bunch of table=60 output actions for my local VMs. The problem, however, is that they appear with the *external VLAN ID*; here is an example:
cookie=
But as we saw, the ingress packet hits that first table=0 mod_vlan_
For a network not attached to a DVR router, there is a similar table=0 rule to change from the external VLAN to the internal VLAN:
cookie=
And because this is a provider network, there are no local DVR MAC rules at table=60, so it always hits the NORMAL action.
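The table walk described above can be sketched as follows (my own illustration; the VLAN IDs, MAC, and the "fixed" variant keyed on the internal VLAN are hypothetical, not the merged fix): table=0 rewrites the external VLAN to the internal one, so table=60 rules keyed on the external VLAN can never match, and the packet falls through to NORMAL and floods.

```python
EXTERNAL_VLAN, INTERNAL_VLAN = 1234, 5          # hypothetical IDs
MAC_A = "fa:16:3e:c1:01:43"

table0 = {EXTERNAL_VLAN: INTERNAL_VLAN}          # table=0 mod_vlan_vid rewrite

def classify(vlan, dst_mac, table60):
    """Walk table=0 then table=60; fall back to NORMAL (flooding)."""
    vlan = table0.get(vlan, vlan)                # external -> internal rewrite
    return table60.get((vlan, dst_mac), "NORMAL (flood)")

broken = {(EXTERNAL_VLAN, MAC_A): 8313}          # keyed on the external VLAN
fixed  = {(INTERNAL_VLAN, MAC_A): 8313}          # keyed on the local VLAN instead

print(classify(EXTERNAL_VLAN, MAC_A, broken))    # -> NORMAL (flood)
print(classify(EXTERNAL_VLAN, MAC_A, fixed))     # -> 8313
```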
So, how do we cover all bases and ensure we have the fix to prevent egress flooding (https:/
Changed in neutron: | |
importance: | Undecided → High |
tags: | added: ovs |
tags: | removed: l3-dvr-backlog |
summary: |
- explicity_egress_direction prevents learning of local MACs and causes flooding of ingress packets
+ explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets |
Changed in neutron: | |
assignee: | nobody → LIU Yulong (dragon889) |
status: | New → In Progress |
tags: | added: neutron-proactive-backport-potential |
tags: | removed: neutron-proactive-backport-potential |
Changed in neutron: | |
status: | New → Fix Released |
I think one problem is that the table=60 output:<local port> rule is added using the segmentation ID (the external VLAN). That is why, in this case, we do not hit those rules and instead hit the table=60 NORMAL action for ingress packets: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L520
But the fixes for bugs 1732067 and 1866445 change the egress path to use explicit output actions rather than NORMAL, so the local MACs aren't learned.
We would be fine if, on the return (ingress) path, we matched one of the explicit table=60 rules, but we don't.
The fix seems to break certain scenarios. How do we reconcile the two? Without the fix, we have flooding in some cases; with the fix, we have flooding in the other direction in different cases.