linuxbridge packet forwarding issue with vlan backed networks

Bug #1849463 reported by Daniel 'f0o' Preussker on 2019-10-23
This bug affects 1 person
Affects: neutron
Importance: High
Assigned to: Unassigned

Bug Description

This is related to: https://bugs.launchpad.net/os-vif/+bug/1837252

On Ubuntu 18.04 with the Ubuntu Cloud Archive (UCA) for Stein, os-vif version 1.15.1 is deployed.

According to bug #1837252 (OSSA-2019-004, CVE-2019-15753), this version is vulnerable: because MAC learning is disabled (ageing set to 0), unicast packets are flooded to all bridge members, which allows traffic interception. The fix is to set ageing to the default of 300.
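To check which state a given bridge is in, the ageing value can be read from sysfs. The following is a minimal sketch, not part of the original report. It assumes the bridge exposes `bridge/ageing_time` under `/sys/class/net/<bridge>/` and that the value is reported in units of 1/100 s, so the default of 300 seconds would read as 30000. The function names and the `sysfs_root` parameter are hypothetical helpers introduced for illustration.

```python
from pathlib import Path

# Assumption: sysfs reports bridge timers in 1/100 s units (USER_HZ = 100).
USER_HZ = 100


def ageing_seconds(bridge, sysfs_root="/sys/class/net"):
    """Return the bridge's FDB ageing time in seconds, read from sysfs."""
    raw = Path(sysfs_root, bridge, "bridge", "ageing_time").read_text()
    return int(raw.strip()) / USER_HZ


def learning_disabled(bridge, sysfs_root="/sys/class/net"):
    """True if ageing is 0, i.e. the state described as vulnerable above."""
    return ageing_seconds(bridge, sysfs_root) == 0
```

On the affected host this would be called as `ageing_seconds("brqa50c5b7b-db")`. The `sysfs_root` argument exists only so the helper can be exercised against a temporary directory.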

With this vulnerable setup, instances using VLAN-backed networks have working traffic flows, since all packets are distributed to all bridge members.

The FDB entries show:
# bridge fdb | grep -e tapb2b8c5ff-8c -e brqa50c5b7b-db -e ens256.3002 | grep -v -e ^01:00:5e -e ^33:33
00:16:3e:ba:fa:33 dev ens256.3002 vlan 1 master brqa50c5b7b-db permanent
00:16:3e:ba:fa:33 dev ens256.3002 master brqa50c5b7b-db permanent
fe:16:3e:0d:c0:42 dev tapb2b8c5ff-8c vlan 1 master brqa50c5b7b-db permanent
fe:16:3e:0d:c0:42 dev tapb2b8c5ff-8c master brqa50c5b7b-db permanent

brctl showmacs confirms this:
# brctl showmacs brqa50c5b7b-db
port no mac addr is local? ageing timer
  2 00:16:3e:ba:fa:33 yes 0.00
  2 00:16:3e:ba:fa:33 yes 0.00
  1 fe:16:3e:0d:c0:42 yes 0.00
  1 fe:16:3e:0d:c0:42 yes 0.00

However, once ageing is enabled, either by running `brctl setageing brqa50c5b7b-db 300` or by upgrading to UCA/Train with os-vif 1.17.0, traffic directed towards tapb2b8c5ff-8c is no longer forwarded.

Traffic coming from tapb2b8c5ff-8c is forwarded correctly through the bridge and exits ens256.3002.

Only incoming traffic destined for the MAC behind tapb2b8c5ff-8c is dropped or not forwarded.

The FDB entries now show:
# bridge fdb | grep -e tapb2b8c5ff-8c -e brqa50c5b7b-db -e ens256.3002 | grep -v -e ^01:00:5e -e ^33:33
00:50:56:89:64:e0 dev ens256.3002 master brqa50c5b7b-db
00:16:3e:ba:fa:33 dev ens256.3002 vlan 1 master brqa50c5b7b-db permanent
fa:16:3e:f8:76:cf dev ens256.3002 master brqa50c5b7b-db
00:16:35:bf:5f:e5 dev ens256.3002 master brqa50c5b7b-db
fa:16:3e:0d:c0:42 dev ens256.3002 master brqa50c5b7b-db
00:50:56:89:69:d9 dev ens256.3002 master brqa50c5b7b-db
9e:dc:1b:a2:9b:2e dev ens256.3002 master brqa50c5b7b-db
00:16:3e:ba:fa:33 dev ens256.3002 master brqa50c5b7b-db permanent
0e:c7:c3:cd:8d:fa dev ens256.3002 master brqa50c5b7b-db
fe:16:3e:0d:c0:42 dev tapb2b8c5ff-8c vlan 1 master brqa50c5b7b-db permanent
fe:16:3e:0d:c0:42 dev tapb2b8c5ff-8c master brqa50c5b7b-db permanent

brctl showmacs confirms this:
# brctl showmacs brqa50c5b7b-db
port no mac addr is local? ageing timer
  2 00:16:35:bf:5f:e5 no 0.16
  2 00:16:3e:ba:fa:33 yes 0.00
  2 00:16:3e:ba:fa:33 yes 0.00
  2 00:50:56:89:64:e0 no 0.10
  2 00:50:56:89:69:d9 no 0.20
  2 0e:c7:c3:cd:8d:fa no 0.10
  2 9e:dc:1b:a2:9b:2e no 0.12
  2 fa:16:3e:0d:c0:42 no 20.00
  2 fa:16:3e:f8:76:cf no 13.33
  1 fe:16:3e:0d:c0:42 yes 0.00
  1 fe:16:3e:0d:c0:42 yes 0.00

This shows the guest MAC (fa:16:3e:0d:c0:42) as non-local and learned on ens256.3002 (port 2) instead of tapb2b8c5ff-8c, which I suspect is why packets are not forwarded to tapb2b8c5ff-8c.
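The same check can be scripted against `bridge fdb` output. The sketch below is illustrative only: it parses the lines pasted in this report and lists the devices on which the guest MAC was dynamically learned. It assumes the tap's permanent MAC is the guest MAC with the first octet changed from fa to fe, which matches the entries above but may not hold for a different base MAC. All function names are hypothetical.

```python
SAMPLE = """\
00:50:56:89:64:e0 dev ens256.3002 master brqa50c5b7b-db
00:16:3e:ba:fa:33 dev ens256.3002 vlan 1 master brqa50c5b7b-db permanent
fa:16:3e:0d:c0:42 dev ens256.3002 master brqa50c5b7b-db
fe:16:3e:0d:c0:42 dev tapb2b8c5ff-8c vlan 1 master brqa50c5b7b-db permanent
fe:16:3e:0d:c0:42 dev tapb2b8c5ff-8c master brqa50c5b7b-db permanent
"""


def parse_fdb(text):
    """Parse `bridge fdb` lines into (mac, dev, permanent) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[1] != "dev":
            continue  # skip prompts, blank lines and anything unexpected
        entries.append((fields[0].lower(), fields[2], "permanent" in fields))
    return entries


def guest_mac_for_tap(entries, tap_dev):
    """Derive the guest MAC from the tap's permanent entry.

    Assumption: tap MAC = guest MAC with the first octet fa replaced by fe.
    """
    for mac, dev, permanent in entries:
        if dev == tap_dev and permanent and mac.startswith("fe:"):
            return "fa" + mac[2:]
    return None


def learned_on(entries, mac):
    """Devices on which `mac` appears as a dynamically learned entry."""
    return sorted({dev for m, dev, permanent in entries
                   if m == mac and not permanent})


entries = parse_fdb(SAMPLE)
guest = guest_mac_for_tap(entries, "tapb2b8c5ff-8c")
print(guest, learned_on(entries, guest))
# prints: fa:16:3e:0d:c0:42 ['ens256.3002']
```

A result that names the VLAN interface rather than the tap reproduces the symptom described here.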

The VM now has no ingress connectivity from the VLAN-backed network, but outgoing packets are still forwarded fine.

It's important to note that instances using VXLAN-backed networks function without issues when ageing is set. The issue therefore seems limited to VLAN-backed networks.

One significant difference in the FDB table between VLAN- and VXLAN-backed networks is the device that holds the guest MAC. On VXLAN-backed networks, this MAC is mapped to the tap device in the FDB.

I have two pcap recordings of DHCP traffic, one from the bridge and one from the tap. They show traffic flowing out of the tap but not returning, even though replies arrive on the bridge interface.

iptables has been ruled out by prepending a -j ACCEPT rule at the top of the neutron-linuxbri-ib2b8c5ff-8 chain.

I talked to @ralonsoh and @sean-k-mooney on IRC yesterday about this issue, and both suggested that I open this bug report.

Let me know if there are any logs, sysctl values, or settings I should append.

  • br.pcap (13.3 KiB, application/vnd.tcpdump.pcap)

Bridge pcap

Tap pcap

https://lists.linuxfoundation.org/pipermail/bridge/2017-December/010830.html

This thread describes the same issue but lacks a conclusion or solution.

Brian Haley (brian-haley) wrote :

I added Rodolfo to this bug, hopefully he can add more information from the IRC discussion.

Sean's comment in the related bug:

  it's not that ageing prevents the vm receiving the packet.
  it worked without ageing because the unicast packets were being flooded

  so that is why it "works" with ageing disabled. based on our irc conversation i think
  the flooding was masking a different problem in the linux bridge agent where it was/is
  incorrectly setting entries in the bridge fdb against the vlan interface instead of the tap

From that, it looks like a bug in the agent code; I just can't confirm it.
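The masking effect Sean describes can be illustrated with a toy model of a learning bridge. This is a sketch of textbook bridge behaviour, not of the kernel bridge or the linuxbridge agent: it has no timers, and `learning=False` merely stands in for ageing set to 0. The class, the port names and the peer MAC are hypothetical. The model also does not explain why the guest MAC ends up on the VLAN interface; it only shows what follows once it does.

```python
class ToyBridge:
    """Minimal learning bridge: learn source MACs, forward or flood."""

    def __init__(self, ports, learning=True):
        self.ports = set(ports)
        self.learning = learning
        self.fdb = {}  # mac -> port on which it was last seen

    def receive(self, in_port, src, dst):
        """Return the set of ports the frame is sent out of."""
        if self.learning:
            self.fdb[src] = in_port
        out = self.fdb.get(dst)
        if out is None:
            return self.ports - {in_port}  # unknown unicast: flood
        if out == in_port:
            return set()  # destination believed to sit behind the ingress port
        return {out}


GUEST = "fa:16:3e:0d:c0:42"
PEER = "02:00:00:00:00:01"  # hypothetical host on the VLAN

# Ageing 0: nothing is learned, so the reply is flooded and reaches the tap.
flooding = ToyBridge(["tap", "vlan"], learning=False)
print(flooding.receive("vlan", PEER, GUEST))  # {'tap'}

# Ageing enabled: the guest MAC is (wrongly) seen on the VLAN port last,
# so a reply arriving on that same port is not forwarded to the tap.
learning = ToyBridge(["tap", "vlan"])
learning.receive("tap", GUEST, PEER)   # guest sends, learned on tap
learning.receive("vlan", GUEST, PEER)  # same source MAC seen on the uplink
print(learning.receive("vlan", PEER, GUEST))  # set()
```

In this model the forwarding failure comes entirely from the FDB entry pointing at the wrong port, which is consistent with flooding having hidden the problem while ageing was disabled.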

Changed in neutron:
importance: Undecided → High
tags: added: linuxbridge