Openvswitch VLAN stripping issue with tunneling
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Invalid | Undecided | Unassigned |
Bug Description
Openvswitch VLAN tag stripping and MTU issue
When L2 population is not used in an environment using Openvswitch as the ML2 mechanism driver, or when the learned rules are the ones that match, the VLAN tag used internally by Neutron is not stripped. Hence, for VXLAN, the tunneling overhead is higher than the MTU reduction applied to the virtual networks, which causes MTU issues.
In my setup, I have several OpenStack clouds (Newton) deployed using Fuel, with VXLAN segmentation and Openvswitch, running on Ubuntu 16.04. Some machines in the tenants' virtual networks act as bridges, so L2 population is not sufficient and the learning feature of br-tun is required. The deployments are the most basic that can be performed with Fuel 10 (no additional services).
The overhead of VXLAN is 50 bytes if the original Ethernet frame does not have a VLAN tag. If the Ethernet frame has a VLAN tag, the overhead is 54 bytes. When setting up the virtual network MTU, Neutron assumes that there is no VLAN tag. However, Neutron internally uses VLAN tags to isolate the networks in br-int and br-tun. When L2 population is used, the rules set in br-tun strip the VLAN tag before tunneling, so everything works properly. But when L2 population is not used, or its rules are not hit and the learning part takes over, the learned rules do not strip the VLAN tag, they only zero it. The overhead is then 54 bytes and the communication is broken.
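The 50 versus 54 byte figures can be reproduced with a little arithmetic. The sketch below is only an illustration: it assumes an IPv4 underlay without IP options and a 1500 byte physical MTU, and the function names are made up for this example.

```python
# Header sizes in bytes (assumption: IPv4 underlay, no IP options).
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14
VLAN_TAG = 4  # 802.1Q tag left in the inner frame


def vxlan_overhead(inner_vlan_tag: bool) -> int:
    """Bytes added around the tenant IP packet, counted against the underlay IP MTU."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    if inner_vlan_tag:
        overhead += VLAN_TAG
    return overhead


def outer_packet_size(tenant_mtu: int, inner_vlan_tag: bool) -> int:
    """Size of the encapsulated IP packet for a full-size tenant packet."""
    return tenant_mtu + vxlan_overhead(inner_vlan_tag)


PHYSICAL_MTU = 1500  # assumed underlay MTU
TENANT_MTU = PHYSICAL_MTU - vxlan_overhead(inner_vlan_tag=False)  # 1450

print(outer_packet_size(TENANT_MTU, inner_vlan_tag=False))  # 1500, fits
print(outer_packet_size(TENANT_MTU, inner_vlan_tag=True))   # 1504, too large
```

With the tag left in place, a full-size tenant packet exceeds the physical MTU by exactly the 4 bytes of the 802.1Q tag.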
The following learning rule in br-tun installs flows that zero the vlan tag and do not remove it.
in table 10:
#table=
resulting in flows in table 20 like this :
#table=
This flow does not remove the VLAN tag. When using L2 population, some flows with higher priority are inserted that do strip the VLAN tag correctly. However, the learned flows are used if the L2 population flows do not match.
Expected output : traffic without vlan tag tunneled in VXLAN with a 50 Bytes overhead
Actual output : traffic with a vlan tag (0) tunneled in VXLAN with a 54 Bytes overhead
The issue does not happen for GRE, as the 4 additional bytes still fit in the 50 byte MTU reduction on the tenant network.
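A rough check of the GRE case, assuming an IPv4 underlay and a GRE header that carries only the optional key (no checksum or sequence number); the names below are illustrative only:

```python
# Header sizes in bytes (assumptions stated in the text above).
OUTER_IPV4 = 20
GRE_HEADER_WITH_KEY = 8  # 4-byte base header + 4-byte key
INNER_ETHERNET = 14
VLAN_TAG = 4

MTU_REDUCTION = 50  # reduction applied to the tenant network in this report


def gre_overhead(inner_vlan_tag: bool) -> int:
    """Bytes added around the tenant IP packet by GRE encapsulation."""
    overhead = OUTER_IPV4 + GRE_HEADER_WITH_KEY + INNER_ETHERNET
    return overhead + (VLAN_TAG if inner_vlan_tag else 0)


def fits(overhead: int, reduction: int = MTU_REDUCTION) -> bool:
    """True when the encapsulation overhead is covered by the MTU reduction."""
    return overhead <= reduction


print(gre_overhead(False), fits(gre_overhead(False)))  # 42 True
print(gre_overhead(True), fits(gre_overhead(True)))    # 46 True
```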
tags: added: ovs; removed: mtu openvswitch tunnel vlan
Changed in neutron: status: New → Incomplete
A temporary workaround could be to decrease all MTUs to 1446 instead of 1450 by setting the path_mtu variable to 1496 instead of 1500 in /etc/neutron/plugins/ml2/ml2_conf.ini. However, to really fix the issue, the following could be done. The VLAN tag would be copied to a register and stripped from the packet in table 2 of br-tun. The subsequent matches (tables 20 and 22) would then be on the register instead of the VLAN tag.
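The workaround arithmetic can be sanity-checked as follows. This is a sketch, not Neutron code: it assumes path_mtu sits in the [ml2] section of ml2_conf.ini, that the tenant MTU is path_mtu minus 50 bytes, and that the underlay MTU is 1500.

```python
import configparser

# Illustrative config fragment; in a real deployment this would be read
# from /etc/neutron/plugins/ml2/ml2_conf.ini.
ML2_CONF_SNIPPET = """
[ml2]
path_mtu = 1496
"""

VXLAN_OVERHEAD_ASSUMED = 50   # what the MTU calculation accounts for
VXLAN_OVERHEAD_WITH_TAG = 54  # what is actually added when the tag remains
UNDERLAY_MTU = 1500           # assumed physical network MTU

parser = configparser.ConfigParser()
parser.read_string(ML2_CONF_SNIPPET)
path_mtu = parser.getint("ml2", "path_mtu")

tenant_mtu = path_mtu - VXLAN_OVERHEAD_ASSUMED          # 1446
worst_case_outer = tenant_mtu + VXLAN_OVERHEAD_WITH_TAG  # 1500

print(tenant_mtu, worst_case_outer, worst_case_outer <= UNDERLAY_MTU)
```

Lowering path_mtu by 4 bytes leaves exactly enough room for the tag that is not stripped.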
In terms of rules, the following changes are possible to fix the issue (output from ovs-ofctl dump-flows)
Categorizing the traffic into unicast and broadcast: modify the rules in table 2 from
#table=2, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=goto_table:20
#table=2, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=goto_table:22
to
#table=2, priority=0,vlan_tci=0x1000/0x1000,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=move:OXM_OF_VLAN_VID[]->NXM_NX_REG0[0..11],pop_vlan,goto_table:20
#table=2, priority=1,vlan_tci=0x1000/0x1000,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=move:OXM_OF_VLAN_VID[]->NXM_NX_REG0[0..11],pop_vlan,goto_table:22
This will copy the VLAN tag to the register and remove the VLAN tag from the packet.
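To make the intent of the proposed table 2 rules concrete, here is a toy model of the match and actions. It is not OVS code: the Packet class and function names are hypothetical, vlan_tci is modelled as 0 when no 802.1Q header is present, and the move is modelled as copying the 12-bit VID into bits 0..11 of reg0.

```python
from dataclasses import dataclass

VLAN_PRESENT = 0x1000          # bit tested by vlan_tci=0x1000/0x1000
VID_MASK = 0x0FFF              # 12-bit VLAN ID
REG0_CLEAR_LOW12 = 0xFFFFF000  # keeps bits 12..31 of reg0


@dataclass(frozen=True)
class Packet:
    vlan_tci: int = 0  # 0 models "no 802.1Q header"
    reg0: int = 0


def has_vlan_header(pkt: Packet) -> bool:
    """Models the match vlan_tci=0x1000/0x1000."""
    return (pkt.vlan_tci & VLAN_PRESENT) == VLAN_PRESENT


def table2_proposed(pkt: Packet) -> Packet:
    """Models move:VLAN_VID -> REG0[0..11] followed by pop_vlan."""
    if not has_vlan_header(pkt):
        return pkt  # rule does not match, packet unchanged
    vid = pkt.vlan_tci & VID_MASK
    return Packet(vlan_tci=0, reg0=(pkt.reg0 & REG0_CLEAR_LOW12) | vid)


tagged = Packet(vlan_tci=VLAN_PRESENT | 5)  # local VLAN 5
print(table2_proposed(tagged))  # Packet(vlan_tci=0, reg0=5)
```

After this step the packet carries no 802.1Q header, and the local VLAN ID is still available to later tables through the register.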
MAC learning rules: modify the rules in table 10 from
#table=10, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=<UNSET>,OXM_OF_VLAN_VID[],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->OXM_OF_VLAN_VID[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
to:
#table=10, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=<UNSET>,NXM_NX_REG0[0..11]=OXM_OF_VLAN_VID[],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
This will match on the register instead of the VLAN tag (which has already been stripped).
Multicast output: finally, modify the rules in table 22 following this pattern
#table=22, priority=1,dl_vlan=<X> actions=pop_vlan,set_field:0x<Y>->tun_id,output:<Z>,output:<W>
to:
#table=22,priority=2,reg0=0x<X>00000/0xfff00000 actions=set_field:0x<Y>->tun_id,output:<Z>,output:<W>
This will match on the register instead of the VLAN tag, and will not strip the tag since it has already been removed in table 2.
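A masked register match of the form reg0=value/mask only compares the bits that are set in the mask. The toy check below (helper name hypothetical, not an OVS API) illustrates this for a VID stored in bits 0..11 of reg0, which is where a move into REG0[0..11] would put it. It suggests that the mask used in table 22 has to select the same bits that table 2 wrote, which is worth verifying against a real ovs-ofctl dump-flows output before relying on either form.

```python
def masked_match(reg: int, value: int, mask: int) -> bool:
    """True when reg and value agree on every bit set in mask."""
    return (reg & mask) == (value & mask)


LOW12 = 0x00000FFF   # bits 0..11
HIGH12 = 0xFFF00000  # bits 20..31

vid = 5
reg0 = vid  # VID stored in the low 12 bits of the register

print(masked_match(reg0, vid, LOW12))         # True
print(masked_match(reg0, vid << 20, HIGH12))  # False
```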