Comment 0 for bug 2048785

Bence Romsics (bence-romsics) wrote :

... therefore a forwarding loop, packet duplication, packet loss and double tagging are possible.

Today a trunk bridge with one parent and one subport looks like this:

# ovs-vsctl show
...
    Bridge tbr-b2781877-3
        datapath_type: system
        Port spt-28c9689e-9e
            tag: 101
            Interface spt-28c9689e-9e
                type: patch
                options: {peer=spi-28c9689e-9e}
        Port tap3709f1a1-a5
            Interface tap3709f1a1-a5
        Port tpt-3709f1a1-a5
            Interface tpt-3709f1a1-a5
                type: patch
                options: {peer=tpi-3709f1a1-a5}
        Port tbr-b2781877-3
            Interface tbr-b2781877-3
                type: internal
...

# ovs-vsctl find Port name=tpt-3709f1a1-a5 | egrep 'tag|vlan_mode|trunks'
tag : []
trunks : []
vlan_mode : []

# ovs-vsctl find Port name=spt-28c9689e-9e | egrep 'tag|vlan_mode|trunks'
tag : 101
trunks : []
vlan_mode : []

I believe the vlan_mode of the tpt port is wrong (at least when the port is not "vlan_transparent") and it should have the value "access".
Even when the port is "vlan_transparent", forwarding loops between br-int and a trunk bridge should be prevented.
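
To try the proposed setting by hand on a test host, something like the following might serve as a starting point. This is an untested sketch, not a validated fix. The helper names and the OVS_VSCTL override variable are made up for illustration, whether a tag also has to be set on the port is not established here, and the trunk driver may overwrite manual changes.

```shell
# Hypothetical helpers. OVS_VSCTL can be overridden for a dry run,
# for example OVS_VSCTL=echo prints the arguments instead of running them.
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

# Show the VLAN-related columns of one port.
show_port_vlan_columns() {
    "$OVS_VSCTL" --columns=name,tag,trunks,vlan_mode list Port "$1"
}

# Experimental: set vlan_mode=access on a port (the value proposed above).
set_port_access() {
    "$OVS_VSCTL" set Port "$1" vlan_mode=access
}

# Revert: empty the vlan_mode column again so the default selection applies.
clear_port_vlan_mode() {
    "$OVS_VSCTL" clear Port "$1" vlan_mode
}
```

For the bridge shown above this would be called as set_port_access tpt-3709f1a1-a5, followed by show_port_vlan_columns tpt-3709f1a1-a5 to check the result, and clear_port_vlan_mode tpt-3709f1a1-a5 to undo it.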

According to: http://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt

"""
       vlan_mode: optional string, one of access, dot1q-tunnel, native-tagged,
       native-untagged, or trunk
              The VLAN mode of the port, as described above. When this column
              is empty, a default mode is selected as follows:

              • If tag contains a value, the port is an access port. The
                     trunks column should be empty.

              • Otherwise, the port is a trunk port. The trunks column
                     value is honored if it is present.
"""

"""
       trunks: set of up to 4,096 integers, in range 0 to 4,095
              For a trunk, native-tagged, or native-untagged port, the 802.1Q
              VLAN or VLANs that this port trunks; if it is empty, then the
              port trunks all VLANs. Must be empty if this is an access port.

              A native-tagged or native-untagged port always trunks its native
              VLAN, regardless of whether trunks includes that VLAN.
"""

Because tag, trunks and vlan_mode are all empty on the tpt port, it defaults to trunk mode (in the OVS sense) and trunks all VLANs: it forwards untagged frames as well as tagged frames with any VLAN ID. But the tpt port should only forward untagged frames.
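
The default-mode rule quoted above can be written down as a small shell function. This is only a sketch of the documented rule, not OVS code. The function name is made up, and it assumes the column values are passed in the form ovs-vsctl prints them, with "[]" meaning empty.

```shell
# Sketch of the default vlan_mode selection quoted from ovs-vswitchd.conf.db(5).
# $1 = tag column, $2 = vlan_mode column, both as printed by ovs-vsctl
# ("[]" means the column is empty).
effective_vlan_mode() {
    tag="$1"
    vlan_mode="$2"
    if [ "$vlan_mode" != "[]" ]; then
        # An explicitly configured vlan_mode is used as-is.
        echo "$vlan_mode"
    elif [ "$tag" != "[]" ]; then
        # Empty vlan_mode, tag present: the port is an access port.
        echo "access"
    else
        # Empty vlan_mode, empty tag: the port is a trunk port.
        echo "trunk"
    fi
}

effective_vlan_mode "[]" "[]"     # tpt-3709f1a1-a5: prints "trunk"
effective_vlan_mode "101" "[]"    # spt-28c9689e-9e: prints "access"
```

With the column values shown earlier, the tpt port comes out as a trunk port and, since its trunks column is also empty, it trunks all VLANs.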

Feel free to treat this as the end of the bug report. Below I'll add more about how we found this bug, under what conditions it can be triggered, and what consequences it may have. However, please keep in mind that I don't have a full upstream reproduction at the moment, nor do I have a full analysis of every suspicion mentioned below.

I'm aware of a full reproduction of this bug only in a downstream environment, which is described below. While the following was sufficient to reproduce the problem, it was likely far from a minimal reproduction, and some or many of the steps below are probably unnecessary.

* [securitygroup].firewall_driver = noop
* [ovs].explicitly_egress_direct = True
* 2 VMs started on the same compute.
* Both having a trunk port with one parent and one subport.
* The parent and the subport of each trunk have the same MAC address.
* All ports are on vlan networks belonging to the same physnet.
* All ports are created with --disable-port-security and --no-security-group.
* The subport segmentation IDs and the corresponding vlan network segmentation IDs were the same (as if they used "inherit").
* Traffic was generated from a 3rd VM on a different compute, addressed to the subport IP of one of the two VMs. The destination MAC of this traffic was not yet learned by br-int or by either of the two trunk bridges on the host.

I believe the environment looked like this:

openstack network create net0 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 100
openstack network create net1 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 101

openstack subnet create --network net0 --subnet-range 10.0.100.0/24 subnet0
openstack subnet create --network net1 --subnet-range 10.0.101.0/24 subnet1

openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.10 port0a
port0a_mac="$( openstack port show port0a -f value -c mac_address )"
openstack port create --no-security-group --disable-port-security --mac-address "$port0a_mac" --network net1 --fixed-ip ip-address=10.0.101.10 port1a

openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.11 port0b
port0b_mac="$( openstack port show port0b -f value -c mac_address )"
openstack port create --no-security-group --disable-port-security --mac-address "$port0b_mac" --network net1 --fixed-ip ip-address=10.0.101.11 port1b

openstack network trunk create --parent-port port0a trunka
openstack network trunk set --subport port=port1a,segmentation-type=vlan,segmentation-id=101 trunka

openstack network trunk create --parent-port port0b trunkb
openstack network trunk set --subport port=port1b,segmentation-type=vlan,segmentation-id=101 trunkb

openstack server create --flavor ds1G --image u1804 --nic port-id=port0a --wait vma
openstack server create --flavor ds1G --image u1804 --nic port-id=port0b --wait vmb # booted on the same compute as vma

At the moment I don't have a reproduction independent of that environment, that is, one that re-creates the same state of the bridges' FDBs and the same kind of traffic.

Anyway, in this environment colleagues observed:
* Lost frames.
* Duplicated frames arriving at the vNIC of one of the VMs.
* Unexpectedly double-tagged frames on the physical bridge leaving the compute host.

Local analysis showed that when the traffic arrived at br-int, which did not have the dst MAC in its FDB, br-int had to flood it to all ports.
This way the frame ended up on both trunk bridges.
One of these trunk bridges was on the proper way to the destination address.
But the other trunk bridge, also not having the dst MAC in its FDB, had to flood to all ports.
And this trunk bridge also flooded the frame through its tpt port back to br-int.
But the tpt port conceptually is in a different VLAN and the frame should never have been flooded to that port.
However the tpt port has the wrong configuration and forwards traffic from VLANs it should not carry.

After the looped frame got back to br-int, it reached the intended VM's vNIC via the trunk parent (sic!) port. This means that the latter trunk bridge now learned the traffic generator's source MAC on the wrong port. I have a suspicion that this may have led to the unexpectedly double-tagged packets in the other direction.
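
The FDB state described above (which bridge has learned which MAC, and on which port) can be inspected with ovs-appctl fdb/show. The helper below is a hypothetical convenience wrapper, not something used in the original analysis. Its name and the OVS_APPCTL override variable are assumptions for illustration.

```shell
# Hypothetical helper: look up one MAC address in the FDB of each given bridge.
# OVS_APPCTL can be overridden, for example to test against canned output.
OVS_APPCTL="${OVS_APPCTL:-ovs-appctl}"

fdb_lookup() {
    mac="$1"
    shift
    for br in "$@"; do
        echo "== $br"
        # Print the matching FDB lines, or a note if the MAC is not in the table.
        "$OVS_APPCTL" fdb/show "$br" | grep -iF -- "$mac" || echo "(not learned)"
    done
}
```

On the compute host this could be called as fdb_lookup <MAC> br-int tbr-b2781877-3, once with the destination MAC and once with the traffic generator's source MAC, to see on which port each bridge has learned them.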