Trunk parent port (tpt port) vlan_mode is wrong in ovs

Bug #2048785 reported by Bence Romsics
260
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Security Advisory
Won't Fix
Undecided
Unassigned
OpenStack Security Notes
Incomplete
Undecided
Unassigned
neutron
Fix Released
High
Bence Romsics

Bug Description

... therefore a forwarding loop, packet duplication, packet loss and double tagging is possible.

Today a trunk bridge with one parent and one subport looks like this:

# ovs-vsctl show
...
    Bridge tbr-b2781877-3
        datapath_type: system
        Port spt-28c9689e-9e
            tag: 101
            Interface spt-28c9689e-9e
                type: patch
                options: {peer=spi-28c9689e-9e}
        Port tap3709f1a1-a5
            Interface tap3709f1a1-a5
        Port tpt-3709f1a1-a5
            Interface tpt-3709f1a1-a5
                type: patch
                options: {peer=tpi-3709f1a1-a5}
        Port tbr-b2781877-3
            Interface tbr-b2781877-3
                type: internal
...

# ovs-vsctl find Port name=tpt-3709f1a1-a5 | egrep 'tag|vlan_mode|trunks'
tag : []
trunks : []
vlan_mode : []

# ovs-vsctl find Port name=spt-28c9689e-9e | egrep 'tag|vlan_mode|trunks'
tag : 101
trunks : []
vlan_mode : []

I believe the vlan_mode of the tpt port is wrong (at least when the port is not "vlan_transparent") and it should have the value "access".
Even when the port is "vlan_transparent", forwarding loops between br-int and a trunk bridge should be prevented.

According to: http://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt

"""
       vlan_mode: optional string, one of access, dot1q-tunnel, native-tagged,
       native-untagged, or trunk
              The VLAN mode of the port, as described above. When this column
              is empty, a default mode is selected as follows:

              • If tag contains a value, the port is an access port. The
                     trunks column should be empty.

              • Otherwise, the port is a trunk port. The trunks column
                     value is honored if it is present.
"""

"""
       trunks: set of up to 4,096 integers, in range 0 to 4,095
              For a trunk, native-tagged, or native-untagged port, the 802.1Q
              VLAN or VLANs that this port trunks; if it is empty, then the
              port trunks all VLANs. Must be empty if this is an access port.

              A native-tagged or native-untagged port always trunks its native
              VLAN, regardless of whether trunks includes that VLAN.
"""

The above combination of tag, trunks and vlan_mode for the tpt port means that it is in trunk mode (in the ovs sense) and it forwards both untagged and tagged frames with any vlan tag. But the tpt port should only forward untagged frames.

(edit: Instead of what's below please find a complete reproduction in comment #2: https://bugs.launchpad.net/neutron/+bug/2048785/comments/2)

Feel free to treat this as the end of the bug report. But below I'll add more about how we found this bug, in what conditions can it be triggered, what consequences it may have. However please keep in mind I don't have a full upstream reproduction at the moment. Nor have I a full analysis of every suspicion mentioned below.

I'm aware of a full reproduction of this bug only in a downstream environment, which looked like below. While the following was sufficient to reproduce the problem, this was likely far from a minimal reproduction and some/many of the below steps are unnecessary.

* [securitygroup].firewall_driver = openvswitch ((edited, originally was: noop))
* [agent].explicitly_egress_direct = True ((edited, originally was: [ovs].explicitly_egress_direct = True))
* 2 VMs started on the same compute.
* Both having a trunk port with one parent and one subport.
* The parent and the subport of each trunk have the same MAC address.
* All ports are on vlan networks belonging to the same physnet.
* All ports are created with --disable-port-security and --no-security-group.
* The subport segmentation IDs and the corresponding vlan network segmentation IDs were the same (as if they used "inherit").
* Traffic was generated from a 3rd VM on a different compute addressed to one of the VM's subport IP, for which
* the destination MAC was not yet learned by either br-int or the two trunk bridges on the host.

I believe the environment looked like this:

openstack network create net0 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 100
openstack network create net1 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 101

openstack subnet create --network net0 --subnet-range 10.0.100.0/24 subnet0
openstack subnet create --network net1 --subnet-range 10.0.101.0/24 subnet1

openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.10 port0a
port0a_mac="$( openstack port show port0a -f value -c mac_address )"
openstack port create --no-security-group --disable-port-security --mac-address "$port0a_mac" --network net1 --fixed-ip ip-address=10.0.101.10 port1a

openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.11 port0b
port0b_mac="$( openstack port show port0b -f value -c mac_address )"
openstack port create --no-security-group --disable-port-security --mac-address "$port0b_mac" --network net1 --fixed-ip ip-address=10.0.101.11 port1b

openstack network trunk create --parent-port port0a trunka
openstack network trunk set --subport port=port1a,segmentation-type=vlan,segmentation-id=101 trunka

openstack network trunk create --parent-port port0b trunkb
openstack network trunk set --subport port=port1b,segmentation-type=vlan,segmentation-id=101 trunkb

openstack server create --flavor ds1G --image u1804 --nic port-id=port0a --wait vma
openstack server create --flavor ds1G --image u1804 --nic port-id=port0b --wait vmb # booted on the same compute as vma

At the moment I don't have a reproduction independent of that environment, that re-creates the same state of the bridges' FDBs and the same kind of traffic.

Anyway, in this environment colleagues observed:
* Lost frames.
* Duplicated frames arriving to the vNIC of one of the VMs.
* Unexpectedly double tagged frames on the physical bridge leaving the compute host.

Local analysis showed as the traffic arrived to br-int, which did not have the dst MAC in its FDB, had to flood to all ports.
This way the frame ended up on both trunk bridges.
One of these trunk bridges was on the proper way to the destination address.
But the other trunk bridge, also not having the dst MAC in its FDB, had to flood to all ports.
And this trunk bridge also flooded the frame to its tpt port back to br-int.
But the tpt port conceptually is in a different VLAN and the frame should never have been flooded to that port.
However the tpt port has the wrong configuration and forwards the traffic from the wrong VLANs.

After the looped frame got back to br-int, it reached the intended VMs vNIC via the trunk parent (sic!) port. Which means that the latter trunk bridge learned the traffic generator's source MAC now on the wrong port. I have a suspicion that this may have lead to the unexpectedly double tagged packets in the other direction.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/905125

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → High
description: updated
Revision history for this message
Bence Romsics (bence-romsics) wrote (last edit ):
Download full text (16.1 KiB)

After I realized and corrected two mistakes in my original bug report I managed to fully reproduce this bug.
This is still not a minimal reproduction, but at least it works in an upstream environment.

mistake #1: The firewall_driver should be openvswitch, not noop.
mistake #2: The explicitly_egress_direct config option should be in section [agent], not [ovs].

The reproduction below actually demonstrates two bugs, this one and another one.
To track the latter I'll open a new bug report, but will point out the 2nd bug here anyways.

I also had to realize that this bug has security implications, because there is crosstalk between networks.
In particular traffic meant for a trunk subport may be forwarded on a trunk's parent's network too.
In most trunk port use cases I believe the severity of this crosstalk is limited, because these networks most of time belong to the same tenant.
However if the parent network belongs to a different tenant or is a shared network then the crosstalk may be overheard by a different tenant too.
The amount of traffic potentially overheard is also limited by the state of the trunk bridge's fdb.

# environment
two host devstack setup with ml2/ovs
devstack0 - all in one host
devstack0a - compute host (most commands are issued here)

# devstack0a's correct config
ml2_conf.ini:
[securitygroup]
firewall_driver = openvswitch
[agent]
explicitly_egress_direct = True
[ovs]
bridge_mappings = physnet0:br-physnet0,...

# on devstack0a
# arbitrary IP that we can ping from our vms on net1
sudo ip link set up dev br-physnet0
sudo ip link add link br-physnet0 name br-physnet0.101 type vlan id 101
sudo ip link set up dev br-physnet0.101
sudo ip address add dev br-physnet0.101 10.0.101.1/24

# code
devstack 6b0f055b
neutron dce8c34dd2
openvswitch 2.17.8-0ubuntu0.22.04.1
linux 5.15.0-91-generic

# clean up, if not the first run
openstack server delete vma --wait
openstack network trunk delete trunka
openstack port delete port1a port0a
openstack network delete net1 net0

# build the environment
openstack network create net0 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 100
openstack network create net1 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 101

openstack subnet create --network net0 --subnet-range 10.0.100.0/24 subnet0
openstack subnet create --network net1 --subnet-range 10.0.101.0/24 subnet1

openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.10 port0a
port0a_mac="$( openstack port show port0a -f value -c mac_address )"
openstack port create --no-security-group --disable-port-security --mac-address "$port0a_mac" --network net1 --fixed-ip ip-address=10.0.101.10 port1a
# we know these two ports have the same mac (on different networks though) therefore later I may use port0a's mac where port1a's mac would be more logical

openstack network trunk create --parent-port port0a trunka
openstack network trunk set --subport port=port1a,segmentation-type=vlan,segmentation-id=101 trunka

openstack server create --flavor ds1G --image u1804 --nic port-id=port0a --availability-zone :devstack0a --wait vma...

information type: Public → Private Security
information type: Private Security → Public Security
Revision history for this message
Jeremy Stanley (fungi) wrote :

Thanks for flagging the potential security impact of this. Can someone provide a succinct exploit scenario for how an attacker might cause this to occur and then take advantage of it? Or is it merely one of those situations where someone could take advantage of the issue if they happen to find an environment where the necessary conditions were already met?

Changed in ossa:
status: New → Incomplete
Revision history for this message
Bence Romsics (bence-romsics) wrote :

Hi Jeremy,

I believe it is "one of those situations where someone could take advantage of the issue if they happen to find an environment".

Revision history for this message
Lajos Katona (lajos-katona) wrote :

I checked and I was able to do the above reproduction in a single-host (all-in-one) devstack also.
By documentation the explicitly_egress_direct cfg option is for computes (where ovs-agent is running, but for the sake of testing it works in single host devstack also.
Thanks Bence for the detailed steps.

description: updated
Revision history for this message
Bence Romsics (bence-romsics) wrote :

I had some time to look into what are the minimal necessary conditions for the bugs reproduced in comment #2.

The forwarding problem is present in all ml2/ovs environments where a trunk port is set up by the ovs-agent and when the trunk bridge is flooding traffic. The values of firewall_driver and explicitly_egress_direct are irrelevant.

The flooding problem requires more. It is only present if explicitly_egress_direct is set to True (that's when the forwarding is asymmetric - direct output action in egress direction, but normal action in ingress direction). I did not open a new bug report yet, because I believe this problem was already reported here: https://bugs.launchpad.net/neutron/+bug/1884708 This bug report seems to be fixed by https://review.opendev.org/c/openstack/neutron/+/738551. However I believe that change did not fix all cases of the bug. It fixed the flooding problem with firewall_driver=noop. However it did not fix it with firewall_driver=openvswitch. I did not test with other firewall_drivers yet. I think we should reopen #1884708.

Revision history for this message
LIU Yulong (dragon889) wrote :

@Bence,

``explicity_egress_direct`` was mainly added for openvswitch firewall driver. Noop firewall driver is not the primary case of this option. There is a bug which is related to iptables_hybird drivers, so the fix is added to noop and iptables_hybird drivers as well.

But anyway, TBH, this option is not really cover the trunk port scenario. If trunk port is widely used, it's worthy to have a fix to cover it.

Revision history for this message
Bence Romsics (bence-romsics) wrote :

@Liu,

Please see my answer in the other bug report: https://bugs.launchpad.net/neutron/+bug/2051351/comments/2

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/905125
Committed: https://opendev.org/openstack/neutron/commit/27601f8eead444283e4d1c258298ac5afaff377f
Submitter: "Zuul (22348)"
Branch: master

commit 27601f8eead444283e4d1c258298ac5afaff377f
Author: Bence Romsics <email address hidden>
Date: Tue Jan 9 14:18:45 2024 +0100

    Set trunk parent port as access port in ovs to avoid loop

    A non-vlan-transparent trunk parent port (tpt) should only forward
    untagged frames. Earlier it was configured to forward anything (trunk
    mode in ovs). This patch changes the trunk mode to access mode and
    sets the trunk parent's tag explicitly to 0.

    Change-Id: I4bcfe53fe87d7c9218dd0db9d7224bb323709a21
    Closes-Bug: #2048785

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
Jeremy Stanley (fungi) wrote :

Since the fix provided for the upcoming 2024.1 coordinated release requires additional actions on the part of the operator to apply to existing deployments affected by the former behavior, it won't qualify for a security advisory (OSSA) but may still warrant a security note (OSSN) if anyone feels strongly enough about the severity to draft one. The closest classes in our report taxonomy are B1/B2 (or perhaps C1 or D if the severity is deemed low enough by Neutron maintainers): https://security.openstack.org/vmt-process.html#report-taxonomy

I'm closing out our OSSA (advisory) task as "won't fix" and adding an "incomplete" OSSN task in case someone decides it's something they want to work on.

Changed in ossa:
status: Incomplete → Won't Fix
Changed in ossn:
status: New → Incomplete
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/907142

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/907143

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/907144

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/907174

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/yoga)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/907174
Reason: Transition Yoga to Unmaintained: https://review.opendev.org/c/openstack/releases/+/906564

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/907142
Committed: https://opendev.org/openstack/neutron/commit/1e5ba78870fbf9de38764e0b39d7602cc32f7a25
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 1e5ba78870fbf9de38764e0b39d7602cc32f7a25
Author: Bence Romsics <email address hidden>
Date: Tue Jan 9 14:18:45 2024 +0100

    Set trunk parent port as access port in ovs to avoid loop

    A non-vlan-transparent trunk parent port (tpt) should only forward
    untagged frames. Earlier it was configured to forward anything (trunk
    mode in ovs). This patch changes the trunk mode to access mode and
    sets the trunk parent's tag explicitly to 0.

    Change-Id: I4bcfe53fe87d7c9218dd0db9d7224bb323709a21
    Closes-Bug: #2048785
    (cherry picked from commit 27601f8eead444283e4d1c258298ac5afaff377f)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/907143
Committed: https://opendev.org/openstack/neutron/commit/17d15db7f4d350dd7252dc0d9a8a1855e6eda170
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 17d15db7f4d350dd7252dc0d9a8a1855e6eda170
Author: Bence Romsics <email address hidden>
Date: Tue Jan 9 14:18:45 2024 +0100

    Set trunk parent port as access port in ovs to avoid loop

    A non-vlan-transparent trunk parent port (tpt) should only forward
    untagged frames. Earlier it was configured to forward anything (trunk
    mode in ovs). This patch changes the trunk mode to access mode and
    sets the trunk parent's tag explicitly to 0.

    Change-Id: I4bcfe53fe87d7c9218dd0db9d7224bb323709a21
    Closes-Bug: #2048785
    (cherry picked from commit 27601f8eead444283e4d1c258298ac5afaff377f)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/907144
Committed: https://opendev.org/openstack/neutron/commit/2de488ab11c4d66087ce86d9f3dac23c1fa3164d
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 2de488ab11c4d66087ce86d9f3dac23c1fa3164d
Author: Bence Romsics <email address hidden>
Date: Tue Jan 9 14:18:45 2024 +0100

    Set trunk parent port as access port in ovs to avoid loop

    A non-vlan-transparent trunk parent port (tpt) should only forward
    untagged frames. Earlier it was configured to forward anything (trunk
    mode in ovs). This patch changes the trunk mode to access mode and
    sets the trunk parent's tag explicitly to 0.

    Change-Id: I4bcfe53fe87d7c9218dd0db9d7224bb323709a21
    Closes-Bug: #2048785
    (cherry picked from commit 27601f8eead444283e4d1c258298ac5afaff377f)

tags: added: in-stable-zed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.0.0rc1

This issue was fixed in the openstack/neutron 24.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (unmaintained/yoga)

Fix proposed to branch: unmaintained/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/913505

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.2.1

This issue was fixed in the openstack/neutron 21.2.1 release.

To post a comment you must log in.
This report contains Public Security information  
Everyone can see this security related information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.