tenant isolation is bypassed if port admin-state-up=false

Bug #1798904 reported by sean mooney
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
High
sean mooney
OpenStack Security Advisory
Won't Fix
Undecided
Unassigned
neutron
New
Undecided
Unassigned
os-vif
Fix Released
Critical
sean mooney

Bug Description

This bug is a second variant of https://bugs.launchpad.net/neutron/+bug/1734320

The original bug, which is now public, was limited to the case where a VM is live
migrated, resulting in a short window where the tenant instance could receive
VLAN-tagged traffic on the destination node before the neutron ML2 agent wires up
the port on the OVS bridge.

Note that while the original bug implied the VM was only able to eavesdrop
on traffic, it was also possible for the VM to send traffic to a different tenant
network by creating a VLAN subport corresponding to a VLAN in use for tenant
isolation on the br-int.

The original bug was determined to be a result of the fact that during live
migration, if the vif-type was ovs and ovs_hybrid_plug=false, the VIF was plugged
into the OVS bridge by the hypervisor when the VM was started on the destination
node, instead of pre-plugging it and waiting for neutron to signal that it had
completed wiring up the port before migrating the instance.

Since live migration is an admin-only operation unless intentionally changed by
the operator, the scope of this initial vector was limited.

The second vector, creating a running VM with an untagged port, does not require
admin privileges.

If a user creates a neutron port and sets the admin-state-up field to False

openstack port create --disable --network <my network> <port name>

and then either boots a vm with this port

openstack server create --flavor <flavor id> --image <image id> --port <port name> <vm name>

or attaches the port to an existing vm

openstack server add port <vm name> <port name>

This will similarly create a window where the port is attached to the guest but
neutron has not yet wired up the interface.

Note that this was reported to me on Queens with ml2/ovs and the iptables firewall.
I have not personally validated how to recreate it, but I intend to reproduce
this on master next week and report back.

I believe there are a few ways this can be mitigated.
The mitigations for the live migration variant will narrow the window
in which this variant is viable, and in general may be sufficient in the
cases where the neutron agent is running correctly.

But a more complete fix would involve modifications to nova, neutron, and os-vif.

From a neutron perspective, we could extend the neutron port bindings to contain
two additional fields:

ml2_driver_names:
    an ordered, comma-separated list of the agents that bound this port.
    Note: this will be used by os-vif to determine whether it should perform
    additional actions, such as tagging the port or setting its tx/rx queues
    down, to mitigate this issue.

ml2_port_events:
    a list of the times at which port status events are emitted by an ML2
    driver, or an enum.
    Note: currently ml2/ovs signals nova that it has completed wiring
    up the port only once the agent has configured the vswitch, but ODL sends
    the notification when the port is bound in the ML2 driver, before the
    vswitch is configured. To use these events more effectively within nova,
    we need to know whether the event is sent only after the vswitch has been
    configured.
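As a rough illustration, the two proposed fields might surface in the port binding's vif_details alongside the existing keys. Note that neither field exists in neutron today; the names and values below are hypothetical, taken only from the proposal above:

```python
# Hypothetical vif_details payload illustrating the proposed fields.
# "ml2_driver_names" and "ml2_port_events" are NOT existing neutron keys;
# they are the two additions proposed in this bug.
vif_details = {
    "port_filter": True,
    "ovs_hybrid_plug": False,
    # ordered, comma-separated list of the agents that bound this port
    "ml2_driver_names": "openvswitch,l2pop",
    # when the backend emits its port status event:
    # "vswitch-configured" (ml2/ovs) vs "port-bound" (e.g. ODL)
    "ml2_port_events": "vswitch-configured",
}


def bound_drivers(details):
    """Parse the ordered comma-separated driver list from vif_details."""
    raw = details.get("ml2_driver_names", "")
    return [d.strip() for d in raw.split(",") if d.strip()]
```

With such a payload, os-vif could inspect `bound_drivers(vif_details)` to decide whether the ml2/ovs-specific mitigations apply.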

Additionally, changes to os-vif and nova will be required to process this new
information.

On the nova side, if we know that a backend will send an event when the port is
wired up on the vswitch, we may be able to make attach wait until that has been
done.

If os-vif knows the OVS plugin is being used with ml2/ovs and the OVS L2 agent,
it could also conditionally wait for the interface to be tagged by neutron.
This could be done via a config option; however, since the plugin is shared with
SDN controllers that manage OVS, such as ODL, OVN, ONOS and Dragonflow, it would
have to default to not waiting, as these other backends do not use VLANs for
tenant isolation.
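A minimal sketch of that conditional wait, assuming a `get_tag` callable is injected (in real os-vif this would be an OVSDB lookup of the Port row's `tag` column, not the helper shown here):

```python
import time


def wait_for_neutron_tag(get_tag, timeout=300.0, interval=0.5, sleep=time.sleep):
    """Poll until the neutron OVS agent has assigned a local VLAN tag.

    get_tag is a callable returning the port's current tag, or None if
    untagged. Tag 4095 is the "dead" VLAN, so it does not count as wired up.
    Hypothetical helper for illustration; not an os-vif API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        tag = get_tag()
        if tag is not None and tag != 4095:
            return tag  # neutron has moved the port to a real tenant VLAN
        sleep(interval)
    raise TimeoutError("neutron never tagged the port")
```

The injected `sleep` makes the backoff testable; a real implementation would also need to give up cleanly if the port is deleted mid-wait.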

Similarly, instead of waiting, we could have os-vif apply a drop rule and VLAN
4095 based on a config option. Again, this would have to default to false
(insecure) to not break SDN-based deployments.
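At the OVS level, that mitigation amounts to plugging the interface with tag=4095 so everything is dropped until the L2 agent retags it. A sketch of the commands involved (illustrative only; the flow match here is simplified, and real os-vif drives OVSDB directly rather than shelling out):

```python
DEAD_VLAN = 4095  # OVS treats this tag as a drop-everything "dead" VLAN


def plug_port_isolated(bridge, port):
    """Build illustrative commands to plug a port on the dead VLAN.

    Mirrors the idea behind os-vif's isolation mitigation: create the
    interface with tag=4095 so all traffic is dropped until the neutron
    agent moves it to the real tenant VLAN.
    """
    return [
        ["ovs-vsctl", "add-port", bridge, port, "tag=%d" % DEAD_VLAN],
        # belt-and-braces drop rule for anything still on the dead VLAN
        ["ovs-ofctl", "add-flow", bridge,
         "priority=65535,dl_vlan=%d,actions=drop" % DEAD_VLAN],
    ]
```

The key property is that the dead-VLAN tag is applied at plug time, before the guest can send a single frame, closing the window this bug describes.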

If we combine one of the config options with the ml2_driver_names change,
we can backport the fix with the config option only for stable releases, and use
ml2_driver_names from the VIF details, if present, for Stein to
dynamically enable the mitigation when informed by neutron that it is required.
This will minimise the upgrade path and make it secure by default going forward
without breaking compatibility for stable branches.
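The combined decision could be as simple as the following sketch (again assuming the hypothetical ml2_driver_names field; on stable branches, where it is absent, only the operator's config option applies):

```python
def should_isolate(config_isolate_vif, vif_details):
    """Decide whether to apply the dead-VLAN mitigation.

    config_isolate_vif: the backportable config knob; defaults to False so
    SDN backends (ODL, OVN, ONOS, Dragonflow) are not broken.
    vif_details: neutron port binding details; the hypothetical
    ml2_driver_names field lets newer neutron opt in dynamically.
    """
    drivers = vif_details.get("ml2_driver_names", "")
    if "openvswitch" in [d.strip() for d in drivers.split(",")]:
        return True  # neutron told us ml2/ovs (the OVS L2 agent) bound the port
    return config_isolate_vif  # otherwise fall back to the operator's choice
```

This gives the secure-by-default behaviour going forward while keeping stable branches opt-in.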

Tags: security
Revision history for this message
Magnus Bergman (magnusbe) wrote :

I have successfully reproduced (both boot vm with port and attach existing vm to port) on Pike, Queens and Rocky now. I don't have the cycles to try to reproduce it on master at the moment but will try to get there as well.

Also note that we are not talking about a limited window of time, but rather that the port stays indefinitely as untagged.

Changed in os-vif:
assignee: nobody → sean mooney (sean-k-mooney)
Revision history for this message
sean mooney (sean-k-mooney) wrote :

Note: I am treating the os-vif bug as critical but the nova bug as high, as I can introduce a config option to enable the mitigation on the os-vif side without the nova change. Since this is a longstanding bug that exists across multiple releases of OpenStack, I don't think it's a blocker (i.e. a critical bug) from a nova point of view, but I think it should be considered high for both nova and neutron.

Changed in os-vif:
status: New → Confirmed
importance: Undecided → Critical
Changed in nova:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → sean mooney (sean-k-mooney)
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

description: updated
Revision history for this message
Jeremy Stanley (fungi) wrote :

In keeping with recent OpenStack vulnerability management policy changes, no report should remain under private embargo for more than 90 days. Because this report predates the change in policy, the deadline for public disclosure is being set to 90 days from today. If the report is not resolved within the next 90 days, it will revert to our public workflow as of 2020-05-27. Please see http://lists.openstack.org/pipermail/openstack-discuss/2020-February/012721.html for further details.

Changed in ossa:
status: New → Incomplete
description: updated
Revision history for this message
Jeremy Stanley (fungi) wrote :

It doesn't look like this report has seen any activity since my update two months ago, so consider this a friendly reminder:

The embargo for this report is due to expire one month from today, on May 27, and will be switched public on or shortly after that day if it is not already resolved sooner.

Thanks!

Jeremy Stanley (fungi)
description: updated
Revision history for this message
Jeremy Stanley (fungi) wrote :

The embargo for this report has expired and is now lifted, so it's acceptable to discuss further in public.

description: updated
information type: Private Security → Public Security
Revision history for this message
sean mooney (sean-k-mooney) wrote :

I believe this was fixed by
https://github.com/openstack/os-vif/commit/d291213f1ea62f93008deef5224506fb5ea5ee0d

When isolate_vif=True, we will plug the interfaces on the dead VLAN 4095,
which will prevent the port from sending or receiving packets until it is moved by the L2 agent to the tenant VLAN.

I have not recently checked this case specifically, so it's not clear whether it's resolved, but I believe it should be. That change was released in December 2018 (https://github.com/openstack/os-vif/releases/tag/1.13.0) as part of the fix for the parent issue https://bugs.launchpad.net/neutron/+bug/1734320

We should still validate this to confirm before we update this to Fix Released.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

Actually, maybe not: if this affects ml2/ovs with the OVS/noop firewall driver,
we will still need https://review.opendev.org/c/openstack/nova/+/602432

which is required to fix the remaining edge case for https://bugs.launchpad.net/neutron/+bug/1734320
as well.

Unfortunately, to backport that patch we need to backport the following set of patches.

live migration backports
https://review.opendev.org/c/openstack/nova/+/747454/4
https://review.opendev.org/c/openstack/nova/+/742180/12
https://review.opendev.org/c/openstack/nova/+/767368
https://review.opendev.org/c/openstack/neutron/+/640258
https://review.opendev.org/c/openstack/nova/+/602432
https://review.opendev.org/c/openstack/neutron/+/766277
https://review.opendev.org/c/openstack/neutron/+/753314

Most of these are being backported anyway, but to fully fix both issues we need to fix a race inside neutron around which ports the L2 agent will process. That then allows us to delegate port plugging to os-vif in all cases regarding OVS, which in turn allows us to remove the libvirt race for
https://bugs.launchpad.net/neutron/+bug/1734320

While the libvirt race is not relevant to this bug, which is caused by the admin status, it is relevant to being able to use os-vif to enforce the dead VLAN, since we can't add the interface to the isolated dead VLAN using libvirt directly.

So both bugs need to delegate port plugging to os-vif to allow adding the interface to the isolated VLAN.
The other bug also needs that to avoid the race on live migration.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I'm going to move the os-vif bug to Fix Released, as
https://github.com/openstack/os-vif/commit/d291213f1ea62f93008deef5224506fb5ea5ee0d
fixes what can be fixed by os-vif alone. This was part of https://github.com/openstack/os-vif/releases/tag/1.13.0

I am going to leave the nova bug as is until this has been tested end to end, as I believe
https://review.opendev.org/c/openstack/nova/+/602432 is still required for nova.

Changed in os-vif:
status: Confirmed → Fix Released
Revision history for this message
Jeremy Stanley (fungi) wrote :

Thanks for clearing up the os-vif bits (though the OpenStack Vulnerability Management Team doesn't oversee any security advisory process for that deliverable at present, so it doesn't imply any action on our part yet). Sounds like the nova situation is still up in the air... any idea what neutron's involvement is in the defect at this point? Is there any chance of changes being needed to the neutron deliverable as part of a fix?

Revision history for this message
Gage Hugo (gagehugo) wrote :

Any updates with this? Looks like https://review.opendev.org/c/openstack/nova/+/602432 was merged a week ago.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Also is that fix suitable for backporting to supported stable branches?

Revision history for this message
sean mooney (sean-k-mooney) wrote :

https://review.opendev.org/c/openstack/nova/+/602432 has now merged, yes, but I have not tested whether it also fixes this.

In principle it should, when combined with the os-vif config option for port isolation:
https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L90-L94

I guess we just need to test it and verify.
In principle, I believe setting

[os_vif_ovs]
isolate_vif=true

with master should close this issue, as the OVS port will be created on VLAN 4095, which will result in all packets being dropped.
admin-state-up=false will not alter the behaviour of nova/os-vif, so at that point connectivity to the port is entirely up to the L2 agent to establish when admin-state-up=true is set.

For OVN you cannot use [os_vif_ovs]/isolate_vif=true,
but we do not expect this issue to be present with OVN.

I can try and retest this later in the week, but if anyone else can test it in the interim, that would be awesome.

I have proposed an initial backport of https://review.opendev.org/c/openstack/nova/+/602432
to stable/wallaby: https://review.opendev.org/c/openstack/nova/+/790447

If I'm being totally honest, this bug in all its forms has somewhat burnt me out, so I'm not
sure I have the mental energy to backport this to all affected branches,

especially when it comes to checking all the dependencies across nova, os-vif and neutron. But I think we should be able to move nova to Fix Released if we can confirm that https://review.opendev.org/c/openstack/nova/+/602432 + [os_vif_ovs]/isolate_vif=true works.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Based on discussion between VMT members and others in the OpenStack Security SIG during the 2023.1 PTG, it's presumed that all maintained branches of affected projects are no longer subject to this problem. Since it was fixed in master and then the affected branches eventually aged out of maintenance, we're treating it as effectively class B1 per our report taxonomy: https://security.openstack.org/vmt-process.html#report-taxonomy

Changed in ossa:
status: Incomplete → Won't Fix
information type: Public Security → Public
tags: added: security