tenant isolation is bypassed if port admin-state-up=false

Bug #1798904 reported by sean mooney on 2018-10-19
Affects                      Importance  Assigned to
OpenStack Compute (nova)     High        sean mooney
OpenStack Security Advisory  Undecided   Unassigned
neutron                      Undecided   Unassigned
os-vif                       Critical    sean mooney

Bug Description

This bug is a second variant of https://bugs.launchpad.net/neutron/+bug/1734320

The original bug, which is now public, was limited to the case where a vm is live
migrated, resulting in a short window where the tenant instance could receive
vlan-tagged traffic on the destination node before the neutron ml2 agent wires up
the port on the ovs bridge.

Note that while the original bug implied that the vm was only able to eavesdrop
on traffic, it was also possible for the vm to send traffic to a different tenant
network by creating a vlan subport corresponding to a vlan in use for tenant
isolation on the br-int.
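
For illustration, a guest attached to such an untagged port could attempt this
with nothing more than iproute2 (a sketch; eth0 and vlan id 100 are hypothetical,
an attacker would simply probe for the vlan ids actually in use on br-int):

# inside the guest: create a vlan subinterface on the untagged port
sudo ip link add link eth0 name eth0.100 type vlan id 100
sudo ip link set eth0.100 up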

The original bug was determined to be a result of the fact that during live
migration, if the vif-type was ovs and ovs_hybrid_plug=false, the VIF was plugged
into the ovs bridge by the hypervisor when the vm was started on the destination
node, instead of pre-plugging it and waiting for neutron to signal it had
completed wiring up the port before migrating the instance.

Since live migration is an admin-only operation unless intentionally changed by
the operator, the scope of this initial vector was limited.

The second vector, creating a running vm with an untagged port, does not require
admin privileges.

If a user creates a neutron port and sets the admin-state-up field to False

openstack port create --disable --network <my network> <port name>

and then either boots a vm with this port

openstack server create --flavor <flavor id> --image <image id> --port <port name> <vm name>

or attaches the port to an existing vm

openstack server add port <vm name> <port name>

This will similarly create a window where the port is attached to the guest but
neutron has not yet wired up the interface.
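
the window can be observed on the compute node (a rough check, assuming ml2/ovs;
the tap device name is the first 11 characters of the port uuid prefixed with "tap"):

# the port remains administratively down on the neutron side
openstack port show <port name> -c admin_state_up -c status
# an empty tag ([]) means the interface sits untagged on br-int
sudo ovs-vsctl get Port tap<port-uuid-prefix> tag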

Note that this was reported to me for queens with ml2/ovs and the iptables
firewall. i have not personally validated how to recreate it, but i intend to
reproduce this on master next week and report back.

i believe there are a few ways this can be mitigated.
the mitigations for the live migration variant will narrow the window
in which this variant is viable, and in general may be sufficient in the
cases where the neutron agent is running correctly.

but a more complete fix would involve modifications to nova, neutron and os-vif.

from a neutron perspective we could extend the neutron port bindings to contain 2
additional fields (a sketch of the resulting binding details follows the field
descriptions below).

ml2_driver_names:
    an ordered comma-separated list of the agents that bound this port.
    Note: this will be used by os-vif to determine if it should perform additional
    actions, such as tagging the port or setting its tx/rx queues down,
    to mitigate this issue.

ml2_port_events
    a list of the times at which port state events are emitted by an ml2 driver,
    or an enum.
    Note: currently ml2/ovs signals nova that it has completed wiring
    up the port only when the agent has configured the vswitch, but odl sends the
    notification when the port is bound in the ml2 driver, before the vswitch is
    configured. to be able to use these events more effectively within nova, we
    need to be able to know if the event is sent only after the vswitch is configured.

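purely as a sketch of the proposed fields (they do not exist today; the names and
values below are illustrative), the binding details neutron hands back to nova
might then look like:

openstack port show <port name> -f json -c binding_vif_details

{
  "binding_vif_details": {
    "ovs_hybrid_plug": false,
    "ml2_driver_names": "openvswitch,l2population",
    "ml2_port_events": "vswitch-wired"
  }
}
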
additionally, changes to os-vif and nova will be required to process this new info.

on the nova side, if we know that a backend will send an event when the port is
wired up on the vswitch, we may be able to make attach wait until that has been
done.
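
for comparison, nova already has knobs in nova.conf that govern how it waits for
neutron's network-vif-plugged event at boot and migration time; attach would need
equivalent handling (the values shown are the current defaults):

[DEFAULT]
vif_plugging_is_fatal = true
vif_plugging_timeout = 300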

if os-vif knows the ovs plugin is being used with ml2/ovs and the ovs l2 agent,
it could also conditionally wait for the interface to be tagged by neutron.
this could be done via a config option, however since the plugin is shared with
sdn controllers that manage ovs, such as odl, ovn, onos and dragonflow, it would
have to default to not waiting, as these other backends do not use vlans for
tenant isolation.
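
a rough sketch of such a wait at the ovs level (the tap name and timeout are
illustrative):

# block up to 120s until neutron's l2 agent assigns a vlan tag to the port
sudo ovs-vsctl --timeout=120 wait-until Port tap<port-uuid-prefix> tag!=[]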

similarly, instead of waiting, we could have os-vif apply a drop rule and vlan 4095
based on a config option. again, this would have to default to false, or insecure,
to avoid breaking sdn-based deployments.
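
concretely, the plug-time mitigation could look something like this (illustrative
commands; 4095 is the ovs "dead vlan", for which the neutron ovs agent already
installs a drop flow on br-int):

# park the port on the dead vlan until neutron re-tags it
sudo ovs-vsctl set Port tap<port-uuid-prefix> tag=4095
# the drop rule equivalent: discard anything that shows up on the dead vlan
sudo ovs-ofctl add-flow br-int "priority=65535,dl_vlan=4095,actions=drop"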

if we combine one of the config options with the ml2_driver_names change,
we can backport the fix with the config option only for stable releases, and use
the ml2_driver_names from the vif details, if present, for stein to
dynamically enable the mitigation when informed by neutron that it is required.
this will minimise the upgrade impact and make it secure by default going forward
without breaking compatibility for stable branches.
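
a minimal sketch of what such a stable-branch knob could look like in the os-vif
section of nova.conf (the option name is illustrative, not an existing option):

[os_vif_ovs]
isolate_vif = true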

Magnus Bergman (magnusbe) wrote :

I have successfully reproduced this (both booting a vm with the port and attaching the port to an existing vm) on Pike, Queens and Rocky now. I don't have the cycles to try to reproduce it on master at the moment but will try to get there as well.

Also note that we are not talking about a limited window of time, but rather that the port stays indefinitely as untagged.

Changed in os-vif:
assignee: nobody → sean mooney (sean-k-mooney)
sean mooney (sean-k-mooney) wrote :

note: i am treating the os-vif bug as critical but the nova bug as high, as i can introduce a config option to enable the mitigation on the os-vif side without the nova change. since this is a longstanding bug that exists across multiple releases of openstack, i don't think it's a blocker, i.e. a critical bug, from a nova point of view, but i think it should be considered high for both nova and neutron.

Changed in os-vif:
status: New → Confirmed
importance: Undecided → Critical
Changed in nova:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → sean mooney (sean-k-mooney)

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

description: updated
Jeremy Stanley (fungi) wrote :

In keeping with recent OpenStack vulnerability management policy changes, no report should remain under private embargo for more than 90 days. Because this report predates the change in policy, the deadline for public disclosure is being set to 90 days from today. If the report is not resolved within the next 90 days, it will revert to our public workflow as of 2020-05-27. Please see http://lists.openstack.org/pipermail/openstack-discuss/2020-February/012721.html for further details.

Changed in ossa:
status: New → Incomplete
description: updated
Jeremy Stanley (fungi) wrote :

It doesn't look like this report has seen any activity since my update two months ago, so consider this a friendly reminder:

The embargo for this report is due to expire one month from today, on May 27, and will be switched public on or shortly after that day if it is not already resolved sooner.

Thanks!

Jeremy Stanley (fungi) on 2020-05-19
description: updated
Jeremy Stanley (fungi) wrote :

The embargo for this report has expired and is now lifted, so it's acceptable to discuss further in public.

description: updated
information type: Private Security → Public Security