nova allows non-string values in the pci whitelist for the physical_network tag.

Bug #1915282 reported by sean mooney
Affects                       Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)      Triaged    High        Unassigned
OpenStack Security Advisory   Won't Fix  Undecided   Unassigned
neutron                       New        Undecided   Unassigned

Bug Description

The nova PCI whitelist has evolved and gained many capabilities since it was first introduced.
Due to the complexity of the values we support, we do not parse it with oslo.config,
so we do not get typed parsing of all values.

The whitelist is a multi-opt string field:
https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/conf/pci.py#L88-L173

which is then parsed by oslo_serialization.jsonutils.loads:

https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/pci/whitelist.py#L58

 dev_spec = jsonutils.loads(jsonspec)

This allows a wider set of inputs than the values we intended to support.

One example of this is the physical_network tag: it is expected to be a non-empty string, but because we do not validate it, it can be parsed as any legal JSON value.
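
To illustrate, here is a minimal sketch using the standard json module, which behaves the same as oslo_serialization.jsonutils.loads for these inputs:

    import json

    # the unquoted JSON literal null becomes Python's None; no validation occurs
    unquoted = json.loads('{"address": "*:03:00.*", "physical_network": null}')
    print(unquoted["physical_network"])  # None

    # quoting it instead yields an ordinary string
    quoted = json.loads('{"address": "*:03:00.*", "physical_network": "null"}')
    print(quoted["physical_network"])  # 'null'

    # in fact any legal JSON value is accepted for the tag
    json.loads('{"address": "*:03:00.*", "physical_network": 42}')  # also parses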

This has been abused to enable functionality that was never intended to work.
Specifically, when hardware-offloaded OVS support was added to nova, vxlan and other tunneled networks were intentionally not supported; at the time that feature was added, nova's neutron SR-IOV port capability only supported vlan and flat networks.

The hardware-offloaded OVS specs did not extend the PCI tracker, or nova in general, to support allocating NICs to non-provider networks that do not have a physnet:

https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/generic-os-vif-offloads.html
https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/vrouter-hw-offloads.html

After that feature was implemented, while the neutron documentation was being written, it was discovered that our parsing logic could be abused to add support for tunneled networks by passing invalid values for physical_network:

https://github.com/openstack/neutron/commit/51758c8c39e48d6e2d8b8c24b5fa0ffaf2861f17#diff-cec155ddf9c73f4b1685bdd903c18abc3853dd361b62e196f040994a9b0e38e7R287-R340

This has no testing upstream in the Mellanox third-party CI, as that only tests SR-IOV with vlan networks, not vxlan. We also have no test coverage for this in nova because it
was never intended to be supported.

To be clear, the invalid input here is the literal value null:

passthrough_whitelist = {"'"address"'":"'"*:'"03:00"'.*"'","'"physical_network"'":null}
This gets converted to tags: {"physical_network": None}

If you instead quote null, e.g.

passthrough_whitelist = {"'"address"'":"'"*:'"03:00"'.*"'","'"physical_network"'":"null"}

This gets converted to tags: {"physical_network": "null"}

then it will not allow the device to be used with a tunneled network,
which is inconsistent and confusing for users.

when {"'"address"'":"'"*:'"03:00"'.*"'","'"physical_network"'":null} is stored in the db

it gets stored as

{"dev_type": "type-VF", "physical_network": "null", "parent_ifname": "eth1"}, "count": 7}
which means both "null" and null are being coalesced into the same value, but it also means that if we
read this back it will be read as a string.

This means the behavior of the pci_tracker code on the compute node and of the PciPassthroughFilter, which operates on the data in the host state object built from the db, will not be the same. I have not determined how the PciPassthroughFilter can pass hosts with this data, so I assume either there is some other bug enabling that, or the filter is not enabled in deployments that use the whitelist this way.

Unfortunately, I have been told that we have customers that have deployed, or are planning to deploy, hardware-offloaded OVS utilising this unintended feature. As such, while this is a bug, we have to consider existing misuse of it, and the upgrade impact, in any action we take to resolve it.

To that end, I am proposing that we start by adding basic unit and functional tests to assert the current behavior. Next, we should add code to block null from being used, gated behind a workaround config option that enables the strict parsing. While strict parsing is disabled, we should emit a warning on startup indicating that use of null is experimental and may be subject to change in the future.
Docs should also be updated in nova and neutron to reflect this.
When strict enforcement is enabled, we should treat the presence of null as an error.
I would suggest making strict enforcement the default from Xena on.
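
As a minimal sketch of what that strict parsing could look like (the function name and wiring are assumptions, not the final design):

    import logging

    LOG = logging.getLogger(__name__)
    _UNSET = object()

    def validate_physical_network(dev_spec, strict=False):
        """Validate the physical_network tag of one parsed whitelist entry."""
        physnet = dev_spec.get('physical_network', _UNSET)
        if physnet is _UNSET:
            return  # tag not present, nothing to validate
        if physnet is None:  # the unquoted JSON literal null
            if strict:
                raise ValueError(
                    'physical_network: null is not a supported value')
            LOG.warning('physical_network: null is experimental and may be '
                        'subject to change in the future')
        elif not isinstance(physnet, str) or not physnet:
            raise ValueError('physical_network must be a non-empty string')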

In the medium to long term (the Xena release and beyond), we should discuss how to make this a fully supported feature in nova, including how to model it in placement once we model PCI devices there, and any other interactions it might have, e.g. NUMA-aware vswitches or other edge cases.

The intent here is not to entirely block the unsupported behavior, since it may be used in production today, but to warn that it was unintended and to come up with a plan to support it properly upstream in nova in the future.

I have checked a number of different ways to abuse this to create a security issue and so far have not been able to come up with one, more out of luck than intent.
A possible way this would be problematic would be if we bound a port to a vxlan network using the SR-IOV NIC agent: since the port has no vlan associated with it, libvirt/nova would not program vlan isolation on the VIF.
From the NIC's point of view this would be identical to a port on a flat network, meaning we would have no tenant isolation.

Now, this is not going to happen with the SR-IOV NIC agent ml2 driver, because it filters the neutron segmentation types it can bind:

https://github.com/openstack/neutron/blob/a3dc80b509d72c8d1a3ea007cb657a9e217ba66a/neutron/plugins/ml2/drivers/mech_sriov/mech_driver/mech_driver.py#L89-L90

I have also looked at ml2/ovs and ml2/ovn.
In the case of ml2/ovs, we only support SR-IOV vnic types in use with hardware-offloaded OVS.
In this case traffic will either be offloaded to the NIC, if all features (including vxlan encapsulation) are supported by the NIC, or fall back to the standard kernel OVS datapath if not. As a result this should still enforce tenant isolation.

For ml2/ovn, SR-IOV interfaces can also be managed via hardware-offloaded OVS, and the same fallback should apply.

A caveat for both ml2/ovs and ml2/ovn is that currently they will only bind an SR-IOV device if the device supports the switchdev API; that is the API underlying the tc_flower offload used by OVS.
It is not clear that this API will only ever be used by NICs managed by hardware-offloaded OVS.
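
For reference, a netdev whose NIC eswitch is in switchdev mode exposes a non-empty phys_switch_id in sysfs, which is one common way to detect it from userspace. A minimal sketch (not necessarily the exact check neutron or os-vif performs):

    def is_switchdev(netdev):
        # devices not in switchdev mode report an empty value or raise
        # an error (e.g. EOPNOTSUPP) when this sysfs file is read
        try:
            with open('/sys/class/net/%s/phys_switch_id' % netdev) as f:
                return bool(f.read().strip())
        except OSError:
            return False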

I have another concern that I thought of while writing this up.
The assumption that the OVS fallback would be sufficient to ensure tenant isolation is based on the idea that the devices in use are VFs.

If we were to bind a PF to a vxlan network via vnic_type=direct-physical, then I am not convinced we would have isolation. ml2/ovn added support for direct-physical in
https://github.com/openstack/neutron/commit/269734184645e7e168a7a6de9352ef79aae8b6f4

If the PF has the switchdev API enabled and is bound by the OVN driver, I do not think there would be any isolation enforced, as I do not think a PF can be managed by OVS.

This is why I have marked this as a possible security bug; that, and the fact that other things might be broken that I am unaware of, so I want to be very careful with this.

I am going to add neutron to this after it is filed, and add Rodolfo from the neutron core team for input, as one of the experts in this area who is already aware of this issue. I am also including gibi, who is the current nova PTL and has similar nova/neutron expertise, but is not in the security group by default for nova.

Neutron supports direct-physical ports with OVN because they want to eventually support standard SR-IOV (without hardware-offloaded OVS) with OVN instead of the SR-IOV NIC agent, using OVN's external ports feature. Eventually that would require the current switchdev API check to be removed or made configurable, but even today I think there is an edge case if the switchdev API is enabled on a PF that is passed through.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete
security advisory task has been added while the core security
reviewers for the affected project or projects confirm the bug and
discuss the scope of any vulnerability along with potential
solutions.

description: updated
Changed in ossa:
status: New → Incomplete
Revision history for this message
Jeremy Stanley (fungi) wrote :

Just as a reminder, the VMT will only issue an advisory (OSSA) if an exploitable vulnerability is identified and code fixes for it are backported to supported stable branches. If there turns out to be the possibility of vulnerable production deployments where the best we can do is provide guidance to operators, then we should go ahead and switch this to a public report as soon as reasonable guidance has been drafted, for example in the form of a security note (OSSN), so as to get information into the hands of people who need it at the earliest opportunity.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

Just to clarify the possible security issue, I will try to give an exact scenario.

Neutron is deployed with ml2/ovn, after support for external ports with vnic-type direct-physical was added:

https://github.com/openstack/neutron/commit/269734184645e7e168a7a6de9352ef79aae8b6f4

In nova.conf you have:
    [pci]
    passthrough_whitelist = {"address":"*:03:00.*","physical_network":null}

Note the address is not relevant here, just the physical_network: null.
I am assuming the address matches a PF in this case and that it is in switchdev mode.
It would be in switchdev mode if its VFs were intended to be used for hardware-offloaded OVS.

A more common example might be:
    [pci]
    passthrough_whitelist = {"devname":"<pf netdev name>","physical_network":null}
While we discourage the use of devname in the whitelist, many people
use it as an easy way to whitelist the NIC and, if it is a PF, any of its child VFs.

As a user, I create a port with vnic-type=direct-physical on a vxlan or geneve network, i.e.
    openstack port create --vnic-type direct-physical --network my-tunneled-network my-pf

Note that many users will not know whether the private tenant networks they create are backed by a tunnel or not. https://docs.openstack.org/api-ref/network/v2/index.html?expanded=create-network-detail#create-network
Per the API ref, the segmentation type is optional and, if not set, is chosen by neutron.

NOTE: neutron will choose the first segmentation type driver from its list of enabled tenant segmentation types, which is typically a tunnel type, to preserve the vlan/flat networks for the admin to use as provider networks. The network type is available to admin users via openstack network show, but not to normal users, so they do not know what type of network they will get.

Then the user boots a VM with this port, i.e.
    openstack server create --port my-pf my-server

At this point nova will schedule the VM to a host that has a free PF.
Nova will consider any whitelisted PF that has not had a VM claim one of its VFs
as an available PF. This is a feature used by many operators to avoid deciding up front whether a host NIC should be used for the PF or for its VFs: the NIC can be consumed as a PF if none of its VFs are already hosting an instance, or it can be used for VFs, without needing to update the nova config and restart the agent.
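
A minimal sketch of that availability rule (illustrative only, not nova's actual PCI tracker code):

    def pf_is_available(pf_device):
        # a whitelisted PF is schedulable as a whole PF only while none
        # of its child VFs have been claimed by an instance
        return (pf_device.is_whitelisted
                and not any(vf.claimed for vf in pf_device.vfs))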

If the PF is in switchdev mode (i.e. because you intended its VFs to be used via OVS), then
it will pass this check in the OVN driver:

https://github.com/openstack/neutron/blob/a3dc80b509d72c8d1a3ea007cb657a9e217ba66a/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L861-L865

        capabilities = ovn_utils.get_port_capabilities(port)
        if (vnic_type in ovn_const.EXTERNAL_PORT_TYPES and
                ovn_const.PORT_CAP_SWITCHDEV not in capabilities):
            LOG.debug("Refusing to bind port due to unsupported vnic_type: %s "
                      "with no switchdev capability", vnic_type)
            return

ovn_const.EXTERNAL_PORT_TYPES is defined here https://github.com/openstack/neutron/blob/a3dc80b509d72c8d1a3ea007cb657a9e217ba66a/neutron/common/ovn/constants.py#L286-L289

EXTERNAL_PORT_TYPES = (portbindings.VNIC_DIRECT,
                       por...


Revision history for this message
sean mooney (sean-k-mooney) wrote :

I will try to check this again tomorrow when I have a clear head, but if someone else can confirm that analysis, that would be helpful.

Assuming this is not a security issue today, there are a few things we can do to solve this broken behavior.

First, on the neutron side we can remove direct-physical from the list of supported vnic types for now. If we want to support PFs with OVN as external ports in the future, we can do so by ensuring they are only bound to flat networks and using vif_type hw_veb instead, as the SR-IOV NIC agent does.
That will fix the nova XML generation and prevent this issue.

Note there are still some tenant isolation issues with allowing flat networks, but typically those will only be created by admins, and they should know that it is not safe to combine a VM with a direct-physical vnic type with a physnet that has both flat and vlan networks, as the VM could eavesdrop on the vlan networks, unless you use a hierarchical port binding driver on the top-of-rack switch to prevent that, i.e. make it so that the network is not vlan-transparent at the top-of-rack switch level.

On the nova side we can block whitelisting PCI devices with physical_network: null, but add a workaround config option to re-enable that for those who are abusing our parsing to enable tunneled networks with OVS hardware offloads.

We could also be a little more targeted and never allow null for PFs.

In the scenario above, a VF would have been fine, because we would have added the representor netdev to OVS, and hardware-offloaded OVS would have either processed all packets in the software dataplane or offloaded the vxlan encapsulation to hardware.

In the PF case there is nothing on the host we can add to OVS for hardware-offloaded OVS to manage, and we cannot enforce any tenant isolation via nova/libvirt, so I do not see how we can ever support tunneled networks for type-PF devices.
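
A sketch of that targeted check (the field names follow the dev_spec structure shown earlier in this report; this is illustrative, not nova's actual code):

    def reject_pf_tunnel_use(dev_spec):
        # never allow the tunnel sentinel (JSON null / Python None) on a PF
        if (dev_spec.get('dev_type') == 'type-PF'
                and dev_spec.get('physical_network') is None):
            raise ValueError('physical_network: null (tunneled networks) '
                             'is not supported for PF passthrough')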
vlan networks are also suspect for PFs, by the way, even though they are allowed by
https://specs.openstack.org/openstack/nova-specs/specs/ocata/implemented/sriov-pf-passthrough-neutron-port-vlan.html

Personally, I would have vetoed that if given the opportunity, since it relies on a hierarchical port binding driver, which we as nova have no way to validate to make it safe.
But that was at least discussed and reasoned about beforehand, even if I disagree with the choice that was made. There are many other issues with PFs, like the fact that all move operations, while they appear to work, are broken in at least one way I know of, so people really should avoid them in production.

Additionally, beyond filtering or blocking the use of null conditionally via a workaround config option, the other thing we need to do is understand how the PCI passthrough filter is or is not working in this case, and what other unintended side effects happen if we do not block the use of null.

The only reason I am not recommending flat-out blocking null in all cases on the nova side is that I am told we have customers using it successfully, or planning to do so in the future, so it appears that when the stars align it can actually work, even if we do not know how.

The fact that devices with physical_network: null are stored as

{"devname":"<pf netdev name>","physical_network":null}

does mean...


Revision history for this message
melanie witt (melwitt) wrote :

This was not easy for me to parse as it's not my wheelhouse at all, but I'll try to give my interpretation of the bug.

If I understand correctly, the potential security issue is one where tenant(?) isolation is not enforced when it was supposed to be enforced.

But for the generated XML to exhibit the non-isolated, vulnerable state, multiple things (plugin type + tunneled network and vif_type=hw_veb) have to be true. And the rationale is that if it's sufficiently rare for a user to end up with this combination of things, it need not be considered a security bug.

If I've understood correctly (there's a high chance I haven't), then even if the security vulnerability is rare, I would still tend to treat it as a security bug.

I guess the question is, how rare is it? How could we know how rare it is that deployers have such a configuration?

Revision history for this message
melanie witt (melwitt) wrote :

We discussed this issue the other day and Sean walked me through the details. I'm attempting to summarize here and Sean will correct me if I've got something wrong.

Tenant network isolation will not be enforced if (1) "physical_network": null is specified in the nova [pci]passthrough_whitelist and (2) any of the following deployment configurations are used:

  * Tunneled network + PF

  * Tunneled network + VF with vif_type=hw_veb (used by SRIOV NIC agent) + NIC that does not have features required for isolation (only possible with out-of-tree neutron driver)

The in-tree neutron driver (ml2) will only offload traffic to the NIC if the NIC has the required features to enforce tenant isolation, else the driver will fall back to not offloading the traffic. This is why the ml2 driver is safe, because it does its own validation.

The use of "physical_network": null in the nova [pci]passthrough_whitelist has been documented by neutron [1] and some operators are already using it in production. These operators are assumed to be using the ml2 driver with supported NICs.

Tunneled network + PF is something that doesn't make logical sense to configure and we expect zero operators to be doing this today.

Tunneled network + VF SRIOV + ml2 in-tree driver we expect operators are using today and is safe.

Tunneled network + VF SRIOV + out-of-tree driver + unsupported NICs is something that operators _could_ be in danger of doing and these are the operators who should be warned.

Action we can take now:

  * Block and reject tunneled network + PF when detected as it is dangerous and does not make sense. Upgrade impact not expected as we don't expect anyone is currently doing this. The block would be just to explicitly reject such an invalid config going forward

  * Add a warning/known issue to our release notes and [pci]passthrough_whitelist config option help to explain possible danger with out-of-tree driver config described earlier. Maybe also post to ML with the [ops] tag

Future action we could take (Xena release):

  * Block and reject tunneled network + VF by default and force operator opt-in via [workarounds] config option until we (nova + neutron) formally support this unintended feature of hardware offloading

Now that I [hopefully] understand the situation more, I lean toward thinking blocking and rejecting tunneled network + VF by default seems maybe too hostile, as it would require operators safely using the in-tree ml2 driver today to deploy a [nova] configuration change when they upgrade to Xena in order to avoid breaking their deployments. Operators in this boat probably got there by following the neutron documentation [1].

[1] https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html#configure-nodes-vxlan-configuration

Revision history for this message
melanie witt (melwitt) wrote :

One more note, tunneled network + PF _could_ be safe IFF the operator has configured tenant isolation on their top-of-rack switches. Because of this, I'm not 100% sure we should block it. And maybe just add that explanation/warning onto the config help for [pci]passthrough_whitelist.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Just following back up on this because it would be nice to have it figured out before the window closes on the Wallaby release, but also we're already nearly a month into the embargo... is there some consensus that this is an unlikely enough risk we could safely switch to discussing the bug and potential fixes in public instead?

Revision history for this message
melanie witt (melwitt) wrote :

Understood, Jeremy. I will ping Sean and ask him to come back to this after our Feature Freeze day (tomorrow, March 11); things have been too hectic because of it.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I spoke to melanie about this on private IRC today.
I did actually see comments 6-9 as they came in, and melwitt was right that I unfortunately
could not find time to respond because of FF and the VDPA feature I was working on.
I did, however, make a few attempts to scope the possible impact, but did not have time to comment on them.

As noted above, the only in-tree neutron ml2 driver that will allow PF + tunnels is OVN,
and we know that by design that cannot work correctly, so we should block it in the OVN ml2 driver as part of addressing this bug.
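
A sketch of what that block could look like in the OVN mech driver's bind check (the names and network-type list here are assumptions; the real code lives in neutron's mech_driver.py):

    TUNNELED_NETWORK_TYPES = ('vxlan', 'geneve', 'gre')

    def port_binding_allowed(vnic_type, network_type):
        # a PF bound to a tunneled network cannot be tenant-isolated,
        # so refuse to bind it
        if (vnic_type == 'direct-physical'
                and network_type in TUNNELED_NETWORK_TYPES):
            return False
        return True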

I also used https://codesearch.opendev.org/ to quickly search for all ml2 drivers that support PFs, and then tried to see if any also support vxlan. From what I recall, a few weeks ago when I did that I found none. So while it is possible that there is an ml2 driver out there that implements hierarchical port binding and supports tunnel termination at the top-of-rack switch, similar to how the Arista network mech driver can do it for VFs, I have not found any that do so for PFs.

So if we block tunnels + PF, I do not know of any configuration today that we would break.

To be safe, we could block it by default and add a new workarounds config option with a nice and scary name, i.e. [workarounds]/enable_unsafe_pf_tunnelling, which we could deprecate immediately.
We can then keep it for a cycle or two and remove it if no operator objects.
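
A sketch of how such an option might be declared with oslo.config (the default and help text are placeholders for discussion):

    from oslo_config import cfg

    workarounds_opts = [
        cfg.BoolOpt('enable_unsafe_pf_tunnelling',
                    default=False,
                    deprecated_for_removal=True,
                    deprecated_reason='PF passthrough on tunneled networks '
                                      'cannot be made tenant-safe.',
                    help='Unsafe: allow binding a PF to a tunneled '
                         '(vxlan/geneve) network. Disabled by default.'),
    ]

    cfg.CONF.register_opts(workarounds_opts, group='workarounds')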

While working on the VDPA feature I also inadvertently tested part of this code path.
VDPA uses VFs, not PFs, but I was able to whitelist a VF with physical_network: null
and assert that we can indeed bind the VF ports (technically I used vdpa ports) to a tunneled network. On the nova side I also observed that the PciPassthroughFilter could actually tell the difference between null and "physnet1". I did not expressly test "null", but when I was printing some extra debug logs, null from the db was not quoted as I previously thought, so we should be able to tell them apart at all points.

As a result we can tell the difference between someone declaring the device as being for use with tunnels with an unquoted null, and someone who happened to choose "null" as their neutron physnet name for some other reason. That means we can explicitly document null as a sentinel value reserved for allocating PCI devices for use with tunneled networking.

For VF + tunnels, one check we probably should do is whether the SR-IOV device has the switchdev capability. We could then either block the binding, or warn, if it is not a switchdev-enabled VF.

In any case, I think we have a path forward in terms of both nova and neutron actions, and we can likely move this to a public bug so that we can use the normal review process to track the work across both projects.

Revision history for this message
melanie witt (melwitt) wrote :

+1 to what Sean said. I agree that, based on (1) the fact that this mode of operation has already been publicly documented on the neutron side, (2) it cannot adversely affect anyone using in-tree drivers, and (3) it requires an advanced deployment topology in order to be affected (VF + tunneled + unsupported NIC, or PF + tunneled + no proper ToR config), it's likely OK to work on this through the normal public channels.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Thanks, for now the VMT will consider this a class C1 report (impractical vulnerability) and so not plan to issue any advisory:

https://security.openstack.org/vmt-process.html#incident-report-taxonomy

Let us know if the situation changes and you determine it's risky enough to backport a solution to supported stable branches.

description: updated
Changed in ossa:
status: Incomplete → Won't Fix
information type: Private Security → Public
tags: added: security
Changed in nova:
status: New → Triaged
importance: Undecided → High