The nova PCI whitelist has evolved and gained many capabilities since it was first introduced.
Due to the complexity of the values we support, we do not parse it using oslo.config, so we do not do typed parsing of all values.
The whitelist is a multi-valued string option:
https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/conf/pci.py#L88-L173
which is then parsed by oslo_serialization.jsonutils.loads:
https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/pci/whitelist.py#L58
    dev_spec = jsonutils.loads(jsonspec)
This allows a wider set of inputs than the technically legal values we intended to support.
One example of this is the physical_network tag: it is expected to be a non-empty string, but it can be parsed as any legal JSON value because we do not validate it.
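The untyped parsing can be demonstrated directly. This is a minimal sketch using the stdlib json module (oslo_serialization.jsonutils wraps it), not nova's actual whitelist code:

```python
import json

# The whitelist entry is parsed as an ordinary JSON document, so the
# literal null is accepted for physical_network without any validation.
spec = json.loads('{"address": "*:03:00.*", "physical_network": null}')
assert spec["physical_network"] is None  # None, not the string "null"
```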
This has been abused to enable functionality that was never intended to work.
Specifically, when hardware-offloaded OVS was enabled with nova, vxlan and other tunneled networks were intentionally not supported. At the time that feature was added, nova's neutron SR-IOV port capability only supported vlan and flat networks.
The hardware-offloaded OVS spec did not extend the PCI tracker, or nova in general, to support allocation of NICs to networks that do not have a physnet.
https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/generic-os-vif-offloads.html
https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/vrouter-hw-offloads.html
After that feature was implemented, while the documentation was being written in neutron, it was discovered that our parsing logic could be abused to add support for tunneled networks by passing invalid values for physical_network.
https://github.com/openstack/neutron/commit/51758c8c39e48d6e2d8b8c24b5fa0ffaf2861f17#diff-cec155ddf9c73f4b1685bdd903c18abc3853dd361b62e196f040994a9b0e38e7R287-R340
This has no testing upstream in the Mellanox third-party CI, as that only tests SR-IOV with vlan networks; it does not test with vxlan. We also have no test coverage for this in nova because it was never intended to be supported.
To be clear, the invalid input here is the literal value null:
passthrough_whitelist = {"address": "*:03:00.*", "physical_network": null}

This gets converted to tags: {"physical_network": None}

If you instead quote null, e.g.

passthrough_whitelist = {"address": "*:03:00.*", "physical_network": "null"}

this gets converted to tags: {"physical_network": "null"}
Then it will not allow the device to be used with a tunneled network, which is problematic and confusing for users.
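The difference between the two spellings can be seen with plain JSON parsing (a sketch; nova's actual tag handling is more involved):

```python
import json

unquoted = json.loads('{"address": "*:03:00.*", "physical_network": null}')
quoted = json.loads('{"address": "*:03:00.*", "physical_network": "null"}')

# null parses to Python None, which slips past the physnet handling;
# the quoted "null" is treated as a physnet name, so devices tagged
# this way are not usable with tunneled networks.
assert unquoted["physical_network"] is None
assert quoted["physical_network"] == "null"
```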
When {"address": "*:03:00.*", "physical_network": null} is stored in the DB, it gets stored as

{"dev_type": "type-VF", "physical_network": "null", "parent_ifname": "eth1"}, "count": 7}

which means both "null" and null are being coalesced into the same value, but it also means that if we read this back it will be read as a string.
This means the behavior of the pci_tracker code on the compute node and of the PCI passthrough filter, which operates on the data in the host state object built from the DB, will not be the same. I have not determined how the PciPassthroughFilter can pass a host with this data, so I assume either there is some other bug that is enabling that, or the filter is not enabled when the whitelist is used this way.
Unfortunately I have been told that we have customers that have deployed, or are planning to deploy, hardware-offloaded OVS using this unintended feature. As such, while this is a bug, we have to consider existing misuse of it and the upgrade impact of any action we take to resolve it.
To that end I am proposing that we start by adding basic unit and functional tests to assert the current behavior. Next we should add code to block null from being used, with a workaround config option to enable the strict parsing. When strict parsing is not enabled, we should emit a warning on startup indicating that the use of null is experimental and may be subject to change in the future.
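A rough sketch of the proposed check follows; the function and parameter names are illustrative, not nova's actual API:

```python
import logging

LOG = logging.getLogger(__name__)


def check_physical_network(tags, strict=False):
    """Reject or warn about a null physical_network tag.

    strict=False mirrors the proposed initial behavior: allow null but
    warn at startup that it is experimental. strict=True mirrors the
    proposed Xena-onward default: treat null as a config error.
    """
    if "physical_network" in tags and tags["physical_network"] is None:
        if strict:
            raise ValueError(
                "physical_network: null is not a supported value")
        LOG.warning("physical_network: null is experimental and may "
                    "be subject to change in a future release")
    return tags
```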
Docs should also be updated in nova and neutron to reflect this.
When strict enforcement is enabled, we should treat the presence of null as an error.
I would suggest making this the default from Xena on.
In the medium to long term (the Xena release and beyond) we should discuss how to support this as a fully supported feature in nova, including how to model it in placement when we model PCI devices there, and any other interactions it might have, e.g. NUMA-aware vswitches or other edge cases.
The intent here is not to entirely block the unsupported behavior, since it may be used in production today, but to warn that it was unintended and to come up with a plan to support it properly upstream in nova.
I have checked a number of different ways to abuse this to create a security issue and so far have not been able to come up with one, more out of luck than intent.
A possible way this could be problematic would be if we bound a port to a vxlan network using the sriov nic agent: since the port has no vlan associated with it, libvirt/nova would not program vlan isolation on the VIF.
From the NIC's point of view this would be identical to a port on a flat network, meaning we would have no tenant isolation.
Now, this is not going to happen with the sriov nic agent ml2 driver, because it filters the neutron segmentation types it can bind.
https://github.com/openstack/neutron/blob/a3dc80b509d72c8d1a3ea007cb657a9e217ba66a/neutron/plugins/ml2/drivers/mech_sriov/mech_driver/mech_driver.py#L89-L90
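The effect of that filtering can be illustrated with a toy version of the segmentation-type check (the real driver's constants live at the link above; the names here are assumed):

```python
# The sriov mech driver only attempts to bind segment types it
# supports (vlan and flat), so a vxlan segment is never bound to an
# SR-IOV port by this driver.
SUPPORTED_SEGMENT_TYPES = ("flat", "vlan")


def can_bind(segment_type: str) -> bool:
    """Return True if this toy driver would try to bind the segment."""
    return segment_type in SUPPORTED_SEGMENT_TYPES


assert can_bind("vlan")
assert not can_bind("vxlan")  # tunneled segments are filtered out
```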
I have also looked at ml2/ovs and ml2/ovn.
In the case of ml2/ovs, we only support SR-IOV vnic types in use with hardware-offloaded OVS.
In this case, traffic will either be offloaded to the NIC, if all features (including vxlan encapsulation) are supported by the NIC, or will fall back to the standard kernel OVS datapath if not. As a result this should still enforce tenant isolation.
For ml2/ovn, SR-IOV interfaces can also be managed via hardware-offloaded OVS, and the same fallback should apply.
A caveat for both ml2/ovs and ml2/ovn is that currently they will only bind an SR-IOV device if the device supports the switchdev API; that is the API used by the tc_flower offload used by OVS.
It is not clear that this API will only ever be used by NICs managed by hardware-offloaded OVS.
I have another concern that I thought of while writing this up.
The assumption that the OVS fallback would be sufficient to ensure tenant isolation is based on the idea that the devices that are used are VFs.
If we were to bind a PF to a vxlan network via vnic_type=direct-physical, then I am not convinced we would have isolation. ml2/ovn added support for direct-physical in
https://github.com/openstack/neutron/commit/269734184645e7e168a7a6de9352ef79aae8b6f4
If the PF has the switchdev API enabled and is bound by the OVN driver, I do not think there would be any isolation enforced, as I do not think it can be managed by OVS.
This is why I have marked this as a possible security bug: that, and the fact that other things might be broken that I am unaware of, so I want to be very careful with this.
I am going to add neutron to this after it is filed, and add Rodolfo from the neutron core team, one of the experts in this area, for input, as they are already aware of this issue. I am also including gibi, who is the current nova PTL and has similar nova/neutron expertise, but is not in the security group by default for nova.
Neutron supports direct-physical ports with OVN because they want to eventually support standard SR-IOV (without hardware-offloaded OVS) with OVN, using OVN's external ports feature, instead of the sriov nic agent. Eventually that would require the current switchdev API check to be removed or made configurable, but even today I think there is an edge case if the switchdev API is enabled on a PF that is passed through.