The nova PCI whitelist has evolved and gained many capabilities since it was first introduced.
Due to the complexity of the values we support, we do not parse it using oslo.config, so we do not do typed parsing of all values.
The whitelist is a multi-valued string option:
https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/conf/pci.py#L88-L173
which is then parsed by oslo_serialization.jsonutils.loads:
https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/pci/whitelist.py#L58
    dev_spec = jsonutils.loads(jsonspec)
This allows a wider set of inputs than the technically legal values we intended to support.
One example of this is the physical_network tag: it is expected to be a non-empty string, but it can be parsed as any legal JSON value because we do not validate it.
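The untyped parsing can be demonstrated directly. This is a minimal sketch using the stdlib json module (oslo_serialization.jsonutils wraps it), not nova's actual whitelist code:

```python
import json

# The whitelist entry is parsed as an ordinary JSON document, so the
# literal null is accepted for physical_network without any validation.
spec = json.loads('{"address": "*:03:00.*", "physical_network": null}')
assert spec["physical_network"] is None  # None, not the string "null"
```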
This has been abused to enable functionality that was never intended to work.
Specifically, when hardware-offloaded OVS was enabled with nova, vxlan and other tunneled networks were intentionally not supported. At the time that feature was added, nova's neutron SR-IOV port capability only supported vlan and flat networks.
The hardware-offloaded OVS spec did not extend the PCI tracker, or nova in general, to support allocation of NICs to networks that do not have a physnet.
https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/generic-os-vif-offloads.html
https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/vrouter-hw-offloads.html
After that feature was implemented, while the documentation was being written in neutron, it was discovered that our parsing logic could be abused to add support for tunneled networks by passing invalid values for physical_network.
https://github.com/openstack/neutron/commit/51758c8c39e48d6e2d8b8c24b5fa0ffaf2861f17#diff-cec155ddf9c73f4b1685bdd903c18abc3853dd361b62e196f040994a9b0e38e7R287-R340
This has no testing upstream in the Mellanox third-party CI, as that only tests SR-IOV with vlan networks; it does not test with vxlan. We also have no test coverage for this in nova because it was never intended to be supported.
To be clear, the invalid input here is the literal value null:
passthrough_whitelist = {"address": "*:03:00.*", "physical_network": null}

This gets converted to tags: {"physical_network": None}

If you instead quote null, e.g.

passthrough_whitelist = {"address": "*:03:00.*", "physical_network": "null"}

this gets converted to tags: {"physical_network": "null"}
Then it will not allow the device to be used with a tunneled network, which is problematic and confusing for users.
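The difference between the two spellings can be seen with plain JSON parsing (a sketch; nova's actual tag handling is more involved):

```python
import json

unquoted = json.loads('{"address": "*:03:00.*", "physical_network": null}')
quoted = json.loads('{"address": "*:03:00.*", "physical_network": "null"}')

# null parses to Python None, which slips past the physnet handling;
# the quoted "null" is treated as a physnet name, so devices tagged
# this way are not usable with tunneled networks.
assert unquoted["physical_network"] is None
assert quoted["physical_network"] == "null"
```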
When {"address": "*:03:00.*", "physical_network": null} is stored in the DB, it gets stored as

{"dev_type": "type-VF", "physical_network": "null", "parent_ifname": "eth1"}, "count": 7}

which means both "null" and null are being coalesced into the same value, but it also means that if we read this back it will be read as a string.
This means the behavior of the pci_tracker code on the compute node and of the PCI passthrough filter, which operates on the data in the host state object built from the DB, will not be the same. I have not determined how the PciPassthroughFilter can pass a host with this data, so I assume either there is some other bug that is enabling that, or the filter is not enabled when the whitelist is used this way.
Unfortunately I have been told that we have customers that have deployed, or are planning to deploy, hardware-offloaded OVS using this unintended feature. As such, while this is a bug, we have to consider existing misuse of it and the upgrade impact of any action we take to resolve it.
To that end I am proposing that we start by adding basic unit and functional tests to assert the current behavior. Next we should add code to block null from being used, with a workaround config option to enable the strict parsing. When strict parsing is not enabled, we should emit a warning on startup indicating that the use of null is experimental and may be subject to change in the future.
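A rough sketch of the proposed check follows; the function and parameter names are illustrative, not nova's actual API:

```python
import logging

LOG = logging.getLogger(__name__)


def check_physical_network(tags, strict=False):
    """Reject or warn about a null physical_network tag.

    strict=False mirrors the proposed initial behavior: allow null but
    warn at startup that it is experimental. strict=True mirrors the
    proposed Xena-onward default: treat null as a config error.
    """
    if "physical_network" in tags and tags["physical_network"] is None:
        if strict:
            raise ValueError(
                "physical_network: null is not a supported value")
        LOG.warning("physical_network: null is experimental and may "
                    "be subject to change in a future release")
    return tags
```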
Docs should also be updated in nova and neutron to reflect this.
When strict enforcement is enabled, we should treat the presence of null as an error.
I would suggest making this the default from Xena on.
In the medium to long term (the Xena release and beyond) we should discuss how to support this as a fully supported feature in nova, including how to model it in placement when we model PCI devices there, and any other interactions it might have, e.g. NUMA-aware vswitches or other edge cases.
The intent here is not to entirely block the unsupported behavior, since it may be used in production today, but to warn that it was unintended and to come up with a plan to support it properly upstream in nova.
I have checked a number of different ways to abuse this to create a security issue and so far have not been able to come up with one, more out of luck than intent.
A possible way this could be problematic would be if we bound a port to a vxlan network using the sriov nic agent: since the port has no vlan associated with it, libvirt/nova would not program vlan isolation on the VIF.
From the NIC's point of view this would be identical to a port on a flat network, meaning we would have no tenant isolation.
Now, this is not going to happen with the sriov nic agent ml2 driver, because it filters the neutron segmentation types it can bind.
https://github.com/openstack/neutron/blob/a3dc80b509d72c8d1a3ea007cb657a9e217ba66a/neutron/plugins/ml2/drivers/mech_sriov/mech_driver/mech_driver.py#L89-L90
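The effect of that filtering can be illustrated with a toy version of the segmentation-type check (the real driver's constants live at the link above; the names here are assumed):

```python
# The sriov mech driver only attempts to bind segment types it
# supports (vlan and flat), so a vxlan segment is never bound to an
# SR-IOV port by this driver.
SUPPORTED_SEGMENT_TYPES = ("flat", "vlan")


def can_bind(segment_type: str) -> bool:
    """Return True if this toy driver would try to bind the segment."""
    return segment_type in SUPPORTED_SEGMENT_TYPES


assert can_bind("vlan")
assert not can_bind("vxlan")  # tunneled segments are filtered out
```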
I have also looked at ml2/ovs and ml2/ovn.
In the case of ml2/ovs, we only support SR-IOV vnic types in use with hardware-offloaded OVS.
In this case, traffic will either be offloaded to the NIC, if all features (including vxlan encapsulation) are supported by the NIC, or will fall back to the standard kernel OVS datapath if not. As a result this should still enforce tenant isolation.
For ml2/ovn, SR-IOV interfaces can also be managed via hardware-offloaded OVS, and the same fallback should apply.
A caveat for both ml2/ovs and ml2/ovn is that currently they will only bind an SR-IOV device if the device supports the switchdev API; that is the API used by the tc_flower offload used by OVS.
It is not clear that this API will only ever be used by NICs managed by hardware-offloaded OVS.
I have another concern that I thought of while writing this up.
The assumption that the OVS fallback would be sufficient to ensure tenant isolation is based on the idea that the devices that are used are VFs.
If we were to bind a PF to a vxlan network via vnic_type=direct-physical, then I am not convinced we would have isolation. ml2/ovn added support for direct-physical in
https://github.com/openstack/neutron/commit/269734184645e7e168a7a6de9352ef79aae8b6f4
If the PF has the switchdev API enabled and is bound by the OVN driver, I do not think there would be any isolation enforced, as I do not think it can be managed by OVS.
This is why I have marked this as a possible security bug: that, and the fact that other things might be broken that I am unaware of, so I want to be very careful with this.
I am going to add neutron to this after it is filed, and add Rodolfo from the neutron core team, one of the experts in this area, for input, as they are already aware of this issue. I am also including gibi, who is the current nova PTL and has similar nova/neutron expertise, but is not in the security group by default for nova.
Neutron supports direct-physical ports with OVN because they want to eventually support standard SR-IOV (without hardware-offloaded OVS) with OVN, using OVN's external ports feature, instead of the sriov nic agent. Eventually that would require the current switchdev API check to be removed or made configurable, but even today I think there is an edge case if the switchdev API is enabled on a PF that is passed through.