Activity log for bug #1915282

Date Who What changed Old value New value Message
2021-02-10 15:13:38 sean mooney bug added bug
2021-02-10 15:14:35 sean mooney bug added subscriber Balazs Gibizer
2021-02-10 15:15:27 sean mooney bug added subscriber Rodolfo Alonso
2021-02-10 15:27:45 Rodolfo Alonso bug task added neutron
2021-02-10 15:57:33 Jeremy Stanley description The nova PCI whitelist has evolved and gained many capabilities since it was first introduced. Due to the complexity of the values we support, we do not parse it using oslo.config to do typed parsing of all values. The whitelist is a multi-opt string field https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/conf/pci.py#L88-L173 which is then parsed by oslo_serialization.jsonutils.loads https://github.com/openstack/nova/blob/56ac9b22cfa71daaf7a452fda63bd27811a358c4/nova/pci/whitelist.py#L58 (dev_spec = jsonutils.loads(jsonspec)). This allows a wider set of inputs than the technically legal values we intended to support. One example is that the physical_network tag, which is expected to be a non-empty string, can be parsed as any legal JSON value because we do not validate it. This has been abused to enable functionality that was never intended to work. Specifically, when hardware-offloaded OVS support was added to nova, vxlan and other tunneled networks were intentionally not supported: at the time that feature was added, nova's neutron SR-IOV port capability only supported vlan and flat networks, and the hardware-offloaded OVS spec did not extend the PCI tracker, or nova in general, to support allocation of NICs to networks that don't have a physnet. https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/generic-os-vif-offloads.html https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/vrouter-hw-offloads.html After that feature was implemented, while the documentation was being written in neutron, it was discovered that our parsing logic could be abused to add support for tunneled networks by passing invalid values to physical_network.
https://github.com/openstack/neutron/commit/51758c8c39e48d6e2d8b8c24b5fa0ffaf2861f17#diff-cec155ddf9c73f4b1685bdd903c18abc3853dd361b62e196f040994a9b0e38e7R287-R340 This has no testing upstream in the Mellanox third-party CI, as that only tests SR-IOV with vlan networks; it is not testing with vxlan. We also have no test coverage for this in nova because it was never intended to be supported. To be clear, the invalid input here is the literal value null: passthrough_whitelist = {"address":"*:03:00.*","physical_network":null} gets converted to tags: {"physical_network": None}. If you instead quote null, e.g. passthrough_whitelist = {"address":"*:03:00.*","physical_network":"null"}, it gets converted to tags: {"physical_network": "null"}, which will not allow the device to be used with a tunneled network. This is problematic and confusing for users. When {"address":"*:03:00.*","physical_network":null} is stored in the db, it gets stored as {"dev_type": "type-VF", "physical_network": "null", "parent_ifname": "eth1"}, "count": 7}, which means both "null" and null are being coalesced into the same value, but it also means that if we read this back it will be read as a string. This means the behavior of the pci_tracker code on the compute node and of the PCI passthrough filter, which operates on the data in the host state object built from the db, will not be the same. I have not determined how the PCI filter can pass a host with this data, so I assume either there is some other bug enabling that, or the filter is not enabled when it is used this way. Unfortunately I have been told that we have customers that have deployed, or are planning to deploy, hardware-offloaded OVS utilising this unintended feature. As such, while this is a bug, we have to consider existing misuse of it and upgrade impact in any action we take to resolve it.
To that end I'm proposing that we start by adding basic unit and functional tests to assert the current behavior. Next we should add code to block null from being used, with a workaround config option controlling the strict parsing. While strict parsing is not enabled we should emit a warning on startup indicating that the use of null is experimental and may be subject to change in the future, and docs should also be updated in nova and neutron to reflect this. When strict enforcement is enabled we should treat the presence of null as an error; I would suggest making this the default from Xena on. In the medium to long term (the Xena release and beyond) we should discuss how to support this as a fully supported feature in nova, including how to model it in placement when we model PCI devices there, and any other interactions it might have, e.g. NUMA-aware vswitches or other edge cases. The intent here is not to entirely block the unsupported behavior, since it may be used in production today, but to warn that it was unintended and to come up with a plan to support it properly upstream in nova. I have checked a number of different ways to abuse this to create a security issue and so far have not been able to come up with one, more out of luck than intent. A possible way this would be problematic would be if we bound a port to a vxlan network using the SR-IOV NIC agent: since the port has no vlan associated with it, libvirt/nova would not program vlan isolation on the vif, and from the NIC's point of view it would be identical to a port on a flat network, meaning we would have no tenant isolation. Now, this is not going to happen with the SR-IOV NIC agent ml2 driver, because it filters the neutron segmentation types it can bind. https://github.com/openstack/neutron/blob/a3dc80b509d72c8d1a3ea007cb657a9e217ba66a/neutron/plugins/ml2/drivers/mech_sriov/mech_driver/mech_driver.py#L89-L90 I have also looked at ml2/ovs and ml2/ovn.
In the case of ml2/ovs we only support SR-IOV vnic types in use with hardware-offloaded OVS. In that case traffic will either be offloaded to the NIC, if all features (including vxlan encapsulation) are supported by the NIC, or fall back to the standard kernel OVS datapath if not; as a result this should still enforce tenant isolation. For ml2/ovn, SR-IOV interfaces can also be managed via hardware-offloaded OVS, and the same fallback should apply. A caveat for both ml2/ovs and ml2/ovn is that they will currently only bind an SR-IOV device if the device supports the switchdev API, which is the API used by the tc_flower offload path used by OVS; it's not clear that this API will only ever be used by NICs managed by hardware-offloaded OVS. I have another concern that I thought of while writing this up: the assumption that the OVS fallback would be sufficient to ensure tenant isolation is based on the idea that the devices used are VFs. If we were to bind a PF to a vxlan network via vnic_type=direct-physical, then I'm not convinced we would have isolation. ml2/ovn added support for direct-physical in https://github.com/openstack/neutron/commit/269734184645e7e168a7a6de9352ef79aae8b6f4 If a PF has the switchdev API enabled and is bound by the OVN driver, I do not think there would be any isolation enforced, as I do not think it can be managed by OVS. This is why I have marked this as a possible security bug; that, and the fact that other things might be broken that I'm unaware of, so I want to be very careful with this. I am going to add neutron to this after it is filed and add Rodolfo from the neutron core team, one of the experts in this area, for input, as they are already aware of this issue. I am also including gibi, who is the current nova PTL and has similar nova/neutron expertise but is not in the sec group by default for nova.
Neutron supports direct-physical ports with OVN because they eventually want to support standard SR-IOV with OVN, without hardware-offloaded OVS and instead of the SR-IOV NIC agent, using OVN's external ports feature. Eventually that would require the current switchdev API check to be removed or made configurable, but even today I think there is an edge case if the switchdev API is enabled on a PF that is passed through. This issue is being treated as a potential security risk under embargo. Please do not make any public mention of embargoed (private) security vulnerabilities before their coordinated publication by the OpenStack Vulnerability Management Team in the form of an official OpenStack Security Advisory. This includes discussion of the bug or associated fixes in public forums such as mailing lists, code review systems and bug trackers. Please also avoid private disclosure to other individuals not already approved for access to this information, and provide this same reminder to those who are made aware of the issue prior to publication. All discussion should remain confined to this private bug report, and any proposed fixes should be added to the bug as attachments. This embargo shall not extend past 2021-05-11 and will be made public by or on that date even if no fix is identified.
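The null vs. "null" confusion described above comes straight from JSON semantics: jsonutils.loads is a thin wrapper around the standard json module, so the unquoted literal null becomes Python None while the quoted string stays a string. The sketch below reproduces that behavior and shows the shape the proposed strict validation could take; parse_dev_spec and its strict flag are hypothetical names for illustration, not existing nova code.

```python
import json  # oslo_serialization.jsonutils.loads defers to the stdlib json module


def parse_dev_spec(jsonspec, strict=False):
    """Parse one passthrough_whitelist entry the way nova does today
    (a plain JSON load with no tag validation). The strict flag is a
    hypothetical sketch of the proposed workaround option: reject any
    physical_network value that is not a non-empty string."""
    dev_spec = json.loads(jsonspec)
    if strict and "physical_network" in dev_spec:
        physnet = dev_spec["physical_network"]
        if not (isinstance(physnet, str) and physnet):
            raise ValueError(
                "physical_network must be a non-empty string, got %r" % (physnet,))
    return dev_spec


# The unquoted JSON literal null parses to Python None, which is what
# unintentionally unlocked tunneled-network support:
spec = parse_dev_spec('{"address": "*:03:00.*", "physical_network": null}')
assert spec["physical_network"] is None

# The quoted string "null" parses to the string "null" instead, so the
# device is treated as belonging to a physnet literally named "null":
spec = parse_dev_spec('{"address": "*:03:00.*", "physical_network": "null"}')
assert spec["physical_network"] == "null"
```

With strict=True the first example would raise ValueError at startup instead of silently producing {"physical_network": None}, matching the proposal that strict enforcement treat the presence of null as an error.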
2021-02-10 15:57:50 Jeremy Stanley bug task added ossa
2021-02-10 15:58:05 Jeremy Stanley ossa: status New Incomplete
2021-02-10 15:58:19 Jeremy Stanley bug added subscriber Nova Core security contacts
2021-02-10 15:58:31 Jeremy Stanley bug added subscriber Neutron Core Security reviewers
2021-02-10 16:59:27 melanie witt bug added subscriber melanie witt
2021-02-10 19:26:26 sean mooney bug added subscriber Lee Yarwood
2021-03-25 13:28:09 Jeremy Stanley description
2021-03-25 13:28:21 Jeremy Stanley ossa: status Incomplete Won't Fix
2021-03-25 13:28:29 Jeremy Stanley information type Private Security Public
2021-03-25 13:28:46 Jeremy Stanley tags neutron ovn ovs pci neutron ovn ovs pci security
2021-05-25 16:04:15 sean mooney nova: status New Triaged
2021-05-25 16:04:24 sean mooney nova: importance Undecided High