Activity log for bug #1999814

Date Who What changed Old value New value Message
2022-12-15 18:17:24 Paul Goins bug added bug
2022-12-16 17:21:20 Paul Goins attachment added "virsh capabilities" from a Cisco M5 blade https://bugs.launchpad.net/nova/+bug/1999814/+attachment/5636013/+files/m5_capabilities.xml
2022-12-16 17:21:47 Paul Goins attachment added "virsh domcapabilities" from a Cisco M5 blade https://bugs.launchpad.net/nova/+bug/1999814/+attachment/5636014/+files/m5_domcapabilities.xml
2022-12-16 17:22:07 Paul Goins attachment added "virsh capabilities" from a Cisco M6 blade https://bugs.launchpad.net/nova/+bug/1999814/+attachment/5636015/+files/m6_capabilities.xml
2022-12-16 17:22:29 Paul Goins attachment added "virsh domcapabilities" from a Cisco M6 blade https://bugs.launchpad.net/nova/+bug/1999814/+attachment/5636016/+files/m6_domcapabilities.xml
2023-02-13 16:37:20 Artom Lifshitz nova: status New Incomplete
2023-04-15 04:17:34 Launchpad Janitor nova: status Incomplete Expired
2024-07-02 13:51:59 Rodrigo Barbieri summary Allow for specifying common baseline CPU model with disabled feature [SRU] Allow for specifying common baseline CPU model with disabled feature
2024-07-02 14:46:58 Rodrigo Barbieri description Hello, This is very similar to pad.lv/1852437 (and the related blueprint at https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags), but there is a very different and important nuance. A customer I'm working with has two classes of blades that they're trying to use. Their existing ones are Cascade Lake-based; they are presently using the Cascadelake-Server-noTSX CPU model via libvirt.cpu_model in nova.conf. Their new blades are Ice Lake-based, which is a newer processor, which typically would also be able to run based on the Cascade Lake feature set - except that these Ice Lake processors lack the MPX feature defined in the Cascadelake-Server-noTSX model. The result of this is evident when I try to start nova on the new blades with the Ice Lake CPUs. Even if I specify the following in my nova.conf: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx That is not enough to allow Nova to start; it fails in the libvirt driver in the _check_cpu_compatibility function: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._compare_cpu(cpu, self._get_cpu_info(), None) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u}) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility. 
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service service.start() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.manager.init_host() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.driver.init_host(host=self.host) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._check_cpu_compatibility() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(msg) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with 
host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility. 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service If I make a custom libvirt CPU map file which removes the "<feature name='mpx'/>" feature and specify that as the cpu_model instead, I am able to make Nova start - so it does indeed seem to specifically be that single feature which is blocking me. However, editing the libvirt CPU mapping files is probably not the right way to fix this - hence why I'm filing this bug, for discussion of how to support cases like this. Currently the only "proper" way I'm aware of to work around this right now is to fall back to a Broadwell-based configuration which lacks the "mpx" feature to use as a common baseline, but that's a much older configuration than Cascade Lake and would mean missing out on all the other features which are common in both Cascade Lake and Ice Lake. I would rather if there were a way to use the Cascade Lake settings but simply remove that "mpx" feature from use. ---- Steps to reproduce ================== On an Ice Lake system lacking the MPX feature (e.g. /proc/cpuinfo reporting model of "Intel(R) Xeon(R) Gold 5318Y"), specify the following settings in nova.conf in libvirt settings: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx Then try to start nova. Expected result =============== Nova should start since Cascadelake-Server-noTSX is a subset of Icelake-Server-noTSX, thus allowing the use of Cascadelake-Server-noTSX as a common baseline model for both Cascade Lake and Ice Lake servers. 
Actual result ============= Nova refuses to start, claiming the specified CPU model is incompatible. The "cpu_model_extra_flags = -mpx" config option does not help. Environment =========== Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal. Specifically, nova packages are at version 2:21.2.4-0ubuntu2. Hypervisor: libvirt + KVM Other relevant notes ==================== There are some other open related bugs. The removal of the MPX feature in some Ice Lake processors has manifested in other ways as well. These bugs are primarily in regards to the missing MPX feature breaking how Ice Lake processors are detected, so the nuance is somewhat different - however, they may be worth reviewing as well. * https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding the Icelake CPU maps in libvirt not working to detect certain Ice Lakes, instead detecting them as Broadwell-noTSX-IBRS according to "virsh capabilities" due to lacking the MPX feature. (I've personally tested that removing the mpx feature from the associated CPU mapping files allows for detecting as Ice Lake, but that's not the correct way to fix this.) There is also an interesting comment on this bug at https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It basically implies that rather than looking at "virsh capabilities", "virsh domcapabilities" should be used instead as it seems to more correctly identify the CPU model even if there are disabled flags like MPX. * https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064: Launchpad-side bug regarding the above issue as encountered in Ubuntu. ******** SRU TEMPLATE AT THE BOTTOM ******* Hello, This is very similar to pad.lv/1852437 (and the related blueprint at https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags), but there is a very different and important nuance. A customer I'm working with has two classes of blades that they're trying to use. 
Their existing ones are Cascade Lake-based; they are presently using the Cascadelake-Server-noTSX CPU model via libvirt.cpu_model in nova.conf. Their new blades are Ice Lake-based, which is a newer processor, which typically would also be able to run based on the Cascade Lake feature set - except that these Ice Lake processors lack the MPX feature defined in the Cascadelake-Server-noTSX model. The result of this is evident when I try to start nova on the new blades with the Ice Lake CPUs. Even if I specify the following in my nova.conf: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx That is not enough to allow Nova to start; it fails in the libvirt driver in the _check_cpu_compatibility function: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._compare_cpu(cpu, self._get_cpu_info(), None) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u}) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility. 
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service service.start() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.manager.init_host() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.driver.init_host(host=self.host) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._check_cpu_compatibility() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(msg) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with 
host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility. 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service If I make a custom libvirt CPU map file which removes the "<feature name='mpx'/>" feature and specify that as the cpu_model instead, I am able to make Nova start - so it does indeed seem to specifically be that single feature which is blocking me. However, editing the libvirt CPU mapping files is probably not the right way to fix this - hence why I'm filing this bug, for discussion of how to support cases like this. Currently the only "proper" way I'm aware of to work around this right now is to fall back to a Broadwell-based configuration which lacks the "mpx" feature to use as a common baseline, but that's a much older configuration than Cascade Lake and would mean missing out on all the other features which are common in both Cascade Lake and Ice Lake. I would rather if there were a way to use the Cascade Lake settings but simply remove that "mpx" feature from use. ---- Steps to reproduce ================== On an Ice Lake system lacking the MPX feature (e.g. /proc/cpuinfo reporting model of "Intel(R) Xeon(R) Gold 5318Y"), specify the following settings in nova.conf in libvirt settings: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx Then try to start nova. Expected result =============== Nova should start since Cascadelake-Server-noTSX is a subset of Icelake-Server-noTSX, thus allowing the use of Cascadelake-Server-noTSX as a common baseline model for both Cascade Lake and Ice Lake servers. 
Actual result ============= Nova refuses to start, claiming the specified CPU model is incompatible. The "cpu_model_extra_flags = -mpx" config option does not help. Environment =========== Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal. Specifically, nova packages are at version 2:21.2.4-0ubuntu2. Hypervisor: libvirt + KVM Other relevant notes ==================== There are some other open related bugs. The removal of the MPX feature in some Ice Lake processors has manifested in other ways as well. These bugs are primarily in regards to the missing MPX feature breaking how Ice Lake processors are detected, so the nuance is somewhat different - however, they may be worth reviewing as well. * https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding the Icelake CPU maps in libvirt not working to detect certain Ice Lakes, instead detecting them as Broadwell-noTSX-IBRS according to "virsh capabilities" due to lacking the MPX feature. (I've personally tested that removing the mpx feature from the associated CPU mapping files allows for detecting as Ice Lake, but that's not the correct way to fix this.) There is also an interesting comment on this bug at https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It basically implies that rather than looking at "virsh capabilities", "virsh domcapabilities" should be used instead as it seems to more correctly identify the CPU model even if there are disabled flags like MPX. * https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064: Launchpad-side bug regarding the above issue as encountered in Ubuntu. =============== SRU Description =============== [Impact] When using IceLake CPUs alongside CascadeLake CPUs, the Nova code does not start due to comparing CPU models. It fails before even comparing the flags. Unfortunately, IceLake CPUs are detected as having compatibility with Broadwell, not CascadeLake. Using Broadwell as a common denominator disables many modern features. 
The libvirt upstream team will not add specific support for IceLake [1]. The fix [2] in Nova is to skip the CPU compatibility check (as a configurable workaround) and let libvirt handle the added/removed flags, which is assumed to work for this specific case. [Test case] Because no Icelake and Cascadelake CPUs are available in the lab to test this specific scenario, the test case for this SRU is to run the charmed-openstack-tester [1] against an environment containing the upgraded package (essentially as would be done for a point release SRU) and expect the tests to pass. Test run evidence will be attached to LP. [Regression Potential] One new behavior is introduced and one existing behavior is changed. The new behavior is gated by a new config option that must be explicitly enabled. The changed behavior is the one taken when the config option is left at its default (disabled) value, and is not (in theory) the code path that addresses the bug. If the bug and fix could be tested in the lab, risk could be minimized by introducing only the config option and making no further changes. On the other hand, the code being backported to Yoga is exactly the same as in current master (Caracal+), which means that no issues have been found with the code across 4 releases. [Other Info] [1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064 [2] https://review.opendev.org/c/openstack/nova/+/871969
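The expected result in the description above is set reasoning: a configured model should be usable on a host if every feature the model requires, minus the flags explicitly disabled, is present on the host. A minimal Python sketch of that reasoning follows. It is not Nova's or libvirt's implementation; the function names and the tiny feature sets are hypothetical, chosen for illustration rather than taken from the real Cascadelake-Server-noTSX definition. The "-flag" syntax mirrors the "cpu_model_extra_flags = -mpx" setting quoted in the report; accepting a "+" prefix is an assumption of this sketch.

```python
from typing import Iterable, Set, Tuple


def parse_extra_flags(flags: str) -> Tuple[Set[str], Set[str]]:
    """Split a comma-separated flag string into (enabled, disabled) sets.

    A leading '-' marks a flag as disabled; a leading '+' or no prefix
    marks it as enabled (the '+' handling is an assumption of this sketch).
    """
    enabled: Set[str] = set()
    disabled: Set[str] = set()
    for raw in flags.split(","):
        flag = raw.strip().lower()
        if not flag:
            continue
        if flag.startswith("-"):
            disabled.add(flag[1:])
        else:
            enabled.add(flag.lstrip("+"))
    return enabled, disabled


def missing_features(model_features: Iterable[str],
                     host_features: Iterable[str],
                     extra_flags: str = "") -> Set[str]:
    """Return the features the host lacks after applying extra_flags.

    An empty result means the (hypothetical) model could serve as a
    baseline on this host.
    """
    enabled, disabled = parse_extra_flags(extra_flags)
    required = (set(model_features) | enabled) - disabled
    return required - set(host_features)


# Illustrative feature sets only, not real CPU model definitions.
model = {"avx512f", "sse4.2", "mpx"}
host = {"avx512f", "sse4.2", "sha-ni"}

print(missing_features(model, host))          # the host lacks "mpx"
print(missing_features(model, host, "-mpx"))  # nothing missing once mpx is disabled
```

Under these assumptions the check passes once "mpx" is removed from the required set, which is the behavior the reporter expected from the extra-flags setting.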
2024-07-03 11:07:09 Rodrigo Barbieri bug task added nova (Ubuntu)
2024-07-03 11:09:24 Rodrigo Barbieri nominated for series Ubuntu Bionic
2024-07-03 11:09:24 Rodrigo Barbieri bug task added nova (Ubuntu Bionic)
2024-07-03 11:09:24 Rodrigo Barbieri nominated for series Ubuntu Focal
2024-07-03 11:09:24 Rodrigo Barbieri bug task added nova (Ubuntu Focal)
2024-07-03 11:13:46 Rodrigo Barbieri nominated for series nova/victoria
2024-07-03 11:13:46 Rodrigo Barbieri bug task added nova/victoria
2024-07-03 11:13:46 Rodrigo Barbieri nominated for series nova/ussuri
2024-07-03 11:13:46 Rodrigo Barbieri bug task added nova/ussuri
2024-07-03 11:13:46 Rodrigo Barbieri nominated for series nova/xena
2024-07-03 11:13:46 Rodrigo Barbieri bug task added nova/xena
2024-07-03 11:13:46 Rodrigo Barbieri nominated for series nova/yoga
2024-07-03 11:13:46 Rodrigo Barbieri bug task added nova/yoga
2024-07-03 11:13:46 Rodrigo Barbieri nominated for series nova/wallaby
2024-07-03 11:13:46 Rodrigo Barbieri bug task added nova/wallaby
2024-07-03 11:14:21 Rodrigo Barbieri tags sts sts-sru-needed
2024-07-04 20:14:37 Mauricio Faria de Oliveira nominated for series Ubuntu Jammy
2024-07-04 20:14:37 Mauricio Faria de Oliveira bug task added nova (Ubuntu Jammy)
2024-07-04 20:15:19 Mauricio Faria de Oliveira nova (Ubuntu): status New Fix Released
2024-07-04 20:24:00 Mauricio Faria de Oliveira nova (Ubuntu Bionic): status New Won't Fix
2024-07-04 20:24:56 Mauricio Faria de Oliveira bug added subscriber Mauricio Faria de Oliveira
2024-07-09 19:57:33 Rodrigo Barbieri description ******** SRU TEMPLATE AT THE BOTTOM ******* Hello, This is very similar to pad.lv/1852437 (and the related blueprint at https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags), but there is a very different and important nuance. A customer I'm working with has two classes of blades that they're trying to use. Their existing ones are Cascade Lake-based; they are presently using the Cascadelake-Server-noTSX CPU model via libvirt.cpu_model in nova.conf. Their new blades are Ice Lake-based, which is a newer processor, which typically would also be able to run based on the Cascade Lake feature set - except that these Ice Lake processors lack the MPX feature defined in the Cascadelake-Server-noTSX model. The result of this is evident when I try to start nova on the new blades with the Ice Lake CPUs. Even if I specify the following in my nova.conf: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx That is not enough to allow Nova to start; it fails in the libvirt driver in the _check_cpu_compatibility function: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._compare_cpu(cpu, self._get_cpu_info(), None) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u}) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility. 
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service service.start() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.manager.init_host() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.driver.init_host(host=self.host) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._check_cpu_compatibility() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(msg) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with 
host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility. 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service If I make a custom libvirt CPU map file which removes the "<feature name='mpx'/>" feature and specify that as the cpu_model instead, I am able to make Nova start - so it does indeed seem to specifically be that single feature which is blocking me. However, editing the libvirt CPU mapping files is probably not the right way to fix this - hence why I'm filing this bug, for discussion of how to support cases like this. Currently the only "proper" way I'm aware of to work around this right now is to fall back to a Broadwell-based configuration which lacks the "mpx" feature to use as a common baseline, but that's a much older configuration than Cascade Lake and would mean missing out on all the other features which are common in both Cascade Lake and Ice Lake. I would rather if there were a way to use the Cascade Lake settings but simply remove that "mpx" feature from use. ---- Steps to reproduce ================== On an Ice Lake system lacking the MPX feature (e.g. /proc/cpuinfo reporting model of "Intel(R) Xeon(R) Gold 5318Y"), specify the following settings in nova.conf in libvirt settings: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx Then try to start nova. Expected result =============== Nova should start since Cascadelake-Server-noTSX is a subset of Icelake-Server-noTSX, thus allowing the use of Cascadelake-Server-noTSX as a common baseline model for both Cascade Lake and Ice Lake servers. 
Actual result ============= Nova refuses to start, claiming the specified CPU model is incompatible. The "cpu_model_extra_flags = -mpx" config option does not help. Environment =========== Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal. Specifically, nova packages are at version 2:21.2.4-0ubuntu2. Hypervisor: libvirt + KVM Other relevant notes ==================== There are some other open related bugs. The removal of the MPX feature in some Ice Lake processors has manifested in other ways as well. These bugs are primarily in regards to the missing MPX feature breaking how Ice Lake processors are detected, so the nuance is somewhat different - however, they may be worth reviewing as well. * https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding the Icelake CPU maps in libvirt not working to detect certain Ice Lakes, instead detecting them as Broadwell-noTSX-IBRS according to "virsh capabilities" due to lacking the MPX feature. (I've personally tested that removing the mpx feature from the associated CPU mapping files allows for detecting as Ice Lake, but that's not the correct way to fix this.) There is also an interesting comment on this bug at https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It basically implies that rather than looking at "virsh capabilities", "virsh domcapabilities" should be used instead as it seems to more correctly identify the CPU model even if there are disabled flags like MPX. * https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064: Launchpad-side bug regarding the above issue as encountered in Ubuntu. =============== SRU Description =============== [Impact] When using IceLake CPUs alongside CascadeLake CPUs, the Nova code does not start due to comparing CPU models. It fails before even comparing the flags. Unfortunately, IceLake CPUs are detected as having compatibility with Broadwell, not CascadeLake. Using Broadwell as a common denominator disables many modern features. 
The libvirt upstream team will not add specific support for IceLake [1]. The fix [2] in Nova is to skip the CPU compatibility check (as a configurable workaround) and let libvirt handle the added/removed flags, which is assumed to work for this specific case. [Test case] Because no Icelake and Cascadelake CPUs are available in the lab to test this specific scenario, the test case for this SRU is to run the charmed-openstack-tester [1] against an environment containing the upgraded package (essentially as would be done for a point release SRU) and expect the tests to pass. Test run evidence will be attached to LP. [Regression Potential] One new behavior is introduced and one existing behavior is changed. The new behavior is gated by a new config option that must be explicitly enabled. The changed behavior is the one taken when the config option is left at its default (disabled) value, and is not (in theory) the code path that addresses the bug. If the bug and fix could be tested in the lab, risk could be minimized by introducing only the config option and making no further changes. On the other hand, the code being backported to Yoga is exactly the same as in current master (Caracal+), which means that no issues have been found with the code across 4 releases. [Other Info] [1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064 [2] https://review.opendev.org/c/openstack/nova/+/871969 ******** SRU TEMPLATE AT THE BOTTOM ******* Hello, This is very similar to pad.lv/1852437 (and the related blueprint at https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags), but there is a very different and important nuance. A customer I'm working with has two classes of blades that they're trying to use. Their existing ones are Cascade Lake-based; they are presently using the Cascadelake-Server-noTSX CPU model via libvirt.cpu_model in nova.conf. 
Their new blades are Ice Lake-based, which is a newer processor, which typically would also be able to run based on the Cascade Lake feature set - except that these Ice Lake processors lack the MPX feature defined in the Cascadelake-Server-noTSX model. The result of this is evident when I try to start nova on the new blades with the Ice Lake CPUs. Even if I specify the following in my nova.conf: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx That is not enough to allow Nova to start; it fails in the libvirt driver in the _check_cpu_compatibility function: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._compare_cpu(cpu, self._get_cpu_info(), None) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u}) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility. 
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred: 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last): 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service service.start() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.manager.init_host() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.driver.init_host(host=self.host) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._check_cpu_compatibility() 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(msg) 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with 
host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility. 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult 2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service If I make a custom libvirt CPU map file which removes the "<feature name='mpx'/>" feature and specify that as the cpu_model instead, I am able to make Nova start - so it does indeed seem to specifically be that single feature which is blocking me. However, editing the libvirt CPU mapping files is probably not the right way to fix this - hence why I'm filing this bug, for discussion of how to support cases like this. Currently the only "proper" way I'm aware of to work around this right now is to fall back to a Broadwell-based configuration which lacks the "mpx" feature to use as a common baseline, but that's a much older configuration than Cascade Lake and would mean missing out on all the other features which are common in both Cascade Lake and Ice Lake. I would rather if there were a way to use the Cascade Lake settings but simply remove that "mpx" feature from use. ---- Steps to reproduce ================== On an Ice Lake system lacking the MPX feature (e.g. /proc/cpuinfo reporting model of "Intel(R) Xeon(R) Gold 5318Y"), specify the following settings in nova.conf in libvirt settings: [libvirt] cpu_mode = custom cpu_model = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx Then try to start nova. Expected result =============== Nova should start since Cascadelake-Server-noTSX is a subset of Icelake-Server-noTSX, thus allowing the use of Cascadelake-Server-noTSX as a common baseline model for both Cascade Lake and Ice Lake servers. 
Actual result ============= Nova refuses to start, claiming the specified CPU model is incompatible. The "cpu_model_extra_flags = -mpx" config option does not help. Environment =========== Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal. Specifically, nova packages are at version 2:21.2.4-0ubuntu2. Hypervisor: libvirt + KVM Other relevant notes ==================== There are some other open related bugs. The removal of the MPX feature in some Ice Lake processors has manifested in other ways as well. These bugs are primarily in regards to the missing MPX feature breaking how Ice Lake processors are detected, so the nuance is somewhat different - however, they may be worth reviewing as well. * https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding the Icelake CPU maps in libvirt not working to detect certain Ice Lakes, instead detecting them as Broadwell-noTSX-IBRS according to "virsh capabilities" due to lacking the MPX feature. (I've personally tested that removing the mpx feature from the associated CPU mapping files allows for detecting as Ice Lake, but that's not the correct way to fix this.) There is also an interesting comment on this bug at https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It basically implies that rather than looking at "virsh capabilities", "virsh domcapabilities" should be used instead as it seems to more correctly identify the CPU model even if there are disabled flags like MPX. * https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064: Launchpad-side bug regarding the above issue as encountered in Ubuntu. =============== SRU Description =============== [Impact] When using IceLake CPUs alongside CascadeLake CPUs, the Nova code does not start due to comparing CPU models. It fails before even comparing the flags. Unfortunately, IceLake CPUs are detected as having compatibility with Broadwell, not CascadeLake. Using Broadwell as a common denominator disables many modern features. 
The Libvirt upstream team will not add specific support for IceLake [1]. The fix [2] in Nova is to ignore the CPU check (as a configurable workaround) and let libvirt handle the added/removed flags, which is assumed to work for this specific case. [Test case] Due to not having Icelake and Cascadelake CPUs in our usual lab for testing of this specific scenario, the test case for this could be either: 1) run the charmed-openstack-tester [1] against the environment containing the upgraded package (essentially as it would be in a point release SRU) and expect the test to pass. Test run evidence will be attached to LP. 2) manually deploy nova and the necessary openstack services to get Nova to the code point of validating the issue in a single node. I already achieved this and was able to test the fix by hacking the node code to bypass the need for other services (conductor, keystone, mysql, etc), but for a proper validation a clean installation (without any hackery) is considered mandatory. In that case, the test case would be: a) Deploy nova and required services on an IceLake machine b) Make sure the nova.conf has: cpu_mode = custom cpu_models = Cascadelake-Server-noTSX cpu_model_extra_flags = -mpx c) Check /var/log/nova/nova-compute.log for a successful nova-compute service boot. Without the fix it will not start properly and will present this error: 2024-07-08 15:08:48.378 8399 CRITICAL nova [-] Unhandled error: nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility. 
d) Install the package containing the fix and confirm that the nova-compute service restarts successfully, with the log no longer containing the error and containing this instead: 2024-07-09 19:41:31.806 243487 DEBUG nova.virt.libvirt.driver [-] cpu compare xml: <cpu match="exact"> <model>Cascadelake-Server-noTSX</model> <feature name="mpx" policy="disable"/> </cpu> [Regression Potential] There is 1 new behavior introduced and 1 changed. The behavior introduced is gated by a new config option that needs to be enabled, and when enabled, it skips the CPU check. The behavior changed is the one assumed by the default disabled value of the config option. The code being backported to Yoga through Ussuri is exactly the same as what is currently in master (Caracal+), and no issues have been found with it across 4 releases, which gives some confidence that the change is unlikely to cause issues. [Other Info] [1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064 [2] https://review.opendev.org/c/openstack/nova/+/871969
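For illustration only, the "cpu compare xml" quoted in step d) can be approximated with the Python standard library. This is a minimal sketch and not Nova's actual code: the helper name build_cpu_compare_xml is hypothetical, and the mapping of a leading "-" to policy="disable" (with anything else treated as policy="require") is an assumption based on the XML shown in the log above.

```python
import xml.etree.ElementTree as ET


def build_cpu_compare_xml(model, extra_flags):
    """Build a libvirt-style <cpu> element for a model plus flag overrides.

    Hypothetical helper for illustration; not part of Nova.
    """
    cpu = ET.Element("cpu", match="exact")
    ET.SubElement(cpu, "model").text = model
    for raw in extra_flags:
        flag = raw.strip().lower()
        if flag.startswith("-"):
            # Assumption: "-flag" means the feature must be disabled.
            name, policy = flag[1:], "disable"
        else:
            # Assumption: "+flag" or a bare "flag" means it is required.
            name, policy = flag.lstrip("+"), "require"
        ET.SubElement(cpu, "feature", name=name, policy=policy)
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    # Mirrors the nova.conf values from the test case above.
    print(build_cpu_compare_xml("Cascadelake-Server-noTSX", ["-mpx"]))
```

With the values from the test case, the resulting element carries the Cascadelake-Server-noTSX model and a single mpx feature with policy "disable", which is the shape of the XML that appears in the DEBUG log line once the fix is installed.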
2024-07-09 19:57:58 Rodrigo Barbieri nova/victoria: status New Won't Fix
2024-07-09 19:58:14 Rodrigo Barbieri nova/wallaby: status New Won't Fix
2024-07-09 19:58:27 Rodrigo Barbieri nova/xena: status New Won't Fix
2024-07-10 13:48:10 Rodrigo Barbieri attachment added lp1999814_jammy.debdiff https://bugs.launchpad.net/nova/+bug/1999814/+attachment/5795868/+files/lp1999814_jammy.debdiff
2024-07-10 15:17:54 Rodrigo Barbieri attachment added lp1999814_focal.debdiff https://bugs.launchpad.net/nova/+bug/1999814/+attachment/5795887/+files/lp1999814_focal.debdiff
2024-07-11 16:28:30 Mauricio Faria de Oliveira nova (Ubuntu Jammy): status New In Progress
2024-07-11 22:11:19 Andreas Hasenack nova (Ubuntu Jammy): status In Progress Fix Committed
2024-07-11 22:11:24 Andreas Hasenack bug added subscriber Ubuntu Stable Release Updates Team
2024-07-11 22:11:30 Andreas Hasenack bug added subscriber SRU Verification
2024-07-11 22:11:40 Andreas Hasenack tags sts sts-sru-needed sts sts-sru-needed verification-needed verification-needed-jammy
2024-07-12 12:25:29 Mauricio Faria de Oliveira nova (Ubuntu Jammy): importance Undecided Medium
2024-07-12 12:25:29 Mauricio Faria de Oliveira nova (Ubuntu Jammy): assignee Rodrigo Barbieri (rodrigo-barbieri2010)
2024-07-12 13:28:47 Mauricio Faria de Oliveira nova (Ubuntu Focal): importance Undecided Medium
2024-07-12 13:28:47 Mauricio Faria de Oliveira nova (Ubuntu Focal): status New In Progress
2024-07-12 13:28:47 Mauricio Faria de Oliveira nova (Ubuntu Focal): assignee Rodrigo Barbieri (rodrigo-barbieri2010)