Nova/Placement creating x86 trait for ARM Compute node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | New | Undecided | Unassigned |
Bug Description
Description
===========
I have a 2023.2 based deployment with both x86 and aarch64 based compute nodes. For the arm node, placement shows an x86 HW trait, so scheduling arm-architecture images onto it fails. It also causes the scheduler to try to place x86 images onto it, which will fail as well.
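For background: when the scheduler's image-metadata prefilter is enabled, nova translates the image's hw_architecture property into a required placement trait, so a provider that never reports HW_ARCH_AARCH64 can never match an aarch64 image. A minimal sketch of that translation, assuming the os-traits HW_ARCH_&lt;ARCH&gt; naming (an illustration, not nova's actual code):

```python
# Sketch of an image-property-to-trait prefilter, assuming a simple
# "hw_architecture" -> "HW_ARCH_<ARCH>" naming rule as used by os-traits
# (HW_ARCH_X86_64, HW_ARCH_AARCH64, ...). Illustration only; not nova's
# actual implementation.

def required_traits_for_image(image_props: dict) -> set[str]:
    """Translate image properties into required placement traits."""
    traits = set()
    arch = image_props.get("hw_architecture")
    if arch:
        traits.add("HW_ARCH_" + arch.upper())
    return traits

# An aarch64 image requires HW_ARCH_AARCH64; a provider whose trait list
# only carries x86 traits (as in this bug) is filtered out.
print(required_traits_for_image({"hw_architecture": "aarch64"}))
```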
Steps to reproduce
==================
1. Deploy a new 2023.2 cloud with Kolla-Ansible.
2. Add hw_architecture
3. Ensure that image_metadata_
4. Try to deploy an instance with that image; it fails with "No valid host was found".
5. Observe the following in the placement-api logs:
placement-
placement-
Resource providers:
openstack resource provider list
(output truncated in the original report: 21 resource providers were listed, among them 57a098bf-…, the arm compute node discussed below)
Traits showing for the arm node (notice no HW_ARCH_AARCH64):
openstack resource provider trait list 57a098bf-
+------
| name |
+------
| COMPUTE_RESCUE_BFV |
| HW_CPU_X86_AESNI |
| COMPUTE_NODE |
+------
(all other rows were truncated in the original report; each began with COMPUTE_)
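Given that trait list, the scheduling failure follows directly: placement only returns providers whose trait set is a superset of the required traits. A toy illustration with made-up provider data mirroring this bug:

```python
# Toy illustration of placement's required-traits matching, using
# hypothetical provider data that mirrors this bug: the arm node carries
# an x86 trait and lacks HW_ARCH_AARCH64.

providers = {
    "arm-node": {"COMPUTE_NODE", "HW_CPU_X86_AESNI"},
    "x86-node": {"COMPUTE_NODE", "HW_ARCH_X86_64"},
}

def matching_providers(required: set[str]) -> list[str]:
    """Return providers whose traits include every required trait."""
    return [name for name, traits in providers.items()
            if required <= traits]

# Requiring HW_ARCH_AARCH64 matches nothing -> "No valid host was found".
print(matching_providers({"HW_ARCH_AARCH64"}))  # []
```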
Confirmation that it is an arm based system:
root@infra-
Linux infra-prod-
On startup of the nova-compute service on this node, the libvirt capabilities output confirms as much:
2024-04-18 21:47:43.978 7 INFO nova.service [-] Starting compute node (version 28.0.2)
2024-04-18 21:47:44.000 7 INFO nova.virt.node [None req-58e563b8-
2024-04-18 21:47:44.021 7 INFO nova.virt.
2024-04-18 21:47:44.460 7 INFO nova.virt.
<host>
<uuid>
<cpu>
(cpu element contents truncated in the original report)
<topology sockets='1' dies='1' cores='128' threads='1'/>
nova.conf for the nova-compute service on that node:
[DEFAULT]
debug = False
log_dir = /var/log/kolla/nova
state_path = /var/lib/nova
allow_resize_
compute_driver = libvirt.
my_ip = <ip>
transport_url = rabbit://<url>
default_
[conductor]
workers = 5
[vnc]
novncproxy_host = <ip>
novncproxy_port = 6080
server_listen = <ip>
server_
novncproxy_base_url = https:/
[serial_console]
enabled = true
base_url = wss://example.
serialproxy_host = <ip>
serialproxy_port = 6083
proxyclient_address = <ip>
[oslo_concurrency]
lock_path = /var/lib/nova/tmp
[glance]
debug = False
api_servers = http://<ip>:9292
cafile =
num_retries = 3
[cinder]
catalog_info = volumev3:
os_region_name = RegionOne
auth_url = http://<ip>:5000
auth_type = password
project_domain_name = Default
user_domain_id = default
project_name = service
username = cinder
password = <pw>
cafile =
[neutron]
metadata_
service_
auth_url = http://<ip>:5000
auth_type = password
cafile =
project_domain_name = Default
user_domain_id = default
project_name = service
username = neutron
password = <pw>
region_name = Westford
valid_interfaces = RegionOne
[libvirt]
connection_uri = qemu+tcp:
live_migration_
images_type = rbd
images_rbd_pool = vms
images_
rbd_user = cinder
disk_cachemodes = network=writeback
hw_disk_discard = unmap
rbd_secret_uuid = 48d56060-
virt_type = kvm
cpu_mode = host-passthrough
num_pcie_ports = 16
[workarounds]
skip_cpu_
[upgrade_levels]
compute = auto
[oslo_messaging
transport_url = rabbit://<url>
driver = messagingv2
topics = notifications_
[oslo_messaging
heartbeat_
amqp_durable_queues = true
[privsep_
helper_command = sudo nova-rootwrap /etc/nova/
[guestfs]
debug = False
[placement]
auth_type = password
auth_url = http://<ip>:5000
username = placement
password = <pw>
user_domain_name = Default
project_name = service
project_domain_name = Default
region_name = RegionOne
cafile =
valid_interfaces = internal
[notifications]
notify_
[barbican]
auth_endpoint = http://<ip>:5000
barbican_
verify_ssl_path =
[service_user]
send_service_
auth_url = http://<ip>:5000
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = nova
password = <pw>
cafile =
region_name = RegionOne
valid_interfaces = internal
[scheduler]
image_metadata_
I have tried to run openstack resource provider trait delete 57a098bf-
summary changed:
- Placement creating x86 trait for ARM Compute node
+ Nova/Placement creating x86 trait for ARM Compute node
I read the series of changes that implemented the architecture-selection feature, but none of them makes nova-compute report the available HW_ARCH trait. So, if I understand correctly, you have to add the trait manually to tag compute nodes that support a specific CPU architecture.
https://review.opendev.org/q/topic:%22bp/pick-guest-arch-based-on-host-arch-in-libvirt-driver%22
The HW_CPU_X86_AESNI trait is added because the libvirt driver detects the `aes` CPU feature flag in the result of the domain capabilities API. Ideally, the trait reported for that flag should match the host's actual CPU architecture.
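A sketch of what arch-gated feature-trait reporting could look like. The trait names HW_CPU_X86_AESNI and HW_CPU_AARCH64_AES exist in os-traits; the gating helper itself is hypothetical, sketching the suggestion above rather than nova's code:

```python
# Hypothetical arch-aware CPU-feature trait mapping: only report the x86
# AES-NI trait when the host really is x86, and the aarch64 equivalent
# otherwise. Trait names follow os-traits; the gating logic is a sketch.

X86_ARCHES = {"x86_64", "i686"}

def cpu_feature_traits(host_arch: str, features: set[str]) -> set[str]:
    """Map detected CPU feature flags to architecture-appropriate traits."""
    traits = set()
    if "aes" in features:
        if host_arch in X86_ARCHES:
            traits.add("HW_CPU_X86_AESNI")
        elif host_arch == "aarch64":
            traits.add("HW_CPU_AARCH64_AES")
    return traits

# An aarch64 host with the aes flag would no longer get an x86 trait.
print(cpu_feature_traits("aarch64", {"aes"}))  # {'HW_CPU_AARCH64_AES'}
```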