[Victoria] nova-compute won't start on aarch64 - raises PciDeviceNotFoundById
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | sean mooney |
Victoria | Triaged | Medium | Unassigned |
Bug Description
Description
===========
When deploying OpenStack Victoria on Ubuntu 20.04 (Focal) on arm64/aarch64, nova-compute 22.0.1 fails to start with (nova-compute.log):
----------
Traceback (most recent call last):
File "/usr/lib/
dev_info = os.listdir(
FileNotFoundError: [Errno 2] No such file or directory: '/sys/bus/
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/
self.
File "/usr/lib/
resources = self.driver.
File "/usr/lib/
data[
File "/usr/lib/
pci_info = [self._
File "/usr/lib/
pci_info = [self._
File "/usr/lib/
device.
File "/usr/lib/
parent_ifname = pci_utils.
File "/usr/lib/
raise exception.
nova.exception.
----------
This results in an empty `openstack hypervisor list`.
This does not happen with OpenStack Ussuri (nova-compute 21.1.0), and we haven't seen it on other architectures so far. The code that raises this exception was introduced between Ussuri and Victoria [0], so the first release containing it is 22.0.0.
$ lspci | grep 0002:01:00.1
0002:01:00.1 Ethernet controller: Cavium, Inc. THUNDERX Network Interface Controller virtual function (rev 09)
Indeed /sys/bus/
A similar issue in the past [1] shows that this might be an issue specific to the Cavium Thunder X NIC.
Related issue: [2]
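The traceback above suggests that nova lists a sysfs directory for the device and turns the resulting FileNotFoundError into PciDeviceNotFoundById. A minimal sketch for checking by hand whether sysfs exposes a network interface for a given PCI address might look like the following. The function name `list_netdevs`, its parameters, and the `<address>/net` and `<address>/physfn/net` layout are assumptions based on the usual Linux sysfs layout, not taken from nova's code; the exact path in the log above is truncated.

```python
import os


def list_netdevs(pci_addr, sysfs_root="/sys/bus/pci/devices", use_physfn=False):
    """Return the interface names sysfs exposes for a PCI device.

    Hypothetical helper: returns an empty list instead of raising when
    the directory does not exist, which is the situation reported here.
    """
    parts = [sysfs_root, pci_addr]
    if use_physfn:
        # Assumed layout: a virtual function links to its parent
        # physical function through a "physfn" entry.
        parts.append("physfn")
    parts.append("net")
    net_dir = os.path.join(*parts)
    try:
        return sorted(os.listdir(net_dir))
    except FileNotFoundError:
        # No net directory: the device has no netdev visible here.
        return []
```

Calling `list_netdevs("0002:01:00.1")` and `list_netdevs("0002:01:00.1", use_physfn=True)` on the affected host would show which of the two directories, if any, is missing for the virtual function listed by `lspci`.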
Steps to reproduce
==================
Install and run nova >= 22.0.0 on an aarch64 machine (with a Cavium Thunder X NIC if possible). I personally use Juju [3] for deploying an entire OpenStack Victoria setup to a lab:
$ git clone https:/
$ cd openstack-
$ juju deploy ./bundle.yaml
Expected result
===============
`openstack hypervisor list` shows at least one hypervisor.
nova-compute.log doesn't contain nova.exception.
Actual result
=============
`openstack hypervisor list` doesn't show any hypervisor.
nova-compute.log contains nova.exception.
Environment
===========
$ dpkg -l | grep nova
ii nova-api-metadata 2:22.0.
ii nova-common 2:22.0.
ii nova-compute 2:22.0.
ii nova-compute-kvm 2:22.0.
ii nova-compute-
ii python3-nova 2:22.0.
ii python3-novaclient 2:17.2.
# cat /etc/nova/
[DEFAULT]
compute_
[libvirt]
virt_type=kvm
$ dpkg -l | grep libvirt
ii libvirt-clients 6.0.0-0ubuntu8.5 arm64 Programs for the libvirt library
ii libvirt-daemon 6.0.0-0ubuntu8.5 arm64 Virtualization daemon
ii libvirt-
ii libvirt-
ii libvirt-
ii libvirt-
ii libvirt0:arm64 6.0.0-0ubuntu8.5 arm64 library for interfacing with different virtualization systems
ii nova-compute-
ii python3-libvirt 6.1.0-1 arm64 libvirt Python 3 bindings
This shouldn't be relevant but:
* Ceph 15.2.7 for storage
* Neutron with OVN
Logs & Configs
==============
sosreport attached.
[0] https:/
[1] https:/
[2] https:/
[3] https:/
tags: | added: aarch64 pci |
tags: | added: compute |
Looking at the diff between stable/ussuri and stable/victoria, I found this patch, which looks pretty suspicious: https://review.opendev.org/c/openstack/nova/+/739131. Could you try reverting it locally in your environment to see whether that solves your problem? I have let the author of that patch know about this bug report.