nova fails to re-create mediated devices after reboot

Bug #1977933 reported by James Page
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Nova Compute NVIDIA vGPU Plugin Charm | New | Undecided | Unassigned |
nova (Ubuntu) | Confirmed | Undecided | Unassigned |

Bug Description

OpenStack Xena
Ubuntu 20.04

After a nova-compute node is rebooted while it hosts running instances with attached vGPU devices, the nova-compute daemon fails to start because the mediated device definitions are missing.

It looks like the code intends to detect the missing devices and then re-create them, but the libvirt Python module raises an exception for the missing mediated device while the domain definition is being inspected.

2022-06-08 07:24:27.061 2689 ERROR oslo_service.service [-] Error starting thread.: libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_9a95927e_f50a_4e34_84fc_3b27508f4241'
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service Traceback (most recent call last):
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 806, in run_service
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service service.start()
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/service.py", line 159, in start
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service self.manager.init_host()
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1416, in init_host
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service self.driver.init_host(host=self.host)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 800, in init_host
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service self._recreate_assigned_mediated_devices()
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 980, in _recreate_assigned_mediated_devices
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service dev_info = self._get_mediated_device_information(dev_name)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7761, in _get_mediated_device_information
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service virtdev = self._host.device_lookup_by_name(devname)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/host.py", line 1216, in device_lookup_by_name
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service return self.get_connection().nodeDeviceLookupByName(name)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service result = proxy_call(self._autowrap, f, *args, **kwargs)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service rv = execute(f, *args, **kwargs)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service six.reraise(c, e, tb)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service raise value
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service rv = meth(*args, **kwargs)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/libvirt.py", line 4612, in nodeDeviceLookupByName
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service if ret is None:raise libvirtError('virNodeDeviceLookupByName() failed', conn=self)
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service libvirt.libvirtError: Node device not found: no node device with matching name 'mdev_9a95927e_f50a_4e34_84fc_3b27508f4241'
2022-06-08 07:24:27.061 2689 ERROR oslo_service.service
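
The traceback shows the node device lookup raising before any re-creation is attempted. A minimal sketch of the tolerant behaviour the description expects, not nova's actual code or fix: a failed lookup is treated as "this device needs re-creating" rather than as a fatal error. NodeDeviceNotFound, mdev_name_to_uuid, find_missing_mdevs and fake_lookup are hypothetical names used only for illustration. With the real bindings, the equivalent check would be catching libvirt.libvirtError and comparing get_error_code() against libvirt.VIR_ERR_NO_NODE_DEVICE.

```python
class NodeDeviceNotFound(Exception):
    """Hypothetical stand-in for a libvirtError meaning 'no such node device'."""


def mdev_name_to_uuid(name):
    """Convert a name like 'mdev_9a95927e_f50a_...' to its dashed UUID form."""
    prefix = "mdev_"
    if not name.startswith(prefix):
        raise ValueError(f"not a mediated device name: {name!r}")
    return name[len(prefix):].replace("_", "-")


def find_missing_mdevs(assigned_names, lookup):
    """Return UUIDs of assigned mediated devices that the host no longer has.

    `lookup` is any callable that raises NodeDeviceNotFound for an unknown
    device name (an assumption made for this sketch).
    """
    missing = []
    for name in assigned_names:
        try:
            lookup(name)
        except NodeDeviceNotFound:
            # Do not abort start-up; record the device for re-creation.
            missing.append(mdev_name_to_uuid(name))
    return missing


def fake_lookup(name):
    """Simulate a host that lost all mediated devices across a reboot."""
    raise NodeDeviceNotFound(name)


if __name__ == "__main__":
    names = ["mdev_9a95927e_f50a_4e34_84fc_3b27508f4241"]
    print(find_missing_mdevs(names, fake_lookup))
```

The caller would then re-create each returned UUID on a suitable parent device; how that is done is outside the scope of this sketch.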

James Page (james-page)
summary: - mediated devices missing after reboot
+ nova fails to re-create mediated devices after reboot
description: updated
James Page (james-page) wrote :

This is probably related to the libvirt version in use (6.0.0 from Focal).

libvirt 7.3 and later have features to support persistence of mediated devices between reboots.

Billy Olsen (billy-olsen) wrote :
James Page (james-page) wrote :

Tested with libvirt 8.0.0 from Yoga, but the same behaviour was observed.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed