Intel mediated device info doesn't provide a name attribute

Bug #1896741 reported by Sylvain Bauza on 2020-09-23
This bug affects 1 person
Affects                   Importance  Assigned to
OpenStack Compute (nova)  Low         Sylvain Bauza
Train                     Low         Sylvain Bauza
Ussuri                    Low         Sylvain Bauza
Victoria                  Low         Sylvain Bauza

Bug Description

While testing a Xeon server for virtual GPU support, I saw that Nova raises an exception because the i915 driver doesn't provide a name for mdev types:

Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager Traceback (most recent call last):
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 9824, in _update_available_resource_for_node
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager startup=startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 896, in update_available_resource
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return f(*args, **kwargs)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 981, in _update_available_resource
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update(context, cn, startup=startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1233, in _update
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return attempt.get(self._wrap_exception)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2])
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager raise value
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1169, in _update_to_placement
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7857, in update_provider_tree
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager provider_tree, nodename, allocations=allocations)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8250, in _update_provider_tree_for_vgpu
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager inventories_dict = self._get_gpu_inventories()
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7028, in _get_gpu_inventories
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager count_per_dev = self._count_mdev_capable_devices(enabled_vgpu_types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6984, in _count_mdev_capable_devices
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager types=enabled_vgpu_types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7268, in _get_mdev_capable_devices
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager device = self._get_mdev_capabilities_for_dev(name, types)
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7253, in _get_mdev_capabilities_for_dev
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager 'name': cap['name'],
Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager KeyError: 'name'

For example, this mdev type directory has no name attribute:

[root@mymachine ~]# ll /sys/class/mdev_bus/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V5_8/
total 0
-r--r--r--. 1 root root 4096 Sep 22 14:18 available_instances
--w-------. 1 root root 4096 Sep 23 06:01 create
-r--r--r--. 1 root root 4096 Sep 23 05:43 description
-r--r--r--. 1 root root 4096 Sep 22 14:18 device_api
drwxr-xr-x. 2 root root 0 Sep 23 06:01 devices
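
The listing above can be reproduced in a small sketch that reads a type directory and tolerates a missing attribute file. This is only an illustration, not Nova code: Nova gets this data through libvirt rather than by reading sysfs directly, the helper name read_mdev_type is hypothetical, and the file contents written in the demo are placeholders.

```python
import os
import tempfile


def read_mdev_type(type_dir):
    """Read the attribute files of one mdev type directory.

    Any attribute file that is absent (for instance the optional
    "name") maps to None instead of raising an error.
    """
    attrs = ("name", "description", "device_api", "available_instances")
    info = {"type": os.path.basename(os.path.normpath(type_dir))}
    for attr in attrs:
        try:
            with open(os.path.join(type_dir, attr)) as f:
                info[attr] = f.read().strip()
        except FileNotFoundError:
            info[attr] = None
    return info


# Demo on a throwaway directory that imitates the listing above:
# no "name" file is created. The file contents are placeholders.
with tempfile.TemporaryDirectory() as root:
    demo_dir = os.path.join(root, "i915-GVTg_V5_8")
    os.makedirs(demo_dir)
    for attr, value in (("description", "placeholder"),
                        ("device_api", "placeholder"),
                        ("available_instances", "1")):
        with open(os.path.join(demo_dir, attr), "w") as f:
            f.write(value + "\n")
    demo_info = read_mdev_type(demo_dir)

print(demo_info["name"])
```

With no name file present, the name entry comes back as None and the other attributes are still read.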

The kernel driver API documentation (https://www.kernel.org/doc/html/latest/driver-api/vfio-mediated-device.html) says that the "name" attribute is optional:

"name

This attribute should show human readable name. This is optional attribute."

The fix should be easy, since we don't use this attribute in Nova.
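
A minimal sketch of the kind of change this implies, assuming the capability is a plain dict as the traceback suggests (cap['name'] raising KeyError). The function name summarize_mdev_type and the 'type' key are hypothetical and only serve the illustration; the actual patch is the one linked in the review below.

```python
def summarize_mdev_type(cap):
    """Build a per-type summary from one mdev capability dict.

    cap['name'] raises KeyError for drivers such as i915 that omit
    the attribute; cap.get('name') returns None instead.
    """
    return {
        'type': cap['type'],      # hypothetical key, for illustration
        'name': cap.get('name'),  # optional attribute, may be None
    }


# One capability that carries a name, and one that does not.
with_name = summarize_mdev_type({'type': 'example-type', 'name': 'Example'})
without_name = summarize_mdev_type({'type': 'i915-GVTg_V5_8'})
```

Using dict.get keeps the key present in the result, so any later consumer sees None rather than a missing entry.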

Fix proposed to branch: master
Review: https://review.opendev.org/753574

Changed in nova:
status: Triaged → In Progress

Reviewed: https://review.opendev.org/753574
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=416cd1ab18180fc09b915f4517aca03651f01eea
Submitter: Zuul
Branch: master

commit 416cd1ab18180fc09b915f4517aca03651f01eea
Author: Sylvain Bauza <email address hidden>
Date: Wed Sep 23 12:52:52 2020 +0200

    libvirt: make mdev types name attribute be optional

    Some GPU drivers like i915 don't provide a name attribute for mdev types.
    As we don't use this attribute yet, let's just make sure we support the fact
    it's optional.

    Change-Id: Ia745ed7095c74e2bfba38379e623a3f81e7799eb
    Closes-Bug: #1896741

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.
