Inventories of SR-IOV GPU VFs are impacted by allocations for other VFs

Bug #2041519 reported by Sylvain Bauza
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: High
Assigned to: Unassigned

Bug Description

It is hard to summarize this problem in a bug report title, my bad.

Long story short, the problem arises if you use NVIDIA's next-gen SR-IOV GPUs like the A100, which expose Virtual Functions (VFs) on the host; each VF supports the same GPU types but only allows a single mediated device to be created (available_instances=1).
If you're using other GPUs (like the V100) and you're not running NVIDIA's sriov-manage script to expose VFs, never mind this bug, you won't be impacted.

So, say you have an A100 GPU card: before configuring Nova, you have to run the aforementioned sriov-manage script, which will allocate 16 Virtual Functions for the GPU. Each of those PCI addresses will correspond to a Placement resource provider (if you configure Nova accordingly) with a VGPU inventory of total=1.
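For reference, the Nova configuration I mean looks roughly like this (the PCI addresses below are placeholders, you would list the 16 VF addresses created by sriov-manage):

[devices]
enabled_mdev_types = nvidia-472

[mdev_nvidia-472]
# Placeholder addresses: list the 16 VF PCI addresses created by sriov-manage.
device_addresses = 0000:41:00.4,0000:41:00.5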

Example:
https://paste.opendev.org/show/bVxrVLW3yOR3TPV2Lz3A/

Sysfs shows the exact same thing for the nvidia-472 type I configured:
[stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1

Now, the problem arises when you exhaust the number of mediated devices you can create.
In the case of nvidia-472, which corresponds to NVIDIA's GRID A100-20C profile, you can create at most 2 vGPUs, i.e. mediated devices, per physical GPU.
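To make that capacity sharing concrete, here is a toy Python model of how I understand available_instances behaves for those VFs (this is just my mental model, not NVIDIA's driver code): all 16 VFs draw from a single pool of 2 creatable mediated devices on the physical GPU.

class PhysicalGpu:
    """Toy model: 16 VFs sharing a pool of 2 creatable mdevs (nvidia-472)."""

    def __init__(self, max_mdevs=2, num_vfs=16):
        self.max_mdevs = max_mdevs
        self.vf_has_mdev = [False] * num_vfs

    def available_instances(self, vf):
        # A VF that already hosts an mdev reports 0; an idle VF reports 1
        # only while the physical GPU still has spare capacity.
        if self.vf_has_mdev[vf]:
            return 0
        return 1 if sum(self.vf_has_mdev) < self.max_mdevs else 0

    def create_mdev(self, vf):
        assert self.available_instances(vf) == 1
        self.vf_has_mdev[vf] = True


gpu = PhysicalGpu()
gpu.create_mdev(12)   # first instance (vm1): only that VF drops to 0
print([gpu.available_instances(i) for i in range(16)])
gpu.create_mdev(0)    # second instance (vm2): the pool is exhausted, every VF reads 0
print([gpu.available_instances(i) for i in range(16)])

That is exactly the pattern the sysfs dumps below show.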

Accordingly, when Nova automatically creates the 2 mediated devices while booting instances (since *no* existing mediated devices are found available), *all other* VFs that don't hold those 2 mediated devices end up with an available_instances value of 0:

[stack@lenovo-sr655-01 nova]$ openstack server create --image cirros-0.6.2-x86_64-disk --flavor c1g --key-name mykey --network public vm1
(skipped)
[stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
1
1
1
1
1
1
1
1
1
1
1
1
0
1
1
1
[stack@lenovo-sr655-01 nova]$ openstack server create --image cirros-0.6.2-x86_64-disk --flavor c1g --key-name mykey --network public vm2
(skipped)
[stack@lenovo-sr655-01 ~]$ cat /sys/class/mdev_bus/*/mdev_supported_types/nvidia-472/available_instances
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

Now, when we look at the inventories for all the VFs: while it's normal to see 2 resource providers with total=1 and usage=1 (since we created mdevs on them, they are counted), it is *not* normal to see the *other VFs* still having a total of 1 with a usage of 0.

[stack@lenovo-sr655-01 nova]$ for uuid in $(openstack resource provider list -f value -c uuid); do openstack resource provider inventory list $uuid -f value -c resource_class -c total -c used; done | grep VGPU
VGPU 1 1
VGPU 1 1
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0
VGPU 1 0

I eventually dug into the code and found the culprit:

https://github.com/openstack/nova/blob/9c9cd3d9b6d1d1e6f62012cd8a86fd588fb74dc2/nova/virt/libvirt/driver.py#L9110-L9111

Before this method is called, we correctly calculate the numbers we get from libvirt, and all the unused VFs end up with total=0; but because we enter this conditional, we skip updating them in Placement.
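Paraphrasing the logic (this is not the literal driver code, the names below are simplified), the problematic pattern is:

def _update_vgpu_inventories(provider_tree, inventories_per_vf):
    for vf_address, inv in inventories_per_vf.items():
        if inv['total'] == 0:
            # The freshly computed total for this VF is 0, so we skip it...
            # but if the VF's resource provider was already created earlier
            # with total=1, Placement never hears about the new value and
            # keeps advertising capacity that can no longer be used.
            continue
        provider_tree.update_inventory(vf_address, {'VGPU': inv})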

There are different ways to solve this problem:
 - we stop automatically creating mediated devices and ask operators to pre-allocate all mediated devices before starting nova-compute, but that has a big operator impact (and they need to add some tooling)
 - we blindly remove the RP from the provider tree and let the update_resource_providers() call in the compute manager push this new view to Placement (a rough sketch follows this list). In that very particular case, we're sure that none of the RPs that have total=0 have allocations against them, so it shouldn't fail, but this logic can be error-prone if we try to reproduce it elsewhere.
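A very rough sketch of what the second option could look like (names are illustrative, and I'm assuming the provider tree's exists()/remove() helpers behave as expected; this is not a tested patch):

def _drop_unusable_vf_providers(provider_tree, inventories_per_vf):
    for vf_address, inv in inventories_per_vf.items():
        if inv['total'] == 0 and provider_tree.exists(vf_address):
            # Safe in this specific case: a VF whose total dropped to 0
            # cannot have allocations against it, so removing the provider
            # (and letting the next Placement sync delete it) can't orphan
            # an existing instance.
            provider_tree.remove(vf_address)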

Tags: vgpu
Revision history for this message
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/899625

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → High