Comment 3 for bug 1981631

Dylan McCulloch (dylan-mcculloch) wrote:

I think I've run into something similar with A100 GPUs on Victoria. We're running libvirt 6.0.0 so I don't think it's due to a regression but I may be wrong.
In our case I think it's due to the way the A100s' inventory is presented to placement. There appears to be a mismatch between the host's actual vGPU capacity when certain vGPU types are enabled and the number of resource providers created in placement, one per GPU PCI Bus:Device.Function (BDF) address.
e.g. We have a host with 2 x A100s. If we configure nova to enable the nvidia-471 (A100-10C) vGPU type, we can use 4 vGPUs per physical card (i.e. we can launch a total of 8 instances with VGPU=1 on that host).
The problem is that there are 32 GPU PCI BDF addresses on the host (16 per card, since the A100's vGPUs are backed by SR-IOV VFs), and a resource provider with a VGPU=1 inventory is created in placement for each BDF.

So, placement thinks there are 32 VGPUs available but the enabled nvidia vgpu type only allows 8.
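For reference, this is roughly how I've been checking what the driver itself advertises per BDF (just a sketch, assuming the standard sysfs layout under /sys/class/mdev_bus and the nvidia-471 type from the example above). Each VF advertises the type independently, which as far as I can tell is what nova turns into a separate VGPU=1 resource provider:

import glob

VGPU_TYPE = "nvidia-471"  # the enabled A100-10C profile in our case

for type_dir in sorted(glob.glob(
        f"/sys/class/mdev_bus/*/mdev_supported_types/{VGPU_TYPE}")):
    bdf = type_dir.split("/")[4]  # e.g. 0000:3b:00.4
    with open(f"{type_dir}/available_instances") as f:
        print(f"{bdf}: available_instances={f.read().strip()}")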
When an instance is spawned on the host, an mdev is created for a specific BDF. So we launch 8 instances and 8 mdevs are created, each corresponding to a different PCI address. Launching a 9th instance will pass placement and get scheduled, but will fail to spawn due to lack of vGPU capacity on the host.
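To see which BDFs are actually consumed, I list the existing mdevs and the parent device each one was created on, roughly like this (again just a sketch assuming the usual sysfs layout; each entry under /sys/bus/mdev/devices is a symlink into its parent PCI device's directory):

import glob
import os

for dev in sorted(glob.glob("/sys/bus/mdev/devices/*")):
    uuid = os.path.basename(dev)
    # the symlink target lives under the parent PCI device, so that
    # directory's name is the BDF the mdev was created on
    parent_bdf = os.path.basename(os.path.dirname(os.path.realpath(dev)))
    with open(os.path.join(dev, "mdev_type", "name")) as f:
        mdev_type = f.read().strip()
    print(f"{uuid} -> parent {parent_bdf} ({mdev_type})")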
After deleting one or more instances and booting a new one, placement will allocate one of the 32 resource providers, and the spawn only succeeds if that resource provider corresponds to the BDF of an existing, available mdev.
To work around this I've set a custom trait on 8 of the 32 resource providers (those corresponding to 4 BDF addresses on each of the two cards in the host) and updated the relevant flavors to require that trait.
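Roughly, the workaround looks like the sketch below (CUSTOM_A100_10C, the flavor name and the provider UUIDs are placeholders for illustration; it assumes the osc-placement plugin is installed, and note that "resource provider trait set" replaces whatever traits are already on the provider):

import subprocess

TRAIT = "CUSTOM_A100_10C"   # hypothetical custom trait name
FLAVOR = "vgpu-a100-10c"    # hypothetical flavor name
RP_UUIDS = [
    # the 8 resource provider UUIDs matching the chosen BDFs go here
]

# register the custom trait in placement
subprocess.run(
    ["openstack", "--os-placement-api-version", "1.6",
     "trait", "create", TRAIT], check=True)

# tag each chosen resource provider with the trait
for rp in RP_UUIDS:
    subprocess.run(
        ["openstack", "--os-placement-api-version", "1.6",
         "resource", "provider", "trait", "set", "--trait", TRAIT, rp],
        check=True)

# make the flavor require the trait so the scheduler only picks those RPs
subprocess.run(
    ["openstack", "flavor", "set",
     "--property", f"trait:{TRAIT}=required", FLAVOR],
    check=True)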