Comment 1 for bug 1836204

Eric Fried (efried) wrote :

This is of high importance not because the race is particularly likely in current code, but because we need to establish the framework to fix it, so that we can reuse that framework for other similar types of hardware.

In general, the fix is to claim (earmark for use by a specific instance) specific hardware artifacts [1] on the compute node in instance_claim, which is under COMPUTE_RESOURCE_SEMAPHORE. But only the virt driver can know what needs to be done to effect that claim for its specific hypervisor. And today instance_claim doesn't talk to the virt driver at all.

So the solution discussed in IRC [2] is to establish a new ComputeDriver interface, working title claim_for_instance() (and possibly a corresponding unclaim_for_instance() for rollbacks), which will be invoked from instance_claim (and _move_claim).
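
A rough sketch of how that hook might be wired in. This is not actual Nova code: a plain threading.Lock stands in for COMPUTE_RESOURCE_SEMAPHORE, the method signatures (an instance UUID plus an allocations dict) are placeholders, and the ResourceTracker here is a toy with a hypothetical _record_usage() helper.

```python
import threading

# Stand-in for COMPUTE_RESOURCE_SEMAPHORE (assumption: a plain lock).
COMPUTE_RESOURCE_LOCK = threading.Lock()


class ComputeDriver:
    """Hypothetical base-class hooks; the names are working titles."""

    def claim_for_instance(self, instance_uuid, allocations):
        # Default: nothing hypervisor-specific to earmark.
        pass

    def unclaim_for_instance(self, instance_uuid):
        # Default: nothing to roll back.
        pass


class ResourceTracker:
    """Toy tracker showing only where the driver hook would be called."""

    def __init__(self, driver):
        self.driver = driver
        self.claims = {}

    def instance_claim(self, instance_uuid, allocations):
        # The driver earmarks hardware while the lock is held, so two
        # concurrent claims cannot pick the same device.
        with COMPUTE_RESOURCE_LOCK:
            self.driver.claim_for_instance(instance_uuid, allocations)
            try:
                self._record_usage(instance_uuid, allocations)
            except Exception:
                # Roll back the driver-side claim if the rest fails.
                self.driver.unclaim_for_instance(instance_uuid)
                raise

    def _record_usage(self, instance_uuid, allocations):
        # Hypothetical helper standing in for the tracker's bookkeeping.
        self.claims[instance_uuid] = allocations
```

The same call would presumably be made from _move_claim; that path is left out here to keep the sketch short.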

Using VGPUs-in-libvirt as an example, claim_for_instance would use an in-memory dict to associate a specific mdev with the specific instance for each VGPU in the allocation. This mapping could then be deleted during spawn, since the information can subsequently be gleaned from the domain XML.
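
To make the VGPU case concrete, here is a minimal sketch of the in-memory mapping. Everything in it is illustrative: FakeLibvirtVGPUDriver, the vgpu_count argument, and the mdev names are made up, spawn() only returns the mdevs instead of building domain XML, and the caller is assumed to already hold COMPUTE_RESOURCE_SEMAPHORE.

```python
class FakeLibvirtVGPUDriver:
    """Hypothetical sketch; not the real libvirt driver."""

    def __init__(self, available_mdevs):
        self._free_mdevs = set(available_mdevs)
        # instance UUID -> list of mdev UUIDs earmarked for it
        self._claimed_mdevs = {}

    def claim_for_instance(self, instance_uuid, vgpu_count):
        # Assumes the caller holds the compute resource semaphore.
        if vgpu_count > len(self._free_mdevs):
            raise RuntimeError("not enough free mdevs")
        # Sorted only so the choice is deterministic in this sketch.
        chosen = sorted(self._free_mdevs)[:vgpu_count]
        self._free_mdevs.difference_update(chosen)
        self._claimed_mdevs[instance_uuid] = chosen
        return chosen

    def unclaim_for_instance(self, instance_uuid):
        # Rollback: return the earmarked mdevs to the free pool.
        released = self._claimed_mdevs.pop(instance_uuid, [])
        self._free_mdevs.update(released)

    def spawn(self, instance_uuid):
        # Once the mdevs are written into the domain XML, the in-memory
        # mapping is no longer needed, so drop it here.
        return self._claimed_mdevs.pop(instance_uuid)


driver = FakeLibvirtVGPUDriver(["mdev-a", "mdev-b", "mdev-c"])
first = driver.claim_for_instance("inst-1", 2)   # earmarks two mdevs
second = driver.claim_for_instance("inst-2", 1)  # gets the remaining one
```

Because each claim removes its mdevs from the free pool before returning, a second instance claimed before the first has spawned cannot be handed the same device.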

[1] where "hardware" encompasses things like VFs - don't get pedantic on me
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-07-11.log.html#t2019-07-11T12:39:18