Comment 1 for bug 1924123

Balazs Gibizer (balazs-gibizer) wrote :

I confirm this based on looking at the nova code [1].

The root of the problem is that, before nova runs the scheduler to select the destination of the migration, it first moves the placement allocation from the instance_uuid consumer to the new migration_uuid consumer, so that the instance_uuid consumer can be used to allocate resources during scheduling.

Such a move is done with a POST /allocations placement API call that removes the allocation from one consumer and adds it to the other. During this, placement has no information that the intention is to move an existing allocation; it treats it as a brand new allocation and therefore checks whether there are enough resources for it. Normally this is not a problem, as we remove exactly the same resources from one consumer as we add to the other, so if the allocation fit into the inventory before the move, it will fit after the move.
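
Roughly, such a move request looks like the following sketch (the uuids, project/user ids and resource amounts are made up for illustration, and the consumer_generation handling required by newer placement microversions is omitted):

# Illustration only: the shape of a POST /allocations request that moves an
# allocation from the instance consumer to the migration consumer in one call.
instance_uuid = "11111111-1111-1111-1111-111111111111"    # made up
migration_uuid = "22222222-2222-2222-2222-222222222222"   # made up
compute_rp_uuid = "33333333-3333-3333-3333-333333333333"  # made up

move_payload = {
    # empty the allocations of the instance consumer ...
    instance_uuid: {
        "project_id": "example-project",
        "user_id": "example-user",
        "allocations": {},
    },
    # ... and re-create the same allocations under the migration consumer
    migration_uuid: {
        "project_id": "example-project",
        "user_id": "example-user",
        "allocations": {
            compute_rp_uuid: {
                "resources": {"VCPU": 2, "MEMORY_MB": 4096},
            },
        },
    },
}
# nova sends this as a single POST /allocations; placement validates the
# migration consumer's allocations against the current inventory as if they
# were brand new.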

However, when the DIMM broke, nova started reporting less memory inventory to Placement. Placement allows an inventory to decrease even if that leads to overallocation (i.e. consumption > inventory). However, Placement does not allow new allocations while there is overallocation.

This leads to a situation where an allocation that was allowed before the move is not allowed after the move.
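
For example, with made-up numbers (the real capacity check in placement also takes allocation_ratio and reserved into account, which I ignore here):

inventory_before_dimm_failure = 131072  # MEMORY_MB total reported by nova
instance_usage = 114688                 # MEMORY_MB already allocated

inventory_after_dimm_failure = 65536    # smaller total after the DIMM broke

# Placement accepts the shrunken inventory even though it creates
# overallocation:
assert instance_usage > inventory_after_dimm_failure

# But re-creating the same allocation under the migration consumer counts as
# a new allocation and fails the capacity check:
if instance_usage > inventory_after_dimm_failure:
    print("placement returns 409 Conflict: unable to allocate")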

Based on this I imagine the solution could be:

A) Make placement aware that the intention is to move an existing allocation between consumers, and allow such a move even in the case of overallocation

OR

B) Tweak the inventory reporting logic in nova. Maybe nova should not automatically decrease inventory in placement if doing so leads to overallocation.
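
A minimal sketch of the idea in B, assuming a hypothetical guard somewhere in nova's inventory reporting path (the helper name and its exact place in the code are made up):

def guard_inventory_decrease(new_total, current_total, current_usage):
    """Hypothetical guard for option B; this is not existing nova code.

    Refuse to shrink a resource class total below what is already
    allocated, so placement never ends up overallocated and can still
    accept the allocation move done at the start of a migration.
    """
    if new_total < current_usage:
        # Keep the previously reported total (and presumably log a warning
        # for the operator) instead of pushing an inventory that would make
        # every subsequent allocation on this provider fail.
        return current_total
    return new_total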

[1] https://github.com/openstack/nova/blob/a74bced0eb3232c0c3384e26db1838bbc26c2d82/nova/scheduler/client/report.py#L1968-L1985