If source compute node is overcommitted instances can't be migrated

Bug #1924123 reported by Belmiro Moreira
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

I'm facing an issue similar to https://bugs.launchpad.net/nova/+bug/1918419
but somewhat different, which is why I'm opening a new bug.

I'm giving some context to this bug to better explain how this affects operations. Here's the story...

When a compute node needs a hardware intervention we have an automated process that the repair team uses (they don't have access to OpenStack APIs) to live migrate all the instances before starting the repair. The motivation is to minimize the impact on users.

However, instances can't be live migrated if the compute node becomes overcommitted!

It happens that if a DIMM fails in a compute node that has all the memory allocated to VMs, it's not possible to move those VMs.

"No valid host was found. Unable to replace instance claim on source (HTTP 400)"

The compute node becomes overcommitted (because the DIMM is not visible anymore) and placement can't create the migration allocation in the source.

The operator can work around this and "tune" the memory overcommit for the affected compute node, but that requires investigation and a manual intervention by an operator, defeating automation and delegation to other teams. This is extremely complicated in large deployments.

I don't believe this behaviour is correct.
If there are available resources to host the instances in a different compute node, placement shouldn't block the live migration because the source is overcommitted.

+++

Using Nova Stein.
From what I checked, it looks like this is still the behaviour in recent releases.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I confirm this based on looking at the nova code.

The root of the problem is that before nova runs the scheduler to select the destination of the migration, it first moves the placement allocation from the instance_uuid consumer to the new migration_uuid consumer [1], so that the instance_uuid consumer can be used to allocate resources during scheduling.

Such a move is done by a POST /allocations placement API call, where we remove the allocation from one consumer and add the allocation to another consumer. During this, placement has no information about the intention to move an existing allocation; it treats this as a new allocation, so it checks whether there are enough resources for it. Normally there is no problem, as we remove exactly the same resources from one consumer as we add to the other. So if the allocation fit the inventory before the move, it will fit after the move.
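
To make the mechanics concrete, here is a rough, illustrative sketch (not nova's actual code; all UUIDs and amounts are made up) of the shape of the payload nova builds for that POST /allocations call, emptying the instance consumer and re-creating the same resources under the migration consumer:

# Illustrative only -- see nova/scheduler/client/report.py [1] for the real code.
INSTANCE_UUID = "11111111-1111-1111-1111-111111111111"   # hypothetical
MIGRATION_UUID = "22222222-2222-2222-2222-222222222222"  # hypothetical
SOURCE_RP_UUID = "33333333-3333-3333-3333-333333333333"  # hypothetical

move_payload = {
    # Empty allocations: drop everything held by the instance consumer.
    INSTANCE_UUID: {
        "allocations": {},
        "project_id": "demo-project",
        "user_id": "demo-user",
        "consumer_generation": 1,
    },
    # The same resources are re-created under the migration consumer,
    # which placement treats as a brand new allocation.
    MIGRATION_UUID: {
        "allocations": {
            SOURCE_RP_UUID: {
                "resources": {"MEMORY_MB": 16384, "VCPU": 4},
            },
        },
        "project_id": "demo-project",
        "user_id": "demo-user",
        "consumer_generation": None,  # new consumer
    },
}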

However, when the DIMM broke, nova started reporting less memory inventory to Placement. Placement allows an inventory to decrease even if it leads to overallocation (i.e. where consumption > inventory). However, Placement does not allow new allocations while there is overallocation.

This leads to a situation where an allocation that was allowed before the move is not allowed after the move.
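
With made-up numbers, and a simplified version of the capacity check placement applies to new allocations (roughly used + requested <= (total - reserved) * allocation_ratio), the failure looks like this:

def fits(total_mb, reserved_mb, allocation_ratio, used_mb, requested_mb):
    # Simplified capacity check; placement's real logic lives in its
    # allocation write path.
    capacity = (total_mb - reserved_mb) * allocation_ratio
    return used_mb + requested_mb <= capacity

# Healthy node: 256 GiB total, other consumers use 224 GiB, so moving a
# 16 GiB instance allocation to the migration consumer still fits.
print(fits(262144, 0, 1.0, 229376, 16384))  # True

# After a 64 GiB DIMM fails nova reports only 192 GiB, but the same
# allocations are still there, so re-adding the 16 GiB under the
# migration consumer is rejected even though net usage is unchanged.
print(fits(196608, 0, 1.0, 229376, 16384))  # False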

Based on this I imagine the solution could be:

A) Make placement aware that the intention is to move an existing allocation between consumers, and allow such a move even in the case of an overallocation

OR

B) Tweak the inventory reporting logic in nova. Maybe nova should not automatically decrease inventory in placement if it leads to overallocation.

[1] https://github.com/openstack/nova/blob/a74bced0eb3232c0c3384e26db1838bbc26c2d82/nova/scheduler/client/report.py#L1968-L1985

tags: added: compute placement resource-tracker
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
sean mooney (sean-k-mooney) wrote :

I think nova reducing the reported memory is correct, but perhaps it should not do it if, and only if, that would make the node over-allocated.

In that case, setting reserved = total - used might be better until the instances can be moved; then it can reduce the total once it can report the corrected amount without over-allocating.
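
A rough sketch of that idea (not nova code; a hypothetical helper just to illustrate the reserved = total - used fallback):

def adjust_memory_inventory(detected_total_mb, reported_total_mb, used_mb,
                            allocation_ratio=1.0):
    """Return (total, reserved) to report to placement. Hypothetical."""
    if used_mb <= detected_total_mb * allocation_ratio:
        # The smaller total still covers existing allocations: report it.
        return detected_total_mb, 0
    # Reporting the smaller total would put the provider into
    # overallocation, so keep the previously reported total and reserve
    # all the remaining free space so nothing new can land here.
    return reported_total_mb, max(reported_total_mb - used_mb, 0)

# 64 GiB DIMM gone (256 GiB -> 192 GiB) with 240 GiB allocated:
# keep reporting 256 GiB total and reserve the remaining 16 GiB.
print(adjust_memory_inventory(196608, 262144, 245760))  # (262144, 16384)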

Revision history for this message
sean mooney (sean-k-mooney) wrote :

By the way, nova more or less assumes/requires that the resources remain constant while it is running.

For PCI devices that is a requirement, as we do not support hotplug of devices into the host while nova-compute is running.

For CPUs and memory we have always assumed the same: that they will not change while nova-compute is running. We certainly do not support offlining or onlining CPUs, and I generally assumed that either the host would crash if a DIMM died, or you were using mirrored DIMMs, in which case it won't crash but your capacity should not change.

So to me it's not clear how you got into a state where the VMs were still running after the DIMM failure.

If the DIMM failure did not crash the host, I would at least have expected it to crash the VMs on that DIMM.

So essentially I was expecting that you would cold migrate the instances in this case, since they would not be running.

Revision history for this message
Victor Coutellier (alistarle) wrote :

I will put in my two cents on this bug as I was impacted by it recently. I will add that this bug applies not only in the case of a reduction of the host hardware inventory (i.e. hardware failure), but also in the case of a reduction of the overallocation ratio, which can be much more common and is allowed by the API.

Indeed, if a host is considered full with a given allocation ratio (either for CPU/RAM or DISK), reducing the ratio will put the host in this overallocated state, and we won't be able to live migrate, or even resize, any VM on the host.

From an operator's perspective, reducing the overallocation ratio and then migrating some VMs to match the new ratio can be a common use case.
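
Made-up numbers illustrating that case:

total_mb = 262144              # 256 GiB of physical RAM
used_mb = 380000               # allocations placed under the old ratio

old_capacity = total_mb * 1.5  # ratio 1.5 -> ~384 GiB of capacity
new_capacity = total_mb * 1.0  # ratio lowered to 1.0 -> 256 GiB

print(used_mb <= old_capacity)  # True: the host was legitimately full
print(used_mb <= new_capacity)  # False: now overallocated, so live
                                # migration / resize claims are rejected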
