Proper handling of suspended/stopped VMs in scheduling

Bug #1791679 reported by Tobias Rydberg
Affects: OpenStack Public Cloud WG
Status: Won't Fix
Importance: Undecided
Assigned to: Tobias Rydberg
Milestone: (none)

Bug Description

Currently, the Nova scheduler accounts for suspended VMs' resources exactly as it does for running VMs. For example, say a compute node has 128 GB of RAM to use for guests, and it hosts four 32 GB guests that are all suspended (and hence use no host RAM and no CPU). The Nova scheduler then considers that node "full" and won't assign any more guests to it, under the assumption that suspended guests can wake up at any time. The same applies to guests that are currently stopped. In summary, guests in the SUSPENDED and SHUTOFF states are treated just like ACTIVE guests for resource allocation.

While that assumption is a fine default, it should be configurable. For example, we could have an option like shutoff_ram_ratio (I can't think of a better name right now) that defaults to 1.0, meaning a suspended/stopped VM is treated exactly like a running one for scheduling purposes. If an operator changes that value to, say, 0.2, suspended VMs would count toward RAM allocation at only 20% of their configured size. A similar factor could be applied to CPU cores (but obviously not to disk space, which is consumed even while a guest is suspended).

Ideally, just like {ram,disk,cpu}_allocation_ratio, this option would be set at the scheduler level but be overridable for each compute node.
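
To illustrate, here is a rough Python sketch of the proposed accounting. The option name shutoff_ram_ratio, the Instance model, and the set of discounted states are illustrative assumptions, not actual Nova code:

```python
# A simplified model of RAM accounting on one compute node; shutoff_ram_ratio
# and the Instance fields are assumptions for illustration, not Nova's API.
from dataclasses import dataclass

DISCOUNTED_STATES = {"SUSPENDED", "SHUTOFF"}

@dataclass
class Instance:
    ram_mb: int
    state: str  # e.g. "ACTIVE", "SUSPENDED", "SHUTOFF"

def claimed_ram_mb(instances, shutoff_ram_ratio=1.0):
    """RAM the scheduler counts as used on a node.

    With the default ratio of 1.0 this matches today's behavior; with 0.2,
    a suspended/stopped guest counts at only 20% of its configured RAM.
    """
    return sum(
        inst.ram_mb * (shutoff_ram_ratio if inst.state in DISCOUNTED_STATES else 1.0)
        for inst in instances
    )

# The 128 GB example from the description: four suspended 32 GB guests.
guests = [Instance(ram_mb=32 * 1024, state="SUSPENDED") for _ in range(4)]
print(claimed_ram_mb(guests))                         # 131072.0 -> node "full"
print(claimed_ram_mb(guests, shutoff_ram_ratio=0.2))  # 26214.4  -> room left
```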

Matt Riedemann (mriedem) wrote:

But if we don't support https://bugs.launchpad.net/openstack-publiccloud-wg/+bug/1791681 (auto-migrating on resume if the node is overcommitted) and the user resumes the guest, leaving the compute node overcommitted, is that acceptable? Because we don't plan on supporting that auto-migration.

Matt Riedemann (mriedem) wrote:

What you really want here is shelve, which offloads the guest from the host and then, on unshelve, reschedules the instance to a new host.
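
For reference, a minimal sketch of that workflow with openstacksdk; the cloud name "mycloud" and server name "demo-vm" are placeholders, and depending on the deployment's shelved_offload_time the server may sit in SHELVED before reaching SHELVED_OFFLOADED:

```python
import openstack

conn = openstack.connect(cloud="mycloud")        # placeholder clouds.yaml entry
server = conn.compute.find_server("demo-vm")     # placeholder server name

# Shelve: snapshot the guest and release its RAM/CPU claim on the host.
conn.compute.shelve_server(server)
conn.compute.wait_for_server(server, status="SHELVED_OFFLOADED", wait=600)

# Unshelve: the scheduler picks a (possibly different) host.
conn.compute.unshelve_server(server)
```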

Changed in openstack-publiccloud-wg:
status: New → Won't Fix
Matt Riedemann (mriedem) wrote:

The problem with shelve is that operators want users to use it, but users don't care about it or know what shelve is; they just do stop/start. Huawei public cloud would like the stop API to do a shelve under the covers, but that would not be interoperable without a microversion; in other words, we wouldn't just add a config option to control whether the stop API does a shelve or not. However, the nova CLI at least defaults to the latest negotiated microversion if the user didn't specify one, so we have that going for us. The openstack CLI does not do that latest-microversion negotiation yet (Monty Taylor might be working on that).
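
For what it's worth, that negotiation looks like this with python-novaclient, which accepts "2.latest" to request the newest microversion both client and server support (the credentials and endpoint below are placeholders):

```python
from keystoneauth1.identity import v3
from keystoneauth1 import session
from novaclient import client

# Placeholder credentials and endpoint.
auth = v3.Password(auth_url="http://keystone:5000/v3",
                   username="demo", password="secret",
                   project_name="demo",
                   user_domain_id="default", project_domain_id="default")

# "2.latest" negotiates the newest microversion supported by both sides.
nova = client.Client("2.latest", session=session.Session(auth=auth))
print(nova.api_version)  # the negotiated microversion
```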

Matt Riedemann (mriedem) wrote:

I wonder if we could add a compute API microversion to the stop/suspend APIs for an offload behavior parameter with these values:

* auto
* shelve
* retain (?)

The API could default to auto, which would check a per-deployment config option; that option would default to the retain behavior (what we have today: stop/suspend keeps the instance on the same host) but could be configured to shelve. That way deployments can configure their desired default behavior, but users still have an override option if they really care.
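
A sketch of how "auto" could resolve server-side; the function and option names here are assumptions for illustration, not actual Nova code:

```python
# Hypothetical resolution of the proposed offload behavior parameter.
AUTO, SHELVE, RETAIN = "auto", "shelve", "retain"

def resolve_offload_behavior(requested, default_offload_behavior=RETAIN):
    """Map the API value to a concrete action.

    "auto" defers to the deployment-wide config default; "shelve" and
    "retain" are explicit per-request overrides.
    """
    if requested == AUTO:
        return default_offload_behavior
    if requested in (SHELVE, RETAIN):
        return requested
    raise ValueError("offload_behavior must be auto, shelve, or retain")

# Today's behavior: "auto" with an unchanged config default resolves to retain.
assert resolve_offload_behavior(AUTO) == RETAIN
# An operator opting in deployment-wide:
assert resolve_offload_behavior(AUTO, default_offload_behavior=SHELVE) == SHELVE
# A user explicitly overriding the deployment default:
assert resolve_offload_behavior(RETAIN, default_offload_behavior=SHELVE) == RETAIN
```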

Matt Riedemann (mriedem) wrote:

Here is a nova blueprint where we could start discussions:

https://blueprints.launchpad.net/nova/+spec/shelve-on-stop
