shelve offload does not reduce core and RAM quota

Bug #1630454 reported by Tim Bell
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

Currently, when an instance is shelved and offloaded, the usage quota for RAM and cores is not reduced, although the resources themselves are no longer used on the hypervisor.

The IP should continue to be accounted for as it is reserved.

This fix is needed because we want to encourage our users to shelve less-used instances when they get near their quota limits, rather than asking for a quota increase.

1) create a VM:
openstack server create --image 7239c5e1-7b7b-4912-8c48-ee201c211a8f --flavor m1.tiny test-001

2) check quota usage:
nova absolute-limits

+--------------------+------+-------+
| Name               | Used | Max   |
+--------------------+------+-------+
| Cores              | 1    | 20    |
| ImageMeta          | -    | 128   |
| Instances          | 1    | 10    |
| Keypairs           | -    | 100   |
| Personality        | -    | 5     |
| Personality Size   | -    | 10240 |
| RAM                | 512  | 51200 |
| Server Meta        | -    | 128   |
| ServerGroupMembers | -    | 10    |
| ServerGroups       | 0    | 10    |
+--------------------+------+-------+

3) shelve the instance and wait for it to offload:
openstack server shelve test-001

4) check quota usage:
nova absolute-limits

+--------------------+------+-------+
| Name               | Used | Max   |
+--------------------+------+-------+
| Cores              | 1    | 20    |
| ImageMeta          | -    | 128   |
| Instances          | 1    | 10    |
| Keypairs           | -    | 100   |
| Personality        | -    | 5     |
| Personality Size   | -    | 10240 |
| RAM                | 512  | 51200 |
| Server Meta        | -    | 128   |
| ServerGroupMembers | -    | 10    |
| ServerGroups       | 0    | 10    |
+--------------------+------+-------+

Tags: compute quotas
Sylvain Bauza (sylvain-bauza) wrote :

Good point, we should only count those resources once they're unshelved, I guess.
The real question, though, is whether we should encourage overprovisioning of resources using the shelve API. Like you said, we're still allocating one IP address for that, and some clouds might not like having their pools exhausted by people overcommitting and not recycling their VMs.

I guess it's an open question, and TBH I don't think shelve is very cloud-compatible.

tags: added: compute quotas
Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Sylvain Bauza (sylvain-bauza) wrote :

TBC, I think there is some action to be taken, hence my tagging the bug as confirmed, but I don't know if the proper solution is simply to not count offloaded instances. I'd rather see a discussion about the rationale for overcommitting a set of resources that can be bounded.

Andrew Laski (alaski) wrote :

There is still a set of bounded resources: IP, Glance storage for an image snapshot, and/or a volume. The ask here that CPU and RAM quota be decremented is reasonable. Decrementing DISK is reasonable as well.

The reason things are done the way they are currently is that the choice was made to ensure that unshelve would always be possible, or at least not blocked by not having quota available to spin the instance back up. Now, this feature was implemented before the specs process existed, so there was less discussion and oversight when it was added, and it was mostly the choice of the lone developer of the feature. It's difficult to know how many users out there are happy with that design choice, but we do know from operator feedback that many of them wish it worked as this bug report suggests.

I will say that I don't believe this is a bug. It's arguably a poor design choice but the behavior is not faulty. It is valuable feedback though so I don't think we should close this out.

We have two real choices here and a third option I'll mention but that we shouldn't use. We can continue as is. We can change shelve to decrement quota when the resources are offloaded from the compute node. Or we could add a config to let deployers toggle between the previous two choices, but we should not do this.
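To make the first two choices concrete, here is a minimal sketch of how usage might be counted under each. It is not nova's code: the `Instance` record, the `count_usage` function, and the `count_offloaded` flag are hypothetical names used only for illustration. The flavor values (1 core, 512 MB) are taken from the quota table in the bug description.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical record for illustration; not a nova object.
    name: str
    vcpus: int
    ram_mb: int
    status: str  # e.g. "ACTIVE" or "SHELVED_OFFLOADED"


def count_usage(instances, count_offloaded=True):
    """Return (instances, cores, ram_mb) charged against quota.

    count_offloaded=True models the current behaviour (first choice);
    count_offloaded=False models decrementing cores and RAM on offload
    (second choice). The instance itself is counted either way, which
    is an assumption of this sketch.
    """
    cores = 0
    ram_mb = 0
    for inst in instances:
        if inst.status == "SHELVED_OFFLOADED" and not count_offloaded:
            continue  # resources are no longer held on a hypervisor
        cores += inst.vcpus
        ram_mb += inst.ram_mb
    return len(instances), cores, ram_mb


servers = [Instance("test-001", 1, 512, "SHELVED_OFFLOADED")]
print(count_usage(servers, count_offloaded=True))   # current behaviour
print(count_usage(servers, count_offloaded=False))  # requested behaviour
```

The flag exists here only to show both policies side by side; it is not meant to suggest a deployer-facing config option.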

I would support it if someone wanted to propose a spec to change the quota behavior of shelving. It is however not as simple as just changing the quota behavior. Quotas would need to be retroactively decremented for instances which are shelved at the time of upgrade. Care would need to be taken to ensure that there is no DoS potential, though I don't believe there is.
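The upgrade concern can be pictured as a one-time adjustment: for every instance that is already shelved and offloaded, subtract its cores and RAM from the project's recorded usage. The sketch below is illustrative only; the dict layout and the `retroactive_decrement` name are assumptions, not nova's schema or migration code, and clamping at zero is a defensive choice of this sketch.

```python
def retroactive_decrement(usage, instances):
    """Return a new per-project usage dict with offloaded instances removed.

    usage:     {project_id: {"cores": int, "ram": int}}  (hypothetical layout)
    instances: iterable of dicts with project_id, vcpus, ram_mb, status
    """
    # Copy so the caller's recorded usage is left untouched.
    adjusted = {project: dict(values) for project, values in usage.items()}
    for inst in instances:
        if inst["status"] != "SHELVED_OFFLOADED":
            continue
        project = adjusted[inst["project_id"]]
        # Clamp at zero so inconsistent input cannot drive usage negative.
        project["cores"] = max(0, project["cores"] - inst["vcpus"])
        project["ram"] = max(0, project["ram"] - inst["ram_mb"])
    return adjusted


usage = {"demo": {"cores": 2, "ram": 1024}}
instances = [
    {"project_id": "demo", "vcpus": 1, "ram_mb": 512, "status": "ACTIVE"},
    {"project_id": "demo", "vcpus": 1, "ram_mb": 512, "status": "SHELVED_OFFLOADED"},
]
print(retroactive_decrement(usage, instances))
```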

Sylvain Bauza (sylvain-bauza) wrote :

So, given it seems there was a previous consensus about not quoting shelves, I think it would need a spec to discuss that specific behaviour and see if the consensus needs to be revisited, i.e. whether we accept overprovisioning bounded resources given the DoS possibility (and whether it would need some config option).

Setting this bug to Wishlist, but I'd prefer to see a blueprint proposed so we can discuss this properly.

Changed in nova:
importance: Low → Wishlist
melanie witt (melwitt) wrote :

As discussed in the comments, this is working as currently designed. There's no opposition to changing the behavior but doing so would need a spec to propose the behavior change along with addressing details around how to handle the change for existing shelved offloaded instances during an upgrade, etc.

If someone is interested in proposing and working on that, please send an email to <email address hidden> with the tag "[nova]" in the subject line and we can get the ball rolling about it.

I'm going to close this bug as "Invalid" since it's working as designed and I don't think keeping this open as a bug in our backlog is useful.

Changed in nova:
status: Confirmed → Invalid
Maxim Nestratov (mnestratov) wrote :