quota usage includes instances that are in task_state=deleting and vm_state=active

Bug #1172764 reported by Aleksander Korzynski
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Consider a scenario where a user issues a request to delete an instance but the instance doesn't disappear because of a problem with the infrastructure (e.g. a libvirt error). The instance then stays in vm_state=active and task_state=deleting. In that case, it is not the user's fault that the instance doesn't go away. However, the user is penalized for it, because that instance gets counted in the user's quota.

The problem was confirmed to exist on the trunk version of Nova: 2013.2.a332.g18d9a8b.

One way to reproduce this problem is to start an instance, shut down nova-compute, and then try to delete the instance. Check the instance in the instances table in the database. It will have vm_state=active and task_state=deleting. Then check the quota usage with "nova absolute-limits". The totalInstancesUsed value will include that instance.
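The effect described above can be pictured with a toy model. This is only a sketch of the observed behaviour, not Nova's quota code: the Instance class and both counting functions are hypothetical names, and it assumes usage is derived purely from whether the row is deleted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a row in the instances table."""
    uuid: str
    vm_state: str
    task_state: Optional[str]
    deleted: bool = False


def instances_used(instances: List[Instance]) -> int:
    # Assumed rule matching the report: every non-deleted row counts,
    # whatever its task_state is.
    return sum(1 for i in instances if not i.deleted)


def instances_used_excluding_deleting(instances: List[Instance]) -> int:
    # The alternative asked for in this bug: skip rows that the user
    # has already asked to delete.
    return sum(
        1 for i in instances
        if not i.deleted and i.task_state != "deleting"
    )


instances = [
    Instance("a", "active", None),
    Instance("b", "active", "deleting"),   # stuck: nova-compute is down
    Instance("c", "deleted", None, deleted=True),
]

print(instances_used(instances))
print(instances_used_excluding_deleting(instances))
```

The instance stuck in vm_state=active and task_state=deleting is the only one the two counts disagree on.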

Kevin L. Mitchell (klmitch) wrote :

Well, in that case, it should be possible to issue "delete" again and cause the instance to actually be deleted; if that fails, then that's the bug we need to fix, or it's due to some other problem (such as a compute being down) which is going to require administrative action anyway. As long as the instance is active, and the user has some recourse through the API that can delete it, I feel the instance should be counted against the user…

Phil Day (philip-day) wrote :

Hi Kevin,

I don’t really see the case for still counting resources against a user once the task state gets to "deleting". At this point the user has said that they want to delete the instance, and there is no way back: all they can do is issue more deletes, so they have no effective control over it any more. If it sticks in “deleting”, it's an error in the system somewhere, so not the user's problem, and they shouldn’t have it counted against their quota.

A number of bugs around instances sticking in this stage have been fixed, and the system will now delete instances in the API if the compute service has timed out, so the window Aleksander describes is probably quite small. But it still feels unfair to me to keep counting against their quota resources that the user has asked to be deleted, and which they have no way of recovering, when there is a glitch in the system.

Regards,
Phil

Kevin L. Mitchell (klmitch) wrote :

Well, perhaps the solution is a periodic task or something similar that looks for instances that have been in the "deleting" state for too long, and automatically re-issues a delete on behalf of the user. While some billing systems may be based on the in-use quantity from the quota system, others are probably based on the notifications of completion of deletion, and something like that should be good for covering that as well…
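A minimal sketch of what such a periodic task might look like. Nothing here is Nova code: the Instance class, find_stuck_deletes, reissue_stuck_deletes, the delete callback and the 10-minute threshold are all hypothetical, and the real task would run inside a service's periodic-task machinery rather than being called by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for an instance record."""
    uuid: str
    task_state: Optional[str]
    updated_at: datetime


def find_stuck_deletes(instances: Iterable[Instance],
                       now: datetime,
                       max_age: timedelta) -> List[Instance]:
    # An instance is considered stuck if it is still in "deleting"
    # and has not been updated for longer than max_age.
    return [
        i for i in instances
        if i.task_state == "deleting" and now - i.updated_at > max_age
    ]


def reissue_stuck_deletes(instances: Iterable[Instance],
                          delete: Callable[[str], None],
                          now: datetime,
                          max_age: timedelta = timedelta(minutes=10)
                          ) -> List[str]:
    # Re-issue the delete on behalf of the user and report which
    # instances were retried.
    retried = []
    for inst in find_stuck_deletes(instances, now, max_age):
        delete(inst.uuid)
        retried.append(inst.uuid)
    return retried


now = datetime(2013, 5, 1, 12, 0)
instances = [
    Instance("old-delete", "deleting", now - timedelta(hours=1)),
    Instance("new-delete", "deleting", now - timedelta(minutes=1)),
    Instance("running", None, now - timedelta(hours=5)),
]
calls: List[str] = []
retried = reissue_stuck_deletes(instances, calls.append, now)
print(retried)
```

Only the delete that has been pending for longer than the threshold is retried; a recent delete and an untouched running instance are left alone.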

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/28063

Changed in nova:
assignee: nobody → Aleksander Korzynski (a-korzynski)
status: New → In Progress
Aleksander Korzynski (a-korzynski) wrote :

Kevin, how about the solution I've just submitted? It adds a new flag: release_instance_quota_before_deletion. If the flag is set to true, the quota is released after the user submits the delete request, before the actual deletion begins. If set to false, quota is released after the deletion succeeds. The default is the current behaviour (i.e. the flag set to false).
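A rough illustration of the intended semantics of the flag. This is not the code in the review: QuotaTracker, delete_instance and the do_delete callback are hypothetical names, and only the flag name release_instance_quota_before_deletion comes from the proposal.

```python
from typing import Callable


class QuotaTracker:
    """Hypothetical per-user instance counter."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def reserve(self) -> None:
        if self.used >= self.limit:
            raise RuntimeError("quota exceeded")
        self.used += 1

    def release(self) -> None:
        self.used -= 1


def delete_instance(quota: QuotaTracker,
                    do_delete: Callable[[], None],
                    release_instance_quota_before_deletion: bool = False
                    ) -> None:
    if release_instance_quota_before_deletion:
        # Quota is given back as soon as the delete request is accepted,
        # even if the actual deletion fails afterwards.
        quota.release()
        do_delete()
    else:
        # Current behaviour: quota is given back only after the
        # deletion succeeds.
        do_delete()
        quota.release()


def failing_delete() -> None:
    # Simulates an infrastructure problem during deletion.
    raise RuntimeError("libvirt error")


def used_after_failed_delete(flag: bool) -> int:
    quota = QuotaTracker(limit=1)
    quota.reserve()
    try:
        delete_instance(quota, failing_delete, flag)
    except RuntimeError:
        pass
    return quota.used


print(used_after_failed_delete(False))
print(used_after_failed_delete(True))
```

With the flag set to false, a failed deletion leaves the instance counted against the quota; with it set to true, the quota has already been released by the time the failure happens.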

Note that I will add/modify tests and comments before marking it as ready for review.

Joe Gordon (jogo) wrote :

as per discussion on the patch, marking as invalid

Changed in nova:
status: In Progress → Invalid
assignee: Aleksander Korzynski (a-korzynski) → nobody