Nested quota -1 limit race condition

Bug #1552944 reported by Ryan McNair
This bug affects 1 person
Affects: Cinder
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

There's a race condition when updating quota limits to/from -1 if the project or its children are being actively used with volume create/delete requests.

Take the example with a project hierarchy of:
                    A (limit = 5, in-use = 0, allocated = 5)
                   /                                        \
    B (limit = -1, in-use = 3)                C (limit = 2, in-use = 0)

Now we update the quota limit of B to 5 while volumes are being deleted from B. To update B's quota we also need to update A, so that B contributes exactly 5 volumes to A's allocated value. Since B's limit is currently -1, we subtract the difference between 5 and B's current usage from A. But this happens while B's usage value is changing (because volumes are getting deleted), so the usage we read can be stale by the time A is updated. That is the race condition. See the code here: https://github.com/openstack/cinder/blob/master/cinder/api/contrib/quotas.py#L322-L334.
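To make the interleaving concrete, here is a minimal toy model of the hierarchy above. It is not Cinder code: the Project class, delete_volume and simulate are hypothetical names, the concurrent delete is forced to land between the read and the write, and validation against A's own limit is skipped. It assumes, as described in this report, that usage of a -1 child is mirrored in the parent's allocated value.

```python
from dataclasses import dataclass


@dataclass
class Project:
    limit: int          # -1 means "no own limit, counted against the parent"
    in_use: int = 0
    allocated: int = 0  # quota handed down to children


def delete_volume(parent: Project, child: Project) -> None:
    """Delete one volume; a -1 child's usage is mirrored in the parent."""
    child.in_use -= 1
    if child.limit == -1:
        parent.allocated -= 1


def simulate(delete_during_update: bool) -> int:
    """Return A's allocated value after setting B's limit to 5."""
    a = Project(limit=5, allocated=5)  # 3 from B's usage + 2 from C's limit
    b = Project(limit=-1, in_use=3)
    new_limit = 5

    usage_seen = b.in_use              # step 1: read B's usage (3)
    if delete_during_update:
        delete_volume(a, b)            # a concurrent delete lands here
    # step 2: swap B's usage contribution for its new hard limit,
    # using the (possibly stale) usage read in step 1
    a.allocated += new_limit - usage_seen
    b.limit = new_limit
    return a.allocated


if __name__ == "__main__":
    print("no concurrent delete:", simulate(False))
    print("delete between read and write:", simulate(True))
```

In this model A should end up with 7 allocated (5 from B's new limit plus 2 from C). When the delete slips in between the two steps, A ends up one short, and nothing later corrects it.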

There is no locking between checking the usage of a project and updating the allocated value of its parent. In order to fix this, it seems like we'd need to add locking so we can atomically:
     1) Get the current usage of the project (e.g. B has usage of 3)
     2) Update the quota limit of the project, so that new reservation requests stop at the appropriate place in the hierarchy (e.g. if B had a child with -1 limit, those reservations should no longer propagate up to A's allocated)

This would require locking in the quota_reserve code as well as any of the quota-update / quota-delete code, which does not seem great, and this would also not help if there were API services running on different servers.
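The locking idea above can be sketched with a single in-process lock shared by the usage-changing path and the limit-update path. This is only an illustration under stated assumptions: QuotaTree and its methods are hypothetical, not Cinder APIs, and a threading.Lock has exactly the limitation noted above, since it does nothing for API services running on other servers.

```python
import threading
from dataclasses import dataclass


@dataclass
class Project:
    limit: int          # -1 means "no own limit, counted against the parent"
    in_use: int = 0
    allocated: int = 0


class QuotaTree:
    """Toy A/B hierarchy where every quota mutation takes the same lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.a = Project(limit=5, allocated=5)
        self.b = Project(limit=-1, in_use=3)

    def delete_volume_from_b(self) -> None:
        with self.lock:
            self.b.in_use -= 1
            if self.b.limit == -1:
                self.a.allocated -= 1

    def set_b_limit(self, new_limit: int) -> None:
        # Reading B's usage and updating A's allocated happen under one lock,
        # so no delete can land between the two steps.
        with self.lock:
            usage = self.b.in_use
            self.a.allocated += new_limit - usage
            self.b.limit = new_limit


def run_concurrently() -> int:
    tree = QuotaTree()
    threads = [threading.Thread(target=tree.delete_volume_from_b)
               for _ in range(3)]
    threads.append(threading.Thread(target=tree.set_b_limit, args=(5,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tree.a.allocated


if __name__ == "__main__":
    print(run_concurrently())
```

Whichever order the threads run in, A's allocated value in this model settles at B's new limit plus C's limit, because each delete either happens entirely before the update (and is accounted for in the usage that is read) or entirely after it (and no longer touches A).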

One additional complication occurs because a reservation could get rolled back at a later date. In the cases of nesting -1 values, we create the reservation to affect the parent's allocated as well as the child's reserved, and handle these reservations as a group. For instance, if a volume reservation was created on B, there is also a reservation created on A to affect its allocated value, and the reservations are either all committed or all rolled back. Now if there's a reservation created for B, and we then update the limit of B to no longer be -1, we will subtract this reserved volume from A's usage; if the reservation is later rolled back, the quota of A will be updated incorrectly.
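The rollback problem can be illustrated with the same kind of toy model. Everything here is a hypothetical sketch rather than Cinder's reservation code: ReservationGroup, reserve, rollback and set_limit are made-up names, and the model assumes the parent's allocated value is adjusted when the reservation is created.

```python
from dataclasses import dataclass


@dataclass
class Project:
    limit: int          # -1 means "no own limit, counted against the parent"
    in_use: int = 0
    reserved: int = 0
    allocated: int = 0


@dataclass
class ReservationGroup:
    child_reserved_delta: int
    parent_allocated_delta: int


def reserve(parent: Project, child: Project, count: int) -> ReservationGroup:
    """Reserve on the child; a -1 child also bumps the parent's allocated."""
    child.reserved += count
    parent_delta = count if child.limit == -1 else 0
    parent.allocated += parent_delta
    return ReservationGroup(count, parent_delta)


def rollback(parent: Project, child: Project, group: ReservationGroup) -> None:
    """Undo the deltas recorded when the group was created."""
    child.reserved -= group.child_reserved_delta
    parent.allocated -= group.parent_allocated_delta


def set_limit(parent: Project, child: Project, new_limit: int) -> None:
    """Replace the child's contribution to the parent (new_limit is not -1)."""
    if child.limit == -1:
        parent.allocated -= child.in_use + child.reserved
    else:
        parent.allocated -= child.limit
    parent.allocated += new_limit
    child.limit = new_limit


def demo() -> int:
    a = Project(limit=5, allocated=5)  # 3 from B's usage + 2 from C's limit
    b = Project(limit=-1, in_use=3)
    group = reserve(a, b, 1)           # created while B still has limit -1
    set_limit(a, b, 5)                 # B now contributes a fixed 5 to A
    rollback(a, b, group)              # stale group still decrements A
    return a.allocated


if __name__ == "__main__":
    print(demo())
```

In this model A should hold 7 allocated after the limit change (5 for B plus 2 for C), but the rolled-back group still carries the parent delta recorded under the old -1 limit, so A ends up one lower than it should be.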

It seems like there was a possibility of a race condition before (https://github.com/openstack/cinder/blob/master/cinder/quota.py#L372), but the problem seems significantly worsened by the -1 usage, which makes it dangerous to update to/from -1 limits on an active cloud.

Ryan McNair (rdmcnair) wrote :

Can exercise this race condition by running the tests at https://review.openstack.org/#/c/285640/ and removing "time.sleep(5)" from setUp and resource_cleanup.

Gorka Eguileor (gorka) wrote :

Nested quotas were removed from the Cinder code some releases ago, so related bugs no longer apply.

Changed in cinder:
status: New → Won't Fix