Placement allows consuming resources beyond the configured allocation ratio
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| placement | In Progress | Undecided | Unassigned | |
Bug Description
There is still a small time window in which two concurrent requests that try to consume the same last unit of a resource can both succeed, so that the total allocated resources exceed (total-
The following example reproduces 100% reliably on current master in DevStack:
- create a custom resource class (say CUSTOM_
- create a resource provider with an inventory of 1 unit of that resource class, with allocation ratio 1.0
- make 2 concurrent requests, each trying to allocate one unit of this custom resource
- both requests succeed, and the resource provider ends up with 2 allocations against it even though its total capacity is 1.
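+----------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| CUSTOM_
+----------------+------------------+----------+----------+----------+-----------+-------+------+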
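The repro steps above behave like a classic check-then-act race: each request checks capacity and then writes, and nothing re-validates capacity at write time. The following is a toy in-memory model in Python, not placement code; TOTAL, RATIO, used() and allocate() are hypothetical names, and a threading.Barrier forces the interleaving in which both requests pass the check before either one writes.

```python
import threading

# Hypothetical toy provider: 1 unit of a custom resource, allocation ratio 1.0.
TOTAL = 1
RATIO = 1.0

allocations = []                # list of (consumer, amount) tuples
lock = threading.Lock()         # protects the list itself, NOT the check+write pair
barrier = threading.Barrier(2)  # both requests finish their check before either writes


def used():
    with lock:
        return sum(amount for _, amount in allocations)


def allocate(consumer, amount, results):
    # Step 1: capacity check against the usage visible right now.
    ok = used() + amount <= TOTAL * RATIO
    # Wait until the other request has also passed its check.
    barrier.wait()
    if ok:
        # Step 2: write the allocation; capacity is not re-checked here.
        with lock:
            allocations.append((consumer, amount))
    results[consumer] = ok


results = {}
threads = [threading.Thread(target=allocate, args=(name, 1, results))
           for name in ("req1", "req2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))  # [('req1', True), ('req2', True)]
print(used())                   # 2, although capacity is 1
```

In a real system the check and the write need to be atomic, or the write must be rejected when the state it was checked against has changed in the meantime.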
See the full reproduction script at https:/
While this is a synthetic test, we have also seen this happen in real deployments.
What I have found so far is that the following happens:
- two requests are processed in parallel: opendev.org/openstack/placement/src/branch/master/placement/objects/allocation.py#L507
- the first one succeeds; the second one correctly detects the concurrent update and goes into a retry in the 'replace_all' function: https:/
- so it calls _set_allocations(...) again, which in turn calls _check_capacity_exceeded(...) again, and there the main DB query still does not see the new usages that should already be there: the usages are still None, as they were in the first attempt.
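A toy model of this sequence, assuming (as the logs suggest) that on retry the generation check sees the committed state while the usage read still returns the stale value. Everything here (Provider, set_allocations, replace_all, ConcurrentUpdate) is a hypothetical simplification, not placement's actual code; reserved 0 and allocation ratio 1.0 are assumed.

```python
class ConcurrentUpdate(Exception):
    """Raised when the provider generation changed since it was read."""


class Provider:
    """Hypothetical provider: one resource class, reserved 0, ratio 1.0."""

    def __init__(self, total):
        self.total = total    # capacity
        self.usage = 0        # committed usage
        self.generation = 0   # bumped on every successful write


def set_allocations(rp, seen_usage, seen_generation, amount):
    # Capacity check uses whatever usage the read returned (possibly stale).
    if seen_usage + amount > rp.total:
        return False
    # Compare-and-swap on the generation checks the committed state.
    if rp.generation != seen_generation:
        raise ConcurrentUpdate()
    rp.usage += amount
    rp.generation += 1
    return True


def replace_all(rp, amount, read_usage, read_generation, max_attempts=3):
    # Retry on a concurrent update, re-reading usage and generation each time.
    for _ in range(max_attempts):
        try:
            return set_allocations(rp, read_usage(), read_generation(), amount)
        except ConcurrentUpdate:
            continue
    return False


rp = Provider(total=1)
# request2 reads usage and generation before request1 commits.
stale_usage, stale_generation = rp.usage, rp.generation

# request1 reads fresh state and commits.
assert replace_all(rp, 1, lambda: rp.usage, lambda: rp.generation)

# request2: attempt 1 sees the old generation and hits ConcurrentUpdate; the
# retry re-reads the generation, but (as in the logs) usage is still stale.
generations = iter([stale_generation, rp.generation])
ok = replace_all(rp, 1, lambda: stale_usage, lambda: next(generations))
print(ok, rp.usage)  # True 2
```

If read_usage instead re-read the committed usage (lambda: rp.usage), request2's retry would fail the capacity check and return False, which is the expected behaviour.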
Log snippet: https://paste.openstack.org/show/bhSOAeaeOPi7uourSvyW/
I added logging of the 'records' returned by the query for all three times it is called (1: request1; 2: the first attempt of request2; 3: the second attempt of request2). The result is the same each time: the usage has not changed and is still None, so the capacity check passes.
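For reference, a sketch of the capacity rule the query result feeds into, assuming the usual placement formula capacity = (total - reserved) * allocation_ratio and assuming a NULL usage (no allocation rows visible to the query) is treated as 0. fits() is a hypothetical helper, not the real _check_capacity_exceeded(...), and it ignores min_unit, max_unit and step_size.

```python
from typing import Optional


def fits(total: int, reserved: int, allocation_ratio: float,
         used: Optional[int], requested: int) -> bool:
    """Hypothetical capacity check; min_unit/max_unit/step_size are ignored."""
    capacity = (total - reserved) * allocation_ratio
    # NULL usage (no allocation rows visible to the query) counts as 0.
    current = used or 0
    return current + requested <= capacity


# Usage as seen in all three logged queries: None, so one more unit fits.
print(fits(total=1, reserved=0, allocation_ratio=1.0, used=None, requested=1))  # True
# Usage that should have been visible after request1 committed.
print(fits(total=1, reserved=0, allocation_ratio=1.0, used=1, requested=1))     # False
```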
Could this be some query caching on the MySQL or SQLAlchemy side?