This rarely hits in CI because the window in which we can fail here is so small.
We GET the consumer allocations with a generation here:
https://github.com/openstack/nova/blob/149327a3abb12418cdf65316e7c1d4924767bfdf/nova/scheduler/client/report.py#L1986
Zero out the allocations and then PUT them back here:
https://github.com/openstack/nova/blob/149327a3abb12418cdf65316e7c1d4924767bfdf/nova/scheduler/client/report.py#L2010
So the consumer generation changed between the GET and the PUT, which is an extremely tight window.
I'm trying to think of a way to write a functional test to recreate this type of failure, and it's hard: I'd not only have to wait until the scheduler claims resources to create the allocations, I'd also have to block the server delete until after the GET. So something like:
1. create server
2. block on the claim_resources method
3. delete the server
4. block after the GET /allocations call
5. resume claim_resources to PUT allocations during scheduling
6. resume server delete to PUT empty allocations with a stale generation
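
The ordering in the steps above can be sketched outside of nova with a toy in-memory stand-in for placement and two threads gated by events. This is only an illustration of the interleaving, not a nova functional test: FakePlacement, ConflictError, run_race and the generation numbering are all hypothetical names and simplifications, and nothing here calls the real report client or placement fixtures.

```python
import threading


class ConflictError(Exception):
    """Stands in for a consumer generation conflict (hypothetical)."""


class FakePlacement:
    """Toy in-memory allocations store for one consumer (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._allocations = {}
        self._generation = None  # None until allocations are first created

    def get_allocations(self):
        with self._lock:
            return dict(self._allocations), self._generation

    def put_allocations(self, allocations, consumer_generation):
        with self._lock:
            # Reject the write if the caller's generation is stale.
            if consumer_generation != self._generation:
                raise ConflictError("consumer generation conflict")
            self._allocations = dict(allocations)
            if self._generation is None:
                self._generation = 0
            else:
                self._generation += 1


def run_race():
    placement = FakePlacement()
    claim_may_proceed = threading.Event()
    delete_did_get = threading.Event()
    claim_done = threading.Event()
    result = {}

    def claim_resources():
        # Step 2: the claim is blocked until the delete has done its GET.
        claim_may_proceed.wait(timeout=5)
        # Step 5: PUT allocations during scheduling.
        _, generation = placement.get_allocations()
        placement.put_allocations({"VCPU": 1, "MEMORY_MB": 512}, generation)
        claim_done.set()

    def delete_server():
        # Step 4: GET the allocations, then block until the claim has landed.
        _, generation = placement.get_allocations()
        delete_did_get.set()
        claim_done.wait(timeout=5)
        # Step 6: PUT empty allocations with the now-stale generation.
        try:
            placement.put_allocations({}, generation)
            result["delete"] = "ok"
        except ConflictError:
            result["delete"] = "conflict"

    scheduler = threading.Thread(target=claim_resources)
    deleter = threading.Thread(target=delete_server)
    scheduler.start()  # steps 1-2: server create reaches the blocked claim
    deleter.start()    # step 3: delete the server
    delete_did_get.wait(timeout=5)
    claim_may_proceed.set()
    scheduler.join()
    deleter.join()
    return result["delete"], placement.get_allocations()


if __name__ == "__main__":
    print(run_race())
```

In this toy the delete's PUT fails because the claim bumped the generation after the delete's GET, and the claimed allocations are left behind. A real functional test would presumably need equivalent hooks around claim_resources and the GET /allocations call in the report client, which is the hard part.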