Comment 6 for bug 1836754

Matt Riedemann (mriedem) wrote :

This rarely hits in CI because of the small window where we can fail here.

We GET the consumer allocations with a generation here:

https://github.com/openstack/nova/blob/149327a3abb12418cdf65316e7c1d4924767bfdf/nova/scheduler/client/report.py#L1986

Zero out the allocations and then PUT them back here:

https://github.com/openstack/nova/blob/149327a3abb12418cdf65316e7c1d4924767bfdf/nova/scheduler/client/report.py#L2010

So the consumer generation changed between the GET and the PUT, which is an extremely tight window.
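To make the window concrete, here is a minimal sketch of the generation check using a toy in-memory stand-in for placement. FakePlacement, its methods, the integer generation, and the "rp1" provider name are all hypothetical and are not nova's or placement's real code; the 409 status models the conflict a stale consumer generation is expected to produce.

```python
class FakePlacement:
    """Toy stand-in for the placement allocations API (hypothetical)."""

    def __init__(self):
        self.allocations = {}
        self.generation = 0  # simplified consumer generation

    def get(self):
        # Return a copy of the allocations plus the generation seen at read time.
        return {
            "allocations": dict(self.allocations),
            "consumer_generation": self.generation,
        }

    def put(self, allocations, consumer_generation):
        # Reject the write if the caller's generation is stale.
        if consumer_generation != self.generation:
            return 409
        self.allocations = dict(allocations)
        self.generation += 1
        return 204


placement = FakePlacement()
placement.put({"rp1": {"VCPU": 1}}, 0)  # initial allocations exist

# Delete path: GET the allocations and remember the generation.
current = placement.get()

# Another writer updates the same consumer inside the window.
other_status = placement.put({"rp1": {"VCPU": 2}}, current["consumer_generation"])

# Delete path: PUT empty allocations with the now-stale generation.
delete_status = placement.put({}, current["consumer_generation"])
```

In this toy model the second writer succeeds and the delete's PUT is rejected, which is the shape of the failure described above.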

I'm trying to think of a way to write a functional test to recreate this type of failure, and it's hard: not only would I have to wait until the scheduler claims resources to create the allocations, I'd also have to block the server delete until after the GET. So something like:

1. create server
2. block on the claim_resources method
3. delete the server
4. block after the GET /allocations call
5. resume claim_resources to PUT allocations during scheduling
6. resume server delete to PUT empty allocations with a stale generation
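The ordering in the steps above can be sketched with threading.Event gates. This is only an illustration of the sequencing, not a nova functional test: FakePlacement, run_race, the two worker functions, and the "rp1" provider are hypothetical, and in a real test the gates would presumably be wired in by stubbing the actual methods, which is not shown here.

```python
import threading


class FakePlacement:
    """Toy stand-in for placement; not nova's or placement's real code."""

    def __init__(self):
        self._lock = threading.Lock()
        self.allocations = {}
        self.generation = 0  # simplified consumer generation

    def get(self):
        with self._lock:
            return dict(self.allocations), self.generation

    def put(self, allocations, generation):
        with self._lock:
            if generation != self.generation:
                return 409  # stale consumer generation
            self.allocations = dict(allocations)
            self.generation += 1
            return 204


def run_race():
    placement = FakePlacement()
    claim_may_proceed = threading.Event()  # step 2: holds the claim
    delete_did_get = threading.Event()     # step 4: set once the GET returned
    delete_may_put = threading.Event()     # step 4: holds the delete's PUT
    results = {}

    def claim_resources():
        claim_may_proceed.wait(timeout=5)
        _, generation = placement.get()
        results["claim"] = placement.put({"rp1": {"VCPU": 1}}, generation)

    def delete_server():
        _, generation = placement.get()    # GET /allocations
        delete_did_get.set()
        delete_may_put.wait(timeout=5)
        results["delete"] = placement.put({}, generation)  # stale PUT

    scheduler = threading.Thread(target=claim_resources)  # steps 1-2
    deleter = threading.Thread(target=delete_server)      # step 3
    scheduler.start()
    deleter.start()

    delete_did_get.wait(timeout=5)  # step 4: delete has done its GET
    claim_may_proceed.set()         # step 5: let the claim PUT allocations
    scheduler.join(timeout=5)
    delete_may_put.set()            # step 6: let the delete PUT with stale generation
    deleter.join(timeout=5)
    return results


race_results = run_race()
```

With the gates released in that order, the claim's PUT goes through and the delete's PUT carries the generation it read before the claim, so the toy placement rejects it.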