Hi Matt
To summarize your comments, I think there are three ways to fix this issue:
1. Do what build and evacuate already do (both were fixed before Stein): make a late affinity check in the compute service to catch the race in the scheduler, then either reschedule or, as a last resort, fail. [1] We could also add the same validation to the other move operations.
2. Add an external lock, as the StarlingX code does (possibly replacing it with a distributed lock such as the Tooz library, which Cinder uses but Nova does not).
3. In the long term, model affinity and anti-affinity in the placement service.
For the short term, I'd like to go with the first option. What do you think?
[1] https://github.com/openstack/nova/blob/stable/stein/nova/compute/manager.py#L1358-L1411
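
To make option 1 a bit more concrete, here is a rough, self-contained sketch of what a late anti-affinity check with a reschedule fallback might look like. It is not the Nova code: `RescheduledException`, `validate_anti_affinity`, `move_with_late_check`, and the `host_of` mapping are all hypothetical names made up for illustration, and it ignores locking entirely.

```python
class RescheduledException(Exception):
    """Hypothetical stand-in for the error that triggers a reschedule."""


def validate_anti_affinity(instance_uuid, target_host, group_members, host_of):
    """Late check, as it might run on the destination compute host.

    group_members: UUIDs of every instance in the anti-affinity group.
    host_of: dict mapping instance UUID -> host it currently occupies.
    """
    for member in group_members:
        if member == instance_uuid:
            continue  # an instance never conflicts with itself
        if host_of.get(member) == target_host:
            raise RescheduledException(
                "anti-affinity violated on %s by %s" % (target_host, member))


def move_with_late_check(instance_uuid, candidate_hosts, group_members, host_of):
    """Try each candidate host in order; fail only when none passes."""
    for host in candidate_hosts:
        try:
            validate_anti_affinity(instance_uuid, host, group_members, host_of)
        except RescheduledException:
            continue  # the scheduler raced; fall through to the next alternate
        host_of[instance_uuid] = host
        return host
    raise RuntimeError("no candidate host satisfies the group policy")
```

The point is only that the scheduler's choice is re-validated at the last moment on the compute side, so a race picked up there turns into a reschedule rather than a silent policy violation.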