Comment 28 for bug 1821755

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

Having done further testing, I conclude that the effectiveness of the fix will vary with the OpenStack version and with the number of available threads and the workload on the hypervisor.

The main purpose of the fix is to mitigate the issue by adding locks and additional checks. The code with the fix is still prone to race conditions.

The primary race condition is that two threads running prep_resize (on the dest_node) concurrently will not detect an affinity violation, because the locked code does not set any property after performing the validation. The property is set later, in code that runs on the source node as part of another asynchronous RPC.
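
To illustrate the primary race, here is a minimal sketch in plain Python, not Nova code. All names (`placement`, `run_demo`, the `vm-a`/`vm-b` instances, the `node-1` host) are hypothetical, and the barrier exists only to force the unlucky interleaving deterministically. It assumes an anti-affinity style check: the validation runs under a lock, but the property it reads is only written later, outside the lock.

```python
import threading


def run_demo():
    placement = {}                         # instance -> host; stands in for the property set later
    lock = threading.Lock()                # stands in for the lock added by the fix
    both_validated = threading.Barrier(2)  # forces both checks to finish before any write
    validated = {}

    def prep_resize(instance, dest_host):
        with lock:
            # The check is serialized, but nothing is recorded inside the lock,
            # so the second thread sees exactly the same state as the first.
            ok = dest_host not in placement.values()
            validated[instance] = ok
        both_validated.wait(timeout=5)
        # Stand-in for the later asynchronous RPC that finally sets the property.
        if ok:
            placement[instance] = dest_host

    threads = [
        threading.Thread(target=prep_resize, args=(name, "node-1"))
        for name in ("vm-a", "vm-b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return validated, placement


if __name__ == "__main__":
    print(run_demo())
```

In this sketch both instances pass validation and both end up on the same host. Recording the claim inside the locked section would close this particular window, although whether that is practical in the real code path is a separate question.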

The secondary race condition is that, depending on the workload, even if the threads are not competing at prep_resize, the validation of the second thread may fail if the RPC on the source node has not run yet.