Comment 4 for bug 1631314

Goutham Pacha Ravi (gouthamr) wrote : Re: Tempest test "test_promote_out_of_sync_share_replica" is concurrency-prone

@Aashaka Shah, yes, this bug is still present. However, how often it occurs hasn't been documented of late.

The behavior here, as Valeriy noted, occurs because there's an update thread in the share manager service that polls the replicas and updates their health status. This thread is not controlled via the API.

When an administrator explicitly requests a state change to 'out_of_sync' (for testing purposes) via the reset-replica-state API, the change is made directly in the database. However, the update thread may then run and change the state back to 'in_sync' (because that's what it truly is). The test currently waits for the state 'out_of_sync' and, if the update thread has already reverted it, times out.
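To make the race concrete, here is a toy model of it in Python. It is only a sketch: the dict stands in for the database, and every function name here is made up for illustration, none of it is the actual share manager or Tempest code. The `update_runs_first` flag is an assumption used to force the two possible orderings deterministically.

```python
# Toy model of the race. All names are hypothetical stand-ins,
# not the real share manager or Tempest code.
replica_db = {"replica-1": "in_sync"}  # stand-in for the database row


def backend_reported_state(replica_id):
    # The backend is healthy, so it always reports the true state.
    return "in_sync"


def reset_replica_state(replica_id, state):
    # Mimics the admin reset: the value is written straight to the database.
    replica_db[replica_id] = state


def periodic_replica_update():
    # Mimics the update thread: overwrite each row with what the backend reports.
    for replica_id in replica_db:
        replica_db[replica_id] = backend_reported_state(replica_id)


def wait_for_state(replica_id, expected, polls, update_runs_first):
    # Mimics the test's wait loop. Returns False when it "times out".
    for _ in range(polls):
        if update_runs_first:
            periodic_replica_update()
        if replica_db[replica_id] == expected:
            return True
    return False


# Ordering 1: the test polls before the update thread runs, so it passes.
reset_replica_state("replica-1", "out_of_sync")
lucky = wait_for_state("replica-1", "out_of_sync", polls=3, update_runs_first=False)

# Ordering 2: the update thread runs first and reverts the state, so the wait times out.
reset_replica_state("replica-1", "out_of_sync")
unlucky = wait_for_state("replica-1", "out_of_sync", polls=3, update_runs_first=True)
```

In the second ordering no amount of extra polling helps, because the backend keeps reporting 'in_sync' on every pass.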

We need a way for this test to pass deterministically and still catch real regressions. We can go with the methods I described in my previous response, or somehow make the state change survive these sorts of asynchronous updates from the update thread.
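As a rough illustration of the second idea only, the sketch below shows one hypothetical way a reset could survive the periodic update: the update pass skips replicas whose state was set explicitly. The `pinned` set and all function names are invented for this example. Nothing like this exists in the code today as far as this comment goes, and it is not a proposal for the actual design.

```python
# Hypothetical sketch. The "pinned" set is an invented mechanism,
# not something the share manager implements.
replica_db = {"replica-1": "in_sync"}  # stand-in for the database row
pinned = set()  # replicas whose state was set explicitly by an admin


def reset_replica_state(replica_id, state):
    # Write the requested state and remember that it was set on purpose.
    replica_db[replica_id] = state
    pinned.add(replica_id)


def periodic_replica_update(reported):
    # `reported` maps replica id -> state reported by the backend.
    for replica_id, state in reported.items():
        if replica_id in pinned:
            continue  # leave the admin-set state alone
        replica_db[replica_id] = state


reset_replica_state("replica-1", "out_of_sync")
periodic_replica_update({"replica-1": "in_sync"})
state_after_update = replica_db["replica-1"]
```

The obvious cost of anything along these lines is that something has to clear the pin again, otherwise the replica's real health is never reflected after a reset.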

I'd be glad to help you fix this bug, or provide any details you might need.