BackOffLoopingCall can get stuck in an infinite loop
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
oslo.service | Fix Released | Medium | melanie witt |
Bug Description
This was noticed while working on an intermittent test failure in Nova:
https:/
The test had been using a real backoff timer with the max timeout set low to verify the expected behavior in the event of a timeout. A recent change in Nova raised the timeout value from 0.1s to 1s, and the test began failing intermittently, with the entire test timing out (hitting the 160s overall test timeout). The test (more precisely, the underlying driver being tested) uses a jitter value of 0.5.
The timer should have timed out within 1s, but it didn't. I found that, because of the jitter value of 0.5, if random.gauss() values < 0.5 are picked often enough over time, self._interval can eventually walk to zero. Once it is stuck at zero, the timer is in an infinite loop: to break the loop, self._error_time + idle (idle being the same value as self._interval) has to be > timeout, and with a zero interval that sum stops growing.
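The walk to zero can be reproduced outside oslo.service with a small sketch. This is not the library's code: `run_backoff` and its parameters are hypothetical names, and the update rule (each failed call multiplies the interval by twice a jitter sample) is an assumption inferred from the description above. Only the exit condition, accumulated error time plus the next idle period exceeding the timeout, is taken from the report.

```python
import random


def run_backoff(jitter_samples, starting_interval=1.0, timeout=1.0,
                max_interval=300.0):
    """Replay a simplified backoff loop over the given jitter samples.

    Returns (timed_out, interval, error_time) once the loop times out
    or the samples run out. Every call is treated as a failure.
    """
    interval = starting_interval
    error_time = 0.0
    for sample in jitter_samples:
        # Assumed update rule: each failed call rescales the interval.
        interval = idle = min(interval * 2 * abs(sample), max_interval)
        # The only exit: accumulated error time plus the next idle
        # period must exceed the timeout.
        if timeout > 0 and error_time + idle > timeout:
            return True, interval, error_time
        error_time += idle
    return False, interval, error_time


# Samples above 0.5 grow the interval, so the timeout fires on the first call.
print(run_backoff([0.75] * 10))

# Samples below 0.5 shrink it; 0.25 halves the interval on every call, so it
# underflows to 0.0 and the timeout condition is never met.
print(run_backoff([0.25] * 2000))

# Random draws centred on the jitter of 0.5 from the report; the standard
# deviation of 0.1 is an assumption. The outcome varies from run to run.
rng = random.Random()
draws = (rng.gauss(0.5, 0.1) for _ in range(100_000))
print(run_backoff(draws))
```

The fixed-sample runs are deterministic and show the two regimes. In the shrinking case the idle periods form a series that approaches the timeout without exceeding it, which is why the exit condition is never satisfied once the interval has collapsed.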
Changed in oslo.service:
status: In Progress → Fix Released
importance: Undecided → Medium
Patch is here: https://review.openstack.org/#/c/459790