error in the way we poll for instance states in test (and maybe in regular operation) lead to false positive timeouts
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack DBaaS (Trove) |
In Progress
|
Undecided
|
Unassigned |
Bug Description
Consider the errors reported in
The gist of it is that two instances are being polled for state transition to BUILD and then ACTIVE.
Waiting for states (['BUILD', 'ACTIVE']) for instances: [u'8beac1c9-
What ends up happening is this:
2016-12-19 07:40:12.165091 | 2016-12-19 07:40:12.164 | [REPORT] Waiting for states (['BUILD', 'ACTIVE']) for instances: [u'8beac1c9-
2016-12-19 07:40:12.167477 | 2016-12-19 07:40:12.167 | [REPORT] Waiting for states (['BUILD', 'ACTIVE']) for instance: 8beac1c9-
2016-12-19 07:40:12.185347 | 2016-12-19 07:40:12.184 | [REPORT] Polling instance '8beac1c9-
2016-12-19 07:40:12.187654 | 2016-12-19 07:40:12.187 | [REPORT] Instance state was not 'BUILD', moving to the next expected state.
2016-12-19 07:40:12.209527 | 2016-12-19 07:40:12.209 | [REPORT] Polling instance '8beac1c9-
2016-12-19 07:40:12.212055 | 2016-12-19 07:40:12.211 | [REPORT] Waiting for states (['BUILD', 'ACTIVE']) for instance: ff4a317d-
2016-12-19 07:40:12.231413 | 2016-12-19 07:40:12.230 | [REPORT] Polling instance 'ff4a317d-
2016-12-19 07:40:12.247751 | 2016-12-19 07:40:12.247 | [REPORT] Polling instance '8beac1c9-
2016-12-19 07:40:12.263342 | 2016-12-19 07:40:12.262 | [REPORT] Polling instance 'ff4a317d-
2016-12-19 07:40:12.266257 | 2016-12-19 07:40:12.265 | [REPORT] Instance '8beac1c9-
2016-12-19 07:40:12.284736 | 2016-12-19 07:40:12.283 | [REPORT] Polling instance 'ff4a317d-
2016-12-19 07:40:12.305090 | 2016-12-19 07:40:12.304 | [REPORT] Polling instance 'ff4a317d-
2016-12-19 07:40:12.324840 | 2016-12-19 07:40:12.324 | [REPORT] Polling instance 'ff4a317d-
[...]
2016-12-19 07:40:14.828223 | 2016-12-19 07:40:14.827 | [REPORT] Status of instance 'ff4a317d-
Changed in trove: | |
assignee: | Amrith Kumar (amrith) → nobody |
The race here is as follows
477 def _assert_ instance_ states( self, instance_id, expected_states, status= None, all_states= False): all_states' is not set) the code assumes the log("Waiting for states (%s) for instance: %s" % status= fast_fail_ status) : status= fast_fail_ status) , self.def_ sleep_time, self.def_ timeout) log("Instance '%s' has gone '%s' in %s." % since(start_ time))) PollTimeOut: since(start_ time)))
478 fast_fail_
479 require_
480 """Keep polling for the expected instance states until the instance
481 acquires either the last or fast-fail state.
482
483 If the instance state does not match the state expected at the time of
484 polling (and 'require_
485 instance had already acquired before and moves to the next expected
486 state.
487 """
488
489 self.report.
490 (expected_states, instance_id))
491
492 if fast_fail_status is None:
493 fast_fail_status = ['ERROR', 'FAILED']
494 found = False
495 for status in expected_states:
496 if require_all_states or found or self._has_status(
497 instance_id, status, fast_fail_
498 found = True
499 start_time = timer.time()
500 try:
501 poll_until(lambda: self._has_status(
502 instance_id, status,
503 fast_fail_
504 sleep_time=
505 time_out=
506 self.report.
507 (instance_id, status,
508 self._time_
509 except exception.
510 self.report.log(
511 "Status of instance '%s' did not change to '%s' "
512 "after %s."
513 % (instance_id, status, self._time_
514 return False
515 else:
516 self.report.log(
517 "Instance state was not '%s', moving to the next expected "
518 "state." % status)
519
520 return found
The code at 496 finds the instance to have the right state and goes into the try at 501 but the instance has changed state by that point and the test at 501 now times out.