Stop stack tracing when trying to auto-stop a stopped instance

Commit cc5388bbe81aba635fb757e202d860aeed98f3e8 added locks to
stop_instance and the _sync_power_states periodic task to try to fix a
race. The race is between a stop requested via the API, which sets the
task_state to powering-off, and the periodic task, which sees the
instance power_state as shutdown in _sync_instance_power_state and calls
the stop API again. By then the task_state is already None from the
first stop API call, and we get an UnexpectedTaskStateError.

The handle_lifecycle_event method gets callbacks from the libvirt
driver on VM state changes and calls _sync_instance_power_state, which
may try to stop the instance asynchronously. This leads to an
UnexpectedTaskStateError if the instance is already stopped and the
task_state has changed by the time the stop gets the lock.

Attempting to lock in handle_lifecycle_event just moves the race
around, so this change adds logic to stop_instance: if the instance
says it's active but the virt driver says it's not running, we add None
to the expected_task_state so we don't stack trace on instance.save().
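
The expected_task_state logic described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Nova code: ToyInstance, its stored_task_state field, the string constants, and the simplified stop_instance signature are all hypothetical stand-ins, assumed here only to show how adding None to the expected list avoids the error on save.

```python
from dataclasses import dataclass
from typing import List, Optional

# Simplified string constants standing in for Nova's state enums (assumption).
ACTIVE = "active"
STOPPED = "stopped"
POWERING_OFF = "powering-off"
RUNNING = "running"
SHUTDOWN = "shutdown"


class UnexpectedTaskStateError(Exception):
    """Toy version of the error raised when the stored task_state is unexpected."""


@dataclass
class ToyInstance:
    """Hypothetical stand-in for an instance object backed by a database row."""
    vm_state: str
    stored_task_state: Optional[str]  # what the "database" currently holds
    task_state: Optional[str] = None  # value to write on the next save()

    def save(self, expected_task_state: List[Optional[str]]) -> None:
        # Refuse the update if the stored task_state is not one we expected.
        if self.stored_task_state not in expected_task_state:
            raise UnexpectedTaskStateError(
                "expected %r, found %r"
                % (expected_task_state, self.stored_task_state))
        self.stored_task_state = self.task_state


def stop_instance(instance: ToyInstance,
                  driver_power_state: str) -> List[Optional[str]]:
    """Stop the toy instance and return the task states that were accepted."""
    expected_task_state: List[Optional[str]] = [POWERING_OFF]
    # The instance record says active but the driver says it is not running:
    # this is likely the auto-stop path, where task_state may already be None.
    if instance.vm_state == ACTIVE and driver_power_state != RUNNING:
        expected_task_state.append(None)
    # (A real implementation would power off the guest via the driver here.)
    instance.vm_state = STOPPED
    instance.task_state = None
    instance.save(expected_task_state=expected_task_state)
    return expected_task_state
```

In this sketch, calling stop_instance(ToyInstance(ACTIVE, None), SHUTDOWN) succeeds because None is accepted, while the same instance with a driver power state of RUNNING would still raise UnexpectedTaskStateError.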

An alternative and/or additional change would be to do a call rather
than a cast when _sync_instance_power_state calls the stop API, but in
previous testing that did not appear to make a significant difference
to the race hit in the stop_instance method.
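
For readers unfamiliar with the call/cast distinction mentioned above, here is a toy sketch of the general idea: a cast hands the request off and returns immediately, while a call blocks until the result comes back. ToyRpcClient and the stop function are hypothetical names built on a thread pool for illustration only; they are not the RPC layer Nova uses.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRpcClient:
    """Hypothetical client illustrating cast (fire-and-forget) vs call (blocking)."""

    def __init__(self, executor):
        self._executor = executor

    def cast(self, func, *args):
        # Submit the work and return right away; the caller never sees
        # the result or any exception raised by func.
        self._executor.submit(func, *args)
        return None

    def call(self, func, *args):
        # Submit the work and block until it finishes, returning its result
        # (or re-raising its exception) to the caller.
        return self._executor.submit(func, *args).result()


def stop(instance_id, log):
    """Pretend stop operation that records which instance it handled."""
    log.append(instance_id)
    return "stopped"
```

With a call, the caller knows the stop has finished before it moves on; with a cast, the stop may still be running, or may not have started, when the caller continues.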

This change also adds a bunch of debug logging, since this code is
inherently racy and the logging is needed when looking at failures
around these operations.
Reviewed: https://review.openstack.org/108014
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aa1792eb4c1d10e9a192142ce7e20d37871d916a
Submitter: Jenkins
Branch: master
commit aa1792eb4c1d10e9a192142ce7e20d37871d916a
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 2 12:11:55 2014 -0700
Closes-Bug: #1339235
Closes-Bug: #1266611
Related-Bug: #1320628
Change-Id: Ib495a5ab15de88051c5fa7abfb58a5445691dcad