OpenStack Compute (nova)

test_create_image_with_reboot fails with InstanceInvalidState in gate-nova-python*

Bug #1266611 reported by Matt Riedemann on 2014-01-07

6

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Fix Released	Medium	Matt Riedemann	OpenStack Compute (nova) 2014.2 "juno"

Bug Description

Looks like an intermittent failure:

http://logs.openstack.org/25/64725/4/check/gate-nova-python27/e603e9e/testr_results.html.gz

2014-01-06 21:49:45.870 | Traceback (most recent call last):
2014-01-06 21:49:45.870 | File "nova/tests/api/ec2/test_cloud.py", line 2343, in test_create_image_with_reboot
2014-01-06 21:49:45.870 | self._do_test_create_image(False)
2014-01-06 21:49:45.871 | File "nova/tests/api/ec2/test_cloud.py", line 2316, in _do_test_create_image
2014-01-06 21:49:45.871 | no_reboot=no_reboot)
2014-01-06 21:49:45.871 | File "nova/api/ec2/cloud.py", line 1709, in create_image
2014-01-06 21:49:45.872 | name)
2014-01-06 21:49:45.872 | File "nova/compute/api.py", line 161, in inner
2014-01-06 21:49:45.872 | method=f.__name__)
2014-01-06 21:49:45.873 | InstanceInvalidState: Instance b1d4d924-069c-409c-bbdb-4f0478526057 in task_state powering-off. Cannot snapshot_volume_backed while the instance is in this state.

Tags:

Matt Riedemann (mriedem) on 2014-01-07

summary:

- test_create_image_with_reboot fails in gate-nova-python27
+ test_create_image_with_reboot fails with InstanceInvalidState in gate-
+ nova-python27

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-01-07:

#1

4 hits in the last 7 days:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOigrXCJJbnN0YW5jZUludmFsaWRTdGF0ZTogSW5zdGFuY2VcIiArXCJpbiB0YXNrX3N0YXRlIHBvd2VyaW5nLW9mZi4gQ2Fubm90IHNuYXBzaG90X3ZvbHVtZV9iYWNrZWQgd2hpbGUgdGhlIGluc3RhbmNlIGlzIGluIHRoaXMgc3RhdGUuXCIpIEFORCBmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg5MTIxODUxMjE0fQ==

summary:

test_create_image_with_reboot fails with InstanceInvalidState in gate-
- nova-python27
+ nova-python*

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-01-07:

#2

elastic-recheck query patch is here: https://review.openstack.org/#/c/65344/

Joe Gordon (jogo) on 2014-02-08

Changed in nova:
status:	New → Confirmed
status:	Confirmed → Won't Fix
status:	Won't Fix → Triaged
status:	Triaged → Confirmed
status:	Confirmed → Triaged

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-03-12:

#4

Now hitting it with UnexpectedTaskState here:

http://logs.openstack.org/47/69047/15/check/gate-nova-python26/38dcf86/console.html#_2014-03-09_23_37_24_801

tags:

added: ec2

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-03-12:

#5

Opened bug 1291620 for the other failure.

Matt Riedemann (mriedem) on 2014-06-01

tags:	added: db
tags:	removed: db

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-07-18:

#6

Looks related to bug 1339235.

Changed in nova:
importance:	Undecided → Medium
tags:	added: compute

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-07-18: Fix proposed to nova (master)

#7

Fix proposed to branch: master
Review: https://review.openstack.org/108014

Changed in nova:
assignee:	nobody → Matt Riedemann (mriedem)
status:	Triaged → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-12: Fix merged to nova (master)

#8

Reviewed: https://review.openstack.org/108014
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aa1792eb4c1d10e9a192142ce7e20d37871d916a
Submitter: Jenkins
Branch: master

commit aa1792eb4c1d10e9a192142ce7e20d37871d916a
Author: Matt Riedemann <email address hidden>
Date: Tue Sep 2 12:11:55 2014 -0700

Stop stack tracing when trying to auto-stop a stopped instance

    Commit cc5388bbe81aba635fb757e202d860aeed98f3e8 added locks to
    stop_instance and the _sync_power_states periodic task to try and fix a
    race between stopping the instance via the API where the task_state is
    set to powering-off, and the periodic task seeing the instance
    power_state as shutdown in _sync_instance_power_state and calling the
    stop API again, at which point the task_state is already None from the
    first stop API call and we get an UnexpectedTaskStateError.

    The handle_lifecycle_event method is getting callbacks from the libvirt
    driver on state changes on the VM and calling the
    _sync_instance_power_state method which may try to stop the instance
    asynchronously, and lead to UnexpectedTaskStateError if the instance is
    already stopped by the time it gets the lock and the task_state has
    changed.

    Attempting to lock in handle_lifecycle_event just moves the race around
    so this change adds logic to stop_instance such that if the instance
    says it's active but the virt driver says it's not running, then we add
    None to the expected_task_state so we don't stacktrace on
    instance.save().

    An alternative and/or additional change to this would be doing a call
    rather than a cast when _sync_instance_power_state calls the stop API
    but in some previous testing it doesn't appear to make a significant
    difference in the race found when we hit the stop_instance method.

Adds a bunch of debug logging since this code is inherently racey and
is needed when looking at failures around these operations.

    Closes-Bug: #1339235
    Closes-Bug: #1266611
    Related-Bug: #1320628

Change-Id: Ib495a5ab15de88051c5fa7abfb58a5445691dcad

Reviewed:  https://review.openstack.org/108014
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aa1792eb4c1d10e9a192142ce7e20d37871d916a
Submitter: Jenkins
Branch:    master

commit aa1792eb4c1d10e9a192142ce7e20d37871d916a
Author: Matt Riedemann <mriedem@us.ibm.com>
Date:   Tue Sep 2 12:11:55 2014 -0700

Stop stack tracing when trying to auto-stop a stopped instance
    
    Commit cc5388bbe81aba635fb757e202d860aeed98f3e8 added locks to
    stop_instance and the _sync_power_states periodic task to try and fix a
    race between stopping the instance via the API where the task_state is
    set to powering-off, and the periodic task seeing the instance
    power_state as shutdown in _sync_instance_power_state and calling the
    stop API again, at which point the task_state is already None from the
    first stop API call and we get an UnexpectedTaskStateError.
    
    The handle_lifecycle_event method is getting callbacks from the libvirt
    driver on state changes on the VM and calling the
    _sync_instance_power_state method which may try to stop the instance
    asynchronously, and lead to UnexpectedTaskStateError if the instance is
    already stopped by the time it gets the lock and the task_state has
    changed.
    
    Attempting to lock in handle_lifecycle_event just moves the race around
    so this change adds logic to stop_instance such that if the instance
    says it's active but the virt driver says it's not running, then we add
    None to the expected_task_state so we don't stacktrace on
    instance.save().
    
    An alternative and/or additional change to this would be doing a call
    rather than a cast when _sync_instance_power_state calls the stop API
    but in some previous testing it doesn't appear to make a significant
    difference in the race found when we hit the stop_instance method.
    
    Adds a bunch of debug logging since this code is inherently racey and
    is needed when looking at failures around these operations.
    
    Closes-Bug: #1339235
    Closes-Bug: #1266611
    Related-Bug: #1320628
    
    Change-Id: Ib495a5ab15de88051c5fa7abfb58a5445691dcad

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2014-10-01

Changed in nova:
milestone:	none → juno-rc1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-10-16

Changed in nova:
milestone:	juno-rc1 → 2014.2

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.