Instance stuck in task state image_snapshot

Bug #1101136 reported by Julie Pichon
This bug affects 1 person
Affects                    Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)   Fix Released  Medium      Eoghan Glynn
  Folsom                   Fix Released  Medium      Eoghan Glynn

Bug Description

This is happening in a Folsom environment.

I tried to create a snapshot of a running instance, and got a message back from Horizon reading "Unable to create snapshot" because the administrator had recently changed snapshotting/image creation to be an admin-only action.

Since then, my instance's task state (OS-EXT-STS:task_state) has been stuck at "image_snapshot".

Steps to reproduce:
1. Update the Glance policy file with "add_image": [["role:admin"]]
2. Attempt to create a snapshot

Expected result:
3. Snapshot fails and the instance returns to a normal active state

Actual result:
4. Snapshot fails and the instance task state stays in "image_snapshot"
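Step 1 uses the list-of-lists rule syntax found in policy files of that era. As I understand that syntax, the outer list is an OR of alternatives and each inner list is an AND of checks. So `[["role:admin"]]` passes only for callers holding the admin role. What follows is a minimal sketch of that evaluation, assuming only `role:` checks. `check_rule`, `check_match` and the credentials dict layout are hypothetical names for illustration, not Glance's actual policy engine.

```python
import json

# Hypothetical policy fragment mirroring step 1 of the report.
POLICY_JSON = '{"add_image": [["role:admin"]]}'


def check_match(match, creds):
    """Evaluate one 'kind:value' check; only 'role:' is handled in this sketch."""
    kind, _, value = match.partition(":")
    if kind == "role":
        # This sketch compares role names case-insensitively.
        return value.lower() in (r.lower() for r in creds.get("roles", []))
    raise ValueError("unsupported check in this sketch: %s" % match)


def check_rule(rule, creds):
    """Outer list: any alternative may pass (OR).
    Inner list: every check within that alternative must pass (AND)."""
    return any(all(check_match(m, creds) for m in alternative)
               for alternative in rule)


policy = json.loads(POLICY_JSON)
member = {"roles": ["Member"]}
admin = {"roles": ["admin", "Member"]}

print(check_rule(policy["add_image"], member))  # False: refused (the 403 in this report)
print(check_rule(policy["add_image"], admin))   # True
```

With this rule in place, a non-admin's snapshot request is refused at the image-creation step, which is where the reproduction above goes wrong.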

Eoghan Glynn (eglynn) wrote:

The problem is that the task state reversion logic *only* kicks in if the failure occurs on the compute node.

In this case, however, the API node eagerly attempts to create the image up front, before it even casts the snapshot_instance RPC to the compute node.

This up-front image creation fails with a 403 (Forbidden) because of the RBAC policy settings. Since the failure happens outside the compute node, the task state is never reverted.
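To make the ordering problem concrete, here is a self-contained simulation of the pre-fix flow under a deliberately simplified model. `Forbidden`, `FakeImageService`, `reverts_task_state`, `compute_snapshot_instance` and `api_snapshot_before_fix` are all illustrative stand-ins, not nova's real code. The RPC cast is reduced to a direct call. The sketch shows that the reversion guard wraps only the compute-side call, so a 403 raised earlier in the API layer leaves `task_state` at `"image_snapshot"`.

```python
class Forbidden(Exception):
    """Stands in for the 403 the image service returns under the admin-only policy."""


class FakeImageService:
    """Hypothetical image service that refuses creation when allow_create is False."""

    def __init__(self, allow_create):
        self.allow_create = allow_create

    def create(self, name):
        if not self.allow_create:
            raise Forbidden("403: add_image is admin-only")
        return {"id": "img-1", "name": name}


def reverts_task_state(func):
    """Compute-side guard: reset task_state to None if the wrapped call fails."""
    def wrapper(instance, *args, **kwargs):
        try:
            return func(instance, *args, **kwargs)
        except Exception:
            instance["task_state"] = None
            raise
    return wrapper


@reverts_task_state
def compute_snapshot_instance(instance, image):
    # Stand-in for the work done on the compute node after the RPC cast;
    # on success the task state is cleared.
    instance["task_state"] = None


def api_snapshot_before_fix(instance, image_service, name):
    instance["task_state"] = "image_snapshot"   # state is set first...
    image = image_service.create(name)          # ...then the 403 escapes here,
    compute_snapshot_instance(instance, image)  # so the compute-side guard never runs
    return image


instance = {"uuid": "inst-1", "task_state": None}
try:
    api_snapshot_before_fix(instance, FakeImageService(allow_create=False), "snap")
except Forbidden:
    pass
print(instance["task_state"])  # image_snapshot
```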

Changed in nova:
status: New → Confirmed
assignee: nobody → Eoghan Glynn (eglynn)
milestone: none → grizzly-3
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/20246

Changed in nova:
status: Confirmed → In Progress
Eoghan Glynn (eglynn)
tags: added: folsom-backport-potential
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (master)

Reviewed: https://review.openstack.org/20246
Committed: http://github.com/openstack/nova/commit/d6c527bb6b0ee63fc07b91855fcf4e15f7a03821
Submitter: Jenkins
Branch: master

commit d6c527bb6b0ee63fc07b91855fcf4e15f7a03821
Author: Eoghan Glynn <email address hidden>
Date: Tue Jan 22 15:44:21 2013 +0000

    Avoid stuck task_state on snapshot image failure

    Fixes bug LP 1101136

    Previously if the glance interaction failed prior to an
    instance being snapshot'd or backed up, the task state
    remained stuck at image_snapshot/backup.

    The normal task state reversion logic did not kick in,
    as this is limited to the compute layer, whereas the
    initial glance interaction occurs within the API layer.

    Now, we avoid this problem by delaying setting the task
    state until the initial image creation/retrieval
    has completed.

    Change-Id: Id498ae6b3674306743013e4fe99837da8e2031b5
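The fix described in this commit message amounts to reordering the API-side steps. Below is a minimal sketch under a simplified model in which all names are hypothetical and the RPC cast is reduced to a direct call. The image is created first, and `task_state` is set only after that succeeds, so a 403 leaves the instance untouched. Per the commit message, the same reordering applies to backups.

```python
class Forbidden(Exception):
    """Stands in for the 403 returned by the image service."""


class FakeImageService:
    """Hypothetical image service that refuses creation when allow_create is False."""

    def __init__(self, allow_create):
        self.allow_create = allow_create

    def create(self, name):
        if not self.allow_create:
            raise Forbidden("403: add_image is admin-only")
        return {"id": "img-1", "name": name}


def cast_snapshot_instance(instance, image):
    """Stand-in for the RPC cast; here the 'compute node' just clears task_state."""
    instance["task_state"] = None


def api_snapshot_after_fix(instance, image_service, name):
    # 1. Talk to the image service first; a 403 propagates before any state changes.
    image = image_service.create(name)
    # 2. Only now mark the instance as busy.
    instance["task_state"] = "image_snapshot"
    # 3. Hand off to the compute layer, whose own reversion logic covers later failures.
    cast_snapshot_instance(instance, image)
    return image


instance = {"uuid": "inst-1", "task_state": None}
try:
    api_snapshot_after_fix(instance, FakeImageService(allow_create=False), "snap")
except Forbidden:
    pass
print(instance["task_state"])  # None
```

With `allow_create=False`, the call raises `Forbidden` and the instance's `task_state` is still `None`. With `allow_create=True`, the snapshot proceeds and the stand-in cast clears the state again.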

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/20485

Mark McLoughlin (markmc)
tags: removed: folsom-backport-potential
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/20485
Committed: http://github.com/openstack/nova/commit/21d5e907575a2042f1d0daaa9658a8758f619a1c
Submitter: Jenkins
Branch: stable/folsom

commit 21d5e907575a2042f1d0daaa9658a8758f619a1c
Author: Eoghan Glynn <email address hidden>
Date: Fri Jan 25 15:47:33 2013 +0000

    Avoid stuck task_state on snapshot image failure

    Fixes bug LP 1101136

    Previously if the glance interaction failed prior to an
    instance being snapshot'd or backed up, the task state
    remained stuck at image_snapshot/backup.

    The normal task state reversion logic did not kick in,
    as this is limited to the compute layer, whereas the
    initial glance interaction occurs within the API layer.

    Now, we avoid this problem by delaying setting the task
    state until the initial image creation has completed.

    Change-Id: Id498ae6b3674306743013e4fe99837da8e2031b5

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1