shelved_image_id is deleted before completing unshelving instance on compute node

Bug #1427056 reported by Pranali Deore
This bug affects 6 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | Medium | Pranali Deore |
Kilo | Fix Released | Undecided | Unassigned |

Bug Description

Steps to reproduce:

1. Boot an instance from image.
2. Shelve the instance. It moves to the SHELVED or SHELVED_OFFLOADED state, depending on the 'shelved_offload_time' value configured in nova.conf.
3. Unshelve the instance.

   For shelved_offload_time >= 0:
   3.1 nova-conductor makes an RPC cast to nova-compute. Suppose nova-compute then fails, e.g. with an "Instance failed to spawn" error from libvirt.
   3.2 nova-conductor deletes instance_system_metadata.shelved_image_id right after the RPC cast that unshelves the instance, without waiting for the result.
   3.3 revert_task_state returns the instance to SHELVED_OFFLOADED, but instance_system_metadata.shelved_image_id has already been deleted for this instance.

   For shelved_offload_time = -1:
   3.1 nova-conductor makes an RPC cast to nova-compute. Suppose nova-compute then fails, e.g. with an "InstanceNotFound" error while starting the instance.
   3.2 nova-conductor deletes the snapshot and instance_system_metadata.shelved_image_id right after the RPC cast that starts the instance, without waiting for the result.
   3.3 revert_task_state returns the instance to SHELVED, but the snapshot and instance_system_metadata.shelved_image_id have already been deleted for this instance.

Problems:
1. Because shelved_image_id is gone, unshelving the instance again fails with an error while the libvirt driver fetches the image metadata, and the instance remains in the SHELVED_OFFLOADED state.

2. Because shelved_image_id is gone, deleting the instance tries to delete the image with "image_id=None" from glance. Glance returns a 404 error, the instance is still deleted successfully, and the shelved image is left behind.
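
The sequence above can be sketched as a small, self-contained Python model. This is only an illustration of the ordering problem, not nova code: the dict-based instance, the function names, the "snap-1" image id and the in-memory glance_images store are all assumptions made for the example.

```python
# Toy model of the reported ordering problem; none of this is nova code.

def make_shelved_instance():
    # Hypothetical instance record holding only the fields the report mentions.
    return {
        "vm_state": "SHELVED_OFFLOADED",
        "system_metadata": {"shelved_image_id": "snap-1", "image_id": "base"},
    }


def compute_unshelve(instance, spawn_ok):
    """Stand-in for nova-compute; on failure the state is reverted."""
    instance["vm_state"] = "ACTIVE" if spawn_ok else "SHELVED_OFFLOADED"


def conductor_unshelve_buggy(instance, spawn_ok):
    """Stand-in for nova-conductor before the fix."""
    # A cast is fire-and-forget: the caller never learns whether spawn failed.
    compute_unshelve(instance, spawn_ok)
    # Cleanup runs unconditionally, which is the problem described above.
    metadata = instance["system_metadata"]
    for key in [k for k in metadata if k.startswith("shelved_")]:
        del metadata[key]


def delete_instance(instance, glance_images):
    """Delete the instance and try to remove its shelved image."""
    image_id = instance["system_metadata"].get("shelved_image_id")
    # With the key gone this looks up None, mirroring the 404 in problem 2.
    return glance_images.pop(image_id, None) is not None


instance = make_shelved_instance()
glance_images = {"snap-1": b"snapshot data"}

conductor_unshelve_buggy(instance, spawn_ok=False)
image_removed = delete_instance(instance, glance_images)

print(instance["vm_state"], instance["system_metadata"])
print(image_removed, list(glance_images))
```

In this model the failed unshelve leaves the instance in SHELVED_OFFLOADED without a shelved_image_id, and the later delete cannot find the snapshot, so it stays in the image store.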

Changed in nova:
assignee: nobody → Pranali Deore (pranali-deore)
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/160658

Changed in nova:
status: Confirmed → In Progress
description: updated
description: updated
Changed in nova:
assignee: Pranali Deore (pranali-deore) → Abhishek Kekane (abhishek-kekane)
assignee: Abhishek Kekane (abhishek-kekane) → nobody
Changed in nova:
assignee: nobody → Pranali Deore (pranali-deore)
Changed in nova:
assignee: Pranali Deore (pranali-deore) → Abhishek Kekane (abhishek-kekane)
Changed in nova:
assignee: Abhishek Kekane (abhishek-kekane) → Pranali Deore (pranali-deore)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/160658
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b1fbbd375ccbecf8d70db633270fd7063728f05c
Submitter: Jenkins
Branch: master

commit b1fbbd375ccbecf8d70db633270fd7063728f05c
Author: PranaliDeore <email address hidden>
Date: Mon Mar 2 03:48:04 2015 -0800

    Delete shelved_* keys in n-cpu unshelve call

    During unshelve process, if some failure happens in compute manager,
    instance_system_metadata.shelved_image_id gets deleted in
    unshelve_instance call of conductor manager after RPC.cast to
    compute manager and instance remains in SHELVED_OFFLOADED or SHELVED
    state.
    It leads to below problems,
    1. Unshelving the instance again, it gives error while retrieving
    image-meta in libvirt driver.
    2. As there is no shelved_image_id, deleting the instance will try to
    delete "image_id=None" image from glance, but 404 error will be returned
    from glance.

    Removed shelved_* keys deletion code from unshelve_instance call of
    conductor manager and added it to unshelve_instance call of compute
    manager after spawning the instance.

    Also removed snapshot deletion code from unshelve_instance call of
    conductor manager and added it to start_instance call of compute
    manager.

    Closes-Bug: #1427056
    Change-Id: Iead979b5f6d1519cae58cc494f49e2fd8f323dd5
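
The reordering described in the commit message can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the merged patch: SpawnError, the dict-based instance, and every function name here are hypothetical. The point it shows is that the shelved_* cleanup runs on the compute side only after spawn succeeds, so a failed attempt can be retried.

```python
# Toy model of the fixed ordering; names and data shapes are hypothetical.

class SpawnError(Exception):
    """Stand-in for a driver failure such as 'Instance failed to spawn'."""


def clear_shelved_keys(system_metadata):
    # Remove every shelved_* key, as the commit message describes.
    for key in [k for k in system_metadata if k.startswith("shelved_")]:
        del system_metadata[key]


def compute_unshelve_fixed(instance, spawn):
    """Stand-in for the compute side: clean up only after a successful spawn."""
    try:
        spawn(instance)
    except SpawnError:
        # State is reverted and the shelved_* keys are left untouched.
        instance["vm_state"] = "SHELVED_OFFLOADED"
        return False
    clear_shelved_keys(instance["system_metadata"])
    instance["vm_state"] = "ACTIVE"
    return True


def conductor_unshelve_fixed(instance, spawn):
    """Stand-in for the conductor side: it casts and does no cleanup itself."""
    compute_unshelve_fixed(instance, spawn)


def failing_spawn(instance):
    raise SpawnError("Instance failed to spawn")


def working_spawn(instance):
    pass


instance = {
    "vm_state": "SHELVED_OFFLOADED",
    "system_metadata": {"shelved_image_id": "snap-1", "image_id": "base"},
}

conductor_unshelve_fixed(instance, failing_spawn)
state_after_failure = instance["vm_state"]
metadata_after_failure = dict(instance["system_metadata"])

# Because the key survived the failure, a second attempt can proceed.
conductor_unshelve_fixed(instance, working_spawn)

print(state_after_failure, metadata_after_failure)
print(instance["vm_state"], instance["system_metadata"])
```

Keeping the cleanup next to the step that can fail means no component has to guess the outcome of an asynchronous cast.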

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/183404

Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Matt Riedemann (mriedem)
Changed in nova:
importance: Low → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/183404
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aaaf7cebcad2c88876e0455c6569a09d04309a19
Submitter: Jenkins
Branch: stable/kilo

commit aaaf7cebcad2c88876e0455c6569a09d04309a19
Author: PranaliDeore <email address hidden>
Date: Mon Mar 2 03:48:04 2015 -0800

    Delete shelved_* keys in n-cpu unshelve call

    During unshelve process, if some failure happens in compute manager,
    instance_system_metadata.shelved_image_id gets deleted in
    unshelve_instance call of conductor manager after RPC.cast to
    compute manager and instance remains in SHELVED_OFFLOADED or SHELVED
    state.
    It leads to below problems,
    1. Unshelving the instance again, it gives error while retrieving
    image-meta in libvirt driver.
    2. As there is no shelved_image_id, deleting the instance will try to
    delete "image_id=None" image from glance, but 404 error will be returned
    from glance.

    Removed shelved_* keys deletion code from unshelve_instance call of
    conductor manager and added it to unshelve_instance call of compute
    manager after spawning the instance.

    Also removed snapshot deletion code from unshelve_instance call of
    conductor manager and added it to start_instance call of compute
    manager.

    Closes-Bug: #1427056
    Change-Id: Iead979b5f6d1519cae58cc494f49e2fd8f323dd5
    (cherry picked from commit b1fbbd375ccbecf8d70db633270fd7063728f05c)

Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0