Cannot delete an instance that failed a previous delete

Bug #1329559 reported by Joe Gordon
This bug affects 12 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Critical    Unassigned
Icehouse                  Fix Released  Undecided   Unassigned

Bug Description

Currently, if an instance fails to delete, we do not revert its state
as we do in most places; instead we set it to error,deleting (vm_state
error, task_state deleting). This was done intentionally in
https://review.openstack.org/#/c/58829/ . We also intentionally ignore
duplicate requests to delete an instance if it's already being deleted
(https://review.openstack.org/#/c/55444/). The combination of these two
things means that if an instance fails to delete for some reason, a
tenant is unable to delete that instance.

This is really bad because instances in deleting state count against
quota, so the tenant slowly loses usable quota.

To fix this, allow duplicate delete calls to go through if the instance
is in error state.
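
The interaction described above can be sketched as a small model. This is not nova code: `Instance`, `request_delete`, `quota_usage`, and the `allow_retry_in_error` flag are hypothetical names chosen for illustration, and the state strings are simplified stand-ins. It only shows why the duplicate-delete guard plus the error,deleting state leaves an instance stuck, and how letting duplicates through in error state would unblock it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

ERROR = "error"
DELETING = "deleting"


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for an instance record.
    vm_state: str = "active"
    task_state: Optional[str] = None
    deleted: bool = False


def request_delete(instance: Instance,
                   backend_delete: Callable[[], None],
                   allow_retry_in_error: bool = False) -> bool:
    """Return True if the request was acted on, False if it was ignored."""
    already_deleting = instance.task_state == DELETING
    retryable = allow_retry_in_error and instance.vm_state == ERROR
    if already_deleting and not retryable:
        return False  # duplicate delete request is ignored
    instance.task_state = DELETING
    try:
        backend_delete()
    except Exception:
        # Behaviour described in the report: the failure leaves the
        # instance in (error, deleting) instead of reverting its state.
        instance.vm_state = ERROR
        return True
    instance.deleted = True
    instance.task_state = None
    return True


def quota_usage(instances: Iterable[Instance]) -> int:
    # Anything not actually deleted still counts against quota.
    return sum(1 for i in instances if not i.deleted)
```

With the flag off, a second `request_delete` on a failed instance returns `False` and the instance keeps consuming quota; with the flag on, the retry is allowed to run.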

Joe Gordon (jogo) wrote :

Marking as critical because this is hurting openstack-infra in a big way: they have many instances stuck in this state.

Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
Matt Riedemann (mriedem) wrote :

This was already reported in bug 1299139 with a lot more details, but that's fine since this has the patch.

tags: added: api compute
tags: added: icehouse-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/99796
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f33a25a3c40722644c774395b38fd7a7ed0246e1
Submitter: Jenkins
Branch: master

commit f33a25a3c40722644c774395b38fd7a7ed0246e1
Author: Joe Gordon <email address hidden>
Date: Thu Jun 12 16:27:07 2014 -0700

    Failure during termination should always leave state as error()

    Currently we have a situation where if an instance fails to delete,
    instead of having its state reverted, like we do in most places we set
    it to error,deleting. This was intentionally done in
    I5fb1bbd56035792f566a6e076edfe7a69df006ef. We also intentionally ignore
    duplicate requests to delete an instance if it's already being deleted
    (I2f97f93bd714e0ea3b6d4fa3ac457ab43eed00e1). The combination of these two
    things means that if an instance fails to delete for some reason a
    tenant is unable to delete that instance.

    It turns out this is really bad because instances in deleting state
    count against quota, so the tenant slowly loses usable quota.

    To fix this, upon a failed termination set the vm_state to error and
    revert the task_state. This is a partial revert of
    I55742203bdd071c7df90902868e46c2020f799bd.

    Change-Id: I55742203bdd071c7df90902868e46c2020f799bd
    Closes-Bug: #1329559
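
The approach in the commit message above (set vm_state to error and revert task_state on a failed termination) might look roughly like the following. This is a minimal sketch, not the merged patch: `Instance` and `terminate` are hypothetical names, and "revert the task_state" is modeled here as resetting it to `None`, which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for an instance record.
    vm_state: str = "active"
    task_state: Optional[str] = None
    deleted: bool = False


def terminate(instance: Instance, backend_delete: Callable[[], None]) -> bool:
    """Return True if the request was acted on, False if it was ignored."""
    if instance.task_state == "deleting":
        return False  # a delete is already in flight; ignore the duplicate
    instance.task_state = "deleting"
    try:
        backend_delete()
    except Exception:
        # Failed termination: mark the instance as errored and revert
        # task_state (assumed to mean None) so a later delete is accepted.
        instance.vm_state = "error"
        instance.task_state = None
        return True
    instance.vm_state = "deleted"
    instance.task_state = None
    instance.deleted = True
    return True
```

Because the failure path clears `task_state`, the duplicate-delete guard no longer blocks a retry, while a delete that is genuinely still in progress is still ignored.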

Changed in nova:
status: Confirmed → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/100469

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/100469
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2f40191d0efa783ca879338578097548ee8c84c0
Submitter: Jenkins
Branch: stable/icehouse

commit 2f40191d0efa783ca879338578097548ee8c84c0
Author: Joe Gordon <email address hidden>
Date: Thu Jun 12 16:27:07 2014 -0700

    Failure during termination should always leave state as error()

    Currently we have a situation where if an instance fails to delete,
    instead of having its state reverted, like we do in most places we set
    it to error,deleting. This was intentionally done in
    I5fb1bbd56035792f566a6e076edfe7a69df006ef. We also intentionally ignore
    duplicate requests to delete an instance if it's already being deleted
    (I2f97f93bd714e0ea3b6d4fa3ac457ab43eed00e1). The combination of these two
    things means that if an instance fails to delete for some reason a
    tenant is unable to delete that instance.

    It turns out this is really bad because instances in deleting state
    count against quota, so the tenant slowly loses usable quota.

    To fix this, upon a failed termination set the vm_state to error and
    revert the task_state. This is a partial revert of
    I55742203bdd071c7df90902868e46c2020f799bd.

    Change-Id: I55742203bdd071c7df90902868e46c2020f799bd
    Closes-Bug: #1329559
    (cherry picked from commit f33a25a3c40722644c774395b38fd7a7ed0246e1)

tags: added: in-stable-icehouse
Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Chuck Short (zulcss)
tags: removed: icehouse-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → 2014.2