Unable to delete instance when cinder is down

Bug #1563547 reported by Diana Clarke
This bug affects 1 person
Affects                     Status         Importance   Assigned to    Milestone
OpenStack Compute (nova)    Fix Released   Medium       Diana Clarke
  Liberty                   Fix Released   Medium       Diana Clarke
  Mitaka                    Fix Released   Medium       Diana Clarke

Bug Description

When a volume is attached to an instance and cinder is down, the instance cannot be deleted, and its status becomes ERROR.

This bug is reproducible on master (currently Newton) using devstack.

    1. Create an instance
    2. Create a volume
    3. Attach volume to instance
    4. Stop the cinder API service from its devstack screen window
    5. Attempt to delete the instance
    6. Note that the instance is not deleted
    7. Note that the instance state is ERROR

For example:

    http://paste.openstack.org/show/492359/

This bug was initially reported downstream here:

    https://bugzilla.redhat.com/show_bug.cgi?id=1318883

Changed in nova:
assignee: nobody → Diana Clarke (diana-clarke)
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/298997

Changed in nova:
status: Confirmed → In Progress
description: updated
Markus Zoeller (markus_z) (mzoeller) wrote :

@Diana:
Can you force the delete with "nova force-delete <server>" [1]?

References:
[1] http://docs.openstack.org/cli-reference/nova.html#nova-force-delete

Diana Clarke (diana-clarke) wrote :

@Markus:

Here are my 'force-delete' notes, using devstack master (currently newton), with the cinder API down during the 'force-delete' call:

    http://paste.openstack.org/show/492437/

Summary: The instance is not deleted, the resulting instance status is ERROR, and there is a stack trace in the n-cpu logs along with the message: "Successfully reverted task state from deleting on failure for instance".
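
The "Successfully reverted task state ... on failure" message suggests that the delete was abandoned at the first cinder error and the instance's task state was rolled back rather than the workflow continuing. The following is a minimal, hypothetical sketch of that failure mode, not nova's code: `Instance`, `CinderUnavailable`, `detach_volumes`, `delete_instance`, and the `reverts_task_state` decorator are illustrative stand-ins (the decorator name only echoes the log message). In this simplified version, the decorator also marks the instance ERROR to match the observed status.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger(__name__)


class Instance:
    """Hypothetical, minimal stand-in for an instance record."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.vm_state = "active"
        self.task_state = None


class CinderUnavailable(Exception):
    """Hypothetical error raised when the volume API cannot be reached."""


def reverts_task_state(func):
    """On any failure: mark the instance ERROR, clear task_state, re-raise."""

    @functools.wraps(func)
    def wrapper(instance, *args, **kwargs):
        try:
            return func(instance, *args, **kwargs)
        except Exception:
            previous = instance.task_state
            instance.vm_state = "error"
            instance.task_state = None
            LOG.info("Reverted task state from %s on failure for instance %s",
                     previous, instance.uuid)
            raise

    return wrapper


def detach_volumes(instance):
    # Simulates the volume API call failing because cinder is down.
    raise CinderUnavailable("cinder-api unreachable")


@reverts_task_state
def delete_instance(instance):
    instance.task_state = "deleting"
    detach_volumes(instance)  # raises, so the line below never runs
    instance.vm_state = "deleted"


inst = Instance("demo-uuid")
try:
    delete_instance(inst)
except CinderUnavailable:
    pass  # the delete was abandoned; the caller only sees the error
print(inst.vm_state, inst.task_state)  # error None
```

The key point is that one unhandled exception from the volume step ends the whole delete, so none of the later clean-up runs.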

Sean Dague (sdague) wrote :

The Bugzilla report is not public; can that be changed?

Stephen Gordon (sgordon) wrote :

Unfortunately the original bugzilla description contained some customer identifying information. I have posted a version of the description, with modifications to protect the innocent, as a public comment and opened the permissions on the bug. The modifications are represented by the injection of a <placeholder> tag.

Markus Zoeller (markus_z) (mzoeller) wrote :

@Diana:
Thanks for double-checking. Apparently I interpreted the command wrongly. My assumption was that it deletes instances regardless of errors from other services, but commit [1] explains that it was introduced because of duplicate AMQP messages.
I assume the case you observed is rare, hence I set the importance to "Low".

References:
[1] https://git.openstack.org/cgit/openstack/nova/commit/?id=222d445

Changed in nova:
importance: Undecided → Low
Diana Clarke (diana-clarke) wrote :

@Markus:

My apologies if I'm missing the obvious, but it's unclear to me how your last two comments relate to this particular bug report.

Markus Zoeller (markus_z) (mzoeller) wrote :

@Diana:
Initially I wanted to figure out if there is an alternative route to delete an instance when Cinder is down. I was under the impression that this could have been important for setting the "importance" field of this bug report (comment #2).
On second thought (comment #6), it occurred to me that this was not crucial for setting the "importance" field.
Hopefully this explains my comments and doesn't add even more confusion. If it does, just ignore my comments; they don't help in solving this bug.

Diana Clarke (diana-clarke) wrote :

@Markus: Ah, I see now. Thanks for taking the time to explain. Happy Friday!

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/300785

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/298997
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=27e869bb88f13d05b94107a769f67ddd6975425d
Submitter: Jenkins
Branch: master

commit 27e869bb88f13d05b94107a769f67ddd6975425d
Author: Diana Clarke <email address hidden>
Date: Tue Mar 29 17:48:33 2016 -0400

    Fix: unable to delete instance when cinder is down

    Ignore any cinder related exceptions that arise when you try to delete
    an instance, so that the delete instance workflow isn't abruptly
    interrupted (skipping necessary clean-up, etc).

    In this case, we are intentionally catching all exceptions in addition
    to a specific set of exceptions, so that the instance and related
    resources aren't left in an unmanageable state when an unexpected
    exception occurs. For example, currently you can't delete an instance
    that's attached to a volume when cinder is down.

    The existing error messages were also updated to include the block
    device mapping volume id that triggered the exception.

    Change-Id: Iba7b4cc4b59f88b0817c4618e7a4429161d6c2a9
    Closes-Bug: #1563547
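
The commit message describes a per-volume try/except that handles a few specific exceptions plus a deliberate catch-all, logs the offending volume id, and lets the delete continue. Below is a minimal, hypothetical sketch of that pattern, not the merged nova code: `BlockDeviceMapping`, `VolumeNotFound`, `cleanup_volumes`, `delete_instance`, and the `detach` callback are illustrative stand-ins, not nova's actual names or signatures.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger(__name__)


class VolumeNotFound(Exception):
    """Hypothetical 'expected' volume error."""


@dataclass
class BlockDeviceMapping:
    """Hypothetical, minimal block device mapping record."""
    volume_id: str


def cleanup_volumes(bdms, detach):
    """Detach every volume without letting one failure abort the loop.

    Returns the volume ids whose clean-up failed.
    """
    failed = []
    for bdm in bdms:
        try:
            detach(bdm.volume_id)
        except VolumeNotFound as exc:
            # Known, specific error: log quietly with the volume id.
            LOG.debug("Ignoring VolumeNotFound for volume %s: %s",
                      bdm.volume_id, exc)
            failed.append(bdm.volume_id)
        except Exception as exc:
            # Deliberate catch-all, so an unexpected error (e.g. cinder
            # being down) cannot interrupt the rest of the delete.
            LOG.warning("Ignoring unknown exception for volume %s: %s",
                        bdm.volume_id, exc)
            failed.append(bdm.volume_id)
    return failed


def delete_instance(bdms, detach):
    """Hypothetical delete workflow; volume errors no longer stop it."""
    failed = cleanup_volumes(bdms, detach)
    # ... the remaining clean-up would run here ...
    return "deleted", failed


def flaky_detach(volume_id):
    # Simulates one missing volume and an unreachable cinder API.
    if volume_id == "vol-1":
        raise VolumeNotFound(volume_id)
    raise ConnectionError("cinder-api unreachable")


state, failed = delete_instance(
    [BlockDeviceMapping("vol-1"), BlockDeviceMapping("vol-2")], flaky_detach)
print(state, failed)  # deleted ['vol-1', 'vol-2']
```

Catching `Exception` this broadly is usually discouraged. In this sketch, as in the commit message, it is intentional: leaving the instance in an unmanageable state is treated as worse than logging and ignoring a volume error.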

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/300841

Matt Riedemann (mriedem)
Changed in nova:
importance: Low → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/300785
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=167bd7002889d23da76ec8abf770a1a2ef940dac
Submitter: Jenkins
Branch: stable/mitaka

commit 167bd7002889d23da76ec8abf770a1a2ef940dac
Author: Diana Clarke <email address hidden>
Date: Tue Mar 29 17:48:33 2016 -0400

    Fix: unable to delete instance when cinder is down

    Ignore any cinder related exceptions that arise when you try to delete
    an instance, so that the delete instance workflow isn't abruptly
    interrupted (skipping necessary clean-up, etc).

    In this case, we are intentionally catching all exceptions in addition
    to a specific set of exceptions, so that the instance and related
    resources aren't left in an unmanageable state when an unexpected
    exception occurs. For example, currently you can't delete an instance
    that's attached to a volume when cinder is down.

    The existing error messages were also updated to include the block
    device mapping volume id that triggered the exception.

    Change-Id: Iba7b4cc4b59f88b0817c4618e7a4429161d6c2a9
    Closes-Bug: #1563547
    (cherry picked from commit 27e869bb88f13d05b94107a769f67ddd6975425d)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/300841
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4d1384b1da0a84646c775993e170e52792cc8f37
Submitter: Jenkins
Branch: stable/liberty

commit 4d1384b1da0a84646c775993e170e52792cc8f37
Author: Diana Clarke <email address hidden>
Date: Tue Mar 29 17:48:33 2016 -0400

    Fix: unable to delete instance when cinder is down

    Ignore any cinder related exceptions that arise when you try to delete
    an instance, so that the delete instance workflow isn't abruptly
    interrupted (skipping necessary clean-up, etc).

    In this case, we are intentionally catching all exceptions in addition
    to a specific set of exceptions, so that the instance and related
    resources aren't left in an unmanageable state when an unexpected
    exception occurs. For example, currently you can't delete an instance
    that's attached to a volume when cinder is down.

    The existing error messages were also updated to include the block
    device mapping volume id that triggered the exception.

    Change-Id: Iba7b4cc4b59f88b0817c4618e7a4429161d6c2a9
    Closes-Bug: #1563547
    (cherry picked from commit 27e869bb88f13d05b94107a769f67ddd6975425d)

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/nova 12.0.3

This issue was fixed in the openstack/nova 12.0.3 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/nova 14.0.0.0b1

This issue was fixed in the openstack/nova 14.0.0.0b1 development milestone.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.1.0

This issue was fixed in the openstack/nova 13.1.0 release.
