Volume remains in-use status, if instance booted from volume is deleted in error state

Bug #1404867 reported by Abhishek Kekane on 2014-12-22
This bug affects 5 people
Affects: OpenStack Compute (nova)
Importance: Low
Assigned to: melanie witt

Bug Description

If an instance is booted from a volume and goes into the error state for some reason,
the volume it was booted from remains in the in-use state even after the instance is deleted.
IMO, the volume should be detached so that it can be used to boot another instance.

Steps to reproduce:

1. Log in to Horizon, create a new volume.
2. Create an Instance using newly created volume.
3. Verify instance is in active state.
$ source devstack/openrc demo demo
$ nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | ACTIVE | -          | Running     | private=10.0.0.3 |
+--------------------------------------+------+--------+------------+-------------+------------------+

Note:
Use the shelve/unshelve API to put the instance into the error state.
Unshelving a volume-backed instance does not work and sets the instance to the error state (ref: https://bugs.launchpad.net/nova/+bug/1404801).

4. Shelve the instance
$ nova shelve <instance-uuid>

5. Verify the status is SHELVED_OFFLOADED.
$ nova list
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| ID                                   | Name | Status            | Task State | Power State | Networks         |
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | SHELVED_OFFLOADED | -          | Shutdown    | private=10.0.0.3 |
+--------------------------------------+------+-------------------+------------+-------------+------------------+

6. Unshelve the instance.
$ nova unshelve <instance-uuid>

7. Verify the instance is in the Error state.
$ nova list
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| ID                                   | Name | Status            | Task State | Power State | Networks         |
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | Error             | unshelving | Spawning    | private=10.0.0.3 |
+--------------------------------------+------+-------------------+------------+-------------+------------------+

8. Delete the instance using Horizon.

9. Verify that the volume is still in the in-use state.
$ cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| 4aeefd25-10aa-42c2-9a2d-1c89a95b4d4f | in-use | test | 1    | lvmdriver-1 | true     | 8f7bdc24-1891-4bbb-8f0c-732b9cbecae7 |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

10. In Horizon, the volume's "Attached To" information is displayed as "Attached to None on /dev/vda".

11. The user is not able to delete this volume or attach it to another instance, because it is still in use.
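
The stuck state reached at the end of these steps can be detected mechanically. The following is only a minimal sketch that works on plain data rather than on real client calls: the Volume class and the find_stuck_volumes helper are hypothetical names invented for illustration, and the second volume ID is made up. The first volume mirrors the row from the cinder list output above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Hypothetical stand-in for one row of `cinder list` output."""
    id: str
    status: str                  # e.g. "available" or "in-use"
    attached_to: Optional[str]   # server UUID recorded on the attachment


def find_stuck_volumes(volumes, live_server_ids):
    """Return volumes marked in-use whose recorded server no longer exists."""
    live = set(live_server_ids)
    return [
        v for v in volumes
        if v.status == "in-use" and v.attached_to not in live
    ]


volumes = [
    Volume("4aeefd25-10aa-42c2-9a2d-1c89a95b4d4f", "in-use",
           "8f7bdc24-1891-4bbb-8f0c-732b9cbecae7"),
    Volume("hypothetical-free-volume", "available", None),
]

# The instance has been deleted, so no live server matches the attachment.
stuck = find_stuck_volumes(volumes, live_server_ids=[])
print([v.id for v in stuck])
```

In a real deployment the two inputs would presumably come from the volume and server listings; how they are fetched is left out here on purpose.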

description: updated
Changed in nova:
assignee: nobody → Abhishek Kekane (abhishek-kekane)
Tushar Patil (tpatil) on 2014-12-22
summary: - Volume remains in-use status, if instance booted from volume deleted
- when it is in the error state
+ Volume remains in-use status, if instance booted from volume is deleted
+ in error state
Liyingjun (liyingjun) on 2014-12-23
Changed in nova:
status: New → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/145738

Changed in nova:
assignee: Abhishek Kekane (abhishek-kekane) → Ankit Agrawal (ankitagrawal)
status: Confirmed → In Progress
Changed in nova:
importance: Undecided → Low
melanie witt (melwitt) on 2015-04-20
tags: added: compute
removed: ntt

Reviewed: https://review.openstack.org/145738
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d1baa9fe7eb342b63fc85cbb5ef70bb676de6566
Submitter: Jenkins
Branch: master

commit d1baa9fe7eb342b63fc85cbb5ef70bb676de6566
Author: ankitagrawal <email address hidden>
Date: Tue Dec 23 06:34:32 2014 -0800

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance is booted
    remains in the in-use state even after the instance is deleted, because
    the instance has no host associated with it.

    Called _local_delete() to detach volume and destroy bdm if instance is
    in shelved_offloaded state or has no host associated with it. This will
    cleanup both volumes and the networks.

    Currently in test_servers.py, "test_delete_server_instance" executes
    similar to "test_delete_server_instance_while_building". This is because
    "test_delete_server_instance" calls instance.save() method which updates
    vm_state to building where it should be in active state.

    Fixed "test_delete_server_instance" to test deleting an instance which
    is in active state and has a valid host.

    Closes-Bug: #1404867
    Closes-Bug: #1408527
    Change-Id: Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
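
As a rough illustration of what the commit message above describes, here is a toy model, not nova code. The Volume and Instance classes and both delete functions are hypothetical; the only behaviour taken from this report is that deleting a host-less instance leaves the volume in-use, and that the proposed local delete detaches the volume as part of the deletion.

```python
class Volume:
    """Toy volume that only tracks its status."""

    def __init__(self, name):
        self.name = name
        self.status = "available"


class Instance:
    """Toy instance; attaching the boot volume marks it in-use."""

    def __init__(self, host, vm_state, volume):
        self.host = host
        self.vm_state = vm_state
        self.volume = volume
        self.deleted = False
        volume.status = "in-use"


def delete_without_cleanup(instance):
    """Behaviour reported in this bug: the record goes, the attachment stays."""
    instance.deleted = True


def local_delete(instance):
    """Sketch of the proposed fix: detach the volume, then remove the record."""
    instance.volume.status = "available"
    instance.volume = None
    instance.deleted = True


# Host-less instance in the error state, deleted without cleanup.
vol = Volume("test")
inst = Instance(host=None, vm_state="error", volume=vol)
delete_without_cleanup(inst)
print(vol.status)

# Same situation, but deleted through the local-delete path.
vol2 = Volume("test2")
inst2 = Instance(host=None, vm_state="error", volume=vol2)
local_delete(inst2)
print(vol2.status)
```

The model ignores networks and block device mappings, which the commit message says are cleaned up as well.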

Changed in nova:
status: In Progress → Fix Committed

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/183764
Reason: The change on master was reverted, so this would have to be fixed on master first to avoid the race issues and then if you propose a backport to stable/kilo, you have to squash all of those fixes together so we don't have the same race in stable/kilo.

Changed in nova:
status: Fix Committed → In Progress
Matt Riedemann (mriedem) wrote :

Marking this as New again since the original change was reverted.

Changed in nova:
status: In Progress → New
Changed in nova:
status: New → In Progress
Ankit Agrawal (ankitagrawal) wrote :

This issue is not reproducible with the steps mentioned in bug description after LP bug #1404801 is fixed.

Please find below a different scenario to reproduce this issue on current master:

1. Boot instance from image.
2. Attach volume to the instance.
3. Shelve instance.
4. Delete the snapshot taken during shelve instance.
5. Unshelve instance (the instance goes into the error state).

Now if we delete the instance created in step 1, it is deleted successfully, but the volume remains in-use and can no longer be deleted.

Reviewed: https://review.openstack.org/226690
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cb02486816e646f6b60d973f0e43bdb61b375c5b
Submitter: Jenkins
Branch: master

commit cb02486816e646f6b60d973f0e43bdb61b375c5b
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:18:12 2015 -0700

    Remove unnecessary call to info_cache.delete

    Removed unnecessary call to instance.info_cache.delete from
    _local_delete method because info_cache is deleted by calling
    instance.destroy from _local_delete. Also it raises
    InstanceInfoCacheNotFound exception in a race condition when
    instance.refresh is called after info_cache is deleted by this call.

    Partial-Bug: 1404867
    Change-Id: Ia76ded06a9ce014fb5d9cb35a03ae868d5106ba1

Reviewed: https://review.openstack.org/194063
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ecdf331bafddfd2bb8c92d3fd96f301bc7ac644f
Submitter: Jenkins
Branch: master

commit ecdf331bafddfd2bb8c92d3fd96f301bc7ac644f
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance is booted
    remains in the in-use state even after the instance is deleted, because
    the instance has no host associated with it.

    Called _local_delete() to detach volume and destroy bdm if instance is
    in shelved_offloaded state or has no host associated with it. This will
    cleanup both volumes and the networks.

    Note:
    I had submitted same patch [1] earlier which was reverted [2] due to a
    race condition on jenkins if an instance is deleted when it is in
    building state. In this patch I have fixed the failure of race condition
    by reverting the ObjectActionError exception handling in _delete.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41

    Closes-Bug: 1404867
    Closes-Bug: 1408527
    Closes-Bug: 1458308
    Change-Id: Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173

Changed in nova:
status: In Progress → Fix Committed

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

melanie witt (melwitt) wrote :

Marking this as New again because the second change was reverted:

https://review.openstack.org/#/c/251543/

Changed in nova:
status: Fix Committed → New

Fix proposed to branch: master
Review: https://review.openstack.org/256059

Changed in nova:
assignee: Ankit Agrawal (ankitagrawal) → Samuel Matzek (smatzek)
status: New → In Progress

Reviewed: https://review.openstack.org/256059
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b7f83337658181f0e7117c7f3b07f69856ffe405
Submitter: Jenkins
Branch: master

commit b7f83337658181f0e7117c7f3b07f69856ffe405
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance is booted
    remains in the in-use state even after the instance is deleted, because
    the instance has no host associated with it.

    Called _local_delete() to detach volume and destroy bdm if instance is
    in shelved_offloaded state or has no host associated with it. This will
    cleanup both volumes and the networks.

    Note:
    Ankit had submitted same patch [1] earlier which was reverted [2] due
    to a race condition on jenkins if an instance is deleted when it is in
    building state. The patch was then resubmitted [3] fixing the
    failure of race condition by reverting the ObjectActionError
    exception handling in _delete. This patch was later re-reverted [4]
    due to continued jenkins race conditions.

    The current patch avoids the jenkins race condition by leaving the flow
    for instances in the BUILDING state unchanged and only calling
    _local_delete() on instances in the shelved_offloaded or error states
    when the instance has no host associated with it. This addresses the
    concerns of the referenced bugs.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41
    [3] Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173
    [4] Ibcbe35b5d329b183c4d0e8233e8ada26ebc512c2

    Co-Authored-By: Ankit Agrawal <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Change-Id: I928a397c75b857e94bf5c002e50ec43a2bed9848
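
The difference between the first attempt [1] and the patch described in the commit message above (Change-Id I928a397c...) can be written as two predicates. This is a sketch of one reading of the commit messages, not the actual nova code: the function names are hypothetical, the state strings are illustrative, and it assumes the "no host" condition applies to both the shelved_offloaded and error states.

```python
# Illustrative state names; treated here as plain strings.
ACTIVE = "active"
BUILDING = "building"
ERROR = "error"
SHELVED_OFFLOADED = "shelved_offloaded"


def first_patch_uses_local_delete(vm_state, host):
    """Reading of [1]: local delete if shelved_offloaded or if there is no host."""
    return vm_state == SHELVED_OFFLOADED or host is None


def later_patch_uses_local_delete(vm_state, host):
    """Reading of I928a397c...: only shelved_offloaded or error, and only with no host."""
    return host is None and vm_state in (SHELVED_OFFLOADED, ERROR)


# A building instance has no host yet. Under [1] it would take the new
# local-delete path; under the later patch its flow is left unchanged.
for state, host in [(BUILDING, None), (ERROR, None),
                    (SHELVED_OFFLOADED, None), (ACTIVE, "compute1")]:
    print(state, host,
          first_patch_uses_local_delete(state, host),
          later_patch_uses_local_delete(state, host))
```

The BUILDING row is the one where the two predicates disagree, which matches the commit message's statement that the building-state flow is left alone to avoid the race.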

Changed in nova:
status: In Progress → Fix Released
melanie witt (melwitt) wrote :
Changed in nova:
status: Fix Released → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/335697

Changed in nova:
assignee: Samuel Matzek (smatzek) → melanie witt (melwitt)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/335697
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5ce74fa06c0e7a70fdc927b2c1f364af83f7de1d
Submitter: Jenkins
Branch: master

commit 5ce74fa06c0e7a70fdc927b2c1f364af83f7de1d
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance is booted
    remains in the in-use state even after the instance is deleted, because
    the instance has no host associated with it.

    Called _local_delete() to detach volume and destroy bdm if instance is
    in shelved_offloaded state or has no host associated with it. This will
    cleanup both volumes and the networks.

    Note:
    Ankit had submitted same patch [1] earlier which was reverted [2] due
    to a race condition on jenkins if an instance is deleted when it is in
    building state. The patch was then resubmitted [3] fixing the
    failure of race condition by reverting the ObjectActionError
    exception handling in _delete. This patch was later re-reverted [4]
    due to continued jenkins race conditions.

    The patch [5] intended to avoid the jenkins race condition by leaving
    the flow for instances in the BUILDING state unchanged and only calling
    _local_delete() on instances in the shelved_offloaded or error states
    when the instance has no host associated with it. It however also had
    to be reverted [6] because of yet another race condition.

    This version takes a more minimal approach of adding the ERROR state
    to the logic for doing a local delete plus cleanup of resources on
    a compute host. Comments have also been added to the existing code
    to explain more about the different flows.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41
    [3] Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173
    [4] Ibcbe35b5d329b183c4d0e8233e8ada26ebc512c2
    [5] I928a397c75b857e94bf5c002e50ec43a2bed9848
    [6] I6b9b886e0d6f2ec86141c048fb50969bccf5cb30

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Change-Id: I2192ef513a2cd15d21e9d5d5fe22c5a5fbae0941

Changed in nova:
status: In Progress → Fix Released
melanie witt (melwitt) wrote :
Changed in nova:
status: Fix Released → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/340614

Changed in nova:
status: Confirmed → In Progress

Change abandoned by melanie witt (<email address hidden>) on branch: master
Review: https://review.openstack.org/339307
Reason: This got squashed into re-proposal https://review.openstack.org/340614

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

Atsushi SAKAI (sakaia) wrote :

Which status is correct?
According to #18, this issue is "In Progress".
According to #20, this issue is "Fix Released".
But for #20, which patch fixes this problem?
