Volume remains in-use status, if instance booted from volume is deleted in error state

Bug #1404867 reported by Abhishek Kekane
This bug affects 9 people
Affects                    Status         Importance  Assigned to
OpenStack Compute (nova)   Fix Released   Medium      melanie witt
nova/pike                  Fix Committed  Medium      Mohammed Naser
nova/queens                Fix Committed  Medium      Mohammed Naser

Bug Description

If an instance is booted from a volume and goes into an error state for some reason, the volume from which the instance was booted remains in the in-use state even after the instance is deleted.
IMO, the volume should be detached so that it can be used to boot another instance.

Steps to reproduce:

1. Log in to Horizon and create a new volume.
2. Create an instance using the newly created volume.
3. Verify the instance is in the active state.
$ source devstack/openrc demo demo
$ nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+------------------+
| dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | ACTIVE | - | Running | private=10.0.0.3 |
+--------------------------------------+------+--------+------------+-------------+------------------+

Note:
Use the shelve/unshelve API to see the instance go into the error state.
Unshelving a volume-backed instance does not work and sets the instance to the error state (ref: https://bugs.launchpad.net/nova/+bug/1404801).

4. Shelve the instance
$ nova shelve <instance-uuid>

5. Verify the status is SHELVED_OFFLOADED.
$ nova list
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | SHELVED_OFFLOADED | - | Shutdown | private=10.0.0.3 |
+--------------------------------------+------+-------------------+------------+-------------+------------------+

6. Unshelve the instance.
$ nova unshelve <instance-uuid>

7. Verify the instance is in the Error state.
$ nova list
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+-------------------+------------+-------------+------------------+
| dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | Error | unshelving | Spawning | private=10.0.0.3 |
+--------------------------------------+------+-------------------+------------+-------------+------------------+

8. Delete the instance using Horizon.

9. Verify that the volume is still in the in-use state.
$ cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| 4aeefd25-10aa-42c2-9a2d-1c89a95b4d4f | in-use | test | 1 | lvmdriver-1 | true | 8f7bdc24-1891-4bbb-8f0c-732b9cbecae7 |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

10. In Horizon, the volume "Attached To" information is displayed as "Attached to None on /dev/vda".

11. The user is not able to delete this volume or attach it to another instance, as it is still in use.
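The expected cleanup behavior can be sketched with a minimal model (illustrative only, not nova code; all class and function names here are hypothetical): deleting a server should walk its block device mappings and detach each volume so it returns to the available state.

```python
# Illustrative model of the expected cleanup (hypothetical names, not nova code).
# An instance holds block device mappings (BDMs); deleting the instance
# should detach each volume so its status returns to "available".

class Volume:
    def __init__(self, vol_id):
        self.id = vol_id
        self.status = "available"
        self.attached_to = None

class Instance:
    def __init__(self, uuid):
        self.uuid = uuid
        self.bdms = []  # volumes attached to this instance

def attach(instance, volume):
    volume.status = "in-use"
    volume.attached_to = instance.uuid
    instance.bdms.append(volume)

def delete_instance(instance):
    # The bug: for an ERROR instance with no host, this detach step was
    # skipped, leaving the volume stuck "in-use". The fix ensures the
    # detach and BDM destroy happen regardless.
    for volume in instance.bdms:
        volume.status = "available"
        volume.attached_to = None
    instance.bdms.clear()

vol = Volume("4aeefd25")
inst = Instance("dae3a13b")
attach(inst, vol)
delete_instance(inst)
print(vol.status)  # available
```

In the buggy flow, the equivalent of `delete_instance` removed the instance record without ever touching the BDMs, which is why the volume above would have stayed "in-use".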

description: updated
Changed in nova:
assignee: nobody → Abhishek Kekane (abhishek-kekane)
Tushar Patil (tpatil)
summary: - Volume remains in-use status, if instance booted from volume deleted
- when it is in the error state
+ Volume remains in-use status, if instance booted from volume is deleted
+ in error state
Liyingjun (liyingjun)
Changed in nova:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/145738

Changed in nova:
assignee: Abhishek Kekane (abhishek-kekane) → Ankit Agrawal (ankitagrawal)
status: Confirmed → In Progress
Changed in nova:
importance: Undecided → Low
melanie witt (melwitt)
tags: added: compute
removed: ntt
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/145738
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d1baa9fe7eb342b63fc85cbb5ef70bb676de6566
Submitter: Jenkins
Branch: master

commit d1baa9fe7eb342b63fc85cbb5ef70bb676de6566
Author: ankitagrawal <email address hidden>
Date: Tue Dec 23 06:34:32 2014 -0800

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance was booted
    remains in the in-use state even after the instance is deleted, because
    the instance has no host associated with it.

    _local_delete() is now called to detach the volume and destroy the BDM
    if the instance is in the shelved_offloaded state or has no host
    associated with it. This cleans up both the volumes and the networks.

    Currently in test_servers.py, "test_delete_server_instance" executes
    similarly to "test_delete_server_instance_while_building". This is
    because "test_delete_server_instance" calls the instance.save() method,
    which updates vm_state to building when it should be active.

    Fixed "test_delete_server_instance" to test deleting an instance that
    is in the active state and has a valid host.

    Closes-Bug: #1404867
    Closes-Bug: #1408527
    Change-Id: Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/183764

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/kilo)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/183764
Reason: The change on master was reverted, so this would have to be fixed on master first to avoid the race issues and then if you propose a backport to stable/kilo, you have to squash all of those fixes together so we don't have the same race in stable/kilo.

Changed in nova:
status: Fix Committed → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

Marking this as New again since the original change was reverted.

Changed in nova:
status: In Progress → New
Changed in nova:
status: New → In Progress
Revision history for this message
Ankit Agrawal (ankitagrawal) wrote :

This issue is not reproducible with the steps mentioned in the bug description now that LP bug #1404801 is fixed.

Please find below a different scenario to reproduce this issue on current master:

1. Boot instance from image.
2. Attach volume to the instance.
3. Shelve instance.
4. Delete the snapshot taken during shelve instance.
5. Unshelve instance (Instance goes in to error state).

Now if we delete the instance created at step 1, the instance is deleted successfully, but the volume remains in-use and we are not able to delete that volume afterwards.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/226690

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/226690
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cb02486816e646f6b60d973f0e43bdb61b375c5b
Submitter: Jenkins
Branch: master

commit cb02486816e646f6b60d973f0e43bdb61b375c5b
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:18:12 2015 -0700

    Remove unnecessary call to info_cache.delete

    Removed an unnecessary call to instance.info_cache.delete from the
    _local_delete method because the info_cache is deleted by calling
    instance.destroy from _local_delete. It also raises an
    InstanceInfoCacheNotFound exception in a race condition when
    instance.refresh is called after the info_cache is deleted by this call.

    Partial-Bug: 1404867
    Change-Id: Ia76ded06a9ce014fb5d9cb35a03ae868d5106ba1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/194063
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ecdf331bafddfd2bb8c92d3fd96f301bc7ac644f
Submitter: Jenkins
Branch: master

commit ecdf331bafddfd2bb8c92d3fd96f301bc7ac644f
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance was booted
    remains in the in-use state even after the instance is deleted, because
    the instance has no host associated with it.

    _local_delete() is now called to detach the volume and destroy the BDM
    if the instance is in the shelved_offloaded state or has no host
    associated with it. This cleans up both the volumes and the networks.

    Note:
    I had submitted the same patch [1] earlier, which was reverted [2] due
    to a race condition on Jenkins when an instance is deleted while it is
    in the building state. In this patch I have fixed the race condition
    failure by reverting the ObjectActionError exception handling in
    _delete.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41

    Closes-Bug: 1404867
    Closes-Bug: 1408527
    Closes-Bug: 1458308
    Change-Id: Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0b1

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

Revision history for this message
melanie witt (melwitt) wrote :

Marking this as New again because the second change was reverted:

https://review.openstack.org/#/c/251543/

Changed in nova:
status: Fix Committed → New
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/256059

Changed in nova:
assignee: Ankit Agrawal (ankitagrawal) → Samuel Matzek (smatzek)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/256059
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b7f83337658181f0e7117c7f3b07f69856ffe405
Submitter: Jenkins
Branch: master

commit b7f83337658181f0e7117c7f3b07f69856ffe405
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance was booted
    remains in-use even after the instance is deleted, because the instance
    has no host associated with it.

    _local_delete() is now called to detach the volume and destroy the BDM
    if the instance is in the shelved_offloaded state or has no host
    associated with it. This cleans up both the volumes and the networks.

    Note:
    Ankit had submitted the same patch [1] earlier, which was reverted [2]
    due to a race condition on Jenkins when an instance is deleted while
    it is in the building state. The patch was then resubmitted [3],
    fixing the race condition failure by reverting the ObjectActionError
    exception handling in _delete. That patch was later re-reverted [4]
    due to continued Jenkins race conditions.

    The current patch avoids the jenkins race condition by leaving the flow
    for instances in the BUILDING state unchanged and only calling
    _local_delete() on instances in the shelved_offloaded or error states
    when the instance has no host associated with it. This addresses the
    concerns of the referenced bugs.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41
    [3] Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173
    [4] Ibcbe35b5d329b183c4d0e8233e8ada26ebc512c2

    Co-Authored-By: Ankit Agrawal <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Change-Id: I928a397c75b857e94bf5c002e50ec43a2bed9848

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
melanie witt (melwitt) wrote :
Changed in nova:
status: Fix Released → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/335697

Changed in nova:
assignee: Samuel Matzek (smatzek) → melanie witt (melwitt)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/335697
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5ce74fa06c0e7a70fdc927b2c1f364af83f7de1d
Submitter: Jenkins
Branch: master

commit 5ce74fa06c0e7a70fdc927b2c1f364af83f7de1d
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and goes into an error
    state for some reason, the volume from which the instance was booted
    remains in-use even after the instance is deleted, because the instance
    has no host associated with it.

    _local_delete() is now called to detach the volume and destroy the BDM
    if the instance is in the shelved_offloaded state or has no host
    associated with it. This cleans up both the volumes and the networks.

    Note:
    Ankit had submitted the same patch [1] earlier, which was reverted [2]
    due to a race condition on Jenkins when an instance is deleted while
    it is in the building state. The patch was then resubmitted [3],
    fixing the race condition failure by reverting the ObjectActionError
    exception handling in _delete. That patch was later re-reverted [4]
    due to continued Jenkins race conditions.

    The patch [5] intended to avoid the jenkins race condition by leaving
    the flow for instances in the BUILDING state unchanged and only calling
    _local_delete() on instances in the shelved_offloaded or error states
    when the instance has no host associated with it. It however also had
    to be reverted [6] because of yet another race condition.

    This version takes a more minimal approach of adding the ERROR state
    to the logic for doing a local delete plus cleanup of resources on
    a compute host. Comments have also been added to the existing code
    to explain more about the different flows.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41
    [3] Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173
    [4] Ibcbe35b5d329b183c4d0e8233e8ada26ebc512c2
    [5] I928a397c75b857e94bf5c002e50ec43a2bed9848
    [6] I6b9b886e0d6f2ec86141c048fb50969bccf5cb30

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Change-Id: I2192ef513a2cd15d21e9d5d5fe22c5a5fbae0941

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
melanie witt (melwitt) wrote :
Changed in nova:
status: Fix Released → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/340614

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by melanie witt (<email address hidden>) on branch: master
Review: https://review.openstack.org/339307
Reason: This got squashed into re-proposal https://review.openstack.org/340614

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 14.0.0.0b2

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

Revision history for this message
Atsushi SAKAI (sakaia) wrote :

Which status is correct?
From #18, this issue is "In Progress".
From #20, this issue is "Fix Released".
For #20, which patch fixes this problem?

melanie witt (melwitt)
Changed in nova:
importance: Low → Medium
Revision history for this message
melanie witt (melwitt) wrote :

This bug is still open and in progress; the proposed patch is here:

https://review.openstack.org/340614

Changed in nova:
assignee: melanie witt (melwitt) → Charlotte Han (hanrong)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/544747

Changed in nova:
assignee: Charlotte Han (hanrong) → Mohammed Naser (mnaser)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/544748

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/545123

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/545132

Changed in nova:
assignee: Mohammed Naser (mnaser) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → melanie witt (melwitt)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/544748
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ad9e2a568fcef4f69b185952352db9f651c37ad4
Submitter: Zuul
Branch: master

commit ad9e2a568fcef4f69b185952352db9f651c37ad4
Author: Mohammed Naser <email address hidden>
Date: Wed Feb 14 18:39:38 2018 -0500

    Store block device mappings in cell0

    If an instance fails to get scheduled, it gets buried in cell0, but
    none of its block device mappings are stored. At the API layer,
    Nova reserves and creates attachments for new instances when
    it gets a create request, so these attachments are orphaned if the
    block device mappings are not registered in the database somewhere.

    This patch makes sure that if an instance is being buried in cell0,
    all of its block device mappings are recorded as well so they can
    be removed later when the instance is deleted.

    Change-Id: I64074923fb741fbf5459f66b8ab1a23c16f3303f
    Related-Bug: #1404867
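The intent of the change above can be sketched with a small model (an illustrative sketch under assumed names, not nova's actual cell0 code): when an instance is buried in cell0 after a scheduling failure, its block device mappings are persisted alongside it so a later delete can find and remove the attachments instead of orphaning them.

```python
# Illustrative model (all names hypothetical): burying an instance in
# cell0 must also record its block device mappings (BDMs), so that a
# later delete can clean up the volume attachments.

cell0_instances = {}
cell0_bdms = {}

def bury_in_cell0(instance_uuid, bdms):
    cell0_instances[instance_uuid] = {"vm_state": "error", "host": None}
    # The fix: persist the BDMs too, keyed by instance, instead of
    # dropping them when the instance is buried.
    cell0_bdms[instance_uuid] = list(bdms)

def delete_buried(instance_uuid):
    # With the BDMs on record, the attachments can be found and removed
    # when the instance is deleted.
    removed = cell0_bdms.pop(instance_uuid, [])
    cell0_instances.pop(instance_uuid, None)
    return removed

bury_in_cell0("inst-1", [{"volume_id": "vol-1", "attachment_id": "att-1"}])
print(delete_buried("inst-1"))  # [{'volume_id': 'vol-1', 'attachment_id': 'att-1'}]
```

Without the `cell0_bdms` record, `delete_buried` would have nothing to return and the attachment would stay orphaned, which is the failure mode the patch addresses.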

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/544747
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=20edeb362327ea9448d38b6e2bcb03e0a71060d0
Submitter: Zuul
Branch: master

commit 20edeb362327ea9448d38b6e2bcb03e0a71060d0
Author: Mohammed Naser <email address hidden>
Date: Wed Feb 14 16:26:38 2018 -0500

    Add functional tests to ensure BDM removal on delete

    In certain cases, such as when an instance fails to be scheduled,
    the volume may already have an attachment created (or the volume
    has been reserved in the old flows).

    This patch adds a test to check that these volume attachments
    are deleted and removed once the instance has been deleted. It
    also adds some functionality to allow checking whether a volume
    has been reserved in the Cinder fixtures.

    Change-Id: I85cc3998fbcde30eefa5429913ca287246d51255
    Related-Bug: #1404867

Changed in nova:
assignee: melanie witt (melwitt) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/546201

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/546202

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/546203

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/546204

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/546219

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/546220

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/546221

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/546222

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/546226

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/546227

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/546228

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/546230

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ocata)

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/546226
Reason: Release not affected.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/546227
Reason: Release not affected.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/546228
Reason: Release not affected.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/546230
Reason: Release not affected.

Revision history for this message
Mohammed Naser (mnaser) wrote :

The stable/ocata changes have been dropped because the commit that introduced this bug was added in Pike:

https://github.com/openstack/nova/commit/63805735c25a54ad1b9b97e05080c1a6153d8e22

Before this, the compute would reserve the volume, which means it would be cleaned up on a normal delete. If scheduling fails, the volume won't be reserved anyway.

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → melanie witt (melwitt)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/546275

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/546277

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/340614
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b3f39244a3eacd6fb141de61850cbd84fecdb544
Submitter: Zuul
Branch: master

commit b3f39244a3eacd6fb141de61850cbd84fecdb544
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Clean up ports and volumes when deleting ERROR instance

    Usually, when instance.host = None, it means the instance was never
    scheduled. However, the exception handling routine in compute manager
    [1] will set instance.host = None and set instance.vm_state = ERROR
    if the instance fails to build on the compute host. If that happens, we
    end up with an instance with host = None and vm_state = ERROR which may
    have ports and volumes still allocated.

    This adds some logic around deleting the instance when it may have
    ports or volumes allocated.

      1. If the instance is not in ERROR or SHELVED_OFFLOADED state, we
         expect instance.host to be set to a compute host. So, if we find
         instance.host = None in states other than ERROR or
         SHELVED_OFFLOADED, we consider the instance to have failed
         scheduling and not require ports or volumes to be freed, and we
         simply destroy the instance database record and return. This is
         the "delete while booting" scenario.

      2. If the instance is in ERROR because of a failed build or is
         SHELVED_OFFLOADED, we expect instance.host to be None even though
         there could be ports or volumes allocated. In this case, run the
         _local_delete routine to clean up ports and volumes and delete the
         instance database record.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    [1] https://github.com/openstack/nova/blob/55ea961/nova/compute/manager.py#L1927-L1929

    Change-Id: I4dc6c8bd3bb6c135f8a698af41f5d0e026c39117
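The two cases described in the commit message can be sketched as a small dispatch function (an illustrative sketch with hypothetical names, not the actual nova API code):

```python
# Illustrative sketch of the delete dispatch described above
# (hypothetical names, not nova's implementation).

ERROR = "error"
SHELVED_OFFLOADED = "shelved_offloaded"

def delete(instance, destroy_record, local_delete, compute_delete):
    if instance["host"] is None:
        if instance["vm_state"] in (ERROR, SHELVED_OFFLOADED):
            # Case 2: failed build or shelved-offloaded; ports/volumes
            # may be allocated, so run the full local cleanup.
            return local_delete(instance)
        # Case 1: never scheduled ("delete while booting"); no ports or
        # volumes to free, just destroy the database record.
        return destroy_record(instance)
    # Normal path: the compute host handles the cleanup.
    return compute_delete(instance)

calls = []
delete({"host": None, "vm_state": ERROR},
       destroy_record=lambda i: calls.append("destroy"),
       local_delete=lambda i: calls.append("local_delete"),
       compute_delete=lambda i: calls.append("compute"))
print(calls)  # ['local_delete']
```

The key change relative to the buggy behavior is that an ERROR instance with `host = None` now takes the `local_delete` branch, so its volumes and ports are cleaned up rather than orphaned.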

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/545123
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=08f0f71a83ff75c5439e72f0913c5feabe2972ed
Submitter: Zuul
Branch: master

commit 08f0f71a83ff75c5439e72f0913c5feabe2972ed
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 15 16:01:14 2018 -0500

    Add functional recreate test of deleting a BFV server pre-scheduling

    This is another wrinkle for bug 1404867 where we create a
    volume-backed server, create an attachment on the volume which
    puts the volume in 'attaching' status, and then delete the server
    before it's actually created in a cell.

    In this case, the _delete_while_booting code in the compute API
    finds and deletes the BuildRequest before the instance was ever
    created in a cell.

    The bug is that _delete_while_booting in the API doesn't also
    process block device mappings and unreserve/delete attachments
    on the volume, which orphans the volume and can only be fixed
    with admin intervention in the block storage service.

    Change-Id: Ib65acc671711eae7aee65df9cd5c6b2ccb559f5c
    Related-Bug: #1404867

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/545132
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0652e4ab3d506ea21f1bd80b6802c6df5e7b523e
Submitter: Zuul
Branch: master

commit 0652e4ab3d506ea21f1bd80b6802c6df5e7b523e
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 15 16:33:56 2018 -0500

    Detach volumes when deleting a BFV server pre-scheduling

    If the user creates a volume-backed server from an existing
    volume, the API reserves the volume by creating an attachment
    against it. This puts the volume into 'attaching' status.

    If the user then deletes the server before it's created in a
    cell, by deleting the build request, the attached volume is
    orphaned and requires admin intervention in the block storage
    service.

    This change simply pulls the BDMs off the BuildRequest when
    we delete the server via the build request and does the same
    local cleanup of those volumes as we would in a "normal" local
    delete scenario that the instance was created in a cell but
    doesn't have a host.

    We don't have to worry about ports in this scenario since
    ports are created on the compute, in a cell, and if we're
    deleting a build request then we never got far enough to
    create ports.

    Change-Id: I1a576bdb16befabe06a9728d7adf008fc0667077
    Partial-Bug: #1404867
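The pre-scheduling cleanup described above can be sketched as follows (an illustrative model with hypothetical names; nova's real code goes through its BlockDeviceMapping objects and the Cinder API): deleting a server that only exists as a build request must also delete the volume attachments recorded in the request's block device mappings.

```python
# Illustrative sketch (hypothetical names): deleting a server that only
# exists as a build request must also remove the volume attachments
# recorded in the request's BDMs, or the volumes stay "attaching".

def delete_build_request(build_request, volume_api):
    for bdm in build_request.get("bdms", []):
        attachment_id = bdm.get("attachment_id")
        if attachment_id:
            # Unreserve the volume; it returns to "available".
            volume_api.delete_attachment(attachment_id)
    build_request.clear()

class FakeVolumeAPI:
    """Stand-in for the block storage service, for illustration."""
    def __init__(self):
        self.deleted = []
    def delete_attachment(self, attachment_id):
        self.deleted.append(attachment_id)

api = FakeVolumeAPI()
req = {"bdms": [{"volume_id": "vol-1", "attachment_id": "att-1"}]}
delete_build_request(req, api)
print(api.deleted)  # ['att-1']
```

Before the fix, the equivalent of `delete_build_request` destroyed only the request itself, leaving `att-1` in place and the volume stuck until an admin intervened.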

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/546222

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/546201
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8862b2de236cccf5d0378f485acf90eab5c8c77a
Submitter: Zuul
Branch: stable/queens

commit 8862b2de236cccf5d0378f485acf90eab5c8c77a
Author: Mohammed Naser <email address hidden>
Date: Wed Feb 14 18:39:38 2018 -0500

    Store block device mappings in cell0

    If an instance fails to get scheduled, it gets buried in cell0, but
    none of its block device mappings are stored. At the API layer,
    Nova reserves and creates attachments for new instances when
    it gets a create request, so these attachments are orphaned if the
    block device mappings are not registered in the database somewhere.

    This patch makes sure that if an instance is being buried in cell0,
    all of its block device mappings are recorded as well so they can
    be removed later when the instance is deleted.

    Change-Id: I64074923fb741fbf5459f66b8ab1a23c16f3303f
    Related-Bug: #1404867
    (cherry picked from commit ad9e2a568fcef4f69b185952352db9f651c37ad4)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/546202
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1a20e307302670049b8aec7c5c4201c098fbe18e
Submitter: Zuul
Branch: stable/queens

commit 1a20e307302670049b8aec7c5c4201c098fbe18e
Author: Mohammed Naser <email address hidden>
Date: Wed Feb 14 16:26:38 2018 -0500

    Add functional tests to ensure BDM removal on delete

    In certain cases, such as when an instance fails to be scheduled,
    the volume may already have an attachment created (or the volume
    has been reserved in the old flows).

    This patch adds a test to check that these volume attachments
    are deleted and removed once the instance has been deleted. It
    also adds some functionality to allow checking whether a volume
    has been reserved in the Cinder fixtures.

    Change-Id: I85cc3998fbcde30eefa5429913ca287246d51255
    Related-Bug: #1404867
    (cherry picked from commit 20edeb362327ea9448d38b6e2bcb03e0a71060d0)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/546203
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=26e32ce85823bf1d87ec5c729aa408b92f526e38
Submitter: Zuul
Branch: stable/queens

commit 26e32ce85823bf1d87ec5c729aa408b92f526e38
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Clean up ports and volumes when deleting ERROR instance

    Usually, when instance.host = None, it means the instance was never
    scheduled. However, the exception handling routine in compute manager
    [1] will set instance.host = None and set instance.vm_state = ERROR
    if the instance fails to build on the compute host. If that happens, we
    end up with an instance with host = None and vm_state = ERROR which may
    have ports and volumes still allocated.

    This adds some logic around deleting the instance when it may have
    ports or volumes allocated.

      1. If the instance is not in ERROR or SHELVED_OFFLOADED state, we
         expect instance.host to be set to a compute host. So, if we find
         instance.host = None in states other than ERROR or
         SHELVED_OFFLOADED, we consider the instance to have failed
         scheduling and not require ports or volumes to be freed, and we
         simply destroy the instance database record and return. This is
         the "delete while booting" scenario.

      2. If the instance is in ERROR because of a failed build or is
         SHELVED_OFFLOADED, we expect instance.host to be None even though
         there could be ports or volumes allocated. In this case, run the
         _local_delete routine to clean up ports and volumes and delete the
         instance database record.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Conflicts:
          nova/tests/unit/compute/test_compute_api.py

    [1] https://github.com/openstack/nova/blob/55ea961/nova/compute/manager.py#L1927-L1929

    Change-Id: I4dc6c8bd3bb6c135f8a698af41f5d0e026c39117
    (cherry picked from commit b3f39244a3eacd6fb141de61850cbd84fecdb544)

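The two cases in the commit message above amount to a branch on `instance.host` and `instance.vm_state`. A simplified sketch (the three callables are hypothetical stand-ins for nova internals, not real nova APIs):

```python
ERROR = "error"
SHELVED_OFFLOADED = "shelved_offloaded"

def delete_instance(instance, destroy_record, local_delete, cast_to_compute):
    """Pick a delete path following the two cases described above."""
    if instance["host"] is not None:
        # Normal path: the compute host owning the instance handles cleanup.
        cast_to_compute(instance)
    elif instance["vm_state"] in (ERROR, SHELVED_OFFLOADED):
        # Case 2: failed build or shelved-offloaded. Ports/volumes may still
        # be allocated, so run the local delete routine to free them.
        local_delete(instance)
    else:
        # Case 1: never scheduled ("delete while booting"). Nothing external
        # to free; just destroy the database record.
        destroy_record(instance)


paths = []
delete_instance({"host": None, "vm_state": ERROR},
                destroy_record=lambda i: paths.append("destroy"),
                local_delete=lambda i: paths.append("local_delete"),
                cast_to_compute=lambda i: paths.append("compute"))
# An ERROR instance with no host takes the local-delete path.
```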
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/546204
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0b4621f509b51241f5c721d3412eda8eba721b46
Submitter: Zuul
Branch: stable/queens

commit 0b4621f509b51241f5c721d3412eda8eba721b46
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 15 16:01:14 2018 -0500

    Add functional recreate test of deleting a BFV server pre-scheduling

    This is another wrinkle for bug 1404867 where we create a
    volume-backed server, create an attachment on the volume which
    puts the volume in 'attaching' status, and then delete the server
    before it's actually created in a cell.

    In this case, the _delete_while_booting code in the compute API
    finds and deletes the BuildRequest before the instance was ever
    created in a cell.

    The bug is that _delete_while_booting in the API doesn't also
    process block device mappings and unreserve/delete attachments
    on the volume, which orphans the volume and can only be fixed
    with admin intervention in the block storage service.

    Change-Id: Ib65acc671711eae7aee65df9cd5c6b2ccb559f5c
    Related-Bug: #1404867
    (cherry picked from commit 08f0f71a83ff75c5439e72f0913c5feabe2972ed)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/546277
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c1b3d8c2140197dca2d7037476953cc46cbae793
Submitter: Zuul
Branch: stable/queens

commit c1b3d8c2140197dca2d7037476953cc46cbae793
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 15 16:33:56 2018 -0500

    Detach volumes when deleting a BFV server pre-scheduling

    If the user creates a volume-backed server from an existing
    volume, the API reserves the volume by creating an attachment
    against it. This puts the volume into 'attaching' status.

    If the user then deletes the server before it's created in a
    cell, by deleting the build request, the attached volume is
    orphaned and requires admin intervention in the block storage
    service.

    This change simply pulls the BDMs off the BuildRequest when
    we delete the server via the build request and does the same
    local cleanup of those volumes as we would in a "normal" local
    delete scenario that the instance was created in a cell but
    doesn't have a host.

    We don't have to worry about ports in this scenario since
    ports are created on the compute, in a cell, and if we're
    deleting a build request then we never got far enough to
    create ports.

    Change-Id: I1a576bdb16befabe06a9728d7adf008fc0667077
    Partial-Bug: #1404867
    (cherry picked from commit 0652e4ab3d506ea21f1bd80b6802c6df5e7b523e)

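The cleanup described in this commit can be sketched as follows (a simplified illustration, not nova's actual code; `delete_attachment` stands in for the Cinder attachment-delete call):

```python
def delete_build_request(build_request, delete_attachment):
    """Sketch: deleting a server that exists only as a BuildRequest must
    also delete the volume attachments recorded in its BDMs."""
    for bdm in build_request.get("block_device_mappings", []):
        attachment_id = bdm.get("attachment_id")
        if attachment_id is not None:
            # Without this step the volume is stuck in 'attaching' status
            # and needs admin intervention to recover.
            delete_attachment(attachment_id)
    build_request["deleted"] = True


deleted = []
req = {"block_device_mappings": [{"volume_id": "vol-1",
                                  "attachment_id": "att-1"}]}
delete_build_request(req, deleted.append)
# The attachment is removed and the volume returns to 'available'.
```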
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0rc3

This issue was fixed in the openstack/nova 17.0.0.0rc3 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/546219
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=51027abfd6ab2e96b482bdb75f4e9aa84f53a20b
Submitter: Zuul
Branch: stable/pike

commit 51027abfd6ab2e96b482bdb75f4e9aa84f53a20b
Author: Mohammed Naser <email address hidden>
Date: Wed Feb 14 18:39:38 2018 -0500

    Store block device mappings in cell0

    If an instance fails to get scheduled, it gets buried in cell0 but
    none of its block device mappings are stored. At the API layer,
    Nova reserves and creates attachments for new instances when
    it gets a create request so these attachments are orphaned if the
    block device mappings are not registered in the database somewhere.

    This patch makes sure that if an instance is being buried in cell0,
    all of its block device mappings are recorded as well so they can
    be later removed when the instance is deleted.

    Conflicts:
          nova/conductor/manager.py

    Change-Id: I64074923fb741fbf5459f66b8ab1a23c16f3303f
    Related-Bug: #1404867
    (cherry picked from commit ad9e2a568fcef4f69b185952352db9f651c37ad4)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/546220
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=72dcdecdb287578e00d6b37024aa2a5e27a6ae27
Submitter: Zuul
Branch: stable/pike

commit 72dcdecdb287578e00d6b37024aa2a5e27a6ae27
Author: Mohammed Naser <email address hidden>
Date: Wed Feb 14 16:26:38 2018 -0500

    Add functional tests to ensure BDM removal on delete

    In certain cases, such as when an instance fails to be scheduled,
    the volume may already have an attachment created (or the volume
    has been reserved in the old flows).

    This patch adds a test to check that these volume attachments
    are deleted and removed once the instance has been deleted. It
    also adds some functionality to allow checking when a volume
    has been reserved in the Cinder fixtures.

    This backported patch drops the tests for the new-attach flow
    for Cinder as it does not exist in Pike.

    Change-Id: I85cc3998fbcde30eefa5429913ca287246d51255
    Related-Bug: #1404867
    (cherry picked from commit 20edeb362327ea9448d38b6e2bcb03e0a71060d0)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/546221
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1e59ed99c794e6554b9a3e1e2ed197de75e14189
Submitter: Zuul
Branch: stable/pike

commit 1e59ed99c794e6554b9a3e1e2ed197de75e14189
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Clean up ports and volumes when deleting ERROR instance

    Usually, when instance.host = None, it means the instance was never
    scheduled. However, the exception handling routine in compute manager
    [1] will set instance.host = None and set instance.vm_state = ERROR
    if the instance fails to build on the compute host. If that happens, we
    end up with an instance with host = None and vm_state = ERROR which may
    have ports and volumes still allocated.

    This adds some logic around deleting the instance when it may have
    ports or volumes allocated.

      1. If the instance is not in ERROR or SHELVED_OFFLOADED state, we
         expect instance.host to be set to a compute host. So, if we find
         instance.host = None in states other than ERROR or
         SHELVED_OFFLOADED, we consider the instance to have failed
         scheduling and not require ports or volumes to be freed, and we
         simply destroy the instance database record and return. This is
         the "delete while booting" scenario.

      2. If the instance is in ERROR because of a failed build or is
         SHELVED_OFFLOADED, we expect instance.host to be None even though
         there could be ports or volumes allocated. In this case, run the
         _local_delete routine to clean up ports and volumes and delete the
         instance database record.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Conflicts:
          nova/tests/unit/compute/test_compute_api.py

    [1] https://github.com/openstack/nova/blob/55ea961/nova/compute/manager.py#L1927-L1929

    Change-Id: I4dc6c8bd3bb6c135f8a698af41f5d0e026c39117
    (cherry picked from commit b3f39244a3eacd6fb141de61850cbd84fecdb544)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/546275
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e5a055d2c6f04330cb83b33f8ddd4c9875fdb1b7
Submitter: Zuul
Branch: stable/pike

commit e5a055d2c6f04330cb83b33f8ddd4c9875fdb1b7
Author: Mohammed Naser <email address hidden>
Date: Tue Feb 20 17:11:37 2018 -0500

    Ensure attachment_id always exists for block device mapping

    If an instance is deleted before it is scheduled, the BDM
    clean-up code uses the mappings from the build request as
    they don't exist in the database yet.

    When using the older attachment flow with reserve_volume,
    there is no attachment_id bound to the block device mapping
    and because it is not loaded from database but rather from
    the build request, accessing the attachment_id field raises
    an exception with 'attachment_id not lazy-loadable'.

    If we did a new style attach, _validate_bdm will add the
    attachment_id from Cinder. If we did not, then this patch
    will make sure to set it to 'None' to avoid raising an
    exception when checking if we have an attachment_id set in
    the BDM clean-up code.

    Conflicts:
          nova/tests/functional/wsgi/test_servers.py

    Change-Id: I3cc775fc7dafe691b97a15e50ae2e93c92f355be
    Closes-Bug: #1750666
    (cherry picked from commit 16c2c8b3ee9d70e928a61ceb1ef5931d40e509a4)

    Detach volumes when deleting a BFV server pre-scheduling

    If the user creates a volume-backed server from an existing
    volume, the API reserves the volume by creating an attachment
    against it. This puts the volume into 'attaching' status.

    If the user then deletes the server before it's created in a
    cell, by deleting the build request, the attached volume is
    orphaned and requires admin intervention in the block storage
    service.

    This change simply pulls the BDMs off the BuildRequest when
    we delete the server via the build request and does the same
    local cleanup of those volumes as we would in a "normal" local
    delete scenario that the instance was created in a cell but
    doesn't have a host.

    We don't have to worry about ports in this scenario since
    ports are created on the compute, in a cell, and if we're
    deleting a build request then we never got far enough to
    create ports.

    Conflicts:
          nova/tests/functional/wsgi/test_servers.py

    Change-Id: I1a576bdb16befabe06a9728d7adf008fc0667077
    Partial-Bug: #1404867
    (cherry picked from commit 0652e4ab3d506ea21f1bd80b6802c6df5e7b523e)

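The `attachment_id` guard in the first of the two commits above boils down to defaulting the field before the clean-up code reads it. A hedged sketch (real nova BDMs are versioned objects that raise a lazy-load error; a plain dict is used here purely for illustration):

```python
def ensure_attachment_id(bdm):
    """Sketch of the guard: BDMs built from a BuildRequest rather than
    loaded from the database may lack 'attachment_id' entirely (old
    reserve_volume flow); default it to None so clean-up code can test
    it safely instead of hitting 'attachment_id not lazy-loadable'."""
    bdm.setdefault("attachment_id", None)
    return bdm


old_flow_bdm = ensure_attachment_id({"volume_id": "vol-1"})
new_flow_bdm = ensure_attachment_id({"volume_id": "vol-2",
                                     "attachment_id": "att-2"})
# Clean-up code can now uniformly check `bdm["attachment_id"] is not None`.
```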
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

Matt Riedemann (mriedem)
no longer affects: nova/ocata
Revision history for this message
s10 (vlad-esten) wrote :

This bug can still happen if an instance transitions into the ERROR state after a TooManyInstances exception in nova-conductor [1] with quota.recheck_quota=True set, because in that case the instance isn't buried in the cell0 database and no instance BDMs are created in the nova cell database, so _local_cleanup_bdm_volumes() has no BDMs to clean up.

[1] https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L1308
