Delete instance without block_device_mapping record in database after schedule error

Bug #1408527 reported by zhangwenjian on 2015-01-08
This bug affects 2 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Medium       melanie witt
Pike                       Medium       Mohammed Naser
Queens                     Medium       Mohammed Naser

Bug Description

When an instance with a Cinder volume fails to be scheduled to a host, its status becomes ERROR.
Deleting the instance then succeeds, but the instance's volume information is still kept in the block_device_mapping table of the nova database and is not deleted.

description: updated
Changed in nova:
assignee: nobody → Bhavaniprasad (bhavaniprasadadapaka)
assignee: Bhavaniprasad (bhavaniprasadadapaka) → nobody
Changed in nova:
assignee: nobody → Anusha rayani (anusha-rayani)
Ankit Agrawal (ankitagrawal) wrote :

After deleting a volume-backed instance, the volume information for that instance is not removed from the database; instead, the "deleted" flag is updated in the "block_device_mapping" table of the nova database.

I reproduced this issue and, IMO, https://review.openstack.org/#/c/145738/ will fix it by updating the "deleted" flag appropriately. Could you please confirm?
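
To illustrate the distinction drawn in this comment between removing a row and flagging it as deleted, here is a minimal sketch using an in-memory SQLite table. The table layout is a simplified assumption and not nova's actual schema, and the helper names (build_db, soft_delete_bdms, live_bdms) are hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory table loosely modelled on block_device_mapping.

    The columns are an assumed, simplified subset for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE block_device_mapping ("
        " id INTEGER PRIMARY KEY,"
        " instance_uuid TEXT NOT NULL,"
        " volume_id TEXT,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def soft_delete_bdms(conn, instance_uuid):
    """Flag an instance's mappings as deleted; the rows stay in the table."""
    cur = conn.execute(
        "UPDATE block_device_mapping SET deleted = 1 "
        "WHERE instance_uuid = ? AND deleted = 0",
        (instance_uuid,),
    )
    conn.commit()
    return cur.rowcount


def live_bdms(conn, instance_uuid):
    """Return mappings that are still considered live (deleted = 0)."""
    return conn.execute(
        "SELECT id, volume_id FROM block_device_mapping "
        "WHERE instance_uuid = ? AND deleted = 0",
        (instance_uuid,),
    ).fetchall()
```

Under this model, the symptom in the bug description corresponds to live_bdms() still returning rows for an instance that has already been deleted.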

Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Changed in nova:
assignee: Anusha rayani (anusha-rayani) → Ankit Agrawal (ankitagrawal)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/145738
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d1baa9fe7eb342b63fc85cbb5ef70bb676de6566
Submitter: Jenkins
Branch: master

commit d1baa9fe7eb342b63fc85cbb5ef70bb676de6566
Author: ankitagrawal <email address hidden>
Date: Tue Dec 23 06:34:32 2014 -0800

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and then goes into an
    error state for some reason, the volume from which the instance was
    booted remains in the in-use state even after the instance is deleted,
    because the instance has no host associated with it.

    Call _local_delete() to detach the volume and destroy the bdm if the
    instance is in the shelved_offloaded state or has no host associated
    with it. This cleans up both the volumes and the networks.

    Currently in test_servers.py, "test_delete_server_instance" executes
    similarly to "test_delete_server_instance_while_building". This is
    because "test_delete_server_instance" calls the instance.save() method,
    which updates vm_state to building when it should be in the active state.

    Fixed "test_delete_server_instance" to test deleting an instance which
    is in active state and has a valid host.

    Closes-Bug: #1404867
    Closes-Bug: #1408527
    Change-Id: Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb

Changed in nova:
status: In Progress → Fix Committed

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/183764
Reason: The change on master was reverted, so this would have to be fixed on master first to avoid the race issues and then if you propose a backport to stable/kilo, you have to squash all of those fixes together so we don't have the same race in stable/kilo.

Matt Riedemann (mriedem) wrote :

Marking this as New again since the original change was reverted.

Changed in nova:
status: Fix Committed → New
tags: added: volumes
Changed in nova:
status: New → In Progress

Reviewed: https://review.openstack.org/194063
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ecdf331bafddfd2bb8c92d3fd96f301bc7ac644f
Submitter: Jenkins
Branch: master

commit ecdf331bafddfd2bb8c92d3fd96f301bc7ac644f
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and then goes into an
    error state for some reason, the volume from which the instance was
    booted remains in the in-use state even after the instance is deleted,
    because the instance has no host associated with it.

    Call _local_delete() to detach the volume and destroy the bdm if the
    instance is in the shelved_offloaded state or has no host associated
    with it. This cleans up both the volumes and the networks.

    Note:
    I had submitted the same patch [1] earlier, which was reverted [2] due
    to a race condition on jenkins when an instance is deleted while it is
    in the building state. In this patch I have fixed the race condition
    failure by reverting the ObjectActionError exception handling in _delete.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41

    Closes-Bug: 1404867
    Closes-Bug: 1408527
    Closes-Bug: 1458308
    Change-Id: Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173

Changed in nova:
status: In Progress → Fix Committed

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

Changed in nova:
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/256059
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b7f83337658181f0e7117c7f3b07f69856ffe405
Submitter: Jenkins
Branch: master

commit b7f83337658181f0e7117c7f3b07f69856ffe405
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and then goes into an
    error state for some reason, the volume from which the instance was
    booted remains even after the instance is deleted, because the instance
    has no host associated with it.

    Call _local_delete() to detach the volume and destroy the bdm if the
    instance is in the shelved_offloaded state or has no host associated
    with it. This cleans up both the volumes and the networks.

    Note:
    Ankit had submitted the same patch [1] earlier, which was reverted [2]
    due to a race condition on jenkins when an instance is deleted while
    it is in the building state. The patch was then resubmitted [3], fixing
    the race condition failure by reverting the ObjectActionError
    exception handling in _delete. This patch was later re-reverted [4]
    due to continued jenkins race conditions.

    The current patch avoids the jenkins race condition by leaving the flow
    for instances in the BUILDING state unchanged and only calling
    _local_delete() on instances in the shelved_offloaded or error states
    when the instance has no host associated with it. This addresses the
    concerns of the referenced bugs.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41
    [3] Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173
    [4] Ibcbe35b5d329b183c4d0e8233e8ada26ebc512c2

    Co-Authored-By: Ankit Agrawal <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Change-Id: I928a397c75b857e94bf5c002e50ec43a2bed9848

Reviewed: https://review.openstack.org/335697
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5ce74fa06c0e7a70fdc927b2c1f364af83f7de1d
Submitter: Jenkins
Branch: master

commit 5ce74fa06c0e7a70fdc927b2c1f364af83f7de1d
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Detach volume after deleting instance with no host

    If an instance is booted from a volume, shelved, and then goes into an
    error state for some reason, the volume from which the instance was
    booted remains even after the instance is deleted, because the instance
    has no host associated with it.

    Call _local_delete() to detach the volume and destroy the bdm if the
    instance is in the shelved_offloaded state or has no host associated
    with it. This cleans up both the volumes and the networks.

    Note:
    Ankit had submitted the same patch [1] earlier, which was reverted [2]
    due to a race condition on jenkins when an instance is deleted while
    it is in the building state. The patch was then resubmitted [3], fixing
    the race condition failure by reverting the ObjectActionError
    exception handling in _delete. This patch was later re-reverted [4]
    due to continued jenkins race conditions.

    The patch [5] intended to avoid the jenkins race condition by leaving
    the flow for instances in the BUILDING state unchanged and only calling
    _local_delete() on instances in the shelved_offloaded or error states
    when the instance has no host associated with it. It however also had
    to be reverted [6] because of yet another race condition.

    This version takes a more minimal approach of adding the ERROR state
    to the logic for doing a local delete plus cleanup of resources on
    a compute host. Comments have also been added to the existing code
    to explain more about the different flows.

    [1] Ic630ae7d026a9697afec46ac9ea40aea0f5b5ffb
    [2] Id4e405e7579530ed1c1f22ccc972d45b6d185f41
    [3] Ic107d8edc7ee7a4ebb04eac58ef0cdbf506d6173
    [4] Ibcbe35b5d329b183c4d0e8233e8ada26ebc512c2
    [5] I928a397c75b857e94bf5c002e50ec43a2bed9848
    [6] I6b9b886e0d6f2ec86141c048fb50969bccf5cb30

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Change-Id: I2192ef513a2cd15d21e9d5d5fe22c5a5fbae0941

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/546228
Reason: Release not affected.

Reviewed: https://review.openstack.org/340614
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b3f39244a3eacd6fb141de61850cbd84fecdb544
Submitter: Zuul
Branch: master

commit b3f39244a3eacd6fb141de61850cbd84fecdb544
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Clean up ports and volumes when deleting ERROR instance

    Usually, when instance.host = None, it means the instance was never
    scheduled. However, the exception handling routine in compute manager
    [1] will set instance.host = None and set instance.vm_state = ERROR
    if the instance fails to build on the compute host. If that happens, we
    end up with an instance with host = None and vm_state = ERROR which may
    have ports and volumes still allocated.

    This adds some logic around deleting the instance when it may have
    ports or volumes allocated.

      1. If the instance is not in ERROR or SHELVED_OFFLOADED state, we
         expect instance.host to be set to a compute host. So, if we find
         instance.host = None in states other than ERROR or
         SHELVED_OFFLOADED, we consider the instance to have failed
         scheduling and not require ports or volumes to be freed, and we
         simply destroy the instance database record and return. This is
         the "delete while booting" scenario.

      2. If the instance is in ERROR because of a failed build or is
         SHELVED_OFFLOADED, we expect instance.host to be None even though
         there could be ports or volumes allocated. In this case, run the
         _local_delete routine to clean up ports and volumes and delete the
         instance database record.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    [1] https://github.com/openstack/nova/blob/55ea961/nova/compute/manager.py#L1927-L1929

    Change-Id: I4dc6c8bd3bb6c135f8a698af41f5d0e026c39117
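
The two cases described in the commit message above can be sketched as a small decision function. This is an illustration under stated assumptions, not nova's code: the Instance class, the state constants, the choose_delete_path name, and the returned labels are all hypothetical, and the branch for an instance that does have a host is assumed to be the ordinary delete on the compute host.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants standing in for the vm_state values named above.
ACTIVE = "active"
BUILDING = "building"
ERROR = "error"
SHELVED_OFFLOADED = "shelved_offloaded"


@dataclass
class Instance:
    """Illustrative stand-in for an instance record (not nova's object)."""
    uuid: str
    vm_state: str
    host: Optional[str] = None


def choose_delete_path(instance: Instance) -> str:
    """Pick a delete flow based on instance.host and instance.vm_state."""
    if instance.host is not None:
        # Assumption: an instance with a host is deleted on that compute host.
        return "compute_delete"
    if instance.vm_state in (ERROR, SHELVED_OFFLOADED):
        # Case 2: host is None but ports or volumes may still be allocated,
        # so clean them up locally and then delete the database record.
        return "local_delete"
    # Case 1: host is None in any other state, so the instance is treated as
    # never scheduled; only the database record is destroyed.
    return "destroy_record_only"
```

For example, choose_delete_path(Instance("x", ERROR)) selects the local delete path, while the same call with BUILDING and no host selects the record-only path, matching the "delete while booting" scenario.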

Matt Riedemann (mriedem) on 2018-02-22
Changed in nova:
assignee: Ankit Agrawal (ankitagrawal) → melanie witt (melwitt)
importance: Low → Medium

Reviewed: https://review.openstack.org/546203
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=26e32ce85823bf1d87ec5c729aa408b92f526e38
Submitter: Zuul
Branch: stable/queens

commit 26e32ce85823bf1d87ec5c729aa408b92f526e38
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Clean up ports and volumes when deleting ERROR instance

    Usually, when instance.host = None, it means the instance was never
    scheduled. However, the exception handling routine in compute manager
    [1] will set instance.host = None and set instance.vm_state = ERROR
    if the instance fails to build on the compute host. If that happens, we
    end up with an instance with host = None and vm_state = ERROR which may
    have ports and volumes still allocated.

    This adds some logic around deleting the instance when it may have
    ports or volumes allocated.

      1. If the instance is not in ERROR or SHELVED_OFFLOADED state, we
         expect instance.host to be set to a compute host. So, if we find
         instance.host = None in states other than ERROR or
         SHELVED_OFFLOADED, we consider the instance to have failed
         scheduling and not require ports or volumes to be freed, and we
         simply destroy the instance database record and return. This is
         the "delete while booting" scenario.

      2. If the instance is in ERROR because of a failed build or is
         SHELVED_OFFLOADED, we expect instance.host to be None even though
         there could be ports or volumes allocated. In this case, run the
         _local_delete routine to clean up ports and volumes and delete the
         instance database record.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Conflicts:
          nova/tests/unit/compute/test_compute_api.py

    [1] https://github.com/openstack/nova/blob/55ea961/nova/compute/manager.py#L1927-L1929

    Change-Id: I4dc6c8bd3bb6c135f8a698af41f5d0e026c39117
    (cherry picked from commit b3f39244a3eacd6fb141de61850cbd84fecdb544)

This issue was fixed in the openstack/nova 17.0.0.0rc3 release candidate.

Reviewed: https://review.openstack.org/546221
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1e59ed99c794e6554b9a3e1e2ed197de75e14189
Submitter: Zuul
Branch: stable/pike

commit 1e59ed99c794e6554b9a3e1e2ed197de75e14189
Author: ankitagrawal <email address hidden>
Date: Wed Sep 23 03:58:19 2015 -0700

    Clean up ports and volumes when deleting ERROR instance

    Usually, when instance.host = None, it means the instance was never
    scheduled. However, the exception handling routine in compute manager
    [1] will set instance.host = None and set instance.vm_state = ERROR
    if the instance fails to build on the compute host. If that happens, we
    end up with an instance with host = None and vm_state = ERROR which may
    have ports and volumes still allocated.

    This adds some logic around deleting the instance when it may have
    ports or volumes allocated.

      1. If the instance is not in ERROR or SHELVED_OFFLOADED state, we
         expect instance.host to be set to a compute host. So, if we find
         instance.host = None in states other than ERROR or
         SHELVED_OFFLOADED, we consider the instance to have failed
         scheduling and not require ports or volumes to be freed, and we
         simply destroy the instance database record and return. This is
         the "delete while booting" scenario.

      2. If the instance is in ERROR because of a failed build or is
         SHELVED_OFFLOADED, we expect instance.host to be None even though
         there could be ports or volumes allocated. In this case, run the
         _local_delete routine to clean up ports and volumes and delete the
         instance database record.

    Co-Authored-By: Ankit Agrawal <email address hidden>
    Co-Authored-By: Samuel Matzek <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>

    Closes-Bug: 1404867
    Closes-Bug: 1408527

    Conflicts:
          nova/tests/unit/compute/test_compute_api.py

    [1] https://github.com/openstack/nova/blob/55ea961/nova/compute/manager.py#L1927-L1929

    Change-Id: I4dc6c8bd3bb6c135f8a698af41f5d0e026c39117
    (cherry picked from commit b3f39244a3eacd6fb141de61850cbd84fecdb544)

This issue was fixed in the openstack/nova 16.1.1 release.

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

Matt Riedemann (mriedem) on 2018-10-02
no longer affects: nova/ocata