Cells: Race deleting instance can lead to instances "undeleted" at the top

Bug #1460350 reported by melanie witt
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: melanie witt

Bug Description

Seen in check-tempest-dsvm-cells job failure, example trace from [1]:

Traceback (most recent call last):
  File "tempest/api/compute/servers/test_list_servers_negative.py", line 153, in test_list_servers_detail_server_is_deleted
    self.assertEqual([], actual)
  File "/opt/stack/new/tempest/.tox/all/local/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/opt/stack/new/tempest/.tox/all/local/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: !=:
reference = []
actual = [{u'OS-DCF:diskConfig': u'MANUAL',
  u'OS-EXT-AZ:availability_zone': u'nova',
  u'OS-EXT-STS:power_state': 0,
  u'OS-EXT-STS:task_state': None,
  u'OS-EXT-STS:vm_state': u'deleted',
  u'OS-SRV-USG:launched_at': None,
  u'OS-SRV-USG:terminated_at': u'2015-05-17T15:46:15.000000',
  u'accessIPv4': u'',
  u'accessIPv6': u'',
  u'addresses': {},
  u'config_drive': u'',
  u'created': u'2015-05-17T15:46:15Z',
  u'flavor': {u'id': u'42',
              u'links': [{u'href': u'http://127.0.0.1:8774/82eeb74985844a9daa71b162f663e981/flavors/42',
                          u'rel': u'bookmark'}]},
  u'hostId': u'',
  u'id': u'45b1decf-8f52-4075-8869-acb9f48de159',
  u'image': {u'id': u'990c6a37-da73-4a74-be3d-eff98dcf7727',
             u'links': [{u'href': u'http://127.0.0.1:8774/82eeb74985844a9daa71b162f663e981/images/990c6a37-da73-4a74-be3d-eff98dcf7727',
                         u'rel': u'bookmark'}]},
  u'key_name': None,
  u'links': [{u'href': u'http://127.0.0.1:8774/v2/82eeb74985844a9daa71b162f663e981/servers/45b1decf-8f52-4075-8869-acb9f48de159',
              u'rel': u'self'},
             {u'href': u'http://127.0.0.1:8774/82eeb74985844a9daa71b162f663e981/servers/45b1decf-8f52-4075-8869-acb9f48de159',
              u'rel': u'bookmark'}],
  u'metadata': {},
  u'name': u'ListServersNegativeTestJSON-instance-1205034409',
  u'os-extended-volumes:volumes_attached': [],
  u'status': u'DELETED',
  u'tenant_id': u'82eeb74985844a9daa71b162f663e981',
  u'updated': u'2015-05-17T15:46:15Z',
  u'user_id': u'aac5bd38fc264166b9b365426557b4d2'}]

The test creates an instance and immediately deletes it before it has been scheduled. After the delete has happened at the top via local delete, updates from the child cells can still arrive and "undelete" the instance. This is possible because the code in nova/cells/messaging.py reads with read_deleted='yes', and db.instance_update() writes every field it is given (unlike objects, which only update fields that have changed). I also tried removing the local delete logic in nova/compute/cells_api.py and it didn't help: the destroy in the child triggers an instance_destroy_at_top(), but the instance.save() update can still arrive after the destroy, again resulting in an "undeleted" instance.
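
To make the ordering problem concrete, here is a minimal sketch of the race in plain Python. It is not nova code: the in-memory `top_db` dict, the `full_update()` and `local_delete()` helpers, and the field values are all assumptions made up for illustration. The only behavior taken from the report is that the update can see deleted rows and writes every field it receives.

```python
# Hypothetical in-memory stand-in for the instances table in the API (top) cell.
UUID = "45b1decf-8f52-4075-8869-acb9f48de159"
top_db = {UUID: {"vm_state": "building", "task_state": None, "deleted": False}}


def full_update(db, uuid, values):
    """Dict-style update: the row is found even if it is soft-deleted
    (the read_deleted='yes' behavior) and every provided field is written."""
    db[uuid].update(values)


def local_delete(db, uuid):
    """Soft-delete the row at the top, as the local delete would."""
    db[uuid].update({"vm_state": "deleted", "deleted": True})


# 1. The child cell builds an update payload from its own view of the
#    instance, in which the row is not yet marked deleted.
payload_from_child = {"vm_state": "deleted", "task_state": None, "deleted": False}

# 2. The top cell deletes the instance locally.
local_delete(top_db, UUID)
assert top_db[UUID]["deleted"] is True

# 3. The child's payload arrives late and overwrites every field,
#    including "deleted".
full_update(top_db, UUID, payload_from_child)

print(top_db[UUID])
```

After step 3 the row has `deleted` set back to `False` while `vm_state` is still `deleted`, which is the kind of record a server listing would then return.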

This issue should go away when instance_update_at_top() is converted to use objects, as the "deleted" etc fields won't ever be in what_changed.
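
A rough sketch of why change tracking helps, again in plain Python rather than nova's actual object model. The `TrackedInstance` class and its `set()`, `what_changed()` and `changes()` methods are hypothetical names chosen for this example; the point is only that a sender which transmits changed fields never includes `deleted` unless it modified it.

```python
class TrackedInstance:
    """Hypothetical, simplified object that remembers which fields changed."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._changed = set()

    def set(self, name, value):
        # Record the field as changed only if the value actually differs.
        if self._fields.get(name) != value:
            self._fields[name] = value
            self._changed.add(name)

    def what_changed(self):
        return set(self._changed)

    def changes(self):
        # Only the modified fields are handed to the update.
        return {name: self._fields[name] for name in self._changed}


# Row at the top, already soft-deleted by the time the update arrives.
top_row = {"vm_state": "deleted", "task_state": None, "deleted": True}

# The child's view predates the delete, but it only touches task_state.
child_view = TrackedInstance(vm_state="building", task_state=None, deleted=False)
child_view.set("task_state", "spawning")

# Applying only the changed fields leaves "deleted" untouched.
top_row.update(child_view.changes())

print(sorted(child_view.what_changed()))
print(top_row)
```

Here the late update still lands on the deleted row, but because `deleted` is not among the changed fields it stays `True`.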

[1] http://logs.openstack.org/09/183909/2/check/check-tempest-dsvm-cells/9799bb0

Tags: cells
Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/176518
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eaaa659333c7586a71155c065dfb0f7b7e3758fc
Submitter: Jenkins
Branch: master

commit eaaa659333c7586a71155c065dfb0f7b7e3758fc
Author: melanie witt <email address hidden>
Date: Wed Mar 11 03:28:36 2015 +0000

    Send Instance object to cells instance_update_at_top

    Currently, a primitivized object is sent to sync to the API cell
    in Instance.save because instance_update_at_top has not yet been
    converted to handle objects. This change does the conversion and
    makes Instance.save send an object for the sync.

    This change should also address a race where deleting an instance
    can result in an "undeleted" instance if an update from a child
    occurs after the instance has been destroyed at the top, because
    in instance_update_at_top() it uses read_deleted='yes' and
    db.instance_update() will update all fields provided, unlike
    objects which only update fields that have changed.

    Closes-Bug: #1460350

    Change-Id: I4e8c1a82a3c9c86038faa7f528b9dfb835f82ee6

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0