listing deleted servers from the API fails after running fill_virtual_interface_list online data migration

Bug #1825034 reported by Matt Riedemann
This bug affects 2 people
Affects                     Status          Importance   Assigned to      Milestone
OpenStack Compute (nova)    Fix Released    High         Matt Riedemann
  Stein                     Fix Committed   High         Matt Riedemann

Bug Description

I found this bug while trying to recreate bug 1825018 with a functional test.

The fill_virtual_interface_list online data migration creates a fake, mostly empty instance record to satisfy a foreign key constraint in the virtual_interfaces table, which is used as a marker when paging across cells to fulfill the migration. The problem is that if you list deleted servers (as admin) with the all_tenants=1 and deleted=1 filters, the API fails with a 500 error while trying to load the instance.flavor field:

    2019-04-16 15:08:53,720 ERROR [nova.api.openstack.wsgi] Unexpected exception in API method
    Traceback (most recent call last):
      File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/urllib3/connectionpool.py", line 377, in _make_request
        httplib_response = conn.getresponse(buffering=True)
    TypeError: getresponse() got an unexpected keyword argument 'buffering'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/osboxes/git/nova/nova/api/openstack/wsgi.py", line 671, in wrapped
        return f(*args, **kwargs)
      File "/home/osboxes/git/nova/nova/api/validation/__init__.py", line 192, in wrapper
        return func(*args, **kwargs)
      File "/home/osboxes/git/nova/nova/api/validation/__init__.py", line 192, in wrapper
        return func(*args, **kwargs)
      File "/home/osboxes/git/nova/nova/api/validation/__init__.py", line 192, in wrapper
        return func(*args, **kwargs)
      File "/home/osboxes/git/nova/nova/api/openstack/compute/servers.py", line 136, in detail
        servers = self._get_servers(req, is_detail=True)
      File "/home/osboxes/git/nova/nova/api/openstack/compute/servers.py", line 330, in _get_servers
        req, instance_list, cell_down_support=cell_down_support)
      File "/home/osboxes/git/nova/nova/api/openstack/compute/views/servers.py", line 390, in detail
        cell_down_support=cell_down_support)
      File "/home/osboxes/git/nova/nova/api/openstack/compute/views/servers.py", line 425, in _list_view
        for server in servers]
      File "/home/osboxes/git/nova/nova/api/openstack/compute/views/servers.py", line 425, in <listcomp>
        for server in servers]
      File "/home/osboxes/git/nova/nova/api/openstack/compute/views/servers.py", line 222, in show
        show_extra_specs),
      File "/home/osboxes/git/nova/nova/api/openstack/compute/views/servers.py", line 494, in _get_flavor
        instance_type = instance.get_flavor()
      File "/home/osboxes/git/nova/nova/objects/instance.py", line 1191, in get_flavor
        return getattr(self, attr)
      File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter
        self.obj_load_attr(name)
      File "/home/osboxes/git/nova/nova/objects/instance.py", line 1114, in obj_load_attr
        self._obj_load_attr(attrname)
      File "/home/osboxes/git/nova/nova/objects/instance.py", line 1158, in _obj_load_attr
        self._load_flavor()
      File "/home/osboxes/git/nova/nova/objects/instance.py", line 967, in _load_flavor
        self.flavor = instance.flavor
      File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter
        self.obj_load_attr(name)
      File "/home/osboxes/git/nova/nova/objects/instance.py", line 1101, in obj_load_attr
        objtype=self.obj_name())
    nova.exception.OrphanedObjectError: Cannot call obj_load_attr on orphaned Instance object
    2019-04-16 15:08:53,722 INFO [nova.api.openstack.wsgi] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
    <class 'nova.exception.OrphanedObjectError'>
    2019-04-16 15:08:53,723 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/servers/detail?all_tenants=1&deleted=1" status: 500 len: 208 microversion: 2.1 time: 0.138964
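To see why the listing blows up, here is a toy model (not nova's code) of the lazy-load path in the traceback. `_load_flavor` re-reads the instance and then reads `instance.flavor` off that copy (line 967); the error message suggests the copy carries no request context, i.e. it is "orphaned", so when the marker record has no stored flavor the second lazy-load cannot proceed. The `ToyInstance` class, the dict standing in for the database and the all-zeros marker UUID are assumptions made for illustration only.

```python
FAKE_UUID = "00000000-0000-0000-0000-000000000000"  # assumed marker uuid


class OrphanedObjectError(Exception):
    """Stand-in for nova.exception.OrphanedObjectError."""


class ToyInstance:
    """Hypothetical, heavily simplified lazy-loading instance object."""

    def __init__(self, context, uuid, flavor=None):
        self._context = context  # None means the object is "orphaned"
        self.uuid = uuid
        self._flavor = flavor    # None means the field is not loaded

    @property
    def flavor(self):
        # Lazy-load the field on first access, like a versioned object getter.
        if self._flavor is None:
            self._load_flavor()
        return self._flavor

    def _load_flavor(self):
        if self._context is None:
            raise OrphanedObjectError(
                "Cannot call obj_load_attr on orphaned Instance object")
        # Re-read the row into a context-less copy, then read its flavor.
        # A real row has a stored flavor; the fake marker row does not.
        stored_flavor = self._context["db"].get(self.uuid)
        copy = ToyInstance(None, self.uuid, stored_flavor)
        self._flavor = copy.flavor


ctx = {"db": {"real-uuid": "m1.small", FAKE_UUID: None}}
print(ToyInstance(ctx, "real-uuid").flavor)  # m1.small
try:
    ToyInstance(ctx, FAKE_UUID).flavor
except OrphanedObjectError as exc:
    print("500:", exc)
```

A normal instance resolves its flavor on the first lazy-load, while the marker record falls through to a second lazy-load on the orphaned copy, which is the same shape as the failure logged above.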

Matt Riedemann (mriedem) wrote :

The workaround is to archive the deleted marker instance after running the online data migration.
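Archiving (presumably with `nova-manage db archive_deleted_rows`, which moves soft-deleted rows into nova's shadow tables) helps because the marker instance is soft-deleted: once its row leaves the instances table, a deleted=1 listing can no longer find it. The sketch below is a toy model of that effect only. The row layout and both function names are hypothetical, and the real command also has to deal with foreign keys, batching and many tables.

```python
def archive_deleted_rows(table, shadow_table):
    """Toy model: move soft-deleted rows out of ``table`` into ``shadow_table``."""
    kept = [row for row in table if not row["deleted"]]
    archived = [row for row in table if row["deleted"]]
    table[:] = kept  # mutate in place, like deleting rows from the live table
    shadow_table.extend(archived)
    return len(archived)


def list_deleted(table):
    """Toy version of listing with deleted=1: only soft-deleted rows."""
    return [row["uuid"] for row in table if row["deleted"]]


instances = [
    {"uuid": "real-uuid", "deleted": 0},
    {"uuid": "00000000-0000-0000-0000-000000000000", "deleted": 1},  # marker
]
shadow_instances = []
print(list_deleted(instances))  # ['00000000-0000-0000-0000-000000000000']
print(archive_deleted_rows(instances, shadow_instances))  # 1
print(list_deleted(instances))  # []
```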

Changed in nova:
status: New → Confirmed
importance: Undecided → High
tags: added: upgrade
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/653098

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/653131

Matt Riedemann (mriedem)
description: updated
Matt Riedemann (mriedem) wrote :

I don't really have many good ideas on how to handle this.

We could create the fake instance with a fake flavor, but (a) we don't really want to show this fake instance in the API, and (b) we could fail to lazy-load some other field, which turns this into a whack-a-mole issue.

We could specifically exclude showing this server with a fake uuid in the API code but that's also pretty gross (albeit something we could eventually remove).

We could hard delete the soft-deleted fake instance record if fill_virtual_interface_list is run again and there is nothing else to migrate. There is no hard-delete instance API in Stein like this one, though: https://review.openstack.org/#/c/570202/ (but maybe we could just use the DB API directly).

Other ideas?

iain MacDonnell (imacdonn) wrote :

I'm leaning a bit towards specifically excluding the server with the (well-known?) fake UUID. The hard-delete option seems vulnerable to race conditions (this has to work during online migrations, right?).

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/653158

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
melanie witt (melwitt) wrote :

I would have been in favor of the last option of hard deleting the fake instance record via direct DB access upon completion of the online data migration, but it doesn't help in the case where multiple runs of the migration are needed.

With that, so far it seems like option 2 (exclude the fake uuid server) is the least bad in that it catches all cases.

I don't like option 1 at all because, I agree, we shouldn't show the fake instance in the API.
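The concern about multiple runs can be illustrated with a toy batched migration. Nova's online data migrations take a `max_count` and report `(found, done)`, and they can be run in batches (for example with `nova-manage db online_data_migrations --max-count`) until nothing is left. The marker has to persist between those runs, so deleting it only "on completion" still leaves it visible to the API while the migration is in progress. The function names, the dict standing in for the marker row and the all-zeros UUID below are hypothetical.

```python
FAKE_UUID = "00000000-0000-0000-0000-000000000000"  # assumed marker uuid


def migrate_batch(instance_uuids, migrated, markers, max_count):
    """Hypothetical batched migration step returning (found, done).

    ``markers`` stands in for the persistent marker record: it remembers the
    last instance uuid handled, so the next run resumes after it.
    """
    last = markers.get(FAKE_UUID)
    pending = [u for u in sorted(instance_uuids) if last is None or u > last]
    batch = pending[:max_count]
    migrated.update(batch)
    if batch:
        markers[FAKE_UUID] = batch[-1]  # must survive until the next run
    return len(batch), len(batch)


def run_until_done(instance_uuids, max_count):
    """Re-run the step until it finds nothing, like repeated operator runs."""
    migrated, markers, runs = set(), {}, 0
    marker_alive_between_runs = False
    while True:
        found, _done = migrate_batch(instance_uuids, migrated, markers, max_count)
        runs += 1
        if found == 0:
            return migrated, markers, runs, marker_alive_between_runs
        marker_alive_between_runs = marker_alive_between_runs or FAKE_UUID in markers


uuids = ["uuid-%d" % i for i in range(5)]
migrated, markers, runs, alive = run_until_done(uuids, max_count=2)
print(runs, len(migrated), alive)  # 4 5 True
```

With five instances and `max_count=2` it takes three productive runs plus one empty run to finish, and the marker exists throughout, which is the window a completion-time hard delete does not cover.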

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/653131

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/657420

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/657421

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/653098
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=91056970b5359b8410348010ce7a96547272e1c8
Submitter: Zuul
Branch: master

commit 91056970b5359b8410348010ce7a96547272e1c8
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 16 15:18:30 2019 -0400

    Add regression test for bug 1825034

    The fill_virtual_interface_list online data migration creates a
    fake mostly empty instance record to satisfy a foreign key constraint
    in the virtual_interfaces table which is used as a marker when paging
    across cells to fulfill the migration. The problem is if you list deleted
    servers (as admin) with the all_tenants=1 and deleted=1 filters, the API
    will fail with a 500 error trying to load the instance.flavor field.

    This adds a functional regression test for the bug.

    Change-Id: I2030412566dfc6ec23dbf37685f6e6d145f710dc
    Related-Bug: #1825034

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/653158
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f9eb685bee156590600889449470b35c993cc4cf
Submitter: Zuul
Branch: master

commit f9eb685bee156590600889449470b35c993cc4cf
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 16 17:47:08 2019 -0400

    Exclude fake marker instance when listing servers

    The fill_virtual_interface_list online data migration added
    in Stein creates a fake instance marker record without some
    fields (like flavor) which will fail to load and result in
    a 500 error when listing deleted servers across all tenants:

      openstack server list --all-projects --deleted

    This fixes the issue by excluding the specific fake marker
    instance when listing servers in the API.

    This admittedly isn't great but it's one of many not-so-great
    options (listed in the bug) and also something that we'll
    eventually remove when we drop the online data migration.

    Change-Id: Ibd34b7f24016641bc251f85e6ea17e8a969c3095
    Closes-Bug: #1825034
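The approach taken in this commit, filtering the one well-known marker record out before the API builds its response, can be sketched as follows. This is not the merged code: the row format, the simplified filter semantics (deleted=True returns only soft-deleted rows) and the all-zeros marker UUID are assumptions for illustration.

```python
FAKE_UUID = "00000000-0000-0000-0000-000000000000"  # assumed marker uuid


def list_servers(rows, project_id, all_tenants=False, deleted=False):
    """Toy server listing; filter semantics are simplified, not nova's."""
    if not all_tenants:
        rows = [r for r in rows if r["project_id"] == project_id]
    # Simplification: deleted=True returns only soft-deleted rows.
    rows = [r for r in rows if bool(r["deleted"]) == deleted]
    # The idea of the fix: never hand the migration's fake marker to the view.
    return [r["uuid"] for r in rows if r["uuid"] != FAKE_UUID]


rows = [
    {"uuid": "active", "project_id": "p1", "deleted": 0},
    {"uuid": "gone", "project_id": "p2", "deleted": 1},
    # The marker's project_id value here is arbitrary.
    {"uuid": FAKE_UUID, "project_id": "fake-project", "deleted": 1},
]
print(list_servers(rows, "p1", all_tenants=True, deleted=True))  # ['gone']
```

Filtering on a single known identifier keeps the change small and easy to delete once the online data migration is dropped, which matches the trade-off described in the commit message.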

Changed in nova:
status: In Progress → Fix Released
Surya Seetharaman (tssurya) wrote :

I know I'm too late to the party, but I thought I'd leave a comment:

Agreed, we don't have many great ways of fixing this particular bug. (I would have preferred a hard delete of the marker too, but that would still be racy until the whole migration is complete, since each cell would have a marker.) On a more general note, maybe we should add something to the upgrade status checker that deletes persistent markers once it deems the migration complete. We had a similar issue where we had to use objects.RequestSpec.get_by_instance_uuid() in a script, and it blew up on the request_spec marker (which had a uuid instead of a JSON blob as the spec) left over from one of the migrations in Newton.
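The suggestion boils down to "confirm the migration is complete, then delete its marker". A minimal sketch under stated assumptions: the migration names, the counting callables and the marker mapping are hypothetical, and nothing here is an existing `nova-status upgrade check` API. Counting remaining work across all cells before deleting is what would avoid the per-cell race mentioned in the comment.

```python
from typing import Callable, Dict, List


def purge_completed_markers(
    remaining: Dict[str, Callable[[], int]],
    markers: Dict[str, str],
) -> List[str]:
    """Hypothetical cleanup: drop a migration's marker once nothing is left.

    ``remaining`` maps a migration name to a function counting rows that still
    need migrating (across all cells); ``markers`` maps the same names to the
    marker record that migration left behind.
    """
    purged = []
    for name, count_remaining in remaining.items():
        if name in markers and count_remaining() == 0:
            del markers[name]
            purged.append(name)
    return purged


markers = {"fill_virtual_interface_list": "marker-row", "other_migration": "marker-row"}
remaining = {
    "fill_virtual_interface_list": lambda: 0,  # finished everywhere
    "other_migration": lambda: 3,              # still has work to do
}
print(purge_completed_markers(remaining, markers))  # ['fill_virtual_interface_list']
print(sorted(markers))                              # ['other_migration']
```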

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/657420
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=840f109afc537c4818288a2bea09d550ca66ac14
Submitter: Zuul
Branch: stable/stein

commit 840f109afc537c4818288a2bea09d550ca66ac14
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 16 15:18:30 2019 -0400

    Add regression test for bug 1825034

    The fill_virtual_interface_list online data migration creates a
    fake mostly empty instance record to satisfy a foreign key constraint
    in the virtual_interfaces table which is used as a marker when paging
    across cells to fulfill the migration. The problem is if you list deleted
    servers (as admin) with the all_tenants=1 and deleted=1 filters, the API
    will fail with a 500 error trying to load the instance.flavor field.

    This adds a functional regression test for the bug.

    Change-Id: I2030412566dfc6ec23dbf37685f6e6d145f710dc
    Related-Bug: #1825034
    (cherry picked from commit 91056970b5359b8410348010ce7a96547272e1c8)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/657421
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=af07ddb734474051ed77c425bf8a50426c2d1a88
Submitter: Zuul
Branch: stable/stein

commit af07ddb734474051ed77c425bf8a50426c2d1a88
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 16 17:47:08 2019 -0400

    Exclude fake marker instance when listing servers

    The fill_virtual_interface_list online data migration added
    in Stein creates a fake instance marker record without some
    fields (like flavor) which will fail to load and result in
    a 500 error when listing deleted servers across all tenants:

      openstack server list --all-projects --deleted

    This fixes the issue by excluding the specific fake marker
    instance when listing servers in the API.

    This admittedly isn't great but it's one of many not-so-great
    options (listed in the bug) and also something that we'll
    eventually remove when we drop the online data migration.

    Change-Id: Ibd34b7f24016641bc251f85e6ea17e8a969c3095
    Closes-Bug: #1825034
    (cherry picked from commit f9eb685bee156590600889449470b35c993cc4cf)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.1

This issue was fixed in the openstack/nova 19.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

Edward Hope-Morley (hopem) wrote :

Apologies: while trying to fix the description of bug 1751923, I inadvertently updated the description of this bug. I have restored it to (hopefully) the original.

description: updated
description: updated