MigrationsAdminTest fails with NoValidHost because resource claim swap in placement fails with 500

Bug #1739453 reported by Matt Riedemann
Affects                    Status         Importance  Assigned to
OpenStack Compute (nova)   Fix Released   High        Chris Dent
Pike                       Fix Committed  Medium      Matt Riedemann

Bug Description

http://logs.openstack.org/84/529184/2/check/legacy-tempest-dsvm-py35/888d647/job-output.txt.gz#_2017-12-20_16_08_20_862659

It fails in conductor here, when swapping the resource allocation from the instance to the migration record:

http://logs.openstack.org/84/529184/2/check/legacy-tempest-dsvm-py35/888d647/logs/screen-n-super-cond.txt.gz#_Dec_20_15_15_20_744636

Dec 20 15:15:20.744636 ubuntu-xenial-citycloud-lon1-0001533915 nova-conductor[21763]: WARNING nova.scheduler.client.report [None req-bd8ccca7-0a5a-4b8f-a129-bfd147f72fe5 tempest-MigrationsAdminTest-1384405657 tempest-MigrationsAdminTest-1384405657] Unable to submit allocation for instance d44e9a86-5ebd-4229-b516-6428ace9cb09 (500 {"computeFault": {"code": 500, "message": "The server has either erred or is incapable of performing the requested operation."}})
Dec 20 15:15:20.747237 ubuntu-xenial-citycloud-lon1-0001533915 nova-conductor[21763]: ERROR nova.conductor.tasks.migrate [None req-bd8ccca7-0a5a-4b8f-a129-bfd147f72fe5 tempest-MigrationsAdminTest-1384405657 tempest-MigrationsAdminTest-1384405657] [instance: 8befd9e7-4df0-40b6-97a0-1e268e00108f] Unable to replace resource claim on source host ubuntu-xenial-citycloud-lon1-0001533915 node ubuntu-xenial-citycloud-lon1-0001533915 for instance

The failure in the placement logs:

http://logs.openstack.org/84/529184/2/check/legacy-tempest-dsvm-py35/888d647/logs/screen-placement-api.txt.gz#_Dec_20_15_15_20_666337

Dec 20 15:15:20.726882 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack Traceback (most recent call last):
Dec 20 15:15:20.727033 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/__init__.py", line 82, in __call__
Dec 20 15:15:20.727187 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return req.get_response(self.application)
Dec 20 15:15:20.727332 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/request.py", line 1327, in send
Dec 20 15:15:20.727509 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack application, catch_exc_info=False)
Dec 20 15:15:20.727670 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/request.py", line 1291, in call_application
Dec 20 15:15:20.727830 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack app_iter = application(self.environ, start_response)
Dec 20 15:15:20.727982 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/dec.py", line 131, in __call__
Dec 20 15:15:20.730042 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack resp = self.call_func(req, *args, **self.kwargs)
Dec 20 15:15:20.730225 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/dec.py", line 196, in call_func
Dec 20 15:15:20.730397 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return self.func(req, *args, **kwargs)
Dec 20 15:15:20.730590 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/microversion.py", line 117, in __call__
Dec 20 15:15:20.730783 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack response = req.get_response(self.application)
Dec 20 15:15:20.730957 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/request.py", line 1327, in send
Dec 20 15:15:20.731124 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack application, catch_exc_info=False)
Dec 20 15:15:20.731361 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/request.py", line 1291, in call_application
Dec 20 15:15:20.731508 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack app_iter = application(self.environ, start_response)
Dec 20 15:15:20.731669 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/handler.py", line 220, in __call__
Dec 20 15:15:20.731812 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return dispatch(environ, start_response, self._map)
Dec 20 15:15:20.731952 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/handler.py", line 147, in dispatch
Dec 20 15:15:20.732137 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return handler(environ, start_response)
Dec 20 15:15:20.732295 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/dec.py", line 131, in __call__
Dec 20 15:15:20.732436 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack resp = self.call_func(req, *args, **self.kwargs)
Dec 20 15:15:20.732606 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/wsgi_wrapper.py", line 29, in call_func
Dec 20 15:15:20.732749 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack super(PlacementWsgify, self).call_func(req, *args, **kwargs)
Dec 20 15:15:20.732883 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/webob/dec.py", line 196, in call_func
Dec 20 15:15:20.733016 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return self.func(req, *args, **kwargs)
Dec 20 15:15:20.733170 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/microversion.py", line 257, in decorated_func
Dec 20 15:15:20.733320 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return _find_method(f, version, status_code)(req, *args, **kwargs)
Dec 20 15:15:20.733480 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/util.py", line 167, in decorated_function
Dec 20 15:15:20.733638 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return f(req)
Dec 20 15:15:20.733782 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/handlers/allocation.py", line 424, in set_allocations_for_consumer
Dec 20 15:15:20.733929 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return _set_allocations_for_consumer(req, ALLOCATION_SCHEMA_V1_8)
Dec 20 15:15:20.734070 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/placement/handlers/allocation.py", line 390, in _set_allocations_for_consumer
Dec 20 15:15:20.734210 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack allocations.create_all()
Dec 20 15:15:20.734351 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 2148, in create_all
Dec 20 15:15:20.734498 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack self._set_allocations(self._context, self.objects)
Dec 20 15:15:20.734666 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/oslo_db/api.py", line 147, in wrapper
Dec 20 15:15:20.734815 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack ectxt.value = e.inner_exc
Dec 20 15:15:20.734956 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Dec 20 15:15:20.735105 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack self.force_reraise()
Dec 20 15:15:20.735247 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Dec 20 15:15:20.735387 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack six.reraise(self.type_, self.value, self.tb)
Dec 20 15:15:20.735527 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
Dec 20 15:15:20.735684 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack raise value
Dec 20 15:15:20.735858 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/oslo_db/api.py", line 135, in wrapper
Dec 20 15:15:20.736006 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return f(*args, **kwargs)
Dec 20 15:15:20.736174 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/usr/local/lib/python3.5/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 984, in wrapper
Dec 20 15:15:20.736324 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack return fn(*args, **kwargs)
Dec 20 15:15:20.736480 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack File "/opt/stack/new/nova/nova/objects/resource_provider.py", line 2029, in _set_allocations
Dec 20 15:15:20.736665 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack reason='already created')
Dec 20 15:15:20.736805 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack nova.exception.ObjectActionError: Object action create failed because: already created
Dec 20 15:15:20.736955 ubuntu-xenial-citycloud-lon1-0001533915 <email address hidden>[15195]: ERROR nova.api.openstack

It looks like we're racing to create allocations that have somehow already been created.

Tags: placement
Matt Riedemann (mriedem)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Matt Riedemann (mriedem) wrote:

I'm not sure how this happens, or whether https://review.openstack.org/#/c/528089/ might fix it.

Chris Dent (cdent) wrote:

This is very bizarre. For AllocationList._set_allocations to raise ObjectActionError from that line, the incoming Allocation objects would need to already have an 'id' field set, which ought to be impossible. More details to come.

Chris Dent (cdent) wrote:

Ah, okay, I think it's this: if a DB deadlock happens after at least one trip through the loop at https://github.com/openstack/nova/blob/609ddc2244bf9ba84d66cced52e7d4e2987ec669/nova/objects/resource_provider.py#L2061 then a member of the allocs list will already have a .id attribute the next time the method is tried.

So we either need to duplicate the incoming list, or stop assigning alloc.id the way we do now. I'll look into it.
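
To make the mechanism concrete, here is a minimal, self-contained sketch of the failure mode Chris describes. Every name in it is invented for illustration (the real code is _set_allocations in nova/objects/resource_provider.py, with oslo_db's wrap_db_retry supplying the retry); it only shows how a retry wrapper plus a loop that mutates its input produces "already created" on the second attempt:

    # Illustrative stand-ins: FakeDeadlock for oslo_db's DBDeadlock,
    # retry_once for wrap_db_retry, set_allocations for
    # AllocationList._set_allocations.
    import functools

    class FakeDeadlock(Exception):
        pass

    def retry_once(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except FakeDeadlock:
                # Retry with the *same* argument objects, which the
                # first attempt may already have mutated.
                return func(*args, **kwargs)
        return wrapper

    class Allocation(object):
        def __init__(self):
            self.id = None  # unset until "persisted"

    calls = {'n': 0}

    @retry_once
    def set_allocations(allocs):
        for alloc in allocs:
            if alloc.id is not None:
                # The guard that surfaced as ObjectActionError
                # ("already created") in the traceback above.
                raise RuntimeError('already created')
            alloc.id = 42  # the side effect: the caller's object is mutated
            calls['n'] += 1
            if calls['n'] == 1:
                raise FakeDeadlock()  # deadlock after the write-back

    set_allocations([Allocation()])  # retry sees id set: 'already created'

The retry itself is sound; the bug is that the first attempt leaves a visible side effect on the very objects that get retried.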

Changed in nova:
assignee: nobody → Chris Dent (cdent)
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/529397

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (master)

Reviewed: https://review.openstack.org/529397
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3491f3d6f2334b80c364deeb21d65004262c2846
Submitter: Zuul
Branch: master

commit 3491f3d6f2334b80c364deeb21d65004262c2846
Author: Chris Dent <email address hidden>
Date: Wed Dec 20 18:35:53 2017 +0000

    Do not set allocation.id in AllocationList.create_all()

    The _set_allocations method used by AllocationList.create_all is
    side-effecty: it sets the 'id' attribute on the Allocation objects
    in the list that is passed to it.

    At the start of the method the incoming Allocation objects are
    checked to see if they have already been created, by checking
    for an 'id' field.

    Meanwhile, _set_allocations is also configured to retry on db
    deadlock. The deadlock can happen for a variety of reasons within
    the transaction. The original theory, discussed in the original
    fix, I2c276dc0125b5b9f7a54a1cd431b1b2f5239e93a, is that it happens
    during the resource provider generation checks. In the associated
    bug it looks like it may sometimes happen while inserting
    allocations.

    In either case, if we have gone through the 'for alloc in allocs'
    loop at least once, the contents of the 'allocs' list has been
    modified to have at least one of the alloc.id fields set. Upon
    retry, the 'id' field check at the start of the method will fail,
    leading to an ObjectActionError and an eventual 500 at the placement
    API level.

    This change takes the simplest approach and removes the setting of
    the 'id' attribute on the allocations in the 'allocs' list. There
    are other ways to deal with this; this is the least intrusive. It
    works because:

    * create_all is only called from the allocation handler in placement,
      and the objects are not used (the response bodies are empty)
    * other than the 'id' change, the alloc members in the allocs list
      are otherwise unchanged
    * this kind of side-effecty business is dangerous, so let's not
      rely on it

    Tests which were relying on the side-effecty business have been adjusted
    accordingly.

    Change-Id: I3c7aea7d8959a20c3c404bc6616b47336ff40b67
    Closes-Bug: #1739453
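
In terms of the sketch in the comments above, the change amounts to dropping the write-back of the new database id. A hedged before/after illustration (names such as _insert_allocation and AlreadyCreated are stand-ins, not nova's real helpers; the actual edit is in _set_allocations in nova/objects/resource_provider.py):

    class AlreadyCreated(Exception):
        """Stand-in for nova.exception.ObjectActionError."""

    def _insert_allocation(alloc):
        """Pretend DB insert; returns the new row's primary key."""
        return 42

    # Before: persisting an allocation also mutated the caller's object,
    # so a deadlock-triggered retry found 'id' already set and failed.
    def set_allocations_before(allocs):
        for alloc in allocs:
            if getattr(alloc, 'id', None) is not None:
                raise AlreadyCreated('already created')
            alloc.id = _insert_allocation(alloc)  # the write-back the fix removes

    # After: insert the row but never touch alloc.id, so a retry sees
    # the list exactly as it first arrived.
    def set_allocations_after(allocs):
        for alloc in allocs:
            if getattr(alloc, 'id', None) is not None:
                raise AlreadyCreated('already created')
            _insert_allocation(alloc)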

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/530794

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/nova 17.0.0.0b3

This issue was fixed in the openstack/nova 17.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/530794
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cedcef7ecb9673530ca0241c2976dfd862c4f75a
Submitter: Zuul
Branch: stable/pike

commit cedcef7ecb9673530ca0241c2976dfd862c4f75a
Author: Chris Dent <email address hidden>
Date: Wed Dec 20 18:35:53 2017 +0000

    Do not set allocation.id in AllocationList.create_all()

    Conflicts:
          nova/objects/resource_provider.py
          nova/tests/unit/objects/test_resource_provider.py

    NOTE(mriedem): The resource_provider.py conflict is due to not
    having 31751a7149a3bb4b51087aecdc3b14c061c60d9b in Pike. The
    test_resource_provider.py conflict is due to not having
    5a1ef8fa862321e50a9063f22b27dd5fd93446e7 in Pike. Furthermore,
    the functional test had to be adjusted because of not having
    db4d6799f1a220f69aa804754a0913065ff9436a in Pike.

    Change-Id: I3c7aea7d8959a20c3c404bc6616b47336ff40b67
    Closes-Bug: #1739453
    (cherry picked from commit 3491f3d6f2334b80c364deeb21d65004262c2846)

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.
