n-cpu fails to start in the multinode job: Conflicting resource provider name: <uuid> already exists

Bug #1737395 reported by Dmitry Tantsur
This bug affects 1 person
Affects: Ironic
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

Failed to create resource provider record in placement API for UUID da67cfdf-001f-4f0a-8760-8f2a97942d61. Got 409: {"errors": [{"status": 409, "request_id": "req-77651e46-b0bb-4d9f-83c0-b0879b26f1de", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: 131104ba-59b4-468f-8537-03c97319258c already exists. ", "title": "Conflict"}]}.

It's not apparent whether this is what causes the CI job failure, but Tempest fails with:

tempest.exceptions.BuildErrorException: Server bbeb8b78-cab7-41d5-9e62-4876252bf164 failed to build and is in ERROR status
Details: {u'created': u'2017-12-09T11:53:21Z', u'details': u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1851, in _do_build_and_run_instance\n filter_properties, request_spec)\n File "/opt/stack/new/nova/nova/compute/manager.py", line 2115, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'code': 500, u'message': u'Compute host 3 could not be found.\nTraceback (most recent call last):\n\n File "/opt/stack/new/nova/nova/conductor/manager.py", line 123, in _object_dispatch\n return getattr(target, method)(*args, **kwargs)\n\n File "/usr/local/lib/python2.7/dist-package'}

Example failure: http://logs.openstack.org/29/526429/4/check/ironic-tempest-dsvm-ipa-wholedisk-agent_ipmitool-tinyipa-multinode/98f0e39/logs/subnode-2/screen-n-cpu.txt.gz?level=INFO#_Dec_09_11_38_56_021749
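
For reference, a minimal sketch (hypothetical endpoint, token and microversion; not nova's actual report-client code) of the placement call that fails here. Each compute POSTs a resource provider named after the ironic node UUID; if a provider with that name was already registered, for example by another compute, placement returns 409 Conflict:

import requests

# Hypothetical values; in a real deployment these come from the service
# catalog and keystone.
PLACEMENT_URL = "http://placement.example/placement"
TOKEN = "gAAAA..."

def create_resource_provider(node_uuid):
    resp = requests.post(
        PLACEMENT_URL + "/resource_providers",
        headers={
            "X-Auth-Token": TOKEN,
            # Assumed microversion; the call itself exists in all placement 1.x.
            "OpenStack-API-Version": "placement 1.14",
        },
        # In the ironic case the provider name is the node UUID, as seen in
        # the log message above.
        json={"uuid": node_uuid, "name": node_uuid},
    )
    if resp.status_code == 409:
        # Another compute already created a provider with this name.
        raise RuntimeError(
            "Conflicting resource provider name: %s already exists" % node_uuid)
    resp.raise_for_status()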

Tags: gate
Revision history for this message
Pavlo Shchelokovskyy (pshchelo) wrote :

Without proper investigation yet, I suspect this might be because the hash ring in the nova computes gets rebalanced when we stop/restart a compute: a given compute then sees "new" nodes (those that were handled by the stopped one) and tries to register them in placement, but they are already there because the old compute registered them.

But again, this is just an idea to investigate as I might be totally wrong.
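
To make that concrete, here is a toy consistent-hash-ring sketch (my own simplification, not Ironic's hashring module or nova's ironic driver) showing how stopping one compute makes the surviving compute responsible for the stopped compute's nodes, which it would then try to register in placement even though providers for them already exist:

# Toy consistent-hash-ring sketch (NOT Ironic's hashring implementation).
import bisect
import hashlib


def _hash(key):
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyHashRing(object):
    def __init__(self, hosts, replicas=64):
        # Sorted list of (hash, host) pairs forming the ring.
        self._ring = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [k for k, _ in self._ring]

    def get_host(self, node_uuid):
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical node UUIDs; in the gate these would be the ironic node UUIDs.
nodes = ["node-%d" % i for i in range(6)]

before = ToyHashRing(["compute-1", "compute-2"])
after = ToyHashRing(["compute-1"])  # compute-2 stopped / being restarted

for node in nodes:
    old, new = before.get_host(node), after.get_host(node)
    if old != new:
        # The new owner would now POST a resource provider for this node and
        # get 409 because the old owner already created it.
        print("%s: %s -> %s (re-registration, expect 409)" % (node, old, new))

Running this prints which of the hypothetical nodes change owner after compute-2 leaves the ring; each such node is one the surviving compute would try to re-create in placement, hitting the 409 above.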

Revision history for this message
Dmitry Tantsur (divius) wrote :

Potential breaking patch: https://review.openstack.org/#/c/524263/. If so, it means that we had this problem before, but it was not treated as an error.
