Resource_provider entry related to a deleted compute node, unable to migrate vms to the node
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | Expired | Undecided | Unassigned | |
Bug Description
Description
===========
Migrating a VM to a node failed with the following error:
"There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: mymachine.maas already exists."
https:/
Steps to reproduce
==================
We found that the compute node had been added multiple times. The valid record is the one with created_at: 2019-08-22 18:47:31
mysql> select created_at, deleted_at from compute_nodes where host="mymachine";
+---------------------+---------------------+
| created_at          | deleted_at          |
+---------------------+---------------------+
| 2019-08-22 18:47:31 | NULL                |
| 2019-08-21 11:50:26 | 2019-08-22 11:04:27 |
| 2019-08-22 16:25:52 | 2019-08-22 16:58:42 |
| 2019-08-22 18:42:39 | 2019-08-22 18:45:36 |
+---------------------+---------------------+
4 rows in set (0.00 sec)
The resource provider entry, however, was related to an already deleted compute node:
mysql> select created_at from resource_providers where name="mymachine
+---------------------+
| created_at          |
+---------------------+
| 2019-08-22 18:42:40 |
+---------------------+
1 row in set (0.00 sec)
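The timestamp comparison behind that conclusion can be sketched in Python. This is only an illustration using the values from the two queries above; matching by nearest created_at is a heuristic assumed here, not how nova itself links the records (comparing the uuid columns of both tables, if available, would likely be more direct). The helper name closest_compute_node is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (created_at, deleted_at) rows copied from the compute_nodes query above.
compute_nodes = [
    ("2019-08-22 18:47:31", None),
    ("2019-08-21 11:50:26", "2019-08-22 11:04:27"),
    ("2019-08-22 16:25:52", "2019-08-22 16:58:42"),
    ("2019-08-22 18:42:39", "2019-08-22 18:45:36"),
]

# created_at of the single resource_providers row.
provider_created_at = "2019-08-22 18:42:40"


def closest_compute_node(rows, provider_ts):
    """Return the row whose created_at is nearest to the provider's created_at.

    Hypothetical helper: nearest-timestamp matching is an assumption made
    for illustration only.
    """
    target = datetime.strptime(provider_ts, FMT)
    return min(rows, key=lambda r: abs(datetime.strptime(r[0], FMT) - target))


match = closest_compute_node(compute_nodes, provider_created_at)
is_deleted = match[1] is not None
print(match, "deleted" if is_deleted else "active")
```

With the data above, the nearest compute node record is the one created one second before the provider, and that record has a non-NULL deleted_at.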
We tried to delete it:
mysql> delete from resource_providers where name="mymachine
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`nova_
Strangely, root_provider_id in every row references the id of that same row, which appears to make deleting any row from this table impossible:
mysql> select id,root_provider_id from resource_providers;
+----+------------------+
| id | root_provider_id |
+----+------------------+
|  1 |                1 |
|  4 |                4 |
|  7 |                7 |
| 10 |               10 |
| 13 |               13 |
| 16 |               16 |
| 19 |               19 |
| 22 |               22 |
| 28 |               28 |
| 31 |               31 |
| 34 |               34 |
| 37 |               37 |
| 40 |               40 |
| 43 |               43 |
| 45 |               45 |
| 52 |               52 |
| 55 |               55 |
| 58 |               58 |
| 61 |               61 |
| 64 |               64 |
| 67 |               67 |
| 70 |               70 |
| 73 |               73 |
| 76 |               76 |
| 79 |               79 |
| 82 |               82 |
| 91 |               91 |
+----+------------------+
Expected result
===============
The resource provider entry should be deleted when its compute node is deleted, so that VMs can be migrated to the node once it is added again.
Workaround
===============
We updated the name of the stale entry to "invalid":
mysql> update resource_providers set name="invalid" where name="mymachine
Query OK, 1 row affected (0.01 sec)
We then restarted nova-compute on the node with:
systemctl restart nova-compute
The resource provider entry was recreated:
mysql> select * from resource_providers where name="mymachine
+------
| created_at | updated_at | id | uuid | name | generation | can_host | root_provider_id | parent_provider_id |
+------
| 2019-10-24 15:16:51 | 2019-10-24 15:18:12 | 384 | e6dabd5d-
+------
After that, the migration worked.
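Why renaming the stale row helps can be illustrated with a toy model. The sketch below uses an in-memory SQLite table rather than the real MySQL schema and assumes, based only on the "Conflicting resource provider name" error, that the name column is unique. The uuid values and the register helper are placeholders invented for the example; the full host name is taken from the error message.

```python
import sqlite3

# Autocommit mode keeps each statement independent, so a failed INSERT
# does not roll back earlier statements.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Simplified stand-in for resource_providers; the UNIQUE constraint on
# name is an assumption inferred from the conflict error.
conn.execute(
    "CREATE TABLE resource_providers ("
    " id INTEGER PRIMARY KEY,"
    " uuid TEXT NOT NULL UNIQUE,"
    " name TEXT UNIQUE)"
)

# Stale row left behind by the deleted compute node (placeholder uuid).
conn.execute(
    "INSERT INTO resource_providers (uuid, name) VALUES (?, ?)",
    ("old-uuid", "mymachine.maas"),
)


def register(conn, uuid, name):
    """Try to create a provider; return False on a uniqueness conflict."""
    try:
        conn.execute(
            "INSERT INTO resource_providers (uuid, name) VALUES (?, ?)",
            (uuid, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# The re-added node cannot register under the name that is still taken.
before = register(conn, "new-uuid", "mymachine.maas")

# Workaround from the report: rename the stale row.
conn.execute(
    "UPDATE resource_providers SET name = 'invalid' WHERE name = 'mymachine.maas'"
)

# Registration under the original name now succeeds.
after = register(conn, "new-uuid", "mymachine.maas")
names = sorted(row[0] for row in conn.execute("SELECT name FROM resource_providers"))
print(before, after, names)
```

In this model the rename frees the name without deleting the stale row, so the foreign key error seen on DELETE is not triggered. The stale row remains in the table under the name "invalid".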
Environment
===============
xenial-queens cloud
Nova compute node:
dpkg -l | grep nova
ii nova-api-metadata 2:17.0.
ii nova-common 2:17.0.
ii nova-compute 2:17.0.
ii nova-compute-kvm 2:17.0.
ii nova-compute-
ii python-nova 2:17.0.
ii python-novaclient 2:9.1.1-
Nova Cloud Controller
dpkg -l | grep nova
ii nova-api-os-compute 2:17.0.
ii nova-common 2:17.0.
ii nova-conductor 2:17.0.
ii nova-consoleauth 2:17.0.
ii nova-novncproxy 2:17.0.
ii nova-placement-api 2:17.0.
ii nova-scheduler 2:17.0.
ii nova-spiceproxy 2:17.0.
ii python-nova 2:17.0.
ii python-novaclient 2:9.1.1-
Changed in nova:
status: New → Incomplete
The oldest nova log file has the resource provider created event & the start of the error messages:
https://pastebin.canonical.com/p/xKzF5qZNZv/