node permanently stuck in deploying state

Bug #1354147 reported by James Slagle
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Ironic | Confirmed | Medium | Unassigned |

Bug Description

I had a deployment fail: I OOM'd my virt host by deploying to too many baremetal node VMs. Now the associated nodes in ironic are permanently wedged in the deploying provision_state, with apparently no way to free them up:

[root@localhost ~]# ironic node-list
+--------------------------------------+--------------------------------------+-------------+-----------------+-------------+
| uuid | instance_uuid | power_state | provision_state | maintenance |
+--------------------------------------+--------------------------------------+-------------+-----------------+-------------+
| b826ec2d-6690-41e4-a0c7-57ab3bf2f49b | None | power off | None | False |
| 39668eeb-85cc-46ab-a501-385ae4ce478b | c71efe23-825e-4804-9beb-a13582043127 | power on | deploying | False |
| 291b70f4-4268-4602-af3b-da669ec6c13a | 72f126a0-f689-4e53-b13b-f60bad786a7b | power on | deploying | False |
| fb354bc0-0cf8-41ab-a375-f1ad881ac356 | b2c33b2a-f240-4e9d-b200-7c0ccecb041c | power on | deploying | False |
+--------------------------------------+--------------------------------------+-------------+-----------------+-------------+
[root@localhost ~]# ironic node-set-power-state 39668eeb-85cc-46ab-a501-385ae4ce478b off
Node 39668eeb-85cc-46ab-a501-385ae4ce478b is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409)
[root@localhost ~]# ironic node-set-provision-state 39668eeb-85cc-46ab-a501-385ae4ce478b active
Node 39668eeb-85cc-46ab-a501-385ae4ce478b is already being provisioned or decommissioned. (HTTP 409)
[root@localhost ~]# ironic node-set-provision-state 39668eeb-85cc-46ab-a501-385ae4ce478b deleted
Node 39668eeb-85cc-46ab-a501-385ae4ce478b is already being provisioned or decommissioned. (HTTP 409)
[root@localhost ~]# ironic node-delete 39668eeb-85cc-46ab-a501-385ae4ce478b
Node 39668eeb-85cc-46ab-a501-385ae4ce478b is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409)

Database hacks are required to get around this.

Maybe we need a --force somewhere?
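
For illustration only, the kind of database hack mentioned above might look like the sketch below. It runs against an in-memory SQLite stand-in, not a real Ironic database. The table layout and the `reservation` column name are assumptions made for this example; only `provision_state` and the node UUIDs come from the output above. The helper names `make_demo_db` and `force_unlock` are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the nodes table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        " uuid TEXT PRIMARY KEY,"
        " reservation TEXT,"       # assumed column: host currently holding the lock
        " provision_state TEXT)"   # NULL for an idle node, as in node-list above
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [
            ("b826ec2d-6690-41e4-a0c7-57ab3bf2f49b", None, None),
            ("39668eeb-85cc-46ab-a501-385ae4ce478b",
             "localhost.localdomain", "deploying"),
        ],
    )
    conn.commit()
    return conn


def force_unlock(conn, node_uuid):
    """Clear the lock and the stuck state for one node.

    Only touches a node that is still in 'deploying'; returns the number
    of rows changed (0 or 1).
    """
    cur = conn.execute(
        "UPDATE nodes SET reservation = NULL, provision_state = NULL "
        "WHERE uuid = ? AND provision_state = 'deploying'",
        (node_uuid,),
    )
    conn.commit()
    return cur.rowcount
```

Restricting the UPDATE to rows still in the deploying state keeps the sketch from touching a node that has since moved on. A real deployment would need the conductor stopped first, and the actual schema checked, before anything like this is attempted.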

Tags: conductor
Dmitry Tantsur (divius) wrote :

Have you tried node-delete on it? What's the result? I guess the only place where we could have --force is when deleting such nodes.

Changed in ironic:
status: New → Incomplete
James Slagle (james-slagle) wrote :

The result from node-delete is in the bug description.

Changed in ironic:
status: Incomplete → New
Dmitry Tantsur (divius) wrote :

After some thought, I believe the main issue is caused by locks left behind in the database. We need to check our locking code for this situation.
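
A toy model may make the suspected failure mode easier to see. The sketch below is not Ironic's locking code; `ReservationTable`, `NodeLockedError`, and `release_all_for_host` are hypothetical names invented for this example. It assumes a lock is just a record of which host reserved a node, so a process that dies before releasing it leaves the node locked until something clears the record.

```python
class NodeLockedError(Exception):
    """Raised when a node is already reserved by some host."""


class ReservationTable:
    """Toy stand-in for lock records persisted in a database."""

    def __init__(self):
        self._owner = {}  # node uuid -> hostname holding the lock

    def reserve(self, node, host):
        """Take the lock on a node, or fail if any host already holds it."""
        owner = self._owner.get(node)
        if owner is not None:
            raise NodeLockedError(f"Node {node} is locked by host {owner}")
        self._owner[node] = host

    def release(self, node, host):
        """Normal path: the holder drops its own lock when the task ends."""
        if self._owner.get(node) == host:
            del self._owner[node]

    def release_all_for_host(self, host):
        """Cleanup path: drop every lock recorded for one host.

        Meant to run when that host's process starts up, since any lock
        it still holds at that point must be left over from a crash.
        """
        stale = [n for n, h in self._owner.items() if h == host]
        for node in stale:
            del self._owner[node]
        return stale
```

In this model, a crash between `reserve` and `release` reproduces the "is locked by host" failure on every later attempt, and one call to `release_all_for_host` at startup clears it. Whether the real fix belongs at startup or elsewhere is a question for the locking code review.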

Changed in ironic:
status: New → Confirmed
importance: Undecided → Medium
tags: added: conductor
Abhishek Mukherjee (linkinpark342) wrote :

Is the only workaround still to raw-sql the database for this?

Abhishek Mukherjee (linkinpark342) wrote :

This may be a dupe of #1406181, as pointed out by my colleague
