While testing my NUC, the node got into a state where Ironic believed cleaning was in progress, but the node had actually booted into the instance (from local disk). In this situation I could not manage the node via Ironic; the log of my attempts is below. The only way out was a manual reboot of the hardware, outside of Ironic. I did not see any errors in ironic-conductor.log during this whole time.
Perhaps we should allow power state changes while a node is in cleaning state?
ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| a8cb6624-0d9f-c882-affc-046ebb96ec01 | None | None | power on | cleaning | False |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
ironic node-set-provision-state a8cb6624-0d9f-c882-affc-046ebb96ec01 deleted
The requested action "deleted" can not be performed on node "a8cb6624-0d9f-c882-affc-046ebb96ec01" while it is in state "cleaning". (HTTP 400)
ironic node-set-power-state a8cb6624-0d9f-c882-affc-046ebb96ec01 off
The requested action "power off" can not be performed on node "a8cb6624-0d9f-c882-affc-046ebb96ec01" while it is in state "cleaning". (HTTP 400)
ironic node-set-provision-state a8cb6624-0d9f-c882-affc-046ebb96ec01 manage
The requested action "manage" can not be performed on node "a8cb6624-0d9f-c882-affc-046ebb96ec01" while it is in state "cleaning". (HTTP 400)
ironic node-set-power-state a8cb6624-0d9f-c882-affc-046ebb96ec01 reboot
The requested action "rebooting" can not be performed on node "a8cb6624-0d9f-c882-affc-046ebb96ec01" while it is in state "cleaning". (HTTP 400)
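Every request above is refused with the same message. That behaviour is consistent with a guard that checks each requested action against the node's current provisioning state. The following is a minimal Python sketch of such a guard, for illustration only. The `ALLOWED_ACTIONS` table, `check_action` and `ActionRejected` are hypothetical names, not Ironic's actual state machine. The sketch only shows why an empty allow-list for "cleaning" leaves no way out through the API.

```python
"""Hypothetical provision-state guard (illustrative, not Ironic's code)."""


class ActionRejected(Exception):
    """Raised when an action is not permitted in the node's current state."""


# Hypothetical allow-list of user-requested actions per provisioning state.
ALLOWED_ACTIONS = {
    "active": {"deleted", "power off", "power on", "rebooting"},
    # Nothing is allowed while cleaning, so a node whose cleaning never
    # finishes cannot be moved out of this state via the API.
    "cleaning": set(),
}


def check_action(node_uuid: str, state: str, action: str) -> None:
    """Raise ActionRejected unless `action` is allowed in `state`."""
    if action not in ALLOWED_ACTIONS.get(state, set()):
        raise ActionRejected(
            f'The requested action "{action}" can not be performed on node '
            f'"{node_uuid}" while it is in state "{state}".'
        )


if __name__ == "__main__":
    uuid = "a8cb6624-0d9f-c882-affc-046ebb96ec01"
    # Replay the four attempts from the log; each one is rejected.
    for action in ("deleted", "power off", "manage", "rebooting"):
        try:
            check_action(uuid, "cleaning", action)
        except ActionRejected as exc:
            print(f"{exc} (HTTP 400)")
```

Allowing power actions in "cleaning" would just mean adding entries to that set. The reply below argues against doing so.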
Allowing power state changes during cleaning is a good way to break it :) Also, IIRC we don't allow changing the power state during deploy either, do we?
Maybe we should make cleaning retry itself instead?
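A minimal Python sketch of the "retry cleaning" idea follows. It assumes cleaning can be restarted by re-booting the node into the cleaning ramdisk, and that the conductor can tell whether the agent reported back within a timeout. `clean_with_retry`, `start_cleaning`, `wait_for_agent` and the state strings are hypothetical names, not Ironic APIs. The point is only that the node ends in either a success state or an explicit failure state. That avoids sitting in "cleaning" indefinitely, as happened here when the node booted from local disk.

```python
"""Hypothetical sketch of cleaning that retries itself (not Ironic code)."""
from typing import Callable

AVAILABLE = "available"
# Hypothetical terminal failure state from which an operator could recover.
CLEAN_FAILED = "clean failed"


def clean_with_retry(
    start_cleaning: Callable[[], None],
    wait_for_agent: Callable[[float], bool],
    max_attempts: int = 3,
    timeout: float = 1800.0,
) -> str:
    """Attempt cleaning up to max_attempts times.

    start_cleaning() is assumed to (re)boot the node into the cleaning
    ramdisk; wait_for_agent(timeout) returns True if the agent reported
    back within `timeout` seconds.
    """
    for _ in range(max_attempts):
        start_cleaning()
        if wait_for_agent(timeout):
            return AVAILABLE
    # Give up explicitly instead of staying in "cleaning" forever.
    return CLEAN_FAILED


if __name__ == "__main__":
    attempts = []
    # Simulate the reported case: the node boots from local disk, so the
    # cleaning agent never reports back.
    result = clean_with_retry(
        start_cleaning=lambda: attempts.append(1),
        wait_for_agent=lambda timeout: False,
        max_attempts=3,
        timeout=0.0,
    )
    print(result, len(attempts))  # clean failed 3
```

Whether the final failure state should then accept power and "manage" actions is a separate design choice. The sketch leaves it open.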