Failure to connect to power source during cleaning leads to nodes stuck in cleaning state

Bug #2069074 reported by Chris Krelle

This bug affects 1 person
Affects: Ironic
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

If Ironic is unable to connect to a node's power source during cleaning, the node can become stuck in the cleaning state. The only way to free the stuck node is to restart the Ironic conductor.

If a node has an issue connecting to the power source, it should end up in clean failed, or we need a way to recover the node without restarting the conductor service.
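
To illustrate the requested behavior, here is a minimal sketch of a cleaning entry point that fails the clean explicitly when the early power operation cannot reach the power source. Everything in it is hypothetical and for illustration only: `Node`, `PowerConnectionError`, `start_cleaning`, and `unreachable_bmc` are not Ironic names, and the real conductor code is structured differently.

```python
class PowerConnectionError(Exception):
    """Hypothetical error raised when the power source is unreachable."""


class Node:
    """Toy stand-in for a node record; not Ironic's Node object."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.provision_state = "cleaning"
        self.last_error = None


def start_cleaning(node, power_off):
    """Run the early power step; on failure, move the node to clean failed."""
    try:
        power_off(node)
    except PowerConnectionError as exc:
        # Record the failure instead of leaving the node in "cleaning".
        node.provision_state = "clean failed"
        node.last_error = f"Power operation failed during cleaning: {exc}"
        return False
    return True


def unreachable_bmc(node):
    """Simulate a power source that cannot be reached."""
    raise PowerConnectionError("connection timed out")


node = Node("cda5b9f1-e467-4694-8514-cd5d2beb4f83")
ok = start_cleaning(node, unreachable_bmc)
print(ok, node.provision_state)
```

The point of the sketch is only that the exception path ends in a state the operator can act on, with the cause saved in `last_error`, rather than in a transient state that nothing will ever advance.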

Currently the abort, manage, and undeploy actions all fail with:
The requested action "XXXX" can not be performed on node "cda5b9f1-e467-4694-8514-cd5d2beb4f83" while it is in state "cleaning". (HTTP 400)
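
The rejection above is what you would expect if "cleaning" is a transient state that only accepts internal events. A rough sketch of that idea follows. The transition table is a simplified assumption made for illustration, not Ironic's actual state machine, and `TRANSITIONS`, `InvalidAction`, and `apply_action` are hypothetical names.

```python
# Simplified, assumed subset of transitions; not Ironic's real table.
TRANSITIONS = {
    ("cleaning", "done"): "available",
    ("cleaning", "fail"): "clean failed",
    ("cleaning", "wait"): "clean wait",
    ("clean wait", "abort"): "clean failed",
    ("clean failed", "manage"): "manageable",
}


class InvalidAction(Exception):
    """Raised when an action is not allowed in the current state."""


def apply_action(state, action):
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise InvalidAction(
            f'The requested action "{action}" can not be performed '
            f'while the node is in state "{state}".'
        ) from None


# Operator actions are all rejected while the node sits in "cleaning".
rejected = []
for action in ("abort", "manage", "undeploy"):
    try:
        apply_action("cleaning", action)
    except InvalidAction:
        rejected.append(action)
print(rejected)

# Once an internal "fail" event fires, the node becomes recoverable.
recovered = apply_action(apply_action("cleaning", "fail"), "manage")
print(recovered)
```

In this model, if the internal "fail" event never fires (for example because the power failure is not handled), no operator action can move the node out of "cleaning".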

Julia Kreger (juliaashleykreger) wrote :

Can we get some logging detail from around the time window of this failure?

Also, what state are you finding the node in? It looks like it is a power action *not* related to the teardown of the node, but rather some failure while gearing up or during the process.

i.e. not related to https://github.com/openstack/ironic/blob/master/ironic/conductor/cleaning.py#L260-L268 but we need to make sure.

Changed in ironic:
status: New → Incomplete
Julia Kreger (juliaashleykreger) wrote :

I chatted with nobodycam this past weekend. Apparently the base issue is that the early power operations can fail on some nodes in his environment, which orphans the node in the "cleaning" state.
