Comment 3 for bug 1326279

aeva black (tenbrae) wrote : Re: nova operations can lead to nodes entering maintenance mode

Based on the provided status output, this error is not the result of a Nova operation --

| last_error | During sync_power_state, max retries exceeded for node

The Nova operation happened to coincide with this node becoming inaccessible to the IPMITool power driver. During the periodic sync_power_state poll, the conductor failed to determine the node's power state $max_retries consecutive times, so Ironic removed the node from service by placing it in maintenance mode. This action preserves any instance state, so Ironic can attempt to resume the prior operation once the operator restores connectivity to the node.
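To make the sequence above concrete, here is a rough sketch of the retry-counting behaviour. It is not Ironic's actual conductor code: the `Node` class, the `PowerStateUnavailable` exception, the `failed_syncs` counter, and the `>=` comparison against `max_retries` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class PowerStateUnavailable(Exception):
    """Hypothetical error raised when the power driver cannot reach the BMC."""


@dataclass
class Node:
    # Hypothetical stand-in for an Ironic node record.
    uuid: str
    maintenance: bool = False
    last_error: Optional[str] = None
    failed_syncs: int = 0  # consecutive failed power-state polls


def sync_power_state(node: Node,
                     get_power_state: Callable[[Node], str],
                     max_retries: int) -> None:
    """Run one periodic poll against a single node."""
    if node.maintenance:
        # Nodes already out of service are not polled again.
        return
    try:
        get_power_state(node)
    except PowerStateUnavailable:
        node.failed_syncs += 1
        if node.failed_syncs >= max_retries:
            # Take the node out of service; any instance data is left untouched.
            node.maintenance = True
            node.last_error = (
                "During sync_power_state, max retries exceeded for node %s"
                % node.uuid
            )
        return
    # A successful poll resets the consecutive-failure counter.
    node.failed_syncs = 0
```

In this sketch, only an unbroken run of failures trips maintenance mode; a single successful poll in between starts the count over.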

Alternatively, the operator may delete the node, which should remove the "error:deleting" instance from Nova as well.

The underlying failure is not presented here -- perhaps the network was unreachable, or the BMC crashed; I can't tell.

If this happens during a "nova boot", the Nova user should re-issue the request, and the Nova scheduler will find another (non-maintenance) node to deploy to. The purpose of automatically moving a node to maintenance mode under certain failure conditions is to prevent further Nova failures which would otherwise occur when attempting to deploy instances to a node that is physically not manageable by Ironic.
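As an illustration of why a retried request lands elsewhere, the selection can be thought of as a filter over candidate nodes. This is only a sketch and not the Nova scheduler's real logic; `Node`, `instance_uuid`, and `pick_node` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    # Hypothetical stand-in for the node data the scheduler sees.
    uuid: str
    maintenance: bool = False
    instance_uuid: Optional[str] = None  # set when an instance occupies the node


def pick_node(nodes: Iterable[Node]) -> Optional[Node]:
    """Return the first node that is in service and unoccupied, or None."""
    for node in nodes:
        if node.maintenance or node.instance_uuid is not None:
            # Skip nodes that are out of service or already in use.
            continue
        return node
    return None
```

With a node in maintenance mode excluded up front, a re-issued boot request never reaches the unmanageable hardware.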