Comment 13 for bug 1947403

Revision history for this message
Harald Jensås (harald-jensas) wrote :

2021-10-19 08:17:28.299 8 DEBUG ironic.conductor.task_manager [req-0daec60d-e154-48e1-8ca8-4fafcb28c015 - - - - -] Attempting to get shared lock on node 677d3491-814a-428e-8be6-f309d6ac95d6 (for power state sync) __init__ /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:226

2021-10-19 08:17:32.110 8 DEBUG ironic.conductor.task_manager [req-0daec60d-e154-48e1-8ca8-4fafcb28c015 - - - - -] Successfully released shared lock for power state sync on node 677d3491-814a-428e-8be6-f309d6ac95d6 (lock was held 3.81 sec) release_resources /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:418

2021-10-19 08:17:41.816 8 DEBUG ironic.conductor.manager [req-e6bc6e68-907b-4cbb-96e8-59c96c998143 4d8927747a344c9590e907366812a57a f0136f62e9b04cc6bee7ed60d56dd2c5 - default default] RPC change_node_power_state called for node 677d3491-814a-428e-8be6-f309d6ac95d6. The desired new state is rebooting. change_node_power_state /usr/lib/python3.6/site-packages/ironic/conductor/manager.py:303

Desired state is rebooting, this should end up calling:
  ironic.drivers.modules.ipmitool.IPMIPower.reboot

The reboot method will call the following methods in the "good path":
  ironic.drivers.modules.ipmitool._power_off
  ironic.drivers.utils.ensure_next_boot_device
  ironic.drivers.modules.ipmitool._power_on

The _power_off method is calling:
  ironic.drivers.modules.ipmitool._set_and_wait
Which returns
  ironic.conductor.utils.node_wait_for_power_state

This is what raises this error:
2021-10-19 08:18:22.683 8 ERROR ironic.conductor.utils [req-e6bc6e68-907b-4cbb-96e8-59c96c998143 4d8927747a344c9590e907366812a57a f0136f62e9b04cc6bee7ed60d56dd2c5 - default default] Timed out after 30 secs waiting for power off on node 677d3491-814a-428e-8be6-f309d6ac95d6.: oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 20.60 seconds

The lock is released:
2021-10-19 08:18:22.734 8 DEBUG ironic.conductor.task_manager [req-e6bc6e68-907b-4cbb-96e8-59c96c998143 4d8927747a344c9590e907366812a57a f0136f62e9b04cc6bee7ed60d56dd2c5 - default default] Successfully released exclusive lock for changing node power state on node 677d3491-814a-428e-8be6-f309d6ac95d6 (lock was held 40.87 sec) release_resources /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:418

Result is that the "ironic.drivers.modules.ipmitool.IPMIPower.reboot" never gets to run:
  ironic.drivers.utils.ensure_next_boot_device
  ironic.drivers.modules.ipmitool._power_on

i.e the node remains powered off, the _power_on step of the reboot is never called.

We have 30 seconds in Train:
# Number of seconds to wait for power operations to complete,
# i.e., so that a baremetal node is in the desired power
# state. If timed out, the power operation is considered a
# failure. (integer value)
# Minimum value: 2
#power_state_change_timeout = 30

Upstream the default is 60 seconds, modified in Change-Id: I0aa5131504b60b13d43c73c9a3be1f50f7855cbc.

Let's try to bump that in CI?