2021-10-19 08:17:28.299 8 DEBUG ironic.conductor.task_manager [req-0daec60d-e154-48e1-8ca8-4fafcb28c015 - - - - -] Attempting to get shared lock on node 677d3491-814a-428e-8be6-f309d6ac95d6 (for power state sync) __init__ /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:226
2021-10-19 08:17:32.110 8 DEBUG ironic.conductor.task_manager [req-0daec60d-e154-48e1-8ca8-4fafcb28c015 - - - - -] Successfully released shared lock for power state sync on node 677d3491-814a-428e-8be6-f309d6ac95d6 (lock was held 3.81 sec) release_resources /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:418
2021-10-19 08:17:41.816 8 DEBUG ironic.conductor.manager [req-e6bc6e68-907b-4cbb-96e8-59c96c998143 4d8927747a344c9590e907366812a57a f0136f62e9b04cc6bee7ed60d56dd2c5 - default default] RPC change_node_power_state called for node 677d3491-814a-428e-8be6-f309d6ac95d6. The desired new state is rebooting. change_node_power_state /usr/lib/python3.6/site-packages/ironic/conductor/manager.py:303
Desired state is rebooting, this should end up calling:
ironic.drivers.modules.ipmitool.IPMIPower.reboot
The reboot method will call the following methods in the "good path":
ironic.drivers.modules.ipmitool._power_off
ironic.drivers.utils.ensure_next_boot_device
ironic.drivers.modules.ipmitool._power_on
The _power_off method is calling:
ironic.drivers.modules.ipmitool._set_and_wait
Which returns
ironic.conductor.utils.node_wait_for_power_state
This is what raises this error:
2021-10-19 08:18:22.683 8 ERROR ironic.conductor.utils [req-e6bc6e68-907b-4cbb-96e8-59c96c998143 4d8927747a344c9590e907366812a57a f0136f62e9b04cc6bee7ed60d56dd2c5 - default default] Timed out after 30 secs waiting for power off on node 677d3491-814a-428e-8be6-f309d6ac95d6.: oslo_service.loopingcall.LoopingCallTimeOut: Looping call timed out after 20.60 seconds
The lock is released:
2021-10-19 08:18:22.734 8 DEBUG ironic.conductor.task_manager [req-e6bc6e68-907b-4cbb-96e8-59c96c998143 4d8927747a344c9590e907366812a57a f0136f62e9b04cc6bee7ed60d56dd2c5 - default default] Successfully released exclusive lock for changing node power state on node 677d3491-814a-428e-8be6-f309d6ac95d6 (lock was held 40.87 sec) release_resources /usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py:418
Result is that the "ironic.drivers.modules.ipmitool.IPMIPower.reboot" never gets to run:
ironic.drivers.utils.ensure_next_boot_device
ironic.drivers.modules.ipmitool._power_on
i.e the node remains powered off, the _power_on step of the reboot is never called.
We have 30 seconds in Train:
# Number of seconds to wait for power operations to complete,
# i.e., so that a baremetal node is in the desired power
# state. If timed out, the power operation is considered a
# failure. (integer value)
# Minimum value: 2
#power_state_change_timeout = 30
Upstream the default is 60 seconds, modified in Change-Id: I0aa5131504b60b13d43c73c9a3be1f50f7855cbc.
2021-10-19 08:17:28.299 8 DEBUG ironic. conductor. task_manager [req-0daec60d- e154-48e1- 8ca8-4fafcb28c0 15 - - - - -] Attempting to get shared lock on node 677d3491- 814a-428e- 8be6-f309d6ac95 d6 (for power state sync) __init__ /usr/lib/ python3. 6/site- packages/ ironic/ conductor/ task_manager. py:226
2021-10-19 08:17:32.110 8 DEBUG ironic. conductor. task_manager [req-0daec60d- e154-48e1- 8ca8-4fafcb28c0 15 - - - - -] Successfully released shared lock for power state sync on node 677d3491- 814a-428e- 8be6-f309d6ac95 d6 (lock was held 3.81 sec) release_resources /usr/lib/ python3. 6/site- packages/ ironic/ conductor/ task_manager. py:418
2021-10-19 08:17:41.816 8 DEBUG ironic. conductor. manager [req-e6bc6e68- 907b-4cbb- 96e8-59c96c9981 43 4d8927747a344c9 590e907366812a5 7a f0136f62e9b04cc 6bee7ed60d56dd2 c5 - default default] RPC change_ node_power_ state called for node 677d3491- 814a-428e- 8be6-f309d6ac95 d6. The desired new state is rebooting. change_ node_power_ state /usr/lib/ python3. 6/site- packages/ ironic/ conductor/ manager. py:303
Desired state is rebooting, this should end up calling: drivers. modules. ipmitool. IPMIPower. reboot
ironic.
The reboot method will call the following methods in the "good path": drivers. modules. ipmitool. _power_ off drivers. utils.ensure_ next_boot_ device drivers. modules. ipmitool. _power_ on
ironic.
ironic.
ironic.
The _power_off method is calling: drivers. modules. ipmitool. _set_and_ wait conductor. utils.node_ wait_for_ power_state
ironic.
Which returns
ironic.
This is what raises this error: conductor. utils [req-e6bc6e68- 907b-4cbb- 96e8-59c96c9981 43 4d8927747a344c9 590e907366812a5 7a f0136f62e9b04cc 6bee7ed60d56dd2 c5 - default default] Timed out after 30 secs waiting for power off on node 677d3491- 814a-428e- 8be6-f309d6ac95 d6.: oslo_service. loopingcall. LoopingCallTime Out: Looping call timed out after 20.60 seconds
2021-10-19 08:18:22.683 8 ERROR ironic.
The lock is released: conductor. task_manager [req-e6bc6e68- 907b-4cbb- 96e8-59c96c9981 43 4d8927747a344c9 590e907366812a5 7a f0136f62e9b04cc 6bee7ed60d56dd2 c5 - default default] Successfully released exclusive lock for changing node power state on node 677d3491- 814a-428e- 8be6-f309d6ac95 d6 (lock was held 40.87 sec) release_resources /usr/lib/ python3. 6/site- packages/ ironic/ conductor/ task_manager. py:418
2021-10-19 08:18:22.734 8 DEBUG ironic.
Result is that the "ironic. drivers. modules. ipmitool. IPMIPower. reboot" never gets to run: drivers. utils.ensure_ next_boot_ device drivers. modules. ipmitool. _power_ on
ironic.
ironic.
i.e the node remains powered off, the _power_on step of the reboot is never called.
We have 30 seconds in Train: state_change_ timeout = 30
# Number of seconds to wait for power operations to complete,
# i.e., so that a baremetal node is in the desired power
# state. If timed out, the power operation is considered a
# failure. (integer value)
# Minimum value: 2
#power_
Upstream the default is 60 seconds, modified in Change-Id: I0aa5131504b60b 13d43c73c9a3be1 f50f7855cbc.
Let's try to bump that in CI?