Affected version: newton
Configuration driver: pxe_drac, neutron networking.
Steps to reproduce:
1. Put node in 'manage' mode with maintanance=off
2. ironic node-set-provision-state $NODE provide
3. Wait until it become available.
4. ironic node-set-provision-state $NODE manage
4. ironic node-set-provision-state $NODE provide
Expected result: node become available.
Actual result: node in 'clean fail' state.
AFAIK it trying to connect to node when it is in power-off state and receive error.
Trace from ironic-conductor:
ronic.conductor.task_manager[26379]: DEBUG Attempting to get exclusive lock on node dbf66326-3def-47a8-b061-0b409d3d544c (for provision action provide) [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -] __init__ /usr/lib/python2.7/dist-packages/ironic/conductor/task_manager.py:208
ironic.conductor.task_manager[26379]: DEBUG Node dbf66326-3def-47a8-b061-0b409d3d544c successfully reserved for provision action provide (took 0.02 seconds) [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -] reserve_node /usr/lib/python2.7/dist-packages/ironic/conductor/task_manager.py:252
ironic.common.states[26379]: DEBUG Exiting old state 'manageable' in response to event 'provide' [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -] on_exit /usr/lib/python2.7/dist-packages/ironic/common/states.py:219
ironic.common.states[26379]: DEBUG Entering new state 'cleaning' in response to event 'provide' [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -] on_enter /usr/lib/python2.7/dist-packages/ironic/common/states.py:225
ironic.conductor.manager[26379]: DEBUG Starting automated cleaning for node dbf66326-3def-47a8-b061-0b409d3d544c [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -] _do_node_clean /usr/lib/python2.7/dist-packages/ironic/conductor/manager.py:793
ironic.conductor.manager[26379]: INFO Executing cleaning on node dbf66326-3def-47a8-b061-0b409d3d544c, remaining steps: [{u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, {u'priority': 10, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices'}] [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -]
ironic.conductor.manager[26379]: INFO Executing {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'} on node dbf66326-3def-47a8-b061-0b409d3d544c [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -]
ironic.drivers.modules.agent_client[26379]: DEBUG Executing agent command clean.execute_clean_step for node dbf66326-3def-47a8-b061-0b409d3d544c [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -] _command /usr/lib/python2.7/dist-packages/ironic/drivers/modules/agent_client.py:62
ironic.drivers.modules.agent_client[26379]: ERROR Error invoking agent command clean.execute_clean_step for node dbf66326-3def-47a8-b061-0b409d3d544c. Error: HTTPConnectionPool(host='204.74.228.42', port=9999): Max retries exceeded with url: /v1/commands?wait=false (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4d5241ddd0>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',)) [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -]
ironic.conductor.manager[26379]: ERROR Node dbf66326-3def-47a8-b061-0b409d3d544c failed step {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}: Error invoking agent command clean.execute_clean_step for node dbf66326-3def-47a8-b061-0b409d3d544c. Error: HTTPConnectionPool(host='204.74.228.42', port=9999): Max retries exceeded with url: /v1/commands?wait=false (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4d5241ddd0>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',)) [req-fd6b2d9e-4f22-4f63-a783-2ab6f72142c0 - - - - -]
ironic.conductor.manager[26379]: TRACE Traceback (most recent call last):
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic/conductor/manager.py", line 893, in _do_next_clean_step
ironic.conductor.manager[26379]: TRACE result = interface.execute_clean_step(task, step)
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic_lib/metrics.py", line 61, in wrapped
ironic.conductor.manager[26379]: TRACE result = f(*args, **kwargs)
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic/drivers/modules/iscsi_deploy.py", line 532, in execute_clean_step
ironic.conductor.manager[26379]: TRACE return deploy_utils.agent_execute_clean_step(task, step)
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic/drivers/modules/deploy_utils.py", line 624, in agent_execute_clean_step
ironic.conductor.manager[26379]: TRACE result = client.execute_clean_step(step, task.node, ports)
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic_lib/metrics.py", line 61, in wrapped
ironic.conductor.manager[26379]: TRACE result = f(*args, **kwargs)
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic/drivers/modules/agent_client.py", line 227, in execute_clean_step
ironic.conductor.manager[26379]: TRACE params=params)
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic_lib/metrics.py", line 61, in wrapped
ironic.conductor.manager[26379]: TRACE result = f(*args, **kwargs)
ironic.conductor.manager[26379]: TRACE File "/usr/lib/python2.7/dist-packages/ironic/drivers/modules/agent_client.py", line 71, in _command
ironic.conductor.manager[26379]: TRACE raise exception.IronicException(msg)
ironic.conductor.manager[26379]: TRACE IronicException: Error invoking agent command clean.execute_clean_step for node dbf66326-3def-47a8-b061-0b409d3d544c. Error: HTTPConnectionPool(host='204.74.228.42', port=9999): Max retries exceeded with url: /v1/commands?wait=false (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4d5241ddd0>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH',))
ironic.conductor.manager[26379]: TRACE
204.74.228.42 was an IP address of the neutron port for node on provision/clean networking.
Node was in power-off state (both in real world and in ironic power_state).
The fact that it has an IP to call makes me think we didn't clean up the old heartbeat information (which is likely), however at the same time I'm curious if the node was ever actually powered in this sequence upon the second execution of cleaning steps.
Would it be possible to get a complete conductor log with timestamps starting from when the node is in available state?
Also, out of curiosity, this been confirmed on ocata?