Ironic fails to execute node servicing step exposed by IPA's HardwareManager

Bug #2069430 reported by Przemyslaw Szczerbik
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ironic | Fix Released | High | Unassigned |

Bug Description

Description:

Ironic fails to execute servicing steps exposed by IPA's HardwareManager, even though the steps are correctly detected by the get_service_steps() method.

Expected result:

Node servicing is executed successfully and the node returns to the active state.

Actual result:

Node servicing fails, the node is put into the 'service failed' provisioning state, and maintenance mode is set.

Steps to reproduce:
1. Provision a node
2. Execute node servicing with any step exposed by the IPA HardwareManager. Example: baremetal node service <node> --service-steps '[{"interface": "deploy", "step": "burnin_cpu"}]'
3. Check node status / Ironic logs

Logs:

2024-06-14 13:08:34.102 1 DEBUG ironic.common.states [-] Exiting old state 'service wait' in response to event 'resume' on_exit /usr/lib/python3.9/site-packages/ironic/common/states.py:360
2024-06-14 13:08:34.102 1 DEBUG ironic.common.states [-] Entering new state 'servicing' in response to event 'resume' on_enter /usr/lib/python3.9/site-packages/ironic/common/states.py:366
2024-06-14 13:08:34.113 1 INFO ironic.conductor.task_manager [-] Node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce moved to provision state "servicing" from state "service wait"; target provision state is "active"
2024-06-14 13:08:34.114 1 DEBUG ironic.drivers.modules.agent_base [-] Refreshing agent service step cache for node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce. Previously cached steps: None refresh_steps /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_base.py:877
2024-06-14 13:08:34.121 1 DEBUG ironic.drivers.modules.agent_client [-] Executing agent command service.get_service_steps for node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce with params {'wait': 'true', 'agent_token': '***'} _command /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_client.py:197
2024-06-14 13:08:34.209 1 DEBUG ironic.drivers.modules.agent_client [-] Agent command service.get_service_steps for node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce returned result {'service_steps': {'GenericHardwareManager': [{'step': 'delete_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'abortable': True}, {'step': 'apply_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'argsinfo': {'raid_config': {'description': 'The RAID configuration to apply.', 'required': True}, 'delete_existing': {'description': "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. Default value is 'True'.", 'required': False}}}, {'step': 'create_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'abortable': True}, {'step': 'burnin_cpu', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'abortable': True}, {'step': 'burnin_memory', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'abortable': True}, {'step': 'burnin_network', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'abortable': True}, {'step': 'write_image', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False}, {'step': 'inject_files', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'argsinfo': {'files': {'description': "Files to inject, a list of file structures with keys: 'path' (path to the file), 'partition' (partition specifier), 'content' (base64 encoded string), 'mode' (new file mode) and 'dirmode' (mode for the leaf directory, if created). Merged with the values from node.properties[inject_files].", 'required': False}, 'verify_ca': {'description': 'Whether to verify TLS certificates. Global agent options are used by default.', 'required': False}}}]}, 'hardware_manager_version': {'generic_hardware_manager': '1.2'}}, error None, HTTP status code 200 _command /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_client.py:234
2024-06-14 13:08:34.233 1 DEBUG ironic.drivers.modules.agent_base [-] Refreshed agent service step cache for node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce: defaultdict(<class 'list'>, {'raid': [{'step': 'delete_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'abortable': True}, {'step': 'apply_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'argsinfo': {'raid_config': {'description': 'The RAID configuration to apply.', 'required': True}, 'delete_existing': {'description': "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. Default value is 'True'.", 'required': False}}}, {'step': 'create_configuration', 'priority': 0, 'interface': 'raid', 'reboot_requested': False, 'abortable': True}], 'deploy': [{'step': 'burnin_cpu', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'abortable': True}, {'step': 'burnin_memory', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'abortable': True}, {'step': 'burnin_network', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'abortable': True}, {'step': 'write_image', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False}, {'step': 'inject_files', 'priority': 0, 'interface': 'deploy', 'reboot_requested': False, 'argsinfo': {'files': {'description': "Files to inject, a list of file structures with keys: 'path' (path to the file), 'partition' (partition specifier), 'content' (base64 encoded string), 'mode' (new file mode) and 'dirmode' (mode for the leaf directory, if created). Merged with the values from node.properties[inject_files].", 'required': False}, 'verify_ca': {'description': 'Whether to verify TLS certificates. Global agent options are used by default.', 'required': False}}}]}) refresh_steps /usr/lib/python3.9/site-packages/ironic/drivers/modules/agent_base.py:939
2024-06-14 13:08:34.233 1 DEBUG ironic.conductor.steps [-] List of the steps for service of node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce: [{'interface': 'deploy', 'step': 'burnin_cpu', 'abortable': True, 'priority': 0}] set_node_service_steps /usr/lib/python3.9/site-packages/ironic/conductor/steps.py:498
2024-06-14 13:08:34.248 1 INFO ironic.conductor.servicing [None req-f3a7c367-002f-4b8f-9447-a6b7a1052fa2 - - - - - -] Executing service on node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce, remaining steps: [{'interface': 'deploy', 'step': 'burnin_cpu', 'abortable': True, 'priority': 0}]
2024-06-14 13:08:34.261 1 INFO ironic.conductor.servicing [None req-f3a7c367-002f-4b8f-9447-a6b7a1052fa2 - - - - - -] Executing {'interface': 'deploy', 'step': 'burnin_cpu', 'abortable': True, 'priority': 0} on node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils [None req-f3a7c367-002f-4b8f-9447-a6b7a1052fa2 - - - - - -] Node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce failed step {'interface': 'deploy', 'step': 'burnin_cpu', 'abortable': True, 'priority': 0}: 'AgentDeploy' object has no attribute 'burnin_cpu': AttributeError: 'AgentDeploy' object has no attribute 'burnin_cpu'
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils Traceback (most recent call last):
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils File "/usr/lib/python3.9/site-packages/ironic/conductor/servicing.py", line 149, in do_next_service_step
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils result = interface.execute_service_step(task, step)
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils File "/usr/lib/python3.9/site-packages/ironic/drivers/base.py", line 456, in execute_service_step
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils return self._execute_step(task, step)
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils File "/usr/lib/python3.9/site-packages/ironic/drivers/base.py", line 321, in _execute_step
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils return getattr(self, step['step'])(task)
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils AttributeError: 'AgentDeploy' object has no attribute 'burnin_cpu'
2024-06-14 13:08:34.261 1 ERROR ironic.conductor.utils
2024-06-14 13:08:34.262 1 DEBUG ironic.common.pxe_utils [None req-f3a7c367-002f-4b8f-9447-a6b7a1052fa2 - - - - - -] Cleaning up PXE config for node 42ebe99c-7aee-419e-b5d2-b0148d3c28ce clean_up_pxe_config /usr/lib/python3.9/site-packages/ironic/common/pxe_utils.py:416
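
The two "service step cache" DEBUG lines in the log show the agent returning steps keyed by hardware manager and Ironic caching them keyed by interface. A minimal sketch of that regrouping follows; it is an illustration only, not Ironic's actual refresh_steps code, and the function name group_steps_by_interface and the trimmed sample payload are assumptions made for this example.

```python
from collections import defaultdict


def group_steps_by_interface(agent_result):
    """Regroup agent-reported steps from per-manager lists to per-interface lists.

    Hypothetical helper: it only mirrors the shape change visible in the log.
    """
    cache = defaultdict(list)
    for manager_steps in agent_result["service_steps"].values():
        for step in manager_steps:
            cache[step["interface"]].append(step)
    return cache


# Trimmed-down payload in the same shape as the logged agent response.
agent_result = {
    "service_steps": {
        "GenericHardwareManager": [
            {"step": "delete_configuration", "priority": 0, "interface": "raid"},
            {"step": "burnin_cpu", "priority": 0, "interface": "deploy"},
            {"step": "burnin_memory", "priority": 0, "interface": "deploy"},
        ]
    }
}

cache = group_steps_by_interface(agent_result)
```

The point relevant to this bug is that burnin_cpu ends up under the 'deploy' key, so the step is detected and validated correctly and the failure only appears later, at execution time.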

Changed in ironic:
status: New → In Progress
Jay Faulkner (jason-oldos) wrote :

https://review.opendev.org/c/openstack/ironic/+/922024 addresses this and is currently stuck in the gate.

Changed in ironic:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.opendev.org/c/openstack/ironic/+/922024
Committed: https://opendev.org/openstack/ironic/commit/4f924f2d648712acabcbdfeaa0733ec140e582ca
Submitter: "Zuul (22348)"
Branch: master

commit 4f924f2d648712acabcbdfeaa0733ec140e582ca
Author: Przemyslaw Szczerbik <email address hidden>
Date: Wed May 29 03:25:10 2024 -0700

    Fix execution of node servicing steps exposed by IPA's HardwareManager

    Implement execute_service_step() in AgentBaseMixin that will
    asynchronously execute service step on the agent. Without it, Ironic
    will try to find <step_name> attribute on the object that implements
    interface specified by the servicing step.

    Example:

    Step: [{"interface": "deploy", "step": "burnin_cpu"}]
    Error: AttributeError: 'AgentDeploy' object has no attribute 'burnin_cpu'

    Closes-Bug: #2069430

    Change-Id: Idb1d5b50656c3765ea5c9e21b7844946ae4cfc67
    Signed-off-by: Przemyslaw Szczerbik <email address hidden>
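
The failure mode and the shape of the fix described in the commit message can be sketched as follows. This is a simplified illustration under stated assumptions, not Ironic's code: BaseInterface, AgentMixin, FakeAgentClient, UnfixedDeploy and FixedDeploy are hypothetical stand-ins, and the "service wait" return value is an assumed marker for "the step is still running on the agent".

```python
class BaseInterface:
    """Mimics the default dispatch seen in the traceback: look the step up by name."""

    def execute_service_step(self, task, step):
        # Raises AttributeError when the step only exists on the agent side.
        return getattr(self, step["step"])(task)


class FakeAgentClient:
    """Hypothetical client that records commands instead of calling a real agent."""

    def __init__(self):
        self.sent = []

    def execute_service_step(self, step, node):
        self.sent.append((node, step["step"]))


class AgentMixin:
    """Sketch of the fix: forward the step to the agent instead of using getattr."""

    def __init__(self, client):
        self._client = client

    def execute_service_step(self, task, step):
        self._client.execute_service_step(step, task)
        return "service wait"  # assumed marker: step continues asynchronously


class UnfixedDeploy(BaseInterface):
    pass


class FixedDeploy(AgentMixin, BaseInterface):
    pass


step = {"interface": "deploy", "step": "burnin_cpu"}

# Without the override, dispatch falls through to getattr and fails.
try:
    UnfixedDeploy().execute_service_step("node-1", step)
    error = None
except AttributeError as exc:
    error = str(exc)

# With the override, the step is handed to the (fake) agent client.
client = FakeAgentClient()
result = FixedDeploy(client).execute_service_step("node-1", step)
```

Because the mixin precedes the base class in the method resolution order, its execute_service_step() is the one that runs, so the name lookup on the interface object never happens for agent-provided steps.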

Changed in ironic:
status: In Progress → Fix Released