A timed out cleaning cannot be retried successfully
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ironic | Fix Released | High | Julia Kreger | | |
Bug Description
A user of the manual cleaning process can become stuck in an infinite cleaning loop of sorts if IPA silently fails or an external influence causes the cleaning process to time out.
Presently, when cleaning times out, the error handler for cleaning is not called, which leaves clean_step preserved. If a user attempts to retry cleaning, the presence of the clean_step entry leaves them stuck in a cleaning loop in which they cannot refresh the steps they want performed. This means that if a user-submitted manual cleaning JSON document is part of the cause, the operator must manually clean up the clean_step and driver_
Note: stable/mitaka links are used below. As of the filing of this bug, the same behavior is present in the master branch, and it was reproducible in a test environment with manual cleaning on both master branch and stable/mitaka packages.
Sequence of events:
The _check_
https:/
The result is that node.clean_step is not purged, along with node.driver_
Upon re-invoking cleaning, the agent driver (via https:/
The node powers up.
The agent driver then heartbeats. If node.clean_step is not empty, the heartbeat results in continue_cleaning being called at https:/
which checks whether any commands are present. After a timeout there most likely are none, so the continue_cleaning call returns before taking any additional action at https:/
Essentially, no further action takes place except heartbeat operations. If node.clean_step was empty, self._refresh_
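The sequence above can be illustrated with a toy model. This is only a sketch of the described behavior, not Ironic's code: the `Node` class, `on_timeout`, `heartbeat`, and the returned labels are all hypothetical names invented for illustration. It assumes that running the cleaning error handler would clear clean_step, as the description implies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a node; holds only the fields discussed above."""
    clean_step: dict = field(default_factory=dict)
    provision_state: str = "manageable"


def on_timeout(node, call_error_handler):
    """Fail the node on timeout, optionally running the error handler."""
    node.provision_state = "clean failed"
    if call_error_handler:
        # Assumption: the cleaning error handler purges stale step state.
        node.clean_step = {}


def heartbeat(node, agent_commands):
    """Return a label for what a heartbeat triggers in this toy model."""
    if not node.clean_step:
        return "refresh steps and start cleaning"
    if not agent_commands:
        # Stale clean_step plus a freshly booted agent with no commands:
        # the continue-cleaning path returns early, so nothing progresses.
        return "no action"
    return "continue cleaning"


# Timeout without the error handler: clean_step survives, so the retry stalls.
stuck = Node(clean_step={"interface": "raid", "step": "create_configuration"})
on_timeout(stuck, call_error_handler=False)
print(heartbeat(stuck, agent_commands=[]))

# Timeout with the error handler: clean_step is cleared, so the retry can start.
clean = Node(clean_step={"interface": "raid", "step": "create_configuration"})
on_timeout(clean, call_error_handler=True)
print(heartbeat(clean, agent_commands=[]))
```

In this model the only difference between the two runs is whether clean_step is cleared when the timeout fires, which matches the stuck state reported in this bug.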
Steps to reproduce:
1) Initiate manual cleaning, such as a RAID configuration process.
2) Once the agent has booted and started the process, manually power off the node or kill the IPA agent before the RAID step has completed.
3) Allow timeout to fail the node.
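For step 1, a manual cleaning request could be built roughly as follows. This is a sketch under assumptions: it assumes the Bare Metal API's provision state endpoint with a "clean" target and a clean_steps list, and it uses create_configuration on the raid interface as an example step. The node name node-1 and the helper manual_clean_request are hypothetical. The code only builds the request and does not send it.

```python
import json


def manual_clean_request(node_ident, clean_steps):
    """Build the path and JSON body for a manual cleaning request.

    Assumption: manual cleaning is requested by setting the provision
    state target to "clean" and passing the list of steps to run.
    """
    path = f"/v1/nodes/{node_ident}/states/provision"
    body = {"target": "clean", "clean_steps": clean_steps}
    return path, json.dumps(body)


# Hypothetical node name; the step shown is an example RAID step.
path, body = manual_clean_request(
    "node-1",
    [{"interface": "raid", "step": "create_configuration"}],
)
print("PUT", path)
print(body)
```

If a document like this contributes to the timeout, the retry would need to submit a corrected clean_steps list, which is exactly what the stale clean_step prevents.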
Possible fix:
In the _check_
Changed in ironic: | |
status: | New → Confirmed |
importance: | Undecided → High |
Changed in ironic: | |
assignee: | nobody → Julia Kreger (juliaashleykreger) |
AFAICT, this affects all forms of cleaning, so changed the title.