[RFE] Allow specifying a maximum time allowed per clean step
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ironic | Triaged | Wishlist | Unassigned |
Bug Description
Currently, there's no way to automatically abort a clean step that's taking longer than expected.
This has a number of problems:
* Nodes can get stuck on a particular clean step for long periods of time if they never make it to the "CLEANWAIT" state (see this bug: https:/
* There's no way to automatically detect non-fatal problems with a node. For example, if a disk takes N hours to erase but we only expect it to take N-3 hours, the clean step may still succeed, yet the disk may have degraded performance that is unacceptable for a provisioned node.
* The clean_callback_
One potential way to do this is to let hardware manager creators specify a "timeout" field when defining a clean step, similar to how the "abortable" field was added. If a clean step does not finish successfully within that time, the node will be placed in CLEANFAIL.
As described above, an optional "timeout" field will be added to each clean step specification, giving the number of seconds after which the clean step times out. For example, this would be the clean step definition for erase_disks if a timeout of 2400 seconds were added:
```
{
}
```
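As a rough sketch of what such a definition might look like: the dictionary below assumes the step layout commonly used by hardware managers ("step", "priority", "interface", "abortable"), with the proposed "timeout" key added. The priority value and the `get_step_timeout` helper are hypothetical and only illustrate how the optional field could be read.

```python
# Hypothetical clean step definition. "timeout" is the field proposed in
# this RFE, not an existing one; the other values are illustrative.
erase_disks_step = {
    "step": "erase_disks",
    "priority": 10,        # assumed value, for illustration only
    "interface": "deploy",
    "abortable": True,
    "timeout": 2400,       # seconds allowed for this clean step
}


def get_step_timeout(step):
    """Return the step's timeout in seconds, or None if none is set.

    Hypothetical helper: a missing "timeout" key means no timeout
    is enforced for that step.
    """
    timeout = step.get("timeout")
    if timeout is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
        raise ValueError("timeout must be a positive integer number of seconds")
    return timeout
```

Keeping the field optional means existing step definitions without a "timeout" key would continue to behave as they do today.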
The conductor will keep track of this timeout and place the node in a CLEANFAIL provision state if the timeout is reached.
If no timeout is specified, no timeout will be enforced.
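The conductor-side behaviour could be sketched roughly as follows. This is not Ironic code: the node is a plain dictionary, and `CLEANFAIL`, `clean_step_started_at`, `step_timed_out`, and `check_clean_step` are hypothetical names chosen for illustration. The sketch assumes the conductor records when the current step started and periodically compares the elapsed time against the step's timeout.

```python
import time

# Stand-in for the CLEANFAIL provision state; the string is illustrative.
CLEANFAIL = "clean failed"


def step_timed_out(started_at, timeout, now=None):
    """Return True if a step started at `started_at` has exceeded `timeout`.

    A timeout of None means no timeout is enforced.
    """
    if timeout is None:
        return False
    if now is None:
        now = time.monotonic()
    return now - started_at >= timeout


def check_clean_step(node, now=None):
    """Move the node to CLEANFAIL if its current clean step timed out.

    `node` is a plain dict standing in for a node object (an assumption
    made for this sketch). Returns the resulting provision state.
    """
    step = node["clean_step"]
    if step_timed_out(node["clean_step_started_at"], step.get("timeout"), now):
        node["provision_state"] = CLEANFAIL
        node["last_error"] = (
            "Clean step %s exceeded its timeout of %s seconds"
            % (step["step"], step["timeout"])
        )
    return node["provision_state"]
```

Passing `now` explicitly makes the check easy to exercise without waiting; a periodic task would simply call it with the default.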
Another option is having the agent itself keep track of the timeout. This would allow easier differentiation between cases where the ramdisk doesn't boot and an actual clean step timeout.
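A minimal sketch of the agent-side alternative, assuming the agent runs each clean step as a callable and can wait on it with a deadline. `run_step_with_timeout` and `CleanStepTimeout` are hypothetical names, not part of any existing agent API.

```python
import concurrent.futures
import time


class CleanStepTimeout(Exception):
    """Raised when a clean step does not finish within its timeout."""


def run_step_with_timeout(step_fn, timeout=None):
    """Run `step_fn` in a worker thread and wait at most `timeout` seconds.

    A timeout of None waits indefinitely, matching "no timeout enforced".
    Note that the worker thread is not interrupted on timeout; a real
    agent would also need a way to stop the underlying operation.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(step_fn)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise CleanStepTimeout(
            "clean step did not finish within %s seconds" % timeout
        ) from None
    finally:
        # Do not block here waiting for a step that may still be running.
        executor.shutdown(wait=False)
```

Because the exception is raised inside the agent, it can only occur once the ramdisk has booted, which is what would let a step timeout be told apart from a boot failure.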
Changed in ironic: | |
assignee: | nobody → Mario Villaplana (mario-villaplana-j) |
Changed in ironic: | |
status: | New → Confirmed |
importance: | Undecided → Wishlist |
description: | updated |
tags: | added: rfe-approved |
Changed in ironic: | |
status: | In Progress → Triaged |
assignee: | Mario Villaplana (mario-villaplana-j) → nobody |
Mario,
If you can be more specific about what you're proposing (such as the exact field, and the exact actions to take on a node when it happens) I think this could be done without a spec. Can you add these specifics to the description?
Thanks,
Jay