Default Waiting Policy for power commands for power retries might lead to incorrectly determining failures
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Graham Binns |
1.7 | Fix Released | Critical | Graham Binns |
Bug Description
The default waiting policy for power command retries is too short and can cause machines that are in the process of being powered on/off to be marked as failed.
Currently, the default waiting policy is:
(1, 1, 1, 1, 1, 3, 5)
In some scenarios, this can cause a BMC lockup, or cause MAAS to mistake a node that is still powering down for one that failed to power down, which in turn makes the release fail.
For example, some IPMI-based BMCs do not power off the system right away. The BMC can still be processing the request, or be unresponsive, for a few seconds (for example, 10-15 seconds).
So when the power off request comes in, the machine does not power off right away. Sometimes it takes longer than the total wait time to actually power off.
In this case, the machine is being powered off, but MAAS thinks it failed because MAAS waits 13 seconds at most, whereas the BMC can take 15 seconds to report that the node is actually off.
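To illustrate the failure mode, here is a minimal sketch of a retry loop that waits, then checks the power state once per entry in the waiting policy. The function name `power_off_confirmed` and the `bmc_delay` parameter are hypothetical and only model the behaviour described above; the real code in src/provisioningserver/rpc/power.py may be structured differently.

```python
def power_off_confirmed(waiting_policy, bmc_delay):
    """Return True if a simulated BMC reports 'off' within the policy.

    Hypothetical model: after each wait the power state is queried once,
    and the BMC only reports 'off' once `bmc_delay` seconds have elapsed.
    """
    elapsed = 0
    for wait in waiting_policy:
        elapsed += wait  # simulated sleep; no real waiting happens here
        if elapsed >= bmc_delay:
            return True  # the BMC finally reports the node as off
    return False  # retries exhausted; the node is treated as failed


current_policy = (1, 1, 1, 1, 1, 3, 5)

# A BMC that needs 15 seconds outlasts the 13 seconds of total waiting.
print(power_off_confirmed(current_policy, bmc_delay=15))  # False
# A BMC that needs 10 seconds is caught by the last check, at 13 seconds.
print(power_off_confirmed(current_policy, bmc_delay=10))  # True
```

Under this model the node really does power off in the 15-second case, but the last check happens at 13 seconds, so the operation is reported as a failure.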
I'd suggest we increase the waiting time to something like:
(2, 2, 2, 2, 2, 4, 6)
This would give the waiting policy a total of 20 seconds before we decide whether the node failed to power on/off.
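The totals quoted here can be checked with a few lines of arithmetic. The variable names are illustrative only, and the 15-second figure is the example BMC delay from the description, not a measured value.

```python
current_policy = (1, 1, 1, 1, 1, 3, 5)
proposed_policy = (2, 2, 2, 2, 2, 4, 6)

# Total time spent waiting across all retries, in seconds.
current_total = sum(current_policy)
proposed_total = sum(proposed_policy)
print(current_total, proposed_total)  # 13 20

# Example delay from the description: a BMC taking 15 seconds to report "off".
example_bmc_delay = 15
print(current_total >= example_bmc_delay)   # False
print(proposed_total >= example_bmc_delay)  # True
```

The proposed policy leaves 5 seconds of headroom over the 15-second example, while the current one falls 2 seconds short.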
Related branches
- Christian Reis (community): Approve
- Newell Jensen (community): Approve
- Diff: 24 lines (+2/-2), 2 files modified
  src/provisioningserver/rpc/power.py (+1/-1)
  src/provisioningserver/rpc/tests/test_power.py (+1/-1)
- Christian Reis (community): Approve
- Diff: 24 lines (+2/-2), 2 files modified
  src/provisioningserver/rpc/power.py (+1/-1)
  src/provisioningserver/rpc/tests/test_power.py (+1/-1)
- Andres Rodriguez (community): Approve
- Diff: 21 lines (+12/-1), 1 file modified
  debian/changelog (+12/-1)
Changed in maas:
importance: Undecided → Critical
Changed in maas:
status: New → Triaged
tags: added: trivial
Changed in maas:
assignee: nobody → Andres Rodriguez (andreserl)
milestone: none → next
Changed in maas:
assignee: Andres Rodriguez (andreserl) → Graham Binns (gmb)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: next → none
Changed in maas:
status: Fix Committed → Fix Released
Is this suitable for 1.7?