IPMI power template performs very minimal error checking which can lead to silent failures
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | High | Raphaël Badin |
Bug Description
When powering on, the IPMI power template performs two steps:
1) Sets the boot device to PXE
2) Issues the power on/power cycle command
Step 1 has only minimal error checking: the script treats it as a failure only if "password invalid" appears in the response, but many other error messages are possible. For any other error, the template continues on to step 2.
This can cause a system to boot straight to disk instead of booting from PXE, which can lead to failed commissioning and deployments.
I've seen behavior that matches this problem on a few nodes in OIL - failed deployments due to booting from disk instead of PXE.
The same problem applies to step 2 - the only error caught is "invalid password".
It should be easy enough to fix this: check $? after issuing the command to set PXE boot, and fail if it is not 0.
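A minimal sketch of the `$?` check suggested above. The names `run_step` and `power_on_sequence` are hypothetical and are not taken from the actual ipmi.template; the two steps are passed in as commands or shell functions so the sketch makes no assumption about the exact BMC command lines.

```shell
# Sketch only: run_step and power_on_sequence are illustrative names,
# not the contents of the real ipmi.template.

# Run a command; if it exits non-zero, report which step failed and
# return that exit status to the caller.
run_step() {
    step_name="$1"
    shift
    step_status=0
    "$@" || step_status=$?
    if [ "$step_status" -ne 0 ]; then
        echo "IPMI step '${step_name}' failed with exit status ${step_status}" >&2
    fi
    return "$step_status"
}

# Usage: power_on_sequence SET_PXE_CMD POWER_ON_CMD
# Each argument is the name of a command or shell function (placeholder).
# The power command only runs if setting the boot device succeeded.
power_on_sequence() {
    run_step "set boot device to PXE" "$1" || return $?
    run_step "power on" "$2" || return $?
}
```

With this shape, any non-zero exit from step 1 stops the sequence before the power command is issued, instead of only the "password invalid" case.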
Related branches
- Blake Rouse (community): Approve
- Ricardo Bánffy (community): Approve
- Diff: 23 lines (+7/-0), 1 file modified: etc/maas/templates/power/ipmi.template (+7/-0)
summary: |
- IPMI power template can silently fail to enable PXE boot
+ IPMI power template can silently fail
description: | updated |
Changed in maas: | |
status: | New → Triaged |
importance: | Undecided → Critical |
milestone: | none → 1.8.0 |
description: | updated |
summary: |
- IPMI power template can silently fail
+ IPMI power template performs very minimal error checking which can lead to silent failures
Changed in maas: | |
assignee: | nobody → ubuntudotcom1 (ubuntudotcom1) |
Changed in maas: | |
assignee: | ubuntudotcom1 (ubuntudotcom1) → nobody |
Changed in maas: | |
importance: | Critical → High |
status: | New → Triaged |
Changed in maas: | |
assignee: | nobody → Raphaël Badin (rvb) |
status: | Triaged → In Progress |
Changed in maas: | |
status: | In Progress → Fix Committed |
Changed in maas: | |
status: | Triaged → Fix Committed |
Changed in maas: | |
status: | Fix Committed → Fix Released |
So the complexity I see with this is that even if the script fails to set PXE for booting, that doesn't necessarily mean that the machine was not set to PXE. For example, the BMC may not allow the user to set the boot device to PXE, even though the machine always PXE boots by default.
If we error out when it fails to tell the machine to PXE, but the machine PXE boots by default, then we would be marking the machine as failed when it shouldn't be.
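One way to picture this trade-off is a policy switch that decides whether a failed "set PXE" step is fatal. This is only a sketch: `set_pxe_with_policy` and its `warn`/`fail` values are hypothetical and do not exist in MAAS.

```shell
# Sketch only: set_pxe_with_policy and the "warn"/"fail" policy values
# are hypothetical names, not part of MAAS.
# Usage: set_pxe_with_policy POLICY COMMAND [ARGS...]
set_pxe_with_policy() {
    pxe_policy="$1"
    shift
    pxe_status=0
    "$@" || pxe_status=$?
    if [ "$pxe_status" -eq 0 ]; then
        return 0
    fi
    if [ "$pxe_policy" = "warn" ]; then
        # The machine may still PXE boot by default, so only log the failure.
        echo "warning: could not set boot device to PXE (exit ${pxe_status}); continuing" >&2
        return 0
    fi
    echo "error: could not set boot device to PXE (exit ${pxe_status})" >&2
    return "$pxe_status"
}
```

Under `warn`, a BMC that refuses the boot-device change but PXE boots by default would not be marked as failed; under `fail`, the error surfaces immediately.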
That being said, the description above is not entirely accurate. When using IPMI, MAAS would not *only* tell the machine to PXE boot; it would also apply the following configuration:
Section Chassis_Power_Conf
Power_Restore_Policy Off_State_AC_Apply
Boot_Flags_Persistent No
EndSection
Section Chassis_Boot_Flags
Boot_Device PXE
The real issue here is that the command would fail to commit the above config because of Chassis_Power_Conf, which in turn prevented Chassis_Boot_Flags from being committed. As a result, the Boot_Device setting was never applied, and machines booted from disk (unless they had been manually set to PXE).
However, in the latest iteration we have removed the Chassis_Power_Conf section so that it no longer causes this failure:
https://code.launchpad.net/~andreserl/maas/disable_chassis_power_conf
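To illustrate the failure mode described above (one rejected section blocking the rest), here is a sketch of committing each section on its own. This is not what the linked branch does, since that branch removes the Chassis_Power_Conf section; `commit_sections` and the commit command passed to it are hypothetical.

```shell
# Sketch only: commit_sections and the commit command passed to it are
# hypothetical; the real template's commit invocation is not shown here.
# Usage: commit_sections COMMIT_CMD SECTION_FILE...
# Commits each file separately so one rejected section does not block
# the others. Returns non-zero if any section failed; the failed files
# are left in $failed_sections for the caller to inspect.
commit_sections() {
    commit_cmd="$1"
    shift
    failed_sections=""
    for section_file in "$@"; do
        if ! "$commit_cmd" "$section_file"; then
            echo "commit failed for ${section_file}" >&2
            failed_sections="${failed_sections} ${section_file}"
        fi
    done
    [ -z "$failed_sections" ]
}
```

The caller can then decide whether a failure in the power-policy section matters, while the boot-device section is still applied.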
@Jason,
Can you please confirm that you are still experiencing this issue? Thanks.