Comment 4 for bug 1454810

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1454810] Re: IPMI power template performs very minimal error checking which can lead to silent failures

I think the right approach here is to add error checking first. If we then
hit BMCs that make ipmitool report an error when there really isn't one, we
can look at why and add workarounds for those BMCs specifically. None of the
IPMI BMCs I've tested return errors when configured to PXE boot.
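
A rough sketch of what that error checking could look like, assuming the
template shells out to ipmitool for both steps. This is not the actual MAAS
template: the IPMITOOL variable and the run_checked and power_on_via_pxe
functions are hypothetical names used only for illustration. The ipmitool
subcommands themselves ("chassis bootdev pxe", "chassis power on") are the
standard ones.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual MAAS IPMI power template.
# IPMITOOL, run_checked and power_on_via_pxe are illustrative names.
IPMITOOL="${IPMITOOL:-ipmitool}"

# Run ipmitool with the given arguments. On a non-zero exit status,
# print the captured output to stderr and return that status.
run_checked() {
    output=$("$IPMITOOL" "$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "ipmitool failed with exit status $status: $output" >&2
        return "$status"
    fi
}

# Arguments are the connection options (for example -H, -U, -P).
# Step 1 sets the boot device to PXE; step 2 powers the node on.
# Step 2 is skipped if step 1 fails, so the node is not booted from disk.
power_on_via_pxe() {
    run_checked "$@" chassis bootdev pxe || return 1
    run_checked "$@" chassis power on || return 1
}
```

A caller would then do something like
power_on_via_pxe -I lanplus -H "$bmc_host" -U "$bmc_user" -P "$bmc_pass" || exit 1
where the three variables are placeholders. Any BMC-specific workaround could
be added inside run_checked later without loosening the check for everyone.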

On Fri, May 15, 2015 at 8:25 AM, Jason Hobbs <email address hidden>
wrote:

> I don't know for sure if I've ever seen it - I've only seen behavior
> that could possibly match it.
>
> Have we actually seen BMCs that PXE boot but return an error when we ask
> them to PXE boot? If not, it seems like a poor excuse for not doing
> error checking.
>
> Also, the same issue applies with the power on command itself - it's not
> error checked except for "incorrect password".
>
> ** Changed in: maas
> Status: Incomplete => New
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1454810
>
> Title:
> IPMI power template performs very minimal error checking which can
> lead to silent failures
>
> Status in MAAS:
> New
>
> Bug description:
> When powering on, the IPMI power template performs two steps:
>
> 1) Sets the boot device to PXE
> 2) Issues the power on/power cycle command
>
> Step 1 has only minimal error checking: the script treats it as a
> failure only if "password invalid" is in the response, but there
> are many other possible error messages. For all other errors, the
> template continues on to step 2.
>
> This can cause a system to boot straight to disk instead of booting
> from PXE, which can lead to failed commissioning and deployments.
>
> I've seen behavior that matches this problem on a few nodes in OIL -
> failed deployments due to booting from disk instead of PXE.
>
> The same problem applies to step 2 - the only error caught is "invalid
> password".
>
> It should be easy enough to fix this - just check $? after issuing the
> command to PXE boot and if it's not 0 then fail.