If a power script fails, there is no UI feedback

Bug #1012954 reported by Julian Edwards
This bug affects 6 people
Affects: MAAS
Status: Fix Released
Importance: High
Assigned to: Raphaël Badin

Bug Description

This is currently by design, so this bug is a reminder to fix it.

Changed in maas:
status: New → Triaged
importance: Undecided → Low
Huw Wilkins (huwshimi)
tags: added: notifications ui
Changed in maas:
importance: Low → High
Raphaël Badin (rvb)
tags: added: robustness
David Britton (dpb)
tags: added: landscape
tags: added: cloud-installer
David Britton (dpb) wrote :

Two failure modes we hit while demoing at Intel:

a) The script fails (wrong credentials, for instance).

b) The power command hits a timeout contacting the BMC.

Both of these show up in a log file (which is hard to visually parse, but useful). Could I suggest a couple of things?

1) some kind of heartbeat mechanism where we ping all the BMCs every 5 minutes or something (ipmipower -s)?

2) Check after you issue the ipmipower --on or --off that the system actually turned on or off?
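
Suggestion 2 could be sketched roughly as follows. This is a minimal illustration, not MAAS code: the helper names (`query_power_state`, `power_and_verify`), the retry counts, and the injectable `run` and `sleep` parameters are all hypothetical, and the parsing assumes `ipmipower --stat` prints a line of the form `<host>: on` or `<host>: off`. Anything else (a credentials error, a timeout) is treated as "unknown", which covers failure modes a) and b) above.

```python
import subprocess
import time


def _ipmipower(host, user, password, flag, run, timeout):
    """Run one ipmipower command; return its stdout, or None on timeout."""
    try:
        result = run(
            ["ipmipower", "-h", host, "-u", user, "-p", password, flag],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Failure mode b): the BMC did not answer in time.
        return None
    return result.stdout


def query_power_state(host, user, password, run=subprocess.run, timeout=10):
    """Return 'on', 'off', or 'unknown' for a single BMC."""
    out = _ipmipower(host, user, password, "--stat", run, timeout)
    if out is None:
        return "unknown"
    # Assumed output shape: "<host>: on" / "<host>: off".
    state = out.strip().rpartition(":")[2].strip()
    return state if state in ("on", "off") else "unknown"


def power_and_verify(host, user, password, want, run=subprocess.run,
                     attempts=5, delay=2.0, timeout=10, sleep=time.sleep):
    """Issue --on/--off, then poll until the node reports the wanted state."""
    flag = {"on": "--on", "off": "--off"}[want]
    if _ipmipower(host, user, password, flag, run, timeout) is None:
        return False
    for _ in range(attempts):
        if query_power_state(host, user, password, run, timeout) == want:
            return True
        sleep(delay)
    # The command was sent but the state never changed: report a failure.
    return False
```

A caller would surface a `False` result to the user (for example by marking the node as failed) instead of only writing it to a log file.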

dann frazier (dannf) wrote : Re: [Bug 1012954] Re: If a power script fails, there is no UI feedback

On Thu, Jul 24, 2014 at 10:51 AM, David Britton
<email address hidden> wrote:
> Two failure modes we hit while demoing at intel:
>
> a) the script fails (wrong credentials for instance)
>
> b) The power command hits a timeout contacting the BMC.
>
> Both of these show up in a log file (which is hard to visually parse,
> but useful). Could I suggest a couple things?
>
> 1) some kind of heartbeat mechanism where we ping all the BMCs every 5
> minutes or something (ipmipower -s)?

Though I certainly +millions a method for exposing BMC failures to the
user - I've had to write external watchdogs for that when using MAAS
in the past - I'd be concerned that this method would cause a lot of
unnecessary traffic/process load at scale (imagine 5000 hung ipmipower
processes) and, depending on how this information is used, could cause
a lot of bogus alerts in cases where BMCs go temporarily MIA (e.g.
during a watchdog reset). I also would predict that we'd see a case or
two where the pinging itself causes BMC outages. This could be due to
platforms where a single system is serving as the BMC for multiple
nodes (DoS), or just because we're making a BMC run buggy code more
often than it otherwise would (although I may just be tainted from
dated experiences with buggy BMCs).

 -dann
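
One way to soften the concerns above, if a heartbeat were adopted, is to cap the number of concurrent probes and to alert only after several consecutive failures, so that a BMC that is briefly MIA does not raise a bogus alert. The sketch below is an assumption-laden illustration, not anything MAAS does: the `BmcHealth` class, its `threshold` and `max_workers` defaults, and the `probe` callable are hypothetical. The probe is expected to enforce its own timeout (for example `subprocess.run(..., timeout=...)`, which kills the child when the timeout expires) so that hung `ipmipower` processes do not accumulate.

```python
from concurrent.futures import ThreadPoolExecutor


class BmcHealth:
    """Track consecutive probe failures per BMC across polling rounds."""

    def __init__(self, probe, threshold=3, max_workers=8):
        self.probe = probe              # callable(host) -> truthy if reachable
        self.threshold = threshold      # consecutive failures before alerting
        self.max_workers = max_workers  # cap on simultaneous probes
        self.failures = {}

    def _safe_probe(self, host):
        # Any exception from the probe counts as a failed check.
        try:
            return bool(self.probe(host))
        except Exception:
            return False

    def poll(self, hosts):
        """Probe every host once; return hosts that just reached the threshold."""
        hosts = list(hosts)
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            results = list(pool.map(self._safe_probe, hosts))
        alerts = []
        for host, ok in zip(hosts, results):
            if ok:
                self.failures[host] = 0
                continue
            self.failures[host] = self.failures.get(host, 0) + 1
            # Alert once, when the threshold is first reached.
            if self.failures[host] == self.threshold:
                alerts.append(host)
        return alerts
```

This does not address the risk that polling itself destabilises a buggy or shared BMC; a long polling interval and a small `max_workers` only reduce that exposure.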

> 2) Check after you issue the ipmipower --on or --off that the system
> actually turned on or off?

Graham Binns (gmb)
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
Changed in maas:
milestone: none → 1.7.0
Julian Edwards (julian-edwards) wrote :

I think we can mark this fixed, since I've seen nodes get marked failed when the power script goes awry (and it's in the event log too).

Changed in maas:
status: In Progress → Fix Committed
assignee: Blake Rouse (blake-rouse) → nobody
Christian Reis (kiko)
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
Changed in maas:
status: Fix Committed → Fix Released