Allow tuning of IPMI wait_time on power change

Bug #1921616 reported by Victor Tapia
This bug affects 2 people
Affects   Status      Importance   Assigned to   Milestone
MAAS      Status tracked in 3.6
  3.4     Won't Fix   Medium       Unassigned
  3.5     Won't Fix   Medium       Unassigned
  3.6     Triaged     Medium       Unassigned

Bug Description

Some IPMI implementations on machines with many devices to enumerate at boot, such as Dell R6525 systems with iDRAC firmware 4.32.xx and later, can take a long time to update the power status after a power change (12+ seconds in some tests, and possibly longer). It would be great to have a way to customize the wait time per machine instead of relying on the fixed iteration of 4, 8, 16 and 32 second timeouts. That iteration can force a "--on-if-off --cycle" on a machine that is already booting, which, depending on the firmware, can leave it powered off and make the deployment fail. This feature would help deploy machines with such irregular IPMI behavior.
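To make the failure mode concrete, here is a minimal sketch of a power-on loop with a tunable wait schedule. It is not the MAAS driver code: `power_on_with_retries`, `SlowBMC` and all parameter names are hypothetical, and the only values taken from this report are the 4, 8, 16, 32 second schedule and the roughly 12 second status delay.

```python
import time
from typing import Callable, Sequence

# The wait schedule named in this report, in seconds.
DEFAULT_WAIT_TIMES = (4, 8, 16, 32)


def power_on_with_retries(
    send_power_on: Callable[[], None],
    query_state: Callable[[], str],
    wait_times: Sequence[float] = DEFAULT_WAIT_TIMES,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the BMC reports "on", False if every wait expires."""
    for wait in wait_times:
        send_power_on()  # stands in for the "--on-if-off --cycle" request
        sleep(wait)      # give the BMC time to update its power status
        if query_state() == "on":
            return True
    return False


class SlowBMC:
    """Toy BMC (an assumption for illustration) that reports "on" only
    once `delay` simulated seconds have passed."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.clock = 0.0   # simulated seconds elapsed
        self.commands = 0  # number of power-on requests received

    def send_power_on(self) -> None:
        self.commands += 1

    def sleep(self, seconds: float) -> None:
        self.clock += seconds  # advance the fake clock instead of blocking

    def query_state(self) -> str:
        return "on" if self.clock >= self.delay else "off"


# Default schedule: the 4 second check still sees "off", so a second
# power request is sent to a machine that is already booting.
default_bmc = SlowBMC(delay=12)
default_ok = power_on_with_retries(
    default_bmc.send_power_on, default_bmc.query_state, sleep=default_bmc.sleep
)

# Per-machine schedule: a single longer wait avoids the second request.
tuned_bmc = SlowBMC(delay=12)
tuned_ok = power_on_with_retries(
    tuned_bmc.send_power_on, tuned_bmc.query_state,
    wait_times=(20,), sleep=tuned_bmc.sleep,
)
```

In this toy model both runs end with the machine reported as "on", but the default schedule sends two power requests while the tuned one sends only one. On firmware that reacts badly to repeated cycle requests, that extra request is the one that matters.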

David Jericho (davidjericho) wrote :

I'm presently using a workaround that modifies the hard-coded wait_time values in the IPMI power driver; however, this isn't desirable for all the obvious reasons.

The vendor has said there's no way they can guarantee how long it takes for the power-on status to be reported, and it may vary from chassis to chassis. As systems (particularly AMD) get more densely packed, it will take longer. While the IPMI spec itself suggests it's a power-is-applied style status, Dell have interpreted it differently.

It only became an issue with later iDRAC firmware, as there appears to have been a change in the time-to-respond behaviour between 4.30 and 4.32. 4.30 used to take 10 to 15 seconds to respond to the first status query, whereas 4.32 responds much faster, hence triggering the retry loop. Why the two subsequent "--on-if-off --cycle" requests cause the system itself to power off is a mystery; we can only assume it is some kind of protection feature against flapping.

Alberto Donato (ack) wrote :

Hello,

thank you for reporting a bug in MAAS.

We're currently using Discourse (https://discourse.maas.io/) for tracking feature requests as it makes it easier to have long-running conversations.

Would you mind making a post there in the "Features" category about this request?

Also see https://maas.io/docs/request-a-feature for details.

Changed in maas:
status: New → Invalid
David Jericho (davidjericho) wrote :

This is not a feature request. This is an issue with various IPMI firmware implementations and their tendency to fail safe into an off state if a power cycle is sent too frequently.

MAAS becomes unusable with these systems, and the interim fix is to modify the wait_time values. The inability to do this via the CLI or web UI means MAAS fails to work with these systems.
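As a rough illustration of what a per-machine setting could look like, the sketch below resolves a wait schedule from a machine's power parameters and falls back to the default schedule when none is set. The `wait_times` key and the `resolve_wait_times` helper are hypothetical; nothing here reflects an existing MAAS option.

```python
from typing import Mapping, Tuple

# The wait schedule named in this report, in seconds.
DEFAULT_WAIT_TIMES: Tuple[float, ...] = (4, 8, 16, 32)


def resolve_wait_times(power_parameters: Mapping[str, object]) -> Tuple[float, ...]:
    """Return the per-machine wait schedule, or the default if none is set.

    Expects a comma-separated string such as "20,40" under the
    hypothetical "wait_times" key.
    """
    raw = power_parameters.get("wait_times")  # hypothetical key
    if not raw:
        return DEFAULT_WAIT_TIMES
    values = tuple(float(part) for part in str(raw).split(","))
    if any(value <= 0 for value in values):
        raise ValueError("wait times must be positive numbers of seconds")
    return values
```

A machine with no override keeps the existing behaviour, while a slow chassis could be given, for example, `{"wait_times": "20,40"}`.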

Alberto Donato (ack)
summary: - [Feature Request] Tunable IPMI wait_time on power change
+ Allow tuning of IPMI wait_time on power change
Changed in maas:
status: Invalid → Triaged
importance: Undecided → Medium
Greg Schwimer (gschwim) wrote :

This is affecting my implementation as well. We have been running a fix in production for some time; I have published it here:

https://code.launchpad.net/~gschwim/maas/+git/maas/+ref/increase-ipmi-wait_time

Merge approval pending.

I still need to get a working build to test in 3.2, but the local patch works great.

Greg Schwimer (gschwim)
Changed in maas:
status: Triaged → Confirmed
Changed in maas:
milestone: none → 3.4.0
status: Confirmed → Triaged
Alberto Donato (ack)
Changed in maas:
milestone: 3.4.0 → 3.4.x
Changed in maas:
milestone: 3.4.x → 3.5.x