IPMI timeout option

Bug #1521290 reported by Paolo de Rosa
This bug affects 3 people
Affects: MAAS
Status: Invalid
Importance: Undecided
Assigned to: Andres Rodriguez
Milestone: (none)

Bug Description

[Environment]
Trusty 14.04.3
MAAS 1.8.3

[Description]
Our customer has experienced timeout issues with some of the BMC software described below.
Basically, without adding "--retransmission-timeout=5000" to the "workarounds" variable in /etc/maas/templates/power/ipmi.template, MAAS is not able to start and stop the machine.

Product Name ProLiant DL380 Gen9
UUID 30393137-3436-5A43-3335-34334B505031
Server Serial Number CZ3543KPP1
Product ID 719064-B21
System ROM P89 v1.50 (07/20/2015)
System ROM Date 07/20/2015
Backup System ROM 07/20/2015
Integrated Remote Console .NET Java
License Type iLO 4 Advanced
iLO Firmware Version 2.30 Aug 19 2015
iLO Hostname ILOCZ3543KPP1.

Product Name ProLiant DL380 Gen9
UUID 30393137-3436-5A43-3335-34334B505052
Server Serial Number CZ3543KPPR
Product ID 719064-B21
System ROM P89 v1.50 (07/20/2015)
System ROM Date 07/20/2015
Backup System ROM 07/20/2015
Integrated Remote Console .NET Java
License Type iLO 4 Standard
iLO Firmware Version 2.30 Aug 19 2015
iLO Hostname ILOCZ3543KPPR.Gromit1.rvc-servers.cz

Tags: sts internal
description: updated
tags: added: sts
Edward Hope-Morley (hopem) wrote :

As requested by the MAAS folks, I have now tested the following with some older ProLiant servers (tested with G7 and G9):

sed -i '/^workarounds=/s/"$/ --retransmission-timeout=5000"/' /etc/maas/templates/power/ipmi.template

After applying this setting I see no negative impact on operation, and in cases where IPMI was previously timing out I no longer see the problem.
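
To preview what the sed expression above does before touching the real template, it can be run against a scratch copy. This is only a sketch: the sample `workarounds=` line below is a hypothetical stand-in (the real template's contents may differ), and the `grep` guard is an addition not in the original command, there so that a second run does not append the flag twice.

```shell
# Scratch file standing in for ipmi.template; its contents are hypothetical.
tmpl=$(mktemp)
printf '%s\n' 'workarounds="-W opensesspriv"' 'echo "rest of template"' > "$tmpl"

add_timeout() {
    # Only edit if the flag is not already present (guard is an assumption,
    # added so the edit is safe to repeat).
    grep -q -- '--retransmission-timeout' "$1" ||
        sed -i '/^workarounds=/s/"$/ --retransmission-timeout=5000"/' "$1"
}

add_timeout "$tmpl"
add_timeout "$tmpl"   # second run is a no-op

result=$(grep '^workarounds=' "$tmpl")
echo "$result"
rm -f "$tmpl"
```

The expression only touches lines that start with `workarounds=` and end in a double quote, so a template that builds the variable differently would be left unchanged.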

Edward Hope-Morley (hopem) wrote :

On closer inspection it would appear that I am now having commissioning failures with a ProLiant G7 server. I need to investigate further to establish why this extra parameter is causing a failure, though. FWIW I am still not seeing any issues with a G9 model.

Gavin Panella (allenap) wrote :

We could formally offer retransmission timeout as an option for power control, but that would entail changing this setting for each node individually. It would be best to figure out how to fix this bug without needing additional levers and dials.

Changed in maas:
status: New → Triaged
status: Triaged → Incomplete
Gavin Panella (allenap) wrote :

Ed, is that workaround okay for now, or does this need attention soonish?

Changed in maas:
status: Incomplete → New
status: New → Incomplete
Changed in maas:
assignee: nobody → Andres Rodriguez (andreserl)
Alex Moldovan (alexmoldovan) wrote :

It looks like since revision r4219 in lp:maas/1.9, the ipmi power driver dropped the use of /etc/maas/.../ipmi.template. Are there plans to bring back the --retransmission-timeout option or to specify it otherwise?

A Bz (beuzon-a) wrote :

I am also looking for the --retransmission-timeout option. As described here (https://askubuntu.com/questions/839475/how-to-change-the-ipmi-timeout-in-maas/912579#912579), my servers' BMCs are unresponsive for about 10 seconds upon reboot, which needlessly causes MAAS deployments to fail. Could you please bring back the possibility of changing this parameter?
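
For anyone scripting power queries outside MAAS while a BMC goes quiet like this, a generic retry wrapper is one way to ride out the gap. This is a sketch only and not how MAAS handles it: `retry` and `flaky_bmc` are hypothetical names, and `flaky_bmc` merely simulates a BMC that ignores the first two requests.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
    retry_attempts=$1 retry_delay=$2
    shift 2
    retry_n=1
    while :; do
        "$@" && return 0
        [ "$retry_n" -ge "$retry_attempts" ] && return 1
        retry_n=$((retry_n + 1))
        sleep "$retry_delay"
    done
}

# Simulated BMC: fails on the first two calls, answers on the third.
calls=0
flaky_bmc() { calls=$((calls + 1)); [ "$calls" -ge 3 ]; }

retry 5 0 flaky_bmc && echo "succeeded after $calls calls"
```

In real use the command would be something like `retry 5 3 ipmipower -h "$bmc_host" -u "$user" -p "$pass" --stat`, where the host and credentials are placeholders. This complements, rather than replaces, a longer retransmission timeout.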

tags: added: internal
Mike Kingsbury (mike.kingsbury) wrote :

I also need the --retransmission-timeout for Dell R710s. I've hacked in the timeout parameters in ipmi.py, as the IPMI becomes unresponsive as it transitions power states. Without the parameters, I also get failed deployments/commissions.

Edward Hope-Morley (hopem) wrote :

So I just hit this with MAAS 2.2.2, and setting the following in /etc/freeipmi/freeipmi.conf fixed it for me:

retransmission-timeout 5000
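
If this setting needs to be applied from a script (for example across several rack controllers), a small helper can set the key without duplicating it. This is a sketch under assumptions: `set_conf_option` is a hypothetical helper, and the demo runs against a temporary file rather than the real /etc/freeipmi/freeipmi.conf.

```shell
# set_conf_option FILE KEY VALUE
# Replaces an existing "KEY ..." line, or appends one if none exists.
set_conf_option() {
    conf_file=$1 conf_key=$2 conf_value=$3
    if grep -q -- "^[[:space:]]*$conf_key[[:space:]]" "$conf_file" 2>/dev/null; then
        sed -i "s/^[[:space:]]*$conf_key[[:space:]].*/$conf_key $conf_value/" "$conf_file"
    else
        printf '%s %s\n' "$conf_key" "$conf_value" >> "$conf_file"
    fi
}

# Demo on a scratch file with a hypothetical pre-existing value.
conf=$(mktemp)
printf '%s\n' '# sample freeipmi.conf' 'retransmission-timeout 1000' > "$conf"

set_conf_option "$conf" retransmission-timeout 5000
set_conf_option "$conf" retransmission-timeout 5000   # repeat run changes nothing

matching=$(grep -c '^retransmission-timeout 5000$' "$conf")
total=$(grep -c '^retransmission-timeout' "$conf")
cat "$conf"
rm -f "$conf"
```

Pointing the helper at the real file requires root, and keeping a backup copy first is prudent.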

Andres Rodriguez (andreserl) wrote :

Hi!

**This is an automated message**

We believe this may no longer be an issue in the latest MAAS release. Due to the original date of the bug report, we are currently marking it as Invalid. If you believe this bug report is still valid against the latest release of MAAS, or if you are still interested in this, please re-open this bug report.

Thanks

Changed in maas:
status: Incomplete → Invalid