ipmitool "timing" flags are not working as expected, causing failure to manage power of baremetal nodes

Bug #1943765 reported by Drew Freiberger
This bug affects 1 person
Affects                           Status        Importance  Assigned to  Milestone
OpenStack Ironic Conductor Charm  Fix Released  Undecided   Unassigned
ironic (Ubuntu)                   New           Undecided   Unassigned

Bug Description

In a focal-ussuri cloud environment with some packet loss between ironic-conductor and the BMC network, I'm seeing random ipmitool timeout failures.

The root issue is that running:

ipmitool -R 12 -N 5 <command>

causes ipmitool to hang for 60 seconds (12 attempts are sent even though the session is never properly established), after which the operation times out within ironic-conductor. As a result, the node ends up in the "clean failed" state when transitioning from 'manage' to 'provide'.
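As a rough check on the 60-second figure, here is a minimal sketch assuming each of the 12 attempts (-R 12) waits the full 5-second interval (-N 5) before giving up. The helper name is hypothetical, and ipmitool's exact internal timing may differ:

```python
def worst_case_block_seconds(retries: int, interval: int) -> int:
    """Hypothetical helper: upper bound on how long a single ipmitool call
    can block when no attempt gets a reply, assuming every attempt waits
    the full `interval` seconds."""
    return retries * interval


# -R 12 -N 5, as in the failing environment: one call can use the whole budget.
print(worst_case_block_seconds(12, 5))  # 60
# -R 1 -N 5: each call gives up quickly, leaving time for the caller to retry.
print(worst_case_block_seconds(1, 5))   # 5
```

If one invocation can consume all 60 seconds on its own, no time is left for a second, fresh invocation before the conductor-side timeout hits.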

Ultimately, it appears that the ussuri code linked below detects that ipmitool accepts the -R and -N flags and then, instead of retrying ipmitool within ironic, relies on ipmitool to perform all of the retries.

https://opendev.org/openstack/ironic/src/branch/stable/ussuri/ironic/drivers/modules/ipmitool.py#L538-L546

This has been addressed in the mainline code by adding an operator-configurable option, 'use_ipmitool_retries'. It either lets ipmitool perform the retries via the -R flag, or falls back to having ironic execute ipmitool multiple separate times.

https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/ipmitool.py#L494
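A minimal sketch of how the two behaviours could differ in the flags passed to ipmitool. It assumes the ironic [ipmi] options command_retry_timeout (default 60 seconds) and min_command_interval (default 5 seconds) set the retry budget, which would give the 60 // 5 = 12 retries seen above. `timing_args` is a hypothetical helper, not the ironic source:

```python
from typing import List


def timing_args(use_ipmitool_retries: bool,
                command_retry_timeout: int = 60,
                min_command_interval: int = 5) -> List[str]:
    """Hypothetical sketch of how a use_ipmitool_retries switch could shape
    the ipmitool timing flags. Defaults are assumed, not taken from ironic."""
    if use_ipmitool_retries:
        # Hand the entire retry budget to a single ipmitool invocation.
        num_tries = max(command_retry_timeout // min_command_interval, 1)
    else:
        # One attempt per invocation; the caller re-runs ipmitool instead.
        num_tries = 1
    return ["-R", str(num_tries), "-N", str(min_command_interval)]


print(timing_args(True))   # ['-R', '12', '-N', '5']
print(timing_args(False))  # ['-R', '1', '-N', '5']
```

The False case produces the same `-R 1 -N 5` flags that appear in the command captured during PPA testing in this report.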

In my environment, ironic needs to re-run ipmitool as multiple separate invocations to avoid this failure.

Can we please backport this functionality into focal-ussuri?
https://opendev.org/openstack/ironic/commit/1de3db3b16f3e0475e506e540ca5d5ed6edb4cbf

Also, please expose a charm configuration option that allows operators to set "[ipmi] use_ipmitool_retries = False".
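The requested setting is an ordinary INI entry in the [ipmi] section of ironic.conf. The sketch below parses such a fragment with Python's configparser. This is only a stand-in, since ironic itself reads its configuration through oslo.config. It assumes the option defaults to True when absent, preserving the older behaviour:

```python
import configparser

# Hypothetical ironic.conf fragment containing the requested override.
IRONIC_CONF = """
[ipmi]
use_ipmitool_retries = False
"""

parser = configparser.ConfigParser()
parser.read_string(IRONIC_CONF)
# getboolean accepts False/false/no/off/0; fallback=True is an assumed default
# used only when the option is missing from the section.
use_retries = parser.getboolean("ipmi", "use_ipmitool_retries", fallback=True)
print(use_retries)  # False
```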

Tags: sts
Drew Freiberger (afreiberger) wrote :

FYI, the commit with the option is available in Victoria and later.

Seyeong Kim (seyeongkim)
tags: added: sts
Seyeong Kim (seyeongkim) wrote :

@afreiberger

Hello, do you have a test environment?

I've created a PPA for testing, but I don't have a test environment yet.

If you do, could you please test it? I included the commit you mentioned and two more related commits.

https://launchpad.net/~seyeongkim/+archive/ubuntu/lp1943765-2

Hemanth Nakkina (hemanth-n) wrote :

I have verified the ironic-conductor ipmitool command behaviour with the PPA from comment #2 (on focal-ussuri).

With use_ipmitool_retries = False configured, ironic-conductor runs the command below repeatedly until the 60-second timeout expires.
Command: ipmitool -I lanplus -H 10.5.0.5:9999 -L ADMINISTRATOR -U test -R 1 -N 5 -f /tmp/tmpmt5292he power status

OpenStack commands used for testing:
/snap/bin/openstack baremetal node create --driver ipmi --driver-info ipmi_address=10.5.0.5:9999 --driver-info ipmi_username=test --driver-info ipmi_password=test
/snap/bin/openstack baremetal node list
/snap/bin/openstack baremetal node power on <node id>
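The test above describes ironic re-running a one-shot (-R 1) ipmitool command until the 60-second budget is spent. Below is a minimal simulation of that loop, assuming each failed lanplus attempt costs the full 5 seconds set by -N 5. `retry_until_deadline` and `FakeBmc` are hypothetical names, not ironic code:

```python
from typing import Callable, Optional


def retry_until_deadline(attempt: Callable[[], bool],
                         now: Callable[[], float],
                         retry_timeout: float = 60.0) -> bool:
    """Hypothetical sketch of ironic-side retries: re-run a one-shot
    ipmitool call until it succeeds or the deadline passes."""
    deadline = now() + retry_timeout
    while now() < deadline:
        if attempt():
            return True
    return False


class FakeBmc:
    """Simulated lossy BMC: each attempt advances a fake clock by 5 s
    (the -N 5 wait) and only attempt number `succeed_on` gets a reply."""

    def __init__(self, succeed_on: Optional[int] = None) -> None:
        self.clock = 0.0
        self.calls = 0
        self.succeed_on = succeed_on

    def now(self) -> float:
        return self.clock

    def attempt(self) -> bool:
        self.calls += 1
        self.clock += 5.0
        return self.calls == self.succeed_on


dead = FakeBmc()  # never answers
print(retry_until_deadline(dead.attempt, dead.now), dead.calls)  # False 12

flaky = FakeBmc(succeed_on=3)  # transient packet loss
print(retry_until_deadline(flaky.attempt, flaky.now), flaky.calls)  # True 3
```

With a BMC that never answers, the loop makes 12 separate invocations in the same 60 seconds that one `-R 12` invocation would consume. When the loss is transient, a later fresh invocation can still succeed.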

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ironic-conductor (master)
Changed in charm-ironic-conductor:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ironic-conductor (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ironic-conductor/+/810610
Committed: https://opendev.org/openstack/charm-ironic-conductor/commit/73a5b90d4026b5acf2cefe1f1057d078c8e923e4
Submitter: "Zuul (22348)"
Branch: master

commit 73a5b90d4026b5acf2cefe1f1057d078c8e923e4
Author: Hemanth Nakkina <email address hidden>
Date: Thu Sep 23 16:49:14 2021 +0530

    Add support for new option use-ipmitool-retries

    Add new configuration option use-ipmitool-retries to the charm.

    Closes-Bug: #1943765
    Change-Id: I2d11198d1955f3b96d27163683ac0947639d2f74

Changed in charm-ironic-conductor:
status: In Progress → Fix Committed
Felipe Reyes (freyes)
Changed in charm-ironic-conductor:
milestone: none → 22.04
Changed in charm-ironic-conductor:
status: Fix Committed → Fix Released