BMC commissioning error on HPE Gen 10 with ILO 5

Bug #2073731 reported by DUFOUR Olivier
Affects: MAAS (status tracked in 3.6)
Series  Status         Importance  Assigned to
3.3     Fix Committed  Critical    Dariusz Gadomski
3.4     Fix Released   Critical    Björn Tillenius
3.5     Fix Released   Critical    Björn Tillenius
3.6     Fix Committed  Critical    Björn Tillenius

Bug Description

When deploying HPE Gen10 Plus servers running iLO 5 firmware version 3.05, MAAS systematically fails the "30-maas-01-bmc-config" commissioning script.

The issue does not occur with version 2.98: after downgrading iLO from 3.05 back to 2.98, the commissioning script passes again.

After some debugging during commissioning, the output of 'bmc-config --checkout' differs between the two iLO versions (both outputs are attached to this report).

If the IPMI user is already defined in the MAAS interface and the BMC configuration step is skipped, MAAS can query the BMC, read the power state and boot the server without any problem.

The conclusion for now is that any HPE server running iLO 5 firmware 3.05 or later may fail to commission by default, and MAAS will be unable to create its own BMC user.

#
# Environment :
#
* Ubuntu 22.04
* MAAS 3.4.3 in HA

#
# 30-maas-01-bmc-config output (same from MAAS and from the CLI)
#
INFO: Loading IPMI kernel modules...
INFO: Checking for HP Moonshot...
INFO: Checking for Redfish...
ERROR: Redfish configuration failed. Missing SMBIOS data
INFO: Checking for IPMI...
INFO: IPMI detected!
INFO: Reading current IPMI BMC values...
ERROR: No cipher enabled!

#
# differences from bmc-config
#

# on ILO 5 2.98
# Section Rmcpplus_Conf_Privilege Comments
Section Rmcpplus_Conf_Privilege
 ## Possible values: Unused/User/Operator/Administrator/OEM_Proprietary
 Maximum_Privilege_Cipher_Suite_Id_0 Unused
 ## Possible values: Unused/User/Operator/Administrator/OEM_Proprietary
 Maximum_Privilege_Cipher_Suite_Id_1 User
 ## Possible values: Unused/User/Operator/Administrator/OEM_Proprietary
 Maximum_Privilege_Cipher_Suite_Id_2 User
 ## Possible values: Unused/User/Operator/Administrator/OEM_Proprietary
 Maximum_Privilege_Cipher_Suite_Id_3 Administrator

# on ILO 5 3.05
# Section Rmcpplus_Conf_Privilege Comments
Section Rmcpplus_Conf_Privilege
 ## Possible values: Unused/User/Operator/Administrator/OEM_Proprietary
 Maximum_Privilege_Cipher_Suite_Id_0 Unused
 ## Possible values: Unused/User/Operator/Administrator/OEM_Proprietary
 Maximum_Privilege_Cipher_Suite_Id_1 User
EndSection
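For reference, the "No cipher enabled!" error lines up with the 3.05 excerpt: none of the Maximum_Privilege_Cipher_Suite_Id_N entries is Administrator, whereas on 2.98 suite 3 is. A minimal Python sketch of that check follows. The names parse_cipher_privileges and admin_ciphers are hypothetical, and this is a simplified reading of the symptom, not the actual logic in MAAS's bmc_config.py.

```python
import re

# Matches lines such as " Maximum_Privilege_Cipher_Suite_Id_3 Administrator".
# Comment lines ("## Possible values: ...") never match.
PRIV_LINE = re.compile(r"^\s*Maximum_Privilege_Cipher_Suite_Id_(\d+)\s+(\S+)\s*$")


def parse_cipher_privileges(checkout: str) -> dict[int, str]:
    """Hypothetical helper: map cipher suite ID -> maximum privilege, read from
    the Rmcpplus_Conf_Privilege section of 'bmc-config --checkout' output."""
    privileges: dict[int, str] = {}
    in_section = False
    for line in checkout.splitlines():
        stripped = line.strip()
        if stripped == "Section Rmcpplus_Conf_Privilege":
            in_section = True
        elif stripped == "EndSection":
            in_section = False
        elif in_section:
            match = PRIV_LINE.match(line)
            if match:
                privileges[int(match.group(1))] = match.group(2)
    return privileges


def admin_ciphers(privileges: dict[int, str]) -> list[int]:
    """Cipher suite IDs allowed to reach Administrator privilege."""
    return sorted(i for i, p in privileges.items() if p == "Administrator")


# Excerpts from this report, comment lines omitted. The 2.98 excerpt has no
# EndSection line; the parser simply reads to the end of the text.
ILO_298 = """\
Section Rmcpplus_Conf_Privilege
 Maximum_Privilege_Cipher_Suite_Id_0 Unused
 Maximum_Privilege_Cipher_Suite_Id_1 User
 Maximum_Privilege_Cipher_Suite_Id_2 User
 Maximum_Privilege_Cipher_Suite_Id_3 Administrator
"""
ILO_305 = """\
Section Rmcpplus_Conf_Privilege
 Maximum_Privilege_Cipher_Suite_Id_0 Unused
 Maximum_Privilege_Cipher_Suite_Id_1 User
EndSection
"""

print(admin_ciphers(parse_cipher_privileges(ILO_298)))  # [3]
print(admin_ciphers(parse_cipher_privileges(ILO_305)))  # []
```

An empty list on 3.05 is the situation in which the script reports that no cipher is enabled.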

DUFOUR Olivier (odufourc) wrote :

Raising ~Field-critical since this impacts our deployment and any other new deployments with HPE servers.

Björn Tillenius (bjornt) wrote :

Could you please provide the output of 'ipmitool -I lanplus -H ... -U ... -P ... lan print 1'

Where '1' may need to be replaced with the correct LAN channel number.

Nobuto Murata (nobuto) wrote :

Give us ~24 hours. We don't have continuous access to the environment; we only have it during daytime in APAC time zones.

Nobuto Murata (nobuto) wrote :

> 'ipmitool -I lanplus -H ... -U ... -P ... lan print 1'

But are you sure you want us to test it with `ipmitool`? I thought MAAS mainly relies on freeipmi-tools rather than ipmitool.

https://git.launchpad.net/maas/tree/src/metadataserver/builtin_scripts/commissioning_scripts/bmc_config.py#n234

DUFOUR Olivier (odufourc) wrote :

Attached as ipmitool-ilo-2.98.txt is the full ipmitool output when querying iLO on version 2.98; excerpt:

RMCP+ Cipher Suites : 0,1,2,3
Cipher Suite Priv Max : XuuaXXXXXXXXXXX
                        : X=Cipher Suite Unused
                        : c=CALLBACK
                        : u=USER
                        : o=OPERATOR
                        : a=ADMIN
                        : O=OEM

DUFOUR Olivier (odufourc) wrote :

Attached as ipmitool-ilo-3.05.txt is the full ipmitool output when querying iLO on version 3.05; excerpt:

RMCP+ Cipher Suites : 0,1
Cipher Suite Priv Max : XuuaXXXXXXXXXXX
                        : X=Cipher Suite Unused
                        : c=CALLBACK
                        : u=USER
                        : o=OPERATOR
                        : a=ADMIN
                        : O=OEM

So the output seems to match what bmc-config is reporting.
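Decoding the two ipmitool fields side by side makes the mismatch easier to see. The sketch below assumes that the N-th letter of "Cipher Suite Priv Max" belongs to the N-th entry of "RMCP+ Cipher Suites"; that pairing is an interpretation of the output, not something confirmed in this thread, and the helper names are hypothetical. On 3.05, positions 2 and 3 still carry "u" and "a" although only suites 0 and 1 are advertised.

```python
# Letter codes taken from the ipmitool legend in the output above.
PRIV_CODES = {"X": "Unused", "c": "Callback", "u": "User",
              "o": "Operator", "a": "Administrator", "O": "OEM"}


def parse_lan_print(text: str) -> tuple[list[int], str]:
    """Hypothetical helper: extract the advertised suite IDs and the
    priv-max string from 'ipmitool ... lan print' output."""
    suites: list[int] = []
    priv_max = ""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "RMCP+ Cipher Suites":
            suites = [int(s) for s in value.split(",") if s.strip()]
        elif key == "Cipher Suite Priv Max":
            priv_max = value
        # Legend lines ("   : X=Cipher Suite Unused") have an empty key.
    return suites, priv_max


def pair_suites(suites: list[int], priv_max: str) -> tuple[dict[int, str], list[int]]:
    """Pair the N-th advertised suite with the N-th priv-max letter (assumption).

    Also returns the positions whose privilege is not Unused but which have
    no advertised suite ID to belong to.
    """
    privs = [PRIV_CODES.get(ch, "Unknown") for ch in priv_max]
    paired = dict(zip(suites, privs))
    orphans = [pos for pos, p in enumerate(privs)
               if p != "Unused" and pos >= len(suites)]
    return paired, orphans


ILO_305 = """\
RMCP+ Cipher Suites : 0,1
Cipher Suite Priv Max : XuuaXXXXXXXXXXX
                        : X=Cipher Suite Unused
"""

suites, priv_max = parse_lan_print(ILO_305)
print(pair_suites(suites, priv_max))  # ({0: 'Unused', 1: 'User'}, [2, 3])
```

With the 2.98 values ("0,1,2,3" and the same letter string) the orphan list is empty, which matches the version that commissions correctly.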

DUFOUR Olivier (odufourc) wrote :

HPE has also released iLO version 3.06, but unfortunately its IPMI behaviour is identical to 3.05.

Björn Tillenius (bjornt) wrote :

Hmm. So ipmitool output is both consistent and inconsistent with bmc-config. It's consistent in that "RMCP+ Cipher Suites" reports only 0 and 1, but "Cipher Suite Priv Max" indicates that max user priv is "user" for cipher 2 and "admin" for cipher 3.

Could you manually try different values for -C using ipmitool to see which cipher suites actually work?

DUFOUR Olivier (odufourc) wrote :

Hello

I ran ipmitool with various values for -C (a basic loop testing cipher suites 0 to 17).

The results are the following :
* on iLO 2.98: only cipher suite 3 works at Administrator level
* on iLO 3.05 and 3.06: cipher suites 3 and 17 both work at Administrator level

According to the iLO 5 release notes, HPE added and enabled cipher suite 17 starting with 3.05:
> "Added cypher #17 support for HPE iLO IPMI over LAN interface."
https://support.hpe.com/connect/s/softwaredetails?language=en_US&collectionId=MTX-2dc80c4ae4b943fa&tab=revisionHistory

Is it possible that ipmitool or bmc-config on Jammy cannot detect cipher suite 17?
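A sweep like the one described above could be scripted as follows. This is a hypothetical sketch, not the loop that was actually used: the BMC address and credentials are placeholders, "chassis power status" is just a cheap command that needs a working session, and -L ADMINISTRATOR requests an Administrator session so that a success means the suite works at that level. The password is passed through the IPMI_PASSWORD environment variable with -E instead of -P, so it does not show up in the process list.

```python
import os
import subprocess
from collections.abc import Iterable


def ipmitool_cmd(host: str, user: str, cipher: int) -> list[str]:
    """Build one ipmitool probe for a single cipher suite ID."""
    return [
        "ipmitool", "-I", "lanplus", "-H", host, "-U", user,
        "-E",                   # read the password from IPMI_PASSWORD
        "-C", str(cipher),      # cipher suite ID to try
        "-L", "ADMINISTRATOR",  # ask for an Administrator session
        "chassis", "power", "status",
    ]


def probe_ciphers(host: str, user: str, password: str,
                  ciphers: Iterable[int] = range(18),
                  timeout: float = 30.0) -> list[int]:
    """Return the cipher suite IDs for which ipmitool exited successfully."""
    env = {**os.environ, "IPMI_PASSWORD": password}
    working = []
    for cipher in ciphers:
        try:
            result = subprocess.run(
                ipmitool_cmd(host, user, cipher),
                env=env, capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # treat a hung session as "not working"
        if result.returncode == 0:
            working.append(cipher)
    return working
```

Calling probe_ciphers("192.0.2.10", "admin", "secret") against a BMC that behaves as reported above for 3.05 and 3.06 would be expected to return [3, 17]. Running the same sweep with freeipmi's tools would help tell a firmware problem from a client-side one.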

Jack Lloyd-Walters (lloydwaltersj) wrote :

We may as well check what freeipmi-tools says about the enabled suites too, and whether that agrees with the above.

Nobuto Murata (nobuto) wrote :

I'm shamelessly asking this question without taking a deeper look: what's the next step here? I'm conscious of the time zone difference and of not having continuous access to the machine in question; because of those two factors, every iteration takes ~24 hours. Is there any way we can shorten the path to resolution, e.g. by having the engineering team think several moves ahead and give us multiple commands at once, instead of one command per day?

Jack Lloyd-Walters (lloydwaltersj) wrote :

That's fair.
We wanted to try a modified bmc_config script with the sys.exit() call after the enabled-cipher check commented out, and then repeat the steps from the original bug report.
(relevant code change https://code.launchpad.net/~lloydwaltersj/maas/+git/maas/+merge/470100)

We're theorizing about whether this is a deeper issue in the IPMI BMCs or something we introduced in recent changes, so we want to skip the abort here and see what happens.

Jerzy Husakowski (jhusakowski) wrote :

Nobuto, would you be able to test whether the modification to the commissioning script that Jack shared makes any difference? Is it possible to get access to one of the affected systems so we can poke around with ipmitool and freeipmi-tools?

Nobuto Murata (nobuto) wrote :

On Fri, 26 Jul 2024 at 0:55, Jerzy Husakowski <email address hidden> wrote:
> Nobuto, would you be able to test if the modification to the
> commissioning script that Jack shared is making any difference?

It's a snap-based deployment, so if I'm not mistaken it's tricky to make this
kind of change on the fly without rebuilding the snap.

> Is it
> possible to get access to one of the affected systems so we can poke
> around with ipmitools and freeipmi-tools?

If you can assign somebody in EMEA, you can join a session with a customer
in your morning time. Please reach out to Olivier for the link, etc.

Nobuto Murata (nobuto) wrote :

One correction, today's engineer is Yoshi instead of Olivier.

It would be good to spend at least 30 minutes checking whether the proposed commissioning script can be tested without redeploying MAAS, and poking at the iLO 5 with ipmitool or freeipmi-tools.

If that does not work out, I think we need to involve the certification team so that the MAAS team can get access to actual hardware, such as an HPE ProLiant DL380 Gen10 Plus or any other HPE server with iLO 5.

Björn Tillenius (bjornt) wrote :

I'll reach out to Yoshi.

Since you're running MAAS as a snap, we can build a snap branch for you to test out.

It looks like there's a problem in either the iLO firmware or freeipmi-tools. The 'bmc-config --checkout' output tells us that cipher suite 3 is not enabled, yet you confirmed that it is available.

We can work around it in MAAS, but it would be good to reach out to HPE about this as well.
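One possible shape for such a workaround, in the spirit of skipping the abort discussed earlier, is to fall back to a known-good cipher suite instead of exiting when no Administrator suite is reported. The sketch below is hypothetical and is not the fix that landed in MAAS; choose_cipher and FALLBACK_CIPHER are illustrative names, and the fallback to suite 3 is an assumption based on it working at Administrator level on every iLO version tested in this report.

```python
import sys

# Assumption: suite 3 worked at Administrator level on iLO 2.98, 3.05 and
# 3.06 in this report, so it is used as the fallback.
FALLBACK_CIPHER = 3


def choose_cipher(admin_ciphers: list[int], fallback: int = FALLBACK_CIPHER) -> int:
    """Pick a cipher suite ID instead of aborting when none is reported."""
    if admin_ciphers:
        # Simple heuristic: prefer the highest-numbered reported suite.
        return max(admin_ciphers)
    print(
        "WARNING: no cipher suite reported with Administrator privilege; "
        f"falling back to cipher suite {fallback}",
        file=sys.stderr,
    )
    return fallback


print(choose_cipher([3]))  # 3 (iLO 2.98 case)
print(choose_cipher([]))   # 3 (iLO 3.05 case, previously "No cipher enabled!")
```

Warning rather than exiting keeps commissioning going; whether the chosen suite actually works is then confirmed by the first IPMI session MAAS opens.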

Moula BADJI (moulab1) wrote :

Hi guys.
I have the same problem with ASUS hardware.
Thanks.

Moula BADJI (moulab1) wrote :

Hi @bjornt,
Despite the change you made yesterday for version 3.6, the bug is still there. Commissioning still does not work on my Asus servers.
Maybe I need to open another bug?
Thanks.

Nobuto Murata (nobuto) wrote :

@moulab1, your issue is different from this report. Your node is somehow being commissioned with Noble, which is incompatible with the commissioning script.

Moula BADJI (moulab1) wrote :

Hello @nobuto.
Thank you for your reply.
I have tried to commission with Jammy several times and the problem is always the same: I have it with both Jammy and Noble.
I'll open another bug.

Thanks.

Moula BADJI (moulab1) wrote :

@nobuto.
I tested commissioning with Jammy again and the problem is worse.
With Jammy everything fails, not just the 20-maas-02-dhcp-unconfigured-ifaces script.
Thanks.

Björn Tillenius (bjornt) wrote :

The fix has landed in master and been backported to the 3.5 and 3.4 branches. It should be included in the next 3.4 and 3.5 releases.

In the meantime, the fix is also available in a snap branch for 3.4: 3.4/stable/hotfix-bug-2073731

If you run 3.4/stable, you can install it using:
  sudo snap refresh --channel=3.4/stable/hotfix-bug-2073731 maas

It's currently MAAS 3.4.3 + the fix for this bug.

We don't have a hotfix branch for 3.5.

Nobuto Murata (nobuto) wrote :

Thanks a bunch. That unblocks our work.

In the meantime, would you like access to the certification hardware to track down, in the longer term, why the cipher list detection failed?

Bartosz Woronicz (mastier1) wrote :

I found the same happening on Supermicro hardware. About to test the new MAAS now.

{
  "hardware_info": {
    "system_vendor": "Supermicro",
    "system_product": "AS -2015CS-TNR",
    "system_family": "SMC H13",
    "system_version": "Unknown",
    "system_sku": "091715D9",
    "system_serial": "<CENSORED>",
    "cpu_model": "AMD EPYC 9554P 64-Core Processor",
    "mainboard_vendor": "Supermicro",
    "mainboard_product": "H13SSW",
    "mainboard_serial": "<CENSORED>",
    "mainboard_version": "1.01A",
    "mainboard_firmware_vendor": "American Megatrends International, LLC.",
    "mainboard_firmware_date": "05/28/2024",
    "mainboard_firmware_version": "1.9",
    "chassis_vendor": "Supermicro",
    "chassis_type": "Other",
    "chassis_serial": "<CENSORED>",
    "chassis_version": "Unknown"
  },
  "power_type": "ipmi",
  "system_id": "4dyw37",
  "resource_uri": "/MAAS/api/2.0/machines/4dyw37/"
}

Bartosz Woronicz (mastier1) wrote :

So I no longer observe issues with bmc-config on 3.4.4.

However, during commissioning the script got stuck on 4 out of 57 machines.
I created a separate bug report:
https://bugs.launchpad.net/maas/+bug/2077564

Jerzy Husakowski (jhusakowski) wrote :

Unsubscribing ~field-critical, as this is no longer a blocking issue and the fix is out for all supported releases.
