MAAS wrongly reports cannot reach PowerEdge R630 BMC during machine create and fails to commission node later on

Bug #1827015 reported by Pedro Guimarães
Affects   Status    Importance   Assigned to   Milestone
MAAS      Invalid   Undecided    Unassigned

Bug Description

Running on a cluster of 9 PowerEdge R630 with Intel Xeon E5 2640 v3
BIOS versions tested: 1.5.4; 2.5.5; 2.9.1
iDRAC version: 2.61.60.60
MAAS: 2.5.2 (7523-ge4ecbd54d-0ubuntu1~18.04.1)

Since the start of the deployment, I've been seeing power-related failures during node deployment due to "BMC Busy". I decided to upgrade the firmware (iDRAC and BIOS) to try to remove those deployment errors.

However, after moving to other BIOS versions, nodes could no longer pass the commissioning phase.

When adding nodes with the "machines create" command, rather than just booting them, on BIOS 2.5.5 and 2.9.1, I get:

2019-04-29-20:56:00 root ERROR Command failed: machines create hostname=HOSTNAME power_type=ipmi architecture=amd64/generic mac_addresses=BA:DB:AD:BA:DB:AD power_parameters_power_address=IP_ADDRESS zone=AZ2 power_parameters_power_user=USERNAME power_parameters_power_pass=PASSWORD power_parameters_power_driver=LAN_2_0 power_parameters_power_boot_type=efi
2019-04-29-20:56:00 root ERROR No rack controllers can access the BMC of node: HOSTNAME

I reproduced the ipmi_power and ipmi_chassis_config commands as described in drivers/power/ipmi.py in the provisioningserver package:

ipmi-chassis-config -W opensesspriv --driver-type LAN_2_0 -h HOST_IP -u USERNAME -p PASSWORD --commit --filename ~/maas-config-test
~/maas-config-test:
Section Chassis_Boot_Flags
        Boot_Flags_Persistent No
        BIOS_Boot_Type efi
        Boot_Device PXE
EndSection
ipmipower -W opensesspriv -D LAN_2_0 -h HOST_IP -u USERNAME -p PASSWORD --on
Both commands run OK -> freeipmi tools are working on this node
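
A single successful manual run does not rule out intermittent failures, so it can help to repeat the same commands from a script and record the exit code and duration of each run. The following is only a minimal sketch: the function names are hypothetical, the flags are copied from the two commands above, and `--stat` (ipmipower's power-status query) is swapped in for `--on` so that repeated probes do not change the node's power state.

```python
import subprocess
import time


def chassis_config_cmd(host, user, password, config_path, driver="LAN_2_0"):
    """Build the ipmi-chassis-config command quoted above as an argv list."""
    return [
        "ipmi-chassis-config", "-W", "opensesspriv", "--driver-type", driver,
        "-h", host, "-u", user, "-p", password,
        "--commit", "--filename", config_path,
    ]


def power_cmd(host, user, password, action="--stat", driver="LAN_2_0"):
    """Build the ipmipower command quoted above; defaults to a status query."""
    return [
        "ipmipower", "-W", "opensesspriv", "-D", driver,
        "-h", host, "-u", user, "-p", password, action,
    ]


def run_timed(cmd, timeout=60):
    """Run cmd and return (exit_code, seconds, stderr).

    exit_code is None if the process did not finish within `timeout` seconds.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start, "timed out"
    return proc.returncode, time.monotonic() - start, proc.stderr.strip()
```

Calling `run_timed(power_cmd(HOST_IP, USERNAME, PASSWORD))` in a loop would show whether some runs fail or take much longer than others. Because no shell is involved, a path such as `~/maas-config-test` needs `os.path.expanduser` before being passed to `chassis_config_cmd`.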

The node is correctly added to MAAS as "New", but the web UI shows a warning saying no rack controller can reach it. Eventually, the warning disappears.

The problem is that commissioning on those nodes always times out.
When I check the cloud-init configs, I can see that the node actually received tasks for the enlistment process rather than for commissioning.

Steps to reproduce:
1) Set up your PowerEdge R630 as described above.
2) Add the node using the machines create command, as described above.
3) Run commissioning while watching the node's serial output; the machine will have "maas-enlistment-node" as its hostname.
4) The node will power off, as the enlistment process dictates, but MAAS will keep waiting for commissioning until it times out.

Tags: cpe-onsite
summary: - MAAS wrongly reports cannot reach PowerEdge R630 BMC although freeipmi
- commands work fine
+ MAAS wrongly reports cannot reach PowerEdge R630 BMC during machine
+ create
description: updated
summary: MAAS wrongly reports cannot reach PowerEdge R630 BMC during machine
- create
+ create and fails to commission node later on
tags: added: cpe-onsite
Lee Trager (ltrager) wrote :

Is this still affecting you with the latest MAAS and PowerEdge R630 firmware? If so, what is the return code from the IPMI command?

Can you also attach the MAAS logs?

Changed in maas:
status: New → Incomplete
Pedro Guimarães (pguimaraes) wrote :

Hi, I believe we can mark this issue as invalid. I eventually found that the network had some latency and that some of the freeipmi commands were timing out. That explains why the failures were random.

Increasing the timeouts in /etc/freeipmi/freeipmi.conf resolved this issue.
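
Before editing the config file, one way to look for a workable value is to pass the timeout on the command line and repeat the command several times per candidate. This is a sketch under assumptions: it relies on the `--session-timeout` and `--retransmission-timeout` options (in milliseconds) that the FreeIPMI tools accept as far as I know, the helper names are hypothetical, and the candidate values are arbitrary examples, not the settings actually used here.

```python
import subprocess


def with_timeouts(cmd, session_ms, retransmission_ms=None):
    """Return a copy of cmd with FreeIPMI timeout flags inserted after the tool name."""
    extra = [f"--session-timeout={session_ms}"]
    if retransmission_ms is not None:
        extra.append(f"--retransmission-timeout={retransmission_ms}")
    return [cmd[0]] + extra + list(cmd[1:])


def first_working_timeout(cmd, candidates_ms, attempts=5, runner=None):
    """Return the smallest candidate for which every attempt exits 0, else None.

    `runner` takes an argv list and returns an exit code; it defaults to
    actually running the command, and can be replaced for dry runs.
    """
    if runner is None:
        def runner(argv):
            return subprocess.run(argv, capture_output=True).returncode
    for session_ms in sorted(candidates_ms):
        argv = with_timeouts(cmd, session_ms)
        if all(runner(argv) == 0 for _ in range(attempts)):
            return session_ms
    return None
```

For example, `first_working_timeout(["ipmipower", "-D", "LAN_2_0", "-h", HOST_IP, "-u", USERNAME, "-p", PASSWORD, "--stat"], [20000, 40000, 60000])` would report the lowest of those example values that succeeded on every attempt; that value could then be made permanent in /etc/freeipmi/freeipmi.conf.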

Changed in maas:
status: Incomplete → Invalid