ipmi-config connection timeout
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
maas (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
I'm trying to commission a machine and I'm getting the following error in the maas.log:
Apr 9 00:24:39 maas maas.drivers.
Apr 9 00:25:45 maas maas.drivers.
Apr 9 00:27:13 maas maas.power: [error] Error changing power state (on) of node: server1 (8w7s4m)
Apr 9 00:27:14 maas maas.node: [info] server1: Status transition from COMMISSIONING to FAILED_
Apr 9 00:27:14 maas maas.node: [error] server1: Marking node failed: Power on for the node failed: Could not contact node's BMC: Connection timed out while performing power action. Check BMC configuration and connectivity and try again.
MaaS Versions:
# dpkg -l | grep maas
ii maas 2.3.0-6434-
ii maas-cli 2.3.0-6434-
ii maas-common 2.3.0-6434-
ii maas-dhcp 2.3.0-6434-
ii maas-dns 2.3.0-6434-
ii maas-proxy 2.3.0-6434-
ii maas-rack-
ii maas-region-api 2.3.0-6434-
ii maas-region-
ii python3-django-maas 2.3.0-6434-
ii python3-maas-client 2.3.0-6434-
ii python3-
Ubuntu version:
# lsb_release -rd
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Expected result:
MaaS is able to correctly reboot/start/stop the machine; it just can't change the boot order to PXE. I would expect that if it can't change the boot order after 1-2 attempts, MaaS would not fail the commissioning but would still allow another 5-10 minutes before failing, to let the server PXE boot, since the server is most likely already configured to PXE boot as the first option. It would also be nice to have an option that stops MaaS from trying to set the boot option, so we don't waste time resetting the server when we know it can't set it.
What happens:
Right now, MaaS turns the server on, gets the first ipmi-config timeout, resets the server, gets the second timeout, resets the server, and then immediately fails the commissioning.
tags: added: ipmi maas
Changed in maas (Ubuntu):
status: New → Incomplete
Hi Douglas,
If MAAS cannot set the boot order to PXE, it doesn't matter: that step is best-effort and does not cause a failure.
The failure, however, is that your BMC is either reporting that it failed to power on, failing to report that it powered on correctly, or not reporting anything at all. MAAS does this:
1. MAAS attempts to set the machine to PXE boot. If that fails, it doesn't matter; MAAS continues.
2. MAAS tells the machine to power on and checks whether it powered on. If it didn't, MAAS re-attempts the power-on and checks again.
MAAS repeats step 2 at intervals of (1, 2, 2, 4, 6, 8, 12) seconds, unless the power tool reports a fatal error.
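One way to read the two steps and the retry schedule above is as a small loop. The POSIX shell sketch below is illustrative only and is not MAAS's actual code: `set_boot_pxe`, `power_on_node` and `node_is_on` are hypothetical placeholder hooks, and treating a non-zero exit from `power_on_node` as "the tool reported a fatal error" is an assumption made for the sketch.

```sh
# Illustrative sketch of the power-on sequence described above, NOT MAAS code.
# set_boot_pxe, power_on_node and node_is_on are hypothetical placeholder hooks.

# Waits, in seconds, between a power-on attempt and the following state check.
RETRY_DELAYS="1 2 2 4 6 8 12"

set_boot_pxe()  { :; }          # placeholder: ask the BMC to PXE boot next
power_on_node() { :; }          # placeholder: non-zero exit = fatal tool error
node_is_on()    { return 1; }   # placeholder: exit 0 if the BMC reports "on"

# Returns 0 once the node reports "on", 2 on a fatal error,
# and 1 if it is still not on after the last delay.
power_on_with_retries() {
    # Step 1: best effort; a failure here is deliberately ignored.
    set_boot_pxe || true
    # Step 2: power on, wait, check; repeat once per delay in the schedule.
    for delay in $RETRY_DELAYS; do
        if ! power_on_node; then
            return 2            # fatal error reported: stop retrying
        fi
        sleep "$delay"
        if node_is_on; then
            return 0
        fi
    done
    return 1
}
```

Under this reading the waits add up to 1 + 2 + 2 + 4 + 6 + 8 + 12 = 35 seconds (plus the time the power commands themselves take), which is far shorter than the 5-10 minutes suggested in the report.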
That said, in the few cases where we have seen the issue you are reporting, it has been due to a buggy BMC that locks itself up, so I would recommend upgrading the BMC firmware.
Once you have done that, could you also provide the output of:
ipmipower -W opensesspriv -D LAN_2_0 -u <user> -p <password> -h <host> --cycle --on-if-off
ipmipower -W opensesspriv -D LAN_2_0 -u <user> -p <password> -h <host> --stat
Could you also run those commands repeatedly, ideally from a script, to see whether your BMC locks up or reports a failure?
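A minimal POSIX shell sketch for that repeated test is below. It reuses the same `ipmipower` flags as the two commands above, logs timestamped output for each iteration, and counts iterations in which `--stat` does not report the node as on. The function names and the `BMC_HOST`, `BMC_USER`, `BMC_PASS` and `SETTLE_SECONDS` variables are hypothetical, and the check assumes `ipmipower` prints status as `<host>: on`; compare the pattern against your own output before relying on it.

```sh
# Hypothetical helper: run ipmipower with the flags suggested above.
bmc_power() {
    ipmipower -W opensesspriv -D LAN_2_0 \
        -u "$BMC_USER" -p "$BMC_PASS" -h "$BMC_HOST" "$@"
}

# Succeeds if the given --stat output reports the node as on.
# Assumes the "<host>: on" output format.
reports_on() {
    case "$1" in
        *": on"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Power-cycle the node N times (default 10), logging each BMC response and
# counting iterations where the node is not reported as on afterwards.
# Exits non-zero if any iteration failed.
stress_bmc() {
    : "${BMC_HOST:?set BMC_HOST}" "${BMC_USER:?set BMC_USER}" "${BMC_PASS:?set BMC_PASS}"
    iterations=${1:-10}
    failures=0
    i=1
    while [ "$i" -le "$iterations" ]; do
        echo "=== iteration $i at $(date '+%F %T') ==="
        bmc_power --cycle --on-if-off 2>&1
        # Give the BMC time to settle before querying the power state.
        sleep "${SETTLE_SECONDS:-10}"
        status=$(bmc_power --stat 2>&1)
        echo "$status"
        if ! reports_on "$status"; then
            failures=$((failures + 1))
            echo "!!! BMC did not report the node as on"
        fi
        i=$((i + 1))
    done
    echo "failures: $failures of $iterations"
    [ "$failures" -eq 0 ]
}
```

For example, save it as `bmc-stress.sh` (a hypothetical file name), set the three `BMC_*` variables, source it with `. ./bmc-stress.sh`, and run `stress_bmc 20 | tee bmc-stress.log` so the full log can be attached to the bug.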