iLO sometimes refuses power requests with Enclosure Busy

Bug #1725204 reported by Dmitry Tantsur
This bug affects 1 person
Affects        Status        Importance  Assigned to    Milestone
proliantutils  Fix Released  Undecided   Nisha Agarwal

Bug Description

Initially reported as https://bugzilla.redhat.com/show_bug.cgi?id=1460915.

On the ironic side, the failure shows up as:
2017-06-13 03:45:13.269 15932 ERROR ironic.drivers.modules.ilo.power [req-91b6eae5-152b-45e3-bc2b-e2166d915e34 - - - - -] iLO failed to change state to power on within 12 sec
2017-06-13 03:45:13.278 15932 ERROR ironic.drivers.modules.agent_base_vendor [req-91b6eae5-152b-45e3-bc2b-e2166d915e34 - - - - -] Error rebooting node cb8110ee-aea9-4acd-942b-3bcd5423c4c8 after deploy. Error: Failed to set node power state to power on.

In the iLO logs I see 2 types of errors:

>SHOW SYSLOG SERVER ALL
Retrieving Server syslog(s) ...

Server 1 Syslog:
<EVENT_LOG DESCRIPTION="Integrated Management Log">
 <EVENT
  SEVERITY="Informational"
  CLASS="Maintenance"
  LAST_UPDATE="05/19/2017 05:35"
  INITIAL_UPDATE="05/19/2017 05:35"
  COUNT="1"
  DESCRIPTION="IML Cleared (iLO 4 user:OSPctl)"
  EVENT_CLASS="0x0021"
  EVENT_CODE="0x0001"
 />
 <EVENT
  SEVERITY="Informational"
  CLASS="Rack Infrastructure"
  LAST_UPDATE="[NOT SET] "
  INITIAL_UPDATE="[NOT SET] "
  COUNT="1"
  DESCRIPTION="Server Blade Enclosure Power Request Denied: Enclosure Busy (Enclosure Serial Number 2M271200DS, Slot 1)"
  EVENT_CLASS="0x0022"
  EVENT_CODE="0x001a"
 />
 <EVENT
  SEVERITY="Informational"
  CLASS="Rack Infrastructure"
  LAST_UPDATE="09/19/2017 00:06"
  INITIAL_UPDATE="09/19/2017 00:06"
  COUNT="1"
  DESCRIPTION="Server Blade Enclosure Power Request Denied: Enclosure Busy (Enclosure Serial Number 2M271200DS, Slot 1)"
  EVENT_CLASS="0x0022"
  EVENT_CODE="0x001a"
 />
</EVENT_LOG>

and

Server 13 Syslog:
<EVENT_LOG DESCRIPTION="Integrated Management Log">
 <EVENT
  SEVERITY="Informational"
  CLASS="Maintenance"
  LAST_UPDATE="05/19/2017 05:35"
  INITIAL_UPDATE="05/19/2017 05:35"
  COUNT="1"
  DESCRIPTION="IML Cleared (iLO 4 user:OSPctl)"
  EVENT_CLASS="0x0021"
  EVENT_CODE="0x0001"
 />
 <EVENT
  SEVERITY="Informational"
  CLASS="Rack Infrastructure"
  LAST_UPDATE="[NOT SET] "
  INITIAL_UPDATE="[NOT SET] "
  COUNT="1"
  DESCRIPTION="Server Blade Enclosure Power Request Denied: Enclosure Busy (Enclosure Serial Number 2M271200DS, Slot 13)"
  EVENT_CLASS="0x0022"
  EVENT_CODE="0x001a"
 />
 <EVENT
  SEVERITY="Critical"
  CLASS="Rack Infrastructure"
  LAST_UPDATE="[NOT SET] "
  INITIAL_UPDATE="[NOT SET] "
  COUNT="1"
  DESCRIPTION="Server Blade Enclosure Inadequate Power To Power On: Not Enough Power (Enclosure Serial Number 2M271200DS, Slot 13)"
  EVENT_CLASS="0x0022"
  EVENT_CODE="0x0001"
 />
</EVENT_LOG>
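
To tell the two error types apart when scanning many blades, the log can be filtered by class and event code. The following is a minimal sketch, assuming the syslog output has already been captured as a string; the function name `rack_power_events` is hypothetical, and the sample is copied from the Server 13 log above.

```python
import xml.etree.ElementTree as ET

# Sample copied from the Server 13 syslog in this report.
SAMPLE = """<EVENT_LOG DESCRIPTION="Integrated Management Log">
 <EVENT
  SEVERITY="Informational"
  CLASS="Rack Infrastructure"
  LAST_UPDATE="[NOT SET] "
  INITIAL_UPDATE="[NOT SET] "
  COUNT="1"
  DESCRIPTION="Server Blade Enclosure Power Request Denied: Enclosure Busy (Enclosure Serial Number 2M271200DS, Slot 13)"
  EVENT_CLASS="0x0022"
  EVENT_CODE="0x001a"
 />
 <EVENT
  SEVERITY="Critical"
  CLASS="Rack Infrastructure"
  LAST_UPDATE="[NOT SET] "
  INITIAL_UPDATE="[NOT SET] "
  COUNT="1"
  DESCRIPTION="Server Blade Enclosure Inadequate Power To Power On: Not Enough Power (Enclosure Serial Number 2M271200DS, Slot 13)"
  EVENT_CLASS="0x0022"
  EVENT_CODE="0x0001"
 />
</EVENT_LOG>"""


def rack_power_events(xml_text):
    """Return (severity, event_code, description) for each
    "Rack Infrastructure" event in an EVENT_LOG document."""
    root = ET.fromstring(xml_text)
    return [
        (e.get("SEVERITY"), e.get("EVENT_CODE"), e.get("DESCRIPTION"))
        for e in root.iter("EVENT")
        if e.get("CLASS") == "Rack Infrastructure"
    ]


if __name__ == "__main__":
    for severity, code, description in rack_power_events(SAMPLE):
        print(severity, code, description)
```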

Note that we have seen a somewhat similar problem with Dell machines and ended up adding a loop to dracclient that waits for the controller to become ready: https://github.com/openstack/python-dracclient/commit/deed7d7c1c79d1d9d7fcf83fc1bf726c93fd5ef4

Changed in proliantutils:
assignee: nobody → Nisha Agarwal (agarwalnisha1980)
status: New → Incomplete
OpenStack Infra (hudson-openstack) wrote : Fix proposed to proliantutils (master)

Fix proposed to branch: master
Review: https://review.openstack.org/519967

Changed in proliantutils:
status: Incomplete → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to proliantutils (master)

Reviewed: https://review.openstack.org/519967
Committed: https://git.openstack.org/cgit/openstack/proliantutils/commit/?id=fd3dbea575b543300de787bed603f0105716dca0
Submitter: Zuul
Branch: master

commit fd3dbea575b543300de787bed603f0105716dca0
Author: Nisha Agarwal <email address hidden>
Date: Tue Nov 14 21:45:32 2017 -0800

    Retry power on operation for Blade servers

    This patch retries power on operation if it fails
    to power on in definite time. This is needed
    only for Blade servers. The fix is done for
    Gen9 Proliant servers.

    Change-Id: I088b8cf9bbde057c5536cad6368fce7d8d608f41
    Closes-bug: 1725204
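
The retry idea described in the commit message can be sketched roughly as follows. This is not the proliantutils code: `power_on_with_retry`, `press_power`, `get_power_state`, and `PowerOnTimeout` are hypothetical names, the retry count is an assumption, and the 12-second wait simply mirrors the timeout seen in the ironic log above.

```python
import time


class PowerOnTimeout(Exception):
    """Raised when the server never reports ON after all attempts."""


def power_on_with_retry(press_power, get_power_state, retries=3,
                        wait_per_try=12, poll_interval=1, sleep=time.sleep):
    """Request power on, poll for the ON state, and re-issue the request
    if the server has not powered on within wait_per_try seconds.

    press_power and get_power_state are caller-supplied callables
    (hypothetical stand-ins for the real iLO operations).
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, retries + 1):
        press_power()
        waited = 0
        while waited < wait_per_try:
            if get_power_state() == "ON":
                return attempt
            sleep(poll_interval)
            waited += poll_interval
    raise PowerOnTimeout(
        "server did not power on after %d attempts" % retries)
```

Re-issuing the request, rather than only waiting longer, matters here because a request denied with Enclosure Busy is dropped by the enclosure, so polling alone would not recover from it.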

Changed in proliantutils:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to proliantutils (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/657004

OpenStack Infra (hudson-openstack) wrote : Fix merged to proliantutils (stable/pike)

Reviewed: https://review.opendev.org/657004
Committed: https://git.openstack.org/cgit/x/proliantutils/commit/?id=42668e4be45cc9e2e0ffe1976896e83b5a5e3f5f
Submitter: Zuul
Branch: stable/pike

commit 42668e4be45cc9e2e0ffe1976896e83b5a5e3f5f
Author: Nisha Agarwal <email address hidden>
Date: Tue Nov 14 21:45:32 2017 -0800

    Retry power on operation for Blade servers

    This patch retries power on operation if it fails
    to power on in definite time. This is needed
    only for Blade servers. The fix is done for
    Gen9 Proliant servers.

    Change-Id: I088b8cf9bbde057c5536cad6368fce7d8d608f41
    Closes-bug: 1725204
    (cherry picked from commit fd3dbea575b543300de787bed603f0105716dca0)

tags: added: in-stable-pike