WS-Man operations fail when iDRAC is not ready

Bug #1697558 reported by Richard G. Pioso
Affects            Status        Importance  Assigned to       Milestone
python-dracclient  Fix Released  Undecided   Richard G. Pioso

Bug Description

Web Services Management (WS-Management and WS-Man) requests/commands can fail when issued to an Integrated Dell Remote Access Controller (iDRAC) whose Lifecycle Controller remote service is not "ready". Specifically, this applies to the WS-Man Enumerate and Invoke operations.

This bug has been observed in the following workflows:

1. Manual out-of-band (OOB) RAID cleaning. This is documented in bug [0].
2. Set power off. Test builds run by the Dell ironic third-party continuous integration (CI) have encountered this failure. This bug, and its more general nature, were discovered while investigating it.

A Dell technical white paper [1], "Lifecycle Controller Integration -- Best Practices Guide", states that for Lifecycle Controller firmware 1.5.0 and later, "The Lifecycle Controller remote service must be in a 'ready' state before running any other WSMAN commands." That applies to all of the workflows documented by that paper, except the following:

Section #  Section Heading
---------  ---------------
4.28       FCoE Boot Using Broadcom (12th Generation and Later Version of Servers Only)
28.2       Inventory of System Info View
31.4       Check Version of Lifecycle Controller (LC)
32.7.1     Connect and Attach Network ISO Image as a USB CD-ROM Device Through RFS USB End Point
32.7.2     Disconnect and Detach ISO Image Exposed Through RFS USB End Point
32.7.3     Get RF ISO Image Connection Status

That document describes how to determine the readiness of the Lifecycle Controller remote service. A commit [2] to the openstack/python-dracclient project implements that readiness check.

[0] https://bugs.launchpad.net/ironic/+bug/1691808
[1] http://en.community.dell.com/techcenter/extras/m/white_papers/20442332
[2] https://github.com/openstack/python-dracclient/commit/39253bb272a7d4cfcc161c19708b8c6949a21240
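
The "wait until ready" best practice described above could be sketched roughly as follows. This is a minimal illustration and not the python-dracclient implementation: the readiness probe is passed in as a plain callable, and the names wait_until_ready, NotReadyError, retries and retry_delay are assumptions made for this example.

```python
import time


class NotReadyError(Exception):
    """Hypothetical error: the service never reported 'ready'."""


def wait_until_ready(is_ready, retries=3, retry_delay=1.0, sleep=time.sleep):
    """Poll is_ready() until it returns True or the probes are used up.

    Returns the number of probes that were needed.
    """
    for attempt in range(retries):
        if is_ready():
            return attempt + 1
        if attempt < retries - 1:
            sleep(retry_delay)  # pause before the next probe
    raise NotReadyError("not ready after %d probes" % retries)


# Stand-in probe: reports "not ready" twice, then "ready".
_answers = iter([False, False, True])
probes_needed = wait_until_ready(lambda: next(_answers), retries=5,
                                 sleep=lambda seconds: None)
```

Injecting the sleep function keeps the sketch testable without real delays. A real caller would probe the Lifecycle Controller remote service in the way the white paper describes.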

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-dracclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/479443

Changed in python-dracclient:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/479444

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/479445

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/482371

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-dracclient (master)

Reviewed: https://review.openstack.org/479443
Committed: https://git.openstack.org/cgit/openstack/python-dracclient/commit/?id=10df06f6c34f9428d6ba856704c2946040161483
Submitter: Jenkins
Branch: master

commit 10df06f6c34f9428d6ba856704c2946040161483
Author: Richard Pioso <email address hidden>
Date: Tue Jun 20 15:54:34 2017 -0400

    Refactor iDRAC is ready functionality

    Web Services Management (WS-Management and WS-Man) requests/commands can
    fail or return invalid results when issued to an Integrated Dell Remote
    Access Controller (iDRAC) whose Lifecycle Controller remote service is
    not "ready". Specifically, that applies to the WS-Man Enumerate and
    Invoke operations.

    A Dell technical white paper [0], "Lifecycle Controller Integration --
    Best Practices Guide", states that for Lifecycle Controller firmware
    1.5.0 and later "The Lifecycle Controller remote service must be in a
    'ready' state before running any other WSMAN commands." That applies to
    almost all of the workflows and use cases documented by that paper and
    supported by this project, openstack/python-dracclient. That document
    describes how to determine the readiness of the Lifecycle Controller
    remote service. A project commit [1] implements that.

    This refactors that patch in preparation for changing the internal
    implementation of the project's APIs so that they follow that best
    practice. The implementation of is_idrac_ready() and
    wait_until_idrac_is_ready() have been relocated further down the call
    stack, to the iDRAC specialization of the WS-Man client defined by class
    dracclient.client.WSManClient. Those methods continue to be available
    through the API provided by class dracclient.client.Client.

    No changes have been made to this project's APIs nor to any functional
    behavior.

    [0]
    http://en.community.dell.com/techcenter/extras/m/white_papers/20442332
    [1]
    https://github.com/openstack/python-dracclient/commit/39253bb272a7d4cfcc161c19708b8c6949a21240

    Change-Id: I87996bbca129995f6c84848ebdb0c33cfedeea53
    Partial-Bug: #1697558
    Related-Bug: #1691808

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-dracclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/487234

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-dracclient (master)

Reviewed: https://review.openstack.org/482371
Committed: https://git.openstack.org/cgit/openstack/python-dracclient/commit/?id=c75969dd8dee4924374f53983b12881dfb705282
Submitter: Jenkins
Branch: master

commit c75969dd8dee4924374f53983b12881dfb705282
Author: Richard Pioso <email address hidden>
Date: Fri Jul 7 19:28:02 2017 -0400

    Parameterize iDRAC is ready retries at class level

    Web Services Management (WS-Management and WS-Man) requests/commands can
    fail or return invalid results when issued to an integrated Dell Remote
    Access Controller (iDRAC) whose Lifecycle Controller remote service is
    not "ready". Specifically, that applies to the WS-Man Enumerate and
    Invoke operations.

    A Dell technical white paper [0], "Lifecycle Controller Integration --
    Best Practices Guide", states that for Lifecycle Controller firmware
    1.5.0 and later "The Lifecycle Controller remote service must be in a
    'ready' state before running any other WSMAN commands." That applies to
    almost all of the workflows and use cases documented by that paper and
    supported by this project, openstack/python-dracclient. That document
    describes how to determine the readiness of the Lifecycle Controller
    remote service.

    This patch parameterizes the iDRAC is ready retry behavior at the class
    level. That makes it possible for consumers of this project, such as
    project openstack/ironic, to configure it library API-wide.

    Additionally, this patch improves the names of the parameters to class
    __init__() methods that control the retry behavior on SSL errors, so
    that they are not confused with those added by this patch. Finally, it
    defines constants for the default values of the retry behavior on SSL
    errors and iDRAC is ready retry parameters, and utilizes those new
    constants.

    [0]
    http://en.community.dell.com/techcenter/extras/m/white_papers/20442332

    Change-Id: Ie866466a8ddf587a24c6d25ab903ec7b24022ffd
    Partial-Bug: #1697558
    Related-Bug: #1691272
    Related-Bug: #1691808
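
As a rough illustration of the class-level parameterization this commit describes, the sketch below stores the readiness retry policy once, at construction time, with named constants as defaults. The class name, parameter names and default values are assumptions made for the example and are not taken from python-dracclient.

```python
# Hypothetical defaults; the project's real constants may differ.
DEFAULT_READY_RETRIES = 48
DEFAULT_READY_RETRY_DELAY_SEC = 10


class ReadyAwareClient(object):
    """Illustrative client that records its readiness retry policy once."""

    def __init__(self, host,
                 ready_retries=DEFAULT_READY_RETRIES,
                 ready_retry_delay=DEFAULT_READY_RETRY_DELAY_SEC):
        self.host = host
        self.ready_retries = ready_retries
        self.ready_retry_delay = ready_retry_delay

    def retry_policy(self):
        """Return the (retries, delay) pair every operation would share."""
        return (self.ready_retries, self.ready_retry_delay)


# A consumer relying on the defaults, and one overriding them API-wide.
default_client = ReadyAwareClient("idrac.example.com")
patient_client = ReadyAwareClient("idrac.example.com",
                                  ready_retries=96, ready_retry_delay=5)
```

Configuring the policy on the object means a consumer sets it once instead of passing it to every call.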

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/479444
Committed: https://git.openstack.org/cgit/openstack/python-dracclient/commit/?id=deed7d7c1c79d1d9d7fcf83fc1bf726c93fd5ef4
Submitter: Jenkins
Branch: master

commit deed7d7c1c79d1d9d7fcf83fc1bf726c93fd5ef4
Author: Richard Pioso <email address hidden>
Date: Wed Jun 28 17:18:32 2017 -0400

    Invoke operations can wait until iDRAC is ready

    Web Services Management (WS-Management and WS-Man) Invoke operations can
    fail when issued to an integrated Dell Remote Access Controller (iDRAC)
    whose Lifecycle Controller remote service is not "ready".

    A Dell technical white paper [0], "Lifecycle Controller Integration --
    Best Practices Guide", states that for Lifecycle Controller firmware
    1.5.0 and later "The Lifecycle Controller remote service must be in a
    'ready' state before running any other WSMAN commands." That applies to
    almost all of the workflows and use cases documented by that paper and
    supported by this project, openstack/python-dracclient. A notable
    exception is the dracclient.client.WSManClient.is_idrac_ready() method,
    which is a chicken and egg situation.

    This patch adds a new parameter to the
    dracclient.client.WSManClient.invoke() method that indicates whether or
    not it should wait until the iDRAC is ready to accept commands before
    issuing the Invoke command. When it is true, that method waits until the
    iDRAC is ready before issuing the command. Since almost all Invoke
    operations require the iDRAC to be ready, the new parameter's default
    value is 'True'.

    [0]
    http://en.community.dell.com/techcenter/extras/m/white_papers/20442332

    Change-Id: Ib5b9fb2a954579be40f47304c70157ab1f00d39c
    Partial-Bug: #1697558
    Related-Bug: #1691808
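
The behavior described in this commit, including the chicken and egg exception for the readiness probe, might look roughly like the sketch below. It is an assumption-laden illustration: the class, the wait_for_ready parameter name and the method names "StatusProbe" and "ExampleMethod" are hypothetical placeholders, and no network traffic is involved.

```python
class SketchInvokeClient(object):
    """Illustrative client whose invoke() waits for readiness by default."""

    def __init__(self):
        self.log = []  # records the order of operations

    def _send(self, method):
        # Stand-in for the transport; always answers immediately.
        return "ready" if method == "StatusProbe" else "ok"

    def is_ready(self):
        # The probe is itself an Invoke, so it must skip the wait.
        return self.invoke("StatusProbe", wait_for_ready=False) == "ready"

    def wait_until_ready(self):
        self.log.append("wait")
        if not self.is_ready():
            raise RuntimeError("service is not ready")

    def invoke(self, method, wait_for_ready=True):
        if wait_for_ready:
            self.wait_until_ready()
        self.log.append("invoke:" + method)
        return self._send(method)


client = SketchInvokeClient()
result = client.invoke("ExampleMethod")
```

Without the wait_for_ready=False call inside is_ready(), the probe would wait on itself and recurse without end, which is why the flag exists.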

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/479445
Committed: https://git.openstack.org/cgit/openstack/python-dracclient/commit/?id=3207d9e1bcb2090adcbfcaa4f810aeab8a0408c9
Submitter: Jenkins
Branch: master

commit 3207d9e1bcb2090adcbfcaa4f810aeab8a0408c9
Author: Richard Pioso <email address hidden>
Date: Fri Jun 30 12:26:57 2017 -0400

    Enumerate operations can wait until iDRAC is ready

    Web Services Management (WS-Management and WS-Man) Enumerate operations
    can fail or return invalid results when issued to an integrated Dell
    Remote Access Controller (iDRAC) whose Lifecycle Controller remote
    service is not "ready". The following are examples of failures which
    have been observed:

    + The result of Enumerate is an error.
    + Enumerate succeeds, but no items are returned when they are known to
    exist.
    + Enumerate succeeds, but items for all those known to exist are not
    returned.

    A Dell technical white paper [0], "Lifecycle Controller Integration --
    Best Practices Guide", states that for Lifecycle Controller firmware
    1.5.0 and later "The Lifecycle Controller remote service must be in a
    'ready' state before running any other WSMAN commands." That applies to
    almost all of the workflows and use cases documented by that paper and
    supported by this project, openstack/python-dracclient.

    This patch defines a new method in class dracclient.client.WSManClient,
    enumerate(). It extends its base class's implementation by adding a new
    parameter that indicates whether or not it should wait until the iDRAC
    is ready to accept commands before issuing the Enumerate command. When
    it is true, that method waits until the iDRAC is ready before issuing
    the command. Since almost all Enumerate operations require the iDRAC to
    be ready, the new parameter's default value is 'True'.

    [0]
    http://en.community.dell.com/techcenter/extras/m/white_papers/20442332

    Change-Id: Ied659a4ee45b1dd55cd3a420301d866d52c838fb
    Partial-Bug: #1697558
    Related-Bug: #1691808
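
A minimal sketch of the pattern this commit describes, a subclass extending its base class's enumerate() with a readiness wait, is shown below. The class names, the wait_for_ready parameter name, the resource URI and the in-memory item store are all assumptions for illustration, not the project's code.

```python
class BaseEnumClient(object):
    """Stand-in base client that enumerates items from a dictionary."""

    def __init__(self, items):
        self._items = items

    def enumerate(self, resource_uri):
        return list(self._items.get(resource_uri, []))


class ReadyAwareEnumClient(BaseEnumClient):
    """Extends enumerate() so that it waits for readiness by default."""

    def __init__(self, items, ready_after=0):
        super().__init__(items)
        self._ready_after = ready_after  # probes that report "not ready"
        self.probes = 0

    def is_ready(self):
        self.probes += 1
        return self.probes > self._ready_after

    def wait_until_ready(self, retries=5):
        for _ in range(retries):
            if self.is_ready():
                return
        raise RuntimeError("service is not ready")

    def enumerate(self, resource_uri, wait_for_ready=True):
        if wait_for_ready:
            self.wait_until_ready()
        return super().enumerate(resource_uri)


client = ReadyAwareEnumClient({"example/uri": ["disk0", "disk1"]},
                              ready_after=2)
disks = client.enumerate("example/uri")
unguarded = client.enumerate("example/uri", wait_for_ready=False)
```

Defaulting the flag to True makes the safe behavior automatic, and callers that do not need a ready service can opt out explicitly.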

Changed in python-dracclient:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/487234
Committed: https://git.openstack.org/cgit/openstack/python-dracclient/commit/?id=38d863e489db19b209854d3bf3481be6d7dc4ec1
Submitter: Jenkins
Branch: master

commit 38d863e489db19b209854d3bf3481be6d7dc4ec1
Author: Richard Pioso <email address hidden>
Date: Tue Jul 25 18:14:44 2017 -0400

    Simplify get Lifecycle Controller version

    A Dell technical white paper [0], "Lifecycle Controller Integration --
    Best Practices Guide", describes how to determine the Lifecycle
    Controller version. See section 31.4, "Check Version of Lifecycle
    Controller (LC)". It simply enumerates the DCIM_SystemView class. No
    filter query is used to limit the items returned. And notably, that use
    case does not require the LC remote service to be in a "ready" state.

    That use case is implemented by the
    dracclient.resource.lifecycle_controller.LifecycleControllerManagement.get_version()
    method. It has used a filter query and waited for the integrated Dell
    Remote Access Controller (iDRAC) to be ready. To align it with best
    practices, this patch eliminates its use of a filter query and wait for
    the iDRAC.

    [0]
    http://en.community.dell.com/techcenter/extras/m/white_papers/20442332

    Change-Id: I9a499522b59f18282fc9a57227570f54e46dfd3e
    Closes-Bug: #1697558

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-dracclient (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/488966

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-dracclient (master)

Reviewed: https://review.openstack.org/488966
Committed: https://git.openstack.org/cgit/openstack/python-dracclient/commit/?id=bcfe996deb829b8cfbe9394216c100b2772c770e
Submitter: Jenkins
Branch: master

commit bcfe996deb829b8cfbe9394216c100b2772c770e
Author: Richard Pioso <email address hidden>
Date: Fri Jul 28 18:58:26 2017 -0400

    Simplify wait_until_idrac_is_ready() calls

    This change simplifies the internal calls to
    dracclient.client.WSManClient.wait_until_idrac_is_ready() by no longer
    passing arguments. That makes the code cleaner and easier to understand.
    It contains no functional change.

    The arguments no longer need to be passed, because that function's
    default parameter values are now None, which means use the values that
    were provided when the WSManClient object was created. The default
    values provided at creation are equal to the arguments that were being
    explicitly passed.

    Change-Id: I70237bb9eda49a98c55a452b7f534a1e720696bb
    Related-Bug: #1697558
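
The None-default convention this commit relies on can be sketched as follows. The class and parameter names and the numeric values are hypothetical, and the method only returns the effective policy rather than actually polling a device.

```python
class RetryConfiguredClient(object):
    """Illustrative client whose wait method falls back to stored values."""

    def __init__(self, ready_retries=48, ready_retry_delay=10):
        self.ready_retries = ready_retries
        self.ready_retry_delay = ready_retry_delay

    def wait_until_ready(self, retries=None, retry_delay=None):
        # None means "use the value given when the client was created".
        if retries is None:
            retries = self.ready_retries
        if retry_delay is None:
            retry_delay = self.ready_retry_delay
        return (retries, retry_delay)


client = RetryConfiguredClient(ready_retries=20, ready_retry_delay=3)
from_instance = client.wait_until_ready()          # no arguments passed
overridden = client.wait_until_ready(retries=1)    # per-call override
```

Using None as a sentinel lets internal callers omit the arguments while still allowing a per-call override where one is wanted.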
