[RFE] Ironic-set maintenance conditions should be different than operator-set maintenance

Bug #1596107 reported by Jay Faulkner
This bug affects 3 people
Affects: Ironic
Status: Fix Released
Importance: Wishlist
Assigned to: Kaifeng Wang
Milestone: (none)

Bug Description

Currently, Ironic's API and data model make no distinction between Ironic setting maintenance on a node (for instance, when cleaning fails, or when the power status loop cannot contact the BMC) and an operator setting maintenance on a node (an explicit API action).

I propose adding a field on the node so that when Ironic sees a fault on a node, it explicitly identifies the fault as detected by Ironic, separate from operator-set maintenance and maintenance_reason.

This new field on the node would not be boolean, as the current maintenance mode is, but should instead explicitly indicate what kind of issue is going on. This would enable periodic tasks to be created to clean up specific kinds of failures. The classic example is an unreachable BMC caused by temporary network issues. The machine could be put into a fault state indicating the BMC is unreachable, and a periodic task could attempt to reestablish contact with the BMC. If it succeeds, the node could be removed from the fault state. This pattern could be applied to several different types of faults, including cleaning failures caused by a timeout while waiting for an agent to heartbeat.
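
As a rough illustration of the idea above (not Ironic's actual code), the sketch below models a node with a non-boolean fault field and a recovery pass that only touches nodes whose fault is a power failure. The `Node` class, the `Fault` names, `set_fault`, `recover_power_faults`, and the `bmc_reachable` callback are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List, Optional


class Fault(Enum):
    """Hypothetical fault kinds; the values are illustrative only."""
    POWER_FAILURE = "power failure"
    CLEAN_FAILURE = "clean failure"


@dataclass
class Node:
    """Minimal stand-in for a node record (not the real Ironic model)."""
    uuid: str
    maintenance: bool = False
    maintenance_reason: Optional[str] = None
    fault: Optional[Fault] = None  # None means no service-detected fault


def set_fault(node: Node, fault: Fault, reason: str) -> None:
    """Record a service-detected fault; maintenance is set alongside it."""
    node.maintenance = True
    node.maintenance_reason = reason
    node.fault = fault


def recover_power_faults(
    nodes: Iterable[Node], bmc_reachable: Callable[[Node], bool]
) -> List[str]:
    """One pass of a periodic task: clear power faults that have healed.

    Operator-set maintenance (fault is None) and other fault kinds are
    left untouched.
    """
    recovered = []
    for node in nodes:
        if node.fault is not Fault.POWER_FAILURE:
            continue
        if bmc_reachable(node):
            node.fault = None
            node.maintenance = False
            node.maintenance_reason = None
            recovered.append(node.uuid)
    return recovered
```

The point of the separate field is visible in the loop: a node an operator put into maintenance by hand has no fault recorded, so the recovery pass never takes it back out of maintenance.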

Tags: needs-spec rfe
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic-specs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/334113

Dmitry Tantsur (divius)
Changed in ironic:
status: New → Confirmed
summary: - [RFC] Ironic-set maintenance conditions should be different than
+ [RFE] Ironic-set maintenance conditions should be different than
operator-set maintenance
tags: added: needs-spec
Revision history for this message
Ramamani Yeleswarapu (ramamani-yeleswarapu) wrote :

Hi, anyone working on this RFE/spec? Thanks.

Revision history for this message
Kaifeng Wang (kaifeng) wrote :

Hi Ramamani, it seems that no one is working on it currently. It would be great if you plan to take it on; otherwise, I can give it a try.

Revision history for this message
Kaifeng Wang (kaifeng) wrote :

After talking with Jay, Julia and Ruby on IRC, it's clear that this work is not in progress now.
Sadly, Jay is not going to work on it in the near future, but I do want this spec to be implemented. So I am assigning this bug to myself to try to make that happen.

This work is huge for me, so my first thought is to break the scope defined in the spec into several relatively small features, which would make it easier to get done. Thanks to Jay for the idea and valuable work (I'm sure he would do it much better than me), and thanks to the Ironic team for the support.

Changed in ironic:
assignee: nobody → Wang KaiFeng (kaifeng)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/501697

Changed in ironic:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic-specs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/553308

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic-specs (master)

Reviewed: https://review.openstack.org/553308
Committed: https://git.openstack.org/cgit/openstack/ironic-specs/commit/?id=c20caae1f7c8fb08d0157f8391eb29977bb0a52f
Submitter: Zuul
Branch: master

commit c20caae1f7c8fb08d0157f8391eb29977bb0a52f
Author: Kaifeng Wang <email address hidden>
Date: Thu Mar 15 17:44:32 2018 +0800

    Support power fault recovery

    This spec proposes adding a new node field to differentiate between
    maintenance types and, where possible, to recover a node from
    maintenance if it was put there by a power sync failure and power
    sync later succeeds.

    Change-Id: Ief33a1f8ac751f51279ad9ec2ef39ed5a363a175
    Related-Bug: #1596107

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/555708

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/556015

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Kaifeng Wang (<email address hidden>) on branch: master
Review: https://review.openstack.org/501697
Reason: superseded by https://review.openstack.org/#/c/555708/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/556758

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-ironicclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/556774

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/558152

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-tempest-plugin (master)

Fix proposed to branch: master
Review: https://review.openstack.org/558170

Changed in ironic:
assignee: Kaifeng Wang (kaifeng) → Julia Kreger (juliaashleykreger)
Changed in ironic:
assignee: Julia Kreger (juliaashleykreger) → Kaifeng Wang (kaifeng)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/556758
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=0a1b165ba5e05c345523db34ceb121cefc8a665c
Submitter: Zuul
Branch: master

commit 0a1b165ba5e05c345523db34ceb121cefc8a665c
Author: Kaifeng Wang <email address hidden>
Date: Tue Mar 27 13:20:48 2018 +0800

    Power fault recovery: apply fault

    This patch implements setting and using the fault field.

    For each case where maintenance is currently set to True, the fault is
    set accordingly. A periodic task is added to check the power state of
    nodes in maintenance due to power failure; maintenance is cleared if
    the power state of a node can be retrieved.

    When a node is taken out of maintenance by user, the fault is
    cleared (if there is any).

    Story: #1596107
    Task: #10469

    Change-Id: Ic4ab20af9022a2d06bdac567e7a098f3ba08570a
    Partial-Bug: #1596107

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/556015
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=b4c4eb99fcaef6b84ed9f3b4998f3a5d8b141eec
Submitter: Zuul
Branch: master

commit b4c4eb99fcaef6b84ed9f3b4998f3a5d8b141eec
Author: Kaifeng Wang <email address hidden>
Date: Sat Mar 24 15:32:53 2018 +0800

    Power fault recovery: API implementation

    This patch exposes the fault field on the API node object;
    microversioning and compatibility are handled.

    Story: #1596107
    Task: #10469

    Change-Id: I31ed332be12cf98baaf01badcbb09ae4b8c6cae9
    Partial-Bug: #1596107
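
A minimal sketch of what microversion handling for a new field can look like, assuming the common pattern of hiding the field from clients that request an older API version. This is not the code from the patch above: `render_node`, `parse_version`, and the `FAULT_MIN_VERSION` value are assumptions made for illustration, and the real minimum version should be checked against the Ironic API version history.

```python
from typing import Any, Dict, Tuple

# Assumed minimum API microversion that exposes "fault" (illustrative value).
FAULT_MIN_VERSION: Tuple[int, int] = (1, 42)


def parse_version(header: str) -> Tuple[int, int]:
    """Turn a 'major.minor' version string into a comparable tuple."""
    major, minor = header.split(".")
    return int(major), int(minor)


def render_node(node: Dict[str, Any], requested: str) -> Dict[str, Any]:
    """Return the node body as a client at the requested version sees it."""
    body = dict(node)  # copy so the stored record is not modified
    if parse_version(requested) < FAULT_MIN_VERSION:
        # Older clients never see the new field.
        body.pop("fault", None)
    return body
```

Comparing versions as integer tuples rather than strings matters here: as strings, "1.9" would sort after "1.42", so a client on an older version would be shown the field by mistake.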

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/558152
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=0c29837c3a1927d36ffda492c71caf9d1d6138ff
Submitter: Zuul
Branch: master

commit 0c29837c3a1927d36ffda492c71caf9d1d6138ff
Author: Kaifeng Wang <email address hidden>
Date: Mon Apr 2 11:12:09 2018 +0800

    Power fault recovery: Notification objects

    This patch exposes fault field for related notification objects.

    Story: #1596107
    Task: #10469

    Change-Id: Iee50985846fbe8e529613d69645c283d4fe1e380
    Partial-Bug: #1596107

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-ironicclient (master)

Reviewed: https://review.openstack.org/556774
Committed: https://git.openstack.org/cgit/openstack/python-ironicclient/commit/?id=78902bfd0c56ba08642cd1ec0b21408c19ab2839
Submitter: Zuul
Branch: master

commit 78902bfd0c56ba08642cd1ec0b21408c19ab2839
Author: Kaifeng Wang <email address hidden>
Date: Tue Mar 27 15:52:19 2018 +0800

    Power fault recovery: client support

    This patch adds code to support the fault field exposed by the ironic API.
    Querying nodes with a specified fault is also supported.

    Story: #1596107
    Task: #10469
    Partial-Bug: #1596107

    Depends-On: https://review.openstack.org/556015/

    Change-Id: I429df0ab5ea39140a2b988d5dfdacb24a67b955e

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-tempest-plugin (master)

Reviewed: https://review.openstack.org/558170
Committed: https://git.openstack.org/cgit/openstack/ironic-tempest-plugin/commit/?id=2e64cdbc30cc1c3a976f37f6f9badf42805c390c
Submitter: Zuul
Branch: master

commit 2e64cdbc30cc1c3a976f37f6f9badf42805c390c
Author: Kaifeng Wang <email address hidden>
Date: Mon Apr 2 15:22:59 2018 +0800

    Power fault recovery: tempest tests

    Add tempest tests to check whether the fault field is available
    across microversions.

    Story: #1596107
    Task: #10469

    Depends-On: https://review.openstack.org/#/c/574718

    Change-Id: I6415d6f84840b601d55c6ce515cc1edeca9fd185
    Closes-Bug: #1596107

Changed in ironic:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic-tempest-plugin 1.2.0

This issue was fixed in the openstack/ironic-tempest-plugin 1.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic-specs (master)

Change abandoned by "Jay Faulkner <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/ironic-specs/+/334113
Reason: I no longer intend on pursuing this spec, and it's extremely bitrotted at this point. I'd like to request anyone pursuing this in the future create a fresh spec.
