No useful feedback when 'node provide' fails

Bug #1620949 reported by Julie Pichon
This bug affects 1 person
Affects   Status         Importance   Assigned to    Milestone
tripleo   Fix Released   Medium       Julie Pichon

Bug Description

Currently, if 'openstack overcloud node provide <uuid>' fails, the user isn't given any useful feedback on how to fix it.

$ openstack overcloud node provide 8aa420cf-0268-4bea-ae9b-a24889eccf49
Failed to set nodes to available state: Failed to set nodes to available.

Using the --debug flag doesn't offer additional hints.

In the Mistral server logs, there is this:

BadRequest: The requested action "provide" can not be performed on node "8aa420cf-0268-4bea-ae9b-a24889eccf49" while it is in state "available". (HTTP 400)

We should try to surface something like that so the user knows what to try next.

Julie Pichon (jpichon) wrote :

Because the failure is more likely to come from a sub-workflow in 'provide', the simple payload message formatting we usually do is a bit too rough to be useful:

$ openstack overcloud node provide 45dda8e9-f553-4301-ab0e-5c465a425136
Failed to set nodes to available state: [{u'result': u'Failure caused by error in tasks: set_provision_state\n\n set_provision_state [task_ex_id=d7d9ff22-a1f9-4489-8309-9c30ca94dc7a] -> Failed to run action [action_ex_id=6d31e9b1-aa70-4871-989f-5fb86d7becfd, action_cls=\'<class \'mistral.actions.action_factory.IronicAction\'>\', attributes=\'{u\'client_method_name\': u\'node.set_provision_state\'}\', params=\'{u\'state\': u\'provide\', u\'node_uuid\': u\'45dda8e9-f553-4301-ab0e-5c465a425136\', u\'configdrive\': None, u\'cleansteps\': None}\']\n IronicAction.node.set_provision_state failed: <class \'ironicclient.common.apiclient.exceptions.BadRequest\'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c465a425136" while it is in state "available".\n [action_ex_id=6d31e9b1-aa70-4871-989f-5fb86d7becfd, idx=0]: Failed to run action [action_ex_id=6d31e9b1-aa70-4871-989f-5fb86d7becfd, action_cls=\'<class \'mistral.actions.action_factory.IronicAction\'>\', attributes=\'{u\'client_method_name\': u\'node.set_provision_state\'}\', params=\'{u\'state\': u\'provide\', u\'node_uuid\': u\'45dda8e9-f553-4301-ab0e-5c465a425136\', u\'configdrive\': None, u\'cleansteps\': None}\']\n IronicAction.node.set_provision_state failed: <class \'ironicclient.common.apiclient.exceptions.BadRequest\'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c465a425136" while it is in state "available".\n'}]

A bit of simple massaging leads to this (for each failed node):

$ openstack overcloud node provide 45dda8e9-f553-4301-ab0e-5c465a425136
Failed to set nodes to available state:
Failure caused by error in tasks: set_provision_state

  set_provision_state [task_ex_id=472bf915-7a0d-471d-8493-9e104c8fd7c6] -> Failed to run action [action_ex_id=9671e995-218a-43f2-adbf-bc5b8093bbfa, action_cls='<class 'mistral.actions.action_factory.IronicAction'>', attributes='{u'client_method_name': u'node.set_provision_state'}', params='{u'state': u'provide', u'node_uuid': u'45dda8e9-f553-4301-ab0e-5c465a425136', u'configdrive': None, u'cleansteps': None}']
 IronicAction.node.set_provision_state failed: <class 'ironicclient.common.apiclient.exceptions.BadRequest'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c465a425136" while it is in state "available".
    [action_ex_id=9671e995-218a-43f2-adbf-bc5b8093bbfa, idx=0]: Failed to run action [action_ex_id=9671e995-218a-43f2-adbf-bc5b8093bbfa, action_cls='<class 'mistral.actions.action_factory.IronicAction'>', attributes='{u'client_method_name': u'node.set_provision_state'}', params='{u'state': u'provide', u'node_uuid': u'45dda8e9-f553-4301-ab0e-5c465a425136', u'configdrive': None, u'cleansteps': None}']
 IronicAction.node.set_provision_state failed: <class 'ironicclient.common.apiclient.exceptions.BadRequest'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c...
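The "simple massaging" described above could look roughly like the following. This is only a sketch, not the code from the merged change: it assumes the payload message is either a plain string or a list of dicts with a 'result' key, as in the raw output shown earlier, and the function name format_failures is hypothetical.

```python
def format_failures(message):
    """Turn a workflow failure message into readable text.

    Assumption: `message` is either a plain string or a list with one
    entry per failed node, each entry being a dict with a 'result' key.
    """
    if isinstance(message, str):
        return message

    blocks = []
    for item in message:
        # Fall back to the raw item if it is not the expected dict shape.
        text = item.get("result", "") if isinstance(item, dict) else item
        blocks.append(str(text).rstrip())
    # One block per failed node, with real newlines instead of '\n' escapes.
    return "\n".join(blocks)


payload_message = [
    {"result": "Failure caused by error in tasks: set_provision_state\n\n"
               "  set_provision_state -> Failed to run action\n"}
]
print("Failed to set nodes to available state:")
print(format_failures(payload_message))
```

Printing the joined text rather than the list's repr is what turns the escaped newlines and u'' prefixes into readable lines.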


Julie Pichon (jpichon) wrote :

In this example the error comes from Ironic and is passed through to the set_node_state workflow, then on to the provide workflow. That makes it difficult to handle in the workflow itself: sending the sub-task result directly seems to fail depending on the type or number of errors, on which workflow the error actually comes from, or on whether there is a mix of successes and failures ("YaqlEvaluationException: Can not evaluate YAQL expression: task(set_nodes_available).result.result"). I might try some very simple string parsing to shorten it to the last element of the message anyway:

$ openstack overcloud node provide 45dda8e9-f553-4301-ab0e-5c465a425136
Failed to set nodes to available state:
 IronicAction.node.set_provision_state failed: <class 'ironicclient.common.apiclient.exceptions.BadRequest'>: The requested action "provide" can not be performed on node "45dda8e9-f553-4301-ab0e-5c465a425136" while it is in state "available".
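A minimal sketch of that kind of string parsing, assuming the useful part is always the last non-empty line of the task result (true for the example above, but not guaranteed in general). The helper name last_error_line is hypothetical and this is not necessarily how the merged fix does it.

```python
def last_error_line(result):
    """Return the last non-empty line of a multi-line failure message.

    Assumption: the innermost error (here, the one raised by Ironic) is
    reported on the final non-empty line of the result text.
    """
    lines = [line for line in result.splitlines() if line.strip()]
    # If there is nothing to split, hand back the input unchanged.
    return lines[-1].strip() if lines else result


result = (
    "Failure caused by error in tasks: set_provision_state\n\n"
    "  set_provision_state [task_ex_id=1] -> Failed to run action\n"
    " IronicAction.node.set_provision_state failed: BadRequest: "
    "node is in state \"available\".\n"
)
print("Failed to set nodes to available state:")
print(last_error_line(result))
```

This trades completeness for readability: the task and action execution IDs are dropped, so the Mistral server logs remain the place to look for the full trace.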

I tentatively opened https://bugs.launchpad.net/mistral/+bug/1621418 so perhaps we can handle this more gracefully and directly in the workflow in the future.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/367553

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
assignee: Julie Pichon (jpichon) → Dougal Matthews (d0ugal)
Julie Pichon (jpichon)
Changed in tripleo:
assignee: Dougal Matthews (d0ugal) → Julie Pichon (jpichon)
Changed in tripleo:
milestone: newton-rc1 → newton-rc2
Changed in tripleo:
assignee: Julie Pichon (jpichon) → Dougal Matthews (d0ugal)
assignee: Dougal Matthews (d0ugal) → Julie Pichon (jpichon)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/367552
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=341f5d83246be75119e96cc04ac3e0dfa62d9058
Submitter: Jenkins
Branch: master

commit 341f5d83246be75119e96cc04ac3e0dfa62d9058
Author: Julie Pichon <email address hidden>
Date: Thu Sep 8 11:56:08 2016 +0100

    Provide: return task result in case of failure

    Currently there is no way for the workflow user to figure out what went
    wrong when the workflow fails, without looking at the Mistral server
    logs.

    Change-Id: I8da43e4ff76488fc5cdb7bd2efa0cf9c39e7bb5e
    Partial-Bug: #1620949

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.openstack.org/367553
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=0573b605a2942ab18317ecebfc419cac4e0f1c9f
Submitter: Jenkins
Branch: master

commit 0573b605a2942ab18317ecebfc419cac4e0f1c9f
Author: Julie Pichon <email address hidden>
Date: Thu Sep 8 17:18:53 2016 +0100

    Provide more information when 'node provide' fails

    Because 'provide' calls to other workflows, the error message that
    surfaces back can be quite difficult to parse. This attempts to make
    the message more readable.

    Change-Id: Iae3f29e5da25177fdee45752410f92b064c874c3
    Depends-On: I8da43e4ff76488fc5cdb7bd2efa0cf9c39e7bb5e
    Closes-Bug: #1620949

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/374670

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/newton)

Reviewed: https://review.openstack.org/374670
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=322ccb789683c397bd8dfcb8746476d68d8960b9
Submitter: Jenkins
Branch: stable/newton

commit 322ccb789683c397bd8dfcb8746476d68d8960b9
Author: Julie Pichon <email address hidden>
Date: Thu Sep 8 17:18:53 2016 +0100

    Provide more information when 'node provide' fails

    Because 'provide' calls to other workflows, the error message that
    surfaces back can be quite difficult to parse. This attempts to make
    the message more readable.

    Change-Id: Iae3f29e5da25177fdee45752410f92b064c874c3
    Depends-On: I8da43e4ff76488fc5cdb7bd2efa0cf9c39e7bb5e
    Closes-Bug: #1620949
    (cherry picked from commit 0573b605a2942ab18317ecebfc419cac4e0f1c9f)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 5.2.0

This issue was fixed in the openstack/python-tripleoclient 5.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 5.5.0

This issue was fixed in the openstack/python-tripleoclient 5.5.0 release.
