mistral raises 'Environment not found' when the stack is referenced by id instead of name

Bug #1640933 reported by Harry Rybacki on 2016-11-10
This bug affects 2 people
Affects: tripleo
Importance: Medium
Assigned to: Adriano Petrich

Bug Description

While attempting to delete a compute node from an overcloud, mistral fails to retrieve the environment and consequently fails to delete the desired node.

From the undercloud:

[stack@undercloud ~]$ openstack stack list
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| ID | Stack Name | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+-----------------+----------------------+----------------------+
| 06b0022a-7819-41c3-a5f9-f097a880e905 | overcloud | UPDATE_COMPLETE | 2016-11-10T13:56:22Z | 2016-11-10T15:02:14Z |
+--------------------------------------+------------+-----------------+----------------------+----------------------+

## Note: stack id to remove node from == 06b0022a-7819-41c3-a5f9-f097a880e905

[stack@undercloud ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| 0bdd1d59-672c-4559-8746-23d1c96bb90e | overcloud-controller-0 | ACTIVE | - | Running | ctlplane=192.168.24.7 |
| c8167eae-3695-439c-953f-42cd3df89d93 | overcloud-novacompute-0 | ACTIVE | - | Running | ctlplane=192.168.24.13 |
| e3c9c710-57c6-4cae-b0f9-79cb1c4d274d | overcloud-novacompute-1 | ACTIVE | - | Running | ctlplane=192.168.24.6 |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+

## Note: node to delete == overcloud-novacompute-0, id == c8167eae-3695-439c-953f-42cd3df89d93

[stack@undercloud ~]$ cat delete-node.sh
#! /bin/bash

source ./stackrc

openstack overcloud node delete --debug --stack 06b0022a-7819-41c3-a5f9-f097a880e905 --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /home/stack/network-environment.yaml c8167eae-3695-439c-953f-42cd3df89d93

[stack@undercloud ~]$ ./delete-node.sh

WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
START with options: [u'overcloud', u'node', u'delete', u'--debug', u'--stack', u'06b0022a-7819-41c3-a5f9-f097a880e905', u'--templates', u'-e', u'/usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml', u'-e', u'/usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml', u'-e', u'/home/stack/network-environment.yaml', u'c8167eae-3695-439c-953f-42cd3df89d93']

<SNIP>

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/osc_lib/shell.py", line 135, in run
    ret_val = super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 267, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/osc_lib/shell.py", line 180, in run_subcommand
    ret_value = super(OpenStackShell, self).run_subcommand(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 387, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 59, in run
    return self.take_action(parsed_args) or 0
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_node.py", line 71, in take_action
    scale.scale_down(clients, parsed_args.stack, parsed_args.nodes)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/scale.py", line 58, in scale_down
    delete_node(clients, **workflow_input)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/scale.py", line 37, in delete_node
    assert message['status'] == "SUCCESS", pprint.pformat(message)
AssertionError: {u'execution': {u'id': u'a774b251-3d5c-43dd-a621-d27f77f558f8',
                u'input': {u'container': u'06b0022a-7819-41c3-a5f9-f097a880e905',
                           u'nodes': [u'c8167eae-3695-439c-953f-42cd3df89d93'],
                           u'queue_name': u'87ebd692-8c08-48a4-8b85-2ed89f736d79',
                           u'timeout': 240},
                u'name': u'tripleo.scale.v1.delete_node',
                u'params': {},
                u'spec': {u'description': u'deletes given overcloud nodes and updates the stack',
                          u'input': [u'container',
                                     u'nodes',
                                     {u'timeout': 240},
                                     {u'queue_name': u'tripleo'}],
                          u'name': u'delete_node',
                          u'tasks': {u'delete_node': {u'action': u'tripleo.scale.delete_node nodes=<% $.nodes %> timeout=<% $.timeout %> container=<% $.container %>',
                                                      u'name': u'delete_node',
                                                      u'on-error': u'set_delete_node_failed',
                                                      u'on-success': u'send_message',
                                                      u'type': u'direct',
                                                      u'version': u'2.0'},
                                     u'send_message': {u'action': u'zaqar.queue_post',
                                                       u'input': {u'messages': {u'body': {u'payload': {u'execution': u'<% execution() %>',
                                                                                                       u'message': u"<% $.get('message', '') %>",
                                                                                                       u'status': u"<% $.get('status', 'SUCCESS') %>"},
                                                                                          u'type': u'tripleo.scale.v1.delete_node'}},
                                                                  u'queue_name': u'<% $.queue_name %>'},
                                                       u'name': u'send_message',
                                                       u'retry': u'count=5 delay=1',
                                                       u'type': u'direct',
                                                       u'version': u'2.0'},
                                     u'set_delete_node_failed': {u'name': u'set_delete_node_failed',
                                                                 u'on-success': u'send_message',
                                                                 u'publish': {u'message': u'<% task(delete_node).result %>',
                                                                              u'status': u'FAILED'},
                                                                 u'type': u'direct',
                                                                 u'version': u'2.0'}},
                          u'version': u'2.0'}},
 u'message': u"Failed to run action [action_ex_id=df2f7f0f-6d35-4129-be31-e03b33ccc13f, action_cls='<class 'mistral.actions.action_factory.ScaleDownAction'>', attributes='{}', params='{u'nodes': [u'c8167eae-3695-439c-953f-42cd3df89d93'], u'container': u'06b0022a-7819-41c3-a5f9-f097a880e905', u'timeout': 240}']\n Environment not found [name=06b0022a-7819-41c3-a5f9-f097a880e905]",
 u'status': u'FAILED'}

END return value: 1

## Additional notes:

The full log from the run above is available here[1].
The relevant journalctl logs are here[2]. Note the error starting at around line 166.

[1] - https://paste.fedoraproject.org/477444/79992414/
[2] - https://paste.fedoraproject.org/477464/14788013/

summary: - ment not found' for id of an existing stack while attempting to delete a
- node
+ mistral raises 'Enviornment not found' for id of an existing stack while
+ attempting to delete a node

Moving this over to tripleo. I don't think it is specific to Mistral, but rather how TripleO uses mistral.

affects: mistral → tripleo
Harry Rybacki (hrybacki-h) wrote :

The issue is that in my call to delete the node I am passing the <stack id> rather than the <stack name> as noted in the current tripleo docs[1]. This change in behavior must have happened between Mitaka and Newton -- our periodic scale jobs (using <stack id> rather than <stack name>) for Liberty and Mitaka are still passing fine.

I would recommend that we either a) allow Mistral to look up the environment by either the stack name or the (heat) stack id, or b) improve the debug message so that it is clearer and more helpful.

[1] - http://docs.openstack.org/developer/tripleo-docs/post_deployment/delete_nodes.html#deleting-overcloud-nodes

Dougal Matthews (d0ugal) on 2016-11-11
Changed in tripleo:
status: New → Confirmed
importance: Undecided → Medium
tags: added: tripleo-common workflows
Changed in tripleo:
assignee: nobody → Dougal Matthews (d0ugal)
tags: added: tripleoclient
removed: tripleo-common
summary: - mistral raises 'Enviornment not found' for id of an existing stack while
- attempting to delete a node
+ mistral raises 'Enviornment not found' when the stack name is wrong
Changed in tripleo:
assignee: Dougal Matthews (d0ugal) → Adriano Petrich (apetrich)
Dougal Matthews (d0ugal) on 2016-11-16
Changed in tripleo:
milestone: none → ocata-2

Fix proposed to branch: master
Review: https://review.openstack.org/398289

Changed in tripleo:
status: Confirmed → In Progress

I addressed this in two patches:

one to improve the output of the failed task
https://review.openstack.org/#/c/398226/

and one to make the client look up the stack name before passing it to mistral
https://review.openstack.org/#/c/398289/

summary: - mistral raises 'Enviornment not found' when the stack name is wrong
+ mistral raises 'Enviornment not found' when the stack name is called by
+ id instead of name

Reviewed: https://review.openstack.org/398226
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=37a4a6f6ae16b26eb0e99086ffedb1011355847b
Submitter: Jenkins
Branch: master

commit 37a4a6f6ae16b26eb0e99086ffedb1011355847b
Author: Adriano Petrich <email address hidden>
Date: Wed Nov 16 09:37:20 2016 +0000

    Give better output on scale failures

    Currently when there's a failure during scale the output is
    pprinted json that is confusing and uninstrutive

    Printing the return message message attribute gives all the important
    status needed from that json

    Change-Id: I6e81f5812895f50209d1dc7a35c4f8fbd2447926
    Partial-Bug: #1640933
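
For illustration, the idea in this commit can be sketched roughly as follows. This is not the actual patch: the function name check_scale_result and the choice of RuntimeError are assumptions made for the example. It only shows how reporting the workflow's 'message' attribute, rather than pretty-printing the whole payload as the assert in the traceback above does, keeps the failure readable.

```python
def check_scale_result(message):
    """Return the workflow message on success, raise a short error otherwise.

    `message` is the dict posted back by the workflow, with at least a
    'status' key and, on failure, a 'message' key (as in the traceback
    in the bug description). The function name is hypothetical.
    """
    if message.get("status") != "SUCCESS":
        # Surface only the human-readable part instead of the full
        # execution/spec dump.
        detail = message.get("message", "no message returned")
        raise RuntimeError("Scale workflow failed: %s" % detail)
    return message


if __name__ == "__main__":
    failed = {
        "status": "FAILED",
        "message": "Environment not found "
                   "[name=06b0022a-7819-41c3-a5f9-f097a880e905]",
        "execution": {"id": "a774b251-3d5c-43dd-a621-d27f77f558f8"},
    }
    try:
        check_scale_result(failed)
    except RuntimeError as exc:
        print(exc)
```

With a payload like the one in this report, the user would see a single line naming the missing environment rather than the full workflow spec.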

Reviewed: https://review.openstack.org/398289
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=a3012ca424747ae815ed0b72184fa465765ccb9a
Submitter: Jenkins
Branch: master

commit a3012ca424747ae815ed0b72184fa465765ccb9a
Author: Adriano Petrich <email address hidden>
Date: Wed Nov 16 10:56:36 2016 +0000

    Use stack name or id for backwards compatibility

    Heat used to accept either stack name or id

    Scale nodes in the documentation and in the argparse usage
    states that stacks can be identified by name or id but mistral
    only accepts stack names.

    This makes the client accept names or ids and pass names for the
    mistral workflow

    Change-Id: If7527e36c1e5d2214dc155392a2e3750b38ec365
    Closes-Bug: #1640933
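
A minimal sketch of the approach described in this commit, assuming a heat client whose stacks.get() accepts either a stack name or a stack id and returns an object with a stack_name attribute (as python-heatclient does). The helper resolve_stack_name and the Fake* classes are hypothetical and exist only so the example runs on its own; they are not taken from the patch.

```python
from collections import namedtuple

# Stand-in for the stack object returned by the heat client.
Stack = namedtuple("Stack", ["id", "stack_name"])


class FakeStackManager(object):
    """Hypothetical stand-in for heat_client.stacks."""

    def __init__(self, stacks):
        self._stacks = stacks

    def get(self, name_or_id):
        # Match on either identifier, mirroring Heat's lookup behaviour.
        for stack in self._stacks:
            if name_or_id in (stack.id, stack.stack_name):
                return stack
        raise LookupError("Stack not found: %s" % name_or_id)


class FakeHeatClient(object):
    """Hypothetical stand-in for the heat client."""

    def __init__(self, stacks):
        self.stacks = FakeStackManager(stacks)


def resolve_stack_name(heat_client, name_or_id):
    """Return the stack name for a given stack name or id."""
    return heat_client.stacks.get(name_or_id).stack_name


if __name__ == "__main__":
    heat = FakeHeatClient(
        [Stack("06b0022a-7819-41c3-a5f9-f097a880e905", "overcloud")]
    )
    # The resolved name is what would be passed to the mistral
    # workflow as its 'container' input.
    print(resolve_stack_name(heat, "06b0022a-7819-41c3-a5f9-f097a880e905"))
```

Resolving the name on the client side keeps the mistral workflow unchanged while restoring the documented "name or id" behaviour of the command line.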

Changed in tripleo:
status: In Progress → Fix Released
Julie Pichon (jpichon) on 2016-12-01
tags: added: newton-backport-potential
removed: newton

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/406099

Reviewed: https://review.openstack.org/406093
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=58dfb5196e08261938ec46c1f0c9573ac30d8777
Submitter: Jenkins
Branch: stable/newton

commit 58dfb5196e08261938ec46c1f0c9573ac30d8777
Author: Adriano Petrich <email address hidden>
Date: Wed Nov 16 09:37:20 2016 +0000

    Give better output on scale failures

    Currently when there's a failure during scale the output is
    pprinted json that is confusing and uninstrutive

    Printing the return message message attribute gives all the important
    status needed from that json

    Change-Id: I6e81f5812895f50209d1dc7a35c4f8fbd2447926
    Partial-Bug: #1640933
    (cherry picked from commit 37a4a6f6ae16b26eb0e99086ffedb1011355847b)

tags: added: in-stable-newton

Reviewed: https://review.openstack.org/406099
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=6097c132c744fd233741a774b0cfa70aca31e6b3
Submitter: Jenkins
Branch: stable/newton

commit 6097c132c744fd233741a774b0cfa70aca31e6b3
Author: Adriano Petrich <email address hidden>
Date: Wed Nov 16 10:56:36 2016 +0000

    Use stack name or id for backwards compatibility

    Heat used to accept either stack name or id

    Scale nodes in the documentation and in the argparse usage
    states that stacks can be identified by name or id but mistral
    only accepts stack names.

    This makes the client accept names or ids and pass names for the
    mistral workflow

    Change-Id: If7527e36c1e5d2214dc155392a2e3750b38ec365
    Closes-Bug: #1640933
    (cherry picked from commit a3012ca424747ae815ed0b72184fa465765ccb9a)

This issue was fixed in the openstack/python-tripleoclient 5.6.0 release.

This issue was fixed in the openstack/python-tripleoclient 5.4.1 release.
