tripleo should not query overcloud if never deployed to prevent extraneous error messages in swift/mistral

Bug #1730712 reported by Alex Schultz
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Medium | Dougal Matthews |

Bug Description

If the overcloud is never deployed, the log collection process still tries to gather information about the overcloud, which leads to errors showing up in the logs that are unrelated to the original issue. If the overcloud doesn't exist, we shouldn't try to query information about it. Usually this manifests itself as mistral/swift errors around 'overcloud'.

2017-11-07 16:30:51.711 3601 WARNING mistral.actions.openstack.base [req-05aad61b-993d-4317-9d99-ad261a5f2e43 316ee6a75a0b433fb641f542d30b28b9 6519354349814995a6776fa1afe9b824 - default default] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 117, in run
    result = method(**self._kwargs_for_run)
  File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1740, in head_container
    return self._retry(None, head_container, container, headers=headers)
  File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1678, in _retry
    service_token=self.service_token, **kwargs)
  File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 980, in head_container
    resp, 'Container HEAD failed', body)
ClientException: Container HEAD failed: http://192.168.24.1:8080/v1/AUTH_6519354349814995a6776fa1afe9b824/overcloud 404 Not Found
: ClientException: Container HEAD failed: http://192.168.24.1:8080/v1/AUTH_6519354349814995a6776fa1afe9b824/overcloud 404 Not Found
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor [req-05aad61b-993d-4317-9d99-ad261a5f2e43 316ee6a75a0b433fb641f542d30b28b9 6519354349814995a6776fa1afe9b824 - default default] Failed to run action [action_ex_id=ce261538-2dd6-4dbc-9d1f-76d309b025a6, action_cls='<class 'mistral.actions.action_factory.SwiftAction'>', attributes='{u'client_method_name': u'head_container'}', params='{u'container': u'overcloud'}']
 SwiftAction.head_container failed: Container HEAD failed: http://192.168.24.1:8080/v1/AUTH_6519354349814995a6776fa1afe9b824/overcloud 404 Not Found: ActionException: SwiftAction.head_container failed: Container HEAD failed: http://192.168.24.1:8080/v1/AUTH_6519354349814995a6776fa1afe9b824/overcloud 404 Not Found
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor Traceback (most recent call last):
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 109, in run_action
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor result = action.run(context.ctx())
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 130, in run
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor (self.__class__.__name__, self.client_method_name, str(e))
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor ActionException: SwiftAction.head_container failed: Container HEAD failed: http://192.168.24.1:8080/v1/AUTH_6519354349814995a6776fa1afe9b824/overcloud 404 Not Found
2017-11-07 16:30:51.712 3601 ERROR mistral.executors.default_executor

summary: - quickstart shouldn't query overcloud if deployment didn't succeed
+ quickstart shouldn't query overcloud if never deployed
Changed in tripleo:
milestone: none → queens-3
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote : Re: quickstart shouldn't query overcloud if never deployed

That's too much effort for this purpose. Distinguishing a failed deployment from a non-deployment and adding checks to every place we query "overcloud", all just to remove a few lines from the mistral logs, isn't worth it.
Of course, these two lines could be confusing the first time, but never again.

Changed in tripleo:
status: Triaged → Won't Fix
Revision history for this message
Alex Schultz (alex-schultz) wrote :

@Sagi, you used the exact thing I was referring to in this bug to report a new bug. See Bug 1736137. This needs to be addressed for the exact reasons stated. It is confusing and causes everyone extra work to figure out that these errors may not be legitimate.

Changed in tripleo:
status: Won't Fix → Triaged
tags: added: ux
Revision history for this message
wes hayutin (weshayutin) wrote :

@Alex, your request seems pretty trivial: in our collect logs role, simply check for SSH access before trying to collect logs, and error out appropriately.

An appropriate message from collect logs would be something like,
"the overcloud nodes are not accessible, please look for deployment failures on the undercloud"
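A minimal sketch of that check, in Python rather than in the collect logs role itself. The function names `ssh_reachable` and `select_reachable_nodes` are hypothetical, and the check only confirms that something accepts TCP connections on the SSH port, not that a login would succeed:

```python
import socket

NOT_ACCESSIBLE = (
    "the overcloud nodes are not accessible, "
    "please look for deployment failures on the undercloud"
)


def ssh_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def select_reachable_nodes(hosts, port=22, timeout=5.0):
    """Return the hosts worth collecting logs from, or fail with a clear message."""
    reachable = [h for h in hosts if ssh_reachable(h, port, timeout)]
    if not reachable:
        raise RuntimeError(NOT_ACCESSIBLE)
    return reachable
```

Log collection would then run only against the hosts returned by `select_reachable_nodes`, and a never-deployed overcloud would produce the single message above instead of a series of unrelated failures.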

Revision history for this message
Alex Schultz (alex-schultz) wrote :

I updated the bug title to reflect that this may be a larger issue than just quickstart.

summary: - quickstart shouldn't query overcloud if never deployed
+ tripleo should not query overcloud if never deployed to prevent
+ extraneous error messages in swift/mistral
Revision history for this message
Toure Dunnon (toure) wrote :

Alex, I will take a look at this.

Changed in tripleo:
assignee: nobody → Toure Dunnon (toure)
Revision history for this message
Dougal Matthews (d0ugal) wrote :

I am trying to get my head around this bug. Is it just that we don't like the following error to be in the mistral log file?

"ClientException: Container HEAD failed: http://192.168.24.1:8080/v1/AUTH_6519354349814995a6776fa1afe9b824/overcloud 404 Not Found"

This actually happens by design. When creating a plan, the workflow attempts to get the container to see if it exists. If it doesn't exist, the workflow continues with the creation; if it does exist, the workflow exits with an error.

https://github.com/openstack/tripleo-common/blob/master/workbooks/plan_management.yaml#L81-L87

Logically, it is similar to doing this in Python.

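    from swiftclient.exceptions import ClientException

    try:
        # `swift` is assumed to be an authenticated swiftclient Connection
        container = swift.head_container("overcloud")
        # no error: the container already exists, so plan creation should exit
    except ClientException:
        # the container doesn't exist (404 Not Found), so we can create it
        pass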

The only downside to this approach is that Mistral logs the exception, which is then alarming if anyone discovers it when debugging something else.

... or is it something totally different that I am missing?

Revision history for this message
Dougal Matthews (d0ugal) wrote :

To work around this, rather than getting the container, we can change the workflow to get a list of all containers and check to see if the name is in the list. It feels like a less correct way to check for existence, but it will stop the error showing up in the logs.
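As a rough sketch of that idea in Python rather than in the workflow itself: `container_exists` is a hypothetical helper, and `FakeSwift` is a stand-in that only mimics the return shape of swiftclient's `Connection.get_account()` (a tuple of response headers and a list of container dicts) so the example runs without a Swift endpoint.

```python
def container_exists(swift, name):
    """Return True if a container called `name` appears in the account listing."""
    # get_account() returns (response_headers, list_of_container_dicts).
    _headers, containers = swift.get_account(full_listing=True)
    return any(c["name"] == name for c in containers)


class FakeSwift:
    """Hypothetical stand-in for swiftclient.client.Connection."""

    def __init__(self, names):
        self._names = list(names)

    def get_account(self, full_listing=False):
        return {}, [{"name": n} for n in self._names]


print(container_exists(FakeSwift(["overcloud"]), "overcloud"))
print(container_exists(FakeSwift([]), "overcloud"))
```

Because a missing container is just an absent entry in a successful listing, no exception is raised and nothing alarming is logged.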

Revision history for this message
Alex Schultz (alex-schultz) wrote :

Would it be possible to just lower the logging on exception in the swift.head_container action to maybe WARNING?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/528213

Changed in tripleo:
assignee: Toure Dunnon (toure) → Dougal Matthews (d0ugal)
status: Triaged → In Progress
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Dougal Matthews (d0ugal)
tags: added: workflows
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
assignee: Dougal Matthews (d0ugal) → Steven Hardy (shardy)
Changed in tripleo:
assignee: Steven Hardy (shardy) → Dougal Matthews (d0ugal)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/528213
Reason: failed in gate

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/528213
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=b5d5cbab32d169e622ce30a2797c7400667bce11
Submitter: Zuul
Branch: master

commit b5d5cbab32d169e622ce30a2797c7400667bce11
Author: Dougal Matthews <email address hidden>
Date: Fri Dec 15 09:27:25 2017 +0000

    Verify the Swift container exists with a small utility workflow

    The previous method of `swift.head_container` worked well, but it caused
    Mistral to log the exception raised by swiftclient. This then left a red
    flag in the logs that confused users debugging.

    This new method checks for the container by listing all the containers
    in the account and checking for the name. We only consider containers
    that start with the full name we are looking for - Swift doesn't have an
    exact match, only a prefix filter.

    The workflow then can be used to create the container, capturing the
    logic that was duplicated in each individual workflow.

    Closes-Bug: #1730712
    Depends-On: I41649d15c57e16bffcf7870a52bc01177aae7cc8
    Change-Id: I4a6b5b9b31a4f76840a6c6070a1d733ceade5c64
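
The prefix-filter detail in the commit message can be illustrated with a short Python sketch. This is not the merged workflow: `create_container_if_missing`, `ContainerAlreadyExists`, and `FakeSwift` are hypothetical names, and the stub only imitates the `prefix` parameter of swiftclient's `Connection.get_account()` and the effect of `put_container()`.

```python
class ContainerAlreadyExists(Exception):
    """Raised when the requested container is already present."""


def create_container_if_missing(swift, name):
    """Create `name` unless a container with exactly that name exists."""
    # The prefix filter also returns names such as "overcloud-config",
    # so an exact comparison is still needed on the results.
    _headers, candidates = swift.get_account(prefix=name, full_listing=True)
    if any(c["name"] == name for c in candidates):
        raise ContainerAlreadyExists(name)
    swift.put_container(name)


class FakeSwift:
    """Hypothetical in-memory stand-in for swiftclient.client.Connection."""

    def __init__(self, names=()):
        self.names = set(names)

    def get_account(self, prefix=None, full_listing=False):
        matches = sorted(n for n in self.names if n.startswith(prefix or ""))
        return {}, [{"name": n} for n in matches]

    def put_container(self, name):
        self.names.add(name)


swift = FakeSwift(["overcloud-config"])
create_container_if_missing(swift, "overcloud")  # prefix match only, so it is created
print(sorted(swift.names))
```

A second call with the same name would raise `ContainerAlreadyExists`, mirroring the "exit and error" behaviour described earlier in the thread for plan creation.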

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 9.2.0

This issue was fixed in the openstack/tripleo-common 9.2.0 release.
