When Zaqar/Mistral times out, the command fails with an unhelpful traceback and the user has no idea what happened

Bug #1882134 reported by Alex Schultz
This bug affects 1 person
Affects   Status         Importance   Assigned to    Milestone
tripleo   Fix Released   Medium       Alex Schultz

Bug Description

When a deployment times out, the process fails with a JSON decode error. This is very confusing for users; we should catch the exception and print a user-friendly error instead.

Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1002, in take_action
    in_flight_validations=parsed_args.inflight)
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 378, in config_download
    for payload in base.wait_for_messages(workflow_client, ws, execution):
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/base.py", line 61, in wait_for_messages
    for payload in websocket.wait_for_messages(timeout=timeout):
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 153, in wait_for_messages
    message = self.recv()
  File "/usr/lib/python3.6/site-packages/tripleoclient/plugin.py", line 131, in recv
    return json.loads(self._ws.recv())
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Expecting value: line 1 column 1 (char 0)
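The last frames of the traceback show `plugin.py` passing the result of `self._ws.recv()` straight to `json.loads`. Assuming the websocket hands back an empty frame once the server side gives up (an assumption; the report does not show the raw frame), the standard library alone reproduces the exact error message. `decode_frame` is a hypothetical helper that mirrors that one call:

```python
import json


def decode_frame(raw):
    """Decode one websocket frame the same way the traceback's recv() does.

    Hypothetical helper: an empty frame is not valid JSON, so json.loads
    raises JSONDecodeError instead of returning a payload.
    """
    return json.loads(raw)


try:
    decode_frame("")
except json.JSONDecodeError as exc:
    # Same text as the final line of the traceback above.
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```

This is why the user sees a parsing error rather than anything mentioning a timeout: the decode failure is the first thing to go wrong on the client side.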

Alex Schultz (alex-schultz) wrote :

For the record, this only impacts Train and older releases, as we removed Mistral/Zaqar from these workflows in Ussuri.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/733691

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/733691
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=ecf22416686ace0bfeb1631f15ba2978c3b8c929
Submitter: Zuul
Branch: stable/train

commit ecf22416686ace0bfeb1631f15ba2978c3b8c929
Author: Alex Schultz <email address hidden>
Date: Thu Jun 4 14:14:38 2020 -0600

    [TRAIN-AND-OLDER] Improve timeout error handling

    For many releases we have seen deployments and workflow executions
    that time out throw a JSON decode error. This is usually because either
    the Mistral execution completely failed (unhandled exception),
    something during the deployment hangs (bad network config), or the
    --timeout was less than the time it takes to run an action. If we get an
    exception while waiting for messages that isn't already a websocket
    timeout or something to that effect, we should catch it and print a
    useful message that the user can use to begin troubleshooting.

    Change-Id: Ie239f3fc11bbf95dc9af9786b288f6e8aef1193a
    Closes-Bug: #1882134
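The approach described in the commit message can be sketched as below. This is a minimal illustration, not the code merged in review 733691: `WebSocketTimeout`, `DeploymentError`, `HINT`, and the `recv` callable are hypothetical stand-ins, and the real client's names and wording may differ. The idea is to let genuine timeouts propagate unchanged and turn any other failure while waiting for messages into an error that lists the likely causes.

```python
import json


class WebSocketTimeout(Exception):
    """Hypothetical stand-in for the client's existing websocket timeout."""


class DeploymentError(Exception):
    """Hypothetical user-facing error raised instead of a raw traceback."""


# Hypothetical hint text, paraphrasing the causes listed in the commit message.
HINT = (
    "Lost contact with the workflow while waiting for messages. "
    "Possible causes: the Mistral execution failed with an unhandled "
    "exception, the deployment is hung (e.g. bad network config), or "
    "--timeout is shorter than a single action takes."
)


def wait_for_messages(recv):
    """Yield decoded payloads from recv(), translating unexpected failures.

    recv is any zero-argument callable that returns one raw frame (str).
    """
    while True:
        try:
            payload = json.loads(recv())
        except WebSocketTimeout:
            # Already a timeout: leave it to the existing handling.
            raise
        except Exception as exc:
            # Anything else (e.g. JSONDecodeError on an empty frame) becomes
            # a readable error that still chains to the original cause.
            raise DeploymentError(f"{HINT} (underlying error: {exc})") from exc
        yield payload
```

Chaining with `raise ... from exc` keeps the original exception available for debugging while the message shown to the user points at where to start looking.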

tags: added: in-stable-train
Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/734084

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/734085

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/734086

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/stein)

Reviewed: https://review.opendev.org/734084
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=ac5d1d906ac2cc7212273105df8455e5144abbc4
Submitter: Zuul
Branch: stable/stein

commit ac5d1d906ac2cc7212273105df8455e5144abbc4
Author: Alex Schultz <email address hidden>
Date: Thu Jun 4 14:14:38 2020 -0600

    [TRAIN-AND-OLDER] Improve timeout error handling

    Change-Id: Ie239f3fc11bbf95dc9af9786b288f6e8aef1193a
    Closes-Bug: #1882134
    (cherry picked from commit ecf22416686ace0bfeb1631f15ba2978c3b8c929)

tags: added: in-stable-stein
tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/rocky)

Reviewed: https://review.opendev.org/734085
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=bd95d65fec565241e9f1f7ab38968be77ec54b82
Submitter: Zuul
Branch: stable/rocky

commit bd95d65fec565241e9f1f7ab38968be77ec54b82
Author: Alex Schultz <email address hidden>
Date: Thu Jun 4 14:14:38 2020 -0600

    [TRAIN-AND-OLDER] Improve timeout error handling

    Change-Id: Ie239f3fc11bbf95dc9af9786b288f6e8aef1193a
    Closes-Bug: #1882134
    (cherry picked from commit ecf22416686ace0bfeb1631f15ba2978c3b8c929)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/queens)

Reviewed: https://review.opendev.org/734086
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=dc91fc41758257f0c240128328ce5b952ea2db82
Submitter: Zuul
Branch: stable/queens

commit dc91fc41758257f0c240128328ce5b952ea2db82
Author: Alex Schultz <email address hidden>
Date: Thu Jun 4 14:14:38 2020 -0600

    [TRAIN-AND-OLDER] Improve timeout error handling

    Change-Id: Ie239f3fc11bbf95dc9af9786b288f6e8aef1193a
    Closes-Bug: #1882134
    (cherry picked from commit ecf22416686ace0bfeb1631f15ba2978c3b8c929)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 12.4.0

This issue was fixed in the openstack/python-tripleoclient 12.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient rocky-eol

This issue was fixed in the openstack/python-tripleoclient rocky-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient queens-eol

This issue was fixed in the openstack/python-tripleoclient queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient stein-eol

This issue was fixed in the openstack/python-tripleoclient stein-eol release.
