[wallaby][multinode][heat-ephemeral] Randomly Failing with Unexpected error occurred serving API: Timed out waiting for a reply to message ID

Bug #1930906 reported by yatin
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Medium | Unassigned |

Bug Description

Noticed in https://review.rdoproject.org/r/c/rdoinfo/+/33644, which tests tagged releases of projects. The job passed on some rechecks, so the failure is random. It fails as shown below; the issue may already be fixed but not yet included in a tagged release:

2021-06-04 07:14:20 | 2021-06-04 07:14:20.461 172008 INFO tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] Stopping ephemeral heat.
2021-06-04 07:14:20 | 5888bbe2118be17090df7b81ea6afc1efd965fd82ccd0db34126d89a61dbc337
2021-06-04 07:14:22 | 5888bbe2118be17090df7b81ea6afc1efd965fd82ccd0db34126d89a61dbc337
2021-06-04 07:14:22 | 2021-06-04 07:14:22.973 172008 ERROR tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] Exception occured while running the command: ValueError: Failed to deploy: ERROR: Internal Error
2021-06-04 07:14:22 | Traceback (most recent call last):
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2021-06-04 07:14:22 | return self._queues[msg_id].get(block=True, timeout=timeout)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
2021-06-04 07:14:22 | return waiter.wait()
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
2021-06-04 07:14:22 | return get_hub().switch()
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
2021-06-04 07:14:22 | return self.greenlet.switch()
2021-06-04 07:14:22 | queue.Empty
2021-06-04 07:14:22 |
2021-06-04 07:14:22 | During handling of the above exception, another exception occurred:
2021-06-04 07:14:22 |
2021-06-04 07:14:22 | Traceback (most recent call last):
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/api/middleware/fault.py", line 167, in process_request
2021-06-04 07:14:22 | return req.get_response(self.application)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/request.py", line 1314, in send
2021-06-04 07:14:22 | application, catch_exc_info=False)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/request.py", line 1278, in call_application
2021-06-04 07:14:22 | app_iter = application(self.environ, start_response)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/common/noauth.py", line 46, in __call__
2021-06-04 07:14:22 | return self.app(env, start_response)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 129, in __call__
2021-06-04 07:14:22 | resp = self.call_func(req, *args, **kw)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 193, in call_func
2021-06-04 07:14:22 | return self.func(req, *args, **kwargs)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/common/wsgi.py", line 630, in __call__
2021-06-04 07:14:22 | response = req.get_response(self.application)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/request.py", line 1314, in send
2021-06-04 07:14:22 | application, catch_exc_info=False)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/request.py", line 1278, in call_application
2021-06-04 07:14:22 | app_iter = application(self.environ, start_response)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 129, in __call__
2021-06-04 07:14:22 | resp = self.call_func(req, *args, **kw)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 193, in call_func
2021-06-04 07:14:22 | return self.func(req, *args, **kwargs)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/common/wsgi.py", line 630, in __call__
2021-06-04 07:14:22 | response = req.get_response(self.application)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/request.py", line 1314, in send
2021-06-04 07:14:22 | application, catch_exc_info=False)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/request.py", line 1278, in call_application
2021-06-04 07:14:22 | app_iter = application(self.environ, start_response)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 143, in __call__
2021-06-04 07:14:22 | return resp(environ, start_response)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/routes/middleware.py", line 141, in __call__
2021-06-04 07:14:22 | response = self.app(environ, start_response)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 143, in __call__
2021-06-04 07:14:22 | return resp(environ, start_response)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 129, in __call__
2021-06-04 07:14:22 | resp = self.call_func(req, *args, **kw)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/webob/dec.py", line 193, in call_func
2021-06-04 07:14:22 | return self.func(req, *args, **kwargs)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/common/wsgi.py", line 923, in __call__
2021-06-04 07:14:22 | raise translate_exception(err, request.best_match_language())
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/common/wsgi.py", line 891, in __call__
2021-06-04 07:14:22 | request, **action_args)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/common/wsgi.py", line 964, in dispatch
2021-06-04 07:14:22 | return method(*args, **kwargs)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/api/openstack/v1/util.py", line 46, in handle_stack_method
2021-06-04 07:14:22 | return handler(controller, req, **kwargs)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/api/openstack/v1/stacks.py", line 653, in validate_template
2021-06-04 07:14:22 | ignorable_errors=ignorable_errors)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/rpc/client.py", line 412, in validate_template
2021-06-04 07:14:22 | version='1.36')
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/heat/rpc/client.py", line 89, in call
2021-06-04 07:14:22 | return client.call(ctxt, method, **kwargs)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
2021-06-04 07:14:22 | transport_options=self.transport_options)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
2021-06-04 07:14:22 | transport_options=transport_options)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 682, in send
2021-06-04 07:14:22 | transport_options=transport_options)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
2021-06-04 07:14:22 | call_monitor_timeout)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2021-06-04 07:14:22 | message = self.waiters.get(msg_id, timeout=timeout)
2021-06-04 07:14:22 | File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 437, in get
2021-06-04 07:14:22 | 'to message ID %s' % msg_id)
2021-06-04 07:14:22 | oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID c63b800b2bec41c08542a73ebdc35eb7
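
The two chained tracebacks above follow a common pattern: a blocking queue read times out with queue.Empty, and the caller re-raises it as a messaging-level timeout. The following is a minimal standard-library sketch of that pattern, not the oslo.messaging code; the names ReplyWaiters and ReplyTimeout are hypothetical and exist only for illustration.

```python
import queue


class ReplyTimeout(Exception):
    """Hypothetical stand-in for a messaging timeout error."""


class ReplyWaiters:
    """Hypothetical per-message reply queues, keyed by message ID."""

    def __init__(self):
        self._queues = {}

    def add(self, msg_id):
        # One queue per outstanding request.
        self._queues[msg_id] = queue.Queue()

    def put(self, msg_id, reply):
        # Called when a reply for msg_id arrives.
        self._queues[msg_id].put(reply)

    def get(self, msg_id, timeout):
        try:
            return self._queues[msg_id].get(block=True, timeout=timeout)
        except queue.Empty:
            # Raising inside the except block produces the
            # "During handling of the above exception, another
            # exception occurred" chain seen in the log.
            raise ReplyTimeout(
                "Timed out waiting for a reply to message ID %s" % msg_id)


def demo():
    """Wait for a reply that never arrives and return the error text."""
    waiters = ReplyWaiters()
    waiters.add("abc")
    try:
        waiters.get("abc", timeout=0.05)
    except ReplyTimeout as exc:
        return str(exc)
```

If nothing ever consumes the request and replies, the read always ends in the timeout branch, which matches the symptom reported here.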

Logs:-
https://logserver.rdoproject.org/44/33644/17/check/rdoinfo-tripleo-wallaby-testing-centos-8-containers-multinode/389648c/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
https://logserver.rdoproject.org/44/33644/17/check/rdoinfo-tripleo-wallaby-testing-centos-8-containers-multinode/389648c/logs/undercloud/home/zuul/overcloud-deploy/overcloud/heat-launcher/log/heat-api.log.txt.gz

Rabi Mishra (rabi) wrote :

python3-tripleoclient-16.2.0-1.el8.noarch[1] does not have https://review.opendev.org/c/openstack/python-tripleoclient/+/791856

heat-api starts processing requests before the queues are created and the heat-engine workers are listening on them. When it tries to put messages on the nonexistent queues, the requests time out.
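
One generic way to avoid this kind of race is to poll a readiness check before sending the first request. The sketch below is only an illustration under that assumption; it is not the change in the review linked above, whose contents are not shown in this report. The function name wait_until_ready and the is_ready callback are hypothetical.

```python
import time


def wait_until_ready(is_ready, timeout=60.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires.

    is_ready is a hypothetical callback, for example a check that the
    backend workers have started consuming from their queues.
    Returns True if the check succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run this once after launching the services and only start issuing API requests when it returns True.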

We probably need a new tagged release of tripleoclient and tripleo-common for CBS?

[1]
python3-tripleoclient noarch 16.2.0-1.el8 temp-cloud8s-openstack-wallaby-testing 533 k

yatin (yatinkarel)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
milestone: none → xena-1
yatin (yatinkarel) wrote :

Yes, with the latest wallaby commits it works: https://review.rdoproject.org/r/c/rdoinfo/+/33962

<< We probably need a new tagged release of tripleoclient and tripleo-common for CBS?
Yes, that would be needed to clear this.

Marios Andreou (marios-b) wrote :

Ack ykarel, thanks for the ping. It looks like we *just* missed that commit in the last wallaby release (it is the top commit at https://github.com/openstack/python-tripleoclient/compare/16.2.0...stable/wallaby).

I have just posted stable/wallaby tags for tripleo-common and python-tripleoclient at https://review.opendev.org/c/openstack/releases/+/795126 ("Release wallaby tripleo-common python-tripleoclient").

yatin (yatinkarel) wrote :

New tags have cleared the failing wallaby multinode job: https://review.rdoproject.org/r/c/rdoinfo/+/33644, closing the bug. Thanks marios.

Changed in tripleo:
status: Triaged → Fix Released
yatin (yatinkarel) wrote :

<< New tags have cleared the failing wallaby multinode job: https://review.rdoproject.org/r/c/rdoinfo/+/33644, closing the bug. Thanks marios.

So the multinode job was cleared, but with the new tags mariadb has issues because the tht patch https://review.opendev.org/c/openstack/tripleo-heat-templates/+/792667 is missing, so we would also need at least a tripleo-heat-templates release.

yatin (yatinkarel) wrote :

<< So multinode was clear but with the new tags mariadb have issues, due to missing tht patch https://review.opendev.org/c/openstack/tripleo-heat-templates/+/792667, so would also need a tripleo-heat-templates release atleast.
Would also need a tripleo-ansible release along with tripleo-heat-templates; tested in https://review.rdoproject.org/r/c/rdoinfo/+/33644

yatin (yatinkarel) wrote :

<< Would also need tripleo-ansible release along with tripleo-heat-templates, tested in https://review.rdoproject.org/r/c/rdoinfo/+/33644
Requested https://review.opendev.org/c/openstack/releases/+/795924
