mistral MessagingTimeout correlates with containerized undercloud uptime
Bug #1789680 reported by John Fulton
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
tripleo | Fix Released | Critical | Dougal Matthews | |
Bug Description
Even the simplest Mistral actions fail with MessagingTimeout on a containerized Rocky RC1 undercloud once its uptime reaches approximately 48 hours, as reported by three independent users. For example:
(undercloud) [stack@undercloud ~]$ mistral run-action std.echo '{"output": "Hello Workflow!"}'
ERROR (app) MessagingTimeout: Timed out waiting for a reply to message ID 6e089e1b7fc7495
(undercloud) [stack@undercloud ~]$
Rebooting the undercloud worked around the problem in two of the three reports.
This affects overcloud deployment, as 'openstack overcloud deploy ...' results in a similar MessagingTimeout error.
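To check whether the failure really tracks uptime, the reproduction command above can be wrapped in a small probe. This is only a sketch: the helper names (`has_messaging_timeout`, `uptime_hours`, `probe`) and the 120-second cap are assumptions made for illustration, and the only facts taken from this report are the `mistral run-action std.echo` command and the `MessagingTimeout` string in its output.

```python
import subprocess


def has_messaging_timeout(output: str) -> bool:
    """Return True if the CLI output contains the error string from this report."""
    return "MessagingTimeout" in output


def uptime_hours(proc_uptime_text: str) -> float:
    """Convert the contents of /proc/uptime (seconds as the first field) to hours."""
    return float(proc_uptime_text.split()[0]) / 3600.0


def probe(cmd, timeout_s: float = 120.0) -> str:
    """Run cmd and classify the result.

    Returns "hung" if the command exceeds timeout_s (a hypothetical cap),
    "messaging_timeout" if the known error string appears,
    "other_error" for any other non-zero exit, and "ok" otherwise.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "hung"
    combined = result.stdout + result.stderr
    if has_messaging_timeout(combined):
        return "messaging_timeout"
    if result.returncode != 0:
        return "other_error"
    return "ok"
```

On an undercloud, one might call `probe(["mistral", "run-action", "std.echo", '{"output": "Hello Workflow!"}'])` periodically and log the result next to `uptime_hours(open("/proc/uptime").read())`, so that the first failure can be lined up against the reported 48-hour mark.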
Changed in tripleo:
assignee: nobody → Toure Dunnon (toure)
Changed in tripleo:
milestone: rocky-rc2 → stein-1
tags: added: rocky-backport-potential
Changed in tripleo:
status: Triaged → Fix Committed
Changed in tripleo:
status: Fix Committed → Triaged
Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Toure Dunnon (toure)
Changed in tripleo:
assignee: Toure Dunnon (toure) → Bogdan Dobrelya (bogdando)
Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Dougal Matthews (d0ugal)
tags: removed: alert
Changed in tripleo:
status: In Progress → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released
dtantsur: oh, something I missed in mistral-api initially:
ERROR oslo.messaging._drivers.impl_rabbit [req-6399e1f7-19b8-4a8d-af99-bc431ae91366
66fbd53f21484b49b76674b8b020a313 fdaa0a59b31e4119ab7c80f1096b02cd - default default]
[4166ca76-f412-4789-a948-bfea12b4538f] AMQP server on
undercloud.internalapi.localdomain:5672 is unreachable: [Errno 104] Connection
reset by peer. Trying again in 1 seconds.: error: [Errno 104]
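A quick way to narrow down an "unreachable" report like the one above is to test whether the AMQP port accepts TCP connections at all. The sketch below uses only the Python standard library; the function name `tcp_port_reachable` and the 3-second default timeout are assumptions for illustration. A successful connect does not prove the broker is healthy: "Connection reset by peer" can occur after the TCP handshake, so this only separates "nothing is listening" from "something is listening but misbehaving".

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    This checks the transport only; it does not speak AMQP, so a True
    result does not rule out resets later in the session.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For the host and port in the log line, the call would be `tcp_port_reachable("undercloud.internalapi.localdomain", 5672)`.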
jtomasek: fultonj, dtantsur: <shardy> It seems that mistral tries to create the execution,
then that "no threads" RPC INFO happens, then eventually the RPC times out
jtomasek: fultonj, dtantsur: my observation: this happens with any mistral action or workflow;
the workflow itself actually executes without problem, but the response never arrives back
and the command which initiated it times out
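The symptom described in this last observation (the work completes, but the caller still times out) is what a request/reply exchange looks like when only the reply path is broken. The following is a toy model, not Mistral or oslo.messaging code: in-process queues stand in for the message bus, and every name in it (`ReplyTimeout`, `call`, `worker`, `demo`) is hypothetical.

```python
import queue
import threading


class ReplyTimeout(Exception):
    """Toy stand-in for a messaging timeout; not the oslo.messaging class."""


def call(request_q, reply_q, payload, timeout_s):
    """Send a request, then block until a reply arrives or timeout_s elapses."""
    request_q.put(payload)
    try:
        return reply_q.get(timeout=timeout_s)
    except queue.Empty:
        raise ReplyTimeout(f"no reply within {timeout_s}s") from None


def worker(request_q, reply_q, executed, deliver_reply):
    """Process one request; optionally drop the reply to model a broken reply path."""
    payload = request_q.get()
    executed.append(payload)  # the work itself always succeeds
    if deliver_reply:
        reply_q.put(payload)


def demo(deliver_reply):
    """Run one exchange and return (caller outcome, list of work the worker did)."""
    request_q, reply_q, executed = queue.Queue(), queue.Queue(), []
    t = threading.Thread(
        target=worker, args=(request_q, reply_q, executed, deliver_reply)
    )
    t.start()
    try:
        outcome = call(request_q, reply_q, "Hello Workflow!", timeout_s=0.5)
    except ReplyTimeout:
        outcome = "timeout"
    t.join()
    return outcome, executed
```

With `demo(False)`, the worker records the request as executed while the caller reports a timeout, which mirrors the reported behaviour. It says nothing about why the reply is lost on a real undercloud.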