Sometimes async_task_executor gets stuck and stops processing tasks queued in RabbitMQ. Restarting the async_task_executor service fixes it.
The async_task_executor code does not handle the exception raised when it cannot connect to the Cassandra (C*) cluster when checking table status in table_info. The exception propagates to the oslo.messaging RPC dispatcher, and the message gets stuck in the queue.
Failed tasks should be requeued, or at least all exceptions should be handled, instead of being left to propagate to the oslo.messaging RPC dispatcher.
oslo.messaging seems unable to re-dispatch the message and blocks all other messages in the queue.
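A minimal sketch of the suggested behaviour, not the actual MagnetoDB code: the handler retries a few times and then hands the task to a failure callback, so the error never reaches the RPC dispatcher. StorageUnavailable is a hypothetical stand-in for the driver's NoHostAvailable error seen in the log, and contain_storage_errors, on_give_up and the create handler below are illustrative names only.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


class StorageUnavailable(Exception):
    """Hypothetical stand-in for the NoHostAvailable error in the log."""


def contain_storage_errors(retries=3, delay=0.0, on_give_up=None):
    """Retry the handler on StorageUnavailable and never let it propagate."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return handler(*args, **kwargs)
                except StorageUnavailable as exc:
                    LOG.warning("attempt %d/%d failed: %s",
                                attempt, retries, exc)
                    if attempt < retries:
                        time.sleep(delay)
            # All attempts failed: report the task instead of raising, so the
            # caller (here, the RPC dispatcher) sees a normal return.
            if on_give_up is not None:
                on_give_up(*args, **kwargs)
            return None
        return wrapper
    return decorator


# Usage: a handler that always fails, as when every connection pool is shut down.
failed = []          # tasks that should be requeued or marked as failed
calls = {"n": 0}     # counts how many times the handler body ran


@contain_storage_errors(retries=3,
                        on_give_up=lambda ctxt, table: failed.append(table))
def create(ctxt, table):
    calls["n"] += 1
    raise StorageUnavailable("Pool is shutdown")


create({}, "t1")
```

In a real service the on_give_up callback would be the place to requeue the task or set the table status to a failed state; which of the two is appropriate depends on how the executor is expected to recover.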
The following log may be related.
ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: ('Unable to complete the operation against any hosts', {<Host: 192.168.19.241 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.240 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.243 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.242 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.238 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.239 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.234 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.235 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.236 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.237 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.232 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.233 datacenter1>: ConnectionException('Pool is shutdown',)})
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
    incoming.message))
  File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
    result = getattr(endpoint, method)(ctxt, **new_args)
  File "/usr/bin/magnetodb-async-task-executor", line 120, in create
    self._table_info_repo.update(context, table_info, ["status"])
  File "/usr/lib/python2.7/dist-packages/magnetodb/storage/table_info_repo/cassandra_impl.py", line 174, in update
    "".join(query_builder), consistent=True
  File "/usr/lib/python2.7/dist-packages/magnetodb/common/cassandra/cluster_handler.py", line 164, in execute_query
    raise ex
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.168.19.241 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.240 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.243 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.242 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.238 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.239 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.234 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.235 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.236 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.237 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.232 datacenter1>: ConnectionException('Pool is shutdown',), <Host: 192.168.19.233 datacenter1>: ConnectionException('Pool is shutdown',)})
More details here: https://bugs.launchpad.net/magnetodb/+bug/1412576