Mirantis OpenStack

Comment 0 for bug 1572085

Dmitry Mescheryakov (dmitrymex) wrote on 2016-04-19:

Version: 9.0

Steps to reproduce:
1. Deploy a MOS environment.
2. Run some tests on it (the exact trigger is not yet known).

Expected results:
All logs are clean

Actual results:
The log of one of the OpenStack components contains many exceptions like
NotFound: Basic.consume: (404) NOT_FOUND - no queue 'reply_4b5920a6600d4d779c61c1a82dd7b81a' in vhost '/'
(full stack trace from the neutron-server logs: http://paste.openstack.org/show/494399/)

This indicates that the process lost a queue it was listening on, and the situation does not resolve by itself. Losing a queue means the server stops processing messages from it, which may be critical to its operation, depending on the queue.
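
To gauge how many queues are affected, the exceptions can be counted per queue. The following is a minimal sketch that assumes the log lines contain the NOT_FOUND message in exactly the form quoted above; the function name count_missing_queues is hypothetical and not part of any OpenStack tool.

```python
import re
from collections import Counter

# Pattern derived from the exception text quoted in this report.
NOT_FOUND_RE = re.compile(
    r"NOT_FOUND - no queue '(?P<queue>[^']+)' in vhost '(?P<vhost>[^']+)'"
)


def count_missing_queues(lines):
    """Return a Counter mapping (vhost, queue) to the number of 404 errors seen."""
    counts = Counter()
    for line in lines:
        match = NOT_FOUND_RE.search(line)
        if match:
            counts[(match.group("vhost"), match.group("queue"))] += 1
    return counts
```

It could be fed an open log file, for example `count_missing_queues(open(path))`, where `path` points to the component log in question.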

In the RabbitMQ logs on node-61, grep finds the following entries:
http://paste.openstack.org/show/494589/
Note the pattern: the first two queue.declare operations timed out, and then basic.consume fails in an endless loop.

It seems that RabbitMQ failed to create the queue, possibly due to overload, and oslo.messaging did not notice the failure. Unfortunately, the relevant neutron-server logs had already been rotated, so it is not clear what happened in oslo.messaging at the time of the queue declaration.
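
To illustrate the suspected failure mode, here is a rough sketch of a consumer that re-declares the queue when consuming fails with a 404 instead of retrying basic.consume alone. This is not oslo.messaging code: FakeChannel, QueueNotFound, DeclareTimeout, and consume_with_redeclare are all hypothetical names, and the fake channel only mimics a broker whose first two queue.declare calls time out, as in the RabbitMQ log above.

```python
class QueueNotFound(Exception):
    """Hypothetical stand-in for a 404 NOT_FOUND reply to basic.consume."""


class DeclareTimeout(Exception):
    """Hypothetical stand-in for a queue.declare operation that timed out."""


class FakeChannel:
    """Toy channel (assumption): the first `declare_failures` declares time out."""

    def __init__(self, declare_failures):
        self.declare_failures = declare_failures
        self.queues = set()

    def queue_declare(self, queue):
        if self.declare_failures > 0:
            self.declare_failures -= 1
            raise DeclareTimeout(queue)
        self.queues.add(queue)

    def basic_consume(self, queue):
        if queue not in self.queues:
            raise QueueNotFound(queue)


def consume_with_redeclare(channel, queue, max_attempts=5):
    """Try to consume; on a 404, re-declare the queue before the next attempt.

    Returns the number of consume attempts used, or raises RuntimeError
    when the queue still does not exist after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            channel.basic_consume(queue)
            return attempt
        except QueueNotFound:
            try:
                channel.queue_declare(queue)
            except DeclareTimeout:
                # The declare failed too; the next attempt will try again.
                continue
    raise RuntimeError("queue %r still missing after %d attempts" % (queue, max_attempts))
```

The point of the sketch is only that a consume loop which never re-declares the queue cannot recover once the original declaration has been lost, whereas one that re-declares on 404 eventually can.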