Comment 11 for bug 1490523

Dmitry Mescheryakov (dmitrymex) wrote :

Here is the RCA from the OpenStack point of view:

To reproduce the issue with just OpenStack, without deployment modifications:

1. Install a MOS environment with Ceilometer. Use a single controller to simplify the later steps. Ceilometer is needed because it causes Fuel to populate the [DEFAULT]/notification_driver parameter in /etc/keystone/keystone.conf.
2. Firewall RabbitMQ so that keystone cannot reach it. Optionally, you can also add invalid hosts to the rabbit_hosts parameter in keystone.conf.
3. Restart apache.
4. Run any keystone command as usual, for instance 'openstack user list'. The command will either hang or return a 503 after some time.
5. Once you remove the firewall rule for the RabbitMQ node, Keystone becomes responsive again.
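
To confirm that step 2 really took effect before restarting apache, a quick TCP probe from the controller can help. This is only a minimal sketch: the function name rabbit_reachable is made up for illustration, and the default port 5672 is the standard AMQP port, which is an assumption here; use whatever host and port your rabbit_hosts parameter actually lists.

```python
import socket


def rabbit_reachable(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, not part of keystone or oslo.messaging.
    The default port 5672 is an assumption; check rabbit_hosts in
    keystone.conf for the real value in your deployment.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling rabbit_reachable with the broker address should return False while the firewall rule is in place and True again after step 5. Note that this only checks TCP reachability, not whether the broker accepts AMQP logins.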

The root cause is that keystone tries to send notifications about its actions, but no RabbitMQ server is alive to receive them. oslo.messaging loops indefinitely, trying to reconnect to each RabbitMQ server in the list and failing every time (the errors can be seen in the keystone log). From the point of view of oslo.messaging this is valid behaviour, as it was designed not to lose notifications. Given that a deployment fix is already ready, there is no reason to change that.
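
To illustrate why the caller hangs, here is a toy model of the retry behaviour described above. It is not oslo.messaging code: send_notification, try_connect, max_attempts and BrokerUnreachable are all hypothetical names, and a real client would also sleep between attempts. With max_attempts=None the loop only ends when some host accepts the connection, which is the situation keystone is in while RabbitMQ is firewalled.

```python
import itertools


class BrokerUnreachable(Exception):
    """Raised when the (hypothetical) attempt limit is exhausted."""


def send_notification(hosts, try_connect, max_attempts=None):
    """Cycle through hosts until try_connect(host) returns True.

    max_attempts=None models "retry forever": the call does not return
    until a broker becomes reachable, so the caller is blocked.
    Returns the host that accepted the connection.
    """
    if not hosts:
        raise ValueError("hosts must not be empty")
    # Round-robin over the host list, counting attempts from 1.
    for attempt, host in enumerate(itertools.cycle(hosts), start=1):
        if try_connect(host):
            return host
        if max_attempts is not None and attempt >= max_attempts:
            raise BrokerUnreachable(
                "gave up after %d attempts" % attempt)
```

With a try_connect that always fails and no attempt limit, the call never returns, which matches the hang seen in step 4; a bounded limit would trade that hang for lost notifications.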