I've found that when running rabbitmq-server for OpenStack services (nova/neutron/cinder/etc.), there is a failure condition where the "openstack" vhost hangs and services get connection timeouts, but no alerts appear in Nagios; manually running the check_rabbitmq and check_rabbitmq_queues checks still produces "OK" and "OK 10 test messages succeeded" style output.
When running rabbitmq-server in a three-server cluster, status and cluster_status sometimes report healthy, and the /, <server1>, <server2>, and <server3> vhosts function properly (rabbitmqctl list_queues -p / returns quickly), yet the openstack vhost hangs and IP connections to it time out for the nova/neutron/cinder API services. The only way we're able to detect this is with the openstack-service-checks we've added to the bootstack clouds to ensure all admin-up agents are alive.
I would like to recommend that when an OpenStack service is related to rabbitmq, a functional check is added for the openstack vhost: either verify that rabbitmqctl list_queues -p openstack returns without timing out or erroring and reports 1+ queues, or configure the base check_rabbitmq to exercise message send/retrieval on every vhost listed in rabbitmqctl list_vhosts output.
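As a rough illustration of the first option, here is a minimal sketch of such a per-vhost liveness check in Python (the function names, timeout value, and output parsing are my assumptions, not the shipped check_rabbitmq.py): run rabbitmqctl list_queues against a vhost with a hard timeout and require at least one queue, returning Nagios exit codes.

```python
#!/usr/bin/env python3
# Hypothetical sketch of a per-vhost liveness check; not the actual
# check_rabbitmq.py shipped by the charm.
import subprocess

OK, CRITICAL = 0, 2  # Nagios exit codes


def parse_queue_count(output):
    """Count queue rows in `rabbitmqctl list_queues` output, skipping
    the 'Listing queues' banner and blank lines."""
    rows = [line for line in output.splitlines()
            if line.strip() and not line.startswith('Listing')]
    return len(rows)


def check_vhost(vhost, timeout=10):
    """Return (nagios_status, message) for a vhost; CRITICAL on
    timeout, command failure, or an empty queue list."""
    try:
        out = subprocess.check_output(
            ['rabbitmqctl', 'list_queues', '-p', vhost],
            timeout=timeout).decode()
    except subprocess.TimeoutExpired:
        return CRITICAL, 'CRITICAL: list_queues timed out for vhost %s' % vhost
    except (OSError, subprocess.CalledProcessError) as e:
        return CRITICAL, 'CRITICAL: list_queues failed for vhost %s: %s' % (vhost, e)
    count = parse_queue_count(out)
    if count < 1:
        return CRITICAL, 'CRITICAL: no queues in vhost %s' % vhost
    return OK, 'OK: %d queues in vhost %s' % (count, vhost)
```

The key point is the timeout: a hung vhost makes list_queues block indefinitely, which is exactly the state the current checks fail to surface.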
This is happening on a xenial/ocata 17.08 cloud as well as a trusty/mitaka 17.02 cloud.
In the hooks, we render the check as '{}/check_rabbitmq.py --user {} --password {} --vhost {}'.format(NAGIOS_PLUGINS, user, password, vhost)
Adding a new config option taking a space-separated list of additional vhosts to monitor, and having the charm create a dedicated user for that node with permissions on each such vhost, should allow us to achieve this goal. With that plan, we add the 'openstack' vhost for monitoring, and people using the charm for things other than OpenStack can make use of the functionality too.
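The rendering side of that plan could look something like the sketch below (the option handling and helper name are assumptions, not the actual charm change): split the space-separated config value and emit one check command line per vhost, alongside the default '/' vhost.

```python
# Hypothetical sketch of per-vhost check rendering; the config plumbing
# and plugin path are assumptions, not the merged charm code.
NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'  # assumed plugin path


def render_vhost_checks(user, password, extra_vhosts_config):
    """Build one check_rabbitmq.py command line per vhost, starting with
    the default '/' vhost plus any vhosts from the space-separated
    config option."""
    vhosts = ['/'] + extra_vhosts_config.split()
    return ['{}/check_rabbitmq.py --user {} --password {} --vhost {}'.format(
                NAGIOS_PLUGINS, user, password, vhost)
            for vhost in vhosts]
```

For example, render_vhost_checks('nagios-user', 'pw', 'openstack') would yield two command lines, one for '/' and one for 'openstack', matching the existing render format in the hooks.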
Drafting a change now.