nrpe checks do not account for checking functionality of all vhosts

Bug #1730507 reported by Drew Freiberger on 2017-11-06
This bug affects 2 people
Affects: OpenStack rabbitmq-server charm
Importance: Medium
Assigned to: Xav Paice

Bug Description

I've found that when running rabbitmq-server for OpenStack services (nova/neutron/cinder/etc.), there is a failure condition where the "openstack" vhost hangs and services hit connection timeouts, yet no alerts appear in Nagios; manually running the check_rabbitmq and check_rabbitmq_queues checks still produces "OK" and "OK 10 test messages succeeded" type output.

When running rabbitmq-server in a three-server cluster, status and cluster_status can report healthy, and the /, <server1>, <server2>, and <server3> vhosts work correctly (rabbitmqctl list_queues -p / returns quickly), while the openstack vhost hangs and IP connections to it time out for the nova/neutron/cinder API services. The only way we're able to detect this is with the openstack-service-checks we've added to the bootstack clouds to verify that all agents which are admin-up are alive.

I would recommend that when an OpenStack service is related to rabbitmq, a functional check is added for the openstack vhost (verifying that rabbitmqctl list_queues -p openstack returns 1+ queues without timing out or erroring), or that the base check_rabbitmq be configured to exercise message send/retrieval on every vhost listed in rabbitmqctl list_vhosts output.
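For example, a per-vhost functional check could be something along these lines (a rough sketch only, not an existing plugin; it assumes the check can run rabbitmqctl with sufficient privileges, and the vhost name and timeout are hard-coded here for illustration):

    #!/usr/bin/env python3
    # Hypothetical per-vhost functional check sketch; not the charm's
    # existing check_rabbitmq.py. Assumes rabbitmqctl is available and
    # runs with enough privileges (e.g. via sudo in the NRPE command).
    import subprocess
    import sys

    VHOST = 'openstack'   # assumed vhost; a real plugin would take a CLI arg
    TIMEOUT = 10          # seconds before we treat the vhost as hung

    def main():
        try:
            out = subprocess.check_output(
                ['rabbitmqctl', 'list_queues', '-p', VHOST, 'name'],
                timeout=TIMEOUT, universal_newlines=True)
        except subprocess.TimeoutExpired:
            print('CRITICAL: list_queues -p {} timed out after {}s'.format(VHOST, TIMEOUT))
            sys.exit(2)
        except subprocess.CalledProcessError as exc:
            print('CRITICAL: rabbitmqctl failed: {}'.format(exc))
            sys.exit(2)

        # Drop the "Listing queues ..." banner lines and count real queue rows.
        queues = [line for line in out.splitlines()
                  if line and not line.startswith('Listing')
                  and not line.startswith('...')]
        if not queues:
            print('WARNING: no queues found in vhost {}'.format(VHOST))
            sys.exit(1)
        print('OK: {} queues responding in vhost {}'.format(len(queues), VHOST))
        sys.exit(0)

    if __name__ == '__main__':
        main()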

This is happening on a xenial/ocata 17.08 cloud as well as trusty/mitaka 17.02.

Xav Paice (xavpaice) on 2017-11-07
Changed in charm-rabbitmq-server:
assignee: nobody → Xav Paice (xavpaice)
status: New → In Progress
Xav Paice (xavpaice) wrote:

In the hooks, we render the check_rabbitmq.py check as '{}/check_rabbitmq.py --user {} --password {} --vhost {}'.format(NAGIOS_PLUGINS, user, password, vhost)

Adding a fresh config option for a space-separated list of additional vhosts to monitor, and letting the charm create a specific user for that node with permissions in that vhost, should allow us to achieve this goal. With that plan, we add the 'openstack' vhost for monitoring, and people using the charm for things other than OpenStack can make use of the functionality too.
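Roughly what I have in mind on the charm side (a sketch only; the 'additional_vhosts' option name and the use of the charmhelpers NRPE helper here are assumptions, not the final change):

    # Sketch: render one NRPE check per vhost named in a hypothetical
    # space-separated 'additional_vhosts' config option.
    from charmhelpers.contrib.charmsupport.nrpe import NRPE
    from charmhelpers.core.hookenv import config

    NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'

    def render_vhost_checks(user, password):
        nrpe = NRPE()
        # Always check the default vhost, plus any extras from config.
        vhosts = ['/'] + (config('additional_vhosts') or '').split()
        for vhost in vhosts:
            check_cmd = '{}/check_rabbitmq.py --user {} --password {} --vhost {}'.format(
                NAGIOS_PLUGINS, user, password, vhost)
            shortname = 'rabbitmq_vhost_{}'.format(vhost.strip('/') or 'root')
            nrpe.add_check(
                shortname=shortname,
                description='Check RabbitMQ vhost {}'.format(vhost),
                check_cmd=check_cmd)
        nrpe.write()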

Drafting a change now.

Ryan Beisner (1chb1n) on 2018-12-06
Changed in charm-rabbitmq-server:
importance: Undecided → Medium
milestone: none → 19.04
David Ames (thedac) on 2019-04-17
Changed in charm-rabbitmq-server:
milestone: 19.04 → 19.07