nrpe checks do not account for checking functionality of all vhosts

Bug #1730507 reported by Drew Freiberger
This bug affects 2 people
Affects: OpenStack RabbitMQ Server Charm
Status: In Progress
Importance: Medium
Assigned to: Xav Paice
Milestone: none

Bug Description

When running rabbitmq-server for OpenStack services (nova, neutron, cinder, etc.), we hit a failure condition where the "openstack" vhost hangs and services get connection timeouts, but no alerts appear in Nagios. Manually running the check_rabbitmq and check_rabbitmq_queues checks still returns "OK" and "OK 10 test messages succeeded" style output.

In a 3-server cluster, status and cluster_status sometimes function properly, and the /, <server1>, <server2>, and <server3> vhosts function properly (rabbitmqctl list_queues -p / returns quickly), while the openstack vhost hangs and connections from the nova/neutron/cinder API services to that vhost time out. The only way we are able to detect this is with the openstack-service-checks we've added to the bootstack clouds to ensure that all agents which are admin-up are alive.

I recommend that, when an OpenStack service is related to rabbitmq, either a functional check is added for the openstack vhost (for example, rabbitmqctl list_queues -p openstack returning without a timeout or other error, and with 1+ queues returned), or the base check_rabbitmq is configured to exercise message send/retrieval on every vhost listed in the rabbitmqctl list_vhosts output.
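To make the first option concrete, here is a rough sketch of such a functional check. It is not the charm's actual check: the function names, the 10 second default timeout, and the injectable `run` parameter are assumptions for illustration. It relies only on `rabbitmqctl -q list_queues -p <vhost>` and treats a timeout, a non-zero exit, or zero queues as critical.

```python
import subprocess

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def count_queues(output):
    """Count queue rows in `rabbitmqctl -q list_queues` output."""
    rows = [line for line in output.splitlines() if line.strip()]
    # Some rabbitmqctl releases print a "name messages" header row; skip it.
    if rows and rows[0].split() == ["name", "messages"]:
        rows = rows[1:]
    return len(rows)


def check_vhost(vhost, timeout=10, run=subprocess.run):
    """Return (exit_code, message) for a functional check of one vhost.

    `run` is injectable (hypothetical parameter) so the logic can be
    exercised without a broker.
    """
    cmd = ["rabbitmqctl", "-q", "list_queues", "-p", vhost]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # A hung vhost shows up here instead of blocking the check forever.
        return CRITICAL, "CRITICAL: list_queues on vhost {} timed out after {}s".format(
            vhost, timeout)
    if result.returncode != 0:
        return CRITICAL, "CRITICAL: list_queues on vhost {} exited with {}".format(
            vhost, result.returncode)
    queues = count_queues(result.stdout)
    if queues < 1:
        return CRITICAL, "CRITICAL: no queues found in vhost {}".format(vhost)
    return OK, "OK: {} queue(s) in vhost {}".format(queues, vhost)
```

Calling `check_vhost("openstack")` on a rabbitmq unit and exiting with the returned code would give Nagios a signal that is specific to the openstack vhost, which the existing checks against the per-host vhost do not provide.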

This is happening on a xenial/ocata 17.08 cloud as well as trusty/mitaka 17.02.

Xav Paice (xavpaice)
Changed in charm-rabbitmq-server:
assignee: nobody → Xav Paice (xavpaice)
status: New → In Progress
Xav Paice (xavpaice) wrote :

In the hooks, we render the check_rabbitmq.py command as '{}/check_rabbitmq.py --user {} --password {} --vhost {}'.format(NAGIOS_PLUGINS, user, password, vhost)

Adding a new config option that takes a space-separated list of additional vhosts to monitor, and letting the charm create a specific user for that node with permissions in those vhosts, should allow us to achieve this goal. With that plan, we add the 'openstack' vhost for monitoring, and people using the charm for things other than OpenStack can make use of the functionality too.
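A minimal sketch of how that plan could look, assuming the option value arrives as a plain space-separated string. The function name `render_checks`, the value of `NAGIOS_PLUGINS`, and the example user and vhost names are assumptions for illustration and not necessarily what the charm uses; only the command template is taken from the hook code quoted above. The sketch also assumes the monitoring user has already been granted permissions in each listed vhost.

```python
# Assumed plugin directory; the real constant lives in the charm's hooks.
NAGIOS_PLUGINS = "/usr/local/lib/nagios/plugins"

# Same template as the one currently rendered by the hooks.
CHECK_TEMPLATE = "{}/check_rabbitmq.py --user {} --password {} --vhost {}"


def render_checks(default_vhost, extra_vhosts_option, user, password):
    """Map each vhost to the check command that should be registered for it.

    `extra_vhosts_option` is the raw space-separated config value
    (hypothetical option); duplicates and empty values are ignored.
    """
    vhosts = [default_vhost]
    for vhost in (extra_vhosts_option or "").split():
        if vhost not in vhosts:
            vhosts.append(vhost)
    return {
        vhost: CHECK_TEMPLATE.format(NAGIOS_PLUGINS, user, password, vhost)
        for vhost in vhosts
    }
```

For example, `render_checks("nagios-host1", "openstack", "nagios-host1", "secret")` yields one command for the host-specific vhost and one for openstack, so each can be registered as its own nrpe check.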

Drafting a change now.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-rabbitmq-server (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/518276

Ryan Beisner (1chb1n)
Changed in charm-rabbitmq-server:
importance: Undecided → Medium
milestone: none → 19.04
David Ames (thedac)
Changed in charm-rabbitmq-server:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-rabbitmq-server:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-rabbitmq-server:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-rabbitmq-server:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-rabbitmq-server:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-rabbitmq-server:
milestone: 20.08 → none
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/518276
Committed: https://git.openstack.org/cgit/openstack/charm-rabbitmq-server/commit/?id=5558e21ea80b425cf154644d5f9e9bcf9bde32de
Submitter: Zuul
Branch: master

commit 5558e21ea80b425cf154644d5f9e9bcf9bde32de
Author: Xav Paice <email address hidden>
Date: Tue Nov 7 21:01:53 2017 +1300

    Add option to check extra vhosts with nrpe

    Adds monitoring options to allow us to check for vhosts in addition to
    the host specific one made by default, when using the nrpe subordinate.

    Change-Id: I10715e8ab8c83fdd7d5c08736ee89472acfe3933
    Related-Bug: 1730507
