I've found that when running rabbitmq-server for OpenStack services (nova/neutron/cinder/etc.), there is a failure condition where the "openstack" vhost hangs and services get connection timeouts, but no alerts appear in Nagios; manually running the check_rabbitmq and check_rabbitmq_queues checks still produces "OK" and "OK 10 test messages succeeded" style output.
When running rabbitmq-server in a three-server cluster, status and cluster_status sometimes report healthy, and the /, <server1>, <server2>, and <server3> vhosts function properly (rabbitmqctl list_queues -p / returns quickly), yet the openstack vhost hangs and IP connections to it time out for the nova/neutron/cinder API services. The only way we're able to detect this is with the openstack-service-checks we've added to the bootstack clouds to ensure all admin-up agents are alive.
I would like to recommend that when an OpenStack service is related to rabbitmq, a functional check is added for the openstack vhost: either verify that rabbitmqctl list_queues -p openstack returns without timing out or erroring and reports 1+ queues, or configure the base check_rabbitmq to exercise message send/retrieval on every vhost listed in rabbitmqctl list_vhosts output.
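As a rough illustration of the first option, here is a minimal sketch of such a per-vhost liveness check in Python (the function names, timeout value, and output parsing are my assumptions, not the shipped check_rabbitmq.py): run rabbitmqctl list_queues against a vhost with a hard timeout and require at least one queue, returning Nagios exit codes.

```python
#!/usr/bin/env python3
# Hypothetical sketch of a per-vhost liveness check; not the actual
# check_rabbitmq.py shipped by the charm.
import subprocess

OK, CRITICAL = 0, 2  # Nagios exit codes


def parse_queue_count(output):
    """Count queue rows in `rabbitmqctl list_queues` output, skipping
    the 'Listing queues' banner and blank lines."""
    rows = [line for line in output.splitlines()
            if line.strip() and not line.startswith('Listing')]
    return len(rows)


def check_vhost(vhost, timeout=10):
    """Return (nagios_status, message) for a vhost; CRITICAL on
    timeout, command failure, or an empty queue list."""
    try:
        out = subprocess.check_output(
            ['rabbitmqctl', 'list_queues', '-p', vhost],
            timeout=timeout).decode()
    except subprocess.TimeoutExpired:
        return CRITICAL, 'CRITICAL: list_queues timed out for vhost %s' % vhost
    except (OSError, subprocess.CalledProcessError) as e:
        return CRITICAL, 'CRITICAL: list_queues failed for vhost %s: %s' % (vhost, e)
    count = parse_queue_count(out)
    if count < 1:
        return CRITICAL, 'CRITICAL: no queues in vhost %s' % vhost
    return OK, 'OK: %d queues in vhost %s' % (count, vhost)
```

The key point is the timeout: a hung vhost makes list_queues block indefinitely, which is exactly the state the current checks fail to surface.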
This is happening on a xenial/ocata 17.08 cloud as well as a trusty/mitaka 17.02 cloud.
In the hooks, we render the check as '{}/check_rabbitmq.py --user {} --password {} --vhost {}'.format(NAGIOS_PLUGINS, user, password, vhost)
Adding a new config option taking a space-separated list of additional vhosts to monitor, and having the charm create a dedicated user for that node with permissions on each such vhost, should allow us to achieve this goal. With that plan, we add the 'openstack' vhost for monitoring, and people using the charm for things other than OpenStack can make use of the functionality too.
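The rendering side of that plan could look something like the sketch below (the option handling and helper name are assumptions, not the actual charm change): split the space-separated config value and emit one check command line per vhost, alongside the default '/' vhost.

```python
# Hypothetical sketch of per-vhost check rendering; the config plumbing
# and plugin path are assumptions, not the merged charm code.
NAGIOS_PLUGINS = '/usr/local/lib/nagios/plugins'  # assumed plugin path


def render_vhost_checks(user, password, extra_vhosts_config):
    """Build one check_rabbitmq.py command line per vhost, starting with
    the default '/' vhost plus any vhosts from the space-separated
    config option."""
    vhosts = ['/'] + extra_vhosts_config.split()
    return ['{}/check_rabbitmq.py --user {} --password {} --vhost {}'.format(
                NAGIOS_PLUGINS, user, password, vhost)
            for vhost in vhosts]
```

For example, render_vhost_checks('nagios-user', 'pw', 'openstack') would yield two command lines, one for '/' and one for 'openstack', matching the existing render format in the hooks.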
Drafting a change now.