NRPE check reports "OK: ceph-osd@xx service is running" when one of the N services is not running

Bug #1889887 reported by Vladimir Grevtsev
This bug affects 2 people
Affects          Status    Importance  Assigned to  Milestone
Ceph OSD Charm   Triaged   Wishlist    Unassigned

Bug Description

The "Service Status Details" page in Nagios shows only the first line of the NRPE script response. When one of the OSDs has failed (or has been shut down intentionally, as in my case), this is confusing: an operator could open Nagios, see 'CRITICAL: OK: ceph-osd@30.service is running' and treat it as a false positive, while in reality the full message looks like:

Current Status: CRITICAL (for 0d 2h 50m 49s)
Status Information: OK: ceph-osd@30.service is running
OK: ceph-osd@36.service is running
OK: ceph-osd@41.service is running
OK: ceph-osd@47.service is running
OK: ceph-osd@51.service is running
OK: ceph-osd@55.service is running
Failed: check command raised: CRITICAL: ceph-osd@58.service is not running
Failed: check command raised: CRITICAL: ceph-osd@60.service is not running

Can we reorder these messages so that the "Failed" lines come first and are clearly visible in the Nagios error preview?
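A minimal sketch of the requested reordering, assuming the check collects its per-service messages in a list before printing them. The function name `failures_first` and the `startswith("OK:")` test are illustrative assumptions based on the output above, not the charm's actual code.

```python
def failures_first(lines):
    """Return status lines with non-OK entries ahead of OK entries.

    sorted() is stable, so the original order is preserved within
    each group (failures first, then OK lines).
    """
    # The key is False (sorts first) for failures and True for OK lines.
    return sorted(lines, key=lambda line: line.startswith("OK:"))


# Sample input modelled on the Nagios output quoted above.
lines = [
    "OK: ceph-osd@30.service is running",
    "OK: ceph-osd@36.service is running",
    "Failed: check command raised: CRITICAL: ceph-osd@58.service is not running",
    "Failed: check command raised: CRITICAL: ceph-osd@60.service is not running",
]

reordered = failures_first(lines)
print("\n".join(reordered))
```

With this ordering, the first line shown in the Nagios preview would be the message for ceph-osd@58 rather than an "OK" line.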

Vladimir Grevtsev (vlgrevtsev) wrote :
Changed in charm-ceph-osd:
importance: Undecided → Wishlist
status: New → Triaged
Xav Paice (xavpaice) wrote :

Note that the standard for writing Nagios plugins is that the output has only one line, so this check does not comply with what Nagios expects.
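One way to get closer to single-line output would be to collapse the per-service messages into one summary line. The sketch below is only an illustration under assumptions: the function name `summarize_osd_status`, the regular expression, and the wording of the summary are hypothetical and are not taken from the charm.

```python
import re


def summarize_osd_status(lines):
    """Collapse per-service status lines into a single summary line."""
    failed = []
    for line in lines:
        if line.startswith("OK:"):
            continue
        # Pull the OSD id out of e.g. "ceph-osd@58.service"; if the line
        # has an unexpected shape, keep it whole rather than drop it.
        match = re.search(r"ceph-osd@(\d+)\.service", line)
        failed.append(match.group(1) if match else line)

    total = len(lines)
    if failed:
        return "CRITICAL: {} of {} ceph-osd services not running: {}".format(
            len(failed), total, ", ".join(failed)
        )
    return "OK: all {} ceph-osd services are running".format(total)


# Sample input modelled on the Nagios output quoted in the description.
sample = [
    "OK: ceph-osd@30.service is running",
    "OK: ceph-osd@36.service is running",
    "Failed: check command raised: CRITICAL: ceph-osd@58.service is not running",
    "Failed: check command raised: CRITICAL: ceph-osd@60.service is not running",
]

summary = summarize_osd_status(sample)
print(summary)
```

A summary of this shape keeps the failing OSD ids on the one line that Nagios displays.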
