NRPE: overall_status shouldn't be used for status monitoring with Luminous

Bug #1756864 reported by Nobuto Murata
This bug affects 3 people
Affects             Status        Importance  Assigned to  Milestone
Ceph Monitor Charm  Fix Released  Undecided   Xav Paice

Bug Description

The JSON format of 'ceph health' changed in Luminous, so "overall_status" should no longer be used for monitoring.

[files/nagios/check_ceph_status.py]

    if status_data['health']['overall_status'] != 'HEALTH_OK':

# ceph --version
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)

# ceph health detail -f json-pretty

{
    "checks": {},
    "status": "HEALTH_OK",
    "overall_status": "HEALTH_WARN",
    "detail": [
        "'ceph health' JSON format has changed in luminous. If you see this your monitoring system is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false'"
    ]
}

Nobuto Murata (nobuto) wrote :

For reference, here is the JSON output after applying 'mon health preluminous compat warning = false'. It looks like we could just use "status" instead of "overall_status".

# ceph health detail --format json-pretty

{
    "checks": {},
    "status": "HEALTH_OK"
}
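The suggestion above could be sketched as follows. This is only an illustration, not the charm's code: `health_state` is a hypothetical helper, and the fallback to "overall_status" assumes pre-Luminous output carries only that key.

```python
import json


def health_state(health):
    """Return the health string, preferring the Luminous "status" key."""
    if "status" in health:
        return health["status"]
    # Assumption: pre-Luminous output only provides "overall_status".
    return health["overall_status"]


# Sample modelled on the Luminous output quoted earlier in this report.
luminous = json.loads(
    '{"checks": {}, "status": "HEALTH_OK", "overall_status": "HEALTH_WARN"}'
)
print(health_state(luminous))
```

With the Luminous sample the helper reads "status" and ignores the compatibility value in "overall_status".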

Nobuto Murata (nobuto) wrote :

From the Nagios status:

juju-juju-3e8d85-0-lxd-0

Ubuntu Linux

juju-juju-3e8d85-0-lxd-0-ceph

 CRITICAL 2018-03-25 15:19:43 0d 6h 52m 32s 4/4 CRITICAL: ceph health: "HEALTH_WARN 'ceph health' JSON format has changed in luminous. If you see this your monitoring system is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false', Degraded ratio: 0.0, Misplaced ratio: 0.0, Recovering objects/sec 0.0"

Xav Paice (xavpaice)
tags: added: canonical-bootstack
Xav Paice (xavpaice) wrote :

If I run

    ceph tell mon.* injectargs \
      "--mon_health_preluminous_compat_warning=false"

the check now reports:

check_ceph_status raised unknown exception '<class 'KeyError'>'
============================================================
Traceback (most recent call last):
  File "/usr/local/lib/nagios/plugins/check_ceph_status.py", line 183, in main
    msg = check_ceph_status(args)
  File "/usr/local/lib/nagios/plugins/check_ceph_status.py", line 113, in check_ceph_status
    if status_data['health']['overall_status'] != 'HEALTH_OK':
KeyError: 'overall_status'
============================================================

The overall_status key is no longer present in the 'ceph status' output.
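A minimal sketch of how a check could report a missing key as a Nagios UNKNOWN state instead of an uncaught traceback. The `evaluate` function and its messages are hypothetical. It reads only the Luminous "status" key, and it treats any non-OK state as CRITICAL, mirroring the comparison in the traceback.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(status_data):
    """Return (exit_code, message) for a parsed 'ceph status' document."""
    try:
        state = status_data["health"]["status"]
    except KeyError as exc:
        # A missing key means the output layout is not the one expected.
        return UNKNOWN, "UNKNOWN: key %s missing from ceph status output" % exc
    if state != "HEALTH_OK":
        return CRITICAL, "CRITICAL: ceph health: %s" % state
    return OK, "OK: ceph health: %s" % state


code, message = evaluate({"health": {"checks": {}, "status": "HEALTH_OK"}})
print(code, message)
```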

Xav Paice (xavpaice)
Changed in charm-ceph-mon:
assignee: nobody → Xav Paice (xavpaice)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)

Fix proposed to branch: master
Review: https://review.openstack.org/562133

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/562133
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=b97177b7d6c544d518f90f02d4e42ecec1a0e135
Submitter: Zuul
Branch: master

commit b97177b7d6c544d518f90f02d4e42ecec1a0e135
Author: Xav Paice <email address hidden>
Date: Wed Apr 18 19:19:50 2018 +1200

    Update Nagios check for Luminous

    This adds a test to see if the ceph status output looks like Luminous or
    newer, and if so changes the output used to collect info.

    Change-Id: I98d194c329aace3c412701e06632dbfedfadefc7
    Closes-Bug: #1756864
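The idea in the commit message (test whether the output looks like Luminous or newer, then pick the field accordingly) might look roughly like the sketch below. This is not the merged code. Both function names are hypothetical, and using the presence of "checks" under "health" as the Luminous marker is an assumption based on the JSON samples in this report.

```python
def looks_like_luminous(status_data):
    """Guess the output format from the keys under "health"."""
    # Assumption: only Luminous and newer expose a "checks" mapping here.
    return "checks" in status_data.get("health", {})


def overall_health(status_data):
    """Pick the health field that matches the detected format."""
    key = "status" if looks_like_luminous(status_data) else "overall_status"
    return status_data["health"][key]


# Luminous-style sample still carrying the compatibility field.
sample = {
    "health": {
        "checks": {},
        "status": "HEALTH_OK",
        "overall_status": "HEALTH_WARN",
    }
}
print(overall_health(sample))
```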

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-ceph-mon:
milestone: none → 18.05
David Ames (thedac)
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released