KeyError: 'total_space' in oschecks-check_ceph_df on CEPH 1.3 (OSP7)

Bug #1537250 reported by VIncent S. Cojot
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
OpenStack Operators  Fix Released  Undecided   Unassigned

Bug Description

Hi everyone,
I just ran into this:
[root@ceph03-prv ~]# /usr/bin/oschecks-check_ceph_df
Traceback (most recent call last):
  File "/usr/bin/oschecks-check_ceph_df", line 10, in <module>
    sys.exit(check_ceph_df())
  File "/usr/lib/python2.7/site-packages/oschecks/ceph.py", line 74, in check_ceph_df
    exit_code, message = interpret_output_df(res)
  File "/usr/lib/python2.7/site-packages/oschecks/ceph.py", line 52, in interpret_output_df
    total = int(data['stats']['total_space'])
KeyError: 'total_space'

But ceph is fine:

[root@ceph03-prv ~]# ceph df
GLOBAL:
    SIZE AVAIL RAW USED %RAW USED
    27916G 25566G 2350G 8.42
POOLS:
    NAME ID USED %USED MAX AVAIL OBJECTS
    rbd 0 0 0 8335G 0
    images 1 164G 0.59 8335G 21162
    volumes 2 328G 1.18 8335G 52705
    vms 3 291G 1.05 8335G 40132

Here's the reason:
oschecks-check_ceph_df runs 'ceph df --format=json' and expects these fields:
total_space, total_used, total_avail

But when I look at the output of 'ceph df --format=json', I get these instead:
[root@ceph03-prv ~]# ceph df --format=json|sed -e 's/:/ /g'|xargs -n1|grep tota
{total_bytes
29974933831680,total_used_bytes
2523633577984,total_avail_bytes
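
For anyone who wants to confirm which key names their own cluster emits without the sed/xargs pipeline, here is a minimal sketch that parses the JSON and lists the keys under 'stats'. The function name `stats_keys` and the sample document are made up for illustration; the sample values are dummies (zeros), since only the key names matter here. In real use the string would be the output of 'ceph df --format=json'.

```python
import json

# Placeholder document: only the key names matter, so the values are
# dummies (zeros), not real cluster figures.
sample = (
    '{"stats": {"total_bytes": 0, "total_used_bytes": 0, '
    '"total_avail_bytes": 0}, "pools": []}'
)


def stats_keys(raw_json):
    """Return the sorted key names found under the top-level 'stats' object."""
    data = json.loads(raw_json)
    return sorted(data.get("stats", {}))


print(stats_keys(sample))
```

If the list contains total_space rather than total_bytes, the cluster is presumably still emitting the older format that the check expects.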

VIncent S. Cojot (vincent-m) wrote :

Here's the diff that fixes it for me:
[root@ceph03-prv ~]# diff -c /usr/lib/python2.7/site-packages/oschecks/ceph.py.orig /usr/lib/python2.7/site-packages/oschecks/ceph.py
*** /usr/lib/python2.7/site-packages/oschecks/ceph.py.orig 2016-01-22 18:04:12.575117528 -0500
--- /usr/lib/python2.7/site-packages/oschecks/ceph.py 2016-01-22 18:05:13.449092454 -0500
***************
*** 49,57 ****
      warn_percent = int(sys.argv[1]) if len(sys.argv) >= 2 else 85
      crit_percent = int(sys.argv[2]) if len(sys.argv) >= 3 else 98

!     total = int(data['stats']['total_space'])
!     used = int(data['stats']['total_used'])
!     avail = int(data['stats']['total_avail'])

      # Test correctness of values
      if used + avail != total:
--- 49,57 ----
      warn_percent = int(sys.argv[1]) if len(sys.argv) >= 2 else 85
      crit_percent = int(sys.argv[2]) if len(sys.argv) >= 3 else 98

!     total = int(data['stats']['total_bytes'])
!     used = int(data['stats']['total_used_bytes'])
!     avail = int(data['stats']['total_avail_bytes'])

      # Test correctness of values
      if used + avail != total:

Mike Dorman (mdorman-m) wrote :

Could you report which version of Ceph you're running? I suspect this output changed at some particular version, and it would be good if we could build the check so that it works with either format.
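
One way a check could accept either format is to try each known set of key names in turn. The following is only a sketch of that idea, not the patch that was eventually merged; `KEY_SETS` and `read_totals` are hypothetical names, and the key names themselves are the two sets quoted in this report. It assumes the three values within one format share the same unit, which is enough for a percentage calculation.

```python
# Candidate key names, newest first. Both sets are taken from this report.
KEY_SETS = (
    ("total_bytes", "total_used_bytes", "total_avail_bytes"),
    ("total_space", "total_used", "total_avail"),
)


def read_totals(data):
    """Return (total, used, avail) as ints from whichever key set is present."""
    stats = data["stats"]
    for keys in KEY_SETS:
        # Only accept a set when all three of its keys are present.
        if all(k in stats for k in keys):
            return tuple(int(stats[k]) for k in keys)
    raise KeyError("no known total/used/avail keys in: %s" % sorted(stats))
```

With this shape, `read_totals(json.loads(output))` would replace the three hard-coded lookups, and supporting another key naming would only mean adding a tuple to `KEY_SETS`.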

Mike Dorman (mdorman-m) wrote :

I checked with the original authors of this check, and they confirm this bug as well. Could you please file a change against osops-tools-monitoring with your patch above? Thanks.

Changed in osops:
status: New → Confirmed
VIncent S. Cojot (vincent-m) wrote :

Hi Mike,
I wrote a better patch and submitted a change.
However, it appears that the Jenkins builds for this project have been failing for several days.
The newer version of the patch is at:
https://review.openstack.org/#/c/272230
This is happening on OSP7 (kilo) on RHEL7 with Ceph 1.3:

$ rpm -q ceph
ceph-0.94.1-13.el7cp.x86_64

Thanks,
Vincent

VIncent S. Cojot (vincent-m) wrote :

Hi Mike,
First of all, here is the link to the diff:

https://review.openstack.org/#/c/272230/1/oschecks/ceph.py,unified

It appears I submitted my change against
openstack/monitoring-for-openstack
but not against this other repo:
openstack/osops-tools-monitoring

I will submit another change really soon.
Regards,

OpenStack Infra (hudson-openstack) wrote : Fix merged to osops-tools-monitoring (master)

Reviewed: https://review.openstack.org/272248
Committed: https://git.openstack.org/cgit/openstack/osops-tools-monitoring/commit/?id=f0c3a92d33fdfa8b3f2adaf63f50337c3a019544
Submitter: Jenkins
Branch: master

commit f0c3a92d33fdfa8b3f2adaf63f50337c3a019544
Author: Vincent S. Cojot <email address hidden>
Date: Mon Jan 25 14:59:26 2016 -0500

    Fix for changes in the output of ceph 1.3+ 'df' command.

    Previously, 'ceph df --format=json' outputted some JSON
    information where the following keys were defined:
    total_space,total_used,total_avail

    As of ceph 1.3, the keys have changed somewhat and are now
    defined as:
    total_bytes,total_used_bytes,total_avail_bytes

    This patch handles the new keys while retaining compatibility
    for older versions..

    Closes-bug: 1537250
    Change-Id: Ia7ab59109fc5263cd742587c99e94d18ef28fdcc

Changed in osops:
status: Confirmed → Fix Released