ceph-mon should report error when cluster is not in "HEALTH_OK" state

Bug #1796430 reported by Andrey Grebennikov on 2018-10-05
This bug affects 1 person
Affects: OpenStack ceph-mon charm
Status: In Progress
Importance: Medium
Assigned to: Chris MacNaughton
Milestone: —

Bug Description

Hope this bug fits into ceph-mon space.

Currently, when the Ceph components are deployed, all ceph-related units report "ready" status.

If there is an issue with OSDs joining, PG replication, etc., no information about it is surfaced in the "juju status" output, and applications consuming the ceph-client interface have no way of knowing that the cluster is in an unhealthy state: they connect and try to create pools anyway. This eventually leads to the PGs getting stuck and the pools requiring re-creation.
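
A minimal sketch of the desired behaviour, assuming the charm's status-assessment code can shell out to `ceph health` (the helper below is hypothetical; only charmhelpers' status_set is a real API):

import json
import subprocess

from charmhelpers.core.hookenv import status_set

def overall_health():
    """Return the cluster-wide health string, e.g. 'HEALTH_OK'."""
    out = subprocess.check_output(
        ['ceph', 'health', '--format', 'json'], universal_newlines=True)
    data = json.loads(out)
    # Jewel exposes 'overall_status'; Luminous and later expose 'status'.
    return data.get('status') or data.get('overall_status')

def assess_status():
    health = overall_health()
    if health == 'HEALTH_OK':
        status_set('active', 'Unit is ready and clustered')
    else:
        # Surface the degraded state instead of reporting readiness.
        status_set('blocked', 'Ceph cluster health is {}'.format(health))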

ubuntu@ubuntu:~$ juju status|grep ceph
k8s-ceph-3 controller1 cloud1 2.4.3 unsupported 23:57:01Z
ceph-mon 10.2.10 active 3 ceph-mon jujucharms 27 ubuntu
ceph-osd 10.2.10 active 3 ceph-osd jujucharms 270 ubuntu
ceph-radosgw 10.2.10 active 1 ceph-radosgw jujucharms 260 ubuntu
ceph-mon/0 active idle 0/lxd/0 192.168.122.175 Unit is ready and clustered
ceph-mon/1 active idle 1/lxd/0 192.168.122.180 Unit is ready and clustered
ceph-mon/2* active idle 2/lxd/0 192.168.122.183 Unit is ready and clustered
ceph-osd/0 active idle 0 172.16.0.2 Unit is ready (2 OSD)
ceph-osd/1* active idle 1 172.16.0.3 Unit is ready (2 OSD)
ceph-osd/2 active idle 2 172.16.0.4 Unit is ready (2 OSD)
ceph-radosgw/0* active idle 0/lxd/1 192.168.122.186 80/tcp Unit is ready

ubuntu@ubuntu:~$ juju run --unit ceph-mon/0 'sudo ceph -s'
    cluster 5370e132-c037-11e8-8565-00163e46d3a9
     health HEALTH_ERR
            148 pgs are stuck inactive for more than 300 seconds
            148 pgs peering
            148 pgs stuck inactive
            148 pgs stuck unclean
            200 requests are blocked > 32 sec
     monmap e2: 3 mons at {juju-76a6af-0-lxd-0=192.168.122.175:6789/0,juju-76a6af-1-lxd-0=192.168.122.180:6789/0,juju-76a6af-2-lxd-0=192.168.122.183:6789/0}
            election epoch 8, quorum 0,1,2 juju-76a6af-0-lxd-0,juju-76a6af-1-lxd-0,juju-76a6af-2-lxd-0
     osdmap e68: 6 osds: 4 up, 4 in; 148 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v34649: 148 pgs, 19 pools, 1588 bytes data, 171 objects
            169 MB used, 36650 MB / 36819 MB avail
                  87 peering
                  61 remapped+peering
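
Until the charm reports this itself, consumers of the ceph-client interface could guard pool creation with a health poll along these lines (an illustrative sketch, not part of any charm):

import json
import subprocess
import time

def wait_for_health_ok(timeout=300, interval=10):
    """Poll `ceph health` until the cluster reports HEALTH_OK or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(
            ['ceph', 'health', '--format', 'json'], universal_newlines=True)
        health = json.loads(out)
        status = health.get('status') or health.get('overall_status')
        if status == 'HEALTH_OK':
            return True
        time.sleep(interval)
    return False

if not wait_for_health_ok():
    raise RuntimeError('Ceph did not reach HEALTH_OK; refusing to create pools')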

Changed in charm-ceph-mon:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
status: New → In Progress
importance: Undecided → Medium