ceph-mon should report error when cluster is not in "HEALTH_OK" state

Bug #1796430 reported by Andrey Grebennikov
Affects: Ceph Monitor Charm
Status: In Progress
Importance: Medium
Assigned to: Chris MacNaughton

Bug Description

Hope this bug fits into ceph-mon space.

Currently, when the Ceph components are deployed, all ceph-related units report "ready" status.

If there is an issue with OSDs joining, PG replication, etc., no indication of it appears in the "juju status" output, and applications using the ceph-client interface do not know that the cluster is in an unhealthy state: they connect and create pools anyway. This eventually leads to the PGs getting stuck, and the pools require re-creation.
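A defensive check on the client side illustrates the problem: until the charm gates its status on cluster health, a consumer could at least refuse to create pools while the cluster is unhealthy. A minimal sketch, assuming the unit can run the `ceph` CLI (the `parse_health` and `safe_create_pool` helpers are hypothetical, not part of any charm):

```python
import subprocess


def parse_health(output):
    """Extract the health keyword ('HEALTH_OK', 'HEALTH_WARN',
    'HEALTH_ERR') from `ceph health` output."""
    return output.split()[0]


def safe_create_pool(name, pg_num=32):
    """Refuse to create a pool unless the cluster reports HEALTH_OK,
    avoiding PGs that get stuck while the cluster is still peering."""
    out = subprocess.check_output(['ceph', 'health']).decode()
    health = parse_health(out)
    if health != 'HEALTH_OK':
        raise RuntimeError('cluster health is %s; not creating pool %s'
                           % (health, name))
    subprocess.check_call(
        ['ceph', 'osd', 'pool', 'create', name, str(pg_num)])
```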

ubuntu@ubuntu:~$ juju status|grep ceph
k8s-ceph-3 controller1 cloud1 2.4.3 unsupported 23:57:01Z
ceph-mon 10.2.10 active 3 ceph-mon jujucharms 27 ubuntu
ceph-osd 10.2.10 active 3 ceph-osd jujucharms 270 ubuntu
ceph-radosgw 10.2.10 active 1 ceph-radosgw jujucharms 260 ubuntu
ceph-mon/0 active idle 0/lxd/0 192.168.122.175 Unit is ready and clustered
ceph-mon/1 active idle 1/lxd/0 192.168.122.180 Unit is ready and clustered
ceph-mon/2* active idle 2/lxd/0 192.168.122.183 Unit is ready and clustered
ceph-osd/0 active idle 0 172.16.0.2 Unit is ready (2 OSD)
ceph-osd/1* active idle 1 172.16.0.3 Unit is ready (2 OSD)
ceph-osd/2 active idle 2 172.16.0.4 Unit is ready (2 OSD)
ceph-radosgw/0* active idle 0/lxd/1 192.168.122.186 80/tcp Unit is ready

ubuntu@ubuntu:~$ juju run --unit ceph-mon/0 'sudo ceph -s'
    cluster 5370e132-c037-11e8-8565-00163e46d3a9
     health HEALTH_ERR
            148 pgs are stuck inactive for more than 300 seconds
            148 pgs peering
            148 pgs stuck inactive
            148 pgs stuck unclean
            200 requests are blocked > 32 sec
     monmap e2: 3 mons at {juju-76a6af-0-lxd-0=192.168.122.175:6789/0,juju-76a6af-1-lxd-0=192.168.122.180:6789/0,juju-76a6af-2-lxd-0=192.168.122.183:6789/0}
            election epoch 8, quorum 0,1,2 juju-76a6af-0-lxd-0,juju-76a6af-1-lxd-0,juju-76a6af-2-lxd-0
     osdmap e68: 6 osds: 4 up, 4 in; 148 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v34649: 148 pgs, 19 pools, 1588 bytes data, 171 objects
            169 MB used, 36650 MB / 36819 MB avail
                  87 peering
                  61 remapped+peering
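One possible charm-side fix is to derive the unit's workload status from the monitor's reported health rather than always reporting "ready". A minimal sketch, assuming the charm can run `ceph -s --format json` on the unit (the helper names and status messages are illustrative, not the actual charm code):

```python
import json
import subprocess


def derive_status(health):
    """Map a Ceph health keyword to a (workload_state, message) pair.

    HEALTH_OK   -> active
    HEALTH_WARN -> active, but surface the warning in the message
    HEALTH_ERR  -> blocked, so operators and related charms notice
    """
    mapping = {
        'HEALTH_OK': ('active', 'Unit is ready and clustered'),
        'HEALTH_WARN': ('active', 'Unit is ready (cluster: HEALTH_WARN)'),
        'HEALTH_ERR': ('blocked', 'Ceph cluster is unhealthy: HEALTH_ERR'),
    }
    return mapping.get(health, ('blocked', 'Unknown health: %s' % health))


def cluster_health_status():
    """Query the local monitor and return the derived workload status."""
    out = subprocess.check_output(['ceph', '-s', '--format', 'json'])
    status = json.loads(out)
    # Jewel (10.2.x) reports health under 'health' -> 'overall_status';
    # newer releases use 'health' -> 'status'.
    health = (status['health'].get('status')
              or status['health'].get('overall_status'))
    return derive_status(health)
```

With something like this wired into the charm's status assessment, the HEALTH_ERR state shown above would surface as a "blocked" unit instead of "Unit is ready and clustered".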

Changed in charm-ceph-mon:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
status: New → In Progress
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote: Fix proposed to charm-ceph-mon (master)

Fix proposed to branch: master
Review: https://review.openstack.org/629843

OpenStack Infra (hudson-openstack) wrote: Change abandoned on charm-ceph-mon (master)

Change abandoned by "James Page <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/629843
Reason: This review is > 12 weeks without comment, and failed testing the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
