client relations are never processed when configuration change (monitor-count, expected-osd-count) causes cluster to become ready

Bug #1732491 reported by Edward Hope-Morley
This bug affects 2 people
Affects: Ceph Monitor Charm
Status: Fix Released
Importance: Medium
Assigned to: Trent Lloyd
Milestone: 19.04

Bug Description

If I deploy 1 unit of ceph-mon with monitor-count=3 it fails to bootstrap, as expected. If I then set monitor-count=1 I would expect it to then bootstrap, but instead it does nothing and I have to remove and re-add the ceph client relations to get it to happen.

Tags: openstack sts
James Page (james-page)
Changed in charm-ceph-mon:
status: New → Triaged
James Page (james-page)
Changed in charm-ceph-mon:
milestone: 17.11 → 18.02
Ryan Beisner (1chb1n)
Changed in charm-ceph-mon:
milestone: 18.02 → 18.05
David Ames (thedac)
Changed in charm-ceph-mon:
milestone: 18.05 → 18.08
Revision history for this message
Edward Hope-Morley (hopem) wrote :

I have now found some additional issues with this logic. If I start by deploying 3 units with monitor-count=1 (which I know is wrong), all 3 units will bootstrap independently, i.e. at the point at which each has what it considers to be sufficient hosts, i.e. 1, a.k.a. itself. The problem is that if I then set monitor-count=3 it remains wedged and the only way to fix it is to delete 2 units and start again. The same is also true if I have N mon units with monitor-count=N and I scale out my ceph-mon application before updating monitor-count. I think the charm should be able to manage these scenarios somehow.

Revision history for this message
Edward Hope-Morley (hopem) wrote :
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

I suspect that the best way to manage some of these issues would be to only allow the juju leader to bootstrap, as there can be only one ;-)
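
A rough sketch of that approach (not the charm's actual code) might gate the bootstrap on Juju leadership, assuming charmhelpers' is_leader() helper; bootstrap_monitor_cluster() below is a placeholder for whatever routine the charm really uses to form the initial mon cluster:

    from charmhelpers.core.hookenv import is_leader, log


    def bootstrap_monitor_cluster():
        """Placeholder for the charm's real bootstrap routine."""


    def maybe_bootstrap():
        # Only the elected application leader attempts the initial bootstrap,
        # so three units deployed with monitor-count=1 cannot each form their
        # own independent cluster.
        if not is_leader():
            log('Not the leader; deferring bootstrap to the leader unit')
            return
        bootstrap_monitor_cluster()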

James Page (james-page)
Changed in charm-ceph-mon:
milestone: 18.08 → 18.11
David Ames (thedac)
Changed in charm-ceph-mon:
milestone: 18.11 → 19.04
Revision history for this message
Trent Lloyd (lathiat) wrote :

The problem here is that both the client_relation_joined and client_relation_changed hooks (which provide ceph keys and process broker requests to create pools etc.) check "if ready_for_service()" before processing those requests.

If not ready for service, the hooks do nothing. However, charm hooks are not re-run just because we weren't ready to process them.

The charm either needs to queue a list of such relations to process later, or needs to iteratively check all such relations in some way when it later decides it is ready for service.

This seems to be a common charm design problem that probably needs some documentation written for charm authors to describe how to handle this situation cleanly, as it's not 100% obvious.

The ceph-mon charm does indeed have such a function to re-process the hook requests: notify_client(). However, this function is not re-run in the case of config_changed. Presently it appears to happen only in the case of osd_relation (presumably to check if the expected monitor count is now exceeded), upgrade_charm, and mon_relation.

As a result, in the originally described situation (changing monitor-count to 1) the client relations are not re-triggered. It also doesn't happen if expected-osd-count is updated, which is the situation in which I hit this same issue: expected-osd-count was not met, so the relations were skipped, and even after updating the config they are not reprocessed.
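
A minimal sketch of the missing re-processing, assuming the ready_for_service() and notify_client() helpers named above plus charmhelpers' relation helpers; the placeholder bodies are illustrative only, not the charm's exact code:

    from charmhelpers.core.hookenv import log, related_units, relation_ids


    def ready_for_service():
        """Placeholder: True once monitor-count / expected-osd-count are satisfied."""
        return True


    def client_relation_joined(relid=None, unit=None):
        """Placeholder: hand out ceph keys and process broker requests for one client."""


    def notify_client():
        # Walk every existing client relation and re-run the join handling, so
        # requests skipped while the cluster was not ready get processed once
        # it becomes ready.
        for relid in relation_ids('client'):
            for unit in related_units(relid):
                client_relation_joined(relid=relid, unit=unit)


    def config_changed():
        # ...existing config handling...
        # The missing piece: if a config change (monitor-count,
        # expected-osd-count) makes the cluster ready, notify clients instead
        # of waiting for a relation hook that may never fire again.
        if ready_for_service():
            notify_client()
        else:
            log('Not yet ready for service; client relations deferred')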

Revision history for this message
Trent Lloyd (lathiat) wrote :

I also filed a bug that if expected-osd-count is not met (or the ready_for_service check otherwise fails), there is no juju status information to convey that to the admin:
https://bugs.launchpad.net/charm-ceph-mon/+bug/1807652

Trent Lloyd (lathiat)
summary: - failed bootstrap due to incorrect monitor-count is unresolvable
+ client relations are never processed when configuration change (monitor-
+ count, expected-osd-count) causes cluster to become ready
Trent Lloyd (lathiat)
Changed in charm-ceph-mon:
assignee: nobody → Trent Lloyd (lathiat)
status: Triaged → In Progress
David Ames (thedac)
Changed in charm-ceph-mon:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-ceph-mon:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-ceph-mon:
milestone: 19.10 → 20.01
Revision history for this message
Trent Lloyd (lathiat) wrote :

This was fixed by:
https://review.opendev.org/#/c/640345/

It appears to have made it into the 19.04 charm release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-ceph-mon (master)

Change abandoned by Trent Lloyd (<email address hidden>) on branch: master
Review: https://review.opendev.org/624010
Reason: already fixed by 640345 instead

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Changing to fix released as Trent indicates that it has already been fixed in the 19.04 charms. Thanks for confirming!

Changed in charm-ceph-mon:
milestone: 20.01 → 19.04
status: In Progress → Fix Released