ceph-osd charm upgrade takes longer time for ceph-mon to settle to idle state

Bug #1913992 reported by Hemanth Nakkina
Affects: Ceph Monitor Charm
Status: Fix Released
Importance: Medium
Assigned to: Hemanth Nakkina

Bug Description

Upgrading the ceph-osd charm from releases before 19.07 to 20.05 took around 4 hours before the ceph-mon units returned to the idle state.
The environment has roughly 80 OSDs and 80 ceph clients (nova-compute, glance, cinder).

The ceph-osd 19.07 charm release introduced a new attribute, ceph_release, in the relation data between osd and mon [1]. Because a new attribute was added to the osd relation data, the osd-relation-changed hook is triggered 80 times on each ceph-mon unit (once per OSD). Each hook execution calls notify_client(), which calls client_relation() 80 times (once per ceph client).
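
The fan-out described above can be made concrete with a small count. This is only an illustration using the round numbers from this environment; the function name is hypothetical and is not part of the charm.

```python
def client_relation_calls(num_osds: int, num_clients: int) -> int:
    """Count client_relation() invocations on one ceph-mon unit.

    Each OSD's relation-data change fires osd-relation-changed once,
    and each firing walks every ceph client.
    """
    return num_osds * num_clients


# Roughly 80 OSDs and 80 ceph clients, as described above.
print(client_relation_calls(80, 80))  # 6400
```

Every one of those invocations repeats the ceph command calls listed in the profiling results, which is why the units take so long to settle.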

Profiling the code showed the following areas to be time-consuming:

1. Repeated ceph command calls to determine the ceph cluster state for each client_relation(),
   such as is_leader(), ceph_quorum(), get_osds(), ceph_user()
2. ceph command calls to update broker requests for each client relation
   handle_broker_request()
3. The apt_cache_show() function call in cmp_pkgrevno() and in some other places [3]
   On Bionic, apt-cache show --no-all-versions <pkg> takes about 3 times as long as apt-cache show <pkg>.
   On Focal, apt-cache show --no-all-versions performs better (around 1.5 times as long as apt-cache show).
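
The apt-cache comparison in point 3 could be reproduced with a small timing helper along these lines. This is a rough sketch rather than the profiling method used for this report; time_command(), compare_apt_cache_show() and the default package name are hypothetical choices made for illustration.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best


def compare_apt_cache_show(pkg="ceph-common"):
    """Time both apt-cache variants for one package (assumed installed)."""
    plain = time_command(["apt-cache", "show", pkg])
    latest = time_command(["apt-cache", "show", "--no-all-versions", pkg])
    return plain, latest
```

Calling compare_apt_cache_show() on a Bionic or Focal host and dividing the second value by the first gives a ratio comparable to the figures quoted above; results will vary with the size of the apt cache.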

Some recommendations:
a. Optimize handle_broker_request() to send requests in a few ceph command calls instead of one ceph call per ceph client unit.
   This might require significant code changes.
b. Optimize the notify_relation code to minimize the generic ceph command calls mentioned in point 1.
c. apt_cache_show is used mostly to get the current ceph version.
   Use dpkg to get the current version instead of apt_cache_show:

+    def get_current_version(self, package):
+        dpkg_result = self._dpkg_list([package]).get(package, {})
+        current_ver = None
+        installed_version = dpkg_result.get('version')
+        if installed_version:
+            current_ver = Version({'ver_str': installed_version})
+        return current_ver

d. Remove the get_version() call in ceph_user() [4]
   (assuming all currently supported ceph releases are v9.x.x or later)
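
Recommendation b could be approached by caching cluster state for the duration of one hook execution. The following is a minimal sketch of that idea, assuming each hook invocation runs as its own process so an in-process cache never outlives the hook. cluster_state(), notify_clients() and the client_relation() shown here are hypothetical stand-ins, not the charm's actual functions.

```python
import functools

# Counts how often the expensive lookup really runs.
CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def cluster_state():
    """Hypothetical stand-in for the repeated ceph command calls.

    In the charm this would wrap lookups such as get_osds(); here it
    only records that the expensive path was taken.
    """
    CALLS["count"] += 1
    return {"osds": tuple(range(80))}


def client_relation(client_id):
    """Hypothetical per-client handler that reuses the cached state."""
    state = cluster_state()
    return client_id, len(state["osds"])


def notify_clients(client_ids):
    """Walk every client relation, as a single hook execution would."""
    return [client_relation(cid) for cid in client_ids]


results = notify_clients(range(80))
print(len(results), CALLS["count"])  # 80 1
```

With the cache in place, the 80 client relations share one lookup instead of issuing 80 identical ones. Whether this is safe depends on the state not changing mid-hook, which is an assumption of this sketch.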

Note: This problem will recur whenever the ceph_release attribute changes, for example when ceph is upgraded from one release to another.

[1] https://opendev.org/openstack/charm-ceph-osd/commit/28ca5957b3eb7f8fc651f51e6f6d30e166bbea59
[2] https://opendev.org/openstack/charm-ceph-mon/src/branch/master/hooks/ceph_hooks.py#L583-L586
[3] https://opendev.org/openstack/charm-ceph-mon/src/branch/master/hooks/charmhelpers/fetch/ubuntu_apt_pkg.py#L151
[4] https://opendev.org/openstack/charm-ceph-mon/src/branch/master/lib/charms_ceph/utils.py#L499-L503

tags: added: sts
Changed in charm-ceph-mon:
importance: Undecided → Medium
tags: added: charm-upgrade
Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

https://review.opendev.org/c/openstack/charm-ceph-mon/+/773612 submitted to cover recommendations b, c and d from the bug description.

Revision history for this message
Hemanth Nakkina (hemanth-n) wrote :

Another patch is ready for code review (in addition to the ones mentioned in comment #2):

https://review.opendev.org/c/openstack/charm-ceph-mon/+/773612

Changed in charm-ceph-mon:
assignee: nobody → Hemanth Nakkina (hemanth-n)
status: New → In Progress
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Thanks for the update Hemanth.

Changed in charm-ceph-mon:
milestone: none → 21.04
status: In Progress → Fix Committed
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released