During expansion of environments, ceph-mon charm should warn about or manage PG counts
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph Monitor Charm | Triaged | Low | Unassigned |
Bug Description
Several clouds have experienced PG-per-OSD starvation while expanding the number of nodes in a cloud.
The ceph-mon charm sizes pools from the expected percentage utilisation per pool, the expected OSD count, and the replica count (typically 3), but it does not revisit these values after the pools are created.
Imagine a healthy cloud built with expected-osd-count = 100 that later expands to 400 OSDs by adding 4x the initial number of nodes. When adding the 300th-400th OSDs, Luminous will start going into HEALTH_ERR due to PG starvation at fewer than 20 PGs per OSD, whereas the cloud may have had 80-120 PGs per OSD when it was built with 100 OSDs.
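To make the dilution concrete, here is a minimal sketch of the arithmetic (pool names and pg_num values are illustrative, not from any particular cloud): the total number of PG replicas is fixed at pool-creation time, so adding OSDs simply spreads the same replicas across more disks.

```python
# Illustrative only: a fixed pg_num dilutes PGs per OSD as the cluster grows.
replica_count = 3

# pg_num chosen at deploy time for expected-osd-count = 100 (hypothetical values)
pool_pg_nums = {"cinder-ceph": 2048, "glance": 1024, "nova": 512}

total_pg_replicas = sum(pool_pg_nums.values()) * replica_count

for osd_count in (100, 200, 400):
    print(osd_count, total_pg_replicas / osd_count)
# 100 OSDs -> ~107 PGs per OSD (healthy)
# 400 OSDs -> ~27 PGs per OSD (starvation territory)
```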
As ceph-osd nodes are added to a stable cloud, ceph-mon could warn the Juju operator before the additional OSDs join the cluster, so that ceph does not enter a critical state from PG starvation.
The proper workaround is for the cloud operator to pre-expand pg_num and pgp_num on the pools expected to grow, sized for the newly added OSD count, before adding the ceph-osd nodes.
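As a sketch of that workaround (the pool weights, the 100-PGs-per-OSD target, and the round-up-to-power-of-two step are assumptions for illustration, not charm behaviour), one could compute a new pg_num per pool and apply it with the standard `ceph osd pool set` commands before the new OSDs join. Note that pg_num must be raised before pgp_num, and on Luminous pg_num can only grow, never shrink.

```python
# Hypothetical helper: size pg_num for the post-expansion OSD count,
# then emit the ceph commands to run before adding ceph-osd units.

def next_pow2(n):
    """Round up to the next power of two (a common, but optional, convention)."""
    p = 1
    while p < n:
        p *= 2
    return p

new_osd_count = 400        # OSD count after the planned expansion
target_pgs_per_osd = 100   # assumed target, tune to taste
replicas = 3

# pool -> expected share of total data (the charm's pool-weight idea)
pool_weights = {"cinder-ceph": 0.40, "glance": 0.20, "nova": 0.10}

for pool, weight in pool_weights.items():
    pg_num = next_pow2(int(new_osd_count * target_pgs_per_osd * weight / replicas))
    print(f"ceph osd pool set {pool} pg_num {pg_num}")
    print(f"ceph osd pool set {pool} pgp_num {pg_num}")
```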
Further details about pg_num configuration for pools can be found at https:/
tags: added: canonical-bootstack
Besides starvation when adding OSDs, we should also warn about creating pools that would add too many PGs per OSD, such as two different cinder-ceph charms each set to a pool weight of 40%. If another cinder-ceph charm is added after deployment without accounting for pool weight, it can easily push the number of PGs per OSD beyond what's recommended.
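A rough sketch of the over-commitment (the 100-PGs-per-OSD target and pool names are assumptions): weight-based sizing only works if the weights of all pools sum to roughly 100%, so a second 40% cinder-ceph application silently pushes the total past the target, and each further charm stacks on more.

```python
# Illustrative: two cinder-ceph applications each weighted 40%,
# plus glance and nova, over-commit the per-OSD PG budget.
osd_count, replicas, target = 100, 3, 100

pool_weights = {
    "cinder-ceph-a": 0.40,
    "cinder-ceph-b": 0.40,  # added later without rebalancing weights
    "glance": 0.20,
    "nova": 0.10,
}  # weights now sum to 110%

total_pgs = sum(int(osd_count * target * w / replicas)
                for w in pool_weights.values())
print(total_pgs * replicas / osd_count)  # ~110 PGs per OSD vs the 100 target
```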
This becomes even harder to prevent when class-based replication policies are in use, with one cinder-ceph charm for a density pool and a separate cinder-ceph charm for a performance pool. The ceph-mon charm cannot distinguish between separate CRUSH policies and calculate against the differing OSD counts backing each pool.
It may be best to approach this as a charm maintenance state or a Nagios alert that flags any OSD over 200 PGs or under 50 PGs, so that operators can react to misconfigured PG counts before a new environment goes live.
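A minimal sketch of such a check (assuming `ceph osd df --format json` reports a per-OSD "pgs" field, and using Nagios-style exit codes; the 50/200 thresholds match the ones suggested above):

```python
#!/usr/bin/env python3
# Nagios-style check: flag OSDs outside the 50-200 PGs-per-OSD band.
import json
import subprocess
import sys

LOW, HIGH = 50, 200

out = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
nodes = json.loads(out)["nodes"]

bad = [(n["name"], n["pgs"]) for n in nodes if not LOW <= n["pgs"] <= HIGH]

if bad:
    print("WARNING: OSDs outside %d-%d PGs: %s"
          % (LOW, HIGH, ", ".join("%s=%d" % b for b in bad)))
    sys.exit(1)  # Nagios WARNING

print("OK: all %d OSDs within %d-%d PGs per OSD" % (len(nodes), LOW, HIGH))
sys.exit(0)
```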