mon_pg_warn_max_object_skew setting not exposed via charm

Bug #1720374 reported by Drew Freiberger
This bug affects 5 people
Affects                          Status         Importance   Assigned to    Milestone
Ceph Monitor Charm               Fix Released   Wishlist     Shane Peters
OpenStack Ceph Charm (Retired)   Won't Fix      Wishlist     Unassigned

Bug Description

In newly built OpenStack environments with Ceph-backed glance/cinder/nova and the charm-default %util settings, relating glance-simplestreams-sync to glance and importing the standard Ubuntu images skews the %objects in the glance pool enough to trip the Ceph HEALTH_WARN state.

We find that this persists until end users have adopted the cloud to some degree, so at cloud initialization we manually tune mon_pg_warn_max_object_skew on each of the ceph mon.* daemons to clear the HEALTH_WARN state.

It would be extremely helpful to have mon_pg_warn_max_object_skew exposed via the charm so that the setting is easier to manage. Alternatively, the charm could automate a time-based change, keeping the setting quite high for the first X days of cloud operation until adoption grows and more data is imported into the ceph environment.

The current workaround is to log in to the unit and run:

ceph tell 'mon.*' injectargs '--mon_pg_warn_max_object_skew 20'

tags: added: canonical-bootstack
Drew Freiberger (afreiberger) wrote :

Was thinking about creating an action for this, but the action wouldn't persist the setting if we extended to another unit.

Changed in charm-ceph:
importance: Undecided → Wishlist
Changed in charm-ceph-mon:
importance: Undecided → Wishlist
Changed in charm-ceph:
status: New → Triaged
Changed in charm-ceph-mon:
status: New → Triaged
James Hilling (jamesh5979) wrote :

It would be useful if we could also have the following exposed to the charm:

mon data avail warn = X
mon data avail crit = Y

This becomes especially important when ceph starts to alert on disk space at a different set of values from those your monitoring system alerts at.

For example, when using nagios with the nrpe subordinate charm deployed to a ceph-mon unit, the nrpe unit has its own set of alerting thresholds for the 'check_disk_root' check. These will usually differ from the values at which ceph alerts.

We want to be able to tune ceph to alert at a value that will not interfere with monitoring.

Edward Hope-Morley (hopem) wrote :

Hi James/Drew, a couple of points here:

The ceph charms have a config-flags option that allows you to set any config option within the [global], [mon], [osd] or [rgw] sections. It was added so that people could try out different configs and then suggest them for native inclusion in the charm if appropriate. I think the scenario you mention here is valid, since the warning is displayed despite a perfectly healthy cloud and ceph. I suggest that you use the config-flags option to apply the config until this feature lands.
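
As a rough sketch of the config-flags workaround: the value is a string holding a Python-style dictionary keyed by ceph.conf section. The helper name mon_flag and the application name ceph-mon below are assumptions made for illustration, and the exact format accepted should be checked against the config.yaml of the charm revision in use.

```shell
# Hypothetical helper: print a config-flags value that sets one option
# in the [mon] section. $1 = option name, $2 = numeric value.
mon_flag() {
    printf "{'mon': {'%s': %s}}\n" "$1" "$2"
}

flags="$(mon_flag 'mon pg warn max object skew' 20)"
echo "$flags"

# To apply it (assumes a Juju 2.x client and an application named ceph-mon):
#   juju config ceph-mon config-flags="$flags"
```

Because config-flags takes a single dictionary, other [mon] options such as 'mon data avail warn' and 'mon data avail crit' would have to be merged into the same value rather than set one at a time.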

Edward Hope-Morley (hopem) wrote :

One more thing, it appears that setting mon_pg_warn_max_object_skew to 0 will disable the check [1].

[1] https://github.com/ceph/ceph/blob/master/src/mon/PGMap.cc#L2631
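
To illustrate why 0 disables the check, here is a simplified model of the logic as it reads at [1]: a pool's objects per PG are compared against the cluster-wide average, and a warning is raised only when the threshold is positive and the ratio exceeds it. This is an illustrative sketch, not the Ceph source. The function name skew_check and the sample numbers are invented, and the minimum-object thresholds that the real check also applies are left out.

```shell
# Simplified model of the objects-per-PG skew check (not the Ceph source).
# Args: pool_objects pool_pgs cluster_objects cluster_pgs max_skew
# Prints "warn" if the pool would trip the check, otherwise "ok".
skew_check() {
    awk -v po="$1" -v pp="$2" -v co="$3" -v cp="$4" -v skew="$5" 'BEGIN {
        # A non-positive threshold (or empty cluster) skips the check.
        if (skew <= 0 || pp <= 0 || cp <= 0 || co <= 0) { print "ok"; exit }
        ratio = (po / pp) / (co / cp)
        print (ratio > skew ? "warn" : "ok")
    }'
}

# A glance pool holding most of the objects in a young cluster:
skew_check 9000 64 10000 1024 10   # warn (ratio is about 14.4)
skew_check 9000 64 10000 1024 20   # ok
skew_check 9000 64 10000 1024 0    # ok: 0 disables the check
```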

Shane Peters (shaner)
Changed in charm-ceph-mon:
assignee: nobody → Shane Peters (shaner)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.openstack.org/615600
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-mon/commit/?id=7a362ff0a5b355f55c430e867ebbf4f1303faf2a
Submitter: Zuul
Branch: master

commit 7a362ff0a5b355f55c430e867ebbf4f1303faf2a
Author: Shane Peters <email address hidden>
Date: Mon Nov 5 11:45:32 2018 -0500

    Add disable-pg-max-object-skew option

    Openstack clouds that use ceph will typically start their life with
    at least one pool (glance) loaded with a disproportionately high
    amount of data/objects where other pools may remain empty. This can
    trigger a HEALTH_WARN if mon_pg_warn_max_object_skew is exceeded
    but that is actually a false positive.

    Change-Id: I5a535dbb17db2149630d971d85ac311f14298b09
    Closes-Bug: 1720374
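
With this change deployed, enabling the option would presumably look like the sketch below. The option name comes from the commit title. The application name ceph-mon and the assumption that the option is a boolean are not confirmed by this report, so check the charm's config.yaml before relying on it.

```shell
app="ceph-mon"                       # assumed application name
option="disable-pg-max-object-skew"  # option name from the commit title

if command -v juju >/dev/null 2>&1; then
    # Assumes the option is a boolean and the current model contains $app.
    juju config "$app" "$option=true"
else
    echo "juju not found; would run: juju config $app $option=true"
fi
```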

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
James Page (james-page)
Changed in charm-ceph-mon:
milestone: none → 18.11
status: Fix Committed → Fix Released
Chris MacNaughton (chris.macnaughton) wrote :

Marking the charm-ceph task Won't Fix, as the ceph charm has been out of support for a while now.

Changed in charm-ceph:
status: Triaged → Won't Fix