Add support to configure the available disk space alert thresholds

Bug #1890777 reported by Jose Guedez
This bug affects 1 person
Affects: Ceph Monitor Charm
Status: Fix Released
Importance: Wishlist
Assigned to: Cornellius Metto
Milestone: 21.10

Bug Description

It would be useful to be able to configure the disk-usage alerting thresholds for the monitors. The cluster degrades to HEALTH_WARN (or HEALTH_ERR at the critical threshold) when they are breached.

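Currently, ceph-mon is effectively configured with the defaults [1], expressed as the percentage of space still available on the filesystem that holds the monitor's data store: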

mon data avail warn = 30
mon data avail crit = 5
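To make the semantics of the two defaults concrete, here is a minimal sketch of how they map available space to a health status. It is a simplified model, not Ceph's code: it assumes the warning fires when available space is at or below `mon data avail warn`, and that the critical threshold raises HEALTH_ERR. The function name `mon_disk_health` is hypothetical.

```python
# Default monitor disk-space thresholds quoted in the bug report
# (percent of the filesystem that is still available).
MON_DATA_AVAIL_WARN = 30
MON_DATA_AVAIL_CRIT = 5


def mon_disk_health(avail_percent: float,
                    warn: float = MON_DATA_AVAIL_WARN,
                    crit: float = MON_DATA_AVAIL_CRIT) -> str:
    """Map a monitor's available-disk percentage to a health status.

    Simplified model (an assumption, not Ceph's implementation):
    at or below `crit` -> HEALTH_ERR, at or below `warn` -> HEALTH_WARN,
    otherwise HEALTH_OK.
    """
    if avail_percent <= crit:
        return "HEALTH_ERR"
    if avail_percent <= warn:
        return "HEALTH_WARN"
    return "HEALTH_OK"


if __name__ == "__main__":
    # 25% free already trips the default 30% warning threshold.
    for pct in (50, 25, 4):
        print(pct, mon_disk_health(pct))
```

With the defaults, a monitor host that is only 71% full already produces a warning, which is why the warning threshold can feel conservative.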

In particular, the warning threshold can be quite conservative. Also, in many cases ceph-mon units are LXD containers that effectively report the disk usage of the underlying host. The underlying host typically has its own monitoring with its own thresholds, which can lead to duplicate alerts, or even false positives when the host and ceph-mon thresholds differ.
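The duplicate-alert problem follows from where the percentage comes from. A rough sketch of the measurement, assuming it is approximated by the free/total ratio of the filesystem containing the monitor's data directory (the helper `avail_percent` is hypothetical, and Ceph's own calculation may differ in detail):

```python
import shutil


def avail_percent(path: str) -> float:
    """Approximate percentage of space available on the filesystem holding `path`.

    shutil.disk_usage reports the filesystem that contains `path`. If a
    container's storage shares the host filesystem, this is the same figure
    the host's own monitoring sees, just compared against different thresholds.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


if __name__ == "__main__":
    # "/" is used so the sketch runs anywhere; on a monitor unit the
    # interesting path would be the monitor data directory.
    print(f"{avail_percent('/'):.1f}% available")
```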

Otherwise, the only option is to override these values through config-flags, which is discouraged.
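For comparison, a sketch of what the config-flags workaround looks like. It assumes the charm's `config-flags` option accepts the string representation of a Python dict whose top-level keys are ceph.conf sections; the helper name `mon_config_flags` is hypothetical.

```python
def mon_config_flags(warn: int, crit: int) -> str:
    """Build a config-flags value that overrides the [mon] disk thresholds.

    Assumption: config-flags takes a stringified dict keyed by ceph.conf
    section, with raw ceph.conf option names as inner keys.
    """
    flags = {"mon": {"mon data avail warn": warn, "mon data avail crit": crit}}
    return repr(flags)


if __name__ == "__main__":
    print(mon_config_flags(15, 3))
```

The resulting string would then be passed with something like `juju config ceph-mon config-flags="..."`. The raw option names and free-form dict are not validated by the charm, which is part of why this route is discouraged.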

[1] https://docs.ceph.com/docs/master/rados/configuration/mon-config-ref/

Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → Wishlist
tags: added: onboarding
Changed in charm-ceph-mon:
assignee: nobody → Cornellius Metto (ckmetto)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)
Changed in charm-ceph-mon:
status: Triaged → In Progress
tags: added: good-first-bug
removed: onboarding
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/790914
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/320ddae8277d966404d764a1d34d2271b731f36f
Submitter: "Zuul (22348)"
Branch: master

commit 320ddae8277d966404d764a1d34d2271b731f36f
Author: Cornellius Metto <email address hidden>
Date: Wed May 12 10:44:31 2021 +0300

    Add configuration options for disk usage alerting thresholds

    The ceph cluster degrades to HEALTH_{WARN|CRIT} when the following
    default thresholds are breached:

    mon data avail warn = 30
    mon data avail crit = 5

    - These thresholds can be conservative. It might be desirable
      to change them.
    - A specific common scenario is when ceph-mon units are run in lxd
      containers which report the disk usage of the underlying host. The
      underlying host may have its own monitoring and its own
      thresholds which can lead to duplicate or conflicting alerts.

    Closes-Bug: #1890777
    Change-Id: I13e35be71697b98b19260970bcf9812a43ef9369
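A sketch of how dedicated charm options could be validated and rendered into ceph.conf. The option names `monitor-data-available-warning` and `monitor-data-available-critical` are hypothetical placeholders (check the charm's config.yaml for the real names), and the rule that the critical value must be lower than the warning value is a design assumption of this sketch, not something stated in the commit.

```python
from typing import Mapping

# Hypothetical charm option names, used for illustration only.
WARN_KEY = "monitor-data-available-warning"
CRIT_KEY = "monitor-data-available-critical"


def render_mon_thresholds(config: Mapping[str, int]) -> str:
    """Validate the two thresholds and render them as ceph.conf [mon] lines.

    Missing options fall back to the Ceph defaults quoted above (30 and 5).
    """
    warn = int(config.get(WARN_KEY, 30))
    crit = int(config.get(CRIT_KEY, 5))
    # Both values are percentages of available space.
    for name, value in ((WARN_KEY, warn), (CRIT_KEY, crit)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # Assumed sanity rule: the critical level should trigger after the warning.
    if crit >= warn:
        raise ValueError(f"{CRIT_KEY} ({crit}) must be lower than {WARN_KEY} ({warn})")
    return (
        "[mon]\n"
        f"mon data avail warn = {warn}\n"
        f"mon data avail crit = {crit}\n"
    )


if __name__ == "__main__":
    print(render_mon_thresholds({WARN_KEY: 15, CRIT_KEY: 3}), end="")
```

Typed, validated options like these avoid the free-form config-flags route while producing the same ceph.conf settings.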

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
Changed in charm-ceph-mon:
milestone: none → 21.10
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released