Ceph Dashboard Charm

Bug #2041500
Comment #3

Nobuto Murata (nobuto) wrote on 2023-10-31:

One example of a "no data" query is "sum(irate(ceph_osd_recovery_ops[1m]))", regardless of whether it comes from the charm or from upstream.

[charm]
https://github.com/openstack/charm-ceph-dashboard/blob/4ee08c02972ba174ba379728e9ab1f045bacd1a4/src/dashboards/ceph-cluster.json#L1426

[upstream]
https://github.com/ceph/ceph/blob/21548fe806cf259deac1421530d5ce720be17997/monitoring/ceph-mixin/dashboards_out/ceph-cluster.json#L1107

That's because the scrape_interval in COS is 1m, whereas Ceph upstream expects 15s. As a result, the 1m range in the query above does not contain the two data points that irate() needs to compute a rate.
https://prometheus.io/docs/prometheus/latest/querying/functions/#irate
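
To illustrate the arithmetic, here is a rough sketch that counts how many samples land inside a [1m] window for each scrape interval. It assumes evenly spaced scrapes with no jitter or missed scrapes, and treats the window as covering only samples newer than (evaluation time - window). The function name and the timestamps are made up for illustration; this is not how Prometheus itself is implemented.

```shell
#!/bin/sh
# Count scrape samples whose timestamps fall inside a range window.
# Args: scrape interval (s), window length (s), evaluation time (s),
#       time of the first scrape (s).
# Assumption: samples land at first, first+interval, ... without jitter.
samples_in_window() {
    interval=$1
    window=$2
    eval_time=$3
    first=$4
    count=0
    t=$first
    while [ "$t" -le "$eval_time" ]; do
        # Keep only samples newer than (eval_time - window).
        if [ "$t" -gt $((eval_time - window)) ]; then
            count=$((count + 1))
        fi
        t=$((t + interval))
    done
    echo "$count"
}

samples_in_window 60 60 600 7   # 1m scrape interval, [1m] window: prints 1
samples_in_window 15 60 600 7   # 15s scrape interval, [1m] window: prints 4
```

With a 1m scrape interval only one sample is inside the window, so irate() has nothing to diff against. With 15s there are four.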

Customizing the scrape_interval is "strongly discouraged", so a workaround is to place the prometheus-scrape-config-k8s charm in between ceph-mon and prometheus.
https://github.com/canonical/prometheus-k8s-operator/blob/16ba0e867b571d17ac8e87af7ab5720228d53d52/lib/charms/prometheus_k8s/v0/prometheus_scrape.py#L172-L190

# LP: #2041500
# the interval is from:
# https://docs.ceph.com/en/latest/mgr/prometheus/#confval-mgr-prometheus-scrape_interval
juju deploy -m cos prometheus-scrape-config-k8s prometheus-scrape-config --config scrape_interval=15s
juju integrate -m cos prometheus:metrics-endpoint prometheus-scrape-config:metrics-endpoint

juju offer -m cos prometheus-scrape-config:configurable-scrape-jobs
juju consume cos.prometheus-scrape-config cos-prometheus-scrape-config
juju integrate ceph-mon:metrics-endpoint cos-prometheus-scrape-config:configurable-scrape-jobs