Some metrics are not collected with Ceph Nautilus

Bug #1867100 reported by Yoshi Kadokawa
This bug affects 6 people
Affects | Status | Importance | Assigned to | Milestone
Prometheus Ceph Exporter Charm | Fix Released | Critical | David O Neill |
prometheus-ceph-exporter snap | Fix Released | Critical | David O Neill |

Bug Description

With Ceph Nautilus, some of the Ceph template dashboards for Grafana fail to load.
In Grafana, I see the following logs in /var/log/juju/unit-grafana-0.log

2020-03-12 05:42:36 DEBUG juju-log Skipping Dashboard Template: CephCluster.json.j2 missing 2 metrics.Missing: ceph_osd_perf_apply_latency_seconds, ceph_osd_perf_commit_latency_seconds
2020-03-12 05:42:36 DEBUG juju-log Skipping Dashboard Template: CephOSD.json.j2 missing 2 metrics.Missing: ceph_osd_perf_apply_latency_seconds, ceph_osd_perf_commit_latency_seconds
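
For anyone triaging similar logs, here is a minimal sketch that pulls the template name and the missing metric names out of such lines. The regular expression is inferred only from the two lines above, and `parse_skipped_template` is a hypothetical helper, not part of the Grafana charm.

```python
import re

# Pattern inferred from the two log lines above; other charm
# versions may format this message differently.
SKIP_RE = re.compile(
    r"Skipping Dashboard Template: (?P<template>\S+) "
    r"missing \d+ metrics\.Missing: (?P<metrics>.+)$"
)


def parse_skipped_template(line):
    """Return (template, [missing metrics]), or None if the line does not match."""
    match = SKIP_RE.search(line)
    if match is None:
        return None
    metrics = [m.strip() for m in match.group("metrics").split(",")]
    return match.group("template"), metrics


line = (
    "2020-03-12 05:42:36 DEBUG juju-log Skipping Dashboard Template: "
    "CephCluster.json.j2 missing 2 metrics.Missing: "
    "ceph_osd_perf_apply_latency_seconds, ceph_osd_perf_commit_latency_seconds"
)
print(parse_skipped_template(line))
```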

It looks like the following metrics are not available or not collected.
- ceph_osd_perf_apply_latency_seconds
- ceph_osd_perf_commit_latency_seconds
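
To check whether a given exporter build exposes these metrics, something like the following could be run against the text of its metrics output. This is only a sketch that assumes the standard Prometheus text exposition format; `exposed_metric_names`, `missing_metrics` and the sample text are made up for illustration.

```python
REQUIRED = [
    "ceph_osd_perf_apply_latency_seconds",
    "ceph_osd_perf_commit_latency_seconds",
]


def exposed_metric_names(exposition_text):
    """Collect metric names from Prometheus text-format sample lines."""
    names = set()
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The name ends at the first '{' (labels) or at whitespace (value).
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def missing_metrics(exposition_text, required=REQUIRED):
    """Return the required metric names that are absent from the text."""
    present = exposed_metric_names(exposition_text)
    return [name for name in required if name not in present]


# Hypothetical sample output, not taken from a real exporter.
sample = """\
# HELP ceph_osd_up Hypothetical sample line for illustration
ceph_osd_up{osd="osd.0"} 1
ceph_osd_perf_apply_latency_seconds{osd="osd.0"} 0.002
"""
print(missing_metrics(sample))
```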

According to this[0], there is a fix for this in ceph-exporter for Nautilus.
However, the prometheus-ceph-exporter snap is built from the default branch (luminous),
so this fix is not available from the snap package.

[0] https://github.com/digitalocean/ceph_exporter/commit/6622c2f7a3f44be47738c02270c2ec479d82ecff#diff-1536b81b5897f95267830a7c215ad5ab

Nobuto Murata (nobuto) wrote :

The github project doesn't have a release or a tag from the nautilus branch.
https://github.com/digitalocean/ceph_exporter/tags

So a separate snap channel would be required for nautilus.

Yoshi Kadokawa (yoshikadokawa) wrote :

I tested a snap built from the nautilus branch, but it did not work.
It looks like the latest fix in the nautilus branch of https://github.com/digitalocean/ceph_exporter has reverted the fix for the OSD latency metrics.

However, I could confirm that it works with a snap built from the specific commit that has the fix.
https://github.com/digitalocean/ceph_exporter/pull/125/files

This is the snapcraft.yaml that I used.
https://git.launchpad.net/~yoshikadokawa/snap-prometheus-ceph-exporter/commit/?h=nautilus-support

Xav Paice (xavpaice) wrote :

This is a snap issue, and I am following up on that. I have marked the bug Invalid for the charm, since there's nothing we can change in the charm to help.

Changed in charm-prometheus-ceph-exporter:
status: New → Invalid
Nobuto Murata (nobuto) wrote :

@Xav, even if the snap could be updated, it probably needs multiple channels to support multiple releases of Ceph. Thus, the ceph-exporter charm would also need a config option to specify a channel for the snap like Graylog:
https://jaas.ai/graylog#charm-config-channel

Nobuto Murata (nobuto)
Changed in charm-prometheus-ceph-exporter:
status: Invalid → New
Edin S (exsdev)
Changed in charm-prometheus-ceph-exporter:
importance: Undecided → Wishlist
Xav Paice (xavpaice)
Changed in snap-prometheus-ceph-exporter:
importance: Undecided → Wishlist
Xav Paice (xavpaice) wrote :

Tracks for the snap have been requested here: https://forum.snapcraft.io/t/request-for-tracks-for-prometheus-ceph-exporter/19407

For the snap itself, I've uploaded a v3.0.0 (nautilus) snap to the latest/edge channel if we need it.

The latest release of the prometheus-ceph-exporter charm will have the ability to switch channels, but changing that will need manual intervention (see LP:#1891582).

David O Neill (dmzoneill) wrote :

We need to promote -next to production.

Changed in charm-prometheus-ceph-exporter:
assignee: nobody → David O Neill (dmzoneill)
Changed in snap-prometheus-ceph-exporter:
assignee: nobody → David O Neill (dmzoneill)
Changed in charm-prometheus-ceph-exporter:
status: New → Fix Committed
Changed in snap-prometheus-ceph-exporter:
status: New → Fix Committed
Changed in charm-prometheus-ceph-exporter:
importance: Wishlist → Critical
Changed in snap-prometheus-ceph-exporter:
importance: Wishlist → Critical
Vladimir Grevtsev (vlgrevtsev) wrote :

I can confirm that latest/edge works fine: the metrics are in place and the dashboards have been imported successfully.

However, with the edge snap we're having another issue: https://bugs.launchpad.net/charm-prometheus-ceph-exporter/+bug/1899030

Drew Freiberger (afreiberger) wrote :

Upstream bug filed against the project. The ceph_exporter team is working on a one-version-to-rule-them-all rewrite. I believe we should hold until that is ready. We could write up something in the charm to detect if Nautilus+ is running and force the snap channel to edge.
https://github.com/digitalocean/ceph_exporter/issues/182
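
The detection idea could look roughly like this. It is only a sketch: it assumes the version string has the form printed by `ceph --version`, that Nautilus corresponds to major version 14, and that older releases stay on a `stable` channel. `channel_for_ceph` is a hypothetical helper, not existing charm code.

```python
import re

NAUTILUS_MAJOR = 14  # assumption: Nautilus is the 14.x series


def channel_for_ceph(version_output, default="stable"):
    """Pick a snap channel from a `ceph --version` style string."""
    match = re.search(r"ceph version (\d+)\.", version_output)
    if match is None:
        return default  # unknown format: leave the channel alone
    return "edge" if int(match.group(1)) >= NAUTILUS_MAJOR else default


# Hypothetical version strings for illustration.
print(channel_for_ceph("ceph version 14.2.9 (hypothetical-hash) nautilus (stable)"))
print(channel_for_ceph("ceph version 12.2.13 (hypothetical-hash) luminous (stable)"))
```

Falling back to the default when the string cannot be parsed keeps the sketch from switching channels on unexpected input.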

I'd hate to unwind the separate snapstore tracks across many clouds back to a single track once the upstream project finishes their work.

Jose Guedez (jfguedez) wrote :

Both the snap (edge channel) [0] and the charm change to enable changing channels [1] have been released, so I am setting the status to Fix Released.

[0] https://bugs.launchpad.net/charm-prometheus-ceph-exporter/+bug/1867100/comments/6
[1] https://bugs.launchpad.net/charm-prometheus-ceph-exporter/+bug/1891582

Changed in charm-prometheus-ceph-exporter:
status: Fix Committed → Fix Released
Changed in snap-prometheus-ceph-exporter:
status: Fix Committed → Fix Released
Flavio (flavp87) wrote :

Hi, am I wrong, or are some metrics still missing?
Those related to operation latency, for example.

I am using revision #14 of the prom-ceph-exporter charm and the edge channel.
