With multiple instances of prometheus-ceph-exporter, the Ceph graphs report multiple series error

Bug #1822691 reported by Xav Paice
This bug affects 2 people
Affects: Grafana Charm
Status: Fix Released
Importance: Wishlist
Assigned to: Zachary Zehring

Bug Description

We have a site where there are two Ceph clusters (regrettably both named 'ceph'). The Ceph applications in Juju are named, e.g. CephA and CephB, and the exporter applications are named cephA-prometheus-ceph-exporter and cephB-prometheus-ceph-exporter.

Both exporters send metrics to the same Prometheus, which aggregates both sets of data. When the Ceph Cluster dashboard loads, we want to see the stats for each Ceph application, CephA and CephB, separately. However, the queries are not written to do that, so the various panels simply report N/A and "Multiple Series Error".

We can edit the dashboard to get a good result, using queries such as:
ceph_osd_utilization{osd="$osd", job="cephA-prometheus-ceph-exporter"}
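To illustrate why the extra job matcher helps, here is a minimal sketch in Python. It is a toy model of label matching, not Prometheus itself: the select helper and the sample series are hypothetical, and the osd value "osd.0" is made up for the example. Without the job label the selector matches one series per exporter, which is the situation a single-value panel cannot display.

```python
from typing import Dict, List

Series = Dict[str, str]

# Hypothetical sample data: the same OSD metric exported by two exporter jobs.
SERIES: List[Series] = [
    {"__name__": "ceph_osd_utilization", "osd": "osd.0",
     "job": "cephA-prometheus-ceph-exporter"},
    {"__name__": "ceph_osd_utilization", "osd": "osd.0",
     "job": "cephB-prometheus-ceph-exporter"},
]


def select(series: List[Series], **matchers: str) -> List[Series]:
    """Return every series whose labels equal all of the given matchers."""
    return [s for s in series
            if all(s.get(key) == value for key, value in matchers.items())]


# Selector without a job matcher: both clusters match.
without_job = select(SERIES, __name__="ceph_osd_utilization", osd="osd.0")

# Selector with the job matcher from the query above: exactly one series.
with_job = select(SERIES, __name__="ceph_osd_utilization", osd="osd.0",
                  job="cephA-prometheus-ceph-exporter")

print(len(without_job), len(with_job))  # prints "2 1"
```
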

This affects all the Ceph dashboards in Grafana.

It would be good if we could split this out and have a set of dashboards for each Ceph cluster in use. In this example, rather than a single "[Juju] Ceph - Cluster" dashboard, we would have "[Juju] Ceph - Cluster - CephA" and "[Juju] Ceph - Cluster - CephB".


Changed in charm-grafana:
status: New → Confirmed
importance: Undecided → Wishlist
tags: added: field-medium
Revision history for this message
Yoshi Kadokawa (yoshikadokawa) wrote :

I have an environment with 3 Ceph clusters. I applied the patch from the MP, and the Ceph dashboards now successfully show metrics for multiple Ceph clusters.
I have attached the screenshots of ceph-cluster, ceph-osd and ceph-pools.

Revision history for this message
Yoshi Kadokawa (yoshikadokawa) wrote :
Revision history for this message
Yoshi Kadokawa (yoshikadokawa) wrote :
Joe Guo (guoqiao)
Changed in charm-grafana:
assignee: nobody → Zachary Zehring (zzehring)
status: Confirmed → In Progress
Revision history for this message
Nobuto Murata (nobuto) wrote :
Changed in charm-grafana:
status: In Progress → Fix Committed
Revision history for this message
David Coronel (davecore) wrote :

Is this committed fix available in a charm somewhere in the charm store (i.e. cs:~llama-charmers-next/grafana)?

Revision history for this message
David Coronel (davecore) wrote :

It looks like cs:~llama-charmers-next/grafana-9 has the commit:

~/charms/next$ charm pull cs:~llama-charmers-next/grafana-9
cs:~llama-charmers-next/grafana-9

~/charms/next$ cd grafana/

~/charms/next/grafana$ grep -ri ceph_monitor_quorum_count *
templates/dashboards/prometheus/CephCluster.json.j2: "expr": "ceph_monitor_quorum_count{job=\"$job\"}",

~/charms/next/grafana$ grep -ri ceph_stuck_degraded_pgs *
templates/dashboards/prometheus/CephCluster.json.j2: "expr": "ceph_stuck_degraded_pgs{job=\"$job\"} + ceph_stuck_stale_pgs{job=\"$job\"} + ceph_stuck_unclean_pgs{job=\"$job\"} + ceph_stuck_undersized_pgs{job=\"$job\"}",
templates/dashboards/prometheus/CephCluster.json.j2: "expr": "ceph_stuck_degraded_pgs{job=\"$job\"}",

Revision history for this message
David Coronel (davecore) wrote :

I know it's a little late, but I subscribed ~field-medium for awareness

tags: removed: field-medium
Revision history for this message
David Coronel (davecore) wrote :

I'm looking at revisions 9 and 11 of the cs:~llama-charmers-next/grafana charm, and it looks like the CephOSD.json.j2 template doesn't have the job="$job" fix from https://git.launchpad.net/charm-grafana/commit/?id=67f0cdbca9032ebc570cc23f99474bc72246030f

The CephPools.json.j2 and CephCluster.json.j2 templates have the fix, but CephOSD.json.j2 does not.

Was CephOSD just missed?
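A check like this could also be scripted rather than done with grep by hand. The following is a rough sketch, assuming the templates store queries as JSON "expr" strings in the form shown in the grep output earlier in this thread. The function name exprs_missing_job and the second sample line are hypothetical, and the check only tests whether the matcher appears anywhere in an expression, not on every selector within it.

```python
import re

# Captures the value of each JSON "expr" string, allowing escaped quotes.
EXPR_RE = re.compile(r'"expr":\s*"((?:[^"\\]|\\.)*)"')

# The matcher as it appears inside the JSON templates (quotes are escaped).
JOB_MATCHER = 'job=\\"$job\\"'


def exprs_missing_job(template_text: str) -> list:
    """Return the expr strings that do not contain the job matcher."""
    return [match.group(1)
            for match in EXPR_RE.finditer(template_text)
            if JOB_MATCHER not in match.group(1)]


# First line is taken from the grep output above; the second is a
# hypothetical example of an expression without the fix.
sample = r'''
"expr": "ceph_monitor_quorum_count{job=\"$job\"}",
"expr": "ceph_osd_utilization{osd=\"$osd\"}",
'''

for expr in exprs_missing_job(sample):
    print(expr)
```

Running the function over the text of each template file under templates/dashboards/prometheus/ would list the expressions that still need the matcher.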

Linda Guo (lihuiguo)
Changed in charm-grafana:
status: Fix Committed → Fix Released
milestone: none → 20.08