/metrics query is hanging forever

Bug #1931613 reported by Vladimir Grevtsev
This bug affects 1 person

Affects: Prometheus Ceph Exporter Charm
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

==== my steps to reproduce
For the record: I wasn't able to reproduce this issue outside of the environment I'm working on; however, it reproduces reliably across this environment, so I'm asking for some assistance with the diagnosis.

env: focal-ussuri, latest stable charms

What I did:

juju deploy cs:prometheus-ceph-exporter pce --to lxd:6 --series focal --bind 'oam-space ceph=ceph-access-space'
juju config pce snap_channel=edge
juju add-relation pce ceph-mon:client
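
For completeness, a quick way to double-check that the unit actually picked up an address in ceph-access-space on the "ceph" binding, and to see what ceph-mon handed over on the relation, would be something along these lines (not shown in the output below):

juju run --unit pce/0 'network-get ceph'
juju show-unit pce/0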

Then I tried to query the exporter:

ubuntu@infra-node1:~$ j status pce
Model Controller Cloud/Region Version SLA Timestamp
openstack foundations-maas maas_cloud 2.8.10 unsupported 16:25:06Z

SAAS Status Store URL
graylog active foundations-maas admin/lma.graylog-beats
nagios active foundations-maas admin/lma.nagios-monitors
prometheus active foundations-maas admin/lma.prometheus-target

App Version Status Scale Charm Store Rev OS Notes
pce active 1 prometheus-ceph-exporter jujucharms 13 ubuntu

Unit Workload Agent Machine Public address Ports Message
pce/0* active idle 6/lxd/1 172.16.151.66 9128/tcp Running

Machine State DNS Inst id Series AZ Message
6 started 172.16.151.162 cloud-node-004 focal zone1 Deployed
6/lxd/1 started 172.16.151.66 juju-7ce49b-6-lxd-1 focal zone1 Container started

ubuntu@infra-node1:~$ curl 172.16.151.66:9128
<html>
                        <head><title>Ceph Exporter</title></head>
                        <body>
                        <h1>Ceph Exporter</h1>
                        <p><a href='/metrics'>Metrics</a></p>
                        </body>
                        </html>ubuntu@infra-node1:~$ curl 172.16.151.66:9128/metrics

<hangs forever>
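
For what it's worth, re-running the query with a verbose curl and a timeout can confirm whether the TCP connect and the GET themselves succeed and the client is only stuck waiting for the response body (i.e. the exporter blocks while gathering Ceph data rather than a networking problem):

ubuntu@infra-node1:~$ curl -v --max-time 30 http://172.16.151.66:9128/metrics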

==== Observations

1. I've built a non-snapped p-c-e binary to verify whether there's an issue in either Ceph or my networking. It looks like there isn't any: https://paste.ubuntu.com/p/fMcKMfSJrM/
2. I straced the snapped exporter to get an idea of what it could be busy with, and got the following after invoking the /metrics query: https://paste.ubuntu.com/p/mppBtrtZgt/
3. Invoking the exporter via snap run doesn't make any difference; for some reason the exporter isn't trying to query the MONs:

root@cloud-node-008:~# snap run prometheus-ceph-exporter.ceph-exporter
* Running /snap/prometheus-ceph-exporter/21/bin/ceph_exporter with args: -ceph.user prometheus-ceph-exporter
2021/06/08 13:08:24 Starting ceph exporter on ":9128"

Querying the endpoint from another terminal:

ubuntu@infra-node1:~/pcb-plus-01-bundle$ curl 172.16.151.74:9128/metrics
<nothing>

4. We've tried to rebuild the snap from scratch, with two outcomes (a couple of follow-up checks we're considering are sketched after this list):
a) without --devmode: the /metrics query no longer hangs, but the exporter logs complain:

TRAC[2021-06-10T15:29:24Z] creating rados connection to execute mon command args="{\"format\":\"json\",\"prefix\":\"osd tree\",\"states\":[\"down\"]}"
2021-06-10 15:29:24.944386 7efc777fe700 -1 Errors while parsing config file!
2021-06-10 15:29:24.944391 7efc777fe700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (13) Permission denied

b) with --devmode: it still complains about "Permission denied", and the following error appears in the ceph-mon logs:

2021-06-10T13:22:16.327+0000 7fc866cd6700 0 cephx server client.admin: handle_request failed to decode CephXAuthenticate: buffer::end_of_buffer
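
The follow-up checks mentioned above would be to narrow down the confinement/auth angle; the keyring path below is an assumption, only the cephx user name (prometheus-ceph-exporter) is confirmed from the snap's own command line:

# is the snap allowed to read the host Ceph config under strict confinement?
snap connections prometheus-ceph-exporter
ls -l /etc/ceph/ceph.conf

# does the exporter's cephx user work at all outside the snap?
# (keyring path is a guess)
sudo ceph --id prometheus-ceph-exporter \
     --conf /etc/ceph/ceph.conf \
     --keyring /etc/ceph/ceph.client.prometheus-ceph-exporter.keyring \
     status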

Any suggestions on how we could proceed would be much appreciated.

Edin S (exsdev)
Changed in charm-prometheus-ceph-exporter:
importance: Undecided → Medium