Only one monitor address is populated into the config

Bug #1807282 reported by Andrey Grebennikov
Affects: Prometheus Ceph Exporter Charm
Status: Fix Released
Importance: Medium
Assigned to: Drew Freiberger

Bug Description

Rocky/Bionic deployment

Ceph cluster consists of 3 monitors and a number of OSDs.

When adding prometheus-ceph-exporter unit and the relation to ceph-mon:client, it starts collecting the metrics.
However, the ceph config used by the exporter contains only one mon address:

root@juju-c5b7d0-1-lxd-8:/var/lib/juju/agents/unit-prometheus-ceph-exporter-4# cat /var/snap/prometheus-ceph-exporter/current/ceph.conf
###############################################################################
# [ WARNING ]
# glance configuration file maintained by Juju
# local changes may be overwritten.
###############################################################################
[global]
auth_supported = cephx
 #keyring = /etc/ceph/$cluster.$name.keyring
 keyring = /var/snap/prometheus-ceph-exporter/current/ceph.client.prometheus-ceph-exporter.keyring
 mon host = 10.11.30.46:6789
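
For comparison, with all three monitors published, the rendered line should carry every monitor address, e.g. (the second and third addresses here are hypothetical placeholders):

 mon host = 10.11.30.46:6789 10.11.30.47:6789 10.11.30.48:6789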

At the same time, nova-compute units, for example, have all 3 mons populated in their config. This is most likely caused by the interface layer obtaining the monitor addresses differently from how nova-compute does it:
https://github.com/openstack/charm-nova-compute/blame/c744e052347d8ddfae88804a0ad0bdfdf4f5ae0d/hooks/charmhelpers/contrib/storage/linux/ceph.py#L896
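
For reference, the nova-compute/charmhelpers code linked above walks every unit on the ceph relation and collects each monitor's address on every hook run. A rough sketch of that pattern (not the literal code at the link) looks like:

from charmhelpers.core.hookenv import relation_get, relation_ids, related_units

def get_mon_hosts():
    """Collect monitor addresses from every related ceph-mon unit."""
    hosts = []
    for rid in relation_ids('ceph'):
        for unit in related_units(rid):
            addr = (relation_get('ceph-public-address', rid=rid, unit=unit) or
                    relation_get('private-address', rid=rid, unit=unit))
            if addr:
                hosts.append('{}:6789'.format(addr))
    return sorted(hosts)

Because every related unit is revisited each time the context is built, late-arriving monitor addresses eventually make it into the rendered config.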


Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Based on the output produced during a clean redeploy, I can see that the charm logic is incorrect in handling config changes.

Only the first config-changed event with all necessary data present is handled
https://github.com/openstack-charmers/charm-interface-ceph-client/blob/0e191799fc9ee2d7662cd77f030790fa850c6dea/requires.py#L40-L48

At this time the other monitor units have not yet exposed ceph-public-address via relation data; therefore, the following code simply ignores the missing addresses:

https://github.com/openstack-charmers/charm-interface-ceph-client/blob/0e191799fc9ee2d7662cd77f030790fa850c6dea/requires.py#L108-L109
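
A hedged paraphrase of the effect of those two lines (not the literal requires.py code): units that have not yet published ceph-public-address are silently skipped rather than waited for, so a single ready monitor is enough to satisfy the interface:

def collect_mon_hosts(unit_datas):
    """Return mon addresses from per-unit relation data, silently skipping
    units that have not yet published ceph-public-address."""
    hosts = []
    for data in unit_datas:
        addr = data.get('ceph-public-address')
        if not addr:
            continue  # missing address is ignored, not treated as "not ready yet"
        hosts.append('{}:6789'.format(addr))
    return hosts

# Only one of three monitors has published its address at this point,
# so the resulting list has a single entry.
collect_mon_hosts([{'ceph-public-address': '10.11.30.46'}, {}, {}])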

Then configure_exporter is executed and exporter.started is set, which triggers ceph.conf rendering with only one monitor address present.

@when('ceph.available')
@when_not('exporter.started')
def configure_exporter(ceph_client):

This flag is never cleared, and the exporter's reactive code does not reprocess relation data changes after it has been set.

This is confirmed by the timestamps:

# the first and the only rendering event happened at 02:46:45
root@juju-d85cbd-12-lxd-2:/var/lib/juju/agents/unit-prometheus-ceph-exporter-0/charm# grep ceph.conf /var/log/juju/unit-prometheus-ceph-exporter-0.log
2018-12-07 02:46:45 DEBUG juju-log ceph:295: Writing file /var/snap/prometheus-ceph-exporter/current/ceph.conf root:root 444

# joined events - nothing rendered after that as expected
root@juju-d85cbd-12-lxd-2:/var/lib/juju/agents/unit-prometheus-ceph-exporter-0/charm# grep ceph-relation-joined /var/log/juju/unit-prometheus-ceph-exporter-0.log
2018-12-07 02:28:11 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-joined
2018-12-07 02:28:12 DEBUG ceph-relation-joined lxc
2018-12-07 02:28:13 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-joined
2018-12-07 02:28:13 DEBUG ceph-relation-joined lxc
2018-12-07 02:29:29 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-joined
2018-12-07 02:29:29 DEBUG ceph-relation-joined lxc

root@juju-d85cbd-12-lxd-2:/var/lib/juju/agents/unit-prometheus-ceph-exporter-0/charm# grep ceph-relation-changed /var/log/juju/unit-prometheus-ceph-exporter-0.log
2018-12-07 02:28:12 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-changed
2018-12-07 02:28:12 DEBUG ceph-relation-changed lxc
2018-12-07 02:28:13 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-changed
2018-12-07 02:28:13 DEBUG ceph-relation-changed lxc
2018-12-07 02:30:10 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-changed
2018-12-07 02:30:10 DEBUG ceph-relation-changed lxc
2018-12-07 02:46:45 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-changed
2018-12-07 02:46:45 DEBUG ceph-relation-changed lxc
2018-12-07 02:46:55 DEBUG ceph-relation-changed active

^^^ this corresponds to the rendering event; no rendering attempts happen for the later relation-changed events, which results in the incorrect file content
2018-12-07 02:47:26 INFO juju-log ceph:295: Reactive main running for hook ceph-relation-changed
2018-12-07 02:47:26 DEBUG ceph-relation-changed lxc
2018-12-07 02:47:...


Changed in prometheus-ceph-exporter-charm:
importance: Undecided → Medium
Changed in prometheus-ceph-exporter-charm:
status: New → Incomplete
status: Incomplete → In Progress
assignee: nobody → Vladimir Grevtsev (vlgrevtsev)
Revision history for this message
Drew Freiberger (afreiberger) wrote :

Confirmed: the charm does not update the mon list after receiving the first mon relation with auth info.

I've added a handler that checks for changes to the mon_hosts and/or key data on the ceph-client relation and re-triggers configuration and a restart of the snap; a rough sketch follows the merge link below.

https://code.launchpad.net/~afreiberger/charm-prometheus-ceph-exporter/+git/charm-prometheus-ceph-exporter/+merge/388227
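
A rough sketch of that kind of handler (not the merged code; the mon_hosts()/key() accessors and the reconfigure helper are assumptions), using charms.reactive's data_changed helper:

from charms.reactive import when
from charms.reactive.helpers import data_changed

def reconfigure(ceph_client):
    """Hypothetical stand-in for the charm's existing render-and-restart code."""
    pass

@when('ceph.available', 'exporter.started')
def update_on_ceph_change(ceph_client):
    # Re-trigger configuration whenever the monitor list or the cephx key
    # delivered over the ceph-client relation changes.
    if data_changed('ceph.conf', {'mon_hosts': ceph_client.mon_hosts(),
                                  'key': ceph_client.key()}):
        reconfigure(ceph_client)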

Changed in charm-prometheus-ceph-exporter:
assignee: Vladimir Grevtsev (vlgrevtsev) → Drew Freiberger (afreiberger)
milestone: none → 20.08
Changed in charm-prometheus-ceph-exporter:
status: In Progress → Fix Released
