No way to enable RBD per-image IO statistics optionally using the charm

Bug #2042405 reported by Nobuto Murata
This bug affects 1 person
Affects: Ceph Monitor Charm
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Even after enabling the prometheus module in Ceph mgr, RBD per-image IO statistics are disabled by default. This is well documented upstream:
https://docs.ceph.com/en/quincy/mgr/prometheus/#rbd-io-statistics

We don't have to enable the statistics out of the box, but it would be good to expose this as a charm config option so that operators can enable the statistics for the RBD pools they need.

There was a proposed patch to charm-ceph-dashboard before, but this config should be in charm-ceph-mon instead, since it doesn't depend on whether ceph-dashboard is deployed.
https://review.opendev.org/c/openstack/charm-ceph-dashboard/+/855888/8/src/charm.py

Nobuto Murata (nobuto) wrote :

Just for the record, this is out of the box (empty)

Nobuto Murata (nobuto) wrote :

And this is after executing the following by hand.

juju exec --unit ceph-mon/leader 'ceph config set mgr mgr/prometheus/rbd_stats_pools "*"'
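For reference, the argument list behind that manual command can be sketched in Python. This is only an illustration: the helper name `rbd_stats_pools_command` is hypothetical and is not the charm's code. It assumes, per the upstream documentation linked in the description, that the option takes a comma-separated list of pools, with "*" meaning all pools.

```python
from typing import Iterable, List

# Option name taken from the command above.
OPTION = "mgr/prometheus/rbd_stats_pools"


def rbd_stats_pools_command(pools: Iterable[str]) -> List[str]:
    """Build the argv for `ceph config set mgr <OPTION> <value>`.

    Hypothetical helper: it only builds the argument list and does not
    run anything. Blank entries are dropped; an empty input yields an
    empty value, which stays in the list as its own argument.
    """
    value = ",".join(p.strip() for p in pools if p.strip())
    return ["ceph", "config", "set", "mgr", OPTION, value]


if __name__ == "__main__":
    # Mirrors the manual command: enable statistics for all pools.
    print(rbd_stats_pools_command(["*"]))
```

The resulting list could be handed to `subprocess.check_call` on a unit where the `ceph` CLI is available.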

Nobuto Murata (nobuto) wrote :

Also, we could show an informative message in the dashboard itself so nobody has to wonder why it has no data.

POC patch:
https://github.com/nobuto-m/ceph/commit/8a6974eb86abb667bab9afe1d9950f940363d81a

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)
Changed in charm-ceph-mon:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/900153
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/ffe81367e10eaf8c6fecc96f3c442eb471354bbc
Submitter: "Zuul (22348)"
Branch: master

commit ffe81367e10eaf8c6fecc96f3c442eb471354bbc
Author: Samuel Walladge <email address hidden>
Date: Mon Nov 6 17:11:19 2023 +1030

    Add config option for rbd_stats_pools

    This allows configuration RBD IO statistics collection for RBD pools.

    Co-authored-by: Yoshi Kadokawa <email address hidden>

    Closes-Bug: #2042405

    Related-Bug: #1989648
    Change-Id: I2252163533a312f0f53165f946711ab20bb0e3c9

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (stable/quincy.2)

Fix proposed to branch: stable/quincy.2
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/900899

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (stable/quincy.2)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/900899
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/398c34ae05af9905c267f4847a2c36133913d7ea
Submitter: "Zuul (22348)"
Branch: stable/quincy.2

commit 398c34ae05af9905c267f4847a2c36133913d7ea
Author: Samuel Walladge <email address hidden>
Date: Mon Nov 6 17:11:19 2023 +1030

    Add config option for rbd_stats_pools

    This allows configuration RBD IO statistics collection for RBD pools.

    Co-authored-by: Yoshi Kadokawa <email address hidden>

    Closes-Bug: #2042405

    Related-Bug: #1989648
    Change-Id: I2252163533a312f0f53165f946711ab20bb0e3c9
    (cherry picked from commit ffe81367e10eaf8c6fecc96f3c442eb471354bbc)

Shunde Zhang (shunde-zhang) wrote :

It seems juju would fail if the value is empty.

2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: 2024-01-06 05:47:06 WARNING unit.ceph-mon/0.config-changed logger.go:60 Error EINVAL: unrecognized config option 'mgr/prometheus/rbd_stats_pools'
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: 2024-01-06 05:47:06 ERROR unit.ceph-mon/0.juju-log server.go:316 Uncaught exception while in charm code:
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: Traceback (most recent call last):
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "./src/charm.py", line 310, in <module>
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: main(CephMonCharm)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/main.py", line 436, in main
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: _emit_charm_event(charm, dispatcher.event_name)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: event_to_emit.emit(*args, **kwargs)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/framework.py", line 351, in emit
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: framework._emit(event)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/framework.py", line 853, in _emit
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: self._reemit(event_path)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops/framework.py", line 942, in _reemit
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: custom_handler(event)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/ops_openstack/core.py", line 260, in _on_config
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: self.on_config(event)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "./src/charm.py", line 87, in on_config
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: if hooks.config_changed():
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/charmhelpers/contrib/hardening/harden.py", line 90, in _harden_inner2
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: return f(*args, **kwargs)
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/src/ceph_hooks.py", line 381, in config_changed
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: mgr_config_set_rbd_stats_pools()
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/src/utils.py", line 427, in mgr_config_set_rbd_stats_pools
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: ceph_utils.mgr_config_set(
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: File "/var/lib/juju/agents/unit-ceph-mon-0/charm/venv/charms_ceph/utils.py", line 3569, in ceph_config_set
2024-01-06 05:47:08 [ERROR] unit-ceph-mon-0.log: subprocess.check_call(['ceph', 'config', 'set', who, name, value])
2024-...


Shunde Zhang (shunde-zhang) wrote :

When the value is empty, it seems to execute:
ceph config set mgr mgr/prometheus/rbd_stats_pools

But it should execute:
ceph config set mgr mgr/prometheus/rbd_stats_pools ''

May need to update charms_ceph/utils.py to handle empty values properly
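One way to check this hypothesis, as a minimal sketch: when Python's `subprocess` is given a list, each element is passed to the child as its own argument, and an empty string is still passed as an (empty) argument. The helper name `count_child_args` is hypothetical, and a child Python process stands in for the `ceph` CLI.

```python
import subprocess
import sys


def count_child_args(args):
    """Run a child Python process and return how many arguments it received.

    Hypothetical helper used only to observe how a list-style argv is
    delivered to the child process.
    """
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]
        + list(args)
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # The empty string is delivered as its own argument, not dropped.
    print(count_child_args(["mgr/prometheus/rbd_stats_pools", ""]))
    print(count_child_args(["mgr/prometheus/rbd_stats_pools"]))
```

If the two counts differ, the list form in the traceback (`subprocess.check_call(['ceph', 'config', 'set', who, name, value])`) would not by itself drop an empty value.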

Shunde Zhang (shunde-zhang) wrote :

After some more tests, it turns out that this command fails whether or not I give it a value.
It seems the command is executed too early, at a point when ceph mon is not initialized yet.

2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:05:59 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Ceph is not bootstrapped, skipping upgrade checks.
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:05:59 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Monitor hosts are ['10.5.2.27']
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:00 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Settings for the cluster are: {'fsid': '0be52cf2-ac62-11ee-8f8a-9109e1518e46', 'monitor-secret': 'AQDY+5hlxaxyChAA1kj4CSKU3tq2z1uDNTUgGg=='}
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:00 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Making dir /var/lib/charm/secondary-ceph-mon ceph:ceph 555
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:00 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Making dir /var/run/ceph ceph:ceph 755
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:00 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Making dir /var/lib/ceph/mon/ceph-juju-9d6cae-zaza-34cc90b87300-5 ceph:ceph 755
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:00 WARNING unit.secondary-ceph-mon/0.config-changed logger.go:60 Created symlink /<email address hidden> → /lib/systemd/system/ceph-mon@.service.
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:01 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Waiting for quorum to be reached
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:04 WARNING unit.secondary-ceph-mon/0.config-changed logger.go:60 2024-01-06T07:06:04.580+0000 7f9d064f9700 -1 auth: unable to find a keyring on /var/lib/ceph/mon/ceph-/keyring: (2) No such file or directory
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:04 WARNING unit.secondary-ceph-mon/0.config-changed logger.go:60 2024-01-06T07:06:04.580+0000 7f9d064f9700 -1 AuthRegistry(0x7f9d00060240) no keyring found at /var/lib/ceph/mon/ceph-/keyring, disabling cephx
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:04 INFO unit.secondary-ceph-mon/0.juju-log server.go:316 Making dir /var/lib/ceph/mgr/ceph-juju-9d6cae-zaza-34cc90b87300-5 ceph:ceph 555
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:05 WARNING unit.secondary-ceph-mon/0.config-changed logger.go:60 Created symlink /<email address hidden> → /lib/systemd/system/ceph-mgr@.service.
2024-01-06 07:06:08 [ERROR] unit-secondary-ceph-mon-0.log: 2024-01-06 07:06:07 WARNING unit.secondary-ceph-mon/0.config-changed logger.go:60 Error EINVAL: unrecognized config option 'mgr/prometheus/rbd_stats_pools'

Maybe it needs to wait until mon is bootstrapped then run the com...


Nobuto Murata (nobuto) wrote :

The command is already guarded by ceph_utils.is_bootstrapped(), so it sounds like a race condition on whether the mgr and the module are enabled by that point.

Do you have a minimal reproducer e.g. bundle to trigger this reliably?

Luciano Lo Giudice (lmlogiudice) wrote :

We've hit this issue during CI runs of the radosgw charm. AFAIK that's a reliable reproducer.

This latest bug in particular is being handled here: https://review.opendev.org/c/openstack/charm-ceph-mon/+/904594

Shunde Zhang (shunde-zhang) wrote :

The fix in ceph-mon charm r199 works! Thanks for looking into this.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-ceph-mon (stable/quincy.2)

Related fix proposed to branch: stable/quincy.2
Review: https://review.opendev.org/c/openstack/charm-ceph-mon/+/914836

OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-ceph-mon (stable/quincy.2)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/914836
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/01bd228242a1fe437e9f0f544aa4b4e093c56e2d
Submitter: "Zuul (22348)"
Branch: stable/quincy.2

commit 01bd228242a1fe437e9f0f544aa4b4e093c56e2d
Author: Luciano Lo Giudice <email address hidden>
Date: Wed Jan 3 18:10:30 2024 -0300

    Retry setting rbd_stats_pools prometheus config

    Setting the 'mgr/prometheus/rbd_stats_pools' option can fail
    if we arrive too early, even if the cluster is bootstrapped. This is
    particularly seen in ceph-radosgw test runs. This patchset thus
    adds a retry decorator to work around this issue.

    Related-Bug: #2042405
    Related-Bug: #2058636

    Change-Id: Id9b7b903e67154e7d2bb6fecbeef7fac126804a8
    (cherry picked from commit d76939ef70bd5016a6e515558de1b9eabe9d0d55)
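The retry approach described in the commit message above can be sketched roughly as follows. This is not the charm's actual code: the `retry` decorator, its parameters, and `set_rbd_stats_pools` are hypothetical names, and the failing call is simulated rather than invoking `ceph`.

```python
import functools
import time


def retry(attempts=3, delay=0.0, exceptions=(Exception,)):
    """Retry the wrapped function up to `attempts` times.

    Hypothetical decorator: re-raises the last exception once the
    attempts are exhausted, sleeping `delay` seconds between tries.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


# Simulated state: pretend the prometheus module only recognises the
# option from the third call onwards.
calls = {"count": 0}


@retry(attempts=5, delay=0.0, exceptions=(RuntimeError,))
def set_rbd_stats_pools():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Error EINVAL: unrecognized config option")
    return "ok"
```

In a real deployment the delay would be non-zero so that the mgr has time to load the module between attempts.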
