We sometimes get these tracebacks in CI (edited for brevity):
mon-relation-changed logger.go:60 Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-9
mon-relation-changed logger.go:60 Running command: /usr/bin/systemctl enable ceph-volume@lvm-9-a48e34ad-193e-40dd-ac22-17b2a0920877
mon-relation-changed logger.go:60 stderr: Created symlink /<email address hidden> → /lib/systemd/system/ceph-volume@.service.
mon-relation-changed logger.go:60 Running command: /usr/bin/systemctl enable --runtime ceph-osd@9
mon-relation-changed logger.go:60 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@9.service → /lib/systemd/system/ceph-osd@.service.
mon-relation-changed logger.go:60 Running command: /usr/bin/systemctl start ceph-osd@9
mon-relation-changed logger.go:60 --> ceph-volume lvm activate successful for osd ID: 9
mon-relation-changed logger.go:60 --> ceph-volume lvm create successful for: ceph-a48e34ad-193e-40dd-ac22-17b2a0920877/osd-block-a48e34ad-193e-40dd-ac22-17b2a0920877
mon-relation-changed logger.go:60 Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n"
mon-relation-changed logger.go:60 Traceback (most recent call last):
mon-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/mon-relation-changed", line 908, in <module>
mon-relation-changed logger.go:60 hooks.execute(sys.argv)
mon-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/core/hookenv.py", line 963, in execute
mon-relation-changed logger.go:60 self._hooks[hook_name]()
mon-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/mon-relation-changed", line 668, in mon_relation
mon-relation-changed logger.go:60 ceph.apply_osd_settings(settings)
mon-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-ceph-osd-0/charm/lib/charms_ceph/utils.py", line 3425, in apply_osd_settings
mon-relation-changed logger.go:60 subprocess.check_output(cmd.split()).decode('UTF-8'))
mon-relation-changed logger.go:60 File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
mon-relation-changed logger.go:60 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
mon-relation-changed logger.go:60 File "/usr/lib/python3.10/subprocess.py", line 526, in run
mon-relation-changed logger.go:60 raise CalledProcessError(retcode, process.args,
mon-relation-changed logger.go:60 subprocess.CalledProcessError: Command '['ceph', 'daemon', 'osd.9', 'config', '--format=json', 'get', 'osd_heartbeat_grace']' returned non-zero exit status 22.
I believe this is caused by a race in the mon-relation-changed hook, which calls `ceph.apply_osd_settings(settings)` shortly after `prepare_disks_and_activate()`.
The prepare call also starts the OSDs, but it does so asynchronously: it returns before the OSD service is fully up. Applying OSD settings, on the other hand, requires the OSD to be listening on its admin socket, so the two calls can race, which I believe results in the error above.
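
One possible mitigation, sketched below, is to poll the admin socket until it answers before applying settings. This is only an illustration: `wait_for_osd_admin_socket` is a hypothetical helper that does not exist in the charm or in charms_ceph, the timeout and interval values are arbitrary, and the probe simply reuses the `ceph daemon ... config get` command from the traceback.

```python
import subprocess
import time


def wait_for_osd_admin_socket(osd_id, timeout=60.0, interval=2.0,
                              run=subprocess.check_output,
                              sleep=time.sleep, clock=time.monotonic):
    """Return True once the OSD answers on its admin socket, else False.

    Hypothetical helper (not part of charms_ceph). The run, sleep and
    clock parameters exist only so the loop can be exercised without a
    real cluster.
    """
    # Same command that failed in the traceback above.
    cmd = ['ceph', 'daemon', 'osd.{}'.format(osd_id), 'config',
           '--format=json', 'get', 'osd_heartbeat_grace']
    deadline = clock() + timeout
    while True:
        try:
            run(cmd, stderr=subprocess.STDOUT)
            return True
        except (subprocess.CalledProcessError, OSError):
            # Admin socket not ready yet (or ceph CLI unavailable):
            # give up once the deadline has passed, otherwise retry.
            if clock() >= deadline:
                return False
            sleep(interval)
```

In the hook, something like this could run for each newly created OSD between `prepare_disks_and_activate()` and `ceph.apply_osd_settings(settings)`; how the hook would obtain the OSD ids, and what it should do when the wait times out, is left open here.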