Ceph Monitor Charm

Bug #2055143
Comment #0

Nishant Dash (dash3) wrote on 2024-02-27:

- Given an NTP misconfiguration or a timeout reaching the NTP servers, there can be a temporary situation where there is time drift or clock skew across a ceph-mon cluster. While this in itself is really bad, what I see is that the ceph-mon@... service (/usr/bin/ceph-mon) ends up taking gigabytes of memory (it maxed out the node's memory at 24G) when it should normally take a lot less (about 500-800M for the cluster I have).

- My setup:
3 VMs, with the microk8s charm plus the ceph-mon and ceph-osd charms deployed to them
```
ceph-csi                v3.9.0,v0,v3...  active  3  ceph-csi       1.28/stable    37   no  Versions: cephfs=v3.9.0, config=v0, rbd=v3.9.0
ceph-mon                17.2.6           active  3  ceph-mon       quincy/stable  201  no  Unit is ready and clustered
ceph-osd                17.2.6           active  3  ceph-osd       quincy/stable  576  no  Unit is ready (1 OSD)
grafana-agent-microk8s                   active  3  grafana-agent  edge           52   no
microk8s                1.28.3           active  3  microk8s       1.28/stable    213  no  node is ready
ntp                     4.2              active  3  ntp            stable         50   no  chrony: Ready
```

- To reproduce (1:1), you can configure unreachable NTP servers so that chronyc essentially has nothing to sync from, then wait until a clock skew is introduced.
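
While waiting for the skew to build up, it helps to sample the resident memory of the ceph-mon process so the growth from the usual 500-800M can be seen over time. The following is only a minimal sketch, not part of the charm or of Ceph: the helper names (`is_ceph_mon_cmdline`, `parse_vmrss_kib`, `ceph_mon_rss_kib`) are made up for illustration, and it assumes a Linux host where the monitor runs as `/usr/bin/ceph-mon` and `/proc` is readable.

```python
from pathlib import Path
import re


def is_ceph_mon_cmdline(cmdline: bytes) -> bool:
    """True if argv[0] of a NUL-separated /proc/<pid>/cmdline is ceph-mon."""
    argv0 = cmdline.split(b"\0", 1)[0].decode(errors="replace")
    # Assumption: the monitor is started as /usr/bin/ceph-mon (or plain ceph-mon).
    return argv0.rsplit("/", 1)[-1] == "ceph-mon"


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value (in kB as reported) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def ceph_mon_rss_kib() -> dict[int, int]:
    """Map pid -> resident set size for every running ceph-mon process."""
    result = {}
    for cmdline_path in Path("/proc").glob("[0-9]*/cmdline"):
        proc_dir = cmdline_path.parent
        try:
            if is_ceph_mon_cmdline(cmdline_path.read_bytes()):
                status_text = (proc_dir / "status").read_text()
                result[int(proc_dir.name)] = parse_vmrss_kib(status_text)
        except (OSError, ValueError):
            # The process exited between listing and reading, or has no VmRSS.
            continue
    return result


if __name__ == "__main__":
    for pid, rss in ceph_mon_rss_kib().items():
        print(f"ceph-mon pid {pid}: RSS {rss / 1024:.0f} MiB")
```

Running this periodically on each of the three VMs (for example from a shell loop or a timer) gives one RSS reading per monitor per sample, which can then be lined up against the time at which the NTP sources became unreachable.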