Activity log for bug #2055143

Date Who What changed Old value New value Message
2024-02-27 10:17:21 Nishant Dash bug added bug
2024-02-27 10:18:01 Nishant Dash description

Old value:

- Given an NTP misconfiguration or a timeout reaching the NTP servers, there can be a temporary situation where there is time drift or clock skew across a ceph-mon cluster. While this in itself is really bad, what I see with ceph-mon is that the ceph-mon@... service (/usr/bin/ceph-mon) ends up using gigabytes of memory (it maxed out the node's memory at 24G) when it should normally use far less (for the cluster I have, about 500-800M).

- My setup: 3 VMs, with the microk8s charm deployed to them plus the ceph-mon and ceph-osd charms:

```
App                     Version          Status  Scale  Charm          Channel        Rev  Exposed  Message
ceph-csi                v3.9.0,v0,v3...  active      3  ceph-csi       1.28/stable     37  no       Versions: cephfs=v3.9.0, config=v0, rbd=v3.9.0
ceph-mon                17.2.6           active      3  ceph-mon       quincy/stable  201  no       Unit is ready and clustered
ceph-osd                17.2.6           active      3  ceph-osd       quincy/stable  576  no       Unit is ready (1 OSD)
grafana-agent-microk8s                   active      3  grafana-agent  edge            52  no
microk8s                1.28.3           active      3  microk8s       1.28/stable    213  no       node is ready
ntp                     4.2              active      3  ntp            stable          50  no       chrony: Ready
```

- To reproduce (1:1), you can set unreachable NTP servers so that chrony essentially has nothing to sync from, until a clock skew is introduced (a command-level sketch follows the log below).

New value:

Identical to the old value, with this line appended: "Let me know if there is anything more I can provide and clarify."
2024-02-28 07:33:54 Nishant Dash description

Old value: the description as of 2024-02-27 10:18:01 above (unchanged).

New value: the same text with the "To reproduce (1:1) ..." bullet removed; the rest, including the closing "Let me know if there is anything more I can provide and clarify", is unchanged.
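The reproduction step in the description is concrete enough to sketch at the command level. The following is a rough, untested sequence based on that description only; it assumes the ntp charm exposes a "source" option for its upstream servers, uses 192.0.2.1 (a TEST-NET-1 documentation address) purely as a stand-in for an unreachable server, and assumes the Ceph CLI and chrony tools are available on the ceph-mon machines.

```
# From the Juju client: point chrony at an address that will never answer
# NTP queries, so it has nothing to sync from. "source" is assumed to be
# the ntp charm's option for upstream servers; 192.0.2.1 is a stand-in.
juju config ntp source="192.0.2.1"

# On one of the ceph-mon machines:
chronyc sources                            # confirm chrony has no reachable source
sudo date -s "$(date -d '+5 minutes')"     # push the clock forward to introduce skew
sudo ceph health detail | grep -i clock    # expect a clock-skew (MON_CLOCK_SKEW) warning

# Then watch the monitor's resident memory; per the report it normally sits
# around 500-800M for this cluster but climbs to many gigabytes under skew.
watch -n 30 'ps -C ceph-mon -o pid,rss,etime,cmd'
```

Once finished, restoring the original "source" value and letting chrony step the clock back into sync (e.g. sudo chronyc makestep) ends the skew condition.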