Mysql pods being restarted by possible OOM killer?
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Snap | Fix Released | Critical | Guillaume Boutry |
Bug Description
Once OpenStack is up and running, the MySQL instances go offline into maintenance, even if the deployment is left unused.
The cause appears to be the OOM killer. Note exit code 137 in the pod description below.
mysql:
Container ID: containerd:
Image: registry.
Image ID: registry.
Port: <none>
Host Port: <none>
Command:
/
Args:
run
--create-dirs
--hold
--http
:38813
--verbose
State: Running
Started: Tue, 30 Jan 2024 14:34:07 +0000
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Tue, 30 Jan 2024 14:27:20 +0000
Finished: Tue, 30 Jan 2024 14:34:05 +0000
Ready: True
Restart Count: 74
Limits:
memory: 2Gi
Requests:
memory: 2Gi
Liveness: http-get http://
Readiness: http-get http://
Environment:
JUJU_
PEBBLE_
Mounts:
/
/
/
/
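For reference, exit codes above 128 conventionally mean "terminated by signal (code - 128)", so 137 corresponds to signal 9 (SIGKILL). A minimal sketch of that decoding follows; the function name is made up for illustration. SIGKILL is consistent with the OOM killer but does not prove it, since anything that sends SIGKILL produces the same code.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention.

    Hypothetical helper for illustration; not part of kubectl or Juju.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_exit_code(137))
print(describe_exit_code(0))
```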
It seems this happens around the time the scheduled process (Juju-based?) runs logrotate etc. It's possible some of the noise in the logs exacerbates this (locale and MySQL password plugin). I have seen the OOM killer take out containers when the file cache contributes extra weight to the container; the other solution may be to log to the backing store?
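One way to check the file-cache theory would be to compare anonymous and file-backed memory in the container's cgroup. The sketch below assumes cgroup v2, where `memory.stat` contains `anon` and `file` counters in bytes. The function name and the sample numbers are invented for illustration and are not measurements from this deployment.

```python
def split_cgroup_memory(memory_stat_text: str) -> dict:
    """Return anonymous vs. file-backed bytes from cgroup v2 memory.stat text.

    Hypothetical helper; expects lines of the form "<key> <integer>".
    """
    stats = {}
    for line in memory_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return {"anon": stats.get("anon", 0), "file": stats.get("file", 0)}


# Made-up sample values, NOT taken from the affected pods.
sample = "anon 1500000000\nfile 600000000\nkernel 30000000\n"
usage = split_cgroup_memory(sample)
total = usage["anon"] + usage["file"]
print(f"anon={usage['anon']} file={usage['file']} "
      f"file share={usage['file'] / total:.0%}")
```

Inside a container on a cgroup v2 host, the real text would typically come from `/sys/fs/cgroup/memory.stat`. A large `file` figure shortly before a restart would support the idea that log writes are adding to the container's charged memory.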
Observed on the openstack snap, version 2023.2, revision 375, channel 2023.2/edge.
Updating mysql and mysql router to 120/edge and 93/edge respectively doesn't help despite a healthcheck fix.
Changed in snap-openstack:
status: Fix Committed → Fix Released
Using a quick and dirty 'kubectl edit statefulset' on the mysql config to bump the memory limit from 2Gi to 4Gi clears the OOM restarts (exit code 137). However, general instability continues, for example:
openstack cinder-mysql-0 2/2 Running 0 32m
openstack placement-mysql-0 2/2 Running 2 (6m50s ago) 32m
openstack horizon-mysql-0 2/2 Running 1 (6m49s ago) 31m
openstack nova-mysql-0 2/2 Running 1 (2m1s ago) 32m
openstack neutron-mysql-0 2/2 Running 2 (118s ago) 30m
openstack keystone-mysql-0 2/2 Running 1 (118s ago) 35m
openstack glance-mysql-0 2/2 Running 2 (117s ago) 31m
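To track which pods are still restarting, a listing like the one above can be filtered mechanically. This is a rough sketch that assumes the column order shown here (namespace, name, ready, status, restarts, age); the function name is made up.

```python
def restarted_pods(listing: str) -> dict:
    """Map pod name to restart count for pods with at least one restart.

    Assumes whitespace-separated columns:
    NAMESPACE NAME READY STATUS RESTARTS [(... ago)] AGE
    """
    counts = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip header rows and anything that does not look like a pod line.
        if len(fields) < 6 or not fields[4].isdigit():
            continue
        restarts = int(fields[4])
        if restarts > 0:
            counts[fields[1]] = restarts
    return counts


listing = """\
openstack cinder-mysql-0 2/2 Running 0 32m
openstack placement-mysql-0 2/2 Running 2 (6m50s ago) 32m
openstack horizon-mysql-0 2/2 Running 1 (6m49s ago) 31m
"""
print(restarted_pods(listing))
```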
Querying placement-mysql-0 shows an exit code of 0 and 2 restarts, so something else is still triggering this.
State: Running
Started: Fri, 08 Feb 2024 18:51:13 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 08 Feb 2024 18:45:05 +0000
Finished: Fri, 08 Feb 2024 18:51:13 +0000
Ready: True
Restart Count: 2
Limits:
memory: 4Gi
Requests:
memory: 4Gi
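For anyone wanting to repeat the 2Gi to 4Gi bump without an interactive edit, the same change could be expressed as a `kubectl patch`. The sketch below only builds the command. It assumes the StatefulSet is named after the pod without its ordinal (`placement-mysql`) and that the container is the `mysql` one shown in the pod description; the function name is made up. Treat it as a workaround sketch, not the released fix.

```python
import json
import shlex


def memory_patch_command(namespace: str, statefulset: str,
                         container: str, memory: str) -> list:
    """Build a kubectl command that sets one container's memory limit and request.

    Hypothetical helper; uses a strategic merge patch, which matches
    containers in the pod template by name.
    """
    patch = {
        "spec": {"template": {"spec": {"containers": [{
            "name": container,
            "resources": {
                "limits": {"memory": memory},
                "requests": {"memory": memory},
            },
        }]}}}
    }
    return ["kubectl", "-n", namespace, "patch", "statefulset", statefulset,
            "--type", "strategic", "-p", json.dumps(patch)]


# Names below are assumptions inferred from the pod listing in this report.
cmd = memory_patch_command("openstack", "placement-mysql", "mysql", "4Gi")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run` directly.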