kubelet image garbage collection settings too high with no mechanism to reconfigure
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Jim Gauld | | |
Bug Description
Brief Description
-----------------
/var/lib/docker file-system alarm thresholds are hit before the kubelet's configurable image garbage collection (imageGC) kicks in. There is a global 80% alarm threshold for all file-systems, but the kubelet default imageGC only kicks in at 85% usage.
If there is extraneous remnant docker data, or if a customer has large images (e.g., 2.5GB), there is little room left once usage goes beyond 85%: pod hard-evictions occur and new pods cannot be scheduled on the node due to Node pressure. The kubelet default hard-eviction limit for imagefs is 15% available. This means there is effectively zero room left at 86% /var/lib/docker usage before pods stop scheduling, so the remaining docker space cannot be used.
The imageGC threshold needs to be reduced below 80%.
The hard-eviction limit for imagefs should be configured to a more reasonable value such as 1GiB or 2GiB instead of 15% (e.g., 15% of a 30GiB file-system is 4.5GiB).
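The arithmetic behind these numbers can be checked with a small sketch. It assumes the 30GiB file-system size used in the example above; the function names are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch only: relates an imagefs.available reserve to the usage percentage
# at which hard eviction would start. Function names are hypothetical.

# pct_of_mib SIZE_MIB PCT -> MiB equal to PCT percent of SIZE_MIB
pct_of_mib() {
    echo $(( $1 * $2 / 100 ))
}

# eviction_usage_pct SIZE_MIB RESERVE_MIB -> usage % (rounded down) at which
# available space has dropped to RESERVE_MIB
eviction_usage_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

size_mib=30720                                   # 30GiB, as in the example
reserve_mib=$(pct_of_mib "$size_mib" 15)         # default 15% reserve
echo "15% reserve: ${reserve_mib} MiB"
echo "15% reserve evicts at: $(eviction_usage_pct "$size_mib" "$reserve_mib")% used"
echo "2GiB reserve evicts at: $(eviction_usage_pct "$size_mib" 2048)% used"
echo "1GiB reserve evicts at: $(eviction_usage_pct "$size_mib" 1024)% used"
```

With a 15% reserve, the reserve is 4608 MiB (4.5GiB) and eviction begins as usage passes 85%, which is the same point where default imageGC starts. An absolute reserve of 1GiB or 2GiB moves the eviction point well above the imageGC threshold.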
Once a system has been installed via 'kubeadm', there is currently no mechanism to update any kubelet configuration settings from their initial values.
A mechanism is needed to update kubelet-config values and persist those changes on kubernetes nodes.
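For reference, the settings discussed here are fields of the upstream KubeletConfiguration. The sketch below writes an illustrative fragment to a scratch file. The scratch path and the imageGC percentages (75 and 70) are assumptions chosen only to sit below the 80% alarm threshold, not the values that were released. The 2Gi reserve is one of the values suggested above.

```shell
#!/bin/sh
# Sketch only: writes an illustrative KubeletConfiguration fragment to a
# scratch file. The path and the imageGC percentages are assumptions.
fragment="${TMPDIR:-/tmp}/kubelet-config-example.yaml"

cat > "$fragment" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Start image GC before the 80% file-system alarm (placeholder values).
imageGCHighThresholdPercent: 75
imageGCLowThresholdPercent: 70
evictionHard:
  # Absolute reserve instead of the 15% default.
  # Other evictionHard signals are omitted here for brevity.
  imagefs.available: "2Gi"
EOF

# Show the lines relevant to this report.
grep -E 'imageGC|imagefs' "$fragment"
```

The fragment only illustrates which fields are involved. How the values are applied and persisted on each node is the mechanism this report asks for.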
Severity
--------
Major: Cannot update kubelet-config settings. Sites require manual monitoring and periodic manual steps to recover /var/lib/docker usage. There is potential for a site to 'blow up'.
Steps to Reproduce
------------------
Fresh install ISO. Manually pull in various docker images.
The initial n3000-opae docker image and other manual docker images/pulls remain.
Expected Behavior
------------------
Expect to see kubelet logs where image garbage collection (imageGC) kicks in and automatically removes images before the 80% file-system alarms are raised. Expect no /var/lib/docker file-system alarms.
daemon.
Actual Behavior
----------------
See /var/lib/docker disk usage hitting 85% before imageGC removes images.
See pods being evicted and then failing to schedule due to Node pressure when usage hits 86%. Unable to actually use the last 4.5GiB of the /var/lib/docker filesystem.
The /var/lib/docker/x directory may contain remnants of initial install, where all the new CRI stuff is under /var/lib/
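The behaviour described in this section can be summarised by comparing the current usage percentage against the default thresholds quoted in this report (80% alarm, 85% imageGC, 86% eviction). The following is a simplified sketch; the function names are hypothetical and the integer comparison ignores fractional percentages.

```shell
#!/bin/sh
# Sketch only: classify /var/lib/docker usage against the default thresholds
# quoted in this report. Function names are hypothetical.

# usage_pct PATH -> integer Use% of the file system holding PATH
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# classify_usage PCT -> highest default threshold reached at PCT percent used
classify_usage() {
    if [ "$1" -ge 86 ]; then
        echo "hard-eviction"      # imagefs.available has fallen below 15%
    elif [ "$1" -ge 85 ]; then
        echo "image-gc"           # default imageGC high threshold
    elif [ "$1" -ge 80 ]; then
        echo "fs-alarm"           # global file-system alarm threshold
    else
        echo "ok"
    fi
}

dir="${1:-/var/lib/docker}"
if [ -d "$dir" ]; then
    pct=$(usage_pct "$dir")
    echo "$dir: ${pct}% used -> $(classify_usage "$pct")"
fi
```

The narrow band between "image-gc" and "hard-eviction" is the core of the problem: with the defaults there is about one percentage point in which imageGC can act before eviction starts.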
Reproducibility
---------------
100%. We always get default imageGC and hard-eviction settings.
We always see some docker usage from initial install that never gets removed.
Occasionally we see huge docker usage outside of CRI due to stuff that was never removed.
System Configuration
-------
AIO-DX. All K8S configurations.
Branch/Pull Time/Commit
-------
BUILD_DATE=
Last Pass
---------
Day one issue.
Timestamp/Logs
--------------
Can see collectd related filesystem logs change over time as there are step jumps in usage:
zgrep collectd daemon.log.3.gz daemon.log.4.gz daemon.log.3.gz daemon.log.2.gz daemon.log.1.gz daemon.log |grep -e reading |grep docker
daemon.
When kubelet image GC runs, will see logs like:
2022-04-
Will see overall file-system usage like this:
Filesystem Type 1M-blocks Used Available Use% Mounted on
/dev/mapper/
In the case where too much space is consumed outside of CRI (e.g., 6GB or much more), image GC can no longer clean up enough space, and logs like this are seen:
daemon.
Test Activity
-------------
Feature Testing, Evaluation
Workaround
----------
Periodically and manually clean up /var/lib/docker using commands like:
docker system prune --force
crictl rmi --prune
Can manually inspect and remove individual images too,
eg, "docker rmi x", "crictl rmi x"
Manually clean up evicted pods like this:
crictl ps --state=Exited --quiet | xargs -r -I {} crictl rm {}
This does not address being able to reconfigure kubelet settings.
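Until the settings can be reconfigured, the manual steps could be wrapped in a small periodic check. The sketch below is one possible shape, not part of the fix: the 75% limit, the DRY_RUN and LIMIT_PCT variables, and the function names are all assumptions. It only prints the prune commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: run the prune commands from the workaround when
# /var/lib/docker usage reaches a limit. Names and the 75% default limit
# are assumptions for illustration.

# should_prune USED_PCT LIMIT_PCT -> succeeds when USED_PCT >= LIMIT_PCT
should_prune() {
    [ "$1" -ge "$2" ]
}

# prune_images: prints the workaround commands, or runs them when DRY_RUN=0
prune_images() {
    for cmd in "docker system prune --force" "crictl rmi --prune"; do
        if [ "${DRY_RUN:-1}" = "0" ]; then
            $cmd
        else
            echo "would run: $cmd"
        fi
    done
}

# Current Use% of /var/lib/docker; empty if the path does not exist.
used=$(df -P /var/lib/docker 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
if [ -n "$used" ] && should_prune "$used" "${LIMIT_PCT:-75}"; then
    prune_images
fi
```

Like the manual commands, this only reclaims space; it does not change the kubelet thresholds themselves.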
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
Changed in starlingx:
status: New → In Progress
tags: added: stx.containers
Changed in starlingx:
importance: Undecided → Medium
Changed in starlingx:
status: In Progress → Fix Released
Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/844305
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/9400162956a032747866cb2dc3a1a052bf014a5b
Submitter: "Zuul (22348)"
Branch: master
commit 9400162956a032747866cb2dc3a1a052bf014a5b
Author: Jim Gauld <email address hidden>
Date: Wed Jun 1 10:24:46 2022 -0400
Updated kubelet imageGC and evictionHard settings
Configure kubelet-config settings for image garbage collection and
hard eviction. New settings reduce likelihood of Node-Pressure
Eviction that occurs essentially near 86% /var/lib/docker usage.
The upstream default imageGCHighThresholdPercent of 85 is too high,
especially with evictionHard imagefs.available default of 15%.
The new image garbage collection parameters are engineered below
the system global default 80% file-system threshold. This allows
kubelet imageGC to cleanup space prior to hitting /var/lib/docker
alarms.
The evictionHard imagefs.available is reduced to 2Gi,
from the previous setting 15% which translated to 4.5Gi.
TESTING:
PASS - AIO-DX fresh install gets updated kubelet config
PASS - manually fill /var/lib/docker to exceed imageGC and
verify GC operate
PASS - manually fill /var/lib/docker to exceed 'size - 2Gi'
and verify Node-Pressure eviction
Partial-Bug: 1977754
Signed-off-by: Jim Gauld <email address hidden>
Change-Id: I5c5c7ba5dfcd8f854084ee954338d974726ea453