increase registry memory limit

Bug #1815685 reported by Paul Collins
This bug affects 1 person
Affects: Kubernetes Worker Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

One of our larger kubernetes clusters recently encountered difficulty respawning pods after a node reboot. We had about 24 pods stuck in ImagePullBackoff due to image pulls repeatedly timing out. Manual "docker pull" and even simple GET requests with curl also yielded hangs and/or timeouts.

The node hosting the registry pod seemed generally healthy, with plenty of free memory, although i/o wait was averaging 50% or so. iotop attributed almost all of the i/o to the registry process, although it was somewhat bursty. The VM hosting the worker instance was also extremely lightly loaded (and its VM storage uses an SSD cache).

Curiously (or so it seemed at the time), the registry process had an RSS of about 98 MB and a VSZ of over 300 MB, despite the worker having multiple GB of memory free. Eventually I noticed that paging i/o on this worker was also high.

It seems the registry process was thrashing due to its low memory limit combined with a comparatively high request rate. Later, as pods finally started to spawn successfully, I observed that both i/o wait and paging i/o on the node in question were almost zero, and my test docker pulls were also completing quickly.

It's probably difficult to pick a limit that makes sense for every possible deployment, but if the registry isn't known to have ruinous memory leaks, I'd suggest that increasing the memory limit by a few hundred MB would make sense, especially for deployments such as ours that have 16 GB instances serving as kubernetes-worker units.
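
For illustration only, here is a rough sketch of the kind of bump I have in mind, using the kubernetes Python client to patch the registry's memory limit. The deployment, namespace, and container names below ("kube-registry", "kube-system", "registry") are placeholders, as is the 512Mi value; they would need to match whatever the charm actually deploys.

    # Sketch: raise the registry's memory limit by patching its Deployment.
    # All names and values below are placeholders, not the charm's actual ones.
    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "registry",  # placeholder container name
                            "resources": {
                                # Illustrative values; the point is a few hundred MB
                                # more headroom than the current limit.
                                "limits": {"memory": "512Mi"},
                                "requests": {"memory": "256Mi"},
                            },
                        }
                    ]
                }
            }
        }
    }

    apps.patch_namespaced_deployment(
        name="kube-registry",     # placeholder deployment name
        namespace="kube-system",  # placeholder namespace
        body=patch,
    )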
