Disk cache can use all of memory, leaving insufficient memory for squid's in-memory cache
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Repository Cache Charm | New | Medium | Unassigned |
ubuntu-repository-cache (Juju Charms Collection) | Won't Fix | Medium | Unassigned |
Bug Description
Reviewing size_caches() in lib/ubuntu-
The design of size_caches() was informed by http://
Squid can use up to 1/2 of total system memory; the remainder was set aside for the OS, file cache of metadata for Apache, and file cache of squid objects. Squid's half of memory was allotted as follows (see the sketch after this list):
- 100MB set aside for squid overhead
- 256MB minimum for the memory cache, plus the remainder of 0.5*total_
- disk cache up to (available_disk / 1024 * 20)MB
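
A minimal Python sketch of that allocation, assuming all sizes are in MB; the function signature and constant names are illustrative, not the charm's actual code:

```python
SQUID_OVERHEAD_MB = 100   # set aside for squid overhead
MIN_CACHE_MEM_MB = 256    # minimum in-memory cache (cache_mem)

def size_caches(total_mem_mb, available_disk_mb):
    """Split squid's half of system memory between cache_mem and the
    memory allotted to the disk cache, per the scheme above."""
    squid_budget_mb = total_mem_mb // 2
    # Grows with disk size and is never capped against the budget.
    disk_cache_mem_mb = available_disk_mb // 1024 * 20
    cache_mem_mb = max(MIN_CACHE_MEM_MB,
                       squid_budget_mb - SQUID_OVERHEAD_MB - disk_cache_mem_mb)
    return cache_mem_mb, disk_cache_mem_mb
```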
My concern is that there is no cap on disk cache usage, so a very large disk results in very little memory for squid's in-memory caching (the cache_mem config option). Per https:/
The example configuration that concerns me is an EC2 i2.xlarge with 30GB RAM and an 800GB ephemeral disk. 800GB would size the disk cache allocation at a max of roughly 15GB, leaving the memory cache at its 256MB minimum, which is too small.
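
Plugging the i2.xlarge figures into the sketch above shows the collapse:

```python
# 30GB RAM, 800GB ephemeral disk (MB units, as assumed above)
cache_mem, disk_cache = size_caches(30 * 1024, 800 * 1024)
# disk_cache = 800 * 20 = 16000MB (~15GB), more than the 15260MB left
# after overhead in the 15360MB budget, so cache_mem collapses to 256MB.
```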
We have recommended 200GB of storage and 24GB RAM, which leads to a better balance. The documented testing configuration is a c3.8xlarge with 32GB RAM and a 320GB ephemeral disk, which in practice gives an in-memory cache of 24273MB and an on-disk cache of 293634MB. That seems to be a good balance; we just need the code to provide the same balance when the machine configuration has more disk. That probably means scaling the minimum allocation for the in-memory cache with total system memory, as sketched below.
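
Purely as an illustration of that scaling, one hedged variant; the 0.25 fraction reserved for cache_mem is an arbitrary placeholder, not a tested value:

```python
def size_caches_scaled(total_mem_mb, available_disk_mb, mem_fraction=0.25):
    """Variant that reserves a share of squid's budget for cache_mem
    before sizing the disk cache, so a huge disk cannot starve it."""
    squid_budget_mb = total_mem_mb // 2
    # The in-memory floor now scales with total system memory.
    cache_mem_floor_mb = max(MIN_CACHE_MEM_MB,
                             int(squid_budget_mb * mem_fraction))
    # The disk cache only gets what the floor and overhead leave over.
    disk_cache_mem_mb = min(available_disk_mb // 1024 * 20,
                            squid_budget_mb - SQUID_OVERHEAD_MB - cache_mem_floor_mb)
    cache_mem_mb = squid_budget_mb - SQUID_OVERHEAD_MB - disk_cache_mem_mb
    return cache_mem_mb, disk_cache_mem_mb

# On the i2.xlarge above this yields cache_mem = 3840MB instead of 256MB.
```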
Memory usage should then be documented, probably in depth in DESIGN.md.
Changed in ubuntu-repository-cache:
importance: Undecided → Medium
Changed in ubuntu-repository-cache (Juju Charms Collection):
status: New → Won't Fix
We hit what we suspect may be an instance of this problem in Azure's west US last night. Squid was OOM-killed, which caused the unit to serve some 503s for about a minute. Looking at the unit's config, it actually seems pretty reasonable: Squid has a 2164M in-memory cache on a 7G RAM unit and, after about a day of serving traffic, is sitting right around 33% of RAM (2515M) in memory. I'm not sure what circumstances would lead to an OOM unless another process were taking up substantial amounts of memory while Squid tried to allocate a big chunk, but both landscape and juju are known to have spiky memory usage, and I can certainly imagine a situation where a confluence of spikes in memory usage by other processes leads to an OOM.
Maybe we should drop the initial allocation of memory for Squid's in-memory cache down to a third from half? That's still a VERY substantial cache, and considering the hit rate for busy regions hovers around 100%, I don't think we need to optimize quite so heavily for in-memory caching.
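
A sketch of what that change amounts to, under the same MB-unit assumptions as above:

```python
def squid_mem_budget_mb(total_mem_mb, fraction=1 / 3):
    """Give squid a third of RAM instead of half, leaving headroom for
    spiky neighbours like landscape and juju."""
    return int(total_mem_mb * fraction)

# On the 7G unit above: a ~2389MB budget instead of 3584MB,
# freeing roughly 1.2GB for other processes.
squid_mem_budget_mb(7 * 1024)
```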