Disk cache can use all of memory leaving insufficient memory for squid in-memory cache

Bug #1465625 reported by Robert C Jennings
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Ubuntu Repository Cache Charm | New | Medium | Unassigned
ubuntu-repository-cache (Juju Charms Collection) | Won't Fix | Medium | Unassigned

Bug Description

Reviewing size_caches() in lib/ubuntu-repository-cache/squid, I'm not sure that the intention captured by the comments is served by the code.

The design of size_caches() was informed by http://wiki.squid-cache.org/SquidFaq/SquidMemory

 Squid can use up to 1/2 of total system memory; the remainder is set aside for the OS, the file cache of metadata for Apache, and the file cache of squid objects. That half of memory is allotted as:
 100MB set aside for squid overhead
 256MB minimum for memory cache plus remainder of 0.5*total_system_memory after disk cache is allocated
 disk cache up to (available_disk / 1024 * 20)MB
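
The allocation described above can be sketched roughly as follows. This is an illustration of one reading of the description, not the charm's actual size_caches() code: the function name, the constants, and the interpretation of "(available_disk / 1024 * 20)MB" as memory needed for the disk cache index (20MB per GB of disk) are all assumptions.

```python
# Hypothetical sketch of the sizing rules described in this report.
# Names and constants are illustrative, not taken from the charm.

SQUID_OVERHEAD_MB = 100      # fixed allowance for squid itself
MIN_CACHE_MEM_MB = 256       # floor for the in-memory cache (cache_mem)
INDEX_MB_PER_GB_DISK = 20    # assumed memory cost of indexing 1GB of disk cache


def size_caches_sketch(total_mem_mb, avail_disk_mb):
    """Return (cache_mem_mb, disk_index_mb) under the assumptions above."""
    # Squid's share is half of total system memory.
    squid_budget_mb = total_mem_mb // 2
    # Memory attributed to the disk cache grows with disk size, uncapped.
    disk_index_mb = avail_disk_mb // 1024 * INDEX_MB_PER_GB_DISK
    # The in-memory cache gets whatever is left, but never less than the floor.
    remainder_mb = squid_budget_mb - SQUID_OVERHEAD_MB - disk_index_mb
    cache_mem_mb = max(MIN_CACHE_MEM_MB, remainder_mb)
    return cache_mem_mb, disk_index_mb


if __name__ == "__main__":
    # 16GB RAM with a 100GB disk leaves a sizeable in-memory cache.
    print(size_caches_sketch(16 * 1024, 100 * 1024))
```

Under this reading, nothing limits the disk term, so a large enough disk pushes the remainder below zero and the in-memory cache falls back to the 256MB floor.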

My concern is that there is no cap on disk cache usage, so a very large disk would leave very little memory for in-memory squid caching (the cache_mem config option). Per https://wiki.ubuntu.com/Mirrors#Mirror_Guidelines, the package repository (/pool served by squid) can take up ~650GB (ever increasing). However, it is not critical that we cache the entire pool; many of those packages will not be hot and may see no use whatsoever. It would be better to have additional memory for the in-memory cache of pool objects.

The example configuration that concerns me is an EC2 i2.xlarge with 30GB RAM and an 800GB ephemeral disk. An 800GB disk would size the disk cache at a max of 15GB, which means the memory cache would be 256MB, which is too small.

We have recommended 200GB of storage and 24GB RAM, which leads to a better balance. The documented testing configuration is a c3.8xlarge with 32GB RAM and a 320GB ephemeral disk, which in practice gives an in-memory cache of 24273MB and an on-disk cache of 293634MB. This seems to be a good balance; we just need the code to provide this balance when the machine configuration has more disk. That probably means scaling the minimum allocation for the in-memory cache with total system memory.
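
One way the proposed scaling might look is sketched below. Everything here is hypothetical: the function name, the 25% minimum fraction, and the choice to shrink the disk cache so that its assumed index cost (20MB of memory per GB of disk) fits in what remains are illustrative assumptions, not a decided design.

```python
# Hypothetical sketch: scale the in-memory cache floor with total memory
# and cap the disk cache so its assumed index cost fits in the remainder.

def size_caches_scaled_min(total_mem_mb, avail_disk_mb, min_mem_fraction=0.25):
    """Return (cache_mem_mb, cache_dir_mb) with a memory-scaled floor."""
    overhead_mb = 100                      # fixed allowance for squid itself
    index_mb_per_gb = 20                   # assumed index cost per GB of disk
    squid_budget_mb = total_mem_mb // 2    # squid still gets half of memory

    # The floor is the larger of 256MB and a fraction of total memory.
    min_cache_mem_mb = max(256, int(total_mem_mb * min_mem_fraction))

    # Memory left for the disk cache index after overhead and the floor.
    index_budget_mb = max(0, squid_budget_mb - overhead_mb - min_cache_mem_mb)
    disk_index_mb = min(avail_disk_mb // 1024 * index_mb_per_gb, index_budget_mb)

    # Translate the (possibly capped) index memory back into disk cache size.
    cache_dir_mb = disk_index_mb * 1024 // index_mb_per_gb
    cache_mem_mb = squid_budget_mb - overhead_mb - disk_index_mb
    return cache_mem_mb, cache_dir_mb


if __name__ == "__main__":
    # The i2.xlarge-like case from this report: 30GB RAM, 800GB disk.
    print(size_caches_scaled_min(30 * 1024, 800 * 1024))
```

With these assumed values, the 30GB RAM and 800GB disk case keeps a 7680MB in-memory cache and uses roughly 379GB of the disk, while a 24GB RAM and 200GB disk machine is unaffected by the cap.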

Memory usage should then be documented, probably in depth, in DESIGN.md.

David Lawson (deej) wrote :

We hit what we suspect may be an instance of this problem in Azure's West US region last night. Squid got OOM killed, which caused the unit to serve some 503s for about a minute. Looking at the unit's config, it actually seems pretty reasonable: Squid has a 2164M in-memory cache on a 7G RAM unit and, after about a day of serving traffic, is sitting right around 33% of RAM, 2515M in memory. I'm not sure what circumstances would lead to an OOM unless another process were taking up substantial amounts of memory while Squid tried to allocate a big chunk. However, both landscape and juju are known to have spiky memory usage, and I can certainly imagine a confluence of memory spikes in other processes leading to an OOM.

Maybe we should drop the initial allocation of memory for Squid's in-memory cache from a half down to a third? That's still a VERY substantial cache, and considering the hit rate for busy regions hovers around 100%, I don't think we need to optimize quite so heavily for in-memory caching.

Chris Glass (tribaal)
Changed in ubuntu-repository-cache:
importance: Undecided → Medium
Haw Loeung (hloeung)
Changed in ubuntu-repository-cache (Juju Charms Collection):
status: New → Won't Fix