GC and memory-hungry parallel threads

Bug #1204689 reported by Andreas Franke (FMC) on 2013-07-24
Bug Description

This is (at least) a specific case of "wanting a better documentation on tuning the GC",
see https://bugs.launchpad.net/sbcl/+bug/798913 .

When running multiple threads in parallel, SBCL tends to exhaust its heap during GC
even though only a small fraction of the dynamic-space-size is used for live data
at any time. Specifically, in a 64-bit VM with a 4Gb dynamic-space-size for SBCL
and 5 parallel worker threads, each temporarily allocating 5000 * (2500 * 8 bytes) = 100MB
at a time, the live data should not be much more than 500MB in total at any moment,
which is only about 1/8 of the available space. Is this expected?
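
The arithmetic behind these figures can be checked with a short sketch. It is only a back-of-the-envelope calculation, not a measurement. It assumes that "4Gb" means 4 GiB, that --size is the number of 8-byte elements per object and that --count is the number of objects per iteration; all constant names are made up for illustration.

```python
# Figures taken from the report; the interpretation of each flag is an assumption.
ELEMENT_BYTES = 8            # assumed: one 64-bit word per element
SIZE = 2500                  # --size, assumed to be elements per allocated object
COUNT = 5000                 # --count, assumed to be objects allocated per iteration
WORKERS = 5                  # --parallel
HEAP_BYTES = 4 * 1024**3     # --dynamic-space-size 4Gb, assumed to mean 4 GiB

per_worker_bytes = COUNT * SIZE * ELEMENT_BYTES
expected_live_bytes = WORKERS * per_worker_bytes
heap_fraction = expected_live_bytes / HEAP_BYTES

print(f"per worker:    {per_worker_bytes / 1e6:.0f} MB")
print(f"all workers:   {expected_live_bytes / 1e6:.0f} MB")
print(f"heap fraction: {heap_fraction:.3f}")
```

Under these assumptions the expected live data comes out slightly below 1/8 of the heap, so the estimate in the description is, if anything, a little generous.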

For the development of a web application, we need to know how much live data SBCL can safely accommodate.
What is the worst-case memory requirement factor for a given amount of live data?
  Is it really 2 * (1+ highest-normal-generation) ?
Can this ratio be improved, maybe by using fewer generations? How?
Does it make sense to build SBCL for SMP with a non-generational GC? Is it possible?
Could it be a workaround to manually trigger a :full GC at some point when it's still safe?
How should the GC be tuned for a robust web application with many worker threads and varying allocations?
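
To make the second question concrete, here is a sketch of what the conjectured factor would imply for a 4Gb heap. The formula is the one from the question above and is not confirmed to be what SBCL actually requires. The value 5 for highest-normal-generation and the reading of "4Gb" as 4 GiB are both assumptions, and the function and constant names are hypothetical.

```python
def worst_case_factor(highest_normal_generation: int) -> int:
    """Conjectured factor from the question: 2 * (1+ highest-normal-generation)."""
    return 2 * (highest_normal_generation + 1)


HEAP_BYTES = 4 * 1024**3          # assumed: "4Gb" means 4 GiB
HIGHEST_NORMAL_GENERATION = 5     # assumed value, check against the actual build

factor = worst_case_factor(HIGHEST_NORMAL_GENERATION)
safe_live_bytes = HEAP_BYTES // factor

print(f"conjectured factor: {factor}")
print(f"live data budget:   {safe_live_bytes / 1e6:.0f} MB")
```

If the conjecture and the assumed generation count both held, the budget for live data would be below the roughly 500MB that the 5 workers hold together, which would at least be consistent with the heap exhaustion seen in the testcase.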

Useful references:

Testcase (see attachment):
1. build executable with
  sbcl --dynamic-space-size 4Gb --load gc-stress-parallel.lisp
2. run e.g. like this:
  ./gc-stress-parallel.exe --parallel 5 --total 100000 --size 2500 --count 5000

$ sbcl --version

$ uname -a
Linux leo-2013 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed Jun 12 03:34:52 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

* *features*


Attaching dynamic usage graphs for 5 testcases:
default: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --total 100000
default2: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --total 10000
with-extra-gc: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --with-extra-gc t
with-extra-sleep: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --with-extra-sleep 0.01
without-clear-cells: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --with-clear-cells nil

Note that clearing out the list cells referencing the garbage makes a huge difference:
without it, heap exhaustion already occurs after fewer than 1000 iterations:

#iterations until out-of-heap
  1666 default/dynamic-space-usage__parallel-5__size-2500__count-5000.csv
  2289 default2/dynamic-space-usage__parallel-5__size-2500__count-5000.csv
  2505 with-extra-gc/dynamic-space-usage__parallel-5__size-2500__count-5000__gc-T.csv
  2528 with-extra-sleep/dynamic-space-usage__parallel-5__size-2500__count-5000__sleep-0.01.csv
   544 without-clear-cells/dynamic-space-usage__parallel-5__size-2500__count-5000.csv

It seems that even for sequential requests allocating 100MB each, the dynamic-space usage grows to roughly 1220MB or more.
But at least there is no crash in that case (100000+ iterations ran OK).
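
For comparison, the ratio observed in the sequential case can be scaled up to the parallel case with a rough sketch. The linear scaling with the number of workers is purely an assumption for illustration, not a claim about how the GC behaves, and "4Gb" is again read as 4 GiB. The constant names are hypothetical.

```python
LIVE_MB_PER_REQUEST = 100        # from the report
OBSERVED_PEAK_MB = 1220          # from the report, a lower bound on peak usage
WORKERS = 5                      # --parallel in the failing testcases
HEAP_MB = 4 * 1024**3 / 1e6      # assumed: "4Gb" means 4 GiB, expressed in MB

observed_ratio = OBSERVED_PEAK_MB / LIVE_MB_PER_REQUEST
# Assumption: each worker needs as much headroom as one sequential request.
projected_parallel_mb = WORKERS * OBSERVED_PEAK_MB

print(f"observed ratio:     {observed_ratio:.1f}x")
print(f"projected parallel: {projected_parallel_mb} MB of {HEAP_MB:.0f} MB")
```

Under that naive scaling, 5 workers would need more space than the 4Gb heap provides.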

bzip2'ed GC logfiles for the testcases described above

