GC and memory-hungry parallel threads

Bug #1204689 reported by Andreas Franke (FMC)
This bug affects 2 people
Affects: SBCL
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This is (at least) a specific case of "wanting better documentation on tuning the GC",
see https://bugs.launchpad.net/sbcl/+bug/798913 .

When running multiple threads in parallel, SBCL tends to exhaust its heap during GC
even though only a small fraction of the dynamic-space-size is used for live data
at any time. Specifically, on a 64-bit VM with a 4 GB dynamic-space-size for SBCL
and 5 parallel worker threads, each temporarily allocating 5000 * (2500 * 8 bytes) = 100 MB
per iteration, the live data should not be much more than 5 * 100 MB = 500 MB total
at any time, i.e. just 1/8 of the available space. Is this expected?
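The attached gc-stress-parallel.lisp is not reproduced in this report, so here is a
minimal sketch of the allocation pattern described above; function names, defaults,
and structure are assumptions, not the attachment's actual code:

  ;;; Minimal sketch of the allocation pattern described above.
  (defun one-request (&key (size 2500) (count 5000))
    ;; Allocate COUNT vectors of SIZE fixnums (~100 MB with the defaults),
    ;; touch them once, then drop every reference so they become garbage.
    (let ((cells (loop repeat count
                       collect (make-array size :element-type 'fixnum
                                                :initial-element 0))))
      (loop for v in cells sum (aref v 0))))

  (defun run-workers (&key (parallel 5) (total 100000))
    ;; Start PARALLEL worker threads, each performing TOTAL requests.
    (let ((threads (loop repeat parallel
                         collect (sb-thread:make-thread
                                  (lambda ()
                                    (loop repeat total do (one-request)))))))
      (mapc #'sb-thread:join-thread threads)))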

For the development of a web-application, we need to know how much live data SBCL can safely accomodate.
What is the worst-case memory requirement factor for a given amount of live data?
  Is it really 2 * (1+ highest-normal-generation) ?
Can this ratio be improved, maybe by using fewer generations? How?
Does it make sense to build SBCL for SMP with a non-generational GC? Is it possible?
Could it be a workaround to manually trigger a :full GC at some point when it's still safe (see the sketch below)?
How should the GC be tuned for a robust web application with many worker threads and varying allocation patterns?
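As a rough illustration of the :full GC workaround asked about above, something along
these lines could be tried between requests, while no large temporaries are live.
The function name and the threshold value are assumptions, not a recommendation:

  ;; Force a full collection once dynamic-space usage exceeds a threshold;
  ;; the 1 GB value here is only an example.
  (defun maybe-full-gc (&key (threshold (* 1024 1024 1024)))
    (when (> (sb-kernel:dynamic-usage) threshold)
      (sb-ext:gc :full t)))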

Useful references:
  http://john.freml.in/sbcl-optimise-gc
  http://permalink.gmane.org/gmane.lisp.steel-bank.devel/15630
  http://comments.gmane.org/gmane.lisp.steel-bank.general/3651

Testcase (see attachment):
1. Build the executable:
  sbcl --dynamic-space-size 4Gb --load gc-stress-parallel.lisp
2. Run it, for example:
  ./gc-stress-parallel.exe --parallel 5 --total 100000 --size 2500 --count 5000
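The load file presumably ends by dumping the executable; a minimal sketch of such a
build step (the entry point MAIN is a hypothetical name, not taken from the attachment):

  ;; Dump a standalone executable; MAIN would parse the options shown above.
  (sb-ext:save-lisp-and-die "gc-stress-parallel.exe"
                            :executable t
                            :toplevel 'main)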

$ sbcl --version
SBCL 1.1.9.4-220651c

$ uname -a
Linux leo-2013 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed Jun 12 03:34:52 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

* *features*

(:ALIEN-CALLBACKS :ANSI-CL :ASH-RIGHT-VOPS :C-STACK-IS-CONTROL-STACK
 :COMMON-LISP :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :ELF
 :FLOAT-EQL-VOPS :GENCGC :IEEE-FLOATING-POINT :INLINE-CONSTANTS :LARGEFILE
 :LINKAGE-TABLE :LINUX :LITTLE-ENDIAN :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS
 :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN
 :OS-PROVIDES-GETPROTOBY-R :OS-PROVIDES-POLL :OS-PROVIDES-PUTWC
 :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES :RAW-INSTANCE-INIT-VOPS
 :SB-DOC :SB-EVAL :SB-FUTEX :SB-LDB :SB-PACKAGE-LOCKS :SB-SIMD-PACK
 :SB-SOURCE-LOCATIONS :SB-TEST :SB-THREAD :SB-UNICODE :SBCL
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :UNIX :UNWIND-TO-FRAME-AND-CALL-VOP :X86-64)

Andreas Franke (FMC) (andreas-franke) wrote:

Attaching dynamic usage graphs for 5 testcases:
default: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --total 100000
default2: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --total 10000
with-extra-gc: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --with-extra-gc t
with-extra-sleep: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --with-extra-sleep 0.01
without-clear-cells: gc-stress-parallel.exe --parallel 5 --size 2500 --count 5000 --with-clear-cells nil

Note that clearing out the list cells referencing the garbage makes a huge difference:
without it, heap exhaustion already occurs after fewer than 1000 iterations
(a sketch of the cell-clearing idea follows after the table):

#iterations until out-of-heap
  1666 default/dynamic-space-usage__parallel-5__size-2500__count-5000.csv
  2289 default2/dynamic-space-usage__parallel-5__size-2500__count-5000.csv
  2505 with-extra-gc/dynamic-space-usage__parallel-5__size-2500__count-5000__gc-T.csv
  2528 with-extra-sleep/dynamic-space-usage__parallel-5__size-2500__count-5000__sleep-0.01.csv
   544 without-clear-cells/dynamic-space-usage__parallel-5__size-2500__count-5000.csv
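For illustration, the cell-clearing trick amounts to setting each cons cell to NIL as
soon as its element has been consumed, so already-processed data becomes collectable
even while the rest of the list is still reachable. A minimal sketch, with names that
are assumptions rather than the attachment's actual code:

  ;; Walk a list of vectors, clearing each cons cell right after its vector
  ;; has been used, so the GC can reclaim the processed prefix early.
  (defun sum-and-clear (cells)
    (loop for cell on cells
          sum (aref (car cell) 0)
          do (setf (car cell) nil)))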

Andreas Franke (FMC) (andreas-franke) wrote:

It seems that even with sequential requests allocating 100 MB each, dynamic-space usage grows to roughly 1220 MB or more.
But at least there is no crash in that case (100000+ iterations complete without heap exhaustion).
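The dynamic-space-usage CSV files above were presumably produced by sampling the heap
after each iteration; a minimal sketch of such sampling, with the CSV layout being an
assumption:

  ;; Append one "iteration,bytes" line per request to an open stream.
  (defun log-dynamic-usage (iteration stream)
    (format stream "~D,~D~%" iteration (sb-kernel:dynamic-usage)))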

Andreas Franke (FMC) (andreas-franke) wrote:

bzip2'ed GC logfiles for the testcases described above
