Heap exhaustion and ldb with (loop :do (gc))

Bug #1850709 reported by Mikko Markus Torni
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
New
Undecided
Unassigned

Bug Description

When executing (loop :do (gc :full nil)) SBCL crashes into ldb with "Heap exhaustion".

It happens with all dynamic sizes that I've tried in the range 64-1024.
The Gen1 Waste seems to fill the dynamic space.

Below are test results on two SBCL versions.

With full garbage collections (gc :full t) it seems to work. Adding allocations to the loop causes crashes much faster. By forcing a full garbage collection just before it drops into ldb, waste is cleared and does not accumulate. (declare (optimize (speed ...) (safety ...)) does not seem to affect results with any combination of 0 and 3.

Depending on exact dynamic space size the error message is either "Heap exhausted during garbage collection" or "Heap exhausted during allocation".

Attached is a more featured test program.

-------------------------------------------------------------------

$ uname -a
Linux cs-216 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ sbcl --dynamic-space-size 1024 --eval "(loop :do (gc :full nil))"
This is SBCL 1.5.3, an implementation of ANSI Common Lisp.
[...]
Heap exhausted during garbage collection: 0 bytes available, 16 requested.
Gen Boxed Unboxed LgBox LgUnbox Pin Alloc Waste Trig WP GCs Mem-age
 0 1 0 0 0 1 16 32752 10737434 1 1 0.0000
 1 32044 43 0 0 31928 5761056 1045665760 15987722 32087 1 30506.9178
 2 0 0 0 0 0 0 0 2000000 0 0 0.0000
 3 0 0 0 0 0 0 0 2000000 0 0 0.0000
 4 0 0 0 0 0 0 0 2000000 0 0 0.0000
 5 0 0 0 0 0 0 0 2000000 0 0 0.0000
 6 424 183 52 21 0 21695568 586672 2000000 680 0 0.0000
 7 0 0 0 0 0 0 0 2000000 0 0 0.0000

GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = false
   *STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 19525(tid 0x7ff1024cab80):
Heap exhausted, game over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>

CL-USER> (disassemble (lambda () (loop :do (gc))))
; disassembly for (LAMBDA ())
; Size: 48 bytes. Origin: #x52CB959C ; (LAMBDA ())
; 9C: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; A0: 488945F8 MOV [RBP-8], RAX
; A4: 660F1F840000000000 NOP
; AD: 0F1F00 NOP
; B0: L0: 4883EC10 SUB RSP, 16
; B4: 31C9 XOR ECX, ECX
; B6: 48892C24 MOV [RSP], RBP
; BA: 488BEC MOV RBP, RSP
; BD: B8B8E0B252 MOV EAX, #x52B2E0B8 ; #<FDEFN GC>
; C2: FFD0 CALL RAX
; C4: 480F42E3 CMOVB RSP, RBX
; C8: EBE6 JMP L0
; CA: CC0F BREAK 15 ; Invalid argument count trap

CL-USER> *features*
(:QUICKLISP :SB-BSD-SOCKETS-ADDRINFO :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3 :ASDF2
 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :X86-64 :64-BIT
 :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :AVX2 :C-STACK-IS-CONTROL-STACK
 :CALL-SYMBOL :COMMON-LISP :COMPACT-INSTANCE-HEADER :COMPARE-AND-SWAP-VOPS
 :CYCLE-COUNTER :ELF :FP-AND-PC-STANDARD-SAVE :GENCGC :IEEE-FLOATING-POINT
 :IMMOBILE-CODE :IMMOBILE-SPACE :INTEGER-EQL-VOP :LARGEFILE :LINKAGE-TABLE
 :LINUX :LITTLE-ENDIAN :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR
 :OS-PROVIDES-DLOPEN :OS-PROVIDES-GETPROTOBY-R :OS-PROVIDES-POLL
 :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES
 :RELOCATABLE-HEAP :SB-DOC :SB-EVAL :SB-FUTEX :SB-LDB :SB-PACKAGE-LOCKS
 :SB-SIMD-PACK :SB-SIMD-PACK-256 :SB-SOURCE-LOCATIONS :SB-THREAD :SB-UNICODE
 :SBCL :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :UNDEFINED-FUN-RESTARTS :UNIX
 :UNWIND-TO-FRAME-AND-CALL-VOP)

-------------------------------------------------------------------

$ uname -a
Linux premium 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux

$ sbcl --dynamic-space-size 64 --eval "(loop :do (gc))"
This is SBCL 1.4.16.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Heap exhausted during allocation: 0 bytes available, 16 requested.
Gen Boxed Unboxed LgBox LgUnbox Pin Alloc Waste Trig WP GCs Mem-age
 0 1 0 0 0 1 16 32752 671104 1 1 0,0000
 1 836 4 0 0 823 518416 27006704 671088 840 0 810,5858
 2 237 76 24 0 67 10723376 319440 6626896 337 1 0,5554
 3 0 0 0 0 0 0 0 2000000 0 0 0,0000
 4 0 0 0 0 0 0 0 2000000 0 0 0,0000
 5 0 0 0 0 0 0 0 2000000 0 0 0,0000
 6 501 258 64 47 0 27637552 870608 2000000 870 0 0,0000
 7 0 0 0 0 0 0 0 2000000 0 0 0,0000
           Total bytes allocated = 38879360
           Dynamic-space-size bytes = 67108864
GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = false
   *STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 11812(tid 0x7fe62c3bd280):
Heap exhausted, game over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> q

CL-USER> *features*
(:QUICKLISP :SB-BSD-SOCKETS-ADDRINFO :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3 :ASDF2
 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :X86-64 :64-BIT
 :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :AVX2 :C-STACK-IS-CONTROL-STACK
 :CALL-SYMBOL :COMMON-LISP :COMPACT-INSTANCE-HEADER :COMPARE-AND-SWAP-VOPS
 :CYCLE-COUNTER :ELF :FP-AND-PC-STANDARD-SAVE :GENCGC :IEEE-FLOATING-POINT
 :IMMOBILE-CODE :IMMOBILE-SPACE :INTEGER-EQL-VOP :LARGEFILE :LINKAGE-TABLE
 :LINUX :LITTLE-ENDIAN :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR
 :OS-PROVIDES-DLOPEN :OS-PROVIDES-GETPROTOBY-R :OS-PROVIDES-POLL
 :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES
 :RELOCATABLE-HEAP :SB-CORE-COMPRESSION :SB-DOC :SB-EVAL :SB-FUTEX :SB-LDB
 :SB-PACKAGE-LOCKS :SB-SIMD-PACK :SB-SIMD-PACK-256 :SB-SOURCE-LOCATIONS
 :SB-THREAD :SB-UNICODE :SB-XREF-FOR-INTERNALS :SBCL
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :UNDEFINED-FUN-RESTARTS :UNIX
 :UNWIND-TO-FRAME-AND-CALL-VOP)

Tags: gc-gencgc
Revision history for this message
Mikko Markus Torni (mtorni) wrote :

Test program with more features:

sbcl --dynamic-space-size 64 --eval "
(progn
  (locally (declare (optimize (speed 3) (debug 0) (safety 0)) (inline make-array))
    (defconstant gc-stats (define-alien-routine "print_generation_stats" void))
    (defconstant full-gc-interval most-positive-fixnum)
    (defconstant array-size 10900)
    (defun gc-alloc-test ()
      (loop :for i fixnum :upfrom 0 :do
           (progn
             '(funcall gc-stats)
             '(make-array array-size :element-type 'fixnum :initial-element 0)
             (gc :full (zerop (mod i full-gc-interval)))
             '(print i)))))
  (gc-alloc-test))
"

Revision history for this message
Mikko Markus Torni (mtorni) wrote :

On further investigation, it appears that accumulation of waste does not trigger next generation garbage collection.

Thus programs with a high Waste-to-Alloc ratio and regular calls to (gc :full nil) can trigger HEAP-EXHAUSTED-ERROR or crashes to ldb.

I wonder if any of the following statements are true:

1) foreign function calls are creating waste (such as requesting a collection through (gc))
2) the (gc :full nil) implementation is broken or incomplete

Revision history for this message
Douglas Katzman (dougk) wrote :

A size of 64 megabytes shouldn't work, that's scarcely enough to run at all on x86-64.

Is it correct to say that with a 1GB heap, your failure is after a *very* *long* time?
You're not claiming a failure after like 10 GC cycles, right?

It looks to me that the 'pin' count is the problem. I'm able to produce a steady increase of pinned objects in generation 1 with that test loop, but so far it's been running for at least half an hour without crashing. I think it will crash though, I'm just wondering at what point.

Revision history for this message
Mikko Markus Torni (mtorni) wrote :

Even for dynamic space size of 1GiB, it takes less than a minute on my two machines.

The failure happened in 800-2500 GCs for a small dynamic space size of 64MB.

I'll continue tomorrow to test whether the largest succeeding allocation reduce in size as the GCs accumulate waste.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.