Memory leak in REPL

Bug #936304 reported by Andrei Dubovik
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
SBCL
Fix Released
High
Unassigned

Bug Description

I defined two functions (in REPL):

(defun test-gc ()
  (let ((x (make-array 50000000 :initial-element 0)))
    (elt x 0)))

(defun add-one (x) (+ 1 x))

Then, after repeating (add-one (test-gc)) a couple of times, I got a 'Heap exhausted' error. This happend in sbcl-1.0.54, and also in the today's git version (i.e. of 19.02.2012). So, the garbage collector does not free up the memory. Just to make sure I'm not missing something trivial, I also tried in clisp, there was no problem (except the maximum possible size of an array was smaller).

Revision history for this message
Miron Brezuleanu (mbrezu) wrote :
Download full text (3.4 KiB)

I can confirm this happens on a Linux X64 box. Code used to test it:

(defun test-gc ()
  (let ((x (make-array 50000000)))
    (elt x 0)))

(defun add-one (x) (+ 1 x))

(time (handler-case
          (progn (loop repeat 100 do
                      (add-one (test-gc)))
                 (print "PASSED"))
        (sb-kernel::heap-exhausted-error ()
          (print "FAILED"))))

;;; starting git bisect with
;;; good: 823eef4db348c205a6fd1bf1fa7d3c71c9d5f603
;;; bad: cb1734df0f1d211b22071632e7278d8f278b8fe3

;;; compilation failed for step before
;;; 2a9b33fc45842ec0335fef2988e07337ae8f1fc7, so I marked it as bad

Commit that seems to be the problem (this is my first use of git bisect, so take this with a grain of salt):

aadc8391153b7b939462752dfa3f8dec49b2eba1 is the first bad commit
commit aadc8391153b7b939462752dfa3f8dec49b2eba1
Author: Nikodemus Siivola <email address hidden>
Date: Mon Nov 21 16:53:17 2011 +0200

    smaller default dynamic-space size on GENCGC

     512Mb for 32-bit platforms, 1Gb for 64-bit ones.

     (OpenBSD/x86-64 is the exception using 444Mb to fit under default ulimits.)

:100644 100644 b1413abc9a27a023c6151c6cc6360e8089d00761 31377099a17aad92153542e24620bd44a05935f1 M INSTALL
:100644 100644 b9fcd77355dac24339222e9acf554a73936e0eea 5906f33e10b7ad3b3d2b5823acb56278c023393c M NEWS
:040000 040000 26c72c7c9d8c3c195f7110cabce478d77239f7c3 ae5b5a0eef2ddb042362376424a3838fa7263432 M src

Output of git bisect log:

git bisect start
# good: [823eef4db348c205a6fd1bf1fa7d3c71c9d5f603] 1.0: release, will be tagged as sbcl_1_0
git bisect good 823eef4db348c205a6fd1bf1fa7d3c71c9d5f603
# bad: [cb1734df0f1d211b22071632e7278d8f278b8fe3] coreparse.c: Fix a typo WORD_FTMX -> WORD_FMTX.
git bisect bad cb1734df0f1d211b22071632e7278d8f278b8fe3
# good: [ef5fdd6fc577298d1cef8eb97de5f577e30dd642] 1.0.28.12: preserve non-toplevelness of macro subforms
git bisect good ef5fdd6fc577298d1cef8eb97de5f577e30dd642
# good: [ce2a580a469d285e7054ada13ef456e3dad08a34] 1.0.43.63: storing function documentation under names as well
git bisect good ce2a580a469d285e7054ada13ef456e3dad08a34
# good: [7254da92a1ba1bf8bc5a2e78a29d993f272d526e] sb-bsd-sockets: Fix error code handling on Windows
git bisect good 7254da92a1ba1bf8bc5a2e78a29d993f272d526e
# good: [b90e13dea92ee66f06f66baf17c3e3e23c89575f] Make EXPT use double-precision throughout in more cases
git bisect good b90e13dea92ee66f06f66baf17c3e3e23c89575f
# bad: [7df3c11a37f85e474e35af14fb40d9fa62843c79] Fix my previous commit on setf expansions.
git bisect bad 7df3c11a37f85e474e35af14fb40d9fa62843c79
# bad: [4b93cd5bf1e3ced87fe0998f549e4caaab4eed7b] NEWS prettification
git bisect bad 4b93cd5bf1e3ced87fe0998f549e4caaab4eed7b
# bad: [6c0d91d49022c44b0d88bf9a69579b7cddd80971] cleanup: use os_vm_size_t in npage_bytes
git bisect bad 6c0d91d49022c44b0d88bf9a69579b7cddd80971
# bad: [2a9b33fc45842ec0335fef2988e07337ae8f1fc7] Fix errno value for sb-posix rmdir.error.3 test on OpenBSD.
git bisect bad 2a9b33fc45842ec0335fef2988e07337ae8f1fc7
# bad: [aadc8391153b7b939462752dfa3f8dec49b2eba1] smaller default dynamic-space size on GENCGC
git bisect bad aadc8391153b7b939462752dfa3f8dec49b2eba1
# good: [f...

Read more...

Revision history for this message
Miron Brezuleanu (mbrezu) wrote :

What seems to happen is that the garbage collector is not called. If I call (sb-ext:gc) in the loop above, all is well. So it's not a memory leak, but a (weird?) decision to throw an error instead of calling the garbage collector.

The git bisect above used the repository at 'git://repo.or.cz/sbcl'.

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

Something funny going on: successive allocations are going past the GC trigger threshold without GC being run.

Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
status: New → In Progress
importance: Undecided → High
Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

Feh. First look lead to misdiagnosis. Opening a GC-LOGFILE shows GCs happening just as expected.

What is happening here is that since the allocated object is larger than BYTES-CONSED-BETWEEN-GCS it tends to get promoted to gen 1 right after allocation, while the reference is still live.

Adding manual GCs to the mix just increases the total number of GCs sufficiently that the promoted objects also get collected before heap runs out.

Thinking about how to best deal with this.

(It's more than a bit tricky right now to allow a failed allocation attempt to run a GC, unfortunately.)

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

So.

If you allocate two objects larger than BYTES-CONSED-BETWEEN-GCs in a row, the second one will be promoted to next generation after allocation. (auto_gc_trigger looks at bytes_allocated, not bytes_allocated+nbytes, but if it did the latter, it would only cause this to happen every time)

As long as those object are small in comparison to heap size you're fine. However, if their size is a reasonable fraction of heap size (over 1/3rd in this example!), a multi-generation GC might not happen soon enough to reclaim them.

Here's a somewhat promising approach which, it seems to me, smoothes this particular pothole in the generational system:

1. Keep track of the size of the largest object allocated since last GC.

2. If, when about to collect the last generation due a collection there isn't yet enough free space to allocate at least two objects of the largest size seen recently, collect one extra generation.

It's feels a bit KLUDGY, yes, but it is working around the misfeature of the current allocation system that can cause promotion of brand new objects before they've even been initialized, so...

Attached patch does this, and a couple of other things -- allowing (loop (add-one (stess-gc))) to run indefinitely in constant space in my tests. Haven't studied other performance implications yet.

commit 7d6281d397ee97bab4d94f0c24b7f4285b7e4c4e
Author: Nikodemus Siivola <email address hidden>
Date: Sat Mar 31 00:56:44 2012 +0300

    gencgc: reclaim space more aggressively

     * When considering auto_gc_trigger, take the number of bytes about to be
       allocated into account.

     * If bytes_consed_between_gcs is more than the amount of free heap, set next
       GC to occur when half of the remaining free heap has been consumed.

     * Keep track of the size of the largest object allocated between two GCs.
       When collecting garbage, if there isn't enough space to allocate at least
       two such objects before collecting the last generation due a collection,
       extend the collection by one extra generation.

       This works around our tendency to immediately promote large objects to
       generation 1, due to auto_gc_trigger causing a GC _after_ the allocation.

     Fixes lp#936304.

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

commit 6b1b11a6c51e1c29aee947f1fde7f91651ca3763
Author: Nikodemus Siivola <email address hidden>
Date: Sat Mar 31 00:56:44 2012 +0300

    gencgc: reclaim space more aggressively

     * When considering auto_gc_trigger, take the number of bytes about to be
       allocated into account.

     * If bytes_consed_between_gcs is more than the amount of free heap, set next
       GC to occur when half of the remaining free heap has been consumed.

     * Keep track of the size of the largest object allocated between two GCs.
       When collecting garbage, if there isn't enough space to allocate at least
       two such objects before collecting the last generation due a collection,
       extend the collection by one extra generation.

       This works around our tendency to immediately promote large objects to
       generation 1, due to auto_gc_trigger causing a GC _after_ the allocation.

     Fixes lp#936304.

Changed in sbcl:
assignee: Nikodemus Siivola (nikodemus) → nobody
status: In Progress → Fix Committed
Stas Boukarev (stassats)
Changed in sbcl:
status: Fix Committed → Fix Released
Revision history for this message
Faheem Mitha (faheem-j) wrote :

I tried Andrei's code (original bug report), i.e.

(defun test-gc ()
  (let ((x (make-array 50000000 :initial-element 0)))
    (elt x 0)))

(defun add-one (x) (+ 1 x))

and I get a crash after running (add-one (test-gc)) 4 times. This is on a 32 Debian squeeze machine with SBCL 1.0.57 (Debian binary package).

##################################################

faheem@orwell[default branch:rev 64]:~/lisp$ rlwrap sbcl
This is SBCL 1.0.57.0.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
(defun test-gc ()
  (let ((x (make-array 50000000 :initial-element 0)))
    (elt x 0)))

TEST-GC
* (defun add-one (x) (+ 1 x))

ADD-ONE
* (add-one (test-gc))

1
* (add-one (test-gc))

1
* (add-one (test-gc))

1
* (add-one (test-gc))
Heap exhausted during allocation: 107806720 bytes available, 200000008 requested.
 Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
   0: 0 0 0 0 1 0 0 0 0 0 4096 5368709 0 0 0.0000
   1: 0 0 0 0 40 3 97658 0 97667 400161176 22120 405529885 31 2 0.0000
   2: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   3: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   4: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   6: 0 0 0 0 5696 1247 0 0 0 28438528 0 2000000 5636 0 0.0000
   Total bytes allocated = 428599704
   Dynamic-space-size bytes = 536870912
GC control variables:
   *GC-INHIBIT* = false
   *GC-PENDING* = true
   *STOP-FOR-GC-PENDING* = false

debugger invoked on a SB-KERNEL::HEAP-EXHAUSTED-ERROR in thread
#<THREAD "main thread" RUNNING {AB1F7C1}>:
  Heap exhausted (no more space for allocation).
There are still 107806720 bytes available; the request was for 200000008 bytes.

PROCEED WITH CAUTION.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(SB-KERNEL::HEAP-EXHAUSTED-ERROR 107806720 200000008)
0]

Revision history for this message
Faheem Mitha (faheem-j) wrote :

I just tested the same code on an AMD64 machine, and could not reproduce the problem. The default dynamic space size on 32 bit seems to be half of that on 64 bit (512 MB vs 1024 MB), but adjusting this to be the same size (1024 MB) makes no difference. It is possible there are significant differences between how GC works on 32 bit vs 64 bit.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.