(gc :full t) has no effect under certain circumstances

Bug #1009267 reported by Faheem Mitha
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Fix Released
Undecided
Unassigned

Bug Description

Hi,

$ sbcl --version
SBCL 1.0.57.0.debian

apt-cache policy sbcl
sbcl:
  Installed: 2:1.0.57.0-1
  Candidate: 2:1.0.57.0-1
  Version table:
 *** 2:1.0.57.0-1 0
        100 /var/lib/dpkg/status

This is backported from the unstable sources, which for some reason
didn't build on the Debian servers, so currently there isn't a version
of this in unstable.

CL-USER> *FEATURES*
(CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86 CFFI-FEATURES:UNIX :CFFI
 CFFI-SYS::FLAT-NAMESPACE :SWANK :SB-BSD-SOCKETS-ADDRINFO :ASDF2 :ASDF
 :ASDF-UNICODE :X86 :UNIX :ELF :LINUX :LARGEFILE :GENCGC
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :C-STACK-IS-CONTROL-STACK
 :COMPARE-AND-SWAP-VOPS :UNWIND-TO-FRAME-AND-CALL-VOP :RAW-INSTANCE-INIT-VOPS
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-VECTORS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-FIXED-OBJECTS :ALIEN-CALLBACKS
 :CYCLE-COUNTER :INLINE-CONSTANTS :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS
 :LINKAGE-TABLE :LITTLE-ENDIAN :OS-PROVIDES-DLOPEN :OS-PROVIDES-DLADDR
 :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T :OS-PROVIDES-GETPROTOBY-R
 :OS-PROVIDES-POLL :IEEE-FLOATING-POINT :SB-SOURCE-LOCATIONS :SB-EVAL
 :SB-UNICODE :SB-PACKAGE-LOCKS :SB-LDB :SB-TEST :SB-DOC :SBCL :COMMON-LISP
 :ANSI-CL :SB-CORE-COMPRESSION :SB-THREAD :SB-FUTEX)
CL-USER>

$ uname -a
Linux orwell 2.6.32-5-vserver-686-bigmem #1 SMP Mon Mar 26 09:36:57 UTC 2012 i686 GNU/Linux

I hope the SBCL developers consider this is a bug, because I find this
quite worrying behavior.

I had earlier posted about the issue in the SBCL users mailing list;
see

http://thread.gmane.org/gmane.lisp.steel-bank.general/3543/focus=3556

I found this apparently related SO question

http://stackoverflow.com/questions/9342871/memory-leak-in-sbcls-repl

and the corresponding SBCL bug report.

https://bugs.launchpad.net/sbcl/+bug/936304

So, consider the code below. If I run

(gibbs 10000 1000)

a few times, it quickly gets to the point where

(gc :full t)

doesn't have any effect. From the documentation, and from the above
question (and other places) it seems like this command should remove
all the garbage, and return the dynamic space usage of the system to a
state similar to newly started system.

In my case, I could do this with two calls. Here is the breakdown.

Starting SBCL in SLIME.

CL-USER> (room)
Dynamic space usage is: 52,418,208 bytes.
Read-only space usage is: 3,512 bytes.
Static space usage is: 2,224 bytes.
Control stack usage is: 5,928 bytes.
Binding stack usage is: 536 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
  13,761,968 bytes for 312,044 instance objects.
  11,280,888 bytes for 14,802 code objects.
   8,596,016 bytes for 1,074,502 cons objects.
   5,949,728 bytes for 108,096 simple-vector objects.
   3,182,624 bytes for 23,271 simple-character-string objects.
   9,940,888 bytes for 384,486 other objects.
  52,712,112 bytes for 1,917,201 dynamic objects (space total.)

CL-USER> (time (gibbs 10000 1000))
Evaluation took:
  11.412 seconds of real time
  11.132697 seconds of total run time (10.708670 user, 0.424027 system)
  [ Run times consist of 0.260 seconds GC time, and 10.873 seconds non-GC time. ]
  97.56% CPU
  22,931,017,211 processor cycles
  960,000,232 bytes consed
NIL

CL-USER> (room)
Dynamic space usage is: 205,911,504 bytes.
Read-only space usage is: 3,512 bytes.
Static space usage is: 2,224 bytes.
Control stack usage is: 5,928 bytes.
Binding stack usage is: 536 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
  160,000,024 bytes for 3 simple-array-double-float objects.
  11,920,480 bytes for 15,835 code objects.
  34,437,464 bytes for 1,584,980 other objects.
  206,357,968 bytes for 1,600,818 dynamic objects (space total.)
; No value
CL-USER> (gc :full t)
NIL
CL-USER> (room)
Dynamic space usage is: 44,923,472 bytes.
Read-only space usage is: 3,512 bytes.
Static space usage is: 2,224 bytes.
Control stack usage is: 5,928 bytes.
Binding stack usage is: 536 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
  11,920,288 bytes for 15,833 code objects.
   9,031,992 bytes for 204,878 instance objects.
   7,645,000 bytes for 955,625 cons objects.
   5,258,768 bytes for 84,320 simple-vector objects.
   2,717,856 bytes for 22,062 simple-character-string objects.
   2,472,504 bytes for 18,261 simple-array-unsigned-byte-32 objects.
   6,097,944 bytes for 232,791 other objects.
  45,144,352 bytes for 1,533,770 dynamic objects (space total.)
; No value

So, the GC collector can deal with one run. Now two runs.

CL-USER> (time (gibbs 10000 1000))
Evaluation took:
  10.556 seconds of real time
  10.372648 seconds of total run time (10.076630 user, 0.296018 system)
  [ Run times consist of 0.092 seconds GC time, and 10.281 seconds non-GC time. ]
  98.27% CPU
  21,210,966,230 processor cycles
  959,997,968 bytes consed

NIL
CL-USER> (time (gibbs 10000 1000))
Evaluation took:
  10.632 seconds of real time
  10.612663 seconds of total run time (10.292643 user, 0.320020 system)
  [ Run times consist of 0.100 seconds GC time, and 10.513 seconds non-GC time. ]
  99.82% CPU
  21,364,331,941 processor cycles
  959,997,968 bytes consed
NIL
CL-USER> (room)
Dynamic space usage is: 333,458,656 bytes.
Read-only space usage is: 3,512 bytes.
Static space usage is: 2,224 bytes.
Control stack usage is: 5,928 bytes.
Binding stack usage is: 536 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
  240,000,032 bytes for 4 simple-array-double-float objects.
  48,337,280 bytes for 3,021,080 double-float objects.
  45,948,400 bytes for 1,617,867 other objects.
  334,285,712 bytes for 4,638,951 dynamic objects (space total.)
; No value
CL-USER> (gc :full t)
NIL

CL-USER> (room)
Dynamic space usage is: 204,782,320 bytes.
Read-only space usage is: 3,512 bytes.
Static space usage is: 2,224 bytes.
Control stack usage is: 5,928 bytes.
Binding stack usage is: 536 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
  160,000,024 bytes for 3 simple-array-double-float objects.
  11,920,288 bytes for 15,833 code objects.
  33,304,480 bytes for 1,539,249 other objects.
  205,224,792 bytes for 1,555,085 dynamic objects (space total.)
; No value
CL-USER> (gc :full t)
NIL
CL-USER> (room)
Dynamic space usage is: 124,779,480 bytes.
Read-only space usage is: 3,512 bytes.
Static space usage is: 2,224 bytes.
Control stack usage is: 5,928 bytes.
Binding stack usage is: 536 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
  80,000,016 bytes for 2 simple-array-double-float objects.
  11,920,288 bytes for 15,833 code objects.
   9,025,952 bytes for 204,590 instance objects.
   7,621,296 bytes for 952,662 cons objects.
  16,432,264 bytes for 355,744 other objects.
  124,999,816 bytes for 1,528,831 dynamic objects (space total.)
; No value

CL-USER> (gc :full t)
NIL

CL-USER> (room)
Dynamic space usage is: 124,792,128 bytes.
Read-only space usage is: 3,512 bytes.
Static space usage is: 2,224 bytes.
Control stack usage is: 5,928 bytes.
Binding stack usage is: 536 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Breakdown for dynamic space:
  80,000,016 bytes for 2 simple-array-double-float objects.
  11,920,288 bytes for 15,833 code objects.
   9,026,480 bytes for 204,601 instance objects.
   7,622,000 bytes for 952,750 cons objects.
  16,648,616 bytes for 381,389 other objects.
  125,217,400 bytes for 1,554,575 dynamic objects (space total.)
; No value

Running (gc :full t) more times has no effect on those 80 million bytes.

Note, I have the following in my .emacs

(setq slime-lisp-implementations '(("sbcl" ("sbcl" "--dynamic-space-size" "1024")) ("clisp" ("clisp")) ("ecl" ("ecl")) ("cmucl" ("cmucl"))))

So I am running SBCL with dynamic space size set to 1024Mb.

Running gibbs more times causes the gc to get stuck at higher levels
of garbage, but basically it's the same situation. Though I hardly
know anything about CL or garbage collection, this doesn't look good
to me.

                                                                                                          Regards, Faheem

###############################################

(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :cl-rmath) (setf *read-default-float-format* 'double-float))

(declaim (inline cl-rmath:rnorm cl-rmath:rgamma))

(cffi:defcfun ("rgamma" rgamma) :double
  (arg0 :double)
  (arg1 :double))

(cffi:defcfun ("rnorm" rnorm) :double
  (arg0 :double)
  (arg1 :double))

(defun gibbs (N thin)
  (declare ((integer 0 #.most-positive-fixnum) N thin))
  (declare (optimize (speed 3) (safety 1)))
  (declare (optimize debug))
  (let ((x 0.0) (y 0.0)
        (xvec (make-array 10000000 :element-type '(double-float 0.0 *) :adjustable nil :fill-pointer nil :displaced-to nil))
        (yvec (make-array 10000000 :element-type 'double-float :adjustable nil :fill-pointer nil :displaced-to nil)))
    (declare (double-float x y))
    (declare (type (simple-array (double-float 0.0 *) (10000000)) xvec))
    (declare (type (simple-array double-float (10000000)) yvec))
    (dotimes (i N)
      (dotimes (j thin)
        (symbol-macrolet ((xval (aref xvec (+ (the fixnum (* i thin)) j)))
                          (yval (aref yvec (+ (the fixnum (* i thin)) j))))
          (setf xval (cl-rmath::rgamma 3.0 (/ 1 (+ (* yval yval) 4))))
          (setf yval (cl-rmath::rnorm (/ 1 (+ xval 1)) (/ 1 (sqrt (+ (* 2 xval) 2)))))
          (setf x xval)
          (setf y yval))))))

(defun main (argv)
  (declare (ignore argv))
  (gibbs 10000 1000))

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

I don't have R installed here, so I'm using dummy definitions for RNORM and RGAMMA. (Note also that your CFFI definitions are in the wrong package -- unless you're missing an assumed IN-PACKAGE :CL-RMATH there.)

  (declaim (inline rnorm rgamma))
  (defun rnorm (x y)
    (declare (double-float x y))
    (* x y))
  (defun rgamma (x y)
    (declare (double-float x y))
    (+ x y))

Given these, and running

  (loop repeat 100
        do (gibbs 10000 1000)
           (room)
           (write-line "---"))

  (write-line "--- After full GC ---")

  (gc :full t)

  (room)

I could not reproduce your symptoms so far.

Can you reproduce your symptoms without using CL-RMATH? If so, please provide a full terminal (not Slime) session transcript.

Changed in sbcl:
status: New → Incomplete
Revision history for this message
Faheem Mitha (faheem-j) wrote :

Hi Nikodemus,

Thank you for taking the time to look at this.

Update. I can reproduce the problem using essentially your code (see nik.lisp attached) on my 386 Debian installation, but cannot reproduce it on an amd64 Debian machine. Both machines are running Debian squeeze, and both use sbcl 1.0.57 (Debian version 2:1.0.57.0-1). On amd64, I did manage to get sbcl to give a heap error if I ran (main 50000), but even so, (gc :full t) was always able to clean up.

See attached code file nik.lisp and 386 home machine transcript nik-out.txt.

                                                                                                                      Regards, Faheem

Home machine:

uname -a
Linux orwell 2.6.32-5-vserver-686-bigmem #1 SMP Mon Mar 26 09:36:57 UTC 2012 i686 GNU/Linux

Revision history for this message
Faheem Mitha (faheem-j) wrote :
Revision history for this message
Faheem Mitha (faheem-j) wrote :
Revision history for this message
Faheem Mitha (faheem-j) wrote :

In the transcript posted, sbcl was called as

rlwrap sbcl

Revision history for this message
Faheem Mitha (faheem-j) wrote :

For 386 above replace with x86 or 32 bit according to preference.

Revision history for this message
Faheem Mitha (faheem-j) wrote :

H4ns on #lisp cannot reproduce with 32 bit Ubuntu ubuntu 11.10, kernel 3.0.0-12 with sbcl from quicklisp?

"i could run the example and gc, and i always ended up with the same 40 mb heap that i started off with.

Revision history for this message
Faheem Mitha (faheem-j) wrote :

Further updates:

I tried this for two other versions of sbcl, the Debian package of 1.0.56, which is currently in testing/unstable, as well as building from the source of the current git repos. Both give similar results to what I've already seen. Here is output of the latter sbcl version.

faheem@orwell[default branch:rev 64]:~/lisp$ rlwrap sh /usr/local/src/sbcl/sbcl/run-sbcl.sh
(running SBCL from: /usr/local/src/sbcl/sbcl)
This is SBCL 1.0.57.48-5cf2958, an implementation of ANSI Common Lisp.

Revision history for this message
Faheem Mitha (faheem-j) wrote :

Further update. I remembered I have a laptop, 32 bit, also running Debian squeeze, with the 2.6.32 kernel. I get the same behavior with it running the Debian 1.0.56 package. I guess I could try other versions of sbcl, eg from the git repos. What is the probability the kernel might have something to do with this? I just installed 3.2 on my main machine, so I can try again with that.

Revision history for this message
Faheem Mitha (faheem-j) wrote :

I got an independent confirmation from a friend. He is running Ubuntu 12.04 in 32-bit mode.

$ sbcl --version
SBCL 1.0.55.0.debian

$ uname -a
Linux xs35 3.2.0-24-generic-pae #39-Ubuntu SMP Mon May 21 18:54:21 UTC 2012 i686 i686 i386 GNU/Linux

He gets very similar behavior to what I do, including the heap exhaustion crash after a few cycles, and the

 80,000,016 bytes for         2 simple-array-double-float objects.

not going away with

(gc :full t)

I think at this point there seems to be a real possibility that something is going on here. Do you want more data points?

Revision history for this message
Faheem Mitha (faheem-j) wrote :

Also, I tested my laptop with the SBCL version from git, and got the same behavior. So this problem does not seem sensitive to SBCL versions.

Revision history for this message
Faheem Mitha (faheem-j) wrote :

I've now also reported this to Debian; see http://bugs.debian.org/676979

This report includes a reproduction recipe.

Changed in sbcl:
status: Incomplete → Confirmed
Revision history for this message
Devon Sean McCullough (k-launchpad-jovi-net) wrote :

I replicated your bug recipe on an Intel Core 2 Duo
running MacOS 10.6.8 SBCL 1.0.58,
it ran (test-gc) many times flawlessly
but as soon as I repeated the recipe,
re-compiling the code, re-defining the defuns,
it failed on the very next try.

Transcript at http://csail.mit.edu/~devon/sbcl-bug-1009267-wall.html

Revision history for this message
Faheem Mitha (faheem-j) wrote : Re: [Bug 1009267] Re: (gc :full t) has no effect under certain circumstances

On Wed, 2 Jan 2013, Devon Sean McCullough wrote:

> I replicated your bug recipe on an Intel Core 2 Duo running MacOS 10.6.8
> SBCL 1.0.58, it ran (test-gc) many times flawlessly but as soon as I
> repeated the recipe, re-compiling the code, re-defining the defuns, it
> failed on the very next try.

> Transcript at http://csail.mit.edu/~devon/sbcl-bug-1009267-wall.html

Hi Devon,

Thanks for checking this, but this code corresponds to the bug report
https://bugs.launchpad.net/sbcl/+bug/936304.

The code for testing LP #1009267 is different, and #1009267 may not be the
same as LP #936304.

There is a reproduction recipe at http://bugs.debian.org/676979 for LP
#1009267. You could add a transcript to LP #1009267.

The most significant issue is that (gc :full t) breaks, as mentioned in
the title of the bug. If you could try and reproduce this, it would be
helpful.

The transcript you have already provided could reasonably be posted to LP
#936304. This bug is currently closed.

                                                           Regards, Faheem

Revision history for this message
Paul Khuong (pvk) wrote :

I have replicated both the test case in Devon McCullough's transcript and the debian recipe in older (pre 1.1.10) x86 builds. The smaller initial stack frames on x86oids since 1.1.12 seems to have fixed both, at least with 2 GB heaps. In the end, the fundamental conservativeness issues are still the same; the only difference is that smaller stack frames reduce the odds of really bad circumstances.

Changed in sbcl:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.