Memory leak in MAKE-INSTANCE 'STANDARD-CLASS

Bug #1903413 reported by Michał "phoe" Herda
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
New
Undecided
Unassigned

Bug Description

SBCL 2.0.10 on Linux.

Discovered by Michael Fiano.

Creating anonymous standard-class instances causes a memory leak even if the instances become immediately unreachable. The below example causes a 60MB increase in dynamic space usage not reclaimable even with a full GC.

----------------

* (sb-ext:gc :full t)
NIL

* (room nil)
Dynamic space usage is: 26,948,112 bytes.
Immobile space usage is: 13,945,920 bytes (26,688 bytes overhead).
Read-only space usage is: 0 bytes.
Static space usage is: 736 bytes.
Control stack usage is: 1,952 bytes.
Binding stack usage is: 640 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

* (dotimes (i 100000) (make-instance 'standard-class))
NIL

* (sb-ext:gc :full t)
NIL

* (room nil)
Dynamic space usage is: 84,518,336 bytes.
Immobile space usage is: 26,741,824 bytes (22,592 bytes overhead).
Read-only space usage is: 0 bytes.
Static space usage is: 736 bytes.
Control stack usage is: 1,952 bytes.
Binding stack usage is: 640 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Revision history for this message
Michał "phoe" Herda (phoe-krk) wrote :

One reason for this state of matters is because the STANDARD-CLASSOID for SLOT-OBJECT holds possibly permanent strong references to all of its subclasses.

For instance, in the below example after creating some anonymous classes, the value of SUBCLASSES in that classoid is a non-weak hash table that holds 12157 elements, therefore keeping all the classoids and their layouts live and contributing to the aforementioned memory leak.

---------------------------

CL-USER> (let* ((slot-object-class (find-class 'sb-pcl::slot-object))
                (slot-object-wrapper (slot-value slot-object-class 'sb-pcl::wrapper))
                (slot-object-classoid (slot-value slot-object-wrapper 'sb-pcl::classoid)))
           (swank:inspect-in-emacs slot-object-classoid))

#<SB-KERNEL:STANDARD-CLASSOID {1000052D03}>
--------------------
The object is a STRUCTURE-OBJECT of type SB-KERNEL:STANDARD-CLASSOID.
%BITS: 1387484958
NAME: SB-PCL::SLOT-OBJECT
LAYOUT: #<SB-KERNEL:LAYOUT for SB-PCL::SLOT-OBJECT {50217103}>
STATE: NIL
DIRECT-SUPERCLASSES: NIL
SOURCE-LOCATION: NIL
SUBCLASSES: #<HASH-TABLE :TEST EQ :HASH-FUNCTION #<FUNCTION SB-KERNEL:TYPE-HASH-VALUE> :COUNT 12157 {1000250603}>
PCL-CLASS: #<SB-PCL::SLOT-CLASS SB-PCL::SLOT-OBJECT>
OLD-LAYOUTS: NIL

Revision history for this message
Michał "phoe" Herda (phoe-krk) wrote :

I wondered what would break if we added weakness to the hash tables instantiated in these two places:

https://github.com/sbcl/sbcl/blob/8a2df46b46e3feb201c7f19096018a3f4b1c7d8a/src/code/class.lisp#L290-L292

https://github.com/sbcl/sbcl/blob/532b885acb3da5bf927b600f05d568507075ae3c/src/pcl/std-class.lisp#L186-L188

And built SBCL with with :WEAKNESS :KEY-AND-VALUE in both hash tables.

The change does not seem to break anything, the resulting hash tables have :WEAKNESS :KEY-AND-VALUE, but it does not fix the issue either: the classoids in question are still not purged from the system.

This is likely because the classoids (or references to them) are allocated in static space (at least this is how I interpret the results given to me by SEARCH-ROOTS) and therefore the classoids are not collectable by the GC, which means that simple hash table weakness is not enough to remove them.

Revision history for this message
Douglas Katzman (dougk) wrote :

Two problems:
1. there a billion other things going on aside from the classoid. The anonymous class is a direct-subclass of standard-object. Are classes allowed to remove unreferenced subclasses? (Is this specified behavior? That direct subclasses are weakly referenced?)

* (let ((so (find-class'standard-object))) (print (length (sb-mop:class-direct-subclasses so))) (make-instance 'standard-class) (print (length (sb-mop:class-direct-subclasses so))))
5
6

2. it is virtually impossible to guarantee that something in the MOP has not put your object into a cache of some kind, especially as it is a metaobject. If make-instance on all metaobjects were forced to completely flush all caches in the system, maybe there would be a chance of "fixing" this issue.

Also, generally speaking:
- no new objects ever go into static space, barring some trampolines for alien-lambda
- it's very difficult to get good answers from traceroot where weakness is involved. Pretty much anything it does is going to be wrong. If you assume that weak structures may _not_ participate in any path, then it'll miss some paths. If you assume that they _may_ participate, then it'll act like falsely enlivening the object, not being truly "weak". But if some object that is a-priori alive enlivens a k/v pair in a weak table which makes the path valid, that should be fine. Except that this is beyond the scope of anything I want to solve.

Revision history for this message
Michał "phoe" Herda (phoe-krk) wrote :

Yes, I see now; I remember that this issue was discussed previously, and the MOP does not specify whether unreferenced subclasses are allowed to be removed or whether references are strong.

I know that e.g. CCL does not suffer from this kind of memory leak, while ECL seems to copy SBCL's behavior. But this in turn might be a CCL bug because (make-instance 'standard-class) there does *NOT* seem to push the newly created class into (class-direct-subclasses (find-class 'standard-object)); is this legal?

Regarding the "generally speaking" points - thanks for the explanation, I got slightly confused and a comment from no-defun-allowed on #sbcl helped me:

22:37 < no-defun-allowed> Oh, I remember this.
22:37 < no-defun-allowed> What is happening is that the classes are being linked as subclasses of
                          standard-instance or whatever that class is called.
22:38 < no-defun-allowed> https://gitlab.com/cal-coop/netfarm/netfarm/-/blob/master/Code/Objects/MOP/netfarm-class.lisp#L180 adds methods to not do that for Netfarm classes.
22:39 < phoe> hmmmmmmm
22:40 < no-defun-allowed> So they are live unless you make sure those references aren't made.
22:40 < phoe> yes, I see
22:41 < phoe> standard-object, yes

Revision history for this message
Michał "phoe" Herda (phoe-krk) wrote :

I think that at this point, this issue kinda looks like a request for a MOP extension.

I am currently thinking of establishing and documenting some sort of protocol for classes that want to exhibit the behavior that is wanted in this case: to be creatable en masse and fully garbage-collectable, at the cost of not being listed in the direct subclass list of its superclass and not being able to participate in e.g. MOP:MAP-DEPENDENTS. (Let's name these transient classes for the time being.)

Such a protocol would indirectly require specifying all places that are allowed to hold a strong reference to the class metaobject. Classes that want to be transient can follow this protocol, e.g. by specializing on ADD-DIRECT-SUBCLASS and declining to CALL-NEXT-METHOD, therefore not creating a strong reference from its superclass to itself.

The implementation of this protocol would need to follow into the implementation internals, so e.g. classoids are collectable when the class that they point at becomes unreachable.

Short writeup from my point of view follows. (I'm not a SBCL wizard, so this is going to be painfully incomplete.

Benefits:
* Transient classes don't pollute the image upon being created and collected; therefore, the memory leak is plugged.

Costs:
* All related storages of metametaobjects (e.g. classoids/wrappers case of SBCL) would need to become weak, so they don't get in the way when the strong references to class objects are freed.
* New unit/regression tests.

--------------

> 2. it is virtually impossible to guarantee that something in the MOP has not put your object into a cache of some kind, especially as it is a metaobject. If make-instance on all metaobjects were forced to completely flush all caches in the system, maybe there would be a chance of "fixing" this issue.

I wonder if it should be MAKE-INSTANCE in this case; this seems expensive to flush the caches every time an instance is created.

Maybe there should be a separate function for flushing these caches, e.g. SB-PCL:FLUSH-MOP-CACHES, optionally invokable via the SB-EXT:GC interface by another keyword argument. Applications that would create tons of transient classes are also likely to call GC frequently; at this point, they would be able to also request the MOP caches to be flushed.

This seems feasible because it seems that class creation is usually tied to compilation and/or recompilation in the two concrete use cases I've seen (mfiano's game engine with live code reloading and pve's Smalltalk implementation in Common Lisp). This means that the warm-up costs of regenerating these caches will only need to be paid once after the flushing, and not during runtime of the game engine or generated Smalltalk programs.

Revision history for this message
Douglas Katzman (dougk) wrote :

In addition to this, there's a question of whether DELETE-PACKAGE is allowed to cause all the defstructs and defclasses named by symbols (formerly) in that package to disappear.
Personally I'd like DELETE-PACKAGE to be a way to clean house, but it isn't. There's always the looming possibility of enumerating the descendants of T and finding all the types that have become named by uninterned symbols.

Revision history for this message
Michał "phoe" Herda (phoe-krk) wrote :

I think I could imagine either some sort of dynamic variable that drives the behavior of DELETE-PACKAGE, so we can have:

(let ((sb-ext:*delete-package-nukes-package-contents* t))
  (delete-package "FOO"))

Or, maybe preferably, some separate function, such as SB-EXT:NUKE-PACKAGE, which performs keyword-driven cleanup before doing the proper DELETE-PACKAGE. Something like:

(sb-ext:nuke-package "FOO" :classes t :structures t :functions t :variables t ...)

I think this warrants a separate wishlist ticket though.

Revision history for this message
Douglas Katzman (dougk) wrote :

I have to disagree that nuke-package is useful. The issue of object retention is entirely about the type system and nothing but that. A simple test case proves that deleted packages don't accidentally retain objects, but for the type system's hierarchy:

(make-package "TEST" :use '("CL" "SB-EXT"))
(in-package test)
(defstruct test x)
(defvar *testvar* (cons 1 2))
(defun testfun (x) x)
(deftype testtype () 'integer)

(defvar cl-user::*wps*
  (list (make-weak-pointer '*testvar*) ; name of var
        (make-weak-pointer *testvar*) ; value of the name
        (make-weak-pointer 'testfun) ; name of fun
        (make-weak-pointer #'testfun) ; value of the name
        (make-weak-pointer 'make-test))) ; structure ctor

(in-package cl-user)
(delete-package "TEST")
(gc :full t)
* *wps*
(#<broken weak pointer> #<broken weak pointer> #<broken weak pointer>
 #<broken weak pointer> #<weak pointer: #:MAKE-TEST>)

#:MAKE-TEST was retained only because type named #:TEST was retained, which was retained because it is the name of a classoid; a structure-classoid has a defstruct-description, which points to its constructor function, and so on and so on. Again, it is entirely the fault of the MOP -
* (car(sb-mop::class-direct-subclasses(find-class'structure-object)))
#<STRUCTURE-CLASS #:TEST>

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.