Comment 23 for bug 1982608

Andrew Berkley (ajberkley) wrote :

The places where set-slot-old-p returns true, apparently incorrectly, look something like this:

(defun test-it ()
  (let ((p (cons nil nil)))
    (values
     (lambda (blarg) (setf (cdr p) blarg))
     (lambda () p))))

My understanding, which I am writing down mostly so that you can correct me if it is wrong, is this: the first function (a 'setter') erroneously does not emit a store barrier (that is, it does not mark the page p lives on) when updating (cdr p), so whatever blarg is will not be kept alive, because p will not be scanned at the next gc. gc verification catches this early because p, on a still write-protected page, points to the younger blarg and won't be scanned to keep it alive. The crash occurs later, when gc frees blarg and something then causes the gc to scan p, which wanders off into the freed memory pointed at by (cdr p). In our code these are mostly short-lived objects that all live happily in the nursery, so it doesn't crash every time.
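
To make that reasoning concrete, here is a toy model of the mechanism as I understand it. It is not SBCL's implementation: obj, store, minor-gc and demo are names made up for illustration, a per-object "dirty" flag stands in for the page mark, and the collector only follows one level of pointers.

```lisp
;; Toy model only. None of these names exist in SBCL.
(defstruct obj
  name
  (gen :young)   ; :old or :young
  (dirty nil)    ; stands in for "the page this object lives on is marked"
  (ref nil))     ; the single pointer slot, like (cdr p)

(defun store (holder value &key (barrier t))
  "Store VALUE into HOLDER. With BARRIER, also record that HOLDER was written."
  (setf (obj-ref holder) value)
  (when barrier
    (setf (obj-dirty holder) t))
  value)

(defun minor-gc (heap)
  "Return the objects of HEAP that survive a young-generation collection.
Old objects always survive, but only dirty old objects are scanned for
pointers into the young generation."
  (let ((old (remove :young heap :key #'obj-gen))
        (kept-young '()))
    (dolist (o old)
      (let ((target (obj-ref o)))
        (when (and (obj-dirty o) target (eq (obj-gen target) :young))
          (pushnew target kept-young))))
    (append old kept-young)))

(defun demo (barrier)
  "Return T if blarg survives a minor gc after being stored into old object p."
  (let* ((p (make-obj :name "p" :gen :old))
         (blarg (make-obj :name "blarg"))
         (heap (list p blarg)))
    (store p blarg :barrier barrier)
    (and (member blarg (minor-gc heap)) t)))
```

In this model (demo t) returns T and (demo nil) returns NIL: without the barrier, p still points at blarg but blarg is no longer in the surviving heap, which is the dangling (cdr p) described above.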

Paste the following into a REPL to trigger this. It's probably not minimal, but it's late here.

(defun test-it ()
  (let ((p (cons nil nil)))
    (values
     (lambda (blarg) (setf (cdr p) blarg))
     (lambda () p))))

(progn (setf (extern-alien "verify_gens" char) 0)
       (setf (extern-alien "pre_verify_gen_0" int) 1)
       (setf (extern-alien "gencgc_verbose" char) 1))

(multiple-value-bind (setter getter)
    (test-it)
  (defparameter *setter* setter)
  (defparameter *getter* getter))
(gc :full t)
(progn (funcall *setter* (list 1 2 3)) nil)
(gc :full t)