"Problem forcing cache flushes"

Bug #1272742 reported by 3b on 2014-01-25
This bug affects 1 person
Affects: SBCL
Importance: Undecided
Assigned to: Unassigned

Bug Description

    Problem forcing cache flushes. Please report to sbcl-devel.

        Owrapper: (#<SB-PCL::WRAPPER #<STANDARD-CLASS X> {10053B81C3}> . T)

        Wrapper-of: (#<SB-PCL::WRAPPER #<STANDARD-CLASS X> {10053B81C3}> . T)

        Class-wrapper: (#<SB-PCL::WRAPPER #<STANDARD-CLASS X> {10057C3503}>)

  This is probably a bug in SBCL itself. (Alternatively, SBCL might have been
  corrupted by bad user code, e.g. by an undefined Lisp operation like
  (FMAKUNBOUND 'COMPILE), or by stray pointers from alien code or from unsafe
  Lisp code; or there might be a bug in the OS or hardware that SBCL is running
  on.) If it seems to be a bug in SBCL itself, the maintainers would like to
  know about it. Bug reports are welcome on the SBCL mailing lists, which you
  can find at <http://sbcl.sourceforge.net/>.

Backtrace for: #<SB-THREAD:THREAD RUNNING {10053B6933}>
0: ((FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK :IN SB-PCL::CHECK-WRAPPER-VALIDITY))
1: ((FLET #:WITHOUT-INTERRUPTS-BODY-566 :IN SB-THREAD::CALL-WITH-RECURSIVE-LOCK))
2: (SB-THREAD::CALL-WITH-RECURSIVE-LOCK #<CLOSURE (FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK :IN SB-PCL::CHECK-WRAPPER-VALIDITY) {7FFFF59D65DB}> #<SB-THREAD:MUTEX "World Lock" owner: #<SB-THREAD:THREAD RUNNING {10053B6933}>> T NIL)
3: (SB-PCL::CHECK-WRAPPER-VALIDITY #<error printing object>)
4: (SB-KERNEL:CLASSOID-TYPEP #<unavailable argument> #<unavailable argument> #<unavailable argument>)
5: (TEST)

reduced test case from stassats:

(defclass x () ())
(defvar *instance* (make-instance 'x))

(defun redefine-class ()
  (defclass x ()
    (a))
  (defclass x ()
    ()))

(sb-thread:make-thread
 (lambda ()
   (loop repeat 20000 do (redefine-class))))

(defun test ()
  (typep *instance* 'x))

(loop repeat 10
      do (sb-thread:make-thread
          (lambda () (loop repeat 100000000000
                           do (test)))))

also sometimes gets

  failed AVER: (< I 2)
   This is probably a bug in SBCL itself. (Alternatively, SBCL might have been
   corrupted by bad user code, e.g. by an undefined Lisp operation like
   (FMAKUNBOUND 'COMPILE), or by stray pointers from alien code or from unsafe
   Lisp code; or there might be a bug in the OS or hardware that SBCL is
   running on.) If it seems to be a bug in SBCL itself, the maintainers would
   like to know about it. Bug reports are welcome on the SBCL mailing lists,
   which you can find at <http://sbcl.sourceforge.net/>.

0: (SB-INT:BUG "~@<failed AVER: ~2I~_~A~:>" (< SB-KERNEL::I 2))
1: (SB-IMPL::%FAILED-AVER (< SB-KERNEL::I 2))
2: (SB-KERNEL:CLASSOID-TYPEP #<unavailable argument> #<unavailable argument> #<unavailable argument>)
3: (TEST)

3b (00003b) wrote :

similar test cases have also errored with

Index 5550 out of bounds for (SIMPLE-ARRAY (UNSIGNED-BYTE 64)
                              (4096)), should be nonnegative and <4096.
   [Condition of type SB-INT:INVALID-ARRAY-INDEX-ERROR]

Restarts:
 0: [ABORT] Abort thread (#<THREAD RUNNING {10041B62E3}>)

Backtrace:
  0: ((FLET #:BODY-FUN-834 :IN SB-KERNEL:%PUTHASH))
  1: (SB-KERNEL:%PUTHASH #<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185EC163}> #<HASH-TABLE :TEST EQL :COUNT 10559 {1000311F83}> ((:FLUSH #1=#<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185EC163}>) (:FLUSH #1#) ..
      Locals:
        SB-DEBUG::ARG-0 = #<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185EC163}>
        SB-DEBUG::ARG-1 = #<HASH-TABLE :TEST EQL :COUNT 10559 {1000311F83}>
        SB-DEBUG::ARG-2 = ((:FLUSH #<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185EC163}>) (:FLUSH #<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185EC163}>) (:FLUSH #<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185EC163}>) ..)
  2: (SB-KERNEL::%ENSURE-CLASSOID-VALID #<SB-KERNEL:STANDARD-CLASSOID F> #<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185864C3}> "typep")
      Locals:
        SB-DEBUG::ARG-0 = #<SB-KERNEL:STANDARD-CLASSOID F>
        SB-DEBUG::ARG-1 = #<SB-PCL::WRAPPER #<STANDARD-CLASS F> {10185864C3}>
        SB-DEBUG::ARG-2 = "typep"
  3: (SB-KERNEL:CLASSOID-TYPEP #<unavailable argument> #<unavailable argument> #<unavailable argument>)
      [No Locals]

where most of the backtrace seems to have been optimized away by TCO, but probably looks something like
typep -> %typep -> %%typep -> classoid-typep -> %ensure-classoid-valid -> %force-cache-flushes -> %invalidate-wrapper -> (setf (gethash nwrapper *previous-nwrappers*))
with *previous-nwrappers* being the hash table in the error

Stas Boukarev (stassats) wrote :

Grabbing the world-lock in %force-cache-flushes fixes both issues, will commit after the freeze.
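
For readers unfamiliar with the pattern, here is a minimal sketch of the idea behind that fix: serialize every write to a shared, unsynchronized hash table behind one recursive lock. It is not the actual SBCL change. `*table-lock*`, `*previous-wrappers*`, `record-invalidation` and `hammer` are hypothetical stand-ins for the world lock, `*previous-nwrappers*`, and the invalidation path; only the `sb-thread` calls are real SBCL API.

```lisp
;; Hypothetical stand-ins for illustration, not SBCL internals.
(defvar *table-lock* (sb-thread:make-mutex :name "table lock"))
(defvar *previous-wrappers* (make-hash-table :test 'eql))

(defun record-invalidation (key state)
  "Store STATE under KEY while holding the lock, so concurrent writers
cannot corrupt the unsynchronized hash table."
  ;; A recursive lock lets code that already holds it call this again.
  (sb-thread:with-recursive-lock (*table-lock*)
    (setf (gethash key *previous-wrappers*) state)))

(defun hammer (n-threads n-writes)
  "Write N-WRITES distinct keys from each of N-THREADS threads and
return the resulting table size."
  (let ((threads
          (loop for tid below n-threads
                collect (let ((tid tid)) ; fresh binding per closure
                          (sb-thread:make-thread
                           (lambda ()
                             (dotimes (i n-writes)
                               (record-invalidation (+ (* tid n-writes) i)
                                                    :flush))))))))
    (mapc #'sb-thread:join-thread threads)
    (hash-table-count *previous-wrappers*)))
```

Calling `(hammer 4 1000)` should leave 4000 entries in the table. Removing the `with-recursive-lock` form turns this back into the kind of unsynchronized `%puthash` race shown in the backtrace above.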

Changed in sbcl:
status: New → Triaged
3b (00003b) wrote :

not-quite-as-reliable test case that doesn't rely on redefining the class being tested, and which triggered all 3 of the above errors:

(defclass a () ())
(defclass b (a) ())
(defclass c (a) ())
(defclass d (b c) ())
(defclass e (a) ())
(defclass f (e b) ())

(defvar *stop* nil) ; set to T to make the SPAM threads exit

(defun spam ()
  (declare (notinline typep))
  (loop count (typep :uint32 'd) into a
        count (typep :uint32 'f) into b
        until *stop*
        finally (return (list a b))))

(loop repeat 8 do (sb-thread:make-thread #'spam))

then while that is running, load a bunch of .fasl files
for example (asdf:load-systems 'iolib 'cl-glut-examples 'hunchentoot 'drakma)
(systems picked mostly at random, just looking for things with lots of files/dependencies)

not sure whether any specific features of the .fasl files matter, or whether the class hierarchy does. The classes were shaped that way to be similar to a case that happened in real cffi code while loading iolib.
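
The test case above starts its threads and discards them, so stopping the run is left implicit. A possible harness, assuming a shared `*stop*` flag as in that test case, keeps the thread objects, flips the flag, and joins them. The helper name `run-spam` and the reduced two-class hierarchy are illustrative assumptions, not part of the original report.

```lisp
(defvar *stop* nil)

;; Reduced, hypothetical hierarchy; the report uses classes A through F.
(defclass a () ())
(defclass b (a) ())

(defun spam ()
  "Call TYPEP in a tight loop until *STOP* becomes true; return the hit count."
  (declare (notinline typep))
  (loop count (typep :uint32 'b) into hits
        until *stop*
        finally (return hits)))

(defun run-spam (n-threads seconds)
  "Run SPAM in N-THREADS threads for roughly SECONDS, then stop and join them."
  (setf *stop* nil)
  (let ((threads (loop repeat n-threads
                       collect (sb-thread:make-thread #'spam))))
    ;; This is the window in which to load .fasl files from another thread.
    (sleep seconds)
    (setf *stop* t)
    ;; JOIN-THREAD returns each thread's result, here the count from SPAM.
    (mapcar #'sb-thread:join-thread threads)))
```

Since a keyword is never an instance of `b`, each count is 0 on a healthy run. If a thread dies from one of the errors above, `join-thread` signals an error rather than returning a count.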

3b (00003b) wrote :

After trying it again, I haven't been able to get an error from the test case in comment #3

(require 'asdf)
(asdf:load-system 'cffi)
(defparameter *stop* nil)

(defun spam ()
  (loop count (cffi:foreign-type-size :uint32)
        until *stop*))
(loop repeat 8 do (sb-thread:make-thread #'spam))

The test above, which #3 was based on, seems to crash more reliably while loading lots of .fasl files, but I'm less sure the class isn't being redefined in that case.

Stas Boukarev (stassats) wrote :

In 1d730f458618cec59c95d13f2972b5db06e51bbd.

Changed in sbcl:
status: Triaged → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released