*world-lock* deadlock issues

Bug #308959 reported by Nikodemus Siivola
This bug affects 3 people
Affects: SBCL
Status: Fix Released
Importance: Medium
Assigned to: Nikodemus Siivola

Bug Description

Since almost any user code can end up grabbing *world-lock* (via the compiler, CLOS, or the type system), it is not safe to run user code that may communicate with another thread while holding it.

The worst offenders at the moment are compiler output (which can run user code via Gray streams) and macro and compiler-macro expansion.
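To make the hazard concrete, here is a minimal Python sketch of the hold-and-wait pattern described above. It is an analogy, not SBCL code: `WORLD_LOCK` is a hypothetical stand-in for *world-lock*, the "user code" plays the role of a Gray stream method that forwards compiler output to another thread, and the worker's request handler stands in for macroexpansion that needs the same lock. The timeouts exist only so the demo terminates; a real deadlock simply hangs.

```python
import queue
import threading

# Hypothetical stand-in for SBCL's *world-lock*: one global recursive lock.
WORLD_LOCK = threading.RLock()


def serve_one_request(requests, replies):
    """Worker thread: answers one request, but needs the world lock to do so,
    the way macroexpansion or compilation would."""
    form = requests.get()
    if WORLD_LOCK.acquire(timeout=2.0):
        try:
            replies.put(("expanded", form))
        finally:
            WORLD_LOCK.release()


def talk_to_worker(requests, replies, reply_timeout):
    """'User code': send a request to another thread and wait for the answer."""
    requests.put("(my-macro 1 2)")
    try:
        replies.get(timeout=reply_timeout)
        return True
    except queue.Empty:
        return False


def run_exchange(talk_under_lock, reply_timeout=1.0):
    """Return True if the user code got its reply from the worker."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=serve_one_request, args=(requests, replies))
    worker.start()
    got_reply = False
    with WORLD_LOCK:
        # ... internal work that genuinely needs the lock would go here ...
        if talk_under_lock:
            # The worker cannot take WORLD_LOCK while we hold it, so no reply comes.
            got_reply = talk_to_worker(requests, replies, reply_timeout)
    if not talk_under_lock:
        # Lock already released: the worker can take it and answer.
        got_reply = talk_to_worker(requests, replies, reply_timeout)
    worker.join()
    return got_reply
```

`run_exchange(True)` times out (the deadlock), while `run_exchange(False)`, which runs the same user code only after releasing the lock, gets its reply.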

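The UPDATE-INSTANCE-FOR-* functions (UPDATE-INSTANCE-FOR-REDEFINED-CLASS, UPDATE-INSTANCE-FOR-DIFFERENT-CLASS) are also problematic: we cannot really avoid locking around them, but we should possibly switch to per-instance locks, or document that users need to do their own locking in their methods.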
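The per-instance-lock option could look roughly like the following Python sketch. Everything here is hypothetical: `Instance` and `update_instance_for_redefined_class` are loose analogues, not PCL's classes or the CL function's real lambda list. The point shown is only that while one thread holds instance `a`'s lock, another thread can still update instance `b` and run user code for it, which a single global lock would prevent.

```python
import threading


class Instance:
    """Hypothetical object whose class-change updates are serialized per instance."""

    def __init__(self, slots):
        self.slots = dict(slots)
        self._lock = threading.Lock()  # per-instance lock instead of the world lock


def update_instance_for_redefined_class(instance, added, discarded, user_method):
    """Loose analogue of the CL function: adjust slots, then run a user method,
    holding only this instance's lock."""
    with instance._lock:
        for name in discarded:
            instance.slots.pop(name, None)
        for name in added:
            instance.slots.setdefault(name, None)
        user_method(instance)


def derive_y(inst):
    # Example user method: initialize the new slot from an existing one.
    inst.slots["y"] = inst.slots["x"] * 10


def demo():
    a = Instance({"x": 1})
    b = Instance({"x": 2, "old": 0})
    a_locked, release_a = threading.Event(), threading.Event()

    def hold_a():
        with a._lock:  # another thread is busy updating a
            a_locked.set()
            release_a.wait()

    holder = threading.Thread(target=hold_a)
    holder.start()
    a_locked.wait()
    # b's update proceeds even though a is locked elsewhere.
    update_instance_for_redefined_class(b, ["y"], ["old"], derive_y)
    release_a.set()
    holder.join()
    return b.slots
```

`demo()` returns `{"x": 2, "y": 20}` without waiting on the other thread. The trade-off is that user methods touching several instances must then take care of their own lock ordering, which is essentially the "document that users need to lock" alternative.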

Changed in sbcl:
importance: Undecided → Medium
status: New → Confirmed
Attila Lendvai (attila-lendvai) wrote:

PCL also calls CL:COMPILE when filling its caches, which in turn can grab the world lock.

An annoying example:

 - ASDF is loading something (so there's a WITH-COMPILATION-UNIT on the stack)

 - a compile error brings up the debugger

 - while investigating/fixing it, the user tries to use SLIME fuzzy completion, which invokes some generic functions

 - it's the first call, so the generic function's cache is empty, and PCL tries to grab the lock from a random Swank worker thread (to compile a stub?)

 - SLIME hangs
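The scenario can be modeled in Python as a lazily filled dispatch cache whose miss path needs the global lock. This is a hypothetical sketch, not PCL: `GenericFunction`, `WORLD_LOCK`, and the `lock_timeout` parameter are invented for illustration, and the main thread holding the lock stands in for the ASDF load stuck in the debugger. A cache hit never touches the lock; only the first call (a miss) blocks.

```python
import threading

# Hypothetical stand-in for *world-lock*.
WORLD_LOCK = threading.RLock()


class GenericFunction:
    """Hypothetical generic function with a lazily filled dispatch cache."""

    def __init__(self, methods):
        self.methods = methods  # maps argument type -> callable
        self.cache = {}

    def __call__(self, arg, lock_timeout=None):
        fn = self.cache.get(type(arg))
        if fn is None:
            # Cache miss: "compiling" the dispatch stub needs the world lock.
            timeout = -1 if lock_timeout is None else lock_timeout
            if not WORLD_LOCK.acquire(timeout=timeout):
                raise TimeoutError("cache miss blocked on the world lock")
            try:
                fn = self.cache.setdefault(type(arg), self.methods[type(arg)])
            finally:
                WORLD_LOCK.release()
        return fn(arg)


def completion_request(gf, arg, results):
    """A 'Swank worker' calling the generic function."""
    try:
        results.append(gf(arg, lock_timeout=0.2))
    except TimeoutError:
        results.append("hung")


def demo(prewarm):
    gf = GenericFunction({str: str.upper})
    if prewarm:
        gf("warm-up")  # fill the cache before anyone holds the lock
    results = []
    with WORLD_LOCK:  # the load stuck in the debugger still holds the lock
        worker = threading.Thread(target=completion_request, args=(gf, "car", results))
        worker.start()
        worker.join()
    return results[0]
```

`demo(False)` reports `"hung"` because the worker's first call misses the cache, while `demo(True)` returns `"CAR"` since a cache filled in advance never needs the lock.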

Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
Changed in sbcl:
status: Confirmed → In Progress
Attila Lendvai (attila-lendvai) wrote:

It's nothing urgent on my part, just a data point that may help: I've updated the SBCL that runs dwim.hu from this commit (+ small changes):

commit abb03f939ada55bdc1856df5cc48815fd0dff69d
    1.0.55: will be tagged as "sbcl-1.0.55"

to the current head (+ the same small changes):

2b29a7c2b236cfab1d4d06311e84414abba71b4c
Dec 21, 2012

and now when I start the server, it hangs quite early, while filling method caches:

2012-12-22T04:51:53.228687+01:00: Starting up server, PID is 11740
debugger invoked on a SB-SYS:INTERACTIVE-INTERRUPT in thread #<THREAD "main thread" RUNNING {100C5710A3}>: Interactive interrupt at #x7FFFF7488BE7.

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [CONTINUE ] Return from SB-UNIX:SIGINT.
  1: [SAVE-CORE-AND-DIE] Save image to /tmp/sbcl.core and die
  2: [ABORT ] Give up starting the image and quit the VM process with exit code 2

("bogus stack frame")
0] back

  0: ("bogus stack frame")
  1: (SB-THREAD::%%WAIT-FOR-MUTEX
      #<unavailable argument>
      #<unavailable argument>
      #<unavailable argument>
      #<unavailable argument>
      #<unavailable argument>)
  2: (SB-THREAD::%WAIT-FOR-MUTEX
      #<SB-THREAD:MUTEX "World Lock" owner: #<SB-THREAD:THREAD "main thread" RUNNING {10001EA793}>>
      #<SB-THREAD:THREAD "main thread" RUNNING {100C5710A3}>
      NIL
      NIL
      NIL
      NIL
      NIL
      NIL)
  3: ((FLET #:WITHOUT-INTERRUPTS-BODY-465 :IN SB-THREAD::CALL-WITH-RECURSIVE-LOCK))
  4: (SB-THREAD::CALL-WITH-RECURSIVE-LOCK
      #<CLOSURE (FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK :IN SB-PCL::CHECK-WRAPPER-VALIDITY) {7FFFF6FEF7EB}>
      #<SB-THREAD:MUTEX "World Lock" owner: #<SB-THREAD:THREAD "main thread" RUNNING {10001EA793}>>
      T
      NIL)
  5: (SB-PCL::CHECK-WRAPPER-VALIDITY #<error printing a HU.DWIM.HOME:HOME-SERVER: #<SB-SYS:INTERACTIVE-INTERRUPT {100C892343}>>)
  6: (SB-PCL::CACHE-MISS-VALUES
      #<STANDARD-GENERIC-FUNCTION HU.DWIM.WEB-SERVER::LISTEN-ENTRIES-OF (1)>
      (#<error printing a CONS: #<SB-SYS:INTERACTIVE-INTERRUPT {100C897103}>> SB-PCL::ACCESSOR)
  7: (SB-PCL::INITIAL-DFUN
      #<STANDARD-GENERIC-FUNCTION HU.DWIM.WEB-SERVER::LISTEN-ENTRIES-OF (1)>
      (#<error printing a CONS: #<SB-SYS:INTERACTIVE-INTERRUPT {100C89B683}>>)
[...]

The behavior is very strange: if I press C-c and ask for a backtrace, I get about 4 frames, and then it hangs again. If I then press C-c a couple more times, I get some more frames. By repeating this, I can get a full backtrace incrementally.

The application is a saved executable core, and it installs a C-c signal handler like this:

http://dwim.hu/darcsweb/darcsweb.cgi?r=HEAD%20hu.dwim.util;a=headblob;f=/source/production.lisp#l228

Faré (fahree) wrote:

In case it helps, you might want to try the various pre-compilation methods Christophe suggests. See the November 2012 entries at:
http://www.advogato.org/person/crhodes/

And if something like that works, I'm game to see what exactly does.

Stas Boukarev (stassats)
Changed in sbcl:
status: In Progress → Triaged
Changed in sbcl:
status: Triaged → Fix Released