Steel Bank Common Lisp

find-package deadlock

Reported by Stas Boukarev on 2010-01-22
This bug affects 6 people
Affects: SBCL
Importance: High
Assigned to: Nikodemus Siivola

Bug Description

I couldn't distill an independent test case; I have encountered this only in Slime.
After entering 'cl::foo in Slime's REPL everything hangs.
This is because find-package now has with-packages around its code.

A comment in call-with-packages says:

;; FIXME: Since name conflicts can be signalled while holding the
;; mutex, user code can be run leading to lock ordering problems.

Perhaps this is related.
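
The lock ordering problem that the FIXME describes can be sketched without touching the package system at all. The following is only an illustration under stated assumptions: `*demo-lock*`, `demo-lookup` and `demo-signal-under-lock` are hypothetical names, the mutex is a stand-in for the real package lock, and the `:timeout` argument to `sb-thread:grab-mutex` is assumed to be available (it is in recent SBCL versions). A handler that runs while the lock is still held waits for a second thread that needs the same lock.

```lisp
;; Hypothetical stand-in for the global package lock; not SBCL's real one.
(defvar *demo-lock* (sb-thread:make-mutex :name "demo package lock"))

(defun demo-lookup ()
  "Stand-in for FIND-PACKAGE: needs the lock, gives up after 1 second."
  (if (sb-thread:grab-mutex *demo-lock* :timeout 1)
      (unwind-protect :ok
        (sb-thread:release-mutex *demo-lock*))
      :would-deadlock))

(defun demo-signal-under-lock ()
  "Signal an error while holding the lock. The handler runs before any
unwinding, so the lock is still held while it waits for a helper thread
that needs the same lock (as a debugger served by another thread might)."
  (sb-thread:with-mutex (*demo-lock*)
    (handler-bind ((error (lambda (c)
                            (declare (ignore c))
                            (return-from demo-signal-under-lock
                              (sb-thread:join-thread
                               (sb-thread:make-thread #'demo-lookup))))))
      (error "name conflict"))))
```

Without the one-second timeout in `demo-lookup`, the helper thread and the handler would wait on each other forever, which is the shape of the hang reported here.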

nixeagle (nixeagle) wrote :

I ran into this today too. Note that unwinding the stack and resignaling the error gives the desired behavior.

This hangs:
(progn (make-package :foobar) (make-package :foobar))

The following behaves correctly:
(progn (make-package :foobar) (handler-case (make-package :foobar) (error (condition) (error condition))))

You can replicate this bug in raw lisp by going to the command prompt, typing `sbcl --no-userinit' and typing this form.
(progn (make-package :foobar) (sb-thread:make-thread (lambda () (make-package :foobar))))

It will hang, and you will be forced to kill sbcl.
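
A possible starting point for a Slime-independent check is sketched below. It does not reproduce the hang itself. Instead it probes for the underlying condition: whether another thread can still call find-package while a handler for the duplicate-package error is running. The function name `duplicate-package-probe` is hypothetical, and the `:default` and `:timeout` arguments to `sb-thread:join-thread` are assumed to be available (they are in recent SBCL versions, not necessarily in the ones discussed here).

```lisp
(defun duplicate-package-probe (name &key (timeout 5))
  "Create package NAME twice. While the handler for the second call's
error is running, ask a helper thread to call FIND-PACKAGE.
Returns :OK if the helper finishes, :HUNG if it does not finish within
TIMEOUT seconds, :NO-ERROR if no error was signalled at all."
  (when (find-package name) (delete-package name))
  (make-package name)
  (unwind-protect
       (block probe
         (handler-bind
             ((error (lambda (c)
                       (declare (ignore c))
                       ;; Still inside MAKE-PACKAGE's dynamic extent here.
                       (let ((helper (sb-thread:make-thread
                                      (lambda () (find-package name) :ok))))
                         (return-from probe
                           (sb-thread:join-thread helper
                                                  :default :hung
                                                  :timeout timeout))))))
           (make-package name)
           :no-error))
    ;; Clean up so the probe can be run again.
    (delete-package name)))
```

For example, `(duplicate-package-probe :foobar-probe)` is expected to return :OK on a build where the error is signalled without the package lock held, and :HUNG on an affected one.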

I talked to the original reporter in #lisp earlier, who clued me in to this. Thanks very much for taking the time to discuss it with me; otherwise I'd still be hacking at Slime, thinking the bug was there.

Attila Lendvai (attila-lendvai) wrote :

originally i've pointed out on sbcl10 that the package hashtable is not protected. Nikodemus added a guard around it which is causing these problems.

i'm running sbcl with :sb-hash-table-debug, which signals a cerror on concurrent access to a hashtable. when restarting the inferior lisp from slime and load-op'ing code, there's a bit of concurrency with slime trying to look up arglists in the repl buffer (or something similar, i remember only vaguely). in these situations i often (about once a day) saw the concurrent access error being triggered from the package hashtable.

Attila Lendvai (attila-lendvai) wrote :

i've spent some time coming up with a fix for this.

the basic idea is to catch, unwind, and resignal errors coming from inside with-packages.
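
The catch, unwind, and resignal idea can be sketched roughly as follows. This is not the attached patch: `*sketch-lock*`, `with-sketch-lock-resignalling` and `lock-held-when-handler-runs-p` are hypothetical names, and the mutex is a stand-in for the real package lock.

```lisp
;; Hypothetical stand-in for the package lock.
(defvar *sketch-lock* (sb-thread:make-mutex :name "sketch lock"))

(defmacro with-sketch-lock-resignalling (&body body)
  "Run BODY holding *SKETCH-LOCK*. If BODY signals an ERROR, unwind out
of the locked region first, then signal the same condition again."
  (let ((outcome (gensym "OUTCOME"))
        (condition (gensym "CONDITION")))
    `(let ((,outcome (handler-case
                         (multiple-value-list
                          (sb-thread:with-mutex (*sketch-lock*) ,@body))
                       ;; HANDLER-CASE unwinds before running this clause,
                       ;; so the mutex has already been released here.
                       (error (,condition) ,condition))))
       ;; A list means BODY returned normally; anything else is the
       ;; caught condition object.
       (if (listp ,outcome)
           (values-list ,outcome)
           (error ,outcome)))))

(defun lock-held-when-handler-runs-p ()
  "Return true if a handler outside the macro sees the lock still held."
  (block nil
    (handler-bind ((error (lambda (c)
                            (declare (ignore c))
                            (return (sb-thread:holding-mutex-p
                                     *sketch-lock*)))))
      (with-sketch-lock-resignalling (error "conflict")))))
```

The cost of this approach is that any restarts established inside the body (for example by CERROR) are gone by the time the condition is signalled the second time, because the stack has already been unwound.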

the patch works if it is C-c C-c'd in the right order, but bootstrapping unfortunately fails with something beyond me:

//entering make-target-2.sh
//doing warm init - compilation phase
This is SBCL 1.0.36.8, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
internal error #26 (An attempt was made to use an undefined SYMBOL-VALUE.)
    SC: 21, Offset: 0 $1= 0x10000513cf: other pointer
fatal error encountered in SBCL pid 9740(tid 140737353987824):
internal error too early in init, can't recover

Welcome to LDB, a low-level debugger for the Lisp runtime environment.

Attila Lendvai (attila-lendvai) wrote :

with nyef's help i've managed to make the patch bootstrap.

Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
Changed in sbcl:
status: New → Confirmed
importance: Undecided → High

Nikodemus Siivola (nikodemus) wrote :

Unless I'm missing something, this doesn't quite cut it as it breaks any chance of offering restarts for package-related errors in several cases. See for example the CERROR in EXPORT.

Right now it seems to me that the cleanest way to handle this is to cook up enough lockless data structures that neither (SETF FIND-PACKAGE) nor INTERN need to grab any locks -- a single global lock is still needed to protect inter-package relationships.

Tobias C. Rittweiler (tcr) wrote :

Maybe anyone has time to cook up an independent test case
for this?

Tamas Papp wrote :

On Tue, 06 Apr 2010, Tobias C. Rittweiler wrote:

> Maybe anyone has time to cook up an independent test case
> for this?

These 11 lines are enough to lock up my SBCL 1.0.37 when run within
SLIME (I hope it is the same bug).

http://paste.lisp.org/display/97375

When run in a plain vanilla terminal, I get the usual restarts and it
works fine.

Tamas

Nikodemus Siivola (nikodemus) wrote :

Unless something unforeseen happens, I'm committing at least a partial fix to this today or tomorrow.

Tobias C. Rittweiler (tcr) wrote :

Tamas Papp <email address hidden> writes:

> On Tue, 06 Apr 2010, Tobias C. Rittweiler wrote:
>
>> Maybe anyone has time to cook up an independent test case
>> for this?
>
> These 11 lines are enough to lock up my SBCL 1.0.37 when run within
> SLIME (I hope it is the same bug).
>
> http://paste.lisp.org/display/97375
>
> When run in a plain vanilla terminal, I get the usual restarts and it
> works fine.
>
> Tamas

Yes, the point is to get a test case without Slime that can be added to
SBCL's test suite.

  -T.


Nikodemus Siivola (nikodemus) wrote :

Fixed in SBCL 1.0.37.44. (The fix is not quite complete: interning new symbols can still be blocked indefinitely by another thread, but at least existing symbols can be read, etc.)

Changed in sbcl:
status: Confirmed → Fix Committed

nixeagle (nixeagle) wrote :

In reply to #6 and #8: in my comment #1 I mentioned a reproduction that does not depend on Slime at all. I don't know how SBCL's test cases are written, but someone should be able to format this into something useful.

You can replicate this bug in raw lisp by going to the command prompt, typing `sbcl --no-userinit' and typing this form.
(progn (make-package :foobar) (sb-thread:make-thread (lambda () (make-package :foobar))))

The general theme is: do it in a thread other than the REPL's and you will cause a deadlock; that is what Slime does when you trigger it there. For example, in Slime all you have to do to trigger the above is evaluate (progn (make-package :barfoo) (make-package :barfoo)).

Changed in sbcl:
status: Fix Committed → Fix Released