Steel Bank Common Lisp

WITHOUT-INTERRUPTS+CONDITION-WAIT failure on FreeBSD

Reported by Thomas Bakketun on 2009-09-16
This bug affects 1 person
Affects: SBCL
Importance: Medium
Assigned to: Unassigned

Bug Description

I ran the test suite with SBCL 1.0.30 on FreeBSD 7.2 and it hung in the WITHOUT-INTERRUPTS+CONDITION-WAIT test.

uname -a says:
FreeBSD nellis.copyleft.no 7.2-RELEASE-p3 FreeBSD 7.2-RELEASE-p3 #0: Mon Sep 14 18:28:18 CEST 2009

Output from SBCL:

// Running /usr/ports/lang/sbcl/work/sbcl-1.0.30/tests/threads.pure.lisp
::: Running MUTEX-OWNER
::: Success MUTEX-OWNER
::: Running SPINLOCK-OWNER
::: Success SPINLOCK-OWNER
::: Running WITHOUT-INTERRUPTS+CONDITION-WAIT

At this point the SBCL process hangs. ps aux reports this:

USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 44940 0.0 1.9 1072668 47272 pn I+J 2:18PM 0:19.37 /usr/ports/lang/sbcl/work/sbcl-1.0.30/tests/../src/runtime/sbcl

After killing it with "kill -9", SBCL reports:

Killed
test failed, expected 104 return code, got 137
*** Error code 1
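
The 137 in that message is consistent with the SBCL process having died from signal 9 (SIGKILL): most shells, including sh and bash, report a child killed by a signal as 128 plus the signal number. A minimal C sketch of that arithmetic follows. The helper name shell_status_after_kill is hypothetical and written only for illustration; it is not part of SBCL or its test suite.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that blocks forever, kill it with
 * `signo`, and return the exit code a typical POSIX shell would report
 * for it (128 + signal number). Returns -1 on any error. */
int shell_status_after_kill(int signo)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                    /* child: sleep until a signal arrives */
    }

    if (kill(pid, signo) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);  /* shell convention for signal deaths */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

Calling shell_status_after_kill(SIGKILL) yields 137, matching the "got 137" line above. The value says only that the process was killed with SIGKILL; it carries no information about why the test hung.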

Nikodemus Siivola (nikodemus) wrote :

It would be a great help if you can follow the steps 3-6 from "1.3.2 Signal Related Bugs" at http://www.sbcl.org/manual/Reporting-Bugs.html#Reporting-Bugs

When I send "kill -ABRT" to the hung SBCL, it just quits, like so:

fatal error encountered in SBCL pid 88933(tid 673190496):
SIGABRT received.

test failed, expected 104 return code, got 1
*** Error code 1

I reran the test and tried to attach gdb to the hanging SBCL, but gdb seems to be broken somehow on the server. I will try to get that fixed.

root#nellis [/usr/ports/lang/sbcl] gdb -p 90879
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
Attaching to process 90879
/usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1443: internal-error: legacy_fetch_link_map_offsets called without legacy link_map support enabled.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

/usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1443: internal-error: legacy_fetch_link_map_offsets called without legacy link_map support enabled.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n

Now I have a working gdb.

root#nellis [/usr/ports/lang/sbcl] gdb66 /usr/ports/lang/sbcl/work/sbcl-1.0.30/tests/../src/runtime/sbcl -p 51296
GNU gdb 6.6 [GDB v6.6 for FreeBSD]
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-portbld-freebsd7.2"...
(no debugging symbols found)
Attaching to program: /usr/ports/lang/sbcl/work/sbcl-1.0.30/src/runtime/sbcl, process 51296
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
0x280aeaa9 in ?? ()
   from /lib/libthr.so.3
(gdb) thread apply all ba
(gdb)

Unfortunately no backtrace is printed on the standard output of SBCL.

Nikodemus Siivola (nikodemus) wrote :

Is this still current?

Excuse me, I'm not fluent in English.

It seems to happen like this:
1. GET-FOREGROUND waits on *SESSION*, just before the WITHOUT-INTERRUPTS+CONDITION-WAIT test.
2. SBCL sends a signal, via TERMINATE-THREAD, to the thread that is waiting on *SESSION*.
3. The signal makes SBCL transfer control to the cleanup form of an UNWIND-PROTECT.
4. In the cleanup form SBCL calls RELEASE-MUTEX, but the owner of the mutex has not been restored at this point.
    # FreeBSD does not restore the mutex's state for the condvar before calling the signal handler; it does that in userland.
5. After that, FreeBSD restores the owner of the mutex (maybe as part of the thread termination sequence).
6. The WITHOUT-INTERRUPTS+CONDITION-WAIT test tries to acquire the mutex of *SESSION*, but this mutex is still owned by the terminated thread.

Nikodemus Siivola (nikodemus) wrote :

Hiroyuki Komatsu's explanation makes sense to me, so marking this as Triaged despite not having a FreeBSD box to investigate this on.

Changed in sbcl:
importance: Undecided → Medium
status: New → Triaged
tags: added: os-freebsd threads