GC hang while running load.impure test

Bug #1268710 reported by Robert Brown on 2014-01-13
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Undecided
Unassigned

Bug Description

Linux 12.04 LTS on 64-bit Intel x86 hardware. The output of "uname -a" is

Linux paradicsom.nyc.corp.google.com 3.2.5-gg1336 #1 SMP Thu Aug 29 02:37:18 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux

I'm running SBCL version 1.1.14.43 compiled from git HEAD. The value of
*features* is

(:ASDF3 :ASDF2 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :ALIEN-CALLBACKS :ANSI-CL
 :ASH-RIGHT-VOPS :C-STACK-IS-CONTROL-STACK :COMMON-LISP :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS
 :CYCLE-COUNTER :ELF :FLOAT-EQL-VOPS :GENCGC :IEEE-FLOATING-POINT :INLINE-CONSTANTS :LARGEFILE
 :LINKAGE-TABLE :LINUX :LITTLE-ENDIAN :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS
 :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN :OS-PROVIDES-GETPROTOBY-R
 :OS-PROVIDES-POLL :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES
 :RAW-INSTANCE-INIT-VOPS :SB-CORE-COMPRESSION :SB-DOC :SB-EVAL :SB-FUTEX :SB-LDB :SB-PACKAGE-LOCKS
 :SB-SAFEPOINT :SB-SIMD-PACK :SB-SOURCE-LOCATIONS :SB-TEST :SB-THREAD :SB-THRUPTION :SB-UNICODE
 :SB-WTIMER :SB-XREF-FOR-INTERNAL :SBCL :STACK-ALLOCATABLE-CLOSURES
 :STACK-ALLOCATABLE-FIXED-OBJECTS :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :UNIX :UNWIND-TO-FRAME-AND-CALL-VOP :X86-64)

I was running all the tests and noticed that sbcl hung while running
the load.impure tests. Here are the thread stacks, which seem to indicate a
hang while running the garbage collector.

(gdb) info threads
  Id Target Id Frame
  4 Thread 0x7ffff6a37700 (LWP 32736) "sbcl" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 0x7ffff65df700 (LWP 32737) "sbcl" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 0x7ffff6187700 (LWP 32738) "sbcl" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
* 1 Thread 0x7ffff7fd8700 (LWP 32735) "sbcl" syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
(gdb) thread apply all call backtrace_from_fp($rbp, 100)

Thread 4 (Thread 0x7ffff6a37700 (LWP 32736)):
   0: Foreign function gc_stop_the_world, fp = 0x7ffff6a35390, ra = 0x41e4b5
   1: SB-KERNEL::GC-STOP-THE-WORLD
   2: (COMMON-LISP::FLET SB-THREAD::WITH-MUTEX-THUNK KEYWORD::IN SB-KERNEL::SUB-GC)
   3: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-537 KEYWORD::IN SB-THREAD::CALL-WITH-MUTEX)
   4: (COMMON-LISP::FLET WITHOUT-GCING-BODY-61 KEYWORD::IN SB-KERNEL::SUB-GC)
   5: (COMMON-LISP::FLET SB-THREAD::EXEC KEYWORD::IN SB-KERNEL::SUB-GC)
   6: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-54 KEYWORD::IN SB-KERNEL::SUB-GC)
   7: SB-KERNEL::SUB-GC
   8: (SB-C::TL-XEP SB-KERNEL::SUB-GC)
   9: Foreign function call_into_lisp, fp = 0x7ffff6a35990, ra = 0x42ae3f
  10: Foreign function maybe_gc, fp = 0x7ffff6a359c0, ra = 0x411f22
  11: Foreign function interrupt_handle_pending, fp = 0x7ffff6a359f0, ra = 0x41441b
  12: Foreign function (null), fp = 0x7ffff6a35a30, ra = 0x412beb
  13: Foreign function (null), fp = 0x7ffff6a35f28, ra = 0x7ffff79c8cb0
  14: SB-C::ALLOCATE-CODE-OBJECT
  15: SB-FASL::LOAD-CODE
  16: SB-FASL::LOAD-FASL-GROUP
  17: SB-FASL::LOAD-AS-FASL
  18: (COMMON-LISP::FLET SB-FASL::LOAD-STREAM KEYWORD::IN COMMON-LISP::LOAD)
  19: COMMON-LISP::LOAD
  20: (COMMON-LISP::LABELS COMMON-LISP-USER::LOAD-LOOP KEYWORD::IN G695)
  21: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-1117 KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
  22: (COMMON-LISP::FLET SB-THREAD::WITH-MUTEX-THUNK KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
  23: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-537 KEYWORD::IN SB-THREAD::CALL-WITH-MUTEX)
  24: SB-THREAD::CALL-WITH-MUTEX
  25: SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE
  26: Foreign function call_into_lisp, fp = 0x7ffff6a36e30, ra = 0x42ae3f
  27: Foreign function new_thread_trampoline, fp = 0x7ffff6a36ee0, ra = 0x421be3

Thread 3 (Thread 0x7ffff65df700 (LWP 32737)):
   0: Foreign function thread_in_lisp_raised, fp = 0x7ffff65ddbd0, ra = 0x41ca8e
   1: Foreign function handle_trap, fp = 0x7ffff65ddbf0, ra = 0x41577d
   2: Foreign function (null), fp = 0x7ffff65ddc30, ra = 0x412beb
   3: Foreign function (null), fp = 0x7ffff65de088, ra = 0x7ffff79c8cb0
   4: Foreign function post_signal_tramp, fp = 0x7ffff65de110, ra = 0x42b020
   5: SB-IMPL::OUTPUT-TO-C-STRING/ASCII
   6: SB-UNIX::UNIX-STAT
   7: SB-IMPL::QUERY-FILE-SYSTEM
   8: COMMON-LISP::PROBE-FILE
   9: COMMON-LISP::LOAD
  10: (COMMON-LISP::LABELS COMMON-LISP-USER::LOAD-LOOP KEYWORD::IN G695)
  11: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-1117 KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
  12: (COMMON-LISP::FLET SB-THREAD::WITH-MUTEX-THUNK KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
  13: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-537 KEYWORD::IN SB-THREAD::CALL-WITH-MUTEX)
  14: SB-THREAD::CALL-WITH-MUTEX
  15: SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE
  16: Foreign function call_into_lisp, fp = 0x7ffff65dee30, ra = 0x42ae3f
  17: Foreign function new_thread_trampoline, fp = 0x7ffff65deee0, ra = 0x421be3

Thread 2 (Thread 0x7ffff6187700 (LWP 32738)):
   0: Foreign function thread_in_lisp_raised, fp = 0x7ffff61856d0, ra = 0x41b937
   1: Foreign function handle_trap, fp = 0x7ffff61856f0, ra = 0x41577d
   2: Foreign function (null), fp = 0x7ffff6185730, ra = 0x412beb
   3: Foreign function (null), fp = 0x7ffff6185bb8, ra = 0x7ffff79c8cb0
   4: Foreign function post_signal_tramp, fp = 0x7ffff6185c40, ra = 0x42b020
   5: (SB-C::TL-XEP SB-KERNEL::CLASSOID-TYPEP)
   6: SB-PCL::SET-FUN-NAME
   7: (COMMON-LISP::FLET SB-PCL::UPDATE KEYWORD::IN SB-PCL::UPDATE-DFUN)
   8: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-607 KEYWORD::IN SB-THREAD::CALL-WITH-RECURSIVE-SYSTEM-LOCK)
   9: SB-THREAD::CALL-WITH-RECURSIVE-SYSTEM-LOCK
  10: (SB-PCL::FAST-METHOD COMMON-LISP::REINITIALIZE-INSTANCE KEYWORD::AROUND (COMMON-LISP::STANDARD-GENERIC-FUNCTION))
  11: SB-PCL::REAL-ENSURE-GF-USING-CLASS--GENERIC-FUNCTION
  12: SB-PCL::REAL-ADD-NAMED-METHOD
  13: SB-PCL::LOAD-DEFMETHOD-INTERNAL
  14: SB-FASL::FOP-FUNCALL
  15: SB-FASL::LOAD-FASL-GROUP
  16: SB-FASL::LOAD-AS-FASL
  17: (COMMON-LISP::FLET SB-FASL::LOAD-STREAM KEYWORD::IN COMMON-LISP::LOAD)
  18: COMMON-LISP::LOAD
  19: (COMMON-LISP::LABELS COMMON-LISP-USER::LOAD-LOOP KEYWORD::IN G695)
  20: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-1117 KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
  21: (COMMON-LISP::FLET SB-THREAD::WITH-MUTEX-THUNK KEYWORD::IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE)
  22: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-537 KEYWORD::IN SB-THREAD::CALL-WITH-MUTEX)
  23: SB-THREAD::CALL-WITH-MUTEX
  24: SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE
  25: Foreign function call_into_lisp, fp = 0x7ffff6186e30, ra = 0x42ae3f
  26: Foreign function new_thread_trampoline, fp = 0x7ffff6186ee0, ra = 0x421be3

Thread 1 (Thread 0x7ffff7fd8700 (LWP 32735)):
   0: SB-THREAD::%%WAIT-FOR-MUTEX
   1: SB-THREAD::%WAIT-FOR-MUTEX
   2: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-1193 KEYWORD::IN SB-THREAD::JOIN-THREAD)
   3: SB-THREAD::JOIN-THREAD
   4: (COMMON-LISP::LAMBDA () KEYWORD::IN "/usr/local/google/home/brown/software/source-trees/sbcl/tests/load.impure.lisp")
   5: SB-INT::SIMPLE-EVAL-IN-LEXENV
   6: SB-INT::SIMPLE-EVAL-IN-LEXENV
   7: SB-INT::SIMPLE-EVAL-IN-LEXENV
   8: SB-INT::SIMPLE-EVAL-IN-LEXENV
   9: SB-EXT::EVAL-TLF
  10: (COMMON-LISP::FLET SB-FASL::EVAL-FORM KEYWORD::IN SB-INT::LOAD-AS-SOURCE)
  11: SB-INT::LOAD-AS-SOURCE
  12: (COMMON-LISP::FLET SB-FASL::LOAD-STREAM KEYWORD::IN COMMON-LISP::LOAD)
  13: COMMON-LISP::LOAD
  14: (COMMON-LISP::LAMBDA ())
  15: SB-INT::SIMPLE-EVAL-IN-LEXENV
  16: COMMON-LISP::EVAL
  17: SB-IMPL::PROCESS-EVAL/LOAD-OPTIONS
  18: SB-IMPL::TOPLEVEL-INIT
  19: (COMMON-LISP::FLET WITHOUT-INTERRUPTS-BODY-60 KEYWORD::IN SB-EXT::SAVE-LISP-AND-DIE)
  20: (COMMON-LISP::LABELS SB-IMPL::RESTART-LISP KEYWORD::IN SB-EXT::SAVE-LISP-AND-DIE)
(gdb)

Robert Brown (robert-brown) wrote :

Here's the value of gc_state

(gdb) p gc_state
$5 = {initialized = 1, lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 3,
      __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\000' <repeats 12 times>, "\003", '\000' <repeats 26 times>, __align = 0},
  phase_cond = {{__data = {__lock = 0, __futex = 9, __total_seq = 5, __wakeup_seq = 4,
        __woken_seq = 4, __mutex = 0x63a3e8 <gc_state+8>, __nwaiters = 2, __broadcast_seq = 2},
      __size = "\000\000\000\000\t\000\000\000\005\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\350\243c\000\000\000\000\000\002\000\000\000\002\000\000", __align = 38654705664}, {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
        __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
      __size = '\000' <repeats 47 times>, __align = 0}, {__data = {__lock = 0, __futex = 8,
        __total_seq = 4, __wakeup_seq = 4, __woken_seq = 4, __mutex = 0x63a3e8 <gc_state+8>,
        __nwaiters = 0, __broadcast_seq = 2},
      __size = "\000\000\000\000\b\000\000\000\004\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000\350\243c\000\000\000\000\000\000\000\000\000\002\000\000", __align = 34359738368}, {__data = {__lock = 0, __futex = 2, __total_seq = 1, __wakeup_seq = 1,
        __woken_seq = 1, __mutex = 0x63a3e8 <gc_state+8>, __nwaiters = 0, __broadcast_seq = 1},
      __size = "\000\000\000\000\002\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\350\243c\000\000\000\000\000\000\000\000\000\001\000\000", __align = 8589934592}, {__data = {__lock = 0, __futex = 6, __total_seq = 4, __wakeup_seq = 2,
        __woken_seq = 2, __mutex = 0x63a3e8 <gc_state+8>, __nwaiters = 4, __broadcast_seq = 2},
      __size = "\000\000\000\000\006\000\000\000\004\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\350\243c\000\000\000\000\000\004\000\000\000\002\000\000", __align = 25769803776}, {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0,
        __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
      __size = '\000' <repeats 47 times>, __align = 0}, {__data = {__lock = 0, __futex = 0,
        __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0,
        __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, phase_wait = {0,
    0, 0, 1, 2, 0, 0}, master = 0x0, collector = 0x7ffff6c40000, phase = GC_INVOKED}
(gdb)

Commits 229d6eb7478904ad5228ea6a6bd3fa67053e2198, 5338e756b46022ef748b969cfc56f9053fe46b2e, and bcc6a2d83b6cd64322ab63600e2af6db103c5d49 all address various safepoint deadlock scenarios. This particular deadlock scenario is one of them, but due to the lack of a "thread apply all backtrace" to obtain the C backtraces, I can't tell which of two possible deadlocks was hit in this case.

Changed in sbcl:
status: New → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers