Steel Bank Common Lisp

rare failure of :two-threads-running-gc test in threads.impure.lisp

Reported by 3b on 2012-09-29
This bug affects 1 person
Affects: SBCL
Importance: Undecided
Assigned to: Unassigned

Bug Description

:two-threads-running-gc failed once out of a few hundred runs of threads.impure.lisp while I was trying to isolate something else.

The following test case fails more reliably, usually within a few minutes (it runs for around an hour here when it doesn't fail), starting at 1.0.56.55-f0da2f6 ('redesign exiting SBCL') on x86-64 Linux:

(let (a-done
      (b-done t)
      done)
  (setf *debugger-hook*
        (lambda (&rest r)
          (format t "debugged~%") (finish-output)
          (setf done t) ))
  (sb-thread:make-thread (lambda ()
                           (loop while (not done)
                                 repeat 10000
                                 do (dotimes (i 50)
                                      (sb-ext:gc)
                                      (princ "\\") (finish-output)))
                           (setf a-done t)))
  (loop
    until a-done
    when b-done
      do (princ "|") (finish-output)
         (setf b-done nil)
         (sb-thread:make-thread (lambda ()
                                  (dotimes (i 2)
                                    (sb-ext:gc :full t)
                                    (princ "/") (finish-output))
                                  (setf b-done t)))))

3b (00003b) wrote:

stack trace from original failure in test suite:

::: Running (:TWO-THREADS-RUNNING-GC)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\CORRUPTION WARNING in SBCL pid 26293(tid 140737322415872):
Memory fault at 400 (pc=0x400, sp=0x7ffff61bee70)
The integrity of this image is possibly compromised.
Continuing with fingers crossed.
\\\\\\\\\\\\\\\\\\unhandled SB-SYS:MEMORY-FAULT-ERROR in thread #<SB-THREAD:THREAD
                                                "main thread" RUNNING
\ {10029DD473}>:
\\\\\\\\\\ Unhandled memory fault at #x400.

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\0: (SB-DEBUG::MAP-BACKTRACE
    #<CLOSURE (LAMBDA # :IN BACKTRACE) {1002F8001B}>
    :START
    0
    :COUNT
    128)
1: (BACKTRACE 128 #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDERR* {100016C0C3}>)
2: (SB-DEBUG::DEBUGGER-DISABLED-HOOK
    #<SB-SYS:MEMORY-FAULT-ERROR {1002D28003}>
    #<unavailable argument>)
3: (SB-DEBUG::RUN-HOOK
    *INVOKE-DEBUGGER-HOOK*
    #<SB-SYS:MEMORY-FAULT-ERROR {1002D28003}>)
4: (INVOKE-DEBUGGER #<SB-SYS:MEMORY-FAULT-ERROR {1002D28003}>)
5: (ERROR SB-SYS:MEMORY-FAULT-ERROR :ADDRESS 1024)
6: (SB-SYS:MEMORY-FAULT-ERROR)
7: ("foreign function: call_into_lisp")
8: ("foreign function: post_signal_tramp")

unhandled condition in --disable-debugger mode, quitting
Argh! error within --disable-debugger error handling

stack traces from 1.0.56.55-f0da2f6 running test case at repl:
\\\\\\\\/\/\\|\\\\\\CORRUPTION WARNING in SBCL pid 4659(tid 140737315469056):
Memory fault at 1002 (pc=0x10033b0029, sp=0x7ffff5b1ee70)
The integrity of this image is possibly compromised.
Continuing with fingers crossed.
\\\\\\\\debugged\debugged\
\\\\\\
\\\\\debugger invoked on a SB-SYS:MEMORY-FAULT-ERROR in thread
\#<THREAD "main thread" RUNNING {1002971873}>:
\\\ Unhandled memory fault at #x1002.
\
\\Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

\(no restarts: If you didn't do this on purpose, please report it as a bug.)
\\\\\\
\\\\\\\\\\\\\\\\(SB-SYS:MEMORY-FAULT-ERROR)\\\\\
0] \\\\\ba

0: (SB-SYS:MEMORY-FAULT-ERROR)
1: ("foreign function: call_into_lisp")
2: ("foreign function: post_signal_tramp")
3: ("foreign function: #x10033B0029")

\CORRUPTION WARNING in SBCL pid 14004(tid 140737324545792):
Memory fault at 0 (pc=0x20100017, sp=0x7ffff63c6e70)
The integrity of this image is possibly compromised.
Continuing with fingers crossed.
\\debugged
debugged
\\\
debugger invoked on a SB-SYS:MEMORY-FAULT-ERROR in thread
#<THREAD "main thread" RUNNING {1002971873}>:
  Unhandled memory fault at #x0.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

(no restarts: If you didn't do this on purpose, please report it as a bug.)

(SB-SYS:MEMORY-FAULT-ERROR)
0] ba

0: (SB-SYS:MEMORY-FAULT-ERROR)
1: ("foreign function: call_into_lisp")
2: ("foreign function: post_signal_tramp")
3: ("foreign function: #x20100017")

Please try the attached patch, which fixes the problem for me.

Beware: f0da2f6 introduced two different ways of exiting the scope of that form: a RETURN-FROM for the regular case and a GO for the error case. I've only fixed (or let's say, changed) the normal return. Doesn't it stand to reason that the error case is also buggy then? Or does that not matter? How can we test it?

Lacking other insights, I'll push the attached patch for SBCL 1.1 and then do a beautification of the error case early in 1.1.1, so that we can test the other half of the change for a month.
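
To make the two exit paths concrete, here is a minimal sketch of a form that is left via RETURN-FROM in the regular case and via GO in the error case. It is purely illustrative: call-with-two-exits and its local names are hypothetical, and this is not the code from f0da2f6 or from the patch.

```lisp
;; Illustrative only: hypothetical names, not the code from f0da2f6.
(defun call-with-two-exits (thunk)
  "Call THUNK. Return (values result :normal) if it returns,
or (values condition :error) if it signals an error."
  (let ((failure nil))
    (block done
      (tagbody
         (handler-bind ((error (lambda (c)
                                 (setf failure c)
                                 ;; Error path: non-local transfer via GO.
                                 (go failed))))
           ;; Regular path: leave the form via RETURN-FROM.
           (return-from done (values (funcall thunk) :normal)))
       failed
         (return-from done (values failure :error))))))
```

Since each path unwinds the dynamic extent through a different mechanism, a test along these lines would need to drive both, for example (call-with-two-exits (lambda () (+ 1 2))) for the regular case and (call-with-two-exits (lambda () (error "boom"))) for the error case.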

Never mind the above patch, which caused more problems than it solved.
Attached version #2 of the patch.

Changed in sbcl:
status: New → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released