FUNCALL-WITH-DEBUG-IO-SYNTAX makes heap closures pointing to dx closures

Bug #1383749 reported by Douglas Katzman
This bug affects 1 person
Affects: SBCL
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

There are references from the dynamic space into the stack of a thread that has undergone postmortem.
A slightly reduced test of 'backtrace' from threads.impure.lisp is able to consistently produce this error.
In fact, the address of the heap word that contains the bad pointer is always the same, and the value in that word is almost always the same, varying only with where the OS allocated the thread stack.

This can't be made to fail in the same way with only the main thread, because is_in_stack_space() always sees the main thread stack. But it seems to me that it is always wrong for the heap to point to the stack unless extreme care is taken to clean the object that contains the pointer.
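
The point about the main thread can be illustrated with a minimal sketch. This is not SBCL's is_in_stack_space(); the struct, the function name and the stack ranges are hypothetical, and it only assumes that such a check walks the stacks of threads that are currently registered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one thread's control stack: [start, end). */
struct stack_range {
    uintptr_t start;
    uintptr_t end;
};

/* Return true if addr lies inside any of the n registered stacks.
 * A stack that has been unregistered (its thread is dead) is simply
 * not in the array, so pointers into it are no longer recognized. */
static bool in_any_stack(const struct stack_range *stacks, size_t n,
                         uintptr_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= stacks[i].start && addr < stacks[i].end)
            return true;
    }
    return false;
}
```

With a made-up range covering 0x31ff5fb registered, the check accepts the pointer. Once that range is dropped, as it would be for a dead thread, the same word is reported as junk, while a main-thread stack that is never dropped would keep being accepted.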

Below is the output of 'dump' from ldb interleaved with my comments delimiting the valid objects.
This was a non-unicode build so that simple-base-string = simple-character-string.

Ptr 0x31ff5fb @ 0x1003f501c8 sees junk.
page 2026 starts at 0x1003f50000, scan_start_ofs=0
0x1003f50000: 0x0000001003f47ff7 | ; a list. car is (simple-fun . simple-fun) and cdr nil
0x1003f50008: 0x0000000020100017 |
0x1003f50010: 0x0000001003f50007 | ; a list.
0x1003f50018: 0x0000000020100017 |
0x1003f50020: 0x0000000000000235 | 5 ; a closure
0x1003f50028: 0x0000001001867158 | Xq ; in #<FUNCTION (LAMBDA () :IN INITIAL-THREAD-FUNCTION-TRAMPOLINE) {100186709B}>
0x1003f50030: 0x00000000031ff950 | P ; fixnum
0x1003f50038: 0x0000000000000000 | ; padding
0x1003f50040: 0x0000000000000759 | Y ; instance, 7 words
0x1003f50048: 0x0000001000073263 | c2 ; a RESTART instance
0x1003f50050: 0x000000100007319f | 1 ; name = ABORT
0x1003f50058: 0x0000001003f5002b | + ; function = ?
0x1003f50060: 0x000000100186709b | p ; report-function = the function shown above
0x1003f50068: 0x0000000020100017 | ; interactive-function = nil
0x1003f50070: 0x000000100008e58b | ; test-function
0x1003f50078: 0x0000000020100017 | ; conditions = nil
0x1003f50080: 0x0000001003f50043 | C ; a cons of pointer to the RESTART and nil
0x1003f50088: 0x0000000020100017 |
0x1003f50090: 0x0000001003f50087 | ; a cons of that cons and nil
0x1003f50098: 0x0000000020100017 |
0x1003f500a0: 0x00000000000000e1 | ; (SIMPLE-BASE-STRING 64) header
0x1003f500a8: 0x0000000000000080 |
0x1003f500b0: 0x636172746b636142 | Backtrac ; start of 64 bytes
0x1003f500b8: 0x23203a726f662065 | e for: #
0x1003f500c0: 0x455248542d42533c | <SB-THRE
0x1003f500c8: 0x41455248543a4441 | AD:THREA
0x1003f500d0: 0x4e494e4e55522044 | D RUNNIN
0x1003f500d8: 0x46333030317b2047 | G {1003F
0x1003f500e0: 0x0a3e7d3334304634 | 4F043}>
0x1003f500e8: 0x4d414c2828203a30 | 0: ((LAM ; end of 64 bytes
0x1003f500f0: 0x0000000000000000 | ; null terminator plus padding
0x1003f500f8: 0x0000000000000000 |
0x1003f50100: 0x0000000000001359 | Y ; instance, 19 words
0x1003f50108: 0x0000001000434d73 | sMC ; STRING-OUTPUT-STREAM
0x1003f50110: 0x0000000020100017 | ; in-buffer
0x1003f50118: 0x0000000020100017 | ; cin-buffer
0x1003f50120: 0x0000000000000400 | ; in-index (512)
0x1003f50128: 0x0000001000d9d3fb | ; in = CLOSED-FLAME
0x1003f50130: 0x0000001000d9d3fb | ; bin, etc
0x1003f50138: 0x0000001000d9d3fb | ; n-bin
0x1003f50140: 0x0000001000d9d3fb | ; out
0x1003f50148: 0x0000001000d9d3fb | ; bout
0x1003f50150: 0x0000001000d9d3fb | ; sout
0x1003f50158: 0x0000001000d9d3fb | ; misc
0x1003f50160: 0x0000001003f6a4bf | ; ptr to simple-base-string
0x1003f50168: 0x0000000020100017 | ; prev
0x1003f50170: 0x0000000020100017 | ; next
0x1003f50178: 0x0000000000000000 | ; pointer
0x1003f50180: 0x0000000000000000 | ; index
0x1003f50188: 0x0000000000000000 | ; index-cache
0x1003f50190: 0x000000100009696f | oi ; CHARACTER
0x1003f50198: 0x0000000000000000 | ; (padding)
0x1003f501a0: 0x0000000000000535 | 5 ; closure, 5 words
0x1003f501a8: 0x000000100101da08 | ; in #<FUNCTION (LAMBDA () :IN FUNCALL-WITH-DEBUG-IO-SYNTAX) {100101D9DB}>
0x1003f501b0: 0x000000002010004f | O ; T
0x1003f501b8: 0x0000001000000153 | S ; #<PACKAGE "COMMON-LISP-USER">
0x1003f501c0: 0x0000000020100017 | ; NIL
0x1003f501c8: 0x00000000031ff5fb | ; !! function on thread stack
0x1003f501d0: 0x0000000000000535 | 5 ; closure, 5 words
0x1003f501d8: 0x000000100101dac8 | ; in #<FUNCTION (LAMBDA () :IN FUNCALL-WITH-DEBUG-IO-SYNTAX) {100101DA9B}>
0x1003f501e0: 0x000000002010004f | O ; T
0x1003f501e8: 0x0000001000000153 | S ; #<PACKAGE "COMMON-LISP-USER">
0x1003f501f0: 0x0000000020100017 | ; NIL
0x1003f501f8: 0x00000000031ff5fb | ; !! function on thread stack
0x1003f50200: 0x0000001003f4f043 | C ; cons of a THREAD and nil
0x1003f50208: 0x0000000020100017 |
0x1003f50210: 0x00000000000000e1 | ; SIMPLE-BASE-STRING
0x1003f50218: 0x000000000000001e |
0x1003f50220: 0x636172746b636142 | Backtrac
0x1003f50228: 0x00203a726f662065 | e for:
0x1003f50230: 0x0000001003f5021f |
0x1003f50238: 0x0000001003f50297 |

thread:
0x0000001003f4f043
  name = NIL
  %alive-p = NIL
  %ephemeral-p = NIL
  os-thread = NIL
  interruptions = NIL
  result = (T NIL)
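
For readers following the annotations in the dump, here is a rough sketch of how the words were classified. The tag values are inferred from the dump itself (for example 0x80 as the length of a 64-character string, and 0x535 as a 5-word closure header), not taken from the SBCL headers, so treat the constants and helper names as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 4 bits of a pointer word, as they appear in the dump above. */
enum lowtag {
    LOWTAG_INSTANCE = 0x3,  /* e.g. 0x1003f50043, the RESTART */
    LOWTAG_LIST     = 0x7,  /* e.g. 0x1003f50007, a cons */
    LOWTAG_FUNCTION = 0xb,  /* e.g. 0x1003f5002b, a closure */
    LOWTAG_OTHER    = 0xf   /* e.g. 0x1003f6a4bf, a string */
};

/* Fixnums in this build appear to carry one zero tag bit. */
static bool is_fixnum(uint64_t w)        { return (w & 1) == 0; }
static int64_t fixnum_value(uint64_t w)  { return (int64_t)w >> 1; }

static unsigned lowtag_of(uint64_t w)    { return (unsigned)(w & 0xf); }

/* Header words: widetag in the low byte, word count above it. */
static unsigned header_widetag(uint64_t h) { return (unsigned)(h & 0xff); }
static uint64_t header_length(uint64_t h)  { return h >> 8; }
```

Applied to the bad word, lowtag_of(0x31ff5fb) gives 0xb, which is why it is annotated as a function pointer even though the address is in a thread stack rather than in dynamic space.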

I'm surprised that the GC is able to cope with this problem, but I guess it's because this page contains only dead objects, so only the extra sanity checks in verify_space notice it. Still, it makes debugging hell if this problem is actually caused by asking for a backtrace.

Tested on Darwin, but also reproducible on non-Darwin platforms, though I haven't analyzed the errant page there to the degree above.

Douglas Katzman (dougk) wrote :

The reduced test performs no concurrent backtraces - just 1 thread at a time, run repeatedly, and looping only 10 times.

 (dotimes (i 50)
   (let* ((threads (loop repeat 1
                         collect (sb-thread:make-thread
                                  (lambda ()
                                    (dotimes (i 10)
                                      (with-output-to-string (*debug-io*)
                                        (sb-debug::backtrace 10))))))))
     (wait-for-threads threads)))

In schedule_thread_post_mortem, if I remove the os_invalidate(corpse->os_address, THREAD_STRUCT_SIZE) line to see what was at address 0x31ff5fb, it has unfortunately been stomped on. There is no closure-header at the expected address.

0x31ff5f0: 0x0000000000000000 |
0x31ff5f8: 0x0000000020100017 |
0x31ff600: 0x0000000000000008 |
0x31ff608: 0x0000000020100017 |
0x31ff610: 0x0000000000000000 |
0x31ff618: 0x0000000020100017 |
0x31ff620: 0x0000000020100017 |
0x31ff628: 0x00000000031ff678 | x
0x31ff630: 0x00000010007c503f | ?P|
0x31ff638: 0x0000001002949c03 |
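
The "expected address" follows from stripping the low tag bits of the bad pointer. Below is a small sketch of that check. It assumes a 4-bit lowtag and a closure widetag of 0x35, both read off the earlier heap dump rather than from the runtime headers, and the helper names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOWTAG_MASK     0xfULL
#define CLOSURE_WIDETAG 0x35   /* assumed, from the 0x...535 headers */

/* Address of the object's header word, given a tagged pointer. */
static uint64_t untag(uint64_t tagged)
{
    return tagged & ~LOWTAG_MASK;
}

/* True if the word has the closure widetag in its low byte. */
static bool looks_like_closure_header(uint64_t word)
{
    return (word & 0xff) == CLOSURE_WIDETAG;
}
```

untag(0x31ff5fb) is 0x31ff5f0, and the word shown at that address is 0x0000000000000000, which fails the header test, whereas the heap words 0x535 at 0x1003f501a0 and 0x1003f501d0 pass it.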

Douglas Katzman (dougk)
summary: - "ptr sees junk" error if pre_verify_gen_0 is enabled
+ FUNCALL-WITH-DEBUG-IO-SYNTAX makes heap closures pointing to dx closures
Douglas Katzman (dougk) wrote :

I'm changing this to "invalid" because the fault is indeed with the sanity-checker, not the Lisp code.
Arguably the bug could be morphed into "sanity checker is buggy" but that's probably not important to most people.

Changed in sbcl:
status: New → Won't Fix