Steel Bank Common Lisp

sb-posix:fork doesn't update the list of threads

Reported by Leslie P. Polzer on 2009-10-14
Affects: SBCL
Importance: Medium
Assigned to: Unassigned

Bug Description

;; forkthread.lisp
(require :sb-posix)

(sb-thread:make-thread (lambda () (sleep 5000)))

(let ((pid (sb-posix:fork)))
  (if (zerop pid)
    (sb-ext:quit)
    (sb-posix:waitpid pid 0)))

% sbcl --load forkthread.lisp
(running SBCL from: /home/sky)
This is SBCL 1.0.31.26, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
; loading system definition from
; /home/sky/projects/lisp/sbcl.git/contrib/sb-grovel/sb-grovel.asd into
; #<PACKAGE "ASDF1">
; registering #<SYSTEM SB-GROVEL {B2AFB41}> as SB-GROVEL
fatal error encountered in SBCL pid 7933(tid 3084954384):
kill_safely: pthread_kill failed with 3

This error occurs because sb-thread::*all-threads* is not updated by sb-posix:fork.

The relevant part of POSIX states:

“A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread [...]”

Proposal: change the behavior of sb-posix:fork to take this into account, resetting the list of threads in *all-threads* to the one active thread.

Gábor Melis (melisgl) wrote :

"A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."

The "consequently" part basically means that you are extremely limited in what you can do after forking a threaded program. I'm tempted to say that having the wrong value for *all-threads* is the least of your problems.

Leslie P. Polzer (polzer-gnu) wrote :

I generally concur.

But it should be possible to at least quit or dump a core, and for both we need correct thread handling.

The entire idea that anything can be done "correctly" to keep the system running is a bit laughable, but I put together a proof-of-concept for at least keeping gc_stop_the_world() from pitching a fit (tested on 1.0.11 x86-64):

(defun spork ()
  "sb-posix:fork doesn't do even minimal fixup of the thread-tracking
that SBCL requires. SPORK does the most minimal fixup to make GC not
completely choke. If we really wanted to be clever, we could stop the
world using the GC primitives, fork, then re-create the threads
in-place over their old stacks and contexts and have start-the-world
pick up their old register states (saved by stop-the-world)."
  ;; This is, of course, a total hack.
  (sb-sys:without-gcing
    ;; If we are the child, we can't allow -anything- to take the
    ;; all-threads lock. That means no interrupts and no gcing.
    (let ((pid (sb-posix:fork))
          (state-dead (sb-vm:fixnumize 3))) ;; Is a C runtime constant.
      (when (zerop pid)
        ;; We're the child, all other threads are dead. Here's
        ;; where we do some serious (brain) damage:
        (loop
          ;; For each thread the runtime knows about other than the
          ;; current thread, set the state to STATE_DEAD so that
          ;; world stopping doesn't choke on it. We don't bother
          ;; with the whole condition broadcast junk as we're the
          ;; only thread running, thus nobody is waiting on the
          ;; condition.
          for thread = (extern-alien "all_threads" system-area-pointer)
            then (sb-sys:sap-ref-sap thread (* sb-vm::thread-next-slot
                                               sb-vm:n-word-bytes))
          until (sb-sys:sap= thread (sb-sys:int-sap 0))
          unless (sb-sys:sap= thread (sb-thread::current-thread-sap))
            do (setf (sb-sys:sap-ref-word thread (* sb-vm::thread-state-slot
                                                    sb-vm:n-word-bytes))
                     state-dead))
        ;; The presumption at this point is that merely setting the
        ;; thread state to STATE_DEAD is sufficient to keep the
        ;; system running. This is almost certainly wrong.
        )
      pid)))

This is still a bad idea. The only defined-correct operation for a forked thread is exec, possibly preceded by a ptrace(PTRACE_TRACEME). And if you're going to exec, deport your strings before forking. And put the call to fork and all of the child process code in a without-gcing. And... Well, I'm not convinced that even that is sufficient to prevent anything from going wrong.
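
When the goal really is just fork followed by exec, one way to sidestep the problems above is to not call fork from Lisp at all and use SB-EXT:RUN-PROGRAM instead, which, as far as I understand it, performs the fork and exec inside the C runtime. A minimal sketch, assuming /bin/echo exists on the system; RUN-ECHO is a hypothetical helper name, not part of SBCL:

```lisp
(defun run-echo (text)
  "Hypothetical helper: run /bin/echo TEXT in a child process and
return whatever it wrote to standard output as a string."
  (with-output-to-string (out)
    (sb-ext:run-program "/bin/echo" (list text)
                        :search nil   ; use the path exactly as given
                        :output out   ; copy the child's stdout into OUT
                        :wait t)))    ; block until the child exits
```

This does not help with the quit or dump-a-core cases discussed earlier, since those need a child that keeps running Lisp code.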

Nikodemus Siivola (nikodemus) wrote :

I'm somewhat inclined to make SB-POSIX:FORK refuse to fork if multiple threads are running, actually.

Leslie P. Polzer (polzer-gnu) wrote :

I agree.

Fixed in 1.0.32.35: SBCL now signals an error on any attempt to call SB-POSIX:FORK while multiple Lisp threads are running.

  status fixcommitted
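
For SBCL versions without this check, the same refusal can be approximated in user code. The following is only a sketch of the idea and not the code that was committed: OTHER-THREADS and GUARDED-FORK are hypothetical names, and the only SBCL calls assumed are SB-THREAD:LIST-ALL-THREADS, SB-THREAD:*CURRENT-THREAD* and SB-POSIX:FORK.

```lisp
(require :sb-posix)

(defun other-threads ()
  "Return the live Lisp threads other than the calling one.
The result can be stale as soon as it is returned."
  (remove sb-thread:*current-thread* (sb-thread:list-all-threads)))

(defun guarded-fork ()
  "Hypothetical wrapper, not SBCL's implementation: signal an error
instead of forking when any other Lisp thread is alive."
  (let ((others (other-threads)))
    (when others
      (error "Refusing to fork with other threads running: ~S" others))
    (sb-posix:fork)))
```

The check is racy, since another thread could be created between the test and the fork, and on builds where the runtime keeps helper threads of its own in that list it may refuse more often than necessary.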

Changed in sbcl:
status: New → Fix Committed
Changed in sbcl:
importance: Undecided → Medium
status: Fix Committed → Fix Released
Nikodemus Siivola (nikodemus) wrote :

Oops, turns out we guard against multiple threads only on Darwin.

Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
status: Fix Released → In Progress
Nikodemus Siivola (nikodemus) wrote :

In 1.0.43.76 (all platforms this time)...

Changed in sbcl:
assignee: Nikodemus Siivola (nikodemus) → nobody
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released