The advertised usage of condition-wait may never time out

Bug #1760827 reported by Siebe de Vos on 2018-04-03
This bug affects 1 person
Affects: SBCL
Importance: Undecided
Assigned to: Unassigned

Bug Description

Code using WAIT-ON-GATE with :TIMEOUT did not return in time when there was a lot of GC activity.

This can be traced back to using CONDITION-WAIT the way the documentation advertises. When CONDITION-WAIT returns early because of an interrupt (a spurious wakeup), the next iteration of the wait loop passes the original timeout again instead of the time remaining. If interrupts arrive at intervals shorter than TIMEOUT, WAIT-ON-GATE may never return.
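
The pattern in question looks roughly like the following. This is a simplified sketch, not the actual WAIT-ON-GATE source; NAIVE-TIMED-WAIT and its PREDICATE argument are names made up for illustration.

```lisp
(defun naive-timed-wait (predicate queue mutex timeout)
  "Hypothetical helper: wait until PREDICATE returns true, or return NIL
when CONDITION-WAIT reports a timeout."
  (sb-thread:with-mutex (mutex)
    (loop until (funcall predicate)
          ;; Every iteration passes the full TIMEOUT again, so a wakeup
          ;; that arrives before TIMEOUT has elapsed restarts the clock.
          do (unless (sb-thread:condition-wait queue mutex :timeout timeout)
               (return-from naive-timed-wait nil)))
    t))
```

With no early wakeups this returns NIL after TIMEOUT seconds. With wakeups arriving more often than TIMEOUT, each iteration starts a fresh wait and the NIL branch is never reached.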

A simple example, using GC to trigger interrupts:

(require :sb-concurrency)

(defun test-condition-wait (&key (gc-p t) (timeout 1))
  "Expect to be timed out after TIMEOUT. When GC-P is true, generate lots of
interrupts, otherwise do nothing while waiting."
  (let ((gate (sb-concurrency:make-gate))
        (stop nil))
    (flet ((_waiter ()
             ;; We expect to be timed out:
             (sb-concurrency:wait-on-gate gate :timeout timeout)
             (setf stop t)))
      (let ((waiter (sb-thread:make-thread #'_waiter)))
        ;; Do something until stopped by WAITER after TIMEOUT.
        (unwind-protect
            (loop
              (when gc-p (sb-ext:gc))
              (sleep (/ timeout 2))
              (when stop (return)))
          (when (sb-thread:thread-alive-p waiter)
            (sb-thread:terminate-thread waiter)))))))

* (test-condition-wait :gc-p nil)
;; after one second:
NIL

* (test-condition-wait)
;; does not return...

This is SBCL 1.4.5, an implementation of ANSI Common Lisp.
Linux si2l 4.4.92-31-default #1 SMP Sun Oct 22 06:56:24 UTC 2017 (1d80e8a) x86_64 x86_64 x86_64 GNU/Linux

(:64-BIT :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :ASH-RIGHT-VOPS
 :C-STACK-IS-CONTROL-STACK :CALL-SYMBOL :COMMON-LISP :COMPACT-INSTANCE-HEADER
 :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :ELF :FLOAT-EQL-VOPS
 :FP-AND-PC-STANDARD-SAVE :GCC-TLS :GENCGC :IEEE-FLOATING-POINT :IMMOBILE-CODE
 :IMMOBILE-SPACE :INLINE-CONSTANTS :INTEGER-EQL-VOP :LARGEFILE :LINKAGE-TABLE
 :LINUX :LITTLE-ENDIAN :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS
 :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN
 :OS-PROVIDES-GETPROTOBY-R :OS-PROVIDES-POLL :OS-PROVIDES-PUTWC
 :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES :RAW-INSTANCE-INIT-VOPS
 :RAW-SIGNED-WORD :RELOCATABLE-HEAP :SB-DOC :SB-EVAL :SB-FUTEX :SB-LDB
 :SB-PACKAGE-LOCKS :SB-SIMD-PACK :SB-SOURCE-LOCATIONS :SB-THREAD :SB-UNICODE
 :SBCL :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :SYMBOL-INFO-VOPS :UNBIND-N-VOP
 :UNDEFINED-FUN-RESTARTS :UNIX :UNWIND-TO-FRAME-AND-CALL-VOP :X86-64)

Siebe de Vos (s.de.vos) wrote :

Patch #1 is naive because it repeats some work already done in %CONDITION-WAIT and because it may not solve the problem at all levels. For example, the FIXME in %WAIT-FOR-MUTEX could probably lead to a similar issue.

A more thorough patch would use the remaining-time values computed and returned by %CONDITION-WAIT and the lower-level calls.

Timeouts probably have to be treated like deadlines, having both an absolute and a relative component. Merging the timeout and deadline concepts might simplify some code.
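
As a rough illustration of treating the timeout as a deadline at the caller's level, the sketch below computes an absolute deadline once and passes only the remaining time to each CONDITION-WAIT call. It is not the attached patch; DEADLINE-TIMED-WAIT and PREDICATE are hypothetical names, and the sketch does nothing about the lower-level calls such as %WAIT-FOR-MUTEX.

```lisp
(defun deadline-timed-wait (predicate queue mutex timeout)
  "Hypothetical helper: TIMEOUT (in seconds) bounds the total time spent
waiting, across all wakeups. Returns T if PREDICATE became true, else NIL."
  ;; Absolute component: fixed once, before the first wait.
  (let ((deadline (+ (get-internal-real-time)
                     (* timeout internal-time-units-per-second))))
    (sb-thread:with-mutex (mutex)
      (loop
        (when (funcall predicate) (return t))
        ;; Relative component: seconds left until DEADLINE.
        (let ((remaining (/ (- deadline (get-internal-real-time))
                            internal-time-units-per-second)))
          (unless (and (plusp remaining)
                       (sb-thread:condition-wait queue mutex
                                                 :timeout remaining))
            (return nil)))))))
```

Because the remaining time shrinks on every iteration, frequent early wakeups can no longer postpone the timeout indefinitely at this level.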
