With Linux PREEMPT RT kernel condition-wait does not wait

Bug #1876822 reported by Ilya Perminov
Affects: SBCL
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Something in the PREEMPT RT scheduler triggers a pathological case in condition-wait almost every time.

%condition-wait:
#!+sb-futex
               (with-pinned-objects (queue me)
                 (setf (waitqueue-token queue) me)
                 (release-mutex mutex)
                 ;; Now we go to sleep using futex-wait. If anyone else
                 ;; manages to grab MUTEX and call CONDITION-NOTIFY during
                 ;; this comment, it will change the token, and so futex-wait
                 ;; returns immediately instead of sleeping. Ergo, no lost
                 ;; wakeup. We may get spurious wakeups, but that's ok.
                 (setf status
                       (case (allow-with-interrupts
                               (futex-wait (waitqueue-token-address queue)
                                           (get-lisp-obj-address me)
                                           ;; our way of saying "no
                                           ;; timeout":
                                           (or to-sec -1)
                                           (or to-usec 0)))
                         ((1)
                          ;; 1 = ETIMEDOUT
                          :timeout)
                         (t
                          ;; -1 = EWOULDBLOCK, possibly spurious wakeup
                          ;; 0 = normal wakeup
                          ;; 2 = EINTR, a spurious wakeup
                          :ok))))

The code fragment above assumes that waitqueue-token is unlikely to be modified between release-mutex and futex-wait. That is usually true, but something in the RT scheduler or its settings breaks the assumption. In a contended case where 100 threads call condition-wait at the same time, none of them ever reaches futex-wait with an unmodified token, and condition-wait becomes a busy loop. I can't reproduce this problem with a standard Linux kernel even with 50000 threads: futex_wait fails in only 0.1% of cases, and all the threads go to sleep very quickly.

Here is a pathological sequence:
A grabs mutex
A sets token to its id
A releases mutex
B grabs mutex
B sets token to its id
A fails futex_wait (because the current token is from B)
B releases mutex
A grabs mutex
A sets token to its id
B fails futex_wait (because the current token is from A)
and so on

Environment: Linux 3.10.0-957.21.3.rt56.935.el7.x86_64 #1 SMP PREEMPT RT Tue Jun 18 18:11:43 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Douglas Katzman (dougk) wrote :

We could probably offer an option to use pthread condition variables, basically letting "someone else" deal with it.
This would be done by pointing to a malloc()ed object from the Lisp structure, doing a pthread_cond_init() on the C object, attaching a finalizer which does pthread_cond_destroy(), and translating the Lisp functions almost directly into the appropriate foreign calls for waiting and notifying.
I wonder if the thing preventing that in the past was that finalizers were so bad for GC that having any at all pretty much trashed your application performance.

Douglas Katzman (dougk) wrote :

is this an issue related to the SBCL mutex implementation, or can it be isolated to purely an issue in condition-wait? Because there was a bug in mutexes fixed thusly - https://sourceforge.net/p/sbcl/sbcl/ci/3829777024ab800ddaf20051ad69ac921e03ae8e

Ilya Perminov (iperminov) wrote :

I think it is a condition-wait issue - spinning happens on a waitqueue futex.

Douglas Katzman (dougk) wrote :

ok, so this is definitely exactly the situation described in https://www.remlab.net/op/futex-condvar.shtml
"If more than one thread goes to sleep in a row, the second one must not change the futex value. Otherwise, the first thread would potentially fail to go to sleep due to the changed futex value. The more threads wait on the same condition variable, the more likely the problem. With enough threads, it could degrade into a live loop."

Ilya Perminov (iperminov) wrote :

Yes. To be fair, in normal cases the probability of hitting this issue is very low.
The sequence-counter-based implementation described in the document you referenced is simple and would work fine in SBCL's case. When the counter overflows, just start using a new one.
Offloading all the complexity to pthreads would be the best option, I think.
