With Linux PREEMPT RT kernel condition-wait does not wait

Bug #1876822 reported by Ilya Perminov
Affects: SBCL
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Something in the PREEMPT RT scheduler triggers a pathological case in condition-wait almost every time.

%condition-wait:
#!+sb-futex
               (with-pinned-objects (queue me)
                 (setf (waitqueue-token queue) me)
                 (release-mutex mutex)
                 ;; Now we go to sleep using futex-wait. If anyone else
                 ;; manages to grab MUTEX and call CONDITION-NOTIFY during
                 ;; this comment, it will change the token, and so futex-wait
                 ;; returns immediately instead of sleeping. Ergo, no lost
                 ;; wakeup. We may get spurious wakeups, but that's ok.
                 (setf status
                       (case (allow-with-interrupts
                               (futex-wait (waitqueue-token-address queue)
                                           (get-lisp-obj-address me)
                                           ;; our way of saying "no
                                           ;; timeout":
                                           (or to-sec -1)
                                           (or to-usec 0)))
                         ((1)
                          ;; 1 = ETIMEDOUT
                          :timeout)
                         (t
                          ;; -1 = EWOULDBLOCK, possibly spurious wakeup
                          ;; 0 = normal wakeup
                          ;; 2 = EINTR, a spurious wakeup
                          :ok))))

The code fragment above assumes that waitqueue-token is unlikely to be modified between release-mutex and futex-wait. That is usually true, but something in the RT scheduler or its settings breaks the assumption. In a contended case where 100 threads call condition-wait at the same time, none of them ever reaches futex-wait with an unmodified token, and condition-wait becomes a busy loop. I can't reproduce this problem with a standard Linux kernel even with 50000 threads: futex_wait fails in only 0.1% of cases, and all the threads go to sleep very quickly.

Here is a pathological sequence:
A grabs mutex
A sets token to its id
A releases mutex
B grabs mutex
B sets token to its id
A fails futex_wait (because the current token is from B)
B releases mutex
A grabs mutex
A sets token to its id
B fails futex_wait (because the current token is from A)
and so on

Environment: Linux 3.10.0-957.21.3.rt56.935.el7.x86_64 #1 SMP PREEMPT RT Tue Jun 18 18:11:43 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Douglas Katzman (dougk) wrote :

We could probably offer an option to use pthread condition variables, basically letting "someone else" deal with it.
This would be done by pointing to a malloc()ed object from the Lisp structure, doing a pthread_cond_init() on the C object, attaching a finalizer which does pthread_cond_destroy(), and translating the Lisp functions almost directly into the appropriate foreign calls for waiting and notifying.
I wonder if the thing preventing that in the past was that finalizers were so bad for GC that having any at all pretty much trashed your application performance.

Douglas Katzman (dougk) wrote :

is this an issue related to the SBCL mutex implementation, or can it be isolated to purely an issue in condition-wait? Because there was a bug in mutexes fixed thusly - https://sourceforge.net/p/sbcl/sbcl/ci/3829777024ab800ddaf20051ad69ac921e03ae8e

Ilya Perminov (iperminov) wrote :

I think it is a condition-wait issue - spinning happens on a waitqueue futex.

Douglas Katzman (dougk) wrote :

ok, so this is definitely exactly the situation described in https://www.remlab.net/op/futex-condvar.shtml
"If more than one thread goes to sleep in a row, the second one must not change the futex value. Otherwise, the first thread would potentially fail to go to sleep due to the changed futex value. The more threads wait on the same condition variable, the more likely the problem. With enough threads, it could degrade into a live loop."

Ilya Perminov (iperminov) wrote :

Yes. To be fair, in normal cases the probability of hitting this issue is very low.
The sequence-counter-based implementation described in the document you referenced is simple and would work fine in SBCL's case. When the counter overflows, just start using a new one.
Offloading all the complexity to pthreads would be the best option, I think.
