The attached testcase seems to avoid the bug for me so far.
It includes an additional unwind-protect in the :ok mutex-acquisition case, and calls ensure-wakeup if necessary.
In addition, it uses :wait-p t in the mutex-acquisition during ensure-wakeup. Without that, I could still see occasional hangs like these:
("T2" :OK :SUCCESS NIL :MUTEX-STATE 1 :OWNER "T1" :ALIVE-P T)
("T2" :OK-DID-NOT-GET-MUTEX :MUTEX-STATE 1 :OWNER "T1" :ALIVE-P T)
CWAIT1 = 7, CWAIT2 = 7, CWAIT3 = 7, CWAIT4 = 7, CLEANUP1 = 7, OK1 = 7, OK1A = 1, OK1B = 1, OK2 = 6, INTERRUPTED = 0, WAKEUP = 0, OTHER = 0, CLEANUP2 = 6

("T2" :INTERRUPTED :MUTEX-STATE 1 :OWNER "T1" :ALIVE-P T)
("T2" :INTERRUPTED-DID-NOT-GET-MUTEX :MUTEX-STATE 0 :OWNER NIL :ALIVE-P NIL)
CWAIT1 = 7, CWAIT2 = 7, CWAIT3 = 7, CWAIT4 = 6, CLEANUP1 = 7, OK1 = 6, OK1A = 0, OK1B = 0, OK2 = 6, INTERRUPTED = 1, WAKEUP = 0, OTHER = 0, CLEANUP2 = 7