pcas deadlocks in casEventSys
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
EPICS Base | New | Undecided | Unassigned |
Bug Description
We observe a deadlock in the pcas server.
In the traces below, 1) and 2) are threads; the indented lines represent their call stacks:
1) Application calls casPV::postEvent();
casPVI:
...
2) server thread runs fileDescriptorM
...
...
Thus, we have the classical case of two threads trying to acquire two locks in opposite order.
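For readers who want to see the lock-order inversion in isolation, here is a minimal sketch. It does not use any pcas code: `appLock`, `eventLock`, `InversionResult` and `demonstrateLockInversion` are hypothetical stand-ins, and `try_lock()` replaces the blocking acquisition so that the program reports the conflict instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Outcome of the two cross-lock attempts (hypothetical helper type).
struct InversionResult {
    bool posterGotEventLock;
    bool serverGotAppLock;
};

// Two threads each take one lock, then try to take the other one.
// With blocking lock() calls this would be a deadlock; try_lock()
// lets us observe that neither thread can make progress.
InversionResult demonstrateLockInversion() {
    std::mutex appLock;    // stand-in for the lock taken on the application side
    std::mutex eventLock;  // stand-in for the event system mutex
    std::atomic<int> holding{0};  // threads that hold their first lock
    std::atomic<int> tried{0};    // threads that attempted the second lock
    InversionResult result{true, true};

    auto waitFor = [](std::atomic<int>& counter, int target) {
        while (counter.load() < target) std::this_thread::yield();
    };

    std::thread poster([&] {
        std::lock_guard<std::mutex> first(appLock);
        ++holding;
        waitFor(holding, 2);  // both threads now hold their first lock
        result.posterGotEventLock = eventLock.try_lock();
        if (result.posterGotEventLock) eventLock.unlock();
        ++tried;
        waitFor(tried, 2);    // keep appLock until the other thread has tried
    });

    std::thread server([&] {
        std::lock_guard<std::mutex> first(eventLock);
        ++holding;
        waitFor(holding, 2);
        result.serverGotAppLock = appLock.try_lock();
        if (result.serverGotAppLock) appLock.unlock();
        ++tried;
        waitFor(tried, 2);    // keep eventLock until the other thread has tried
    });

    poster.join();
    server.join();
    return result;
}
```

Calling `demonstrateLockInversion()` yields `false` for both fields: each thread finds the lock it wants held by the other, which is the state in which the real threads block forever.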
Note that this bug has already been encountered and discussed on tech-talk (I could not find a Launchpad bug report, though):
https:/
https:/
and a "solution" to the particular race condition reported then has been put in place.
This "solution" is, IMHO, a mere hack that only works around one particular scenario.
(another potential race condition is casPVI:
when called from casAsyncReadIOI
The deeper problem is -- again IMHO -- a design flaw in the event processing loop, which
holds on to casEventSys::mutex while working on the callbacks.
It is not unreasonable (and quite common in other event processing systems I have seen)
for an application to post to an asynchronous facility from a guarded code section
and for callbacks to be synchronized using the same (application) lock:
{ guard( myLock );
POST_
other_
}
and
myCallback()
{ guard( myLock );
do_something();
}
Not possible with pcas.
-> I believe the casEventSys:
- release casEventSys::mutex while working on the callback
- remove the epicsGuard< evSysMutex > & argument from casEvent::cbFunc()
(this is super ugly anyway; a callback should not have to know about
the locking semantics of the event loop)
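To illustrate the two proposed changes, here is a rough sketch of an event queue that drops its own mutex around each callback and whose callbacks take no lock argument. This is not the pcas implementation: `SketchEventQueue`, `post`, `process` and `runSketch` are hypothetical names, and `std::mutex` is used in place of the EPICS locking classes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for an event system; not the pcas casEventSys class.
class SketchEventQueue {
public:
    using Callback = std::function<void()>;  // no lock argument is passed in

    // May be called while the caller holds its own application lock,
    // because the queue mutex is never held while a callback runs.
    void post(Callback cb) {
        std::lock_guard<std::mutex> guard(mutex_);
        queue_.push_back(std::move(cb));
    }

    // Runs all pending callbacks and returns how many were run.
    std::size_t process() {
        std::size_t count = 0;
        std::unique_lock<std::mutex> guard(mutex_);
        while (!queue_.empty()) {
            Callback cb = std::move(queue_.front());
            queue_.pop_front();
            guard.unlock();  // release the queue mutex around the callback
            cb();
            ++count;
            guard.lock();    // re-acquire before touching the queue again
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::deque<Callback> queue_;
};

// The application posts from a guarded section, and the callbacks
// synchronize on the same application lock.
int runSketch() {
    SketchEventQueue events;
    std::mutex myLock;
    int value = 0;  // guarded by myLock

    std::thread server([&] {
        std::size_t handled = 0;
        while (handled < 2) {  // poll until both events were processed
            handled += events.process();
            std::this_thread::yield();
        }
    });

    {
        std::lock_guard<std::mutex> guard(myLock);
        events.post([&] { std::lock_guard<std::mutex> g(myLock); ++value; });
        events.post([&] { std::lock_guard<std::mutex> g(myLock); ++value; });
    }

    server.join();
    return value;
}
```

`runSketch()` returns 2. If `process()` kept `mutex_` locked across `cb()`, the second `post()` could block on `mutex_` while the first callback waits for `myLock`, which is the deadlock described in this report.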