gddScalar::new() operator is not fully thread safe
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
EPICS Base | Fix Released | High | Jeff Hill |
Bug Description
Dirk wrote:
From what I can see in the macro definition of gdd_NEWDEL_NEW(gdd) in gddNewDel.h, the gddScalar::new() operator is not fully thread safe.
Initialization of the freelist is not protected. Thus, calling gdd*::new() for the first time in two different threads at once may crash. Calling gdd*::new() for the first time while still in a single-threaded context may cure the problem.
Dirk
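The unprotected first-use initialization described above can be illustrated with a minimal sketch. It uses standard C++11 `std::call_once` rather than EPICS's own primitives, and every name in it (`FreeListState`, `g_state`, `ensureFreeListReady`, `raceFirstUse`) is hypothetical and not taken from gddNewDel.h. It only shows the general technique of making a one-time setup safe when several threads hit it at once.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-class free list state (not an EPICS type).
struct FreeListState {
    int initCount = 0;   // how many times the initialization body ran
    bool ready = false;  // set once the free list is usable
};

static FreeListState g_state;      // hypothetical global
static std::once_flag g_initOnce;  // guards the one-time initialization

// Runs the initialization body exactly once, even under concurrent calls.
// Threads that lose the race block until the winner has finished.
void ensureFreeListReady() {
    std::call_once(g_initOnce, [] {
        ++g_state.initCount;
        g_state.ready = true;
    });
}

// Starts nThreads threads that all race to perform the first use and
// returns how many times the initialization body actually ran.
int raceFirstUse(std::size_t nThreads) {
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < nThreads; ++i) {
        threads.emplace_back([] {
            ensureFreeListReady();
            assert(g_state.ready);  // visible to every thread after call_once
        });
    }
    for (auto& t : threads) t.join();
    return g_state.initCount;
}
```

Calling `raceFirstUse(8)` should report a single initialization no matter how the threads interleave. The workaround Dirk suggests has the same effect by a different route: the first allocation happens before any second thread exists.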
Bruno Coudoin wrote:
> Hi,
>
> Tonight I found something odd. I am perhaps doing something wrong but
> after several tests the results were consistent.
>
> I have a multithreaded application, at startup each thread creates
> several gdd. It ends up frequently in the following SIGSEGV:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x98a35b90 (LWP 1020)]
> 0x00f2b2b8 in gdd::operator new (size=44) at ../gdd.cc:26
> 26 gdd_NEWDEL_NEW(gdd)
>
> My code to create the gdd is as simple as:
> gdd *pDD;
> pDD = new gddScalar ( gddAppType_value, aitEnumInt32 );
>
> It seems my program crashes at startup, but once it gets past the first
> gdd creation in each thread, it is stable after that. The more
> threads I have, the more likely I am to see the crash.
>
> My configuration:
> Epics 3.14.10
> CentOS 5.3
> Multi core processor.
>
> Has anybody ever seen this issue? I'll try to dig further tomorrow. If
> someone has ideas on workarounds or tests to do to narrow down the issue,
> you're welcome.
>
> Bruno.
>
>
>
Original Mantis Bug: mantis-343
http://
I found the issue: in gddNewDel.h:gdd_NEWDEL_NEW() there is a race condition.
This macro first tests whether any gdds are left in the pool. If not, it creates 20 more. It then takes one gdd from the pool for the new object being created.
The problem is that two threads can run this code while a single gdd is left in the pool. Neither sees an empty pool at the first stage, so neither creates more gdds. At the second stage, the first thread succeeds but the second crashes because there is no gdd left in the pool.
The solution is to take the guard once at the start of the method, instead of separately inside each of the two inner cases.
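The fix described above can be sketched as follows. This is an illustration only, assuming a simple singly linked free list and standard C++11 `std::mutex`. `Node`, `NodePool`, `kChunkSize` and `stressPool` are hypothetical names, not the actual macro expansion or the released patch. The point is that one lock covers the emptiness check, the refill, and the pop, so no other thread can drain the pool between the check and the pop.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical node type; in the real code the pooled objects are gdds.
struct Node {
    Node* next = nullptr;
};

// Refill size; the report mentions that 20 objects are created at a time.
const std::size_t kChunkSize = 20;

class NodePool {
public:
    ~NodePool() {
        for (Node* chunk : chunks_) delete[] chunk;
    }

    // Take one node. The guard is acquired first and held across the
    // emptiness check, the refill, and the pop.
    Node* acquire() {
        std::lock_guard<std::mutex> guard(lock_);
        if (head_ == nullptr) refillLocked();
        Node* n = head_;
        head_ = n->next;
        n->next = nullptr;
        return n;
    }

    // Return a node to the pool.
    void release(Node* n) {
        std::lock_guard<std::mutex> guard(lock_);
        n->next = head_;
        head_ = n;
    }

private:
    // Pushes kChunkSize fresh nodes onto the list. Caller must hold lock_.
    void refillLocked() {
        Node* chunk = new Node[kChunkSize];
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < kChunkSize; ++i) {
            chunk[i].next = head_;
            head_ = &chunk[i];
        }
    }

    std::mutex lock_;
    Node* head_ = nullptr;
    std::vector<Node*> chunks_;  // owned storage, freed in the destructor
};

// Hammers one pool from several threads and returns the total number of
// successful acquisitions.
std::size_t stressPool(std::size_t nThreads, std::size_t perThread) {
    NodePool pool;
    std::vector<std::size_t> counts(nThreads, 0);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < nThreads; ++t) {
        threads.emplace_back([&pool, &counts, t, perThread] {
            std::vector<Node*> held;
            for (std::size_t i = 0; i < perThread; ++i) {
                Node* n = pool.acquire();
                assert(n != nullptr);
                held.push_back(n);
                ++counts[t];  // each thread writes only its own slot
            }
            for (Node* n : held) pool.release(n);
        });
    }
    for (auto& th : threads) th.join();
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return total;
}
```

If the lock were instead taken separately around the refill and around the pop, two threads could both pass the emptiness check with one node left, and the second pop would dereference a null `head_`, which matches the crash reported here.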