Race condition in db_get_field_and_count()

Bug #1581212 reported by Till Straumann on 2016-05-12
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
EPICS Base
High
Andrew Johnson
3.14
High
Andrew Johnson
3.15
High
Andrew Johnson
3.16
High
Andrew Johnson

Bug Description

Dehong reported the following bug:

In order to convert between legacy and more 'modern' representation of
data with at least DBR_STATUS auxiliary information

db_access.c: db_get_field_and_count()

calls dbGetField() twice; a first time to retrieve the auxiliary information
(status, timestamps, ...) and a second time to fetch data.

Since the underlying record is not locked across the two calls to dbGetField()
the record could be processed in between and render the auxiliary data
invalid/obsolete.

Proposed fix:

AFAIK, dbScanLock() supports recursion and therefore the easiest
fix would be acquiring the record lock within db_get_field_and_count()

This bug is still present in the 3.14, 3.15 and 3.16 branch heads (as of the time of this filing). The routine (db_get_field_and_count()) has been renamed and relocated in 3.15/16, though.

RFC
- Till

Till Straumann (strauman) wrote :
mdavidsaver (mdavidsaver) wrote :

So much for all our swearing about atomicity...

The fix seems reasonable. Since the record is locked, should also change dbGetField() to dbGet() and avoid unnecessary calls to dbScanLock().

Till Straumann (strauman) wrote :

I looked into that - for the more recent versions, definitely but in 3.14 dbGetField() does more than just lock and dbGet.

mdavidsaver (mdavidsaver) wrote :

Then I shall limit myself to saying that dbChannelGetField() calls should be changed to dbChannelGet().

Andrew Johnson (anj) wrote :

This is a ca_get(), so there is no db_field_log to store any posted data. For a monitor the fact that we unlock and re-lock the record between fetching the value and its attributes should only matter for strings, arrays, and the larger composite data types since nothing else should be able to modify the contents of our db_field_log object (I'm assuming that the monitor event has been pulled off the queue by this point so cached updates can't affect it any more). Thus for the smaller data types atomicity of monitors should always have been preserved.

Thanks for the discussion, I will apply this patch to 3.14. Has anyone already worked out what the correct changes will be for 3.15, since they are different? [Don't do so if you haven't already got them though].

I agree with Michael's latest comment that for 3.15 and later we should change dbChannel_get_count() to call dbChannelGet() instead, I'll make that adjustment as I merge the change up from 3.14.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers