EPICS Base

Crash due to a too small buffer being provided in dbContextReadNotifyCache

Bug #1440186 reported by Ambroz Bizjak on 2015-04-03

This bug affects 1 person

	Status	Importance	Assigned to
EPICS Base	Fix Released	Medium	mdavidsaver
3.14	Fix Released	Medium	mdavidsaver
3.15	Fix Released	Medium	mdavidsaver
3.16	Fix Released	Medium	mdavidsaver

Bug Description

Hello,

We have been investigating an IOC crash for some time (originally on VxWorks) and we have finally managed to track it down when using a Linux build of the IOC compiled with Address Sanitizer. Here is a partial backtrace which indicates that a buffer provided by the dbContextReadNotifyCache code is being overflowed on write during a PV-get operation from a local Sequencer program.

==18348== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x600c005e7f98 at pc 0x7f567aa42461 bp 0x7f55e22cd9c0 sp 0x7f55e22cd9b0
WRITE of size 8 at 0x600c005e7f98 thread T578
<output omitted>
    #0 0x7f567aa42460 in getDoubleDouble /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbConvert.c:2018
    #1 0x7f567aa355d5 in dbGet /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbAccess.c:1103
    #2 0x7f567aa3733e in dbGetField /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbAccess.c:992
    #3 0x7f567aa6b907 in db_get_field_and_count /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../db_access.c:775
    #4 0x7f567aa6f762 in db_get_field /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../db_access.c:153
    #5 0x7f567aa84f80 in dbContextReadNotifyCache::callReadNotify(epicsGuard<epicsMutex>&, dbAddr&, unsigned int, unsigned long, cacReadNotify&) (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libdbIoc.so.3.14+0x71f80)
    #6 0x7f567aa8191f in dbChannelIO::read(epicsGuard<epicsMutex>&, unsigned int, unsigned long, cacReadNotify&, unsigned int*) (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libdbIoc.so.3.14+0x6e91f)
    #7 0x7f567a377494 in ca_array_get_callback (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libca.so.3.14+0x90494)
    #8 0x7f567bbd5c55 in caVariable::getCallback(pvType, unsigned int, void (*)(void*, pvType, unsigned int, pvValue*, void*, pvStat), void*) (/home/user/epics/modules/sncseq/lib/linux-x86_64/libpv.so+0x7c55)
    #9 0x7f567bdeda19 in seq_pvGet /home/user/epics/modules/sncseq/src/seq/O.linux-x86_64/../seq_if.c:162
<output omitted>
0x600c005e7f98 is located 0 bytes to the right of 56-byte region [0x600c005e7f60,0x600c005e7f98)
allocated by thread T509 here:
    #0 0x7f567dac61d9 (/lib64/libasan.so.0+0x121d9)
    #1 0x7f567aa84f36 in dbContextReadNotifyCache::callReadNotify(epicsGuard<epicsMutex>&, dbAddr&, unsigned int, unsigned long, cacReadNotify&) (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libdbIoc.so.3.14+0x71f36)
<output omitted>
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbConvert.c:2018 getDoubleDouble

We should note that reproducing this bug turned out tricky, because if the IOC passed a certain action without crashing, it would not crash again when the action was repeated. The IOC then needed to be restarted to have any chance of the crash occurring.

Regardless, while reviewing this buffer caching code, we have found a bug. The following is a summary of our understanding of the bug, and we also suggest a fix (attached) which has resolved the crash for us. From what we can tell, the bug has been present for a long time and is currently both in 3.14 and 3.15.

The dbContextReadNotifyCache code manages a cache of buffers used for CA read operations (possibly local). It keeps a singly linked list of free buffers. After an allocated buffer is no longer needed, it is put to this list, where it can be taken from to be reused when another buffer is requested. Along with the list of free buffers, the size of the allocated buffers is tracked (_readNotifyCacheSize). When a buffer is requested larger than the current size, all free buffers in the list are free'd, and the _readNotifyCacheSize is bumped. A new buffer is allocated for this request of this new size, and further allocated buffers will
also be of this same size until the next bumping.

The bug is that when a used buffer is released, it is unconditionally inserted into the free list - even if the size was bumped in between! So when a new allocation request comes in, we may return this old buffer which is shorter than _readNotifyCacheSize. This buffer may consequently be shorter than the just requested size, which generally leads to a buffer overflow, as can be seen in our ASan backtrace.

We've fixed it by keeping the allocation size along with each buffer, and checking in the free function whether this is still equal to _readNotifyCacheSize. Only if it is will the buffer be inserted to the cache (otherwise it will be freed). Our patch also adds some assertions for assurance.