Crash due to a too small buffer being provided in dbContextReadNotifyCache

Bug #1440186 reported by Ambroz Bizjak
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
EPICS Base
Fix Released
Medium
mdavidsaver
3.14
Fix Released
Medium
mdavidsaver
3.15
Fix Released
Medium
mdavidsaver
3.16
Fix Released
Medium
mdavidsaver

Bug Description

Hello,

We have been investigating an IOC crash for some time (originally on VxWorks) and we have finally managed to track it down when using a Linux build of the IOC compiled with Address Sanitizer. Here is a partial backtrace which indicates that a buffer provided by the dbContextReadNotifyCache code is being overflowed on write during a PV-get operation from a local Sequencer program.

==18348== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x600c005e7f98 at pc 0x7f567aa42461 bp 0x7f55e22cd9c0 sp 0x7f55e22cd9b0
WRITE of size 8 at 0x600c005e7f98 thread T578
<output omitted>
    #0 0x7f567aa42460 in getDoubleDouble /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbConvert.c:2018
    #1 0x7f567aa355d5 in dbGet /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbAccess.c:1103
    #2 0x7f567aa3733e in dbGetField /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbAccess.c:992
    #3 0x7f567aa6b907 in db_get_field_and_count /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../db_access.c:775
    #4 0x7f567aa6f762 in db_get_field /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../db_access.c:153
    #5 0x7f567aa84f80 in dbContextReadNotifyCache::callReadNotify(epicsGuard<epicsMutex>&, dbAddr&, unsigned int, unsigned long, cacReadNotify&) (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libdbIoc.so.3.14+0x71f80)
    #6 0x7f567aa8191f in dbChannelIO::read(epicsGuard<epicsMutex>&, unsigned int, unsigned long, cacReadNotify&, unsigned int*) (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libdbIoc.so.3.14+0x6e91f)
    #7 0x7f567a377494 in ca_array_get_callback (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libca.so.3.14+0x90494)
    #8 0x7f567bbd5c55 in caVariable::getCallback(pvType, unsigned int, void (*)(void*, pvType, unsigned int, pvValue*, void*, pvStat), void*) (/home/user/epics/modules/sncseq/lib/linux-x86_64/libpv.so+0x7c55)
    #9 0x7f567bdeda19 in seq_pvGet /home/user/epics/modules/sncseq/src/seq/O.linux-x86_64/../seq_if.c:162
<output omitted>
0x600c005e7f98 is located 0 bytes to the right of 56-byte region [0x600c005e7f60,0x600c005e7f98)
allocated by thread T509 here:
    #0 0x7f567dac61d9 (/lib64/libasan.so.0+0x121d9)
    #1 0x7f567aa84f36 in dbContextReadNotifyCache::callReadNotify(epicsGuard<epicsMutex>&, dbAddr&, unsigned int, unsigned long, cacReadNotify&) (/home/user/epics/base/base-3-14-12/lib/linux-x86_64/libdbIoc.so.3.14+0x71f36)
<output omitted>
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/user/epics/base/base-3-14-12/src/db/O.linux-x86_64/../dbConvert.c:2018 getDoubleDouble

We should note that reproducing this bug turned out tricky, because if the IOC passed a certain action without crashing, it would not crash again when the action was repeated. The IOC then needed to be restarted to have any chance of the crash occurring.

Regardless, while reviewing this buffer caching code, we have found a bug. The following is a summary of our understanding of the bug, and we also suggest a fix (attached) which has resolved the crash for us. From what we can tell, the bug has been present for a long time and is currently both in 3.14 and 3.15.

The dbContextReadNotifyCache code manages a cache of buffers used for CA read operations (possibly local). It keeps a singly linked list of free buffers. After an allocated buffer is no longer needed, it is put to this list, where it can be taken from to be reused when another buffer is requested. Along with the list of free buffers, the size of the allocated buffers is tracked (_readNotifyCacheSize). When a buffer is requested larger than the current size, all free buffers in the list are free'd, and the _readNotifyCacheSize is bumped. A new buffer is allocated for this request of this new size, and further allocated buffers will
also be of this same size until the next bumping.

The bug is that when a used buffer is released, it is unconditionally inserted into the free list - even if the size was bumped in between! So when a new allocation request comes in, we may return this old buffer which is shorter than _readNotifyCacheSize. This buffer may consequently be shorter than the just requested size, which generally leads to a buffer overflow, as can be seen in our ASan backtrace.

We've fixed it by keeping the allocation size along with each buffer, and checking in the free function whether this is still equal to _readNotifyCacheSize. Only if it is will the buffer be inserted to the cache (otherwise it will be freed). Our patch also adds some assertions for assurance.

Revision history for this message
Ambroz Bizjak (ambroz-bizjak) wrote :
Revision history for this message
mdavidsaver (mdavidsaver) wrote :

I agree with your analysis.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

I haven't been able to replicate the crash. It's a race condition, so I'm likely not getting the timing "right". I'm not too worried by this.

I couldn't find any authoritative statement about the alignment guarantees of "new char[]", so I'm attaching a modified patch. It replaces new[] and reinterpret_cast<>() with C-style allocation/casting. It also removes the "magic".

I'm also wondering about bounding the length of this free list, or for that matter replacing it with std::vector<char*>. However, these changes are too big for a 3.14.x patch.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :
Revision history for this message
Andrew Johnson (anj) wrote :

Could we make the struct cacheElem_t contain the size_t followed by a union of pNext and buf, since we never need both at once. This saves the extra storage and padding for the buf member, hmm, but it would slightly complicate the calloc calculation which needs to be sizeof(cacheElem_t) - sizeof(void*) + _readNotifyCacheSize.

Revision history for this message
Ambroz Bizjak (ambroz-bizjak) wrote :

Regarding the new char alignment, it seems like it is guaranteed: http://stackoverflow.com/a/10589019 . There is a reference to the standard text and I was able to find this in C++98 and C++11.

I agree with Andrew about making a union to save a few bytes per allocation. But, we should be careful not to allocate too little (when _readNotifytCacheSize is lesser than the size of pNext). I suggest this:

struct cacheElem_t {
    size_t size;
    union {
        struct cacheElem_t * pNext;
        char buf[1];
    } u;
};

Allocate: max(sizeof(cacheElem_t), offsetof(cacheElem_t, u) + _readNotifyCacheSize)

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

After convincing myself that my changes haven't broken anything I've committed the modified version of the patch to the 3.14 branch.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Launchpad strikes again... I forgot to subscribe to notification emails, and didn't refresh the webpage until just now. So I didn't see comments #5 and #6 before committing my patch.

While a union would save sizeof(void*) for each allocation, I don't like this practice as it will allow write-after-free bugs to corrupt the linked list, probably leading to a segfault during a future allocation.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.