Access Security segmentation fault if *.acf file is missing

Bug #1694966 reported by Christoph Schröder
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
EPICS Base
Fix Released
Medium
Andrew Johnson
3.14
Fix Released
Medium
Andrew Johnson
3.15
Fix Released
Medium
Andrew Johnson
3.16
Fix Released
Medium
Andrew Johnson

Bug Description

Hi,

I discovered that a missing *.acf file in combination with iocStats leads to to a segmentation fault. This seems to be caused by a mutex locked by AccessSecurity which is never unlocked. If I remove asSetFilename from the startup script or remove iocStats the segmentation fault does not occur. Our base version is 13.14.12.5. Is this already known or perhaps fixed in 3.14.12.6?

Here is the whole backtrace + startup script call:
(gdb) run
Starting program: /home/schroeder/workspace/darcs/ioc/BII-Controls-3-14-12/bin/linux-x86_64/SIOC45C ../../iocBoot/iocSIOC45C/st.cmd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff40f5700 (LWP 26778)]
[Thread 0x7ffff40f5700 (LWP 26778) exited]
#!../../bin/linux-x86_64/SIOC45C
epicsEnvSet IOC SIOC45C
epicsEnvSet LOG_DIR /opt/IOC/log
# Change to top directory
cd ../..
########################################################### Set up Environment
epicsEnvSet STREAM_PROTOCOL_PATH ./db
epicsEnvSet("EPICS_CA_MAX_ARRAY_BYTES","10000000")
############################################## Register all support components
dbLoadDatabase dbd/SIOC45C.dbd 0 0
SIOC45C_registerRecordDeviceDriver pdbbase
############################################################ Configure drivers
drvAsynIPPortConfigure("PKIK1D1R","192.168.101.147:2101",0,0,0)
[New Thread 0x7ffff7fc2700 (LWP 26780)]
[New Thread 0x7ffff38f4700 (LWP 26781)]
[New Thread 0x7ffff37f3700 (LWP 26782)]
[New Thread 0x7ffff7ec1700 (LWP 26783)]
####################################################### Autosave configuration
set_requestfile_path db
set_requestfile_path /opt/IOC/log/autoSaveRestore
set_savefile_path /opt/IOC/log/autoSaveRestore
save_restoreSet_DatedBackupFiles 1
save_restoreSet_UseStatusPVs 0
######################################################## Load record instances
dbLoadRecords db/DistVersion.db "IOC=SIOC45C,TOP=."
dbLoadRecords db/IOC-stats.db "IOC=SIOC45C"
dbLoadRecords db/FugProbusV.SIOC45C.db
dbLoadRecords "db/asynRecord.db" "P=PKIK1D1R:,R=asyn,PORT=PKIK1D1R,ADDR=-1,OMAX=0,IMAX=0"
######################################################## Restore record values
set_pass0_restoreFile SIOC45C_0.sav
set_pass1_restoreFile SIOC45C_1.sav
########################################################### Configure IOC Core
# IOC Log Server Connection 0=enabled 1=disabled
setIocLogDisable 1
# Configure Access Security
asSetFilename /opt/IOC/AccessSecurity/links/PROD/security.acf
#################################################################### Start IOC
iocInit
[New Thread 0x7ffff36f2700 (LWP 26784)]
Starting iocInit
############################################################################
## EPICS R3.14.12.5 $Date: Tue 2015-03-24 09:57:35 -0500$
## EPICS Base built May 30 2017
############################################################################
[New Thread 0x7ffff3671700 (LWP 26785)]
[New Thread 0x7ffff3570700 (LWP 26786)]
[New Thread 0x7ffff336f700 (LWP 26787)]
[New Thread 0x7ffff316e700 (LWP 26788)]
[New Thread 0x7ffff2f6d700 (LWP 26789)]
[New Thread 0x7ffff2d6c700 (LWP 26790)]
reboot_restore: entry for file 'SIOC45C_0.sav'
reboot_restore: Found filename 'SIOC45C_0.sav' in restoreFileList.
*** restoring from '/opt/IOC/log/autoSaveRestore/SIOC45C_0.sav' at initHookState 6 (before record/device init) ***
reboot_restore: done with file 'SIOC45C_0.sav'

[New Thread 0x7ffff2c6b700 (LWP 26791)]
reboot_restore: entry for file 'SIOC45C_1.sav'
reboot_restore: Found filename 'SIOC45C_1.sav' in restoreFileList.
*** restoring from '/opt/IOC/log/autoSaveRestore/SIOC45C_1.sav' at initHookState 7 (after record/device init) ***
reboot_restore: done with file 'SIOC45C_1.sav'

[New Thread 0x7ffff2b6a700 (LWP 26792)]
[New Thread 0x7ffff2969700 (LWP 26793)]
[New Thread 0x7ffff2768700 (LWP 26794)]
[New Thread 0x7ffff2567700 (LWP 26795)]
[New Thread 0x7ffff2366700 (LWP 26796)]
[New Thread 0x7ffff2165700 (LWP 26797)]
[New Thread 0x7ffff1f64700 (LWP 26798)]
[New Thread 0x7ffff1d63700 (LWP 26799)]
asInitFile: Can't open file '/opt/IOC/AccessSecurity/links/PROD/security.acf'
iocBuild: asInit Failed.
epics>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2d6c700 (LWP 26790)]
epicsMutexLock (pmutexNode=0x0) at ../../../src/libCom/osi/epicsMutex.cpp:143
143 epicsMutexOsdLock(pmutexNode->id);
(gdb) bt
#0 epicsMutexLock (pmutexNode=0x0) at ../../../src/libCom/osi/epicsMutex.cpp:143
#1 0x00007ffff66a0f09 in casStatsFetch (pChanCount=0x7ffff2d6bd20,
    pCircuitCount=0x7ffff2d6bcf0) at ../caservertask.c:955
#2 0x00007ffff73269d7 in scan_time (type=3) at ../devIocStatsAnalog.c:354
#3 0x00007ffff579f7fe in epicsTimerForC::expire (this=<optimized out>)
    at ../../../src/libCom/timer/epicsTimer.cpp:65
#4 0x00007ffff57a113e in timerQueue::process (this=this@entry=0x6d30d0, currentTime=...)
    at ../../../src/libCom/timer/timerQueue.cpp:140
#5 0x00007ffff57a1853 in timerQueueActive::run (this=0x6d30a0)
    at ../../../src/libCom/timer/timerQueueActive.cpp:91
#6 0x00007ffff57942a9 in epicsThreadCallEntryPoint (pPvt=0x6d3170)
    at ../../../src/libCom/osi/epicsThread.cpp:85
#7 0x00007ffff57998f5 in start_routine (arg=0x6d35c0)
    at ../../../src/libCom/osi/os/posix/osdThread.c:401
#8 0x00007ffff497e064 in start_thread (arg=0x7ffff2d6c700) at pthread_create.c:309
#9 0x00007ffff4c7b62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

From a quick look, the issue is that casStatsFetch() can't be called until/unless the CA server (RSRV) has successfully initialized. iocstats is starting it's 1Hz scan timer from a device support init() function, this happens before RSRV is started. So this would also crash if the initialization between these two points takes longer than 1 second.

At a minimum, iocStats should not start this timer until after RSRV has started, if not later. I'd suggest an initHook with initHookAfterCaServerRunning or initHookAfterIocRunning.

https://github.com/epics-modules/iocStats/blob/0cec4fc67a4d4419c21161d889862766a7690daa/devIocStats/devIocStatsAnalog.c#L405

Revision history for this message
mdavidsaver (mdavidsaver) wrote :
Revision history for this message
Andrew Johnson (anj) wrote :

The code in casr attempts to detect when it's being run before rsrv_init() with:

    if ( ! clientQlock ) {
        return;
    }

Now this technique isn't perfectly safe on SMP because there's no memory barrier, and because the rsrv_init() code initializes clientQlock before clientQ, but I think it's worth swapping those two and adding similar protection to casStatsFetch() (and setting zero counts). Not sure if it's worth trying harder than this, any comments?

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> any comments?

I hesitate a bit with a less than guaranteed safe fix. I was only thinking to add "assert(clientQlock)". There are a few similar cases where certain functions can't be called too early, so I'm not sure I see a value in allowing this one.

Practically, I suspect your patch will work well enough, so go ahead. It might be worth making a properly safe version >=3.15 using the read/write barriers in epicsAtomic.h if only to see that these are actually sufficient.

Revision history for this message
Andrew Johnson (anj) wrote :

I can avoid the need for the memory barrier by using a static initializer for the clientQ ELLLIST. The result should be completely SMP-safe if pointer writes are atomic; if they aren't the window of vulnerability is tiny since we only write the clientQlock pointer once, in rsrv_init().

I will commit my modified version and merge up.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.