Possible bug in ca_clear_channel

Bug #1406331 reported by Andrew Johnson
Affects      Status    Importance   Assigned to   Milestone
EPICS Base   Invalid   High         Unassigned

Bug Description

From Benjamin Franksen:

I recently added a few more tests to the sequencer, mostly concerned with the
pvAssign function. This led to mysterious hang-ups: one of the new tests
sometimes hangs right in the middle of a call to pvAssign. Further
investigation revealed that the bug is not in the sequencer, at least not in
any apparent way. When pvAssign is called with a connected channel, it first
disconnects it by calling ca_clear_channel. Instrumenting the relevant parts
of the code with printf statements shows that ca_clear_channel gets called
but never returns.

This does not happen every time, of course; it is highly timing-dependent. I
can reproduce it only about once in every dozen runs of the test. It only
happens if the channel is on another IOC (in my tests the other IOC runs on
the same machine, in the background). When it happens, it is always the first
call to ca_clear_channel in the program that hangs.

I looked at the code in src/ca/access.cpp, and ca_clear_channel changed
between 3.14.12.3 and 3.14.12.4. This looks like a regression: I cannot
reproduce the problem with 3.14.12.3, but I can with 3.14.12.4 (and with
3.15, BTW).

The latest version 2.2 snapshot on the sequencer home page
(seq-2-2-snapshot-2014-12-28.tar.gz) contains the test in
test/validate/reassign.st. To reproduce, it is easiest to start the IOC with
the database in the background like this:

ben@sarun[1]: .../seq/branch-2-2 > cd test/validate/O.linux-x86_64
ben@sarun[1]: .../validate/O.linux-x86_64 > ./reassign -S -d ../reassign.db &
[1] 29640
ben@sarun[1]: .../validate/O.linux-x86_64 > Starting iocInit
############################################################################
## EPICS R3.14.12.3 $Date: Mon 2012-12-17 14:11:47 -0600$
## EPICS Base built Dec 28 2014
############################################################################
iocRun: All initialization complete

Then start the real test program as often as it takes. This is how far it gets
when it hangs:

ben@sarun[1]: .../validate/O.linux-x86_64 > ./reassign -S -t
Sequencer release 2.2.0.3, compiled Sun Dec 28 22:40:58 2014
Spawning sequencer program "reassignTest", thread 0x1a77490: "reassignTest"
reassignTest[0]: all channels connected & received 1st monitor
1..30
# start
ok 1 - seq_pvChannelCount(seqg_env) == 3
ok 2 - seq_pvAssignCount(seqg_env) == 2
ok 3 - seq_pvConnectCount(seqg_env) == 2
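
Since the hang shows up only about once in a dozen runs, re-running the test by hand gets tedious. The following is a minimal sketch of a helper that repeats a command until one run exceeds a time limit and treats that as a hang. It is not part of the sequencer or EPICS Base: the function name run_until_hang, the log file location, and the limits used below are assumptions for illustration, and it relies on the GNU coreutils timeout command being available.

```shell
#!/bin/sh
# Hypothetical helper (not from the sequencer or EPICS Base).
# Usage: run_until_hang MAX_RUNS LIMIT_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to MAX_RUNS times. If a run exceeds LIMIT_SECONDS,
# prints the number of that run and returns 0. Returns 1 if no run hung.
run_until_hang() {
    max_runs=$1
    limit=$2
    shift 2
    # Output of the most recent run is kept here for inspection (assumed path).
    log="${TMPDIR:-/tmp}/run_until_hang.log"
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # GNU timeout exits with status 124 when it had to stop the command.
        timeout "$limit" "$@" > "$log" 2>&1
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

With the IOC already running in the background, a call such as run_until_hang 50 60 ./reassign -S -t would try up to 50 runs with a 60 second limit each. Both numbers are guesses and should be set well above the normal run time of the test. The log file then shows how far the hanging run got.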

Tags: ca
Andrew Johnson (anj) wrote :

Only one commit could have caused this change, to the 3.14 branch:

------------------------------------------------------------
revno: 12415
committer: Jeff Hill <email address hidden>
branch nick: trunk
timestamp: Thu 2013-05-16 12:33:31 -0600
message:
  merged in fix for https://bugs.launchpad.net/epics-base/+bug/1179642
  also merged in removal of c++ support for old HPUX compiler
------------------------------------------------------------

http://bazaar.launchpad.net/~epics-core/epics-base/3.14/revision/12415

Unfortunately this was a rather large patch. It introduced a new guard to prevent a race condition, but many of its other changes were unrelated and only deleted dead code.

Changed in epics-base:
milestone: none → 3.14.branch
Andrew Johnson (anj) wrote :

From Ben:

Thanks to Michael, it is now clear that the bug is in the sequencer, not
in ca_clear_channel. Sorry for the false alarm.

I'll fix this in both the 2.1 and the 2.2 branch and will make releases
for both ASAP.

Changed in epics-base:
status: New → Invalid
Andrew Johnson (anj)
Changed in epics-base:
milestone: 3.14.branch → none