new JCA channel doesn't connect

Bug #649469 reported by Jeff Hill
Affects      Status    Importance   Assigned to   Milestone
EPICS Base   Invalid   Undecided    Unassigned

Bug Description

Mike Oothoudt here has found an issue with the JNI JCA where new channels do not always connect. The occurrence rate is low, but the failure is repeatable at high repetition rates. This does not sound like a known issue: typically the only complaints are about slow connects/reconnects to IOCs that have been down for some time, iff beacons from said IOC are not seen by the client. That is certainly a behavior that could be improved, but for the purposes of bug hunting it's important to look at one issue at a time.

When debugging, I thought that I saw the clear signature of Mantis 240. See below.

The bad news today is that Mike thinks he has forced JCA to use R3.14.9, but that does not appear to help.

Mike reported that he has seen this with other clients that don’t use JCA. I asked our operators about it and it's hard to make the distinction between the different signatures mentioned above.

I am interested in feedback from users. Do you see rare situations where a _newly-created_ channel doesn't connect?

Here is the message I sent to Mike.

------------------snip-snip---------------------

I dug a bit deeper in the debugger and discovered the
cause of your issue. Here are a few of the details.

1) You are experiencing an issue that was fixed in EPICS base R3.14.9
2) Your Java process actually somehow has two CA client libraries loaded:
     Object file /ade/epics/supTop/base/R3.14.8.2/lib/linux-x86/libca.so
     Object file /export/home/hill/epics/R3.14/epics/base/lib/linux-x86/libca.so.3.14
   I can see this in the debugger.
3) Based on the fields in the structures that are present I think that I can
make a pretty good guess that the failure is occurring in the R3.14.8.2 version
(that version must be the one that Java is actually using).
4) Your symptoms look real similar to Mantis 240, but I can see in the
source code additional related R3.14.9 patches beyond what is mentioned in
that bug report.

What's next? We need to force JCA to use the latest version of CA and verify that
this fixes your issue.
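
The duplicate-library finding in item 2 can also be checked without a debugger on a Linux host, by listing the shared objects mapped into the Java process. The following is only a sketch under that assumption and is not part of the original exchange; the helper names `loaded_libraries` and `libraries_for_pid` are hypothetical, and the only external interface relied on is the `/proc/<pid>/maps` file.

```python
import os


def loaded_libraries(maps_text, name="libca.so"):
    """Return the distinct mapped file paths whose basename starts with `name`.

    `maps_text` is the content of a Linux /proc/<pid>/maps file. Each line has
    the form: address perms offset dev inode [pathname].
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        # Matches both "libca.so" and versioned names such as "libca.so.3.14".
        if os.path.basename(path).startswith(name):
            paths.add(path)
    return sorted(paths)


def libraries_for_pid(pid, name="libca.so"):
    """Read /proc/<pid>/maps (Linux only) and list the matching libraries."""
    with open("/proc/{}/maps".format(pid)) as maps_file:
        return loaded_libraries(maps_file.read(), name)
```

Calling `libraries_for_pid(pid)` with the process ID of the JCA client should return exactly one path if a single CA client library is loaded. More than one entry would match the situation described in item 2.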

Tags: ca cleanup
Jeff Hill (johill-lanl) wrote :

The next step will be to connect the debugger to the JCA process, verify that we are in fact running R3.14.9 and if so look for the cause.

tags: added: ca
Jeff Hill (johill-lanl)
description: updated
Andrew Johnson (anj)
Changed in epics-base:
status: New → Incomplete
tags: added: cleanup
Changed in epics-base:
status: Incomplete → Invalid