Random, unexplained "Virtual circuit unresponsive" exceptions (related to #111)

Bug #541238 reported by evans
Affects: EPICS Base
Status: Fix Released
Importance: Wishlist
Assigned to: Andrew Johnson
Milestone: (none listed)

Bug Description

Our operators are getting random "Virtual circuit unresponsive"
exceptions from clients such as MEDMs and ALHs that remain connected
to other IOCs and servers that remain connected to other clients.
That is, only one circuit of many in each of the client and the server
is "unresponsive". This is happening on the order of once a day in
the control room. The Gateway is getting "Virtual circuit
unresponsive" exceptions with some clients (MEDM side) and not others
for the same PV.

Original Mantis Bug: mantis-194
    http://www.aps.anl.gov/epics/mantis/view_bug_page.php?f_id=194

Tags: ca 3.14
Jeff Hill (johill-lanl) wrote :

Is this related to #111? Would it be possible for MEDM and ALH to process
updates for EPICS_CA_CONN_TMO seconds w/o calling a ca function that
processes the input queue?

evans (evans) wrote :

It is related to #111 in that, because of the behavior described
there, one cannot eliminate the client as a possible cause of a
"Virtual circuit disconnect", and one cannot tell whether the TCP
connection is OK or not.

It would be possible for MEDM and ALH to process updates for
EPICS_CA_CONN_TMO seconds w/o calling a CA function that processes the
input queue if they get tied up for some reason or misbehave. One
possible event that could cause this is handling a modal dialog.
Others are malfunctions of various causes.
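
To make the scenario above concrete, here is a toy model of a client that stops servicing its input queue for longer than the timeout, for example while a modal dialog is open. It is only a sketch: the function name starved_intervals, the timeline, and the 30-second value standing in for EPICS_CA_CONN_TMO are all assumptions for illustration, and it does not reproduce the CA library's actual timeout logic.

```python
from typing import List, Tuple


def starved_intervals(service_times: List[float],
                      conn_tmo: float) -> List[Tuple[float, float]]:
    """Return (start, end) gaps between consecutive input-queue servicing
    calls that are longer than conn_tmo seconds.

    Hypothetical helper: conn_tmo stands in for EPICS_CA_CONN_TMO.
    """
    ordered = sorted(service_times)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > conn_tmo:
            gaps.append((earlier, later))
    return gaps


# Hypothetical timeline (seconds): the client services its queue once a
# second, then is blocked from t=2 to t=40, e.g. by a modal dialog.
timeline = [0.0, 1.0, 2.0, 40.0, 41.0]
print(starved_intervals(timeline, conn_tmo=30.0))  # [(2.0, 40.0)]
```

Logging timestamps like these on the client side would be one way to tell whether a stall in the client coincided with a reported exception.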

We don't, at this point, know if the problem is with the client or
not. It would be helpful to be able to know that.

Jeff Hill (johill-lanl) wrote :

From Ken (answering my question):

>> How long does the channel stay disconnected? Just an instant, a
>> minute, 15 minutes, or several hours?

Seems to be a short time. They get a popup dialog saying there is a disconnect (in the exception handler), and it goes in the MEDM log, but they (the operators) have trouble finding which MEDM screen and which PV. It doesn't get logged when the connection is restored (in the connection handler), so the downtime is not well known. There has never been any sign of anything wrong with the IOC, and the IOC could be connected to the same PV in other MEDMs (as well as in ALH) without any disconnects. It does happen with both ALH and MEDM. (Other clients may not display the disconnect so obviously.) This is Mantis #194.
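
Since the downtime is unknown only because the reconnect is not logged, one possible remedy is to record both events and compute the difference. The sketch below assumes a hypothetical ConnectionLog class whose on_disconnect and on_reconnect methods would be called from the client's exception and connection handlers; none of these names come from MEDM, ALH, or the CA library.

```python
from typing import Dict, List, Optional, Tuple


class ConnectionLog:
    """Hypothetical helper that records disconnect and reconnect times
    per PV so that the downtime can be computed."""

    def __init__(self) -> None:
        self._down_since: Dict[str, float] = {}
        # Completed outages as (pv, went_down, came_back).
        self.outages: List[Tuple[str, float, float]] = []

    def on_disconnect(self, pv: str, when: float) -> None:
        # Keep the earliest time if the disconnect is reported twice.
        self._down_since.setdefault(pv, when)

    def on_reconnect(self, pv: str, when: float) -> Optional[float]:
        went_down = self._down_since.pop(pv, None)
        if went_down is None:
            return None  # first connect, or no recorded outage
        self.outages.append((pv, went_down, when))
        return when - went_down


# Usage with made-up timestamps (seconds) and a made-up PV name.
log = ConnectionLog()
log.on_disconnect("demo:pv", 100.0)
downtime = log.on_reconnect("demo:pv", 103.5)
print(f"demo:pv was down for {downtime:.1f} s")
```

Keeping the PV name in each record would also address the operators' difficulty in finding which PV was affected.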

Andrew Johnson (anj) wrote :

Probably fixed in later versions of Base; we're not getting complaints like this that I'm aware of.

Andrew Johnson (anj) wrote :

R3.14.10 released.
