PCAS does not support postponement, by service, of first IO request on enum chan

Bug #541367 reported by Jeff Hill
Affects: EPICS Base
Status: Won't Fix
Importance: Wishlist
Assigned to: Ralph Lange

Bug Description

Hi all, (Jeff, Gasper, others?)

Today I got the following error message when trying to read through a CA gateway:

Invalid channel identifier
host=x05da-cagw-psi.ch:5064 ctx=Bad Resource ID=2172742414 detected at ../../../../src/cas/generic/casStrmClient.cc.1700

On the gateway I found this error log:

--*snip*------------------------------------------------------------
CAS Request: e11277 on x05da-cons-1: cmd=12 cid=2172742414 typ=0 cnt=0 psz=0 avail=1 filename="../../../../src/cas/generic/casStrmClient.cc" line number=1389 postpone asynchronous IO - enum string tbl cache read ASYNC IO postponed ?
May 01 10:28:52 !!! Errlog message received (message is above) The server library does not currently support postponment of
May 01 10:28:52 !!! Errlog message received (message is above) string table cache update of casChannel::read().
May 01 10:28:52 !!! Errlog message received (message is above) To postpone this request please postpone the PC attach IO request.
May 01 10:28:52 !!! Errlog message received (message is above) String table cache update did not occur.
May 01 10:28:52 !!! Errlog message received (message is above)
--*snip*------------------------------------------------------------

Any idea what went wrong? Gateway and caget both use 3.14.8.2.

Dirk

--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182

Original Mantis Bug: mantis-339
    http://www.aps.anl.gov/epics/mantis/view_bug_page.php?f_id=339

Tags: cas 3.14
Jeff Hill (johill-lanl) wrote :

Dirk,

In the PCAS, when the server initiates a read request with the service, the service has three options. It can complete the request immediately, it can complete the request asynchronously, or it can postpone the request until later if too many asynchronous requests are already in progress. Since this is a single-threaded server, the service isn't allowed to block when too many asynchronous requests are outstanding. If the service returns a status requesting postponement, the server has the hard job of remembering where it was with that particular client, knowing how to resume the request later once some of the asynchronous IO completes, and going off to take care of other clients in the meantime.
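To make the three options concrete, here is a minimal, self-contained C++ model of a single-threaded server that remembers postponed requests and retries them when an asynchronous read completes. It is a sketch under stated assumptions, not PCAS code: all names (ReadStatus, ToyService, ToyServer, maxAsync) are hypothetical. In PCAS itself the corresponding return codes are, as far as I know, S_casApp_success, S_casApp_asyncCompletion and S_casApp_postponeAsyncIO.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical model, not the PCAS API. The three outcomes loosely mirror
// S_casApp_success, S_casApp_asyncCompletion and S_casApp_postponeAsyncIO.
enum class ReadStatus { Completed, AsyncStarted, Postponed };

// A toy service that allows at most maxAsync asynchronous reads in flight.
struct ToyService {
    int maxAsync;
    int outstanding = 0;

    ReadStatus read(bool valueReady) {
        if (valueReady) {
            return ReadStatus::Completed;      // answer right away
        }
        if (outstanding >= maxAsync) {
            return ReadStatus::Postponed;      // must not block: ask the server to retry later
        }
        ++outstanding;                         // start an asynchronous read
        return ReadStatus::AsyncStarted;
    }

    // Called when one asynchronous read finishes.
    void completeOne() {
        if (outstanding > 0) --outstanding;
    }
};

// A toy single-threaded server that queues postponed requests instead of blocking.
struct ToyServer {
    ToyService& svc;
    std::deque<int> postponed;  // ids of clients whose read must be retried
    std::vector<int> accepted;  // ids of clients whose read the service accepted

    void request(int client, bool valueReady) {
        if (svc.read(valueReady) == ReadStatus::Postponed) {
            postponed.push_back(client);       // remember it and go serve other clients
        } else {
            accepted.push_back(client);
        }
    }

    // An asynchronous completion is the server's cue to resume postponed reads.
    void onAsyncComplete() {
        svc.completeOne();
        // Retry postponed reads in order until the service is saturated again.
        while (!postponed.empty() &&
               svc.read(false) != ReadStatus::Postponed) {
            accepted.push_back(postponed.front());
            postponed.pop_front();
        }
    }
};
```

For example, with maxAsync = 1 a second asynchronous read is postponed, and the server only starts it after the first asynchronous read completes. A read whose value is ready completes immediately regardless of how many reads are in flight.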

When the server is setting up a channel whose native type is enumerated, it must fetch the string table from the service using a read request and cache it for later use when converting between types. This is a _once_-only request, and it is the first IO request to the channel, so no other requests are outstanding at that time. Restarting the multi-step channel-create operation is also very complicated, so it seemed sensible (when I wrote that code) not to support postponement of this particular request.
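The special case for the enum string table can be modelled the same way. This sketch is built on assumptions. It does not reproduce casStrmClient.cc; it only mirrors the behaviour described above and in the log. The first read on an enum channel is accepted if it completes immediately or asynchronously. A postponement is never resumed, so the cache stays empty. EnumChannelSetup and its members are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the behaviour described above; not the casStrmClient.cc code.
enum class ReadStatus { Completed, AsyncStarted, Postponed };

struct EnumChannelSetup {
    std::vector<std::string> stringTable;  // cached enum state strings
    bool cached = false;
    bool waitingForAsync = false;
    std::string error;

    // Outcome of the first (and only) string-table read on a new enum channel.
    void onFirstRead(ReadStatus st, const std::vector<std::string>& table) {
        switch (st) {
        case ReadStatus::Completed:
            stringTable = table;           // value available now: cache it
            cached = true;
            break;
        case ReadStatus::AsyncStarted:
            waitingForAsync = true;        // cache it when the read completes
            break;
        case ReadStatus::Postponed:
            // Resuming a half-finished, multi-step channel create is not
            // supported, so nothing retries this read and the cache stays empty.
            error = "String table cache update did not occur.";
            break;
        }
    }

    // Completion of the asynchronous read started above.
    void onAsyncReadDone(const std::vector<std::string>& table) {
        if (waitingForAsync) {
            stringTable = table;
            cached = true;
            waitingForAsync = false;
        }
    }
};
```

An asynchronous first read eventually fills the cache. A postponed one leaves the channel with no string table, which matches the "String table cache update did not occur" message in the log.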

It is possibly a bug in the GW that it wants to postpone the first read request issued to the channel. A better alternative would probably be to complete the read request asynchronously. The postponement option was originally intended for use when at least one asynchronous IO request is already outstanding against the channel.

This type of headache is one reason why I converted the client library to fully threaded operation for R3.14, and PCAS is being converted to fully threaded operation for R3.15.

Jeff

edited on: 2009-07-01 17:14

Jeff Hill (johill-lanl) wrote :

If the GW does not have any asynchronous IO outstanding against the channel, the server has no way to be notified asynchronously when IO completes, and so it cannot resume requests on the channel as soon as a new request can be started.
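On the service side, the advice in this thread reduces to a simple rule: postpone only when at least one asynchronous read is already outstanding, because that read's completion is the event that prompts the server to retry. With nothing outstanding, including the very first read on a channel, start an asynchronous read instead. The following is one possible policy, sketched with hypothetical names. It is not what the gateway actually does.

```cpp
#include <cassert>

// Hypothetical names; one possible policy, not the gateway's actual code.
enum class ReadStatus { Completed, AsyncStarted, Postponed };

// Decide how a service answers a read request.
// Postponing is only safe when an asynchronous read is already outstanding,
// because its completion is the event that prompts the single-threaded
// server to retry the postponed request. With nothing outstanding
// (e.g. the first read on a channel), no such event would ever arrive.
ReadStatus chooseReadStatus(bool valueReady, int outstanding, int maxAsync) {
    if (valueReady) {
        return ReadStatus::Completed;
    }
    if (outstanding > 0 && outstanding >= maxAsync) {
        return ReadStatus::Postponed;
    }
    // Never postpone with nothing outstanding, even if maxAsync is 0.
    return ReadStatus::AsyncStarted;
}
```

The `outstanding > 0` guard is the key design choice. Without it, a service configured with a small limit could postpone a request that nothing will ever wake up.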

Jeff Hill (johill-lanl) wrote :

I created Mantis 339 to track the issue. The entry is currently filed against PCAS. We will need to decide on which side of the fence it should be fixed (PCAS or GW).

Andrew Johnson (anj) wrote :

On 2011-06-24 Tom Cumming wrote:

I'm getting the following message in our gateway log; I suspect that we're reproducing Mantis 339 (Launchpad 541367). We're running EPICS 3.14.9.

    filename="../../../../src/cas/generic/casStrmClient.cc" line number=1394
    postpone asynchronous IO - enum string tbl cache read ASYNC IO postponed ?

After I get the above message in the gateway logfile, I get random values at random times with camonitor on some enumerated-type PVs. If I get the same message a second time, it gets worse. Restarting the gateway fixes it.

I'm wondering whether Mantis 339 has been fixed, or is simply not a problem with newer versions of EPICS. The bug was filed in '09, and I don't see any movement on it.

Thanks, tom.c

Changed in epics-base:
assignee: nobody → Ralph Lange (ralph-lange)
status: New → Triaged
Andrew Johnson (anj)
Changed in epics-base:
status: Triaged → Won't Fix