CAC TCP socket linger set error was Invalid argument (message)

Bug #541370 reported by Jeff Hill
This bug affects 1 person
Affects      Status    Importance   Assigned to   Milestone
EPICS Base   Invalid   Low          Unassigned

Bug Description

That message would be emitted only when both of the following are true.

1) CAC is forcing an abort shutdown for a circuit. That would happen only if CAC thinks the circuit is in a known unresponsive state when it shuts down the circuit.
2) The OS doesn't like the linger options being passed to setsockopt, possibly because the file descriptor CA is using is invalid, or alternatively because the linger socket options are being used in a weird context.
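
As a rough illustration of the two conditions above, here is a minimal Python sketch of an abort shutdown that turns linger on with a zero timeout and reports a setsockopt failure. The function name and the message wording are assumptions that simply mirror the message quoted in this report; this is not the CAC source, and the struct layout assumes a POSIX-style `struct linger` of two ints.

```python
import socket
import struct


def force_abortive_close(sock):
    """Request a reset-on-close (linger on, timeout 0) for the given socket.

    Returns None on success, or a diagnostic string if the OS rejects the option.
    """
    # POSIX layout: struct linger { int l_onoff; int l_linger; }
    linger = struct.pack("ii", 1, 0)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    except OSError as exc:
        # Hypothetical wording, copied from the message in this report.
        return f"CAC TCP socket linger set error was {exc.strerror}"
    return None
```

The text after "was" is whatever the OS reports through errno, so the exact wording depends on why the OS rejected the call.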

I am going to go out on a limb and guess that the SR tune FPGA's embedded CA server, or IP stack, is abruptly disconnecting the circuit during the connect phase, and that the linger arguments I am using are not appropriate when a TCP circuit hasn't connected yet (that's possible).

So the first thing to check is that the SR tune FPGA has the right build options specifying its native byte order. If they were wrong, the CA server would detect bad protocol and immediately disconnect the circuit with its client. Could that be the root issue?
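
To show why a wrong byte-order build option would look like bad protocol, here is a small sketch. It assumes a 16-bit field sent in network (big-endian) order; the helper names and the value 1 are purely illustrative and not taken from the CA sources.

```python
import struct


def encode_u16(value, byte_order):
    """Encode a 16-bit unsigned field as a host with the given byte order would."""
    fmt = ">H" if byte_order == "big" else "<H"
    return struct.pack(fmt, value)


def decode_u16_network(data):
    """Decode a 16-bit unsigned field, assuming network (big-endian) order."""
    return struct.unpack(">H", data)[0]


# A correctly configured sender: the receiver sees the intended value.
correct = decode_u16_network(encode_u16(1, "big"))

# A sender built with the wrong byte order: the receiver sees a different value.
swapped = decode_u16_network(encode_u16(1, "little"))
```

Here `correct` is 1 while `swapped` is 256, so a peer validating the field would see an unexpected value.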

Also, it looks like there might be a benign bug in CAC: it shouldn't try to set the linger time to zero if the TCP circuit hasn't indicated that it connected yet. If it looks like that is occurring then I should create a mantis entry so I won't forget to fix the issue. It's not a huge issue, but nevertheless I should take care of it. The first step would be to increase our certainty that a problem exists.
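
A possible shape for that fix, sketched in Python under the assumption that the client tracks a per-circuit "connected" flag. The function and parameter names are hypothetical, and the struct layout again assumes a POSIX-style `struct linger`.

```python
import socket
import struct


def prepare_abort(sock, circuit_connected):
    """Set linger to zero before an abort shutdown, but only on a connected circuit.

    Returns True if the linger option was applied, False if it was skipped.
    """
    if not circuit_connected:
        # The circuit never finished connecting, so skip the option entirely.
        return False
    # POSIX layout: struct linger { int l_onoff; int l_linger; }
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    return True
```

The caller would still close the socket in both cases; only the setsockopt call, and therefore the chance of printing the message, is skipped for a circuit that never connected.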

Jeff

From: Janet Anderson
Sent: Tuesday, May 26, 2009 2:42 PM
To: Jeff Hill
Subject: [Fwd: medm error message]

Hi Jeff,

Do you know what would cause the following CAC error message? MEDM was built
with R3.14.9 base.

Janet

-------- Original Message --------
Subject: medm error message
Date: Tue, 26 May 2009 15:00:25 -0500

Janet,

I received the following message while testing the SR tune FPGA box
(BR:tune:...)

Please check.

--CY

MEDM Version 3.1.3: Loading aliased fonts.................
CAC TCP socket linger set error was Invalid argument

Original Mantis Bug: mantis-341
    http://www.aps.anl.gov/epics/mantis/view_bug_page.php?f_id=341

Tags: ca 3.14 cleanup
Jeff Hill (johill-lanl) wrote :

Thanks Jeff. I found out that there was an unscheduled power outage in the control room where some racks went down. IT is still working on rebooting some systems.

Jeff Hill (johill-lanl) wrote :

> One more thing to add about this: Various unusual things have been
> happening with our network today -- apparently someone turned off power
> to the wrong rack in the computer room and various services were yanked
> as a result, possibly including one or more network switches. I don't
> think it is worth investigating this issue any further until/unless it
> occurs again during a period when the network is stable (I don't really
> think it justifies a Mantis bug report at this stage).
>
> - Andrew

Jeff Hill (johill-lanl) wrote :

It looks like CA's use of linger options may have been wrong in this context. AFAICT the request to set the linger options was ignored, so nothing failed. Nevertheless, if I understood exactly what happened I might be able to fix it so that the message isn't printed.

Jeff Hill (johill-lanl) wrote :

From Eric,

The IOC in question has been working happily with MEDM for quite some time (over a year). This is the first time we've seen the "CAC TCP socket linger set error" messages. The IOC is an m68k architecture (Coldfire) -- the build options for the byte order are correct (big-endian).
Other weird things were happening with MEDM at this time -- a window that should include three Cartesian plot widgets came up but displayed only one of them -- the others weren't white, they were simply not present.
I'm not sure that any effort should be spent investigating this problem until it starts showing up more often.

Andrew Johnson (anj)
Changed in epics-base:
status: New → Incomplete
tags: added: cleanup
Changed in epics-base:
status: Incomplete → Invalid