DB CA link did not reconnect (is this same as #153 ????)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
EPICS Base |
Fix Released
|
High
|
Andrew Johnson |
Bug Description
From Pam:
Last night after Kay rebooted some LLRF IOCs, scl-hprf-ioc05 was unable to re-connect with some links that should have been okay – caget got the values fine, for example. This morning when I came in, it still hadn’t managed to re-connect. I looked around. The only difference I notice between an IOC that’s making connections and one that isn’t is the second “timerQueue” line: the delay is 1 when it’s working and a big number when it isn’t.
After a reboot, the IOC looks okay.
Pam.
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tExcTask excTask 1dfda20 0 PEND 1fbff8 1dfd900 3006b 0
snsUtilEven19fc1a8 1706cf8 4 PEND+T 1effbc 1706bd8 3d0002 12
syncTime 1a0ec48 1a0e568 5 DELAY 1f51fc 1a0e468 0 12701
tLogTask logTask 1dfb010 10 PEND 1fbff8 1dfaf00 0 0
tNetTask netTask 1cc9a48 50 PEND 1effbc 1cc9958 0 0
tPortmapd portmapd 1c68188 54 PEND 1effbc 1c67f28 3d0002 0
tTelnetd telnetd 1c6a358 55 PEND 1effbc 1c6a1e8 360003 0
NTPTimeSync1a82d78 1a09538 109 DELAY 1f51fc 1a093d8 3d0002 377
EIPxmtr5 1a82d78 1508388 109 PEND+T 1effbc 1507f78 3d0002 299
EIPxmtr7 1a82d78 1505218 109 DELAY 1f51fc 1505098 3d0002 1
EIPmod 1a82d78 15020a8 109 DELAY 1f51fc 1501f28 3d0002 1
cbHigh 1a82d78 1711cf8 128 PEND 1effbc 1711bd8 0 0
timerQueue 1a82d78 1725bd8 129 PEND+T 1effbc 1725648 3d0004 2146458376
scanOnce 1a82d78 14fdf40 129 PEND 1effbc 14fde30 0 0
scan0.1 1a82d78 14c81e0 133 DELAY 1f51fc 14c80a0 0 2
scan0.2 1a82d78 14cda80 134 DELAY 1f51fc 14cd940 0 6
cbMedium 1a82d78 1719448 135 PEND 1effbc 1719328 0 0
Oms58_motor1a82d78 16e5550 135 PEND+T 1effbc 16e52e0 3d0004 12880
scan0.5 1a82d78 14d3320 135 DELAY 1f51fc 14d31e0 0 17
scan1 1a82d78 14d8bc0 136 DELAY 1f51fc 14d8a80 0 22
scan2 1a82d78 14de460 137 DELAY 1f51fc 14de320 0 55
scan5 1a82d78 14e3d00 138 DELAY 1f51fc 14e3bc0 0 82
scan10 1a82d78 14e95a0 139 DELAY 1f51fc 14e9460 0 96
cbLow 1a82d78 1720b98 140 PEND 1effbc 1720a78 0 0
CAC-event 1a82d78 147bdb8 141 PEND 1effbc 147bca8 0 0
asCaTask 1a82d78 149b048 142 PEND 1effbc 149af28 0 0
CAC-UDP 1a82d78 1451530 147 PEND 1effbc 1450d60 0 0
timerQueue 1a82d78 1464d50 148 PEND+T 1effbc 14647c0 3d0004 2143265573
seqAux 1a82d78 132baf0 148 DELAY 1f51fc 132b550 0 138
CAC-event 1a82d78 12e50b8 148 PEND 1effbc 12e4fa8 0 0
CAC-event 1a82d78 fe9330 148 PEND 1effbc fe9220 0 0
dbCaLink 1a82d78 170c4f8 149 PEND 1effbc 170c3d8 0 0
B9K72.109 1a82d78 1705760 149 PEND+T 1fbff8 1705580 3d0002 7
poolPoll 1a82d78 1322f88 149 DELAY 1f51fc 1322e98 0 29
Tune 1a82d78 1310270 149 PEND+T 1effbc 13100b0 3d0002 2144567296
Tune 1a82d78 1309b20 149 PEND+T 1effbc 1309960 3d0002 2144567663
Tune 1a82d78 13033d0 149 PEND+T 1effbc 1303210 3d0002 2144567954
Tune 1a82d78 12fcc80 149 PEND+T 1effbc 12fcac0 3d0002 2144919026
Tune 1a82d78 12dc248 149 PEND+T 1effbc 12dc088 3d0002 2144568608
Tune 1a82d78 126a8d0 149 PEND+T 1effbc 126a710 3d0002 2144568820
Additional information:
IOCs involved were running R3.14.7.
This has been observed to occur only once
Original Mantis Bug: mantis-206
http://
I have two questions for you.
1) What version of EPICS runs on scl-hprf-ioc05?
2) Is this is a side effect of Kay’s IOCs hanging up during the shutdown part of a soft reboot? If that was occurring then we certainly do understand why clients would not reconnect?
If *all* of the following are true the client might not reconnect on its own.
1) client side is running R3.14.4 (possibly R3.14.5) or earlier
2) the server was unavailable for longer than 10 – 30 min (depending on EPICS version and IP kernel version)
3) the server isn’t configured so the client sees the server’s beacon, or the router does not forward the server’s beacon to the client (the client sees no changes in the frequency of server beacons)
Also, looking in mantis there was a bug fixed in the R3.14.2 – R3.14.4 (inclusive) CA repeater that might cause a problem like you have described.
So if your client side (scl-hprf-ioc05) was running R3.14.6 or later and Kay’s IOC was not hung when this was supposed to reconnect then it sounds like I will need to investigate further. Let me know.