RSRV crash on disconnect

Bug #1707931 reported by mdavidsaver
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
EPICS Base
Status tracked in 7.0
3.14
Fix Released
Medium
mdavidsaver
3.15
Fix Released
High
mdavidsaver
3.16
Fix Released
High
mdavidsaver
7.0
Fix Released
High
mdavidsaver

Bug Description

Freddie Akeroyd reports IOC crashes when on CA client disconnect.

http://www.aps.anl.gov/epics/tech-talk/2017/msg01206.php

The first symptom observed, an assertion failure, is the same as lp:541330.

> assert(size <= ntohs ( pMsg->m_postsize ))

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Subsequent investigation identified two issues:

1. lack of locking around send_err() in camessage() leads to a crash.
2. signed -> unsigned cast of return from recv() leads to re-processing old receive buffer contents.

The first issue effects all targets. The second is WIN32 specific.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Issue #2 was introduced by 685e6b093730b5f3bd590409e27a4e87b1c03111 between 3.15.0.1 and 3.15.0.2. The 3.14 series uses 'int' on all targets, and thus doesn't trigger send_err().

Issue #1 is is present in the 3.14 series. The first unguarded call to send_err() was present in 3.14.0, the second was added prior to 3.14.7.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Issue #1 fixed on 3.14 and 3.15 with 619a99bf997984fa067a33551a86b1ea8282bd15 and 3.16 an additional send_err() call by 0205dcc61f5b48d97bf0b31052b110f9e2c4a702

Issue #2 is "fixed" on 3.14 by 5c8e5c52ef80ee02e59be2bd9a324532d2b7fe0e which takes effect when merged into 3.15 by 111cac8e47f78a067bbcb35e50f7b01fe2c818ab

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

The signature of issue #2 will likely be

> CAS: Missaligned protocol rejected

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

There was further discussion on the core-talk list

http://www.aps.anl.gov/epics/core-talk/2017/msg00466.php

Revision history for this message
Andrew Johnson (anj) wrote :

It surprises me that the WIN32 compiler wasn't warning about the (nchar < 0) test in camsgtask.c — there were no warnings for this file in the Windows Jenkins builds for either 3.15 or 3.16. Not that anyone is really looking at those kinds of warnings from Windows...

Thanks for fixing these.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

I check, "gcc -Wall" doesn't catch this either (at least gcc 6.3). It needs '-Wtype-limits' from '-Wextra'.

I also see that there are windows installers for cppcheck, which also detects this kind of error.

http://cppcheck.sourceforge.net/

So lots of ways a motivated and patient person can find these sort of issues.

Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

Like Andrew, I was surprised a warning did not show up. I increased the warning flag from -W3 to -W4 but still no sign of it, it seems it is part of "warnings that are off by default" https://msdn.microsoft.com/en-us/library/23k5d385.aspx and you need to specify -Wall to see it.

  ../../../src/ioc/rsrv/camsgtask.c(95): warning C4296: '<': expression is always false

I will try and find time at some point to cppcheck and/or -Wall a base build

Revision history for this message
Freddie Akeroyd (freddie-akeroyd) wrote :

and thanks Michael for fixing the problem so quickly

Andrew Johnson (anj)
Changed in epics-base:
milestone: 3.16.branch → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.