Segfaults in ipToAsciiProxy

Bug #1645301 reported by Ralph Lange
This bug affects 1 person
Affects       Status         Importance   Assigned to   Milestone
EPICS Base    Fix Released   Medium       mdavidsaver
3.14          Fix Released   Medium       mdavidsaver
3.15          Fix Released   Medium       mdavidsaver
3.16          Fix Released   Medium       mdavidsaver

Bug Description

We see CA commandline tools segfaulting inside ipToAsciiProxy() under as yet unknown conditions.

Nov 28 10:14:12 4504DS-SRV-0003 kernel: ipToAsciiProxy[2388]: segfault at 0 ip 00007ffb833d5a90 sp 00007ffb8272ec00 error 6 in libca.so.3.15.5[7ffb833ac000+5f000]
Nov 28 10:14:12 4504DS-SRV-0003 abrt[2396]: Not saving repeating crash in '/opt/codac-5.4/epics/bin/linux-x86_64/caput'
Nov 28 10:14:14 4504DS-SRV-0003 kernel: ipToAsciiProxy[2571]: segfault at 0 ip 00007fe3d0cf6a90 sp 00007fe3d004fc00 error 6 in libca.so.3.15.5[7fe3d0ccd000+5f000]
Nov 28 10:14:14 4504DS-SRV-0003 abrt[2579]: Not saving repeating crash in '/opt/codac-5.4/epics/bin/linux-x86_64/caput'

RHEL 6.5, EPICS 3.15.5-rc1

mdavidsaver (mdavidsaver) wrote : Re: [Bug 1645301] [NEW] Segfaults in ipToAsciiProxy

Can you use addr2line to translate the IP or libca.so offset to a source
line number?

On 11/28/2016 07:07 AM, Ralph Lange wrote:
> Public bug reported:
>
> We see CA commandline tools segfaulting inside ipToAsciiProxy() under
> as yet unknown conditions.
>
> Nov 28 10:14:12 4504DS-SRV-0003 kernel: ipToAsciiProxy[2388]: segfault at 0 ip 00007ffb833d5a90 sp 00007ffb8272ec00 error 6 in libca.so.3.15.5[7ffb833ac000+5f000]
> Nov 28 10:14:12 4504DS-SRV-0003 abrt[2396]: Not saving repeating crash in '/opt/codac-5.4/epics/bin/linux-x86_64/caput'
> Nov 28 10:14:14 4504DS-SRV-0003 kernel: ipToAsciiProxy[2571]: segfault at 0 ip 00007fe3d0cf6a90 sp 00007fe3d004fc00 error 6 in libca.so.3.15.5[7fe3d0ccd000+5f000]
> Nov 28 10:14:14 4504DS-SRV-0003 abrt[2579]: Not saving repeating crash in '/opt/codac-5.4/epics/bin/linux-x86_64/caput'
>
> RHEL 6.5, EPICS 3.15.5-rc1
>
> ** Affects: epics-base
> Importance: Undecided
> Status: New
>
> ** Affects: epics-base/3.15
> Importance: Undecided
> Status: New
>
> ** Affects: epics-base/3.16
> Importance: Undecided
> Status: New
>
> ** Also affects: epics-base/3.15
> Importance: Undecided
> Status: New
>
> ** Also affects: epics-base/3.16
> Importance: Undecided
> Status: New
>
> ** Changed in: epics-base/3.15
> Milestone: None => 3.15.5
>

Ralph Lange (ralph-lange) wrote :

Possibly this is connected to a

CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"

but that needs to be verified.

At this moment, it smells like a race condition between shut-down of a short-lived client and a DNS resolution going on (or starting) at the same moment.
I can see "segfault at 0" (null pointer) and "segfault at 8" (accessing a member of a structure referenced by a null pointer?).
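For reading the "error 6" field in the kernel lines above, here is a minimal sketch that decodes the value, assuming the standard x86 page-fault error code bit layout. The helper name decode_pf_error is made up for illustration.

```python
def decode_pf_error(code):
    """Decode an x86 page-fault error code as printed in kernel segfault lines."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation on a present page
        "protection_violation": bool(code & 0x1),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x2),
        # bit 2: 0 = kernel mode, 1 = user mode
        "user_mode": bool(code & 0x4),
        # bit 3: reserved bit set in a page-table entry
        "reserved_bit": bool(code & 0x8),
        # bit 4: fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


flags = decode_pf_error(6)
print(flags)
```

Under that assumption, error 6 is a user-mode write to a page that is not mapped. Together with "segfault at 0" this reads as a store through a null pointer, and "segfault at 8" would fit a store to a member at offset 8 of a null structure pointer.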

Ralph Lange (ralph-lange) wrote :

No debug information available for the related binaries and libraries.

I also don't seem to be able to get hold of a core file. Yet.

mdavidsaver (mdavidsaver) wrote :

Even without debug symbols, addr2line should still give a function name, which would be helpful in narrowing this down. I agree that this seems like another side-effect of not joining the resolver thread.
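As a general illustration of what "joining the resolver thread" means, here is a small Python sketch. It is not EPICS code, and the Resolver class and all of its names are hypothetical. The owner tells the worker to stop and waits for it to finish before the shared state is torn down.

```python
import queue
import threading


class Resolver:
    """Toy asynchronous resolver with a background worker thread."""

    def __init__(self):
        self._requests = queue.Queue()
        self.results = {}
        self._thread = threading.Thread(target=self._run, name="resolver")
        self._thread.start()

    def _run(self):
        while True:
            addr = self._requests.get()
            if addr is None:
                # Sentinel value: the owner asked the worker to stop.
                return
            # Stand-in for a slow reverse DNS lookup.
            self.results[addr] = "host-for-" + addr

    def lookup(self, addr):
        self._requests.put(addr)

    def close(self):
        # Ask the worker to stop, then wait until it has actually exited,
        # so nothing it touches is released while it is still running.
        self._requests.put(None)
        self._thread.join()


resolver = Resolver()
resolver.lookup("192.0.2.1")
resolver.close()
print(resolver.results)
```

Without the join in close(), a short-lived client could exit while the worker is still in the middle of a lookup. In C++ that would leave the worker touching objects that have already been destroyed.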

mdavidsaver (mdavidsaver) wrote :

To state the (possibly) obvious: "ipToAsciiProxy" is the name of the async DNS resolver thread. The fact that the IP (instruction pointer) is in libca suggests that the bug is in hostNameCache or msgForMultiplyDefinedPV, which use the ipAddrToAscii facility.

Ralph Lange (ralph-lange) wrote :

$ addr2line -e /opt/codac/epics/bin/linux-x86_64/caput 00007ffb833d5a90 00007fe3d0cf6a90
??:0
??:0
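The ??:0 output is probably expected here. The addresses given are absolute runtime addresses inside the mapped libca.so, while addr2line looks them up relative to the file named with -e, which in this command is the caput binary. Here is a minimal sketch that recovers the library-relative offset as instruction pointer minus mapping base, assuming the kernel log format shown in the report. The helper name fault_offset is made up for illustration.

```python
import re

# Matches "segfault at <addr> ip <ip> ... in <lib>[<base>+<size>]".
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library name, offset of ip within the mapping, mapping size)."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return m.group("lib"), ip - base, size


log_lines = [
    "kernel: ipToAsciiProxy[2388]: segfault at 0 ip 00007ffb833d5a90 "
    "sp 00007ffb8272ec00 error 6 in libca.so.3.15.5[7ffb833ac000+5f000]",
    "kernel: ipToAsciiProxy[2571]: segfault at 0 ip 00007fe3d0cf6a90 "
    "sp 00007fe3d004fc00 error 6 in libca.so.3.15.5[7fe3d0ccd000+5f000]",
]

for line in log_lines:
    lib, offset, size = fault_offset(line)
    print(lib, hex(offset), hex(size))
```

For both log lines this gives offset 0x29a90 into libca.so.3.15.5. In this log format the value after the + is the size of the mapping, not the faulting offset. The offset, not the absolute address, is what one would normally pass to addr2line, together with -f for function names and -e pointing at the libca.so file itself. That assumes the library's first loadable segment starts at virtual address 0, which is typical for shared objects but not verified here.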

Ralph Lange (ralph-lange) wrote :

Since I have only seen these dumps with multiply defined PVs, I suspect the latter.

mdavidsaver (mdavidsaver) wrote :

Well that's discouraging. For 3.15.3 and 3.16 head I find that the text of libca.so with linux-x86_64 ends at 0x4f84c, so offset 0x5f000 is jumping way past the end. So this is probably a virtual call on a delete'd object.

Perhaps coincidentally, I also find that, for me, the code for msgForMultiplyDefinedPV (except for _fini) is the last in the text segment.
