Segfault on server exit with a GSSAPI server

Bug #1201939 reported by Vincent Giersch
This bug affects 2 people
Affects: Project Moonshot
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

I hit a segfault when exiting a program that has instantiated at least one
GSSAPI server:

==22893== Invalid read of size 4
==22893== at 0x7E91818: shibresolver::ShibbolethResolver::term() (in /usr/lib/libshibresolver.so.1.0.0)
==22893== by 0x40D250E: __run_exit_handlers (exit.c:78)
==22893== by 0x40D257E: exit (exit.c:100)
==22893== by 0x81773C8: Py_Exit (pythonrun.c:1774)
==22893== by 0x8175967: handle_system_exit (pythonrun.c:1146)
==22893== by 0x8175985: PyErr_PrintEx (pythonrun.c:1156)
==22893== by 0x8175638: PyErr_Print (pythonrun.c:1059)
==22893== by 0x8174FE8: PyRun_SimpleFileExFlags (pythonrun.c:947)
==22893== by 0x817473B: PyRun_AnyFileExFlags (pythonrun.c:747)
==22893== by 0x818CFA9: Py_Main (main.c:639)
==22893== by 0x805D2E6: main (python.c:23)
==22893== Address 0x49c5de0 is 0 bytes inside a block of size 28 free'd
==22893== at 0x402719C: operator delete(void*) (vg_replace_malloc.c:457)
==22893== by 0x9533032: xmltooling::MutexImpl::~MutexImpl() (in /usr/lib/libxmltooling.so.6.0.1)
==22893== by 0x40D257E: exit (exit.c:100)
==22893== by 0x81773C8: Py_Exit (pythonrun.c:1774)
==22893== by 0x8175967: handle_system_exit (pythonrun.c:1146)
==22893== by 0x8175985: PyErr_PrintEx (pythonrun.c:1156)
==22893== by 0x8175638: PyErr_Print (pythonrun.c:1059)
==22893== by 0x8174FE8: PyRun_SimpleFileExFlags (pythonrun.c:947)
==22893== by 0x817473B: PyRun_AnyFileExFlags (pythonrun.c:747)
==22893== by 0x818CFA9: Py_Main (main.c:639)
==22893== by 0x805D2E6: main (python.c:23)
==22893==
pure virtual method called
terminate called without an active exception

Distrib: Live DVD R4

Cheers,
Vincent

PS: This is a re-post from the moonshot-community@ mailing list.

Sam Hartman (hartmans) wrote : Re: [Bug 1201939] [NEW] Segfault on server exit with a GSSAPI server

Hi.
I think I mentioned this on the list.
I cannot reproduce this.

Unfortunately, as Scott pointed out, the code we're seeing is after
things are already broken.
There's a double destruct of a mutex.
We're only going to see what's going on by stepping through the
close-down code.

So, for example when I run gss-server I don't see this...
Hmm, although gss-server doesn't call exit.
OK, I'll look into calling exit and seeing what I get.
Actually, trustrouter probably gives a way to try this.

Vincent Giersch (vincent-giersch) wrote :

Hi,

Reproduced easily here: https://github.com/gierschv/krb5/commit/31195f4a2f4e64789041141dbdaf3672ee0cf319
If you ^C once the first authz is done, you get this segfault.

Cheers,
Vincent

Vincent Giersch (vincent-giersch) wrote :

FYI, here is the trace:

localname: moonshot
Accepted connection: ""
Received message: "test"
NOOP token
^C==14563== Invalid read of size 4
==14563== at 0x47DC818: shibresolver::ShibbolethResolver::term() (in /usr/lib/libshibresolver.so.1.0.0)
==14563== by 0x41D250E: __run_exit_handlers (exit.c:78)
==14563== by 0x41D257E: exit (exit.c:100)
==14563== by 0x804A04D: sigint_handler (gss-server.c:655)
==14563== by 0x41CD9D7: ??? (in /lib/i386-linux-gnu/i686/cmov/libc-2.13.so)
==14563== by 0x41B9E45: (below main) (libc-start.c:228)
==14563== Address 0x436e628 is 0 bytes inside a block of size 28 free'd
==14563== at 0x402719C: operator delete(void*) (vg_replace_malloc.c:457)
==14563== by 0x52C3032: xmltooling::MutexImpl::~MutexImpl() (in /usr/lib/libxmltooling.so.6.0.1)
==14563== by 0x41D257E: exit (exit.c:100)
==14563== by 0x804A04D: sigint_handler (gss-server.c:655)
==14563== by 0x41CD9D7: ??? (in /lib/i386-linux-gnu/i686/cmov/libc-2.13.so)
==14563== by 0x41B9E45: (below main) (libc-start.c:228)
==14563==
pure virtual method called
terminate called without an active exception

Sam Hartman (hartmans) wrote : Re: [Bug 1201939] Re: Segfault on server exit with a GSSAPI server

So, as best I can tell the C runtime is not doing the right thing:

==3766== Invalid read of size 4
==3766== at 0x47DD818: shibresolver::ShibbolethResolver::term() (Threads.h:297)
==3766== by 0x41D850E: __run_exit_handlers (exit.c:78)
==3766== by 0x41D857E: exit (exit.c:100)
==3766== by 0x804970B: ??? (in /usr/bin/gss-server)
==3766== Address 0x436fc40 is 0 bytes inside a block of size 28 free'd
==3766== at 0x402719C: operator delete(void*) (vg_replace_malloc.c:457)
==3766== by 0x52C3202: xmltooling::MutexImpl::~MutexImpl() (PThreads.cpp:75)
==3766== by 0x41D857E: exit (exit.c:100)
==3766== by 0x804970B: ??? (in /usr/bin/gss-server)

In particular, the global lock used by the resolver is being destructed
and deleted prior to the call to term.
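
For readers following along, here is a minimal sketch of the general C++ exit-ordering rule that can produce this kind of symptom: handlers registered with std::atexit and destructors of static objects run in reverse order of registration or construction. It is an illustration only, not a diagnosis of what mech_eap or libshibresolver actually do. Lock, term, and exit_order are hypothetical names, and the fork/pipe harness (POSIX assumed) exists only so that a parent process can observe what the child does during exit().

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

namespace {

int g_fd = -1;              // pipe write end; set only in the forked child
bool g_lock_alive = false;  // plain flag, so it stays readable after ~Lock()

void emit(const char* s) {
    ssize_t ignored = write(g_fd, s, std::strlen(s));
    (void)ignored;
}

// Hypothetical stand-in for a global lock with static storage duration.
struct Lock {
    Lock() { g_lock_alive = true; }
    ~Lock() { g_lock_alive = false; emit("lock-destroyed;"); }
};

// Hypothetical stand-in for a termination hook that needs the lock.
void term() { emit(g_lock_alive ? "term-lock-alive;" : "term-lock-gone;"); }

}  // namespace

// Forks a child that registers term() either before or after the static
// Lock is constructed, calls exit(), and returns the events the child logged.
std::string exit_order(bool register_handler_first) {
    int fds[2];
    if (pipe(fds) != 0) return "pipe-failed";
    std::fflush(nullptr);  // avoid duplicating buffered output in the child
    pid_t pid = fork();
    if (pid < 0) return "fork-failed";
    if (pid == 0) {
        close(fds[0]);
        g_fd = fds[1];
        if (register_handler_first) {
            std::atexit(term);
            static Lock lock;  // constructed after the handler: destroyed first
        } else {
            static Lock lock;  // constructed before the handler: destroyed last
            std::atexit(term);
        }
        std::exit(0);  // runs handlers and static destructors in reverse order
    }
    close(fds[1]);
    std::string out;
    char buf[128];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return out;
}
```

With the handler registered first, the sketch reports `lock-destroyed;term-lock-gone;`, which mirrors the order seen in the traces: the lock is gone by the time the termination hook runs. Registering the handler after the lock exists gives `term-lock-alive;lock-destroyed;`.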

Scott, how do I move forward on debugging this?
Based on net searches this doesn't appear to be a common problem.
Also, waiting for a libc fix is unlikely to be desirable.

--Sam

Changed in moonshot:
status: New → Confirmed
importance: Undecided → Medium
Sam Hartman (hartmans) wrote :

Hi. I'm sort of surprised that there's not more information here. There was a long thread on the community list. I think the summary is that we're going to move to C++ destructors rather than finalizers.
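
As a rough sketch of why tying cleanup to a C++ destructor can help, assume the teardown is owned by a static object whose constructor touches the lock it depends on. Static objects are destroyed in reverse order of construction completion, so such a guard is torn down before the lock. This is not the code from the mech_eap commit; GlobalLock, LibraryGuard, and guarded_exit_order are hypothetical names, and the fork/pipe harness (POSIX assumed) is only there to observe exit-time behaviour from a parent process.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

namespace {

int g_out = -1;  // pipe write end; set only in the forked child

void log_event(const char* s) {
    ssize_t ignored = write(g_out, s, std::strlen(s));
    (void)ignored;
}

// Hypothetical dependency with static storage, e.g. a global mutex.
struct GlobalLock {
    ~GlobalLock() { log_event("lock-destroyed;"); }
};

GlobalLock& global_lock() {
    static GlobalLock lock;  // constructed on first use
    return lock;
}

// Hypothetical guard: initialise in the constructor, terminate in the
// destructor. Touching global_lock() first means the lock finishes
// construction before the guard does, so it outlives the guard.
struct LibraryGuard {
    LibraryGuard() { global_lock(); log_event("init;"); }
    ~LibraryGuard() { log_event("term;"); }
};

}  // namespace

// Forks a child that creates the guard and calls exit(); returns the
// sequence of events the child logged.
std::string guarded_exit_order() {
    int fds[2];
    if (pipe(fds) != 0) return "pipe-failed";
    std::fflush(nullptr);  // avoid duplicating buffered output in the child
    pid_t pid = fork();
    if (pid < 0) return "fork-failed";
    if (pid == 0) {
        close(fds[0]);
        g_out = fds[1];
        static LibraryGuard guard;  // destroyed before the lock at exit
        std::exit(0);
    }
    close(fds[1]);
    std::string out;
    char buf[128];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return out;
}
```

Here the termination step runs while the lock still exists, because the ordering follows from construction order rather than from when a separate finalizer happened to be registered.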

Sam Hartman (hartmans) wrote :

I believe fdc24aea6397e4c8b91a8da322b81936989dedb5 in mech_eap will address this.

Changed in moonshot:
status: Confirmed → Fix Committed
Mark Donnelly (meadmaker) wrote :

Confirmed that fdc24aea6397e4c8b91a8da322b81936989dedb5 exists in the most recent version of the Moonshot libraries.

Changed in moonshot:
status: Fix Committed → Fix Released