crashes with glibc-2.14/2.15 on dlopen (seen with kvm and gnucash)

Bug #893605 reported by Matthias Klose on 2011-11-22
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
GLibC
Fix Released
Medium
eglibc (Ubuntu)
High
Unassigned
glibc (Fedora)
Fix Released
Undecided

Bug Description

seen with glibc-2.14/glibc-2.15:

kvm -cdrom <iso>

Program received signal SIGSEGV, Segmentation fault.
0xb7fe7740 in ?? () from /lib/ld-linux.so.2
(gdb) bt
#0 0xb7fe7740 in ?? () from /lib/ld-linux.so.2
#1 0xb7fe7eb9 in ?? () from /lib/ld-linux.so.2
#2 0xb7a26490 in do_sym (handle=0xb7d86860,
    name=0xb7c7ff4f "XAllocClassHint", who=<optimized out>, vers=0x0, flags=2)
    at dl-sym.c:178
#3 0xb7a26927 in _dl_sym (handle=<optimized out>, name=<optimized out>,
    who=<optimized out>) at dl-sym.c:283
#4 0xb778cd67 in dlsym_doit (a=0xbfffeef0) at dlsym.c:51
#5 0xb7feccaf in ?? () from /lib/ld-linux.so.2
#6 0xb778d33a in _dlerror_run (operate=0xb778cd40 <dlsym_doit>,
    args=0xbfffeef0) at dlerror.c:164
#7 0xb778cde4 in __dlsym (handle=0xb7d86860,
    name=0xb7c7ff4f "XAllocClassHint") at dlsym.c:71
#8 0xb7c56b5a in SDL_LoadFunction () from /usr/lib/libSDL-1.2.so.0
#9 0xb7c58511 in ?? () from /usr/lib/libSDL-1.2.so.0
#10 0xb7c5a8aa in ?? () from /usr/lib/libSDL-1.2.so.0
#11 0xb7c61825 in ?? () from /usr/lib/libSDL-1.2.so.0
#12 0xb7c5155a in SDL_VideoInit () from /usr/lib/libSDL-1.2.so.0
#13 0xb7c25c7a in SDL_InitSubSystem () from /usr/lib/libSDL-1.2.so.0
#14 0xb7c25cfb in SDL_Init () from /usr/lib/libSDL-1.2.so.0
#15 0x00202967 in ?? ()
---Type <return> to continue, or q <return> to quit---
#16 0x0013cfdc in main ()

gnucash:

Program received signal SIGSEGV, Segmentation fault.
0x00119740 in ?? () from /lib/ld-linux.so.2
(gdb) bt
#0 0x00119740 in ?? () from /lib/ld-linux.so.2
#1 0x00119eb9 in ?? () from /lib/ld-linux.so.2
#2 0x00c0a490 in do_sym (handle=0xb7ffd000,
    name=0x10eeec4 "g_module_check_init", who=<optimized out>, vers=0x0,
    flags=2) at dl-sym.c:178
#3 0x00c0a927 in _dl_sym (handle=<optimized out>, name=<optimized out>,
    who=<optimized out>) at dl-sym.c:283
#4 0x03195d67 in dlsym_doit (a=0xbfffedc0) at dlsym.c:51
#5 0x0011ecaf in ?? () from /lib/ld-linux.so.2
#6 0x0319633a in _dlerror_run (operate=0x3195d40 <dlsym_doit>,
    args=0xbfffedc0) at dlerror.c:164
#7 0x03195de4 in __dlsym (handle=0xb7ffd000,
    name=0x10eeec4 "g_module_check_init") at dlsym.c:71
#8 0x010ee065 in g_module_symbol ()
   from /usr/lib/i386-linux-gnu/libgmodule-2.0.so.0
#9 0x010ee54f in g_module_open ()
   from /usr/lib/i386-linux-gnu/libgmodule-2.0.so.0
#10 0x003ff61e in ?? () from /usr/lib/gnucash/libgnc-module.so.0
#11 0x003ff90b in gnc_module_load () from /usr/lib/gnucash/libgnc-module.so.0
#12 0x0804ca5f in _start ()

Created attachment 482361
Stack trace of the rpmbuild call that SIGSEGVs during $(fedpkg local)

See attached stack trace.

*** Bug 682415 has been marked as a duplicate of this bug. ***

*** Bug 682431 has been marked as a duplicate of this bug. ***

*** Bug 682398 has been marked as a duplicate of this bug. ***

*** Bug 682472 has been marked as a duplicate of this bug. ***

The two bugs that I have duped are not compilation crashes but are actually instances where the do_lookup_x issue has caused crashes in two other applications.

*** Bug 682406 has been marked as a duplicate of this bug. ***

*** Bug 682759 has been marked as a duplicate of this bug. ***

glibc-2.13.90-6 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/glibc-2.13.90-6

*** Bug 683009 has been marked as a duplicate of this bug. ***

*** Bug 682566 has been marked as a duplicate of this bug. ***

glibc-2.13.90-6 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.

*** Bug 683116 has been marked as a duplicate of this bug. ***

Matthias Klose (doko) wrote :

seen on Fedora in March. Apparently the corresponding patch is applied in trunk.
http://sourceware.org/git/?p=glibc.git;a=commit;h=6a5ee1029b3966c5ae9adaaa881e255b2880f511

Changed in eglibc (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Matthias Klose (doko) wrote :

test packages in the ubuntu-toolchain-r/glibc ppa

Download full text (3.5 KiB)

I've reproduced the gnucash crash.

The actual crash stack trace is (using 2.15~pre6-0ubuntu2):

#0 do_lookup_x (new_hash=2334765441, old_hash=<optimized out>, result=0x7fffffffdb60, scope=<optimized out>, i=0, flags=2, skip=0x0, undef_map=0x7ffff7ff8000) at dl-lookup.c:98
#1 0x00007ffff7de44e3 in _dl_lookup_symbol_x (undef_name=0x7ffff272e0c0 "g_module_check_init", undef_map=0x7ffff7ff8000, ref=0x7fffffffdc80, symbol_scope=0x7ffff7ff8388, version=0x0, type_class=0, flags=2, skip_map=0x0) at dl-lookup.c:739
#2 0x00007ffff57a041a in do_sym (handle=<optimized out>, name=0x7ffff272e0c0 "g_module_check_init", who=<optimized out>, vers=<optimized out>, flags=2) at dl-sym.c:178
#3 0x00007fffeace7044 in dlsym_doit (a=0x7fffffffde50) at dlsym.c:51
#4 0x00007ffff7de90f6 in _dl_catch_error (objname=0x611410, errstring=0x611418, mallocedp=0x611408, operate=0x7fffeace7030 <dlsym_doit>, args=0x7fffffffde50) at dl-error.c:178
#5 0x00007fffeace752f in _dlerror_run (operate=0x7fffeace7030 <dlsym_doit>, args=0x7fffffffde50) at dlerror.c:164
#6 0x00007fffeace709a in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:71
#7 0x00007ffff272d3f0 in g_module_symbol () from /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0
#8 0x00007ffff272d8aa in g_module_open () from /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0
#9 0x00007ffff70de17e in ?? () from /usr/lib/gnucash/libgnc-module.so.0
#10 0x00007ffff70de468 in gnc_module_load () from /usr/lib/gnucash/libgnc-module.so.0
#11 0x0000000000405cd5 in _start ()

The problem appears to be that the map->l_local_scope in frame #2
is corrupt:

(gdb) p map.l_name
$16 = 0x7ffff7ffafa8 "/usr/lib/gnucash/gnucash/libgncmod-app-utils.so"
(gdb) p map.l_local_scope
$17 = {0x7ffff7ff82b8, 0x0}
(gdb) p map.l_local_scope[0]
$18 = (struct r_scope_elem *) 0x7ffff7ff82b8
(gdb) p map.l_local_scope[0][0]
$19 = {r_list = 0x63bb48, r_nlist = 56}
(gdb) p map.l_local_scope[0][0].r_list[0]
$20 = (struct link_map *) 0x51

Back in frame #0:

(gdb) x/i $pc
=> 0x7ffff7de3b32 <do_lookup_x+146>: mov 0x28(%rax),%rsi
(gdb) p/x $rax
$21 = 0x51

The value 0x51 is written into 0x63bb48 here:
Hardware watchpoint 1: *(int**)0x0063bb48

Old value = (int *) 0x301
New value = (int *) 0x51
_int_malloc (av=0x7ffff5a27720, bytes=64) at malloc.c:3586
3586 in malloc.c
(gdb) bt
#0 _int_malloc (av=0x7ffff5a27720, bytes=64) at malloc.c:3586
#1 0x00007ffff56f3495 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3274
#2 0x00007ffff5c983e1 in g_malloc0 () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff2e74bd6 in ?? () from /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#4 0x00007ffff2e790b2 in g_type_register_static () from /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#5 0x00007ffff2e5bd7b in g_flags_register_static () from /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#6 0x00007ffff3ecadf3 in gnome_date_edit_flags_get_type () from /usr/lib/libgnomeui-2.so.0
#7 0x00007ffff3eff365 in gnome_type_init () from /usr/lib/libgnomeui-2.so.0
#8 0x00007ffff3ed84eb in ?? () from /usr/lib/libgnomeui-2.so.0
#9 0x00007ffff678683e in gnome_program_preinit () from /usr/lib/libgnome-2.so.0
#10 0x00007ffff678751e in...

Read more...

Valgrind confirms:

==11099== Invalid read of size 8
==11099== at 0x4009B1A: do_lookup_x (dl-lookup.c:98)
==11099== by 0x400A4E2: _dl_lookup_symbol_x (dl-lookup.c:739)
==11099== by 0x730D419: do_sym (dl-sym.c:178)
==11099== by 0x11D23043: dlsym_doit (dlsym.c:51)
==11099== by 0x400F0F5: _dl_catch_error (dl-error.c:178)
==11099== by 0x11D2352E: _dlerror_run (dlerror.c:164)
==11099== by 0x11D23099: dlsym (dlsym.c:71)
==11099== by 0xA2DD3EF: g_module_symbol (gmodule-dl.c:147)
==11099== by 0xA2DD8A9: g_module_open (gmodule.c:630)
==11099== by 0x592C17D: gnc_module_load_common (gnc-module.c:501)
==11099== by 0x592C467: gnc_module_load (gnc-module.c:552)
==11099== by 0x405CD4: load_gnucash_modules (gnucash-bin.c:595)
==11099== Address 0x194e2a28 is 456 bytes inside a block of size 904 free'd
==11099== at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11099== by 0x4012871: _dl_scope_free (dl-scope.c:32)
==11099== by 0x40143D5: _dl_close_worker (dl-close.c:130)
==11099== by 0x4014FBD: _dl_close (dl-close.c:779)
==11099== by 0x400F0F5: _dl_catch_error (dl-error.c:178)
==11099== by 0x11D2352E: _dlerror_run (dlerror.c:164)
==11099== by 0x11D2300E: dlclose (dlclose.c:48)
==11099== by 0xA2DD299: g_module_close (gmodule-dl.c:134)
==11099== by 0x592BC0D: gnc_module_get_info (gnc-module.c:329)
==11099== by 0x592B812: gnc_module_system_refresh (gnc-module.c:190)
==11099== by 0x592B72C: gnc_module_system_init (gnc-module.c:137)
==11099== by 0x406391: main (gnucash-bin.c:851)

The problem appears to have been introduced here:

4bff6e01 (Andreas Schwab 2011-02-25 20:49:48 -0500 127) {
4bff6e01 (Andreas Schwab 2011-02-25 20:49:48 -0500 128) struct link_map **oldp = map->l_initfini;
4bff6e01 (Andreas Schwab 2011-02-25 20:49:48 -0500 129) map->l_initfini = map->l_orig_initfini;
4bff6e01 (Andreas Schwab 2011-02-25 20:49:48 -0500 130) _dl_scope_free (oldp);
4bff6e01 (Andreas Schwab 2011-02-25 20:49:48 -0500 131) }

@@ -119,8 +119,17 @@ _dl_close_worker (struct link_map *map)
   if (map->l_direct_opencount > 0 || map->l_type != lt_loaded
       || dl_close_state != not_pending)
     {
- if (map->l_direct_opencount == 0 && map->l_type == lt_loaded)
- dl_close_state = rerun;
+ if (map->l_direct_opencount == 0)
+ {
+ if (map->l_type == lt_loaded)
+ dl_close_state = rerun;
+ else if (map->l_type == lt_library)
+ {
+ struct link_map **oldp = map->l_initfini;
+ map->l_initfini = map->l_orig_initfini;
+ _dl_scope_free (oldp);
+ }
+ }

The libraries that are loaded as direct dependencies of a.out have
map->l_type == lt_library.

This shows up as a crash in gnucash with glibc-2.15 on Precise Pangolin.
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/893605

Confirmed present in current glibc git trunk.

Test:

/// --- cut --- foo.c ---
int foo () { return bar (); }

/// --- cut --- bar.c ---
int bar () { return 42; }

/// --- cut --- t.c ---
#include <stdio.h>
#include <dlfcn.h>

int main ()
{
  void *h = dlopen ("./foo.so", RTLD_LAZY|RTLD_GLOBAL);
  void *p = dlsym (h, "bar");

  printf ("h = %p, p = %p\n", h, p);

  dlclose (h);

  h = dlopen ("./foo.so", RTLD_LAZY|RTLD_GLOBAL);
  p = dlsym (h, "bar");
  printf ("h = %p, p = %p\n", h, p);

  return 0;
}

gcc -fPIC -shared -o bar.so bar.c &&
gcc -fPIC -shared -o foo.so foo.c ./bar.so &&
gcc t.c ./foo.so ./bar.so -ldl

valgrind ./a.out # no errors with glibc-2.11

==16605== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==16605== Using Valgrind-3.8.0.SVN and LibVEX; rerun with -h for copyright info
==16605== Command: ./a.out
==16605==
h = 0x4023b78, p = 0x503759c
==16605== Invalid read of size 8
==16605== at 0x40093F6: do_lookup_x (/tmp/glibc-git/elf/dl-lookup.c:98)
==16605== by 0x4009E4A: _dl_lookup_symbol_x (/tmp/glibc-git/elf/dl-lookup.c:739)
==16605== by 0x5551305: do_sym (/tmp/glibc-git/elf/dl-sym.c:178)
==16605== by 0x523A043: dlsym_doit (/tmp/glibc-git/dlfcn/dlsym.c:51)
==16605== by 0x400E685: _dl_catch_error (/tmp/glibc-git/elf/dl-error.c:178)
==16605== by 0x523A4DB: _dlerror_run (/tmp/glibc-git/dlfcn/dlerror.c:164)
==16605== by 0x523A099: dlsym (/tmp/glibc-git/dlfcn/dlsym.c:71)
==16605== by 0x400806: main (in /tmp/bug/a.out)
==16605== Address 0x57e6098 is 40 bytes inside a block of size 72 free'd
==16605== at 0x4C2C0EB: free (/valgrind-test/coregrind/m_replacemalloc/vg_replace_malloc.c:426)
==16605== by 0x4011D21: _dl_scope_free (/tmp/glibc-git/elf/dl-scope.c:32)
==16605== by 0x4013446: _dl_close_worker (/tmp/glibc-git/elf/dl-close.c:130)
==16605== by 0x401407B: _dl_close (/tmp/glibc-git/elf/dl-close.c:779)
==16605== by 0x400E685: _dl_catch_error (/tmp/glibc-git/elf/dl-error.c:178)
==16605== by 0x523A4DB: _dlerror_run (/tmp/glibc-git/dlfcn/dlerror.c:164)
==16605== by 0x523A00E: dlclose (/tmp/glibc-git/dlfcn/dlclose.c:48)
==16605== by 0x4007DF: main (in /tmp/bug/a.out)
==16605==
h = 0x4023b78, p = 0x503759c
==16605==
==16605== HEAP SUMMARY:
==16605== in use at exit: 0 bytes in 0 blocks
==16605== total heap usage: 2 allocs, 2 frees, 200 bytes allocated
==16605==
==16605== All heap blocks were freed -- no leaks are possible
==16605==
==16605== For counts of detected and suppressed errors, rerun with: -v
==16605== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 2 from 2)

The bug may have been introduced here:

commit 4bff6e0175ed195871f4e01cc4c4c33274b8f6e3
Author: Andreas Schwab <email address hidden>
Date: Fri Feb 25 20:49:48 2011 -0500

    Fix memory leak in dlopen with RTLD_NOLOAD.

Upstream bug with short reproducer filed:
http://sourceware.org/bugzilla/show_bug.cgi?id=13579

Matthias Klose (doko) wrote :

updated the package in the PPA, this crash is gone. people testing the packages still do see a crash, this time on amd64 only. this is bug #919202.

I confirm that neither gnucash, nor the test from
http://sourceware.org/bugzilla/show_bug.cgi?id=13579
show dangling access under Valgrind using
libc6_2.15~pre6-0ubuntu4_amd64.deb

Launchpad Janitor (janitor) wrote :
Download full text (5.5 KiB)

This bug was fixed in the package eglibc - 2.15~pre6-0ubuntu10

---------------
eglibc (2.15~pre6-0ubuntu10) precise; urgency=low

  * Merge from Debian (r5151, 2.13-26).

eglibc (2.15~pre6-0ubuntu9) precise; urgency=low

  * Merge from Debian (r5143, 2.13-25):
  * Revert the patch from upstream PR 12724, which can cause surprising
    changes in fclose behaviour when multiple file handles refer to the
    same file (fclose on one changes file position on the other).
  * Replace ubuntu/issue13579.diff with any/local-leak-revert-crash.diff,
    a partial reversion of the patch.

eglibc (2.15~pre6-0ubuntu8) precise; urgency=low

  * Allow linking against the obsolete sunrpc implementation.

eglibc (2.15~pre6-0ubuntu7) precise; urgency=low

  * Fix BZ#16318, sort objects before relocations. LP: #919202.

eglibc (2.15~pre6-0ubuntu6) precise; urgency=low

  * Set minimum required kernel version to 2.6.24 on x86.
  * Set minimum required kernel version to 2.6.32 on powerpc.
  * Set minimum required kernel version to 2.6.31 on ARM (still needed
    for the imx51 kernel).
  * Disable multiarch routines for floor, ceil, rint, nearbyint. Work around
    PR 13618.

  * Merge from Debian (r5124):
  [ Aurelien Jarno ]
  * patches/s390/cvs-libm-ulps.diff: new patch to fix FTBFS on s390 with
    gcc-4.6.
  * Update Swedish debconf translation, by Martin Bagge. Closes: #653559.
  * Update Norwegian bokmÃ¥l debconf translation, by Bjørn Steensrud. Closes:
    #653566.
  * Add port 783 (spamd) to /etc/bindresvport.blacklist. Closes: #629984.
  * patches/any/cvs-vfscanf.diff: new patch from upstream to fix segfault in
    sscanf for large decimal input string. Closes: #553206.
  * local/manpages/ld.so.8: add a missing new line. Closes: #654582.
  * debhelper.in/libc.NEWS: suggest environment variables as an
    alternative, thanks to Jonathan Nieder for the idea. Closes: #654835.
  * Update Polish debconf translation, by Michał Kułach. Closes: #657748.
  * Update Spanish debconf translation, by Omar Campagne. Closes: #655850.
  * Danish debconf translation update from Joe Hansen. Closes: #656681.
  * patches/alpha/submitted-epoll_create1.diff: new patch from Mike
    Frysinger to fix epoll_create1() on alpha. Closes: #653441.
  * debian/control.in/main: bump build-depends on libc-linux-dev to
    (>= 3.2.1-1) to get accept4 defined on ia64.
    debian/libc6.1.symbols.ia64: force accept4 version to 2.13-25~.
  * debian/control.in/main: use architecture aliases in build-depends.
  * patches/amd64/cvs-avx-osxsave.diff: new patch from upstream to disable
    AVX support if the kernel doesn't support it.
  * patches/any/cvs-fmtmsg-lock.diff: new patch from upstream to fix a
    locking issue in fmtmsg.
  * patches/any/cvs-reloc-sort.diff: new patch from upstream to fix
    relocation issues with dlopen().

  [ Samuel Thibault ]
  * patches/hurd-i386/submitted-mmap.diff: New patch to fix iceweasel hang.
  * patches/hurd-i386/submitted-hurd-socket-EAFNOSUPPORT.diff: New patch to
    fix error value.

eglibc (2.15~pre6-0ubuntu4) precise; urgency=low

  * Issue #13579, revert the fix for #12509. LP: #893605.
  * Add an pldd manpage.
  * Merge Debian packaging...

Read more...

Changed in eglibc (Ubuntu):
status: Confirmed → Fix Released
Changed in glibc:
importance: Unknown → Medium
status: Unknown → Confirmed

*** Bug 13906 has been marked as a duplicate of this bug. ***

Changed in glibc:
status: Confirmed → New

The same bug has been reported for openSUSE with glibc 2.11.3 which contains a backport of the commit.
I'm appending the patch that Fedora seems to be using

Created attachment 6317
Patch that Fedora uses

Andreas,

Thanks for pulling those bits out of larger glibc-fedora.patch. One of my ongoing TODOs is to identify independent patches from glibc-fedora.patch and create distinct patch files for them (easier to track what needs to be pushed upstream).

Is there a clear description of exactly the problem solved by the patch? This looks like a serious problem that we need to fix, but I have not seen a clear "What problem are your solving?" kind of description of the problem and how the fix relates.

It appears that the patch does:

* In _dl_close_worker we don't swap back the old l_orig_initfini maps (is this what was causing the access to dangling memory?) we made at startup, and we don't free the current l_initfini maps we were using (they will get freed elsewhere).

* In _dl_map_object_deps we immediately free the old l_initfini maps instead of saving them to l_orig_initfini. We also use a new flag l_free_initfini to mark that the maps were dynamically allocated and need to be free'd.

* Given that we free the old l_initfini immediately, we only have the current l_initfini to free when l_free_initfini is 1, and that is done in libc_freeres_fn.

This seems logical and the change looks correct.

However, what was *actually* wrong with the original implementation?

Was it the swapping back of the l_orig_initfini?

What is wrong with the old l_orig_initfini?

Btw. to just fix the accessing of dangling memory, here's a simple (but broken) patch with a comment to explain the problem that the current implementation has:

===================================================================
--- glibc-2.11.3.orig/elf/dl-close.c 2011-05-27 15:08:23.000000000 +0200
+++ glibc-2.11.3/elf/dl-close.c 2011-07-13 19:28:52.000000000 +0200
@@ -127,7 +127,13 @@ _dl_close_worker (struct link_map *map)
            {
              struct link_map **oldp = map->l_initfini;
              map->l_initfini = map->l_orig_initfini;
- _dl_scope_free (oldp);
+ /* We can't remove the l_initfini memory because
+ it's shared with l_searchlist.r_list. We don't clear
+ the latter so when we dlopen this object again that
+ entry would point to stale memory. And we don't want
+ to recompute it as it would involve a new call to
+ map_object_deps.
+ _dl_scope_free (oldp); */
            }
        }

This patch is broken since now oldp never gets freed and thus some tests fail.

The Fedora patch is AFAIK applying Andreas Schwab's initial patch that Ulrich Drepper changed ontop of Ulrich's change (thus adding Andreas' initial version)

Here's a link to the initial patch
http://sourceware.org/ml/libc-hacker/2011-02/msg00004.html

Carlos, the Fedora patch reverts Ulrich's patch and adds Andreas' initial patch.

So, instead of looking what the Fedora patch does, I think it's better to look at Ulrich's patch and Andreas commit individually since they are two different solutions to the same problem. I don't know why Ulrich changed Andreas' patch.

Looking at the diff between Ulrich's and Andreas' patch is IMO not the best way to review this - but that's what the Fedora patch does (instead of reverting a patch and adding another).

Download full text (3.2 KiB)

(In reply to comment #6)
> + /* We can't remove the l_initfini memory because
> + it's shared with l_searchlist.r_list. We don't clear
> + the latter so when we dlopen this object again that
> + entry would point to stale memory. And we don't want
> + to recompute it as it would involve a new call to
> + map_object_deps.
> + _dl_scope_free (oldp); */

Thanks this is starting to explain the problem.

Why is l_searchlist.r_list sharing the memory of l_initfini, they each have nothing to do with eachother, other than the fact that they are both of type `struct link_map **`.

In dl-close.c I see this:
~~~
318 if (imap->l_searchlist.r_list == NULL && imap->l_initfini != NULL)
319 {
320 /* The object is still used. But one of the objects we are
321 unloading right now is responsible for loading it. If
322 the current object does not have it's own scope yet we
323 have to create one. This has to be done before running
324 the finalizers.
325
326 To do this count the number of dependencies. */
327 unsigned int cnt;
328 for (cnt = 1; imap->l_initfini[cnt] != NULL; ++cnt)
329 ;
330
331 /* We simply reuse the l_initfini list. */
332 imap->l_searchlist.r_list = &imap->l_initfini[cnt + 1];
333 imap->l_searchlist.r_nlist = cnt;
334
335 new_list = &imap->l_searchlist;
336 }
~~~
Which doesn't make any sense to me. Why are we resusing l_initfini for what appears to be a completely orthogonal purpose?

However, in dl-deps.c(_dl_map_object_deps) we clearly get:
~~~
513 /* Store the search list we built in the object. It will be used for
514 searches in the scope of this object. */
515 struct link_map **l_initfini =
516 (struct link_map **) malloc ((2 * nlist + 1)
517 * sizeof (struct link_map *));
518 if (l_initfini == NULL)
519 _dl_signal_error (ENOMEM, map->l_name, NULL,
520 N_("cannot allocate symbol search list"));
521
522
523 map->l_searchlist.r_list = &l_initfini[nlist + 1];
524 map->l_searchlist.r_nlist = nlist;
~~~
Which allocates a list of 2*nlist+1 size, and points l_searchlist.r_list into nlist+1 which is 2 beyond nlist pointers, and we see why it does this in a second...

Then we point l_initfini at the start of the memory block and terminate it:
~~~
687 /* Terminate the list of dependencies. */
688 l_initfini[nlist] = NULL;
689 atomic_write_barrier ();
690 map->l_initfini = l_initfini;
~~~

So we have a contiguous allocation of pointers split in two with a NULL entry:

|A---------------|NULL|B-----------------|

Where map->l_initfini points to A [0,nlist], including NULL, it is nlist+1 pointers.

Where map->l_searchlist.r_list points to B [nlist+1,2*nlist+1] which is only nlist pointers.

Obviously freeing l_initfini is impossible unless you also clear l_searchlist.r_list, and then you'd have to recompute the r_list.

I need to ...

Read more...

Michael, could you comment a bit on this? You debugged this some time ago and did the patch in comment #6.

Changed in glibc:
status: New → Confirmed

*** Bug 12871 has been marked as a duplicate of this bug. ***

Reproduced with 2.15 head and trunk.

Changed in glibc:
status: Confirmed → In Progress

*** Bug 14035 has been marked as a duplicate of this bug. ***

Any update on this issue?
What is holding the obvious patch from Andreas back?

(In reply to comment #13)
> Any update on this issue?
> What is holding the obvious patch from Andreas back?

Sorry, I've been holding up the obvious patch with a lengthy review.

I've filed BZ#14274 for the actual cleanup.

At this point in the code freeze it is too risky to make the larger change.

I'm running some final tests on the smaller patch and I'll post to libc-alpha for final review, then we'll check it in for 2.16.

The proposed patch is used by several distributions (including Fedora and openSUSE) without known issues for some time, so it has received good testing.

This is now fixed for 2.16 by my commit 0479b305c5b7c8e3fa8e3002982cf8cac02b842e
~~~~
Fix invalid memory access in do_lookup_x.

[BZ #13579] Do not free l_initfini and allow it to be reused
on subsequent dl_open calls for the same library. This fixes
the invalid memory access in do_lookup_x when the previously
free'd l_initfini was accessed through l_searchlist when a
library had been opened for the second time.
~~~

http://sourceware.org/git/?p=glibc.git;a=commit;h=0479b305c5b7c8e3fa8e3002982cf8cac02b842e

Filed BZ #14284 for the backport request.

Changed in glibc:
status: In Progress → Fix Released

I know this issue is closed, but for someone who needs to see the exact test case using which how this bug is reproduced by simple C test code, please find below simple C test case:
***************************** Source Code ***********************************
yogesh$ cat lib1.c
#include <stdio.h>

int lib1_func()
{
        return lib2_func();
}
----------------------------------------------
yogesh$ cat lib2.c
#include <stdio.h>

int lib2_func()
{
        return 10;
}
----------------------------------------------
yogesh$ cat main.c
#include <stdio.h>
#include <dlfcn.h>
#include <pthread.h>

void *handle;

static void *thread_abc()
{
        handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
        void *func = dlsym (handle, "lib2_func");
        printf ("<thread_abc> Handle:%p, func:%p \n", handle, func);
        dlclose (handle);
        return NULL;
}

static void *thread_xyz()
{
        handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
        void *func = dlsym (handle, "lib2_func");
        printf ("<thread_xyz> Handle:%p, func:%p \n", handle, func);
        dlclose (handle);
        return NULL;
}

int main()
{
        pthread_t abc_arr[1000], xyz_arr[1000];
        int i=0;
        handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
        void *func = dlsym (handle, "lib2_func");
        printf ("<main> Handle:%p, func:%p \n", handle, func);
        for (i=0;i<10;i++)
        {
                pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
                pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
        }

        printf ("<main> Handle:%p, func:%p \n", handle, func);
        dlclose (handle);

        for (i=0;i<1000;i++)
        {
                pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
                pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
        }
        for (i=0;i<10;i++)
        {
                pthread_join(abc_arr[i], NULL);
                pthread_join(xyz_arr[i], NULL);
        }
        printf ("Returning from main\n");
        return 0;
}
************************** Compilation steps *********************
gcc -g -fPIC -shared -o lib2.so lib2.c &&
gcc -g -fPIC -shared -o lib1.so lib1.c ./lib2.so &&
gcc -g main.c ./lib1.so ./lib2.so -ldl -lpthread
*******************************************************************

With the above test case this issue is 100% reproducible.

(In reply to comment #17)
> I know this issue is closed, but for someone who needs to see the exact test
> case using which how this bug is reproduced by simple C test code, please find
> below simple C test case:
> ***************************** Source Code ***********************************
> yogesh$ cat lib1.c
> #include <stdio.h>
>
> int lib1_func()
> {
> return lib2_func();
> }
> ----------------------------------------------
> yogesh$ cat lib2.c
> #include <stdio.h>
>
> int lib2_func()
> {
> return 10;
> }
> ----------------------------------------------
> yogesh$ cat main.c
> #include <stdio.h>
> #include <dlfcn.h>
> #include <pthread.h>
>
> void *handle;
>
> static void *thread_abc()
> {
> handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
> void *func = dlsym (handle, "lib2_func");
> printf ("<thread_abc> Handle:%p, func:%p \n", handle, func);
> dlclose (handle);
> return NULL;
> }
>
> static void *thread_xyz()
> {
> handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
> void *func = dlsym (handle, "lib2_func");
> printf ("<thread_xyz> Handle:%p, func:%p \n", handle, func);
> dlclose (handle);
> return NULL;
> }
>
> int main()
> {
> pthread_t abc_arr[1000], xyz_arr[1000];
> int i=0;
> handle = dlopen ("./lib1.so", RTLD_LAZY | RTLD_GLOBAL);
> void *func = dlsym (handle, "lib2_func");
> printf ("<main> Handle:%p, func:%p \n", handle, func);
> for (i=0;i<10;i++)
> {
> pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
> pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
> }
>
> printf ("<main> Handle:%p, func:%p \n", handle, func);
> dlclose (handle);
>
> for (i=0;i<1000;i++)
> {
> pthread_create(&abc_arr[i], NULL, thread_abc, NULL);
> pthread_create(&xyz_arr[i], NULL, thread_xyz, NULL);
> }
> for (i=0;i<10;i++)
> {
> pthread_join(abc_arr[i], NULL);
> pthread_join(xyz_arr[i], NULL);
> }
> printf ("Returning from main\n");
> return 0;
> }
> ************************** Compilation steps *********************
> gcc -g -fPIC -shared -o lib2.so lib2.c &&
> gcc -g -fPIC -shared -o lib1.so lib1.c ./lib2.so &&
> gcc -g main.c ./lib1.so ./lib2.so -ldl -lpthread
> *******************************************************************
>
> With the above test case this issue is 100% reproducible.

Hi yogesh,
With this application in your comment, the issue reported in this bug does not repro in stock eglibc-2.15 from svn
svn co svn://svn.eglibc.org/branches/eglibc-2_15 eglibc-2.15

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.15/master has been updated
       via 1ba48eb07a72690406c0ffda642a963c88639752 (commit)
      from e8b5394afb420449dde0b4cbefd4032936d96a25 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1ba48eb07a72690406c0ffda642a963c88639752

commit 1ba48eb07a72690406c0ffda642a963c88639752
Author: Andreas Schwab <email address hidden>
Date: Fri Jun 22 11:10:31 2012 -0700

    Fix invalid memory access in do_lookup_x.

    [BZ #13579] Do not free l_initfini and allow it to be reused
    on subsequent dl_open calls for the same library. This fixes
    the invalid memory access in do_lookup_x when the previously
    free'd l_initfini was accessed through l_searchlist when a
    library had been opened for the second time.

    (cherry picked from commit 0479b305c5b7c8e3fa8e3002982cf8cac02b842e)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog | 11 +++++++++++
 NEWS | 3 ++-
 elf/dl-close.c | 15 +++------------
 elf/dl-deps.c | 7 ++++---
 elf/dl-libc.c | 9 ++++++---
 elf/rtld.c | 2 ++
 include/link.h | 8 ++++----
 7 files changed, 32 insertions(+), 23 deletions(-)

*** Bug 260998 has been marked as a duplicate of this bug. ***
Seen from the domain http://volichat.com
Page where seen: http://volichat.com/adult-chat-rooms
Marked for reference. Resolved as fixed @bugzilla.

Changed in glibc (Fedora):
importance: Unknown → Undecided
status: Unknown → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.