LD_AUDIT is broken on amd64

Bug #1243473 reported by Geoffrey Thomas on 2013-10-23
This bug affects 1 person
Affects            Status   Importance   Assigned to   Milestone
eglibc (Ubuntu)             Undecided    Adam Conrad
  Precise                   Undecided    Unassigned
  Quantal                   Undecided    Unassigned

Bug Description

[Impact]
LD_AUDIT is an "auditing interface" for the dynamic linker (ld.so), which allows an audit library specified in that environment variable to register hooks for loading and unloading objects, resolving relocations, and calling functions across dynamic libraries. It is particularly useful as a debugging tool; the command "latrace" is based on this functionality. It is also useful for making certain runtime changes to how libraries or symbols are resolved, similar to LD_PRELOAD but more powerful. See rtld-audit(7) for details.
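As a rough illustration of the hooks mentioned above, here is a minimal sketch of an audit library that logs object loads and symbol bindings. The hook names and signatures are the ones documented in rtld-audit(7); the file name audit-hooks.c and the log format are assumptions made up for this example, and on an affected glibc it would be expected to hit the same crash as any other audit library.

```c
/* audit-hooks.c: hypothetical file name, illustration only. */
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Mandatory handshake: accept the interface version ld.so offers. */
unsigned int la_version(unsigned int version)
{
    return version;
}

/* Called for each loaded object. Returning both flags asks ld.so to
   report bindings to and from this object via la_symbind64(). */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    (void)lmid;
    (void)cookie;
    /* l_name is an empty string for the main program. */
    fprintf(stderr, "audit: loaded %s\n",
            map->l_name[0] ? map->l_name : "(main program)");
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

/* Called when a symbol is bound on a 64-bit platform. Returning
   st_value keeps the address that ld.so chose. */
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
                       uintptr_t *refcook, uintptr_t *defcook,
                       unsigned int *flags, const char *symname)
{
    (void)ndx;
    (void)refcook;
    (void)defcook;
    (void)flags;
    fprintf(stderr, "audit: bound %s\n", symname);
    return sym->st_value;
}
```

Assuming the file name above, it can be built with `gcc -fPIC -shared -o audit-hooks.so audit-hooks.c` and loaded by setting LD_AUDIT to the path of the resulting library.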

In glibc before 2.17 (i.e., in Precise and Quantal), almost any use of LD_AUDIT on amd64 crashes with the following kernel log message and backtrace:

[1538449.702152] python[13400]: segfault at 60 ip 00007fdeaa97e8a3 sp 00007ffffd50bb00 error 4 in ld-2.15.so[7fdeaa970000+22000]
(gdb) bt
#0 _dl_profile_fixup (l=0x7fdeaab679d8, reloc_arg=3, retaddr=140594303358361, regs=0x7ffffd50bbd0, framesizep=0x7ffffd50bf28) at ../elf/dl-runtime.c:177
#1 0x00007fdeaa9856e8 in _dl_runtime_profile () at ../sysdeps/x86_64/dl-trampoline.h:49
...

In particular, l->l_reloc_result is NULL, and ld.so proceeds to dereference it.

That code has since been patched upstream with the following comment:
  if (l->l_reloc_result == NULL)
    {
      /* BZ #14843: ELF_DYNAMIC_RELOCATE is called before l_reloc_result
         is allocated. We will get here if ELF_DYNAMIC_RELOCATE calls a
         resolver function to resolve an IRELATIVE relocation and that
         resolver calls a function that is not yet resolved (lazy). For
         example, the resolver in x86-64 libm.so calls __get_cpu_features
         defined in libc.so. Skip audit and resolve the external function
         in this case. */

The referenced upstream bug (which unfortunately doesn't mention that it's fixed) is
http://sourceware.org/bugzilla/show_bug.cgi?id=14843
The full upstream commit is
http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2e64d2659d3edaebc792ac596a9863f1626e5c25
and only adds that one if statement and a test case.
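The effect of that guard can be illustrated with a toy model. This is a sketch and not glibc code: every `toy_` name is hypothetical, and the model only captures the idea that a call arriving before the result table is allocated must take the plain resolution path instead of indexing the table.

```c
#include <stddef.h>

/* Toy model only: none of these types or functions exist in glibc. */
struct toy_reloc_result {
    void *addr;                 /* cached resolution for one PLT slot */
};

struct toy_link_map {
    /* Stands in for l->l_reloc_result: NULL until the table is allocated. */
    struct toy_reloc_result *reloc_result;
};

enum toy_path {
    TOY_PATH_PLAIN,             /* resolve the symbol, skip auditing */
    TOY_PATH_AUDITED            /* table exists, audit hooks may run */
};

/* Decide which path a PLT call takes. Without the NULL check, the
   indexing below would go through a NULL pointer, which is the kind
   of dereference described in this report. */
static enum toy_path toy_profile_fixup(const struct toy_link_map *l,
                                       size_t reloc_index)
{
    if (l->reloc_result == NULL)
        return TOY_PATH_PLAIN;

    struct toy_reloc_result *slot = &l->reloc_result[reloc_index];
    (void)slot;                 /* a real resolver would read or fill the slot */
    return TOY_PATH_AUDITED;
}
```

In this model, an object whose table is still NULL always gets TOY_PATH_PLAIN, which mirrors the "skip audit and resolve the external function" behaviour described in the upstream comment.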

[Test Case]
Install the latrace package and trace any program that links libm.so (as described above), or anything else that uses the same functionality. `python -c 1` is a good test. On an affected system it segfaults:

salmon-of-wisdom:/tmp gthomas$ latrace python -c 1

python finished - killed by signal 11

The expected result is tracing output. Taken from an schroot with the fix applied:

(precise-amd64)root@salmon-of-wisdom:/tmp# latrace python -c 1
30242 _dl_get_tls_static_info [/lib64/ld-linux-x86-64.so.2]
30242 getrlimit [/lib/x86_64-linux-gnu/libc.so.6]
30242 __libc_dl_error_tsd [/lib/x86_64-linux-gnu/libc.so.6]
30242 __libc_pthread_init [/lib/x86_64-linux-gnu/libc.so.6]
...

You can also test this with a simple LD_AUDIT module:

salmon-of-wisdom:/tmp gthomas$ cat audit.c
unsigned int la_version(unsigned int version)
{
        return version;
}
salmon-of-wisdom:/tmp gthomas$ gcc -fPIC -shared -o audit.so audit.c
salmon-of-wisdom:/tmp gthomas$ LD_AUDIT=/tmp/audit.so python -c 1
Segmentation fault (core dumped)

[Regression Potential]
I think it is highly unlikely that this patch introduces a regression. It adds a NULL check where ld.so previously dereferenced the pointer unconditionally, so any execution that reaches the new code would already have crashed without the patch.

Geoffrey Thomas (geofft) wrote :

Here's a debdiff that backports that one commit from upstream. I've tested that it fixes the bug inside a Precise chroot. I'm also currently rebuilding on Quantal, which is what I use on my desktop at the moment, and will run with that for a bit.

Adam Conrad (adconrad) on 2013-10-23
Changed in eglibc (Ubuntu):
assignee: nobody → Adam Conrad (adconrad)
status: New → Fix Released

Geoffrey Thomas (geofft) wrote :

Hm, I guess I should mention that latrace -A (trace arguments) still segfaults on my Quantal machine with this patch applied, although it works in a Raring chroot on the same machine:

salmon-of-wisdom:~ gthomas$ latrace -A true

true finished - killed by signal 11
salmon-of-wisdom:~ gthomas$ dmesg | tail -1
[2319682.778709] true[26822]: segfault at 15 ip 00007f64ed793c1d sp 00007fffd3b16670 error 4 in libc-2.15.so[7f64ed6a1000+1b5000]

(raring-amd64)root@salmon-of-wisdom:/home/gthomas# latrace -A true
27006 __libc_start_main(main = 0x401340, argc = 1, ubp_av = 0x7fff858118d8, auxvec = 0x403d60, init = 0x403df0, fini = 0x7f5ba8461d60, rtld_fini = 0x7fff858118c8) [/lib/x86_64-linux-gnu/libc.so.6] {
27006 exit(status = 0) [/lib/x86_64-linux-gnu/libc.so.6] {

true finished - exited, status=0

So maybe there's another patch that ought to be cherry-picked to make everything work. (Quantal and Raring have the same version of latrace, so it's probably not an latrace bug.)

Apart from that, the system has been stable with this patch, as I expected. I've been writing a bunch of LD_AUDIT code without trouble, and latrace in general (without -A) works.

Rolf Leggewie (r0lf) wrote :

Quantal has reached end of life and is no longer receiving updates. Marking the Quantal task for this ticket as "Won't Fix".

Changed in eglibc (Ubuntu Quantal):
status: New → Won't Fix