Comment 23 for bug 1375555

Carlos (carlos-redhat-bugs) wrote:

(In reply to Mikko Tiihonen from comment #14)
> cat readelf.txt | egrep 'readelf|STATIC_TLS' | grep TLS -B1 | grep readelf |
> cut -d/ -f2-
> 272 /lib64/libc.so.6
> 108 /lib64/libgomp.so.1
> 32 /lib64/libglapi.so.0
> 8 /lib64/libEGL.so.1
> 8 /lib64/libGL.so.1
> 0 /lib64/libcrypt.so.1
> 0 /lib64/libm.so.6
> 0 /lib64/libnsl.so.1
> 0 /lib64/libnss_files.so.2
> 0 /lib64/libpthread.so.0
> 0 /lib64/libresolv.so.2
> 0 /lib64/librt.so.1
> 0 /lib64/libutil.so.1

That many libraries should not overflow the reserved static TLS slots in the DTV. That is only 13 slots. We have 14 surplus slots.

The original error is:

"RuntimeError: Failed to load ImageMagick: dlopen: cannot load any more object with static TLS"

This indicates an overflow of the slots, not of the size of the allocated static TLS block itself.

For the record DL_NNS should be 16, so we should have 102,400 bytes of static surplus storage.

> The 0 byte STATIC_TLS usage .so files have references to glibc variables
> such as so I did not count them.
> 0 TLS GLOBAL DEFAULT UND errno@GLIBC_PRIVATE (4)

These are references to static TLS variables in other modules, e.g. libc.so.6, and because of those references the entire access type for the module is adjusted to be static TLS. Their size doesn't matter for now.

> If I googled right the .so libraries tagged STATIC_TLS fail to load if their
> TLS section does not fit into the static section. Others prefer static TLS
> (and thus use it even if not tagged static).

Libraries don't prefer static TLS, they must be compiled for it.

No shared libraries should be built with static TLS. I'm going to make it my quest to ban anything but the implementation from using static TLS in libraries because it leads to unmaintainable chaos at the distribution level :-(

Maybe a few key libraries might be allowed...

> The actual .so load order (for libraries with TLS section) is:
> /lib64/libc.so.6
> /lib64/libcom_err.so.2
> /lib64/libselinux.so.1
> /lib64/libstdc++.so.6
> /lib64/libQtCore.so.4
> /lib64/libuuid.so.1
> /lib64/libasound.so.2
> /lib64/libGL.so.1
> /lib64/libglapi.so.0
> /lib64/libsystemd.so.0
> /lib64/libdw.so.1
> /lib64/libelf.so.1
> /lib64/libpixman-1.so.0
> /lib64/libEGL.so.1
> /lib64/libgomp.so.1
>
> Looking at the above lists my conclusion is that libgomp.so is the library
> that fails to load (the python tries to load libMagickCore-6.Q16.so.2, which
> depends on libgomp). But the libselinux and libpixman have already managed
> to store their large TLS sections into the static block.

Neither libselinux nor libpixman has static TLS AFAIK. How did you determine they did?

> Possible solutions:
> 1) make the calibre python somehow not load libselinux (could such switch be
> added to python?)

That's not a solution since libselinux doesn't use static TLS.

> 2) make the calibre load the libgomp/libMagicCore earlier, before libpixman
> (most likely libselinux is loaded so early that it cannot be avoided)

Not an option.

> 3) add a new flag RTLD_NO_AUTOMATIC_STATIC_TLS for dlopen function to _not_
> use static TLS for libraries unless they request it with STATIC_TLS. And
> then make python load libraries with the new flag

You are misunderstanding how static TLS works.

Either the compiler is told to use static TLS, in which case the DSO's generated code *depends* upon it and the DSO is marked with the STATIC_TLS dynamic section flag.

Or

The compiler is told not to use static TLS (the default), in which case the DSO's generated code uses a mode that allows it to be loaded fully dynamically.

There is no way to undo static TLS requirements without recompiling the DSO.
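
To make the two cases concrete, here is a minimal sketch in C. The variable and function names are made up for illustration and do not come from any of the libraries discussed here; it assumes GCC or a compatible compiler on a glibc system. When this is built as a shared object with -fPIC, the access to the second variable is what I would expect to make the linker set the STATIC_TLS flag.

```c
/* Hypothetical example, not taken from any library in this bug. */

/* Default model: for -fPIC code the compiler normally emits
   general-dynamic accesses, so the DSO can be dlopen()ed at any time. */
__thread int counter_dynamic;

/* Forced initial-exec: the generated code assumes the variable sits at a
   fixed offset from the thread pointer, so the DSO needs room in the
   static TLS block when it is loaded. */
__thread int counter_static __attribute__((tls_model("initial-exec")));

/* Increment the default-model counter for the calling thread. */
int bump_dynamic(void)
{
    return ++counter_dynamic;
}

/* Increment the initial-exec counter for the calling thread. */
int bump_static(void)
{
    return ++counter_static;
}
```

Compiling this with `gcc -fPIC -shared` and running `readelf -d` on the result should show whether the flag was set. Dropping the attribute and rebuilding should make it go away, which is the only way to undo the requirement.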

> 4) recompile the fedora to have larger static TLS section (the glibc seems
> to define TLS_STATIC_SURPLUS as 64+DL_NNS*100, where DL_NNS is either 1 or
> 16)

Users will continue to build more libraries with static TLS until you run out of room.

I think we can increase DTV_SURPLUS and TLS_STATIC_SURPLUS slightly, but we need to better understand which libraries are using it and get them to stop or figure out why they need it.

From this list:

> 272 /lib64/libc.so.6
- OK. May use STATIC_TLS, it's part of the implementation.

> 108 /lib64/libgomp.so.1
- OK. May use STATIC_TLS, it's part of the implementation. Language runtime support for gomp shared across all programs.

> 32 /lib64/libglapi.so.0
> 8 /lib64/libEGL.so.1
> 8 /lib64/libGL.so.1
- They should not use static TLS.

e.g.

[carlos@koi mesa-c40d7d6d948912a4d51cbf8f0854cf2ebe916636]$ grep -r 'initial-exec' *
docs/dispatch.html: __attribute__((tls_model("initial-exec")));
src/glx/glxcurrent.c:__thread void *__glX_tls_Context __attribute__ ((tls_model("initial-exec")))
src/glx/glxclient.h: __attribute__ ((tls_model("initial-exec")));
src/egl/main/eglcurrent.c: __attribute__ ((tls_model("initial-exec")));
src/mesa/drivers/dri/common/dri_test.c: __attribute__((tls_model("initial-exec")));
src/mesa/drivers/dri/common/dri_test.c: __attribute__((tls_model("initial-exec")));
src/mapi/u_current.c: __attribute__((tls_model("initial-exec")))
src/mapi/u_current.c: __attribute__((tls_model("initial-exec")));
src/mapi/u_current.h: __attribute__((tls_model("initial-exec")));
src/mapi/u_current.h: __attribute__((tls_model("initial-exec")));
src/mapi/glapi/glapi.h: __attribute__((tls_model("initial-exec")));
src/mapi/glapi/glapi.h: __attribute__((tls_model("initial-exec")));

I know why they are using it. They want speed and force the model.

> 0 /lib64/libcrypt.so.1
> 0 /lib64/libm.so.6
> 0 /lib64/libnsl.so.1
> 0 /lib64/libnss_files.so.2
> 0 /lib64/libpthread.so.0
> 0 /lib64/libresolv.so.2
> 0 /lib64/librt.so.1
> 0 /lib64/libutil.so.1

- OK, all part of the implementation (glibc).

That's only 13 libraries though and we have max counted slots + 14.

I'm going to have to debug this myself to figure out what's wrong.

Maybe it fails as we are loading the Nth library, but doesn't get time to display that information.

Have you tried building a glibc with DTV_SURPLUS increased to 64?

e.g.
diff -urN glibc-2.19-883-g7e54fd0/sysdeps/generic/ldsodefs.h glibc-2.19-883-g7e54fd0.mod/sysdeps/generic/ldsodefs.h
--- glibc-2.19-883-g7e54fd0/sysdeps/generic/ldsodefs.h 2014-08-13 12:24:07.000000000 -0400
+++ glibc-2.19-883-g7e54fd0.mod/sysdeps/generic/ldsodefs.h 2014-08-19 23:52:33.636202348 -0400
@@ -389,7 +389,7 @@
 #define TLS_SLOTINFO_SURPLUS (62)

 /* Number of additional slots in the dtv allocated. */
-#define DTV_SURPLUS (14)
+#define DTV_SURPLUS (64)

   /* Initial dtv of the main thread, not allocated with normal malloc. */
   EXTERN void *_dl_initial_dtv;

Does it help? Can you get a trace of all the libraries using STATIC_TLS?
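
One way to get such a trace from inside a process is dl_iterate_phdr(), which walks the objects the dynamic loader currently has mapped. The sketch below is only an illustration, not something that was used in this bug: the names report_tls and count_static_tls_objects are hypothetical, and it only sees objects that are already loaded, so the library whose dlopen fails will not show up. It prints each object that has a PT_TLS segment or carries DF_STATIC_TLS in DT_FLAGS.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stdio.h>

/* Callback for dl_iterate_phdr(): inspect one loaded object.
   'data' points to an int that counts objects flagged DF_STATIC_TLS. */
static int report_tls(struct dl_phdr_info *info, size_t size, void *data)
{
    int *static_count = data;
    const ElfW(Dyn) *dyn = NULL;
    unsigned long tls_size = 0;
    int has_tls = 0;
    int static_tls = 0;
    (void)size;

    /* Find the TLS segment (if any) and the dynamic section. */
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_TLS) {
            has_tls = 1;
            tls_size = (unsigned long)ph->p_memsz;
        } else if (ph->p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        }
    }

    /* Look for DF_STATIC_TLS in the DT_FLAGS entry. */
    for (; dyn != NULL && dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_STATIC_TLS))
            static_tls = 1;
    }

    if (has_tls || static_tls) {
        const char *name = (info->dlpi_name && info->dlpi_name[0])
                               ? info->dlpi_name : "(main program)";
        printf("%-40s tls_size=%lu static_tls=%s\n",
               name, tls_size, static_tls ? "yes" : "no");
    }
    if (static_tls)
        (*static_count)++;
    return 0; /* zero means: keep iterating */
}

/* Print the TLS-using objects and return how many are marked STATIC_TLS. */
int count_static_tls_objects(void)
{
    int count = 0;
    dl_iterate_phdr(report_tls, &count);
    return count;
}
```

Calling count_static_tls_objects() from a small main(), or from a constructor in a preloaded helper library, would print the list for that process just before the failing dlopen.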

If I built you a scratch glibc would you try it?

Here is a scratch build with DTV_SURPLUS set to 64:
http://koji.fedoraproject.org/koji/taskinfo?taskID=7429595