glibc: dlopen crash after a previously failed call to dlopen
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
GLibC | Fix Released | Medium | |
glibc (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Environment
===========
Ubuntu 18.04.3 LTS
Linux 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
libc6:amd64 2.27-3ubuntu1
gcc 4:7.4.0-1ubuntu2.3
Steps to reproduce the crash
=======
(note: all libraries are linked with --no-as-needed to keep them as DT_NEEDED entries in the dynamic section, even though they are unused.)
1) create an empty library libNOTFOUND.so
2) create an empty library libB.so, linked to libNOTFOUND.so
3) create an empty library libA.so, linked to glibc's librt.so
4) create an empty library libPLUGIN.so, linked to libA.so and libB.so, set DT_RUNPATH to '$ORIGIN'
5) create an empty library libMAIN.so
6) create an executable, linked to libMAIN.so and libdl.so, set DT_RUNPATH to '$ORIGIN', this program calls:
a) dlopen("<absolute path to>/libPLUGIN.so")
b) dlopen("<absolute path to>/libMAIN.so")
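For readers without the attached script, the two calls in step 6 can be sketched roughly as follows. This is not the attached test program: the helper names `try_dlopen` and `reproduce` are made up for illustration, and `RTLD_NOW` is an assumption, since the report does not say which flags were passed to dlopen.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open one library and log the outcome.
   RTLD_NOW is an assumption; the report does not name the flags. */
static void *try_dlopen(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    else
        fprintf(stderr, "dlopen(%s) succeeded\n", path);
    return handle;
}

/* Hypothetical driver for steps 6a and 6b.
   Returns 0 if the second dlopen yields a handle, -1 otherwise. */
static int reproduce(const char *plugin_path, const char *main_path)
{
    /* Step 6a: expected to fail because libNOTFOUND.so cannot be found. */
    (void)try_dlopen(plugin_path);

    /* Step 6b: expected to succeed; per the report, the affected glibc
       raises SIGSEGV inside this call instead. */
    void *main_handle = try_dlopen(main_path);
    return main_handle != NULL ? 0 : -1;
}
```

A main() would call reproduce() with the absolute paths to libPLUGIN.so and libMAIN.so, and the executable would be linked against libdl as described in step 6.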
Behaviour
=========
a) dlopen("<absolute path to>/libPLUGIN.so") fails because it cannot find libNOTFOUND.so via default search methods. This is wanted and OK!
b) dlopen("<absolute path to>/libMAIN.so") raises SIGSEGV somewhere deep inside the dynamic linking code of glibc (backtrace attached). Expected result: returns a valid handle to libMAIN.so.
Comments
========
Attached is a simple test script which does all the steps from above and also shows the workaround: Ensure that librt.so is loaded and fully initialized before the failing call to dlopen().
You can also replace librt.so with libpthread.so to reproduce this behaviour. Any other library I tried instead of librt.so (e.g. libm.so) does not trigger this bug.
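One way the workaround might be approximated from inside the program is to open the library explicitly before the dlopen call that is expected to fail. This is only a sketch: the attached script may achieve the same effect differently (for example by linking the executable against librt directly), the helper name `preload_library` is hypothetical, and the use of `RTLD_NOW` is an assumption.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a library up front so that it is already
   mapped and relocated before any later dlopen that may fail.
   Returns 0 on success, -1 if the library could not be opened. */
static int preload_library(const char *soname)
{
    /* RTLD_NOW (an assumption) resolves all symbols immediately. */
    void *handle = dlopen(soname, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "preload of %s failed: %s\n", soname, dlerror());
        return -1;
    }
    /* The handle is deliberately never passed to dlclose(), so the
       library stays loaded for the lifetime of the process. */
    return 0;
}
```

Calling preload_library("librt.so.1") (or "libpthread.so.0" for the libpthread variant) before the dlopen of libPLUGIN.so would correspond to the workaround described above; whether it avoids the crash on a given system would still need to be checked.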
I also attached a trace with LD_DEBUG=all. Here you can see that glibc tries to relocate librt.so while it loads libMAIN.so. I would expect that librt.so is loaded/relocated when libPLUGIN.so is dlopen'ed or that it is neither loaded nor relocated because libPLUGIN.so has unmet dependencies.
This example is a stripped down version of a real scenario where an application was misconfigured.
Changed in glibc:
importance: Unknown → Medium
status: Unknown → Confirmed
Changed in glibc:
status: Confirmed → In Progress
Changed in glibc:
status: In Progress → Fix Released
There are some cases in the implementation of dlopen where _dl_signal_error is called without removing all partially initialized link maps. The downstream bug report refers to an error raised from _dl_map_object in response to a missing file (the final call to _dl_signal_error). We do some cleanup, but it seems we skip removal of a NODELETE object.
It's not clear to me if we should complete the initialization of the NODELETE object, or somehow arrange that we are always in a situation in which we can remove the NODELETE object without observable effects if we have to. The latter probably means that we cannot start running constructors and IFUNCs until all objects in the current link operation have been found, mapped, and all required ld.so data structures have at least been allocated.
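The second option can be pictured with a toy model. The code below is not glibc code and does not reflect the eventual fix; `struct toy_object` and `toy_load_all` are invented names, and the model assumes every object starts out unmapped. It only illustrates the ordering: find and map everything first, and run the irreversible initialization step only once nothing can fail any more.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a link map entry; NOT glibc's struct link_map. */
struct toy_object {
    const char *name;
    bool file_exists;   /* can the object be found on disk? */
    bool mapped;        /* phase 1 completed for this object */
    bool initialized;   /* phase 2 completed (constructors, IFUNCs) */
};

/* Returns 0 on success, -1 if any object is missing. On failure, every
   object mapped by this call is unmapped again and none is initialized.
   Assumes all objects are unmapped on entry. */
static int toy_load_all(struct toy_object *objs, size_t count)
{
    /* Phase 1: find and map everything; nothing observable runs yet. */
    for (size_t i = 0; i < count; i++) {
        if (!objs[i].file_exists) {
            /* Roll back the objects mapped so far in this operation. */
            for (size_t j = 0; j < i; j++)
                objs[j].mapped = false;
            return -1;
        }
        objs[i].mapped = true;
    }

    /* Phase 2: run initialization only after phase 1 fully succeeded,
       because this step cannot be undone. */
    for (size_t i = 0; i < count; i++)
        objs[i].initialized = true;
    return 0;
}
```

In this model a missing libNOTFOUND.so leaves no other object mapped or initialized, which is the "removable without observable effects" property discussed above.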