pthread_join failure

Bug #1427981 reported by Peng Tao
This bug affects 1 person
Affects Status Importance Assigned to Milestone
glibc (Ubuntu)
New
Undecided
Unassigned

Bug Description

pthread_join() appears to behave differently depending on whether gcc optimization is on or off. Building the attached source file with gcc -O2 and with gcc -O0 gives different join results.

E.g., with `gcc pthread_join.c -lpthread -g -O2` it passes without issue.

With `gcc pthread_join.c -lpthread -g -O0`, the program fails at pthread_join() with ESRCH.

The same does not happen on other distros (tested on CentOS and Fedora).

Some additional info:

[macbeth@tests]$cat /etc/issue
Ubuntu 14.04 LTS \n \l

[macbeth@tests]$gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[macbeth@tests]$dpkg -l libc6
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name         Version        Architecture  Description
+++-============-==============-=============-=========================================
ii  libc6:amd64  2.19-0ubuntu6  amd64         Embedded GNU C Library: Shared libraries

Tags: trusty
Revision history for this message
Peng Tao (bergwolf) wrote :

Another thing to note, which might be the root cause of the bug, is that join_func() appears to join the same thread more than once, according to the program output:

Joining thread 0
thread 0 joined!
Joining thread 1
thread 0 joined!
Joining thread 1

The program never intends to join a thread more than once.

tags: added: trusty
Revision history for this message
Adam Conrad (adconrad) wrote :

What versions of glibc and gcc were you testing on CentOS and Fedora? "Doesn't happen on other distros" isn't very helpful without that.

Revision history for this message
Peng Tao (bergwolf) wrote :

Here is what I have on CentOS:

[lear@libprotod]$gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[lear@libprotod]$rpm -qa|grep glibc
glibc-headers-2.17-55.el7_0.5.x86_64
glibc-devel-2.17-55.el7_0.5.x86_64
glibc-common-2.17-55.el7_0.5.x86_64
glibc-debuginfo-common-2.17-55.el7_0.5.x86_64
glibc-2.17-55.el7_0.5.x86_64
glibc-debuginfo-2.17-55.el7_0.5.x86_64

You might be interested in the default gcc options on CentOS as well:
[lear@libprotod]$gcc -Q -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.3/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.3-20140911/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC)

Revision history for this message
Peng Tao (bergwolf) wrote :

OK, it seems that pthread_join() is not working properly when -fstack-protector is enabled, which is the default on Ubuntu but not on CentOS.

If I pass -fno-stack-protector, the test case passes at all optimization levels.

[macbeth@tests]$gcc pthread_join.c -lpthread -fno-stack-protector -O0
[macbeth@tests]$./a.out
Joining thread 0
thread 0 joined!
Joining thread 1
thread 1 joined!
Joining thread 2
thread 2 joined!
Joining thread 3
thread 3 joined!
Joining thread 4
thread 4 joined!
Joining thread 5
thread 5 joined!
Joining thread 6
thread 6 joined!
Joining thread 7
thread 7 joined!
Joining thread 8
thread 8 joined!
Joining thread 9
thread 9 joined!
Joining thread 10
thread 10 joined!
Joining thread 11
thread 11 joined!
Joining thread 12
thread 12 joined!
Joining thread 13
thread 13 joined!
Joining thread 14
thread 14 joined!
Joining thread 15
thread 15 joined!
Joining thread 16
thread 16 joined!
Joining thread 17
thread 17 joined!
Joining thread 18
thread 18 joined!
Joining thread 19
thread 19 joined!
join_func: 20 ran successfully

And if I pass -fstack-protector on CentOS, the test case fails there as well.

Revision history for this message
Jan-Benedict Glaw (jbglaw) wrote :

I saw ESRCH problems in one of our applications, which unfortunately lead to abort() calls from within libmicrohttpd under certain circumstances.

Looking at your sample code, I think its main problem is the way you receive the thread return value from pthread_join(). Look carefully: a void * is being stored into an int (through the void ** passed to pthread_join()). That int would have to hold a whole pointer, which is formally invalid and in practice only happens to work where int and pointers have the same size, as on 32-bit architectures. You are on an amd64 host, where pointers are 8 bytes and int is 4, so the program isn't correct here. -f{no-,}stack-protector may change the stack layout, which is why you see differences in test case execution.

That said, when pthread_join() succeeds, your index variable (i) is overwritten by the NULL pointer returned from the thread function. That's why i is always reset to 0 after a successful pthread_join().

That, unfortunately, invalidates the test case, and I need to continue to find the root cause of my very own problem. ;-)
