Bug #1157603 “unit-tests built with clang crash in MirClientSurf...” : Bugs : Mir

Revision history for this message

In LLVM #12730, Cmaloney (cmaloney) wrote on 2012-05-02:

#2

$cat thread.cc
#include <thread>

void f(){}

int main()
{
std::thread t(f);
t.join();
}

$clang++ -v
clang version 3.2 (trunk)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$clang++ -std=c++11 thread.cc -o thread -pthread

$./thread
pure virtual method called
terminate called without an active exception
Aborted

Revision history for this message

In LLVM #12730, Cmaloney (cmaloney) wrote on 2012-05-09:

#3

I just updated clang to r156503, and it seems to work for me now. Closing.

Revision history for this message

In LLVM #12730, Cmaloney (cmaloney) wrote on 2012-05-09:

#4

Turns out it was just a fluke, and is indeed still broken for me. Not sure how/why it worked...

Revision history for this message

In LLVM #12730, Turkong (matias-fontanini) wrote on 2012-07-04:

#5

Created attachment 8832
Valgrind's output

Revision history for this message

In LLVM #12730, Turkong (matias-fontanini) wrote on 2012-07-04:

#6

I've cloned the svn repository and this problem persists. This is my clang version:

$ clang++ -v
clang version 3.2 (trunk 159725)
Target: x86_64-unknown-linux-gnu
Thread model: posix

It looks like the ref count of the shared_ptr that holds the internal representation of std::thread gets to 0, so the thread impl is getting destroyed. I can't figure out why this example does actually work on g++ 4.7, since both compilers use the same headers/libs. This could be a bug in libstdc++'s std::thread implementation, if this code didn't work fine on g++.

Revision history for this message

In LLVM #12730, mokabar (tim-klingt) wrote on 2012-10-13:

#7

same issue here:
clang version 3.2
Target: x86_64-unknown-linux-gnu
Thread model: posix

Revision history for this message

In LLVM #12730, Zhouyan1014 (zhouyan1014) wrote on 2012-11-19:

#8

Same problem with

clang version 3.3 (trunk 168286)
Target: x86_64-unknown-linux-gnu
Thread model: posix

libstdc++ come with GCC 4.7.2

I suspect this is a problem with some projects lifetime. Some objects' (possibly from libstdc++'s implementation of std::thread) member function gets called while it is being destroyed. I had seen similar problem with G++ before. But that is a long time ago using pthread without C++11. But the error message is suspiciously similar.

However debugger does little to help. I cannot figure out which member function has being called.

Revision history for this message

In LLVM #12730, A-me-w (a-me-w) wrote on 2012-12-19:

#9

Same problem with

clang version 3.2 (branches/release_32 170508)
Target: x86_64-unknown-linux-gnu
Thread model: posix
OS: Fedora 17
gcc version: 4.7.2 20120921 (Red Hat 4.7.2-2)

the backtrace:

#0 0x0000003e51a35935 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003e51a370e8 in __GI_abort () at abort.c:91
#2 0x0000003e5ce60dad in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x0000003e5ce5eea6 in __cxxabiv1::__terminate (handler=<optimized out>)
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:40
#4 0x0000003e5ce5eed3 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:50
#5 0x0000003e5ce5f9ef in __cxxabiv1::__cxa_pure_virtual () at ../../../../libstdc++-v3/libsupc++/pure.cc:50
#6 0x0000003e5ceb29e0 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
at ../../../../../libstdc++-v3/src/c++11/thread.cc:73
#7 0x0000003e51e07d14 in start_thread (arg=0x7ffff7fe6700) at pthread_create.c:309
#8 0x0000003e51af168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

62 namespace
63 {
64 extern "C" void*
(gdb) l
65 execute_native_thread_routine(void* __p)
66 {
67 thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
68 thread::__shared_base_type __local;
69 __local.swap(__t->_M_this_ptr);
70
71 __try
72 {
73 __t->_M_run(); //abort here
74 }
(gdb) l
75 __catch(...)
76 {
77 std::terminate();
78 }
79
80 return 0;
81 }
82 }
83

it seems that __t->_M_run is a pure virtual function, so the process have been terminated.

Revision history for this message

In LLVM #12730, Chip Salzenberg (chip-pobox) wrote on 2012-12-29:

#10

Download full text (4.1 KiB)

The bug shows up as a pure virtual call, but the real error is use-after-free caused by apparent miscompilation of shared_ptr<>.

I've done deep repro on this, and what I find is that when creating a new thread, libstdc++ std:thread creates an impl object containing the virtual _M_run() function to run the thread's code. An odd detail is that this impl object contains a shared_ptr<> to itself; it's a shared_ptr<base> inside the base type, and the created impl object is always a type derived from this base. This approach, while weird, ensures that the impl object stays alive until the new thread starts running and uses it. Obviously since the impl object contains the virtual _M_run() that actually runs the thread's code, this is quite important.

But this goes wrong because Clang destroys the impl in the parent thread, before the child thread can use it.

The apparently miscompiled code is in <thread>; I quote here from libstdc++ 4.7.2:

129 template<typename _Callable, typename... _Args>
130 explicit
131 thread(_Callable&& __f, _Args&&... __args)
132 {
133 _M_start_thread(_M_make_routine(std::__bind_simple(
134 std::forward<_Callable>(__f),
135 std::forward<_Args>(__args)...)));
136 }

Clang destroys the impl with a backtrace that starts at shared_ptr::_M_destroy() ends at line 135 above. Remember, since the impl contains a shared_ptr to *itself*, the reference count should never fall below one until the thread exits, so this destruction is a mistake.

Running under valgrind confirms that there is a use-after-free here.

I don't know why Clang would do this.

Why does this show up as a pure virtual call? Simple: In the process of orderly destruction, the impl's vtable is reset to its base class vtable before the object is finally freed. That base vtable contains a pure virtual function, which is subsequently called. (Even though the memory was freed, the vtable pointer seems to remain unchanged.) I suppose this symptom would not occur if e.g. the base impl class had a pure virtual constructor.

Here's the backtrace, showing premature destruction. We can be sure that this destruction is premature because it's supposed to happen in the child thread, not in the parent.

(gdb) where
#0 std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)1>::_M_destroy() (this=0x804e008)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:418
#1 0x08048f4f in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)1>::_M_release (this=0x804e008)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:164
#2 0x08048ecb in std::__shared_count<1>::~__shared_count (this=0xbffff464)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:558
#3 0x08048e97 in std::__shared_count<1>::~__shared_count (this=0xbffff464)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:556
#4 0x08048fac in std::__shared_pt...

The bug shows up as a pure virtual call, but the real error is use-after-free caused by apparent miscompilation of shared_ptr<>.

I've done deep repro on this, and what I find is that when creating a new thread, libstdc++ std:thread creates an impl object containing the virtual _M_run() function to run the thread's code.  An odd detail is that this impl object contains a shared_ptr<> to itself; it's a shared_ptr<base> inside the base type, and the created impl object is always a type derived from this base.  This approach, while weird, ensures that the impl object stays alive until the new thread starts running and uses it.  Obviously since the impl object contains the virtual _M_run() that actually runs the thread's code, this is quite important.

But this goes wrong because Clang destroys the impl in the parent thread, before the child thread can use it.

The apparently miscompiled code is in <thread>; I quote here from libstdc++ 4.7.2:

129	    template<typename _Callable, typename... _Args>
130	      explicit 
131	      thread(_Callable&& __f, _Args&&... __args)
132	      {
133	        _M_start_thread(_M_make_routine(std::__bind_simple(
134	                std::forward<_Callable>(__f),
135	                std::forward<_Args>(__args)...)));
136	      }

Clang destroys the impl with a backtrace that starts at shared_ptr::_M_destroy() ends at line 135 above.  Remember, since the impl contains a shared_ptr to *itself*, the reference count should never fall below one until the thread exits, so this destruction is a mistake.

Running under valgrind confirms that there is a use-after-free here.

I don't know why Clang would do this.

Why does this show up as a pure virtual call?  Simple: In the process of orderly destruction, the impl's vtable is reset to its base class vtable before the object is finally freed.  That base vtable contains a pure virtual function, which is subsequently called.  (Even though the memory was freed, the vtable pointer seems to remain unchanged.)  I suppose this symptom would not occur if e.g. the base impl class had a pure virtual constructor.

Here's the backtrace, showing premature destruction.  We can be sure that this destruction is premature because it's supposed to happen in the child thread, not in the parent.

(gdb) where
#0  std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)1>::_M_destroy() (this=0x804e008)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:418
#1  0x08048f4f in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)1>::_M_release (this=0x804e008)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:164
#2  0x08048ecb in std::__shared_count<1>::~__shared_count (this=0xbffff464)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:558
#3  0x08048e97 in std::__shared_count<1>::~__shared_count (this=0xbffff464)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:556
#4  0x08048fac in std::__shared_ptr<std::thread::_Impl_base, 1>::~__shared_ptr (this=0xbffff460)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:813
#5  0x08048f87 in std::shared_ptr<std::thread::_Impl_base>::~shared_ptr (this=0xbffff460)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr.h:93
#6  0x08048e07 in std::shared_ptr<std::thread::_Impl_base>::~shared_ptr (this=0xbffff460)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr.h:93
#7  0x08048cd8 in std::thread::thread<void (&)(), >(void (&)(void)) (this=0xbffff4c0, __f=@0x8048a60: {void (void)} 0x8048a60 <f()>)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/thread:133
#8  0x08048c14 in std::thread::thread<void (&)(), >(void (&)(void)) (this=0xbffff4c0, __f=@0x8048a60: {void (void)} 0x8048a60 <f()>)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/thread:135
#9  0x08048a8c in main () at test.cc:9

Revision history for this message

In LLVM #12730, Chip Salzenberg (chip-pobox) wrote on 2012-12-29:

#11

Created attachment 9770
backtrace of premature destruction

Revision history for this message

In LLVM #12730, Ryan Prichard (ryan-prichard) wrote on 2013-01-06:

#12

I think the root problem is that GCC is defining the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros, whereas Clang is not.

In libstdc++, std::shared_ptr<T> is derived from
std::__shared_ptr<T, __gnu_cxx::__default_lock_policy>. __default_lock_policy is a compile-time constant defined in the include/ext/concurrence.h header. The
value varies depending upon which macros are defined:

  // Compile time constant that indicates prefered locking policy in
  // the current configuration.
  static const _Lock_policy __default_lock_policy =
#ifdef __GTHREADS
#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) \
     && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4))
  _S_atomic;
#else
  _S_mutex;
#endif
#else
  _S_single;
#endif

With G++ on x86 and x86_64, the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros are defined by the compiler, in gcc/c-family/c-cppbuiltin.c, so __default_lock_policy is _S_atomic.

With Clang, the macros are not defined, so __default_lock_policy is _S_mutex.

(The built-in macros can be shown by passing -dM -E to gcc/clang. e.g.: gcc -dM -E - < /dev/null)

With the thread.c test case, large parts of libstdc++ are inlined and therefore compiled by Clang. The std::thread constructor calls std::thread::_M_make_routine, which creates an std::__shared_ptr<T, _S_mutex> object. The constructor passes this object to std::thread::_M_start_thread, which is an out-of-line function in libstdc++.so.6. std::thread::_M_start_thread expects an std::__shared_ptr<T, _S_atomic> object. I think this mismatch is what causes the premature destruction of the std::thread::_Impl object.

Here are the most relevant bits from the 4.7.2 include/std/thread header:

  class thread
  {
  public:
    ...
    struct _Impl_base;
    typedef shared_ptr<_Impl_base> __shared_base_type;
    ...
    template<typename _Callable, typename... _Args>
      explicit
      thread(_Callable&& __f, _Args&&... __args)
      {
        _M_start_thread(_M_make_routine(std::__bind_simple(
                std::forward<_Callable>(__f),
                std::forward<_Args>(__args)...)));
      }
    ...
  private:
    void
    _M_start_thread(__shared_base_type);

    template<typename _Callable>
      shared_ptr<_Impl<_Callable>>
      _M_make_routine(_Callable&& __f)
      {
// Create and allocate full data structure, not base.
return std::make_shared<_Impl<_Callable>>
            (std::forward<_Callable>(__f));
      }
  };

If I define the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros when I build the test case with Clang, then the test case passes, and valgrind is happy.

This ABI issue resulted in an (invalid) gcc bug report,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42734. There seems to be some acknowledgement there that the macros are expected to change the libstdc++ ABI.

I think the driver needs to define the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros for libstdc++ compatibility. There is an issue open regarding these macros, but it does not mention ABI compatibility,
http://llvm.org/bugs/show_bug.cgi?id=11174.

I think the root problem is that GCC is defining the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros, whereas Clang is not.

In libstdc++, std::shared_ptr<T> is derived from
std::__shared_ptr<T, __gnu_cxx::__default_lock_policy>.  __default_lock_policy is a compile-time constant defined in the include/ext/concurrence.h header.  The
value varies depending upon which macros are defined:

// Compile time constant that indicates prefered locking policy in
  // the current configuration.
  static const _Lock_policy __default_lock_policy = 
#ifdef __GTHREADS
#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) \
     && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4))
  _S_atomic;
#else
  _S_mutex;
#endif
#else
  _S_single;
#endif

With G++ on x86 and x86_64, the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros are defined by the compiler, in gcc/c-family/c-cppbuiltin.c, so __default_lock_policy is _S_atomic.

With Clang, the macros are not defined, so __default_lock_policy is _S_mutex.

(The built-in macros can be shown by passing -dM -E to gcc/clang.  e.g.: gcc -dM -E - < /dev/null)

With the thread.c test case, large parts of libstdc++ are inlined and therefore compiled by Clang.  The std::thread constructor calls std::thread::_M_make_routine, which creates an std::__shared_ptr<T, _S_mutex> object.  The constructor passes this object to std::thread::_M_start_thread, which is an out-of-line function in libstdc++.so.6.  std::thread::_M_start_thread expects an std::__shared_ptr<T, _S_atomic> object.  I think this mismatch is what causes the premature destruction of the std::thread::_Impl object.

Here are the most relevant bits from the 4.7.2 include/std/thread header:

class thread
  {
  public:
    ...
    struct _Impl_base;
    typedef shared_ptr<_Impl_base>	__shared_base_type;
    ...
    template<typename _Callable, typename... _Args>
      explicit 
      thread(_Callable&& __f, _Args&&... __args)
      {
        _M_start_thread(_M_make_routine(std::__bind_simple(
                std::forward<_Callable>(__f),
                std::forward<_Args>(__args)...)));
      }
    ...
  private:
    void
    _M_start_thread(__shared_base_type);

template<typename _Callable>
      shared_ptr<_Impl<_Callable>>
      _M_make_routine(_Callable&& __f)
      {
	// Create and allocate full data structure, not base.
	return std::make_shared<_Impl<_Callable>>  
            (std::forward<_Callable>(__f));
      }
  };

If I define the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros when I build the test case with Clang, then the test case passes, and valgrind is happy.

This ABI issue resulted in an (invalid) gcc bug report,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42734.  There seems to be some acknowledgement there that the macros are expected to change the libstdc++ ABI.

I think the driver needs to define the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros for libstdc++ compatibility.  There is an issue open regarding these macros, but it does not mention ABI compatibility,
http://llvm.org/bugs/show_bug.cgi?id=11174.

Revision history for this message

In LLVM #12730, Chip Salzenberg (chip-pobox) wrote on 2013-01-06:

#13

Ryan, you are a gentleman and a scholar. Thanks

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2013-03-28:

#1

This is caused by http://llvm.org/bugs/show_bug.cgi?id=12730

Defining the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros solves this problem, but I am not convinced that this is the right way forward.

Changed in mir:
status:	New → Confirmed

Daniel van Vugt (vanvugt) on 2013-04-02

Changed in mir:
status:	Confirmed → Triaged

Bug Watch Updater (bug-watch-updater) on 2013-04-02

Changed in llvm:
importance:	Unknown → Medium
status:	Unknown → Confirmed

Revision history for this message

In LLVM #12730, Bigcheesegs (bigcheesegs) wrote on 2013-04-04:

#14

Fixed in r178816.

Bug Watch Updater (bug-watch-updater) on 2013-04-07

Changed in llvm:
status:	Confirmed → Fix Released

Revision history for this message

Brandon Schaefer (brandontschaefer) wrote on 2013-05-28:

#15

Im getting the pure virtual method call as well when releasing the connection to the server:
http://paste.ubuntu.com/5695233/

Here is the video quit code (though its simple...):
http://paste.ubuntu.com/5711507/

And how I connect to the server:
this->hidden->connection = mir_connect_sync(NULL, __PRETTY_FUNCTION__);

Daniel van Vugt (vanvugt) on 2013-07-05

tags:

added: clang

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2013-07-08:

#16

Fixed in saucy's clang: 3.2-7ubuntu1

Changed in mir:
status:	Triaged → Invalid
Changed in clang (Ubuntu):
status:	New → Fix Released
importance:	Undecided → Medium

Revision history for this message

In LLVM #12730, S-llvm (s-llvm) wrote on 2013-07-15:

#17

@Michael Spencer: Is there a good reason why your fix adding the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros in SVN rev 178816 is x86 only?

If you look at http://sourceware.org/ml/libc-ports/2010-11/msg00005.html you'll see that on ARM the use of atomic builtins only happens if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 is set.

This rather implies the need for every architecture target, not just x86, to set __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros according to whether that target provides CAS ops of varying sizes.

FYI I just got bit on this bug but with an ARM target, hence me finding this report.

Niall

Bug Watch Updater (bug-watch-updater) on 2013-07-15

Changed in llvm:
status:	Fix Released → Confirmed

Revision history for this message

In LLVM #12730, Weimingz (weimingz) wrote on 2013-09-01:

#18

I posted the patch for ARM to clang-commits maillist.

Revision history for this message

In LLVM #12730, Weimingz (weimingz) wrote on 2013-09-01:

#19

please help to review it:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130826/087642.html

Revision history for this message

In LLVM #12730, Weimingz (weimingz) wrote on 2013-09-24:

#20

For x86, it looks "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8" only apply to i586 or above. So we can't define the 4 macros for all targets.
I don't know if it is safe to define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{1,2,4} for all targets.

And I can't find a way to compute the macros for each target either

Revision history for this message

In LLVM #12730, S-llvm (s-llvm) wrote on 2013-09-26:

#21

(In reply to comment #16)
> For x86, it looks "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8" only apply to i586 or
> above. So we can't define the 4 macros for all targets.
> I don't know if it is safe to define
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{1,2,4} for all targets.
>
> And I can't find a way to compute the macros for each target either

ARMv7a has 64 bit atomics, so you can turn on __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 for that and above.

Niall

Revision history for this message

In LLVM #12730, Weimingz (weimingz) wrote on 2013-09-26:

#22

Should it be tied to MaxAtomicPromoteWidth ?
However, the value of MaxAtomicPromoteWidth is Target specific, not CPU specific.

Revision history for this message

In LLVM #12730, Weimingz (weimingz) wrote on 2013-09-30:

#23

ARM part: patched in r191707

Mir

unit-tests built with clang crash in MirClientSurfaceTest.client_buffer_created_on_surface_creation

Bug Description

Other bug subscribers

Remote bug watches

	Status	Importance	Assigned to
LLVM	Confirmed	Medium	llvm #12730
Mir	Invalid	Medium	Unassigned
clang (Ubuntu)	Fix Released	Medium	Unassigned