Recent rtti bugfix causes segfault when importing dolfin-adjoint

Bug #1085986 reported by Martin Sandve Alnæs
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
DOLFIN           Fix Released   Undecided    Unassigned
dolfin-adjoint   Fix Released   Undecided    Unassigned
libadjoint       Fix Released   Undecided    Unassigned

Bug Description

Running just:

import dolfin
import dolfin_adjoint

gives

[martinal-mac:04342] *** Process received signal ***
[martinal-mac:04342] Signal: Segmentation fault (11)
[martinal-mac:04342] Signal code: Invalid permissions (2)
[martinal-mac:04342] Failing at address: 0x7f91f457bc60
[martinal-mac:04342] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f91f7027cb0]
[martinal-mac:04342] [ 1] /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so(PyArray_API+0) [0x7f91f457bc60]
[martinal-mac:04342] *** End of error message ***
Segmentation fault (core dumped)

as seen on the buildbot for dolfin-adjoint after dolfin revision -r7180 (-r7158.2.22 in current dolfin trunk) with message
"""Add RTLD_GLOBAL to python ld loader. This will make all types available
for all other dynamically loaded modules. This should fix problems with
dynamic_cast encountered with templated dolfin types.
  -- Is this also a problem on other platforms than linux?"""

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

Of course, if I revert the RTLD fix, plotting now crashes on my newly installed Ubuntu 12.10... The stack trace is below; is this the same issue that the buildbots had before the RTLD fix?

(gdb) where
#0 0x00007ffff6f1a425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff6f1db8b in __GI_abort () at abort.c:91
#2 0x00007ffff189ee2d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff189cf26 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff189cf53 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff189da6f in __cxa_pure_virtual () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fffb370a49e in llvm::BumpPtrAllocator::DeallocateSlabs(llvm::MemSlab*) () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#7 0x00007fffb2f2a2ea in llvm::MemoryDependenceAnalysis::~MemoryDependenceAnalysis() () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#8 0x00007fffb2f2a4c9 in llvm::MemoryDependenceAnalysis::~MemoryDependenceAnalysis() () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#9 0x00007fffb326b7a6 in llvm::PMDataManager::~PMDataManager() () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#10 0x00007fffb3270fc5 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#11 0x00007fffb32680ce in llvm::PMTopLevelManager::~PMTopLevelManager() () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#12 0x00007fffb3271096 in ?? () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#13 0x00007fffb3267e71 in llvm::FunctionPassManager::~FunctionPassManager() () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#14 0x00007fffb3267ec9 in llvm::FunctionPassManager::~FunctionPassManager() () from /usr/lib/x86_64-linux-gnu/libLLVM-3.1.so.1
#15 0x00007fffc13ef035 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#16 0x00007fffc13ef3b9 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#17 0x00007fffc13d218f in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#18 0x00007fffc13d25ac in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#19 0x00007fffc1265ef0 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#20 0x00007fffc1266d7a in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#21 0x00007fffc132bf05 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#22 0x00007fffc126d73e in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#23 0x00007fffc126e9b1 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#24 0x00007fffc126ea5b in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#25 0x00007fffc126d8bb in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#26 0x00007fffc11c67da in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#27 0x00007fffc1263e22 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#28 0x00007fffc11af9ea in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#29 0x00007fffc117d903 in ?? () from /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
#30 0x00007fffe013fdbc in ?? () from /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#31 0x00007fffe01190fb in glXMakeCurrentReadSGI () from /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#32 0x00007fffe8733a22 in vtkXOpenGLRe...

Revision history for this message
Johan Hake (johan-hake) wrote : Re: [Bug 1085986] [NEW] Recent rtti bugfix causes segfault when importing dolfin-adjoint

This is most probably caused by PyArray_API not being a unique symbol,
when RTLD_GLOBAL is set.

In our own library this is fixed by including

#define PY_ARRAY_UNIQUE_SYMBOL PyDOLFIN_FOO

where FOO is the name of each SWIG module. Then, if I have not
misunderstood this, behind the scenes PyArray_API will get the define
appended to its name, making it unique.

I suggest that the libadjoint folks try defining this symbol in their C API
before NumPy gets initialized.

Johan

Revision history for this message
Johan Hake (johan-hake) wrote : Re: [Bug 1085986] Re: Recent rtti bugfix causes segfault when importing dolfin-adjoint

It might be related. However, the error we got was just a failing
dynamic_cast. Looks like you triggered something within VTK...

Johan

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

Sounds like a recipe for future crashes with other python c extensions...

Revision history for this message
Patrick Farrell (pefarrell) wrote :

Johan: isn't this a numpy bug, really? What does it have to do with libadjoint?

The libadjoint C-Python interface is generated via ctypes anyway -- it's all Python code. I don't know where I would put that #define workaround ...

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

I discussed this a bit with Benjamin, and we agree that it's probably safer overall to work around the dynamic cast with a type string in the Variable interface. Or an enum, but a string does away with the need for a global list.
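
A minimal sketch of the type-string idea, written in Python for brevity although the real change would be in DOLFIN's C++ Variable class. The method name `type_str` and the helper `cast_if` are hypothetical; nothing here is taken from DOLFIN's source.

```python
class Variable:
    """Base class whose subclasses report their type as a plain string."""

    def type_str(self):
        return "Variable"


class MeshFunctionDouble(Variable):
    # Each type names itself; no central enum has to list every type.
    def type_str(self):
        return "MeshFunction<double>"


def cast_if(obj, expected):
    """Return obj if its type string matches `expected`, otherwise None."""
    return obj if obj.type_str() == expected else None


mf = MeshFunctionDouble()
hit = cast_if(mf, "MeshFunction<double>")   # the same object comes back
miss = cast_if(mf, "MeshFunction<int>")     # no match, so None
```

An exact string match like this says nothing about casts to base types; expressing those would need extra information per type.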

Revision history for this message
Johan Hake (johan-hake) wrote :

Sure, in principle I agree. It would require some hard thinking because
the Variable hierarchy is not small. Including information about
castability between templated MeshFunctions, CellFunctions, and so on can
potentially be very hard, if not messy.

For now dynamic_cast accomplishes this for us.

Johan

Revision history for this message
Benjamin Kehlet (benjamik) wrote :

Have I understood correctly: This means that every library that uses the NumPy C API has to set this PY_ARRAY_UNIQUE_SYMBOL to be compatible with DOLFIN? That sounds like something we should work around.

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

Only at the unacceptable cost of breaking compatibility with important
external modules...
On 3 Dec 2012 at 17:50, "Johan Hake" <email address hidden> wrote:

> Sure, in principle I agree. It would require some hard thinking because
> the Variable hierarchy is not small. Including information about
> castability between templated MeshFunctions, CellFunctions, and so on can
> potentially be very hard, if not messy.
>
> For now dynamic_cast accomplishes this for us.

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

We sketched a solution; it is solvable and not that hard. No time right now,
though.

Revision history for this message
Johan Hake (johan-hake) wrote :

On 12/03/2012 04:54 PM, Patrick Farrell wrote:
> Johan: isn't this a numpy bug, really?

I have not dug that much into it, but at least it is a known issue, with
known workarounds.

> What does it have to do with libadjoint?

It suffers from the issue of same named symbols.

> The libadjoint C-Python interface is generated via ctypes anyway -- it's
> all Python code. I don't know where I would put that #define workaround
> ...

I saw that. I have tried adding import_array together with the defines
into the:

  adj_python_utils.c

but that did not help...

Johan

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

Could it be related to this bug?

http://projects.scipy.org/numpy/ticket/1928

Revision history for this message
Johan Hake (johan-hake) wrote :

On 12/03/2012 05:55 PM, Martin Sandve Alnæs wrote:
> We sketched a solution, it is solvable and not that hard. No time right now
> though.
> On 3 Dec 2012 at 17:53, "Martin Sandve Alnæs" <email address hidden> wrote:
>
>> Only at the unacceptable cost of breaking compatibility with important
>> external modules...

You are fast to conclude :)

Johan

Revision history for this message
Johan Hake (johan-hake) wrote :

On 12/03/2012 05:53 PM, Benjamin Kehlet wrote:
> Have I understood correctly: This means that every library that uses the
> NumPy C API has to set this PY_ARRAY_UNIQUE_SYMBOL to be compatible
> with DOLFIN?

It looks like it. But I think it is more general than that, and a known
issue with libraries that use the NumPy C-API. We defined this variable
in the SWIG interface long before we started using RTLD_GLOBAL.

> That sounds like something we should work around.

Sounds good to me, but I am not convinced, yet, that this is not a known
issue with known (but not yet to us :) ) workarounds.

Johan

Revision history for this message
Johan Hake (johan-hake) wrote :

On 12/03/2012 06:10 PM, Martin Sandve Alnæs wrote:
> Could it be related to this bug?
>
> http://projects.scipy.org/numpy/ticket/1928

Yes, I also saw that one. I think it is related. It looks like he solved
it by not using RTLD_GLOBAL. However, he first traced the bug to a NumPy
file.

I will not be able to trace this any further tonight...

Johan

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

This is not libadjoint related, but scipy.optimize (via f2py it seems):

import dolfin
import scipy.optimize

and this bug was marked "wontfix" in scipy 3 years ago:

http://projects.scipy.org/numpy/ticket/1148

I quote the reason (replace openbabel with dolfin to get to our situation):
"This behavior of Openbabel's is highly suspect IMHO. If it needs to enable this flag, it probably ought to also restore its value after loading whatever needs loading.

I think you should bring this issue up with Openbabel developers, as it doesn't look like this is a real bug in Numpy or Scipy."
...
"Closing as wontfix, as I believe the problem is in Openbabel.

Reopen if Openbabel developers disagree, or there is strong evidence that we ought not rely on RTLD_GLOBAL not being enabled."
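
The "restore its value" suggestion quoted above could look roughly like this in Python. It is only a sketch under assumptions: the helper name `dlopen_flags` is made up for illustration, it relies on `sys.getdlopenflags`/`sys.setdlopenflags` (Unix only) and on the `os.RTLD_*` constants available in Python 3.3 and later, and the module name in the comment is a placeholder.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def dlopen_flags(flags):
    """Temporarily change the flags Python passes to dlopen() (Unix only)."""
    saved = sys.getdlopenflags()
    sys.setdlopenflags(flags)
    try:
        yield saved
    finally:
        # Always put the previous flags back, even if the import fails,
        # so later extension modules are loaded the usual way.
        sys.setdlopenflags(saved)


before = sys.getdlopenflags()
with dlopen_flags(os.RTLD_NOW | os.RTLD_GLOBAL):
    # Import only the extension modules that need globally visible symbols,
    # e.g. `import my_swig_module` (hypothetical name).
    inside = sys.getdlopenflags()
after = sys.getdlopenflags()
```

With this pattern, modules imported later (scipy.optimize in the example above) would be loaded with the interpreter's original flags.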

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

Another project with this issue that also got rid of RTLD_GLOBAL:
https://github.com/TRIQS/TRIQS/issues/26

They write:
"In the meanwhile, we have decided to use the dynamic version of boost python. The latter contains a type-conversion registry which is shared among all extension modules (http://www.boost.org/doc/libs/1_47_0/libs/python/doc/building.html#the-dynamic-binary) and it is no longer necessary to resolve symbols globally. "

So the question for our dynamic casting issues becomes: is this a swig issue, lacking type sharing among our extension modules?

It seems clear, however, that we have to remove RTLD_GLOBAL, or we can't use scipy.optimize.

Revision history for this message
Johan Hake (johan-hake) wrote :

Importing numpy before setting RTLD_GLOBAL and before the swig interface is imported fixed the problem. I mark this bug as fixed and then we can open another bug for removing the usage of RTLD_GLOBAL.
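
The ordering described here can be sketched as follows. The helper name `import_with_global_symbols` and its `preload` parameter are hypothetical and not DOLFIN's actual code; the idea is only that the modules named in `preload` (NumPy in this bug) are imported with the default flags before RTLD_GLOBAL is switched on for the target module. Restoring the flags afterwards is an extra precaution that the comment above does not mention.

```python
import importlib
import os
import sys


def import_with_global_symbols(name, preload=()):
    """Import `name` with RTLD_GLOBAL set, after importing `preload` normally."""
    # Step 1: load the sensitive modules while the default dlopen flags
    # are still active, so their symbols stay local.
    for dep in preload:
        importlib.import_module(dep)

    # Step 2: switch on RTLD_GLOBAL only for the target import.
    saved = sys.getdlopenflags()
    sys.setdlopenflags(saved | os.RTLD_GLOBAL)
    try:
        return importlib.import_module(name)
    finally:
        # Step 3: restore the flags for everything imported afterwards.
        sys.setdlopenflags(saved)
```

A call might look like `import_with_global_symbols("my_swig_module", preload=("numpy",))`, where `my_swig_module` is a placeholder name.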

Changed in dolfin:
status: New → Fix Committed
Changed in libadjoint:
status: New → Fix Committed
Changed in dolfin-adjoint:
status: New → Fix Committed
Revision history for this message
Johan Hake (johan-hake) wrote :

No, it is not a SWIG issue per se, but if we had not split the SWIG module into several modules we would not have had the problem. The problem is that sharing type information across shared libraries is not, AFAIK, defined in the C++ standard. Before version 3, gcc used a type string to check RTTI. However, again as far as I understand, for reasons of speed they changed it to compare type addresses. There are obvious reasons why those are not preserved across shared libraries.

gcc seems to do a good job for most of our types. However, templated types are the ones that cause trouble. Their type information is handled differently, as they are most often not instantiated at compile time. They are instantiated in the SWIG module that declares the templated type. For MeshFunction that is the mesh module, but the types are not properly shared with the module that calls plot, which is the io module.
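
As a rough analogy in Python (not how gcc implements RTTI), the difference between comparing types by address and by name can be sketched like this. The `make_module_type` helper and the `MeshFunction<double>` name are illustrative assumptions: each call stands in for one shared library carrying its own copy of the type information.

```python
def make_module_type(name):
    """Create a fresh class object, like a per-library copy of type info."""
    return type(name, (object,), {})


# Two "modules" each instantiate what is logically the same templated type.
mesh_copy = make_module_type("MeshFunction<double>")
io_copy = make_module_type("MeshFunction<double>")

obj = mesh_copy()

# Identity comparison (analogous to comparing type addresses) fails,
# because each module holds its own copy.
same_by_identity = type(obj) is io_copy

# Name comparison (analogous to comparing type strings) succeeds.
same_by_name = type(obj).__name__ == io_copy.__name__
```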

Revision history for this message
Martin Sandve Alnæs (martinal) wrote :

The import is fixed, but now I can't plot... Will file a separate bug for this.

Changed in dolfin:
status: Fix Committed → Fix Released
Changed in dolfin-adjoint:
status: Fix Committed → Fix Released
Changed in libadjoint:
status: Fix Committed → Fix Released