libopenmpi segfaults when Electric Fence is enabled
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openmpi (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
Binary package hint: libopenmpi1
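Electric Fence is a malloc debugger for finding memory access bugs. It replaces malloc() so that each allocation sits directly against an inaccessible page of virtual memory. Any access past the end of a buffer therefore raises a segfault immediately, at the faulting instruction, instead of silently corrupting the heap.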
When I run my MPI program with Electric Fence enabled, MPI::Init segfaults:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb562a6c0 (LWP 2014)]
0xb5d56fb1 in opal_free_list_grow () from /usr/lib/
(gdb) bt
#0 0xb5d56fb1 in opal_free_list_grow () from /usr/lib/
#1 0xb5d57099 in opal_free_list_init () from /usr/lib/
#2 0xb2bed9b5 in ompi_osc_
#3 0xb7953d4a in ompi_osc_
#4 0xb791b032 in ompi_mpi_init () from /usr/lib/
#5 0xb793dd17 in PMPI_Init () from /usr/lib/
#6 0x0811c8dc in MPI::Init ()
#7 0x081189fa in main ()
A crash inside MPI::Init is telling: since Electric Fence turns invalid heap accesses into immediate segfaults, it very likely indicates an invalid memory access inside Open MPI itself.
What I expected to happen: I expected MPI::Init to execute without
crashing, and the program to crash later, if at all, on an invalid
memory access in my own code.
What actually happened: MPI::Init crashed.
I have attached test.cpp, a program that only calls MPI::Init and
MPI::Finalize.
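#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

// Minimal reproducer: with Electric Fence preloaded, the crash
// happens inside MPI::Init, before any user code runs.
int main(int argc, char **argv)
{
    MPI::Init(argc, argv);
    MPI::Finalize();
    return 0;
}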
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) set logging overwrite on
(gdb) set environment EF_PROTECT_BELOW 0
(gdb) set environment EF_DISABLE_BANNER 1
(gdb) set environment LD_PRELOAD /usr/lib/
(gdb) handle SIG33 pass nostop noprint
Signal Stop Print Pass to program Description
SIG33 No No Yes Real-time event 33
(gdb) set pagination 0
(gdb) run
Starting program: /tmp/test
[Thread debugging using libthread_db enabled]
[New Thread 0xb7b6d6c0 (LWP 3047)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7b6d6c0 (LWP 3047)]
0xb7e24fb1 in opal_free_list_grow () from /usr/lib/
(gdb) backtrace full
#0 0xb7e24fb1 in opal_free_list_grow () from /usr/lib/
No symbol table info available.
#1 0xb7e25099 in opal_free_list_init () from /usr/lib/
No symbol table info available.
#2 0xb62309b5 in ompi_osc_
No symbol table info available.
#3 0xb7f30d4a in ompi_osc_
No symbol table info available.
#4 0xb7ef8032 in ompi_mpi_init () from /usr/lib/
No symbol table info available.
#5 0xb7f1ad17 in PMPI_Init () from /usr/lib/
No symbol table info available.
#6 0x08052cec in MPI::Init ()
No locals.
#7 0x0804f7f0 in main ()
No locals.
(gdb) info registers
eax 0xb6237800 -1239189504
ecx 0xb6237528 -1239190232
edx 0xb6237800 -1239189504
ebx 0xb7e4e458 -1209736104
esp 0xbfbaec80 0xbfbaec80
ebp 0xbfbaec98 0xbfbaec98
esi 0xb7614fec -1218359316
edi 0xb62f8000 -1238401024
eip 0xb7e24fb1 0xb7e24fb1 <opal_free_
eflags 0x10286 [ PF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) thread apply all backtrace
Thread 1 (Thread 0xb7b6d6c0 (LWP 3047)):
#0 0xb7e24fb1 in opal_free_list_grow () from /usr/lib/
#1 0xb7e25099 in opal_free_list_init () from /usr/lib/
#2 0xb62309b5 in ompi_osc_
#3 0xb7f30d4a in ompi_osc_
#4 0xb7ef8032 in ompi_mpi_init () from /usr/lib/
#5 0xb7f1ad17 in PMPI_Init () from /usr/lib/
#6 0x08052cec in MPI::Init ()
#7 0x0804f7f0 in main ()
(gdb) quit
The program is running. Exit anyway? (y or n) y
[pfarrell@
Description: Ubuntu 8.04
Release: 8.04
[pfarrell@
libopenmpi1:
Installed: 1.2.5-1ubuntu1.1
Candidate: 1.2.5-1ubuntu1.1
Version table:
*** 1.2.5-1ubuntu1.1 0
100 /var/lib/
1.2.5-1ubuntu1 0
500 http://
Changed in openmpi (Ubuntu):
status: Fix Committed → Fix Released
After some effort building libopenmpi1 with debugging symbols (even with libopenmpi-dbg installed, libopal-pal has no debugging symbols), I obtained a more useful backtrace:
0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at class/opal_free_list.c:113
113         OBJ_CONSTRUCT_INTERNAL(item, flist->fl_elem_class);
(gdb) bt
#0 0xb5cdd334 in opal_free_list_grow (flist=0xb2b46a50, num_elements=1) at class/opal_free_list.c:113
#1 0xb5cdd479 in opal_free_list_init (flist=0xb2b46a50, elem_size=56, elem_class=0xb2b46e20, num_elements_to_alloc=73, max_elements_to_alloc=-1, num_elements_per_alloc=1) at class/opal_free_list.c:78
#2 0xb2b381aa in ompi_osc_pt2pt_component_init (enable_progress_threads=false, enable_mpi_threads=false) at osc_pt2pt_component.c:173
#3 0xb792b67c in ompi_osc_base_find_available (enable_progress_threads=false, enable_mpi_threads=false) at base/osc_base_open.c:84
#4 0xb78e6abe in ompi_mpi_init (argc=5, argv=0xbfd61f84, requested=0, provided=0xbfd61e78) at runtime/ompi_mpi_init.c:411
#5 0xb7911a87 in PMPI_Init (argc=0xbfd61f00, argv=0xbfd61f04) at pinit.c:71
#6 0x0811ca6c in MPI::Init ()
#7 0x08118b8a in main ()