Segmentation fault while sending large arrays
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mpich (Debian) | New | Unknown | |
mpich (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
Using mpich from Ubuntu's packages, the attached program crashes with the following output:
$ mpirun -np 2 ./test
MPI_Init returned 0
myid = 0 i = 1
myid = 0 numprocs = 2
myid = 0: Sending to 1 size = 16000
p0_12348: p4_error: interrupt SIGSEGV: 11
p0_12348: (0.031250) net_send: could not write to fd=12, errno = 32
rm_l_1_12357: (0.000000) net_send: could not write to fd=5, errno = 32
MPI_Init returned 0
myid = 1 numprocs = 2
myid = 1: Receiving from 0 size = 16000
I compiled the executable as follows:
mpicc -Wall -c -o test.o -Wall -g3 test.c
mpicc test.o -o test
I have the following packages installed:
ii libmpich-mpd1.0-dev 1.2.7-8 mpich static libraries and development files
ii libmpich-mpd1.0gf 1.2.7-8 mpich-mpd runtime shared library
ii libmpich1.0-dev 1.2.7-8 mpich static libraries and development files
ii libmpich1.0gf 1.2.7-8 mpich runtime shared library
ii mpich-bin 1.2.7-8 MPI parallel computing system implementation
ii mpich-mpd-bin 1.2.7-8 MPI parallel computing system implementation, MPD version
The alternatives are defined as follows:
lrwxrwxrwx 1 root root 32 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 40 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 29 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 37 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 44 May 14 15:59 /etc/alternativ
lrwxrwxrwx 1 root root 22 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 20 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 36 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 20 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 36 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 27 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 43 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 21 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 37 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 24 May 13 12:05 /etc/alternativ
lrwxrwxrwx 1 root root 40 May 13 12:05 /etc/alternativ
lrwxrwxrwx 1 root root 21 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 37 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 21 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 37 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 21 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 37 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 26 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 42 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 21 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 37 May 13 12:23 /etc/alternativ
lrwxrwxrwx 1 root root 38 May 14 15:59 /etc/alternativ
lrwxrwxrwx 1 root root 43 May 14 15:59 /etc/alternativ
If I just reduce the macro "size" by one, the program works:
$ mpirun -np 2 ./test
MPI_Init returned 0
myid = 0 i = 1
myid = 0 numprocs = 2
myid = 0: Sending to 1 size = 15999
MPI_Init returned 0
myid = 1 numprocs = 2
myid = 1: Receiving from 0 size = 15999
Using a self-built version of mpich-1.2.7p1, everything works even for much larger values of "size", e.g.:
$ mpirun -np 2 ./test
MPI_Init returned 0
myid = 0 i = 1
myid = 0 numprocs = 2
myid = 0: Sending to 1 size = 1600000
MPI_Init returned 0
myid = 1 numprocs = 2
myid = 1: Receiving from 0 size = 1600000
Changed in mpich (Debian): status: Unknown → New
Sorry, I attached the wrong file last time. The correct one is attached to this post.