valgrind reports a problem with mpich and ld-2.9.so library

Bug #419467 reported by gcordoba
This bug affects 1 person
Affects: mpich (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: mpich-bin

I was trying to run a new program developed by SUNY at Buffalo, and the program seems to go into an infinite loop. The same happens with an earlier version, which also uses MPI. Both make calls to mpich, so I ran the program under valgrind.

Without the -v option, the valgrind report was not clear to me, as it shows only this warning:
==12266== Warning: invalid file descriptor -1 in syscall write()

However, I am attaching the log (valgring-1.log).

With the -v option, valgrind shows several CRC mismatches and seems to point to the library ld-2.9.so:

==32107== Warning: invalid file descriptor -1 in syscall write()
==32107== at 0x40007F2: (within /lib/ld-2.9.so)

as shown in the second attached file (valgrind-2.log).
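
The warning above only says that the program called write() with file descriptor -1, which often means an earlier call that should have returned a descriptor failed and its result was not checked. The following minimal sketch reproduces that kind of call in isolation. It is not taken from mpich or from the attached logs, and the helper names write_to_invalid_fd and report_invalid_fd_write are made up for illustration. POSIX specifies that write() on an invalid descriptor fails with EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Calls write() on file descriptor -1 and returns the resulting errno.
 * Hypothetical helper for illustration; not part of mpich. */
int write_to_invalid_fd(void)
{
    const char msg[] = "hello\n";

    errno = 0;
    ssize_t n = write(-1, msg, sizeof msg - 1);

    /* -1 is never a valid descriptor, so the call cannot succeed. */
    assert(n == -1);
    return errno;
}

/* Prints the error in readable form, e.g. when called from main(). */
void report_invalid_fd_write(void)
{
    int err = write_to_invalid_fd();
    printf("write(-1, ...) failed: %s\n", strerror(err));
}
```

Calling report_invalid_fd_write() from main() and running the binary under valgrind should give a warning of the same form as the one above, which would help separate "the program passes a bad descriptor" from a problem inside ld-2.9.so itself.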

The problem reported in my hpfem.C routine corresponds to the MPI call:
   MPI_Init(&argc,&argv);

I am using Ubuntu 9.04; this problem does not appear on earlier versions of Ubuntu (e.g. 8.10).
Any ideas?
Gustavo

gcordoba (glgcg) wrote :

Hi again,
this is to report that the same problem occurs with a simple program that uses MPI (see the attached file).

gcordoba (glgcg) wrote :

I just tried a very simple one-line program found on an Ubuntu forum (http://ubuntuforums.org/showthread.php?t=80182):

int main(){return 42;}

Valgrind does not report any errors. However, with valgrind -v:
--9198-- Valgrind library directory: /usr/lib/valgrind
--9198-- Reading syms from /lib/ld-2.9.so (0x4000000)
--9198-- Reading debug info from /lib/ld-2.9.so ..
--9198-- .. CRC mismatch (computed 049232cc wanted 022486d8)
--9198-- object doesn't have a symbol table
--9198-- Reading syms from /home/gcordoba/tmp/a.out (0x8048000)
--9198-- Reading syms from /usr/lib/valgrind/x86-linux/memcheck (0x38000000)
--9198-- object doesn't have a dynamic symbol table

Is this due to the same bug? Is it related to a bug in the libraries?
Please note that the "syscall write" problem, which appears with any MPI-related program, does not appear in this case.

gcordoba (glgcg) wrote :

Hi,
I found a partial solution. The problem is in the mpich installation. It installed mpi.h in several places, so any program that uses MPI becomes confused when MPI is called.
I discovered this while checking an installation by looking at the .configure.log. It showed me that the library -lmpich was not found. So I passed some flags to the configure step (thanks to Dinesh Kumar from the University at Buffalo):
./configure CPPFLAGS=-I/usr/lib/mpich/include/ LDFLAGS='-L/usr/lib/mpich/lib -lmpich'
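
Since the root cause seems to be mpi.h being installed in more than one place, a small check like the following sketch can show which copies exist on a given machine. Only /usr/lib/mpich/include comes from the configure line above; the other two paths are assumptions about where a second copy might live, and the function names count_readable and count_mpi_headers are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Returns how many of the given paths exist and are readable,
 * printing each one that is found. */
size_t count_readable(const char *const paths[], size_t n)
{
    assert(paths != NULL || n == 0);

    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (access(paths[i], R_OK) == 0) {
            printf("found: %s\n", paths[i]);
            found++;
        }
    }
    return found;
}

/* Candidate locations for mpi.h. */
static const char *const mpi_header_candidates[] = {
    "/usr/lib/mpich/include/mpi.h", /* from the configure line above */
    "/usr/include/mpi.h",           /* assumption, may not exist */
    "/usr/include/mpich/mpi.h",     /* assumption, may not exist */
};

/* Counts how many of the candidate mpi.h copies are present. */
size_t count_mpi_headers(void)
{
    size_t n = sizeof mpi_header_candidates / sizeof mpi_header_candidates[0];
    return count_readable(mpi_header_candidates, n);
}
```

If count_mpi_headers() returns more than 1, the compiler may be picking up a different mpi.h than the one matching libmpich, which is what the explicit CPPFLAGS and LDFLAGS are meant to prevent.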

This solution is specific to my program. I do not know how to fix this for other programs.
Gus
