flredecomp fails with newer version of parmetis

Bug #1006863 reported by Stephan Kramer
Affects: Fluidity
Status: Confirmed
Importance: High
Assigned to: Jon Hill
Milestone: (none listed)

Bug Description

In a build of fluidity and its tools linked against petsc-dev (which includes a newer version of parmetis: parmetis-4.0.2-p3), flredecomp fails to upscale from a single process to multiple processes. The following is the output from running the channel_wind_drag_parallel test case:

mpiexec -np 4 ../../bin/flredecomp -i 1 -o 4 channel_parallel_periodised channel_parallel_periodised_flredecomp
PARMETIS ERROR: The sum of tpwgts for constraint #0 is not 1.0
PARMETIS ERROR: The sum of tpwgts for constraint #0 is not 1.0
PARMETIS ERROR: The sum of tpwgts for constraint #0 is not 1.0
PARMETIS ERROR: The sum of tpwgts for constraint #0 is not 1.0
../../bin/flredecomp(fprint_backtrace_+0x1f) [0xefd0c3]
../../bin/flredecomp(__fldebug_MOD_flabort_pinpoint+0x304) [0x60ea46]
../../bin/flredecomp(__zoltan_integration_MOD_zoltan_load_balance+0x3a6) [0xe9dd6f]
../../bin/flredecomp(__zoltan_integration_MOD_zoltan_drive+0x929) [0xea3f02]
../../bin/flredecomp(flredecomp+0x1113) [0x607dbc]
../../bin/flredecomp(main+0x665) [0x60869e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff) [0x2adaa25d5eff]

....

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 16.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
*** FLUIDITY ERROR ***
Source location: (Zoltan_integration.F90, 820)
Error message: After load balancing process would have an empty partition.
Backtrace will follow if it is available:
Use addr2line -e <binary> <address> to decipher.
Error is terminal.

....

I presume this is to do with the newer version of parmetis that's being linked into petsc-dev.
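
For reference, the PARMETIS ERROR lines above complain about the target partition weights (tpwgts) that the caller passes in. The following is a minimal sketch of that kind of check, assuming the layout described in the METIS/ParMETIS manuals, where the weight for partition p and constraint c is stored at tpwgts[p*ncon + c]. The function names and the tolerance are hypothetical, chosen for illustration only; this is not ParMETIS's or Zoltan's actual code.

```python
from typing import List, Sequence


def uniform_tpwgts(nparts: int, ncon: int) -> List[float]:
    """Equal-sized partitions: every entry is 1/nparts."""
    return [1.0 / nparts] * (nparts * ncon)


def tpwgts_violations(tpwgts: Sequence[float], nparts: int, ncon: int,
                      tol: float = 1e-3) -> List[int]:
    """Return the constraint indices whose weights do not sum to 1.0.

    Assumes the weight for partition p and constraint c sits at
    tpwgts[p * ncon + c]. The tolerance is an illustrative choice.
    """
    if len(tpwgts) != nparts * ncon:
        raise ValueError("tpwgts must have nparts * ncon entries")
    bad = []
    for c in range(ncon):
        # Sum over all partitions for this one constraint.
        total = sum(tpwgts[p * ncon + c] for p in range(nparts))
        if abs(total - 1.0) > tol:
            bad.append(c)
    return bad


if __name__ == "__main__":
    # Four partitions, one constraint, as in the -o 4 run above.
    print(tpwgts_violations(uniform_tpwgts(4, 1), nparts=4, ncon=1))
    # An array that arrives as all zeros fails for constraint 0.
    print(tpwgts_violations([0.0] * 4, nparts=4, ncon=1))
```

If the caller and the library disagree about how this array is passed, the library can see weights that fail this kind of check even though the caller filled them in correctly.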

Jon Hill (jon-hill) wrote :

I'll take a look at this

Changed in fluidity:
assignee: nobody → Jon Hill (jon-hill)
importance: Undecided → Low
Jon Hill (jon-hill) wrote :

Zoltan *should* work with this version of Parmetis. Will have to dig deeper.

Changed in fluidity:
status: New → Confirmed
Changed in fluidity:
importance: Low → High
Stephan Kramer (s-kramer) wrote :

The problem in the case above is that our packaged zoltan is built with parmetis 3.1. When linking with a PETSc that has a newer parmetis version (4.x), however, it will dynamically link against that version instead, and the calls from zoltan to parmetis fail due to API incompatibility (zoltan needs to know at build time which version of parmetis it's linked against). The wider problem is that this also breaks fldecomp, as the metis version has also increased, and if fluidity has been configured with sam this will probably also break its calls to parmetis (although I haven't tried this).
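
One way to see which parmetis and metis libraries a binary actually resolves at run time is to inspect the output of ldd. The sketch below assumes a Linux system with ldd available and a dynamically linked binary; the helper names are hypothetical. It only reports which shared objects are picked up, not which version zoltan was compiled against.

```python
import re
import subprocess
from typing import Dict

# Matches ldd lines such as "libfoo.so => /path/libfoo.so (0x...)"
# or "libfoo.so => not found", for library names containing "metis".
_LDD_LINE = re.compile(
    r"\s*(\S*metis\S*)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)


def linked_partitioners(ldd_output: str) -> Dict[str, str]:
    """Map each metis/parmetis library name in ldd output to its resolved path."""
    libs = {}
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def inspect_binary(path: str) -> Dict[str, str]:
    """Run ldd on a binary (hypothetical helper) and report its metis libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return linked_partitioners(result.stdout)


if __name__ == "__main__":
    import sys

    for name, resolved in inspect_binary(sys.argv[1]).items():
        print(f"{name} -> {resolved}")
```

Running the script with ../../bin/flredecomp as its argument would show whether the parmetis picked up at run time comes from the PETSc installation rather than from the version the packaged zoltan was built with.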
