My initial report was missing a testcase and an analysis and, worst of all, the
title was wrong. It should have been something like "reading a facet mesh
function from a file in parallel is extremely slow".
Apologies for that.
1) Testcase: run the following code first with a single process in order to
build the input file, then in parallel.

#include <dolfin.h>

using namespace dolfin;

int main(void)
{
  UnitCube mesh(50, 50, 50);
  if (MPI::num_processes() == 1)
  {
    MeshFunction<uint> mfc(mesh, 2, 0);
    File mfcf("mesh_facet_markers.xml.gz");
    mfcf << mfc;
  }
  MeshFunction<uint> mfc(mesh, "mesh_facet_markers.xml.gz");
  cout << dolfin::MPI::sum(mesh.num_cells()) << endl;
  return 0;
}
I'm getting this on my laptop:
marco@darkstar:~/Programmi/Dolphin> time ./TestCase && time mpirun -np 2 TestCase
750000
real 0m25.556s
user 0m24.827s
sys 0m0.665s
Process 0: Number of global vertices: 132651
Process 0: Number of global cells: 750000
Process 0: Partitioned mesh, edge cut is 6083.
Process 1: Partitioned mesh, edge cut is 6083.
Process 0: 375030
Process 1: 374970
real 16m55.749s
user 33m39.199s
sys 0m6.482s
2) Analysis.
2.1) MeshPartitioning.h line 223:
std::find on a vector is an
O(global_entity_indices.size()) operation.
It is performed inside a loop spanning [0, ldata.size())
-> global_entity_indices.size() * ldata.size() operations.
The proposed solution leads to N*log(N) average operations.
I agree with you that hashed containers should be investigated
in order to check whether the better scalability
they could offer does not cost too much for
smaller meshes.
2.2) MeshPartitioning.h lines 245-273:
The second loop at line 251 is performed to check
the condition of line 260. If I've understood correctly,
the condition is true, for the test case,
only four times for each run of the whole loop
(i.e. for the four facets of each cell); it can be
avoided by pre-building a map (or hashed container) map_of_ldata.
2.3) MeshDistributed.cpp lines 89 and 145: same problem as 2.1:
std::find on a vector performed inside a loop.