My initial report was missing a testcase and an analysis and, worst of all, the
title was wrong. It should have been something like "reading a facet mesh
function from a file in parallel is extremely slow".
Apologies for that.
1) Testcase: run the following code first with a single process in order to
build the input file, then in parallel.

#include <dolfin.h>

using namespace dolfin;

int main(void)
{
  UnitCube mesh(50, 50, 50);
  if (MPI::num_processes() == 1)
  {
    MeshFunction<uint> mfc(mesh, 2, 0);
    File mfcf("mesh_facet_markers.xml.gz");
    mfcf << mfc;
  }
  MeshFunction<uint> mfc(mesh, "mesh_facet_markers.xml.gz");
  cout << dolfin::MPI::sum(mesh.num_cells()) << endl;
  return 0;
}
I'm getting this on my laptop:
marco@darkstar:~/Programmi/Dolphin> time ./TestCase && time mpirun -np 2 TestCase
750000
real 0m25.556s
user 0m24.827s
sys 0m0.665s
Process 0: Number of global vertices: 132651
Process 0: Number of global cells: 750000
Process 0: Partitioned mesh, edge cut is 6083.
Process 1: Partitioned mesh, edge cut is 6083.
Process 0: 375030
Process 1: 374970
real 16m55.749s
user 33m39.199s
sys 0m6.482s
2) Analysis.
2.1) MeshPartitioning.h line 223:
std::find on a vector is an
O(global_entity_indices.size()) operation.
It is performed inside a loop spanning [0, ldata.size())
-> global_entity_indices.size() * ldata.size() operations.
The proposed solution leads to N*log(N) average operations.
I agree with you that hashed containers should be investigated
in order to check whether the better scalability
they could offer does not cost too much for
smaller meshes.
2.2) MeshPartitioning.h lines 245-273:
The second loop at line 251 is performed to check
the condition of line 260. If I've understood correctly,
the condition is true, for the test case,
only four times for each run of the whole loop
(i.e. for the four facets of each cell); it can be
avoided by pre-building a map (or hashed container) map_of_ldata.
2.3) MeshDistributed.cpp lines 89 and 145: same problem as 2.1:
std::find on a vector performed inside a loop.