Performance bug in Advection_Diffusion_DG.F90
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fluidity | Incomplete | Undecided | Michael Lange | —
Bug Description
Since trunk revision 3981.1.75 was added, the performance of the threaded version of Fluidity has been adversely affected. The problem only seems to appear when configuring/
I've tracked the problem down to the following lines (lines 2657-2659), which were added to Advection_Diffusion_DG.F90:
```fortran
if (neumann) then
   call addto(RHS_diff, T_face, shape_rhs(T_shape, detwei * ele_val_
end if
```
The differences between 3981.1.74 and 3981.1.75 can be viewed here: http://
Commenting out these three lines makes the threaded code perform as expected. The numbers below were obtained with the lock_exchange_3d_dg long test. Most of my testing was carried out on revision 3996 of the fluidity-petsc-3.3 branch: after seeing odd performance with the trunk, I went back to a point where the code still performed correctly. fluidity-petsc-3.3 has been merged into the main trunk for some time, but I suspect very few people are running Fluidity with threads, which is why nobody has hit the issue. Results from revision 4128 of the trunk are also included to show the problem exists there too.
Temperature (each run uses 32 cores in total, MPI ranks × OpenMP threads):
```
32 MPI  (1 thread)     56.53   56.77    56.56   56.74
16 MPI  (2 threads)    62.85   63.89    62.90   64.10
 8 MPI  (4 threads)    52.85   78.67    53.47   70.92
 4 MPI  (8 threads)    46.50   87.62    46.63   88.05
 2 MPI (16 threads)    42.58   94.27    42.94   94.08
 1 MPI (32 threads)   123.66  210.466  119.25  194.96
```
The upshot is that the threaded code now almost never goes faster than the pure-MPI version, so all benefit from using threads has been lost.
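For what it's worth, this collapse is the signature you would expect if something in that branch (or in the call it guards) serialised the threads. The standalone sketch below is purely illustrative of that effect, an assumption on my part rather than Fluidity code: a per-element critical section stands in for a hypothetical serialising addto.

```fortran
! Illustrative only: a per-element critical section destroys the OpenMP
! scaling of an otherwise embarrassingly parallel assembly-style loop.
program scaling_sketch
  use omp_lib
  implicit none
  integer, parameter :: nele = 5000000
  real(8), allocatable :: rhs(:)
  real(8) :: t0, t1
  integer :: ele
  logical :: neumann

  allocate(rhs(nele))
  rhs = 0.0d0
  neumann = .true.        ! set to .false. to see the loop scale normally

  t0 = omp_get_wtime()
  !$omp parallel do
  do ele = 1, nele
     rhs(ele) = rhs(ele) + 1.0d0       ! cheap, thread-local work
     if (neumann) then
        !$omp critical
        ! stand-in for a call that serialises threads internally
        rhs(ele) = rhs(ele) + 1.0d0
        !$omp end critical
     end if
  end do
  !$omp end parallel do
  t1 = omp_get_wtime()

  print *, "elapsed:", t1 - t0, "s on", omp_get_max_threads(), "threads"
end program scaling_sketch
```

Compiled with gfortran -fopenmp, the critical-section variant gets slower as threads are added while the .false. variant scales, which is the same pattern as the timings above.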
I guess removing the three lines above isn't really an option, so any ideas/suggestions would be appreciated.
Cheers,
Fiona
First of all, this change has nothing to do with the petsc 3.3 branch: it's actually commit 3879.2.1 on the trunk (bzr commit numbers aren't consistent between different branches like that), merged in at commit 4056.
I'm a bit puzzled by what you are seeing. Are you using the unmodified .flml from longtests/lock_exchange_3d_dg/? Because in that .flml the only scalar field doesn't actually have a Neumann boundary condition, so 'neumann' should always be .false. Can you check whether it is, and if not, why?
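For example, a temporary print inside that branch would confirm it; a minimal sketch against the snippet quoted in the description (the message text is arbitrary, and Fluidity's ewrite logging would do equally well):

```fortran
if (neumann) then
   ! Temporary diagnostic: this should never print for the unmodified
   ! lock_exchange_3d_dg .flml, since no scalar field there has a
   ! Neumann boundary condition.
   write(0,*) "neumann branch entered in Advection_Diffusion_DG"
   ! ... original addto call as quoted above ...
end if
```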