lagrangian detectors segfault with RHEL

Bug #1260019 reported by Jean Mensa on 2013-12-11
Affects: Fluidity
Importance: Undecided
Assigned to: Unassigned

Bug Description

Lagrangian detectors fail with a segfault during the first timestep. The OS is RHEL 6 with gcc-4.4.6. The same simulation seems to run longer with RHEL 5 and gcc-4.6.4. Note also that the simulation runs fine with static detectors.

I am attaching the files needed to run the simulation.
Cheers,
Jean

Jean Mensa (jeanmensa) wrote :
Jean Mensa (jeanmensa) on 2013-12-11
description: updated
Changed in fluidity:
status: New → Confirmed
Michael Lange (michael-lange) wrote :

Hi Jean,

I managed to reproduce the segfault you reported and tracked it down to some MPI buffers being corrupted during the detector exchange. This seems to be happening directly during the data exchange, but I am not yet sure what exactly causes this. I will let you know when I find out more.

Thanks,
Michael

Jean Mensa (jmensa) wrote :

Hi Michael,
do you have any news? I tested the fix-lagr-detectors branch in case it also fixed this bug but, as you might know, it didn't.
Thank you for looking into this,
Jean

Michael Lange (michael-lange) wrote :

Hi Jean,

I just committed a potential fix for the observed buffer corruption to the fix-lagr-detectors branch. Can you please check this with your setup to confirm it works?

Thanks,
Michael

Jean Mensa (jmensa) wrote :

Hi Michael,
thanks, that fixed the problem!
Jean

Hi Michael,

Is there any chance we could get these fixes merged into trunk? I'm happy to review if you propose the merge.

Rhod

On 8 Jan 2014, at 03:28, Michael Lange <email address hidden> wrote:

> Hi Jean,
>
> I just committed a potential fix for the observed buffer corruption to
> the fix-lagr-detectors branch. Can you please check this with your setup
> to confirm it works?
>
> Thanks,
> Michael
>
> ** Branch linked: lp:~fluidity-core/fluidity/fix-lagr-detectors

Jean Mensa (jeanmensa) wrote :

Hi Michael,
in my configuration (openmpi-1.4.5) the detectors file cannot grow beyond 2.1 GB, which I think is due to a problem in Diagnostic_variables.F90. At line 2732, realsize is declared as integer(4), so mpi_write_at cannot address offsets past the 2.1 GB mark. Declaring realsize with kind MPI_OFFSET_KIND instead solves the problem for me. Please let me know if you can confirm the problem and the fix.
Cheers,
j
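
For readers following along, here is a rough sketch of why a 4-byte integer would cap a file at about 2.1 GB. It is written in Python, not taken from Diagnostic_variables.F90. The helper names, the 8-byte record size, and the assumption that the overflow wraps around (Fortran leaves integer overflow processor-dependent) are all illustrative guesses, not Fluidity's actual code.

```python
import ctypes

# Largest value a 4-byte signed integer can hold: 2147483647 bytes, about 2.1 GB.
INT32_MAX = 2**31 - 1


def offset_int32(record_index, record_bytes):
    """Byte offset as stored in a 32-bit signed integer (hypothetical helper).

    ctypes.c_int32 keeps only the low 32 bits, which imitates the wraparound
    commonly seen when an integer(4) overflows.
    """
    return ctypes.c_int32(record_index * record_bytes).value


def offset_int64(record_index, record_bytes):
    """Byte offset as stored in a 64-bit signed integer (hypothetical helper)."""
    return ctypes.c_int64(record_index * record_bytes).value


# Assumption for illustration: each record is one 8-byte real.
RECORD_BYTES = 8

# Index of the last record whose offset still fits in 32 bits.
last_safe = INT32_MAX // RECORD_BYTES

for index in (last_safe, last_safe + 1):
    print(index, offset_int32(index, RECORD_BYTES), offset_int64(index, RECORD_BYTES))
```

Under these assumptions the 32-bit offset turns negative one record past the limit, while the 64-bit offset keeps growing, which is consistent with the fix described above.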

Changed in fluidity:
status: Confirmed → Fix Committed