lagrangian detectors segfault with RHEL

Bug #1260019 reported by Jean Mensa
This bug affects 1 person
Affects    Status          Importance   Assigned to   Milestone
Fluidity   Fix Committed   Undecided    Unassigned

Bug Description

Lagrangian detectors fail with a segfault during the first timestep. The OS is RHEL 6 with gcc-4.4.6. The same simulation seems to run longer on RHEL 5 with gcc-4.6.4. Note also that the simulation runs fine with static detectors.

I am attaching the files needed to run the simulation.
Cheers,
Jean

Revision history for this message
Jean Mensa (jeanmensa) wrote :
Jean Mensa (jeanmensa)
description: updated
Changed in fluidity:
status: New → Confirmed
Michael Lange (michael-lange) wrote :

Hi Jean,

I managed to reproduce the segfault you reported and tracked it down to some MPI buffers being corrupted during the detector exchange. This seems to be happening directly during the data exchange, but I am not yet sure what exactly causes this. I will let you know when I find out more.

Thanks,
Michael

Jean Mensa (jmensa) wrote :

Hi Michael,
do you have any news? I have tested the fix-lagr-detectors branch in case it also fixed this bug but, as you might know, it didn't...
Thank you for looking into this,
Jean

Michael Lange (michael-lange) wrote :

Hi Jean,

I just committed a potential fix for the observed buffer corruption to the fix-lagr-detectors branch. Can you please check this with your setup to confirm it works?

Thanks,
Michael

Jean Mensa (jmensa) wrote :

Hi Michael,
thanks, that fixed the problem!
Jean

Rhodri Davies (rhodri-davies) wrote : Re: [Bug 1260019] lagrangian detectors segfault with RHEL

Hi Michael,

Is there any chance we could get these fixes merged into trunk? I'm happy to review if you propose the merge.

Rhod

On 8 Jan 2014, at 03:28, Michael Lange <email address hidden> wrote:

> Hi Jean,
>
> I just committed a potential fix for the observed buffer corruption to
> the fix-lagr-detectors branch. Can you please check this with your setup
> to confirm it works?
>
> Thanks,
> Michael
>
> ** Branch linked: lp:~fluidity-core/fluidity/fix-lagr-detectors

Jean Mensa (jeanmensa) wrote :

Hi Michael,
in my configuration (openmpi-1.4.5) the detectors file can't grow larger than 2.1 GB, which I think is due to a problem in Diagnostic_variables.F90. At line 2732, realsize is declared as integer(4), which prevents MPI_File_write_at from accessing the end of files larger than 2.1 GB. Changing the type of realsize to integer(kind=MPI_OFFSET_KIND) solves the problem for me. Please let me know if you can confirm the problem and the fix.
Cheers,
j
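
As a rough illustration of why a 4-byte integer runs out at about 2.1 GB, the sketch below computes a byte offset once with 32-bit and once with 64-bit storage. It is not Fluidity code: the helper names, the 8-byte record size, and the two's-complement wraparound (modelled here with Python's ctypes) are assumptions made for the example, and the actual behaviour of an overflowing Fortran integer depends on the compiler.

```python
import ctypes

# Largest value a 4-byte signed integer can hold: 2147483647 (about 2.1 GB in bytes).
INT32_MAX = 2**31 - 1


def offset_int32(record_index, record_bytes):
    """Byte offset as it would appear if stored in a 4-byte signed integer.

    ctypes.c_int32 does no overflow checking, so the value wraps around,
    which is the assumed behaviour being illustrated.
    """
    return ctypes.c_int32(record_index * record_bytes).value


def offset_int64(record_index, record_bytes):
    """Same offset stored in an 8-byte signed integer (hypothetical stand-in
    for an MPI_OFFSET_KIND integer on a typical 64-bit build)."""
    return ctypes.c_int64(record_index * record_bytes).value


# Assumed record size: one double-precision value.
record_bytes = 8

# First record whose byte offset no longer fits in 32 bits.
first_bad_record = INT32_MAX // record_bytes + 1

print("last good offset :", offset_int32(first_bad_record - 1, record_bytes))
print("32-bit offset    :", offset_int32(first_bad_record, record_bytes))
print("64-bit offset    :", offset_int64(first_bad_record, record_bytes))
```

With 32-bit storage the first offset past the limit comes out negative, so a write at that position cannot land at the end of the file; with 64-bit storage the same offset is represented correctly.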

Changed in fluidity:
status: Confirmed → Fix Committed