Failure in checkpointing: stack smash on readVTKFile()

Bug #1065262 reported by IES developer
This bug affects 1 person
Affects: Fluidity
Status: Incomplete
Importance: Medium
Assigned to: Jon Hill
Milestone: (none)

Bug Description

This bug causes Fluidity to crash when reading in a checkpoint, and it is fully reproducible, which makes checkpointing unusable.

Operating systems / software:
- Fully-updated Ubuntu 11.04, 11.10, 12.04
- Stock Ubuntu versions of OpenMPI, PETSc, ParMetis, VTK, Python-Numpy, etc. (excepting Zoltan)
- Fluidity releases 4.1.6, 4.1.7.1, and trunk.

Conditions of error:
1) Defining environment variables CFLAGS, CXXFLAGS, and FCFLAGS with -O2 or -O3
2) Compiling Fluidity with ./configure --enable-vtk --with-zoltan --enable-2d-adaptivity
3) Running test case water_collapse_2d
4) Editing finish_time in water_collapse_17_checkpoint.flml, so that the simulation runs past the checkpoint
5) Running fluidity -v 2 water_collapse_17_checkpoint.flml

Nature of error:
- Fluidity crashes upon trying to read in the checkpointed pressure vtu file.
- Traceback indicates point of failure readVTKFile() / vtk_get_sizes() / vtk_read_state()
(Log attached to bug entry)

Steps for bug work-around:
1) Remove optimisation flags (e.g. -O2 or -O3) from environment variables CFLAGS, CXXFLAGS and FCFLAGS
2) make distclean
3) Re-run ./configure and recompile Fluidity
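
Step 1 of the work-around can be sketched as follows, assuming a POSIX shell. The helper name strip_opt_flags is hypothetical and not part of Fluidity, and it only removes the -O, -O1, -O2 and -O3 tokens; any other optimisation switches would need to be added to the pattern.

```shell
# Hypothetical helper (not part of Fluidity): print the given compiler
# flags with any -O, -O1, -O2 or -O3 token removed.
strip_opt_flags() (
    out=""
    for flag in "$@"; do
        case "$flag" in
            -O|-O[1-3]) ;;                      # drop optimisation levels
            *) out="${out:+$out }$flag" ;;      # keep everything else
        esac
    done
    printf '%s\n' "$out"
)

# Word splitting of the unquoted variables is intentional here, so that
# each flag is passed to the helper as a separate argument.
CFLAGS=$(strip_opt_flags ${CFLAGS-})
CXXFLAGS=$(strip_opt_flags ${CXXFLAGS-})
FCFLAGS=$(strip_opt_flags ${FCFLAGS-})
export CFLAGS CXXFLAGS FCFLAGS
```

With the variables cleaned this way, steps 2 and 3 (make distclean, then ./configure and a recompile) follow unchanged.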

Suggested interim fix:
./configure already informs users that it picks up the environment variables CFLAGS, CXXFLAGS and FCFLAGS. It should also recommend removing -O/-O2/-O3 from these variables, and state why.
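
A minimal sketch of what such a check might look like, assuming a POSIX shell. The function name warn_on_opt_flags and the wording of the message are hypothetical; nothing here is taken from Fluidity's actual configure script.

```shell
# Hypothetical check (not taken from Fluidity's configure): warn when
# CFLAGS, CXXFLAGS or FCFLAGS carries -O, -O1, -O2 or -O3.
# Returns 0 if at least one warning was printed, 1 otherwise.
warn_on_opt_flags() {
    _found=1
    for _var in CFLAGS CXXFLAGS FCFLAGS; do
        # Read the variable whose name is stored in _var (empty if unset).
        eval "_value=\${$_var-}"
        for _flag in $_value; do
            case "$_flag" in
                -O|-O[1-3])
                    echo "warning: $_var contains $_flag; see bug #1065262 (crash in readVTKFile() when reading a checkpoint)" >&2
                    _found=0 ;;
            esac
        done
    done
    return "$_found"
}
```

The warning goes to standard error so that it does not mix with other output, and the return status lets a calling script decide whether to stop or carry on.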

Cian Wilson (cwilson) wrote :

Have you modified water_collapse.flml at all before step 3?

The reason I ask is that with this example there shouldn't be a water_collapse_17_checkpoint.flml (it should reach water_collapse_50_checkpoint before terminating, producing no checkpoints before then).

Does the first run exit cleanly? Are any errors reported in fluidity.err-0?

IES developer (ac-ies) wrote :

I tested all three versions of Fluidity with fluidity-tests version 4.1.7.1. I'll download the latest tests from bzr and run the trunk against those. Will post err + log here when I'm done.

IES developer (ac-ies) wrote :

Have re-run with the up-to-date dev trunk and tests. It's stopping at 18 dumps, and producing a checkpoint at the 18th. The log file looks like it has some uninitialised strings in parts, so something is definitely awry, even without readVTKFile() being called.

Have zipped up the water_collapse_2d/ directory after running the test, and passed it on to Cian (bit big to attach here).

Cian Wilson (cwilson) wrote :

Apologies, I had been looking at the example water_collapse_2d. The test water_collapse_2d should indeed stop where you say it does.

The random stuff in your first log looks like output from the profiler trying to print out uninitialized strings.

IES developer (ac-ies) wrote : Re: [Bug 1065262] Re: Failure in checkpointing: stack smash on readVTKFile()

On 11/10/2012 15:09, Cian Wilson wrote:
> Apologies, I had been looking at the example water_collapse_2d. The
> test water_collapse_2d should indeed stop when you're saying.
>
> The random stuff in your first log looks like output from the profiler
> trying to print out uninitialized strings.

Hi Cian

That's what I thought. What's very odd about this is, when I remove all the
optimisation flags from CFLAGS, CXXFLAGS and FCFLAGS, Fluidity's configure
still includes -O3 in the Makefiles. So the code is still being optimised.

I'm uploading a zip file of my water_collapse_2d/ test run to FileExchange if
it helps. You should get an email soon.

Cheers
--

Dr Angus Creech

Research Fellow
Institute of Energy Systems
University of Edinburgh
Scotland

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
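
The observation in the comment above, that the generated Makefiles still contain -O3 after the flags were removed from the environment, can be checked with a sketch like the following. The helper name find_opt_flags is hypothetical, and which Makefiles to scan is an assumption left to the caller.

```shell
# Hypothetical helper: list every line in the given files that carries
# -O, -O1, -O2 or -O3, prefixed with file name and line number.
find_opt_flags() {
    # /dev/null is appended so that grep always prints the file name,
    # even when only one file is given.
    grep -n -E -e '(^|[[:space:]=])-O[1-3]?([[:space:]]|$)' "$@" /dev/null
}
```

For example, find_opt_flags Makefile */Makefile run from the top of the build tree prints each matching line, and the exit status is non-zero when no optimisation flag is found.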

IES developer (ac-ies) wrote :

On 28/02/2013 10:00, Jon Hill wrote:
> Any update on this?
>

Hi Jon

As mentioned in the bug report, removing the optimisation flags from CFLAGS,
CXXFLAGS etc., running configure and recompiling stops the checkpoint crashing.
This is with GCC.

That's what I've done, and had no problems since (currently on release 4.1.8).
I've not had time to dig deeper.

Cheers

Jon Hill (jon-hill)
Changed in fluidity:
assignee: nobody → Jon Hill (jon-hill)
status: New → Confirmed
importance: Undecided → Medium
Stephan Kramer (s-kramer) wrote :

Is this still a problem? I can't reproduce it on current trunk. This sort of thing is going to be very dependent on your exact environment. So you'll have to give us the gcc version, the *exact* FCFLAGS/CFLAGS/CXXFLAGS you were using, etc.
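
The details requested above could be gathered with a sketch like this one, assuming a POSIX shell. The helper name report_build_env is hypothetical, and the choice of gcc, g++ and gfortran as the compilers to query is an assumption (the thread only says the build used GCC).

```shell
# Hypothetical helper: print compiler versions and the exact flag
# variables in a form that can be pasted into the bug report.
report_build_env() {
    for _tool in gcc g++ gfortran; do
        if command -v "$_tool" >/dev/null 2>&1; then
            # Only the first line of --version output is kept.
            printf '%s: %s\n' "$_tool" "$("$_tool" --version | head -n 1)"
        else
            printf '%s: not found\n' "$_tool"
        fi
    done
    for _var in CFLAGS CXXFLAGS FCFLAGS; do
        # Read the variable whose name is stored in _var (empty if unset).
        eval "_value=\${$_var-}"
        printf '%s=%s\n' "$_var" "$_value"
    done
}

report_build_env
```

It should be run in the same shell session that is used for ./configure, so that the reported variables are the ones configure actually sees.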

Changed in fluidity:
status: Confirmed → Incomplete