DFNflow crashes for compiled trunk but not non-optimized debug compiled trunk

Bug #1666339 reported by Robert Caulk
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Yade
Fix Released
Undecided
Bruno Chareyre

Bug Description

Distro: Xenial 16.04LTS
Yade Version: yade-2017-0207.git-11c276f
Compilation: default compilation with debug flags and '#define DFNFLOW' uncommented in DFNFlow.cpp

Summary:

DFNFlowEngine crashes for compiled yade-2017-0207.git-11c276f sources. The segmentation fault also occurs for a debug compiled version and yields the attached core dump. Interestingly, the DFNFlowEngine does not crash for a non-optimized debug compilation of the same sources.

Description of failure:

According to the core dump, the failure can be traced back to DFNFlow.cpp:176, where it is checking if the cell is inifinite (although I have also had it fail at the permeability assignment directly below line 176 for a modified version of DFNflow.cpp).

 DFNFlow.cpp:

 176: if ( Tri.is_infinite(cell1) || Tri.is_infinite(cell2)) cerr<<"Infinite cell found in trickPermeability, should be handled somehow, maybe"<<endl;
 177: cell1->info().kNorm()[facet->second]=cell2->info().kNorm()[Tri.mirror_index(cell1, facet- >second)] = pow((aperture+residualAperture),3)/(12*viscosity);

I am unsure why this line is causing a crash in the optimized-debug compiled code, but not the non-optimized-debug compiled code.

My optimized-debug compiled executable is simply built with the flag -DDEBUG=ON. My non-optimized debug compiled code uses an edited CMakeLists.txt to avoid optimization:

IF(CMAKE_COMPILER_IS_GNUCC)
  SET(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -O0")
  SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -O0")
ENDIF(CMAKE_COMPILER_IS_GNUCC)

The attached zip contains:
  mwe.py // input script
  liteSpecimen2mm.spheres // packing file
  jointSurf.stl // stl for smooth joint
  coreDump2.txt // core dump after executing mwe.py with optimized debug compiled yade

Any assistance with this bug is greatly appreciated.

Revision history for this message
Robert Caulk (rcaulk) wrote :
Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote :

Thanks for reporting.
The behavior you describe is a typical result of manipulating bad pointers. It crashes or not, sometimes randomly, sometimes depending on compilation options, and the crash can be delayed a little so that it seems to occur a few lines later.

Changed in yade:
assignee: nobody → Bruno Chareyre (bruno-chareyre)
Revision history for this message
Robert Caulk (rcaulk) wrote :

I've narrowed it down to the cell1 and cell2 declarations. These objects are not playing nicely with CGAL's is_infinite, mirror_index, and neighbor functions.

Here is my working theory:

The cell1 and cell2 handles originate down at the facet_circulator on line 195 where an arbitrary facet is defined by dereferencing the edge of interest.

195: RTriangulation::Facet_circulator facet1 = Tri.incident_facets(*edge);

This is where I believe there may be a problem. facet1 does not appear to be the address of a facet, instead it appears to be the circulator. Am I right to say that in this case *facet1 is the facet that we are interested in using [1]? If so, we should be able to change lines 174 and 175 to:

const CellHandle& cell1 = *facet->first;
const CellHandle& cell2 = *facet->first->neighbor(*facet->second);

But of course, the compiler says this is an invalid initialization of reference.

[1] http://doc.cgal.org/latest/Circulator/index.html

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote : Re: [Bug 1666339] Re: DFNflow crashes for compiled trunk but not non-optimized debug compiled trunk

On 02/23/2017 11:24 PM, Robert Caulk wrote:
> 195: RTriangulation::Facet_circulator facet1 =
> Tri.incident_facets(*edge);
>
> This is where I believe there may be a problem. facet1 does not appear
> to be the address of a facet, instead it appears to be the circulator.
The "circulator" is just an iterator (i.e. a pointer) which cyclically
goes back to the same point.
So, "facet->first" is ok.
If "facet" was really a facet (not a circulator) it would be "facet.first".

> Am I right to say that in this case *facet1 is the facet that we are
> interested in using [1]? If so, we should be able to change lines 174
> and 175 to:
>
> const CellHandle& cell1 = *facet->first;
> const CellHandle& cell2 = *facet->first->neighbor(*facet->second);
>
> But of course, the compiler says this is an invalid initialization of
> reference.
If your idea was correct the current code would not work at all, you
need to include in the picture the fact that this same code actually
works for us without problem.
So it is more a matter of finding which special case makes the problem
appear, keeping in mind that it usually works.
You are maybe not far: what happens if during the circulation about one
edge we find a facet incident to the infinite vertex, for instance?
There can be special cases of this sort.

Could you output more data on the crashing facet? Which are the vertices
connected to it, etc. It may ring a bell about which special case need
to be handled differently.

Bruno

Revision history for this message
Robert Caulk (rcaulk) wrote :

First, thank you Bruno for the tips and please excuse my developing understanding of C++, CGAL, and YADE source.

>If your idea was correct the current code would not work at all, you
>need to include in the picture the fact that this same code actually
>works for us without problem.
I should mention that the attached mwe.py is my attempt at creating the most basic DFNflow model possible by avoiding any special cases that may extend beyond the original functionality. Since you mention that this code does work fine for you guys, it is possible that my input script contains an error.

Bug update:

>Could you output more data on the crashing facet? Which are the vertices
>connected to it, etc. It may ring a bell about which special case need
>to be handled differently.
The crash occurs no matter which facet is currently being used to declare cell1 and cell2. Based on my debugging, the crash occurs even when the facet maps to nonfictitious cells. The problem occurs in the declaration of cell1 (which should simply grab the cellhandle of the facet using "facet->first"). Instead, it is declaring some junky values within cell1 despite facet1 pointing to a perfectly normal cell. Once the junky cellhandle is passed to CGAL, it throws the error.

So I've managed to fix the problem by dereferencing the circulator to obtain the facet, and then grabbing the cellhandle from the facet directly [1]. It appears to do exactly what the original code is doing, but in a more verbose manner. However, cell1 is no longer filled with junk values and I am no longer getting a segmentation fault.

Obviously, this is strange since the original code was already working for other people, but this fix suggests that it should not be. Since I've already associated this bug with the compiler optimization level, it might be possible that my compiler is optimizing the code differently than it was intended to when this was written in 2014. Also, I will continue to investigate my input script for possible mistakes.

[1] void DFNFlowEngine::trickPermeability(RTriangulation::Facet_circulator& facet, Real aperture, Real residualAperture)
{
 const RTriangulation::Facet& currentFacet = *facet; // seems verbose but facet->first was declaring a junk cell and crashing program
 const RTriangulation& Tri = solver->T[solver->currentTes].Triangulation();
 const CellHandle& cell1 = currentFacet.first;
 const CellHandle& cell2 = currentFacet.first->neighbor(currentFacet.second);
if ( Tri.is_infinite(cell1) || Tri.is_infinite(cell2)) cerr<<"Infinite cell found in trickPermeability, should be handled somehow, maybe"<<endl;
 cell1->info().kNorm()[currentFacet.second]=cell2->info().kNorm()[Tri.mirror_index(cell1, currentFacet.second)] = pow((aperture+residualAperture),3)/

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote :

Hi,
I agree that it sounds like some sort of gcc bug, or maybe a bug in circulators implementation.
Two versions of a code doing obviously the same thing but one of them crash, I've seen that before.
If I have a chance I'll send you a short "hello world" program with c++/cgal to see if you can reproduce the bug with your system. If so we can report it.

If the above "verbose" code fix the problem that is great, I'll push it to trunk.

Bruno

Revision history for this message
Anton Gladky (gladky-anton) wrote :

Hi,

it can also be a bug in a specific version of CGAL. To exclude a compiler error, you can try to compile the Yade using Clang (instructions in the documentation).

Anton

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote :

Hi Robert,
I was about to commit your fix, then I had another another guess.

Could you remove the pass-by-reference for the facet circulator in the function's signature?
void DFNFlowEngine::trickPermeability(RTriangulation::Facet_circulator facet, ...)
instead of
void DFNFlowEngine::trickPermeability(RTriangulation::Facet_circulator& facet, ...)

If it works I'll go for this second option since it looks a bit less magic (I can admit that circulators involve some in-place data which is lost when passed by reference, although I can't imagine how exactly).

Bruno

Changed in yade:
status: New → Fix Committed
Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote :
Revision history for this message
Robert Caulk (rcaulk) wrote :

Hey Bruno,

The compiler does not let me compile if I remove the pass-by-reference as you suggest.

I have yet to run into another issue w.r.t this bug, so we can stick with the "magic" fix for now :-). Thank you for the assistance and for committing the fix.

Best,

Robert

Changed in yade:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.