segfault on appending facets with active capillary law

Bug #1105177 reported by Christian Jakob
Affects: Yade
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Hi,

I get a segfault when running a script in a loop as follows:

O.load( ... )
 ...

for cc in range(0,2):
 O.run(100,True)
 ...

The first execution of O.run() works well. The segfault occurs when cc = 1.

I commented out everything that could be responsible for the segfault, but it still happens with this loop. The simulation runs without a segfault when I use a single call instead:

O.run(1000000)

That works without problems, but the loop always fails.

I compiled the same yade version in debug mode to get additional information:

###################### start

christian@fast-machine:~/YADE/my-yade-projects/10-prediction-model$ yade201209-debug 0-MASTERtest2.py
Welcome to Yade Unknown%s
TCP python prompt on localhost:9001, auth cookie `ckyasd'
XMLRPC info provider on http://localhost:21001
/home/christian/YADE/YADEgit-20120906-debug/lib/yade-Unknown/py/yade/__init__.py:14: RuntimeWarning: to-Python converter for boost::shared_ptr<SnapshotEngine> already registered; second conversion method ignored.

Running script 0-MASTERtest2.py
/home/christian/YADE/YADEgit-20120906-debug/lib/yade-Unknown/py/yade/utils.py:55: UserWarning: Overwriting yade.params.values which already exists.
  if mark in yade.params.__dict__: warnings.warn('Overwriting yade.params.%s which already exists.'%mark)
SIGSEGV/SIGABRT handler called; gdb batch file is `/tmp/yade-k0c8qG/tmp-0'
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffc13f9d700 (LWP 8540)]
[New Thread 0x7ffc2276a700 (LWP 8537)]
[New Thread 0x7ffc22f6b700 (LWP 8536)]
0x00007ffc4c4788ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82 ../sysdeps/unix/syscall-template.S: No such file or directory.
No symbol "info" in current context.

Thread 4 (Thread 0x7ffc22f6b700 (LWP 8536)):
#0 0x00007ffc4b914573 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000507518 in ?? ()
#2 0x00000000004b074e in PyEval_EvalFrameEx ()
#3 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#4 0x00000000004acb98 in PyEval_EvalFrameEx ()
#5 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#6 0x00000000004acb98 in PyEval_EvalFrameEx ()
#7 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#8 0x00000000004b4b4c in ?? ()
#9 0x0000000000481cc4 in ?? ()
#10 0x0000000000460d0e in PyEval_CallObjectWithKeywords ()
#11 0x00007ffc22f76102 in ?? () from /usr/lib/python2.7/dist-packages/sip.so
#12 0x00007ffc23220887 in ?? () from /usr/lib/python2.7/dist-packages/PyQt4/QtCore.so
#13 0x00007ffc232617b0 in ?? () from /usr/lib/python2.7/dist-packages/PyQt4/QtCore.so
#14 0x00007ffc3b8c2d0b in ?? () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#15 0x00007ffc4c470b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#16 0x00007ffc4b91aa7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ffc2276a700 (LWP 8537)):
#0 0x00007ffc4b914573 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000507518 in ?? ()
#2 0x00000000004b074e in PyEval_EvalFrameEx ()
#3 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#4 0x00000000004acb98 in PyEval_EvalFrameEx ()
#5 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#6 0x00000000004acb98 in PyEval_EvalFrameEx ()
#7 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#8 0x00000000004b4b4c in ?? ()
#9 0x0000000000481cc4 in ?? ()
#10 0x0000000000460d0e in PyEval_CallObjectWithKeywords ()
#11 0x00007ffc22f76102 in ?? () from /usr/lib/python2.7/dist-packages/sip.so
#12 0x00007ffc23220887 in ?? () from /usr/lib/python2.7/dist-packages/PyQt4/QtCore.so
#13 0x00007ffc232617b0 in ?? () from /usr/lib/python2.7/dist-packages/PyQt4/QtCore.so
#14 0x00007ffc3b8c2d0b in ?? () from /usr/lib/x86_64-linux-gnu/libQtCore.so.4
#15 0x00007ffc4c470b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#16 0x00007ffc4b91aa7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ffc13f9d700 (LWP 8540)):
#0 0x00007ffc4b8eb7fd in __libc_waitpid (pid=8541, stat_loc=<optimized out>, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:41
#1 0x00007ffc4b87fc99 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2 0x00007ffc4b87ffd0 in __libc_system (line=<optimized out>) at ../sysdeps/posix/system.c:190
#3 0x00007ffc4b0571a3 in crashHandler (sig=11) at /home/christian/YADE/yade-trunk-20120906/core/main/pyboot.cpp:21
#4 <signal handler called>
#5 0x00007ffc4901ae19 in std::_List_iterator<boost::shared_ptr<Interaction> >::operator++ (this=0x7ffc13f9c940) at /usr/include/c++/4.7/bits/stl_list.h:156
#6 0x00007ffc4901b4aa in std::list<boost::shared_ptr<Interaction>, std::allocator<boost::shared_ptr<Interaction> > >::remove (this=0x39cc630, __value=...) at /usr/include/c++/4.7/bits/list.tcc:248
#7 0x00007ffc49012735 in BodiesMenisciiList::remove (this=0x38a7e50, interaction=...) at /home/christian/YADE/yade-trunk-20120906/pkg/dem/Law2_ScGeom_CapillaryPhys_Capillarity.cpp:564
#8 0x00007ffc4900fd41 in Law2_ScGeom_CapillaryPhys_Capillarity::action (this=0x38a7e00) at /home/christian/YADE/yade-trunk-20120906/pkg/dem/Law2_ScGeom_CapillaryPhys_Capillarity.cpp:176
#9 0x00007ffc43fc8113 in Scene::moveToNextTimeStep (this=0x38b1180) at /home/christian/YADE/yade-trunk-20120906/core/Scene.cpp:97
#10 0x00007ffc43f79f78 in SimulationFlow::singleAction (this=0x1ad7700) at /home/christian/YADE/yade-trunk-20120906/core/SimulationFlow.cpp:21
#11 0x00007ffc43fbef46 in ThreadWorker::callSingleAction (this=0x1ad7700) at /home/christian/YADE/yade-trunk-20120906/core/ThreadWorker.cpp:71
#12 0x00007ffc43f4aae5 in ThreadRunner::call (this=0x3bf2c50) at /home/christian/YADE/yade-trunk-20120906/core/ThreadRunner.cpp:54
#13 0x00007ffc43f4a8f1 in ThreadRunner::run (this=0x3bf2c50) at /home/christian/YADE/yade-trunk-20120906/core/ThreadRunner.cpp:28
#14 0x00007ffc43f4c38f in boost::_mfi::mf0<void, ThreadRunner>::operator() (this=0x388f410, p=0x3bf2c50) at /usr/include/boost/bind/mem_fn_template.hpp:49
#15 0x00007ffc43f4bf78 in boost::_bi::list1<boost::_bi::value<ThreadRunner*> >::operator()<boost::_mfi::mf0<void, ThreadRunner>, boost::_bi::list0> (this=0x388f420, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253
#16 0x00007ffc43f4bd2d in boost::_bi::bind_t<void, boost::_mfi::mf0<void, ThreadRunner>, boost::_bi::list1<boost::_bi::value<ThreadRunner*> > >::operator() (this=0x388f410) at /usr/include/boost/bind/bind_template.hpp:20
#17 0x00007ffc43f4bb72 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, ThreadRunner>, boost::_bi::list1<boost::_bi::value<ThreadRunner*> > >, void>::invoke (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:153
#18 0x00007ffc43f4c5f6 in boost::function0<void>::operator() (this=0x388f408) at /usr/include/boost/function/function_template.hpp:760
#19 0x00007ffc43f4c5a2 in boost::detail::thread_data<boost::function0<void> >::run (this=0x388f280) at /usr/include/boost/thread/detail/thread.hpp:62
#20 0x00007ffc42e49169 in ?? () from /usr/lib/libboost_thread.so.1.49.0
#21 0x00007ffc4c470b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#22 0x00007ffc4b91aa7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#23 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffc4c87e700 (LWP 8533)):
#0 0x00007ffc4c4788ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007ffc2949283c in pyOmega::wait (this=0x1a77170) at /home/christian/YADE/yade-trunk-20120906/py/wrapper/yadeWrapper.cpp:373
#2 0x00007ffc294926c1 in pyOmega::run (this=0x1a77170, numIter=100, doWait=true) at /home/christian/YADE/yade-trunk-20120906/py/wrapper/yadeWrapper.cpp:367
#3 0x00007ffc295b4786 in boost::python::detail::invoke<int, void (pyOmega::*)(long, bool), boost::python::arg_from_python<pyOmega&>, boost::python::arg_from_python<long>, boost::python::arg_from_python<bool> > (f=@0x1bac608: (void (pyOmega::*)(pyOmega * const, long, bool)) 0x7ffc2949264c <pyOmega::run(long, bool)>, tc=..., ac0=..., ac1=...) at /usr/include/boost/python/detail/invoke.hpp:94
#4 0x00007ffc2959667e in boost::python::detail::caller_arity<3u>::impl<void (pyOmega::*)(long, bool), boost::python::default_call_policies, boost::mpl::vector4<void, pyOmega&, long, bool> >::operator() (this=0x1bac608, args_=0x32ea820) at /usr/include/boost/python/detail/caller.hpp:223
#5 0x00007ffc295700b1 in boost::python::objects::caller_py_function_impl<boost::python::detail::caller<void (pyOmega::*)(long, bool), boost::python::default_call_policies, boost::mpl::vector4<void, pyOmega&, long, bool> > >::operator() (this=0x1bac600, args=0x32ea820, kw=0x0) at /usr/include/boost/python/object/py_function.hpp:38
#6 0x00007ffc4308115b in boost::python::objects::function::call(_object*, _object*) const () from /usr/lib/libboost_python-py27.so.1.49.0
#7 0x00007ffc43081378 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
#8 0x00007ffc4308a39b in boost::python::handle_exception_impl(boost::function0<void>) () from /usr/lib/libboost_python-py27.so.1.49.0
#9 0x00007ffc4307f635 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
#10 0x00000000004acc66 in PyEval_EvalFrameEx ()
#11 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#12 0x0000000000536723 in ?? ()
#13 0x0000000000446bf2 in PyRun_FileExFlags ()
#14 0x0000000000446dbc in ?? ()
#15 0x00000000004ac5ce in PyEval_EvalFrameEx ()
#16 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#17 0x0000000000536723 in ?? ()
#18 0x0000000000446bf2 in PyRun_FileExFlags ()
#19 0x0000000000446dbc in ?? ()
#20 0x00000000004ac5ce in PyEval_EvalFrameEx ()
#21 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#22 0x00000000004acb98 in PyEval_EvalFrameEx ()
#23 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#24 0x00000000004acb98 in PyEval_EvalFrameEx ()
#25 0x00000000004b3fd8 in PyEval_EvalCodeEx ()
#26 0x0000000000536723 in ?? ()
#27 0x0000000000446bf2 in PyRun_FileExFlags ()
#28 0x00000000004470ec in PyRun_SimpleFileExFlags ()
#29 0x0000000000447cdc in Py_Main ()
#30 0x00007ffc4b85eead in __libc_start_main (main=<optimized out>, argc=<optimized out>, ubp_av=<optimized out>, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff53ea5838) at libc-start.c:228
#31 0x00000000004c7f39 in _start ()
segmentation fault

################end

I see that thread 1 is executing py/wrapper/yadeWrapper.cpp at lines 367 and 373 (according to the source code of this yade version, lines 367 and 371 ... 376 are the code of wait() ...).

Also very curious: the same script worked several times before. In the previous script I deleted some particles.
Example:
- script works well for deletion of any amount of particles (clumps)
- script works well for deletion of 20 particles (spheres)
- script ends with a segfault for deletion of 15 or 10 particles (spheres) ?!

Regards,

christian

Anton Gladky (gladky-anton) wrote : Re: [Bug 1105177] [NEW] segfault on run() after wait()

Hi Christian,

could you please provide a cut-down version of your script that
demonstrates the bug?

Thanks,

Anton

Christian Jakob (jakob-ifgt) wrote : Re: segfault on run() after wait()

Well, for the cut-down version: see above!

I could not reproduce the bug in a small script. It only appears in the complex one, which is too long to post here (>> 1000 lines). Next week I will send the parts of the script that I think could be responsible for this segfault.

Christian Jakob (jakob-ifgt) wrote :

Hi again,

Maybe the problem is in this part of the code?

### start

O.engines=O.engines+[PyRunner(iterPeriod=9999,command='calm()',label='calmRunner')]

def calm_for_damping():
 print 'Change damping values step 1: run to equilibrium (iter period = 999) ......................................'
 calmRunner.dead = False
 calmRunner.iterPeriod = 999
 run_to_equilibrium()

 print 'Change damping values step 2: run to equilibrium (iter period = 9999) ......................................'
 calmRunner.iterPeriod = 9999
 run_to_equilibrium()

 print 'Change damping values step 3: run to equilibrium (iter period = 99999) ......................................'
 calmRunner.iterPeriod = 99999
 run_to_equilibrium()

 print 'Change damping values step 4: run to equilibrium (calm function deactivated) ......................................'
 calmRunner.dead=True
 run_to_equilibrium()

integrator.damping=local_damping # integrator = NewtonIntegrator / local_damping = 0.3
print 'local damping changed to: ',local_damping
calm_for_damping()

###end

Or it is here (erasing some particles):

###start

 c_deleted = 0
 center_point = []
 for kk in range(0,number_macropores):
  z_o,V_body,V_model,poro_now = get_state_info(id_clump,V_load_clump)
  #get center of the model:
  center_point.append(Vector3((x_cu-x_cl)/2,(y_cu-y_cl)/2,(kk+1)*(z_o-z_cl+shift_l)/(number_macropores+1)))
  for jj in range(0,how_many_to_delete):
   min_dist = 1e9
   c_deleted += 1
   for b in O.bodies:
    if isinstance(b.shape,Sphere):

     #get erase pointers from particles nearest to center_point:

     vec_dist=b.state.pos - center_point[kk]
     dist = abs(vec_dist.norm())

     if dist < min_dist:
      min_dist = dist
      if (b.isClumpMember):
       erase_clump_id = b.clumpId #this is id of the clump, where sphere is part of
      erase_id = b.id
      erase_pointer = b
      dist_erased = dist
   #erase possible clump first to avoid a segmentation fault:
   if erase_pointer.isClumpMember:
    if erase_clump_id == id_clump:
     print '\033[31mCaution! The sphere, that should be erased is part of load clump with id ',id_clump,'. Nothing is erased!\033[0m'
    else:
     erase_clump_pointer = O.bodies[erase_clump_id]
     id_member1 = erase_clump_pointer.shape.members.keys()[0]
     id_member2 = erase_clump_pointer.shape.members.keys()[1]
     for ii in [id_member1,id_member2]: #erase all interactions between clump members and neighbor particles (see neverErase flag)
      for i in O.bodies[ii].intrs():
       O.interactions.erase(i.id1,i.id2)
     O.bodies.erase(id_member1) #erase body 1 of clump
     O.bodies.erase(id_member2) #erase body 2 of clump
     print 'Bodies ',id_member1,' and ',id_member2,' erased (part of clump ',erase_clump_id,'with distance ',dist_erased,' from center)'
     O.bodies.erase(erase_clump_id) #erase clump
     print 'Body was part of clump ',erase_clump_id,', so this clump has also been erased!'

   #erase single sphere:
   else:
    for i in erase_pointer.intrs(): #erase all interactions between sphere and neighbor particles
     O.interactions.erase(i.id1,i.id2)
    O.bodies.erase(erase_id) #erase sphere
    print 'Body ',er...


Christian Jakob (jakob-ifgt) wrote :

Hello again,

I was able to reproduce the bug with a small script (see attachment). The problem occurs when CapPhys is active and all if conditions are set to 1. You will get a "WARNING: cannot open files used for capillary law, ..." message. For this example the files needed for the capillary law are not necessary, so just ignore this warning ...

You should be able to load the savefile with the newest yade version (af444722f5).

I have no idea why this happens.

Please help me on this.

Regards,

Christian

Christian Jakob (jakob-ifgt) wrote :

Oops, I found a small mistake in the attached script: the line "once = 0" has to be moved to a place before the "for ..." loop. Sorry for this.
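
A minimal sketch of why the placement of such a flag matters. The attached script is not reproduced in this thread, so the function and its loop body are hypothetical stand-ins: if the flag is reset inside the loop, the "do this once" branch fires on every pass instead of only the first.

```python
def count_one_time_actions(reset_inside_loop, passes=3):
    """Count how often the guarded branch runs (hypothetical stand-in for the script)."""
    fired = 0
    once = 0  # correct placement: initialised before the loop
    for _ in range(passes):
        if reset_inside_loop:
            once = 0  # the mistake: the flag is reset on every pass
        if once == 0:
            fired += 1  # stands in for whatever the script should do only once
            once = 1
    return fired


print(count_one_time_actions(reset_inside_loop=True))
print(count_one_time_actions(reset_inside_loop=False))
```

With the reset inside the loop the guarded branch runs on all three passes; with the flag initialised before the loop it runs once.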

Anton Gladky (gladky-anton) wrote : Re: [Bug 1105177] Re: segfault on run() after wait()

It crashes here [1]:

interactionsOnBody[interaction->getId1()].remove(interaction);

I do not know the logic of this interactionsOnBody list, but it definitely
conflicts with the current implementation of InteractionContainer.

Anton

[1] https://github.com/yade/trunk/blob/master/pkg/dem/Law2_ScGeom_CapillaryPhys_Capillarity.cpp#L562

Christian Jakob (jakob-ifgt) wrote : Re: segfault on run() after wait()

Thanks a lot, Anton. This is useful information.
I do not understand why a remove() method in cap. phys. is executed while inserting facets ... ?!

summary: - segfault on run() after wait()
+ segfault on appending facets with active capillary law
Anton Gladky (gladky-anton) wrote : Re: [Bug 1105177] Re: segfault on run() after wait()

https://github.com/yade/trunk/blob/master/pkg/dem/Law2_ScGeom_CapillaryPhys_Capillarity.cpp#L176

if ((*ii)->isReal()) {
...
...
} else if (fusionDetection) bodiesMenisciiList.remove((*ii));

It is executed from here.

Anton
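
To make that call path concrete, here is a small Python model of the quoted branch. It is only a sketch: `Interaction`, `MeniscusList` and `action` are simplified stand-ins, not Yade's classes. It shows that every interaction that is not real reaches remove() when fusionDetection is on. That this includes the new, not yet real interactions involving appended facets is an assumption; the thread does not spell it out.

```python
from collections import namedtuple

# Simplified stand-in for a Yade interaction: two body ids plus a "real" flag.
Interaction = namedtuple("Interaction", "id1 id2 is_real")


class MeniscusList:
    """Records what was passed to remove() (stand-in for bodiesMenisciiList)."""

    def __init__(self):
        self.removed = []

    def remove(self, interaction):
        self.removed.append(interaction)


def action(interactions, menisci, fusion_detection):
    """Mirror of the quoted branch: real interactions are processed, others removed."""
    processed = []
    for i in interactions:
        if i.is_real:
            processed.append(i)  # the capillary force computation would happen here
        elif fusion_detection:
            menisci.remove(i)  # reached for every non-real interaction
    return processed


intrs = [Interaction(0, 1, True), Interaction(2, 7, False)]
menisci = MeniscusList()
processed = action(intrs, menisci, fusion_detection=True)
print(menisci.removed)
```

In this model no explicit "erase" is needed to reach remove(); the mere presence of a non-real interaction in the loop is enough.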

Bruno Chareyre (bruno-chareyre) wrote :

>> else if (fusionDetection) bodiesMenisciiList.remove((*ii));

Sounds like a call to me, but I do not have much time at the moment.
Clearly, this line was typed years ago (by Luc or me) for a specific type of simulation, and it was not anticipated that someone would insert facets in the middle of a simulation, and that it would happen after a change in contact logic...

There may be a need to re-design something. I'll try to answer basic questions on this problem, but I can't dig into the code immediately.

Anton Gladky (gladky-anton) wrote :

Fixed in 5b6667b892

The interactionsOnBody container did not account for newly added bodies.
Thus it crashed when interactionsOnBody.size()<maxBodyId

Anton
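
The failure mode can be sketched in Python, with the caveat that the real container is C++ and that the names here (`BodyMenisciModel`, `remove_unchecked`, `remove_checked`) are illustrative assumptions, not the code of commit 5b6667b892. One list per body id is allocated up front. Indexing with a newer, larger id fails unless the container is grown first. Python raises IndexError where the C++ code reads invalid memory.

```python
class BodyMenisciModel:
    """One list of interactions per body id, sized when the model is created."""

    def __init__(self, n_bodies):
        self.interactions_on_body = [[] for _ in range(n_bodies)]

    def remove_unchecked(self, id1, id2, interaction):
        # Behaviour as described for the old code: no size check before indexing.
        for body_id in (id1, id2):
            bucket = self.interactions_on_body[body_id]  # IndexError if body_id >= size
            if interaction in bucket:
                bucket.remove(interaction)

    def remove_checked(self, id1, id2, interaction):
        # One possible guard: grow the container to cover the largest id first.
        needed = max(id1, id2) + 1
        while len(self.interactions_on_body) < needed:
            self.interactions_on_body.append([])
        self.remove_unchecked(id1, id2, interaction)


model = BodyMenisciModel(n_bodies=5)  # ids 0..4 existed when the lists were sized
try:
    model.remove_unchecked(1, 7, "I(1,7)")  # body 7 was appended later
    crashed = False
except IndexError:
    crashed = True
print(crashed)

model.remove_checked(1, 7, "I(1,7)")  # grows the container, then removes safely
print(len(model.interactions_on_body))
```

Growing on demand is only one way to keep the container in step with the body ids; resizing once per step to the current number of bodies would serve the same purpose.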

Changed in yade:
status: New → Fix Released
importance: Undecided → Medium
Christian Jakob (jakob-ifgt) wrote :

Thank you very much Anton. It works now with the new version.
It also explains why the segfault occurs with 15 deleted balls, but not with 20 deleted balls ...
And it explains why the segfault never happened when I replaced some balls by clumps ...

Regards,

Christian
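
One way to read this explanation, sketched under an explicit assumption: that the body container hands freed ids to newly appended bodies before it issues ids beyond the old maximum. This thread does not confirm that policy, and the body and facet counts below are made up for illustration. Under that assumption, deleting enough spheres leaves room for all new facets below the old maximum id, while deleting fewer pushes some facets past it.

```python
def highest_id_after_append(n_bodies, n_deleted, n_appended):
    """Highest id in use after deleting and then appending bodies.

    Assumes freed ids are reused lowest-first (hypothetical allocation policy).
    """
    free_ids = list(range(n_deleted))  # pretend the lowest ids were the deleted ones
    used = set(range(n_deleted, n_bodies))
    next_new = n_bodies
    for _ in range(n_appended):
        if free_ids:
            used.add(free_ids.pop(0))  # reuse a freed id
        else:
            used.add(next_new)  # no free id left: extend past the old maximum
            next_new += 1
    return max(used)


OLD_SIZE = 100  # hypothetical number of bodies when the per-body lists were sized
FACETS = 18     # hypothetical number of appended facets
for deleted in (20, 15, 10):
    top = highest_id_after_append(OLD_SIZE, deleted, FACETS)
    print(deleted, top, "out of range" if top >= OLD_SIZE else "fits")
```

The same reading would cover the clump case, since the script above erases three bodies per clump deletion (two members plus the clump itself) and therefore frees more ids.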
