crash after add particles in parallel mode

Bug #724396 reported by Sergei Dorofeenko
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Yade
Fix Released
High
Unassigned

Bug Description

Yade crash in parallel mode if to add particles after run of simulation.
Attached script runs without errors in serial mode (-j 1 option),
but in parallel mode (-j 2 or more) it crash at second O.run.

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :
Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :

Seems, something wrong with ForceContainer, see attached log file.

Revision history for this message
Anton Gladky (gladky-anton) wrote :

Hi, Sergei.

I have tried your script, but I cannot reproduce the crash.

Will try on other systems.

Revision history for this message
Anton Gladky (gladky-anton) wrote :

I have tried it on Ubuntu 10.04

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :

Seems, cause is next:

After adding new particles, the first thread runs on the "old" particles, so without resizing of _forceData in ensureSize() (ForceContainer.hpp:87)
And at time, when first thread adds new force to _forceData, second thread starts to run on the "new" particles and do resize of _forceData in ensureSize(). After this first thread have segfault.

Bug fixed after adding "forces.resize(bodies->size());" to Scene.cpp, after line 76.

Vaclav, could not you fix it?

Revision history for this message
Václav Šmilauer (eudoxos) wrote :

Yes, you're right, the locking logic in ForceContainer::resize is not sufficient. What you propose does not solve the problem of adding bodies inside the loop, though -- that is what a factory does, for instance.

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :

 I want to rewrite ensureSize() in order to the each thread do resize only the "own" vector, not all.
So, no need locking at all. Vaclav, do you agree?

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :

Apparently, _forceData will not store vvector, but vvector*.
Because, if it will store vvector (as now) then _forceData[A].resize(N) can affect _forceData[B]. Or no?

Revision history for this message
Václav Šmilauer (eudoxos) wrote :

I don't think so. I think sizeof(vector<T>) is always sizeof(size_t)+sizeof(void*), the array is allocated on the heap dynamically. For the independent resizing: sounds fine, though sizes should be re-synced in sync() before the summation loop, otherwise it will run slower due to in-loop size checks.

Changed in yade:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Anton Gladky (gladky-anton) wrote :

I have done some performance tests on 6-core machine.

The simulation have been carried out with different thread numbers and execution time has been measured. In the first case it was the latest trunk revision (2781) clean compile, in the second case the ForceContainer used only "non-parallel flavor" even in multy-threading mode, so the "parallel flavor" have been deleted from ForceContainer.hpp (patch provided).

Number of bodies was 47412, number of iterations was 5000.

Results are presented below (execution time).

-j Parallel Non-Parallel %
------------------------------------------------------------
1 997 984 1.3
2 821 800 2.6
3 764 736 3.6
4 752 709 5,7
5 745 699 6,2
6 742 693 6,6

So using non-parallel ForceContainer makes the simulation sufficiently faster (from 1 to 6% and more).

My proposal is to delete parallel implementation of ForceContainer. It will increase the productivity and "fixes" this bug.

What ideas and opinions?

Revision history for this message
Anton Gladky (gladky-anton) wrote :
Revision history for this message
Anton Gladky (gladky-anton) wrote :
Revision history for this message
Anton Gladky (gladky-anton) wrote :
Revision history for this message
Václav Šmilauer (eudoxos) wrote :

Do you mean that you used non-parallel ForceContainer with OpenMP-enabled simulation run with multiple threads? If so... threads could have been overwriting each other's data. https://www.yade-dem.org/doc/prog.html#forcecontainer explains why threads must have separate data (unless you want to use locking, which will be very very likely much less efficient).

Revision history for this message
Anton Gladky (gladky-anton) wrote :

Yes, you are right. The proposed way breaks simulations (--check does not path with j>1).

I have done locks in non-parallel implementation, and the performance goes down, as you said:

-j Parallel Non-Parallel %
------------------------------------
1 997 1002 -0.5
2 821 842 -2.6
3 764 825 -8
4 752 908 -20.7
5 745 1132 -51.9
6 742 1180 -59

Revision history for this message
Anton Gladky (gladky-anton) wrote :

I tried to implement an idea, indicated here: https://bugs.launchpad.net/yade/+bug/724396/comments/7

Sergei, could you not, please, test it?
Vaclav, could you not, please, have a look at it?

Patch is attached.

Thanks

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :

Fixed path is attached.
Compilation and test model.py from this thread is ok.
Need test of reading forces from python and perfomance benchmark.

Revision history for this message
Anton Gladky (gladky-anton) wrote :

The last patch gives a segmentation fault in --test:

yade --test
Math: Matrix3 operations ... ok
Math: Quaternion operations ... ok
Math: Vector2 operations ... ok
Math: Vector3 operations ... ok
Core: correct types are instantiated ... ok
Core: dispatcher ctors with functors ... ok
Core: Attr::hidden ... ok
Core: InteractionLoop special ctor ... ok
Core: invalid attribute access raises AttributeError ... ok
Core: Attr::noSave ... Segmentation fault

I dont have a debug build for the moment to provide a backtrace.

Revision history for this message
Anton Gladky (gladky-anton) wrote : Re: [Bug 724396] Re: crash after add particles in parallel mode

Backtrace is attached

Anton

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :

fixed

Revision history for this message
Anton Gladky (gladky-anton) wrote :

I have checked the last patch, all is working well for me!
The performance is slightly higher now, than without patch, so I think we did not break the performance.

I modified model.py a little bit (attached), now it reads forces of every body on each step. I did not see any problem with that as well. So our doubts about incorrect work of get*Single() functions are empty (I hope).

Actually, I have got finally a crash on unpatched version of YADE, there is no problem on patched version.

Vaclav, could you not, please, have a look at last patch https://bugs.launchpad.net/yade/+bug/724396/+attachment/1908010/+files/fix_724396-v3.diff

If it is ok, we will apply it on all active branches of Yade.

Thanks.

Revision history for this message
Anton Gladky (gladky-anton) wrote :

Fixed in r2788.

Thanks all!

Changed in yade:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers