Performance optimization of InsertionSortCollider

Bug #729079 reported by Anton Gladky
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Yade
Fix Released
Wishlist
Unassigned

Bug Description

Sergei Dorofeenko (https://launchpad.net/~sergei.dorofeenko) found, that InsertionSortCollider is probably is a "bottle neck" in simulations with >10^5 number of particles even in many-threads mode.

http://<email address hidden>/msg06573.html

Сitation:
"...I did a perfomance test for parallel mode and results in no good.

Performance boost only about 40% from 1 thread to 4 thread for 200k particles... Cause is a non-parallelised InsertionSortCollider, who need about 80% time with 4 threads.

Results attached."

Revision history for this message
Anton Gladky (gladky-anton) wrote :
Revision history for this message
Anton Gladky (gladky-anton) wrote :
Revision history for this message
Anton Gladky (gladky-anton) wrote :
Changed in yade:
importance: Undecided → Wishlist
Revision history for this message
Anton Gladky (gladky-anton) wrote :

Sergei (and others), could you not, please, test this simple patch. It increases the calculation speed from 10% till 50%(!) for me.

If it is ok, we will commit it to the trunk.

Revision history for this message
Anton Gladky (gladky-anton) wrote : Re: [Yade-dev] [Bug 729079] Re: Performance optimization of InsertionSortCollider

> Yes, it is the insertion sort as in the collider.

Thanks,

I have created a branch and placed there this code for possible
following modifications of the algorithm.
https://code.launchpad.net/~yade-dev/yade/insertion-sort-test

To check it just:

scons; ./insertion-sort-test

Anton

Revision history for this message
Anton Gladky (gladky-anton) wrote :

I have implemented "pre-sorting" algorithm, which starts in parallel mode.
The "clean" sorting is done in 1-thread mode.

https://code.launchpad.net/~yade-dev/yade/insertion-sort-test

The benefit is about 20% of time on artificial tests. We'll see, how it will work in real work.

To test it, you just need to checkout the pointed branch and then "./build.py".

Revision history for this message
Anton Gladky (gladky-anton) wrote :

The following patch makes "pre-sorting" in parallel mode.
The clean final sorting is in one-thread mode.

I have done some performance measurements. In 4-thread-mode with 2*10^5 particles the calculation speed is higher on 20%, than without patch. No performance regression with smaller number of particles.

But anyway, sorting mechanism takes the most of calculation time.

Good to hear opinions.

Revision history for this message
Anton Gladky (gladky-anton) wrote :

Is it ok for everybody, if I apply this patch?

I do not see any problems and regressions with it.

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote :

How do you got a 20% speedup? I'm got no speedup at all.

Revision history for this message
Anton Gladky (gladky-anton) wrote :

I used yade --performance option. But it needs to be checked yet.

Anton

On Mon, Apr 4, 2011 at 5:44 PM, Sergei Dorofeenko <email address hidden> wrote:
> How do you got a 20% speedup? I'm got no speedup at all.
>
> --
> You received this bug notification because you are a member of Yade
> developers, which is the registrant for Yade.
> https://bugs.launchpad.net/bugs/729079
>
> Title:
>  Performance optimization of InsertionSortCollider
>
> Status in Yet Another Dynamic Engine:
>  New
>
> Bug description:
>  Sergei Dorofeenko  (https://launchpad.net/~sergei.dorofeenko) found,
>  that InsertionSortCollider is probably is a "bottle neck" in
>  simulations with >10^5 number of particles even in many-threads mode.
>
>  http://<email address hidden>/msg06573.html
>
>  Сitation:
>  "...I did a perfomance test for parallel mode and results in no good.
>
>  Performance boost only about 40% from 1 thread to 4 thread for 200k
>  particles... Cause is a non-parallelised InsertionSortCollider, who
>  need about 80% time with 4 threads.
>
>  Results attached."
>
> _______________________________________________
> Mailing list: https://launchpad.net/~yade-dev
> Post to     : <email address hidden>
> Unsubscribe : https://launchpad.net/~yade-dev
> More help   : https://help.launchpad.net/ListHelp
>

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote :

The collider can be optimized by tracking displacements of particles instead of recording max velocities, and also by a series of fixes in different places. Collider speedup is up to a factor 2 in some cases (e.g. Sergei's dynamic cloud above) compared with default collider behaviour in r2813. The gain in cpu-time is around 30% max ( Sergei's script, 125k spheres, 4 threads).

Quick question for branch experts: what is the simplest way to commit that to a branch. Reading bzr doc, I have the impression that I need to download a fresh branch wit "bzr branch lp:yade". Can't I do that directly with my local version?

Revision history for this message
Sergei Dorofeenko (sergei.dorofeenko) wrote : Re: [Yade-dev] [Bug 729079] Re: Performance optimization of InsertionSortCollider

15.04.2011 20:21, Chareyre:
> Quick question for branch experts: what is the simplest way to committhat to a branch. Reading bzr doc, I have the impression that I need todownload a fresh branch wit "bzr branch lp:yade". Can't I do thatdirectly with my local version?
Do you want to create a new branch on launchpad?
If no, and you want to create a new branch on your local system you need:

bzr branch local_trunk_dir new_branch_dir

You can update new branch with lp:yade by

bzr pull lp:yade

Is is what you need?

--
Best regards,
Sergei D.

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote : Re: [Yade-dev] [Bug 729079] Re: Performance optimization of InsertionSortCollider

Thanks Sergei, but I meant a new branch on launchpad, so you (and
others) can test it.

Revision history for this message
Anton Gladky (gladky-anton) wrote : Re: [Yade-dev] [Bug 729079] Re: Performance optimization of InsertionSortCollider

Hi, Bruno.

Try so:
https://code.launchpad.net/yade/+addbranch

create there a new branch and push there your local exemplar.

Anton

On Sat, Apr 16, 2011 at 8:55 PM, Chareyre <email address hidden> wrote:
> Thanks Sergei, but I meant a new branch on launchpad, so you (and
> others) can test it.
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/729079
>
> Title:
>  Performance optimization of InsertionSortCollider
>
> Status in Yet Another Dynamic Engine:
>  New
>
> Bug description:
>  Sergei Dorofeenko  (https://launchpad.net/~sergei.dorofeenko) found,
>  that InsertionSortCollider is probably is a "bottle neck" in
>  simulations with >10^5 number of particles even in many-threads mode.
>
>  http://<email address hidden>/msg06573.html
>
>  Сitation:
>  "...I did a perfomance test for parallel mode and results in no good.
>
>  Performance boost only about 40% from 1 thread to 4 thread for 200k
>  particles... Cause is a non-parallelised InsertionSortCollider, who
>  need about 80% time with 4 threads.
>
>  Results attached."
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/yade/+bug/729079/+subscribe
>

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote : Re: [Yade-dev] [Bug 729079] Re: Performance optimization of InsertionSortCollider

Ok, so you don't need to create a branch via launchpad interface finally.
You just type "bzr push --no-strict lp:~bruno-chareyre/yade/collide",
and it creates a branch called "collide".

You can get this branch with "bzr branch lp:~bruno-chareyre/yade/collide".

There are different sort of the changes (I'll need to clean before merging):
- 1. some are between #ifdef ORI_VERLET guards (mostly those adding some
data to bounds and engines), ORI_VERLET is defined by default, in Bound.hpp
- 2. some are hardcoded and apply always (whatever nBin, ORI_VERLET,
Collider::oriVerlet)
- 3. some are activated or not at runtime, depending on attribute
oriVerlet of the collider. If oriVerlet=false (true by default), the old
bins algorithm is used, but it is still improved on the basis of 1 and 2.

Using default behaviour of the branch will give some of the best
performance according to my tests (using the default verletDist=-0.15),
although verletDist<-0.15 can give better performance in some case (very
high number of particles).
The best speedup is apparently obtained with 100k particles and more,
but I didn't do any serious benchmark yet.

Any feedback welcome.

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote : Re: [Yade-dev] [Bug 729079] Re: Performance optimization of InsertionSortCollider

On 216k bodies (N=50 in Sergei's script) and 3 threads, the total
speedup is x1.75!
The collider speedup is x3.5, but more time is spent in interactions
(since verletDist is higher): x0.72 speed-down.
Performance should be even better with more threads, since the
collider/dispatcher are better balanced (50/50 with 3 threads).
I'm currently testing 4 threads.

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote : Re: [Yade-dev] [Bug 729079] Re: Performance optimization of InsertionSortCollider

> On 216k bodies (N=50 in Sergei's script) and 3 threads, the total
N=60, sorry.
With 4 threads, the speedup is ~x2.

Revision history for this message
Anton Gladky (gladky-anton) wrote :

Hi Bruno,

there is a comparison of new and old collider.
A newer one is slower, but it can be because of
tons of warnings like this:

931937 ERROR yade.InteractionContainer /home/gladk/dem/yade/cleanCompBruno/yade/core/InteractionContainer.cpp:55 erase: InteractionContainer::erase: attempt to delete non-existent interaction ##14889+15066

It can be related to the bug 813925 [1]

The script, what I used, creates and removes bodies very actively.

Speed, iter/sec
-j Old collider New_Collider:
1 7.94 6.12
2 13.85 11.4
3 14.83 14.2

[1] https://bugs.launchpad.net/yade/+bug/813925

Anton

Revision history for this message
Bruno Chareyre (bruno-chareyre) wrote :
Changed in yade:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.