Slow startup with too many networks

Bug #1205527 reported by Eduardo rocha
This bug affects 1 person
Affects: Haguichi
Status: Fix Released
Importance: Medium
Assigned to: Stephen Brandt

Bug Description

- With more than 70 networks of 20 machines each, Haguichi takes a long time to start and to refresh networks (F5).

- I've tested with the following hardware:
  - AMD FX-8120 3.1GHz (8-cores) with 8GB RAM: about 7 seconds to start.
  - AMD Phenom II X4 3.0 GHz with 8GB RAM: about 35 seconds to start.
  - Intel Core2 Quad (4-cores) with 8GB RAM: about 30 seconds to start.
  - AMD Phenom II X2 with 8GB RAM: about 1 minute to start.

- During the startup time, the whole machine becomes incredibly slow, and we had to disable the 'Update the network list every X seconds' feature to make it usable after it started.

- Tinkering with the sources, I've found that class Member starts a new Thread in method 'GetLongNick' every time it finds a name longer than 25 characters, in order to call "hamachi peer" and get the full name.
  - It may also start a new thread calling "hamachi peer" when it needs to get the addresses in method 'GetCompleteAddresses'.

- When many threads try to call the "hamachi" client, it is bound to return some "busy" messages, and Haguichi keeps on trying until it gets an actual response, making the situation even worse.

- Proposed solution:
  - Create some kind of 'Event Dispatch Queue/Thread' that processes these events in a single separate thread, instead of creating so many threads.
  - Because the "hamachi" client seems to be single-threaded as well, I don't think it will actually take longer to fetch all the long names.
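
For illustration, the proposed dispatch queue could look roughly like the sketch below. It is only a minimal sketch, assuming .NET 4 or later (for BlockingCollection); CommandQueue and its members are hypothetical names and are not taken from the Haguichi sources. Jobs are handed over from any thread and executed one at a time on a single background thread, so the "hamachi" client would never see more than one request at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper (not part of Haguichi): runs queued jobs one at a
// time, in submission order, on a single background thread.
public sealed class CommandQueue : IDisposable
{
    private readonly BlockingCollection<Action> jobs = new BlockingCollection<Action>();
    private readonly Thread worker;

    public CommandQueue()
    {
        worker = new Thread(Run) { IsBackground = true, Name = "CommandQueue" };
        worker.Start();
    }

    // Can be called from any thread; returns immediately.
    public void Enqueue(Action job)
    {
        jobs.Add(job);
    }

    private void Run()
    {
        // GetConsumingEnumerable blocks until a job is available and ends
        // once CompleteAdding has been called and the queue is empty.
        foreach (Action job in jobs.GetConsumingEnumerable())
        {
            try
            {
                job();
            }
            catch (Exception e)
            {
                // One failing job should not stop the worker.
                Console.Error.WriteLine(e.Message);
            }
        }
    }

    // Stops accepting jobs, lets the worker drain the queue, then waits for it.
    public void Dispose()
    {
        jobs.CompleteAdding();
        worker.Join();
        jobs.Dispose();
    }
}
```

A caller would enqueue one job per member, for example a lambda that runs "hamachi peer" for that member and stores the result, instead of starting one thread per member.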

Eduardo rocha (joao-eduardo-rocha) wrote :

- Just tried commenting out the lines that start a new Thread in Member.cs, and the results were great!

- On the AMD Phenon 2 X4, it took only 3 seconds to start (instead of 35)

Stephen Brandt (ztefn) wrote :

What Hamachi version do you use? GetCompleteAddressesThread is only run on version 2.1.0.81 or older. I would not recommend using these old versions because they trim the IP addresses of peers.

Eduardo rocha (joao-eduardo-rocha) wrote :

- I'm using Hamachi 2.1.0.86 and 2.1.0.101. None of my machines is using 2.1.0.81 or older.

Stephen Brandt (ztefn) wrote :

Okay, then I assume we can solely focus on the GetLongNickThread processes?

To get an overview of the situation, how many members with a long (>= 25) nickname are roughly present in your list? Is it only one and the same member which is present in all 70 networks? Or are there many different members with long nicknames unique to each network?

PS: If your network list doesn't contain any extremely sensitive data you could also send me the output of "hamachi list" in a file attached to a private e-mail.

Stephen Brandt (ztefn) wrote :

Hi Eduardo,

I have successfully received your e-mail with the output of "hamachi list". As you indicated that the list contains more than 120 unique members with nicknames of 25 characters or more, I think at least the following subjects are candidates for optimization:
* Faster throughput of command processes.
* Member.GetLongNickThread: No need to request long nick again when updating if the first 25 characters are unchanged.
* Network.DetermineOwnershipThread: Since version 2.1.0.68 network ownership can be extracted from the network list, full network info is only needed when the owner is "This computer", so that we can read the lock and approval state.

I'm going to address these subjects one by one, so that you can test all changes individually and check if each of them is actually an improvement in your case.

For now I've addressed the first one. Ironically, a piece of code that waited while a hamachi command was in progress (to prevent Haguichi from "hammering") seems to slow things down. Therefore I made the following revision:
http://bazaar.launchpad.net/~ztefn/haguichi/trunk/revision/354

This should already make a big difference, because all the GetLongNickThreads are executed much more rapidly. But please confirm whether this is true on your machine(s) too.

Changed in haguichi:
status: New → In Progress
assignee: nobody → Stephen Brandt (ztefn)
milestone: none → 1.0.22
importance: Undecided → Medium
Eduardo rocha (joao-eduardo-rocha) wrote :

Hi Stephen!

I've tested your last commit, and your changes seem to have worked, but it still looks a bit unstable: sometimes the startup is very fast, and sometimes it is still slow. The good thing is that the machine doesn't hang anymore. =)

Here are my new timings:
  - AMD FX-8120 3.1GHz (8-cores) with 8GB RAM: from 2 to 7 seconds to start.
  - AMD Phenom II X4 3.0 GHz with 8GB RAM: from 3 to 15 seconds to start.

As you can see, in the best case, the startup time dropped from 35 seconds to 3.

I consider it a huge improvement, but the fact that a new thread is created for every unique member whose nickname has 25 characters or more still concerns me (think about a thousand unique members: will a thousand threads still be ok?)

I'm looking forward to testing your future fixes.

Stephen Brandt (ztefn) wrote :

Hi Eduardo,

Thanks for your feedback. The occasional random slow startup is indeed a little spooky. I myself often experience different startup speeds caused by the internet connection check in Controller.HasInternetConnection. I attempted to improve this check a few versions ago by using Dns.GetHostAddresses instead of TcpClient (http://bazaar.launchpad.net/~ztefn/haguichi/1.0/revision/293), but it's still not quite optimal. You can check whether this function is the bottleneck by directly returning "true" inside it and seeing whether the startup speed is consistent after that.

Furthermore, I share your concern regarding the number of threads. I've just made a change to add these threads to the thread pool instead of spawning a separate thread for each of them:
http://bazaar.launchpad.net/~ztefn/haguichi/trunk/revision/355
I don't expect any speed improvements from this change, but I guess it should prevent (slow) machines from overcooking. =)
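
As an illustration of the thread pool idea (not the contents of revision 355, which are not shown in this thread), here is a minimal sketch assuming .NET 4 or later for CountdownEvent. LongNickLookups, RunAll and the lookup delegate are hypothetical names. The pool reuses a bounded set of worker threads, so a thousand members no longer means a thousand threads.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not Haguichi code): runs one lookup per member id on
// the shared thread pool instead of starting a dedicated thread for each.
public static class LongNickLookups
{
    // Queues every lookup, then blocks until all of them have finished.
    public static void RunAll(string[] memberIds, Action<string> lookup)
    {
        using (var done = new CountdownEvent(memberIds.Length))
        {
            foreach (string id in memberIds)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        lookup((string)state);
                    }
                    catch (Exception e)
                    {
                        // An unhandled exception on a pool thread would
                        // terminate the process, so report it and carry on.
                        Console.Error.WriteLine(e.Message);
                    }
                    finally
                    {
                        done.Signal();
                    }
                }, id);
            }

            done.Wait();
        }
    }
}
```

In a GUI application the caller would normally not block on the last step; the wait is only there to keep the sketch self-contained and easy to check.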

Stephen Brandt (ztefn) wrote :

Made a revision that tackles subject two. Updating the network list should now work without any problems:
http://bazaar.launchpad.net/~ztefn/haguichi/trunk/revision/356
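
Subject two was to skip the long nick request on updates when the first 25 characters are unchanged. The sketch below shows one way that idea could be expressed; it is not the code from revision 356. LongNickCache, Resolve and fetchLongNick are hypothetical names, with fetchLongNick standing in for the expensive "hamachi peer" call. The class is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache (not Haguichi code): remembers the full nickname per
// member and only asks again when the trimmed nickname in the list changes.
public sealed class LongNickCache
{
    // Nicknames of this length or more count as "long" in this thread.
    private const int TrimLength = 25;

    private sealed class Entry
    {
        public string ListedNick;
        public string LongNick;
    }

    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();

    // Returns the nickname to display for a member. fetchLongNick is only
    // invoked when the listed nickname may be trimmed and is not cached yet.
    public string Resolve(string memberId, string listedNick, Func<string, string> fetchLongNick)
    {
        if (listedNick.Length < TrimLength)
        {
            return listedNick; // short enough to be complete already
        }

        Entry cached;
        if (entries.TryGetValue(memberId, out cached) && cached.ListedNick == listedNick)
        {
            return cached.LongNick; // first 25 characters unchanged: reuse
        }

        string longNick = fetchLongNick(memberId);
        entries[memberId] = new Entry { ListedNick = listedNick, LongNick = longNick };
        return longNick;
    }
}
```

The trade-off is that a rename which only changes characters beyond the trimmed part goes unnoticed until a full refresh.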

Stephen Brandt (ztefn) wrote :

Made a revision that tackles subject three. In your case this should circumvent 70 "hamachi network" commands:
http://bazaar.launchpad.net/~ztefn/haguichi/trunk/revision/357

Eduardo rocha (joao-eduardo-rocha) wrote :

Hi Stephen!

Tested revisions 356 and 357, and both had about the same results:
  - AMD FX-8120 3.1GHz (8-cores) with 8GB RAM: from 2 to 7 seconds to start, 8 seconds to load all nicknames.
  - AMD Phenom II X4 3.0 GHz with 8GB RAM: from 2 to 22 seconds to start, 12 to 32 seconds to load all nicknames.
  - AMD Phenom II X2 with 8GB RAM: same results as the Phenom II X4.

With revision 357, I tried returning "true" in Controller.HasInternetConnection, and had these results:
  - AMD FX-8120 3.1GHz (8-cores) with 8GB RAM: 1 to 2 seconds to start, 3 seconds to load all nicknames.
  - AMD Phenom II X4 3.0 GHz with 8GB RAM: from 2 to 3 seconds to start, 9 to 12 seconds to load all nicknames.
  - AMD Phenom II X2 with 8GB RAM: same results as the Phenom II X4.

Well, the 'Dns.GetHostAddresses ( "www.google.com" )' call in Controller.HasInternetConnection is clearly slowing the whole process down a LOT.

I measured my DNS server using 'time dig www.google.com > /dev/null', and the response took from 100 ms to 1200 ms (well, my ISP's DNS server doesn't seem to be helping much...)

I also tried pinging the main Google DNS server (time ping -c 1 8.8.8.8 > /dev/null) and the main OpenDNS.org DNS server (time ping -c 1 208.67.222.222 > /dev/null), and the response rarely took more than 150 ms in either case.

Stephen Brandt (ztefn) wrote :

Interesting! What's the result when you use the following statement in Controller.HasInternetConnection:

return new Ping ().Send ( "8.8.8.8" ).Status == IPStatus.Success;

Eduardo rocha (joao-eduardo-rocha) wrote :

Tried adding your "ping" statement, and the results were almost the same as "returning true". =)

Stephen Brandt (ztefn) wrote :

I guess that's great news! I've committed the new method:
http://bazaar.launchpad.net/~ztefn/haguichi/trunk/revision/358
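
For reference, a slightly more defensive form of the ping statement is sketched below. This is not the code from revision 358, which is not shown in this thread; the Connectivity class name, the default host and the one second timeout are assumptions. It adds an explicit timeout and treats a failed ping as "no connection" instead of letting the exception escape.

```csharp
using System;
using System.Net.NetworkInformation;

// Hypothetical wrapper (not Haguichi code) around the ping based check.
public static class Connectivity
{
    // Returns true when a single ICMP echo to the host succeeds within the
    // timeout. The default host and timeout are assumed values.
    public static bool HasInternetConnection(string host = "8.8.8.8", int timeoutMs = 1000)
    {
        try
        {
            using (var ping = new Ping())
            {
                return ping.Send(host, timeoutMs).Status == IPStatus.Success;
            }
        }
        catch (PingException)
        {
            // Thrown for instance when the host name cannot be resolved
            // or the echo request cannot be sent.
            return false;
        }
    }
}
```

Pinging a literal IP address avoids the DNS lookup that was measured as slow above, but networks that block ICMP would report no connection even when one exists.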

Okay, so let's take stock. Is everything working smoothly now?

Eduardo rocha (joao-eduardo-rocha) wrote :

Well, I do consider the issue completely resolved! My machines are not freezing anymore and the startup time is pretty acceptable!

I think that in the end, the whole app got faster!

Stephen Brandt (ztefn) wrote :

Good to hear! I'll mark this bug fixed.

Thanks for all your testing and providing me with some new insights!

Changed in haguichi:
status: In Progress → Fix Committed
Eduardo rocha (joao-eduardo-rocha) wrote :

Hi Stephen!

After all those changes made in revisions 356 and 367, I noticed a strange behavior in which some nicknames are blank until you hit F5 for a full refresh.

Never saw this before in version 1.0.21, but it happened twice since the last changes, so I thought it may be some kind of regression.

- Screenshot in attachment.

Stephen Brandt (ztefn) wrote :

Hi Eduardo,

At first glance I don't see any obvious holes in the new if-else switch in Member.GetLongNick, so could you first tell me whether this happens directly on startup or later, after some update cycles?

Eduardo rocha (joao-eduardo-rocha) wrote :

Hi Stephen!

I only noticed blank nicknames right after startup, and never experienced them after some update cycles.

Could it be related to the fact that my Hamachi runs as a service and I keep all options disabled in "Preferences -> General" ?

Stephen Brandt (ztefn) wrote :

Hi Eduardo,

Could you launch Haguichi from the terminal using:
haguichi -d | grep -C 1 xxx-xxx-xxx

Where xxx-xxx-xxx is the id of any member that gets a blank nickname. And then please paste the output in a comment here or, if you prefer, attach it to a private e-mail.

Eduardo rocha (joao-eduardo-rocha) wrote :

Hi Stephen,

The blank nicknames occur at random times, so I can't do what you asked because I don't know which member will get a blank nick, and lately I haven't even noticed this issue anymore.

Anyway, I'll start launching Haguichi with 'haguichi -d > /tmp/log.txt', and when a blank nick appears, I'll send the log file to you.

Stephen Brandt (ztefn) wrote :

Or, if you have not "installed" the trunk version but you have built a binary, you can execute it directly from that location, like:
mono ./Haguichi.exe -d | grep -C 1 xxx-xxx-xxx

Stephen Brandt (ztefn) wrote :

Okay, then it might just be a hamachi glitch where the "hamachi peer" command lists an empty nickname. But evidence to confirm this suspicion is very welcome indeed.

Stephen Brandt (ztefn) wrote :

Hi Eduardo,

Have you spotted any blank nicknames lately?

Eduardo rocha (joao-eduardo-rocha) wrote :

Hi Stephen!

Haven't noticed blank nicknames anymore, maybe because I keep all my networks collapsed most of the time.

If I spot a blank nick I'll send you the logfiles!

Thanks anyway!

Stephen Brandt (ztefn)
Changed in haguichi:
status: Fix Committed → Fix Released