Rare connection hang, on joining a hub

Bug #190964 reported by MikeJJ
This bug affects 3 people
Affects   Status      Importance   Assigned to   Milestone
AirDC++   Confirmed   Undecided    Unassigned
DC++      New         Low          Unassigned

Bug Description

This happened in 0.699 a lot more than now, but it is still happening in svn 1000.
Sometimes (not very often), connecting to a hub gets stuck. (I did report this on Bugzilla just before it disappeared.)

From hubframe

[14:34:17] *** An existing connection was forcibly closed by the remote host.
[14:34:21] *** Connecting to blah...
[14:34:22] *** Connected
[15:33:50] *** Disconnected <---- manual reconnect done here
[15:33:50] *** Connecting to blah...
[15:33:51] *** Connected
[15:33:54] *** Stored password sent...

Tags: core
MikeJJ (mrmikejj)
Changed in dcplusplus:
importance: Undecided → Low
Fredrik Ullner (ullner) wrote :

Todd experienced this on both ADC and NMDC hubs (with 0.699). Do you see it on both as well, with SVN 1000 (or, I assume, the later revisions)?

MikeJJ (mrmikejj) wrote :

I did see it with svn 1000 a few times. Got it a lot with 0.699.
I have not seen it yet with the svn version I am currently using (1026, I think). :)

MikeJJ (mrmikejj) wrote :

Just happened now with svn 1032. :(

Jacek Sieka (arnetheduck) wrote :

Is this the same as bug 185549?

MikeJJ (mrmikejj) wrote :

I'm not really sure about the "inf" and "remaining in identify" bits, but this one happens with NMDC hubs (not sure about ADC), and it happens with pretty much every hub, not just localhost.

Jacek Sieka (arnetheduck) wrote :

technically the fix I committed affects both nmdc and adc - if anyone reproduces this after bzr 1218, please mark this bug confirmed...

Changed in dcplusplus:
status: New → Incomplete
MikeJJ (mrmikejj) wrote :

Happened 2 hours ago with bzr 1218. I left it in the "Connected" state to see if it would recover, but it didn't.

Changed in dcplusplus:
status: Incomplete → Confirmed
MikeJJ (mrmikejj) wrote :

carmatic wrote in duplicate https://bugs.launchpad.net/dcplusplus/+bug/271716 :

I have a very problematic internet connection, and many times the connection attempt will stop at 'Connected' without getting the hub's name, the user list, etc. Those hubs which do get connected properly often then get the connection dropped, and when DC++ tries to reconnect, it ends up stuck at just 'Connected'. So sooner or later all the hubs end up 'Connected' without actually being properly connected.
I suggest that unless DC++ gets the user list or the hub title, or anything else which is a telltale sign of a proper connection, within a set amount of time, it should automatically reconnect, since that suggests there is a connection problem. Maybe the timeout could be something like 5 minutes, to allow for slow but functional connections and to stop the hubs from becoming suspicious of excessive reconnection attempts.
What do you guys think?
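
A minimal sketch of the timeout suggested above, written in Python rather than taken from the DC++ source. The class name `HandshakeWatchdog` and its methods are hypothetical; the only assumptions are that the client can tell when the TCP connection is established, when any hub message (hub name, user list, chat) arrives, and can poll a timer:

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HandshakeWatchdog:
    """Hypothetical helper: flags a hub that stays at 'Connected' with no data."""

    timeout_s: float = 300.0  # 5 minutes, the value proposed in the comment
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    connected_at: Optional[float] = None
    handshake_done: bool = False

    def on_connected(self) -> None:
        # Call when the socket reports 'Connected'.
        self.connected_at = self.clock()
        self.handshake_done = False

    def on_hub_data(self) -> None:
        # Call on the first hub name, user list, or other protocol message.
        self.handshake_done = True

    def should_reconnect(self) -> bool:
        # True only if we connected, saw nothing from the hub, and the
        # timeout has elapsed since the connection was established.
        if self.connected_at is None or self.handshake_done:
            return False
        return self.clock() - self.connected_at >= self.timeout_s
```

The client would call `should_reconnect()` from an existing periodic timer and, when it returns true, drop the socket and reconnect. The long default keeps slow but working connections from being cycled.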

Honzik (antosj) wrote :

Hello, I am having this exact problem on 95% of connection attempts. Also, DC++ seems to have trouble keeping the connection to the hub: it only works for a few minutes and then I get disconnected by the hub. My connection is reliable; I very rarely have any problems with it. I am using version 0.707.

Adrian Moș (adimosh) wrote :

I'd like to offer my 2 cents on this problem.

This happens to me almost every evening and I can blame my crappy ISP for that one. Here's what happens: every evening, between 8 and 10 PM (approximately), my ISP starts losing packets heavily because (naturally) most users on the network connect during that time. (Just for laughs: I am on the "old" system, using coaxial cable + modem as the connection method; the "new" system uses fiber optic up to the blocks of flats, then UTP cable down to the apartments. The "new" system also makes PPPoE mandatory, and it so happens that during the "connection rush" every evening, many are left out with the error "Not enough IP addresses in pool". This is just to show how crappy my ISP is.)

Anyway... every now and then the connection works for one or two ping packets, then starts packet-losing again. If there is a connection to a hub in progress during this time, chances are that it will remain hung precisely as MikeJJ pointed out in the first post. I've also noticed that this happens a lot more often if the client is on a PC behind a router.

Sockets will (unfortunately) not time out in this case if they are expecting to receive something. I've seen this strange behavior not only in DC++ (and every DC client based on it that I've tested until now: StrongDC++ (not sure about the latest version, though), ApexDC++, RSX++ (this seems to be the one in which it happens the most), zK++, AirDC++), but also in other software (some of which I created myself) that does not implement a connection timeout mechanism above the standard sockets connection timeout.

What I suggest as a fix: a "LastCommunicationTime" value associated with every hub connection; If this goes past a certain limit (for instance 10 minutes), make the client send something harmless to the hub (kind of like the NOOP in FTP). If the hub responds, all's well. If it doesn't, connection has failed, so close and release associated sockets. This value should be updated every time there is an active communication to/from the hub.

A question about this solution: will it put a lot of stress on the client? In theory, it should not cause any noticeable slowdown (considering that nobody sane would attempt connecting to more than, say, 100-150 hubs). It also shouldn't have much impact on memory or CPU usage. It could also be reduced to updating this value only when receiving data from the hub (in the software over which I have complete source control, I noticed the hang happens when waiting to receive; for reasons unknown to me, certain software considered that it had sent the data correctly, although no data ever got to the other party).

Does this sound too difficult / unworthy ?

PS: if this is already implemented, it isn't working properly.
PS2: under normal circumstances, this system should not generate any extra traffic, considering that on hubs with even a few users, data is received from the hub quite often, so the NOOP-like message should never have the reason to be sent.
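
The "LastCommunicationTime" idea above can be sketched roughly as follows. This is Python for illustration only, not DC++ code: `HubKeepAlive`, `send_probe`, and the 30-second reply window are all assumptions added here, and the timestamp is updated only on receive, as the comment's own refinement suggests. What the harmless probe message would be depends on the hub protocol and is left to the caller.

```python
import time
from typing import Callable, Optional


class HubKeepAlive:
    """Hypothetical per-hub idle tracker driven by a periodic tick()."""

    def __init__(
        self,
        send_probe: Callable[[], None],  # sends a harmless NOOP-like message
        idle_limit_s: float = 600.0,     # 10 minutes, as in the comment
        reply_wait_s: float = 30.0,      # assumed grace period for a reply
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send_probe = send_probe
        self._idle_limit_s = idle_limit_s
        self._reply_wait_s = reply_wait_s
        self._clock = clock
        self._last_rx = clock()          # the "LastCommunicationTime" value
        self._probe_sent_at: Optional[float] = None

    def on_receive(self) -> None:
        # Any data from the hub proves the connection is alive.
        self._last_rx = self._clock()
        self._probe_sent_at = None

    def tick(self) -> str:
        # Returns "alive", "probing", or "dead" (caller should close the socket).
        now = self._clock()
        if self._probe_sent_at is not None:
            if now - self._probe_sent_at >= self._reply_wait_s:
                return "dead"
            return "probing"
        if now - self._last_rx >= self._idle_limit_s:
            self._send_probe()
            self._probe_sent_at = now
            return "probing"
        return "alive"
```

On a hub with regular traffic, `on_receive()` keeps resetting the timestamp, so the probe is never sent, which matches the point made in PS2. The per-hub cost is one timestamp comparison per tick.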

Petrus (trucker-boy550)
description: updated
poy (poy)
Changed in dcplusplus:
status: Confirmed → Fix Committed
poy (poy) wrote :

Fixed in DC++ 0.820.

Changed in dcplusplus:
status: Fix Committed → Fix Released
LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

To me it seems that it is still not fixed. I am running r3336 for the time being and I experienced the following:

[23:07] *** Connecting to <DcHub://********:***>... <-- connection attempt at the startup of the client
[23:07] *** Connected
[16:21] *** Disconnected <-- manually reconnected to the hub
[16:21] *** Connecting to <DcHub://*********:***>...
[16:21] *** Connected
[16:21] <****> ***************** (RunTime: 1weeks 6days / Current user count: 46)

Obviously the hub was available at the time of the first connection attempt.

Here is further evidence from the same client session, showing that this happens not only at startup (when many connections need to be established at once), but can occur at any time:

[10:48] *** An existing connection was forcibly closed by the remote host.
[10:50] *** Connecting to <******:***>...
[10:50] *** Connected
[10:50] <[*****]> This hub is running Aquila Version 0.1.11-pre2-beta2 (Uptime 10 weeks, 02:01:04.042).
[10:50] *** Connection closed
[10:52] *** Connecting to <******:***>...
[10:53] *** Connection timeout
[10:55] *** Connecting to <******:***>...
[10:55] *** Connected
[16:24] *** Disconnected <-- manual reconnect issued
[16:24] *** Connecting to <******:***>...
[16:24] *** Connected
[16:24] <[*****]> This hub is running Aquila Version 0.1.11-pre2-beta2 (Uptime 10 weeks, 07:35:29.065).

eMTee (realprogger)
Changed in dcplusplus:
status: Fix Released → New
Fredrik Ullner (ullner)
tags: added: core
LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

While I suppose this one is rather hard to fix, too, please check whether the following proposal is good enough to implement and finally get rid of this issue.
If I am not mistaken, when the client is truly connected, there must be at least one user on the hub: the client itself.
Based on this, I was wondering whether it would be possible to do a user count several seconds after the client reaches the connected state. If the number of users cannot be determined (or equals zero), the client should issue an automatic reconnect.
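
The user-count check proposed above comes down to a single predicate. A minimal sketch in Python, where the function name `needs_reconnect` and the 10-second grace period are assumptions standing in for "several seconds", and `None` stands for "user count could not be determined":

```python
from typing import Optional


def needs_reconnect(
    seconds_since_connected: float,
    user_count: Optional[int],
    grace_s: float = 10.0,  # assumed value for "several seconds"
) -> bool:
    """Return True if the hub still shows no users after the grace period."""
    if seconds_since_connected < grace_s:
        return False  # too early to judge; the user list may still be arriving
    return user_count is None or user_count == 0
```

For example, `needs_reconnect(15.0, 0)` is true, while `needs_reconnect(15.0, 46)` and `needs_reconnect(3.0, 0)` are both false. On a very slow link the grace period would have to be long enough for the user list to arrive, or the client would reconnect in a loop.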

maksis (maksis)
Changed in airdcpp:
status: New → Confirmed