Threading issues in hub management

Bug #1194299 reported by maksis
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
DC++
Fix Released
High
Unassigned

Bug Description

AdcHub::info is called from various different threads (timer, refresh, hub, GUI) but it's not thread safe. lastInfoMap is unprotected while it can be modified from different threads simultaneously. This has been fixed in AirDC++ by syncing the info calls from other threads to the hub thread.

Client::updateCounts is another function that shouldn't be called from different threads because the hub counts may go wrong. This started to cause real problems in AirDC++ when the info calls from TimerManager were synced to the hub threads, which caused them to get executed asynchronously all the time.

Revision history for this message
poy (poy) wrote :

both functions are only called from ClientManager under a locked mutex so it's safe. they are public indeed, which might raise some questions; but the only way for an external party to get a Client instance is to call ClientManager::getClients() which is documented as requiring a lock, so that's safe too.

Changed in dcplusplus:
status: New → Invalid
Revision history for this message
maksis (maksis) wrote :

Sorry, some of those really don't apply to DC++. But what you are saying still isn't true. The info function is called from the hub thread (handling of SID command) and from ClientManager. The crashes that AirDC users were experiencing happened when their file list refresh finished and they were joining hubs on the same time. What comes to updataCounts, it's called from the hub thread (handling of INF and SID commands and various NMDC commands) and from the GUI thread (when the client object is deleted). It's very unlikely to go wrong with that few async calls but still possible.

Changed in dcplusplus:
status: Invalid → New
Revision history for this message
maksis (maksis) wrote :

Ah, tricky things to track :) The SID command won't cause problems with the SID command because the state isn't normal yet. What comes to INF, the state is just changed to normal before the info function is called (which also triggers updateCounts). I'm not sure about NMDC because I don't know it that well.

Revision history for this message
maksis (maksis) wrote :

Another correction for updateCounts: the call from the client destructor won't cause any problems, so only ClientManager and the hub threads are left also with this one.

Revision history for this message
maksis (maksis) wrote :

Patch for ADC with minimum extra locking.

Revision history for this message
maksis (maksis) wrote :

It's not that good idea to call info from inside a lock....

Revision history for this message
poy (poy) wrote :

updateCounts is thread-safe since it uses atomic counts and the Identity class, both of which are thread-safe. :)

however, that info call does look like it might cause problems.

Revision history for this message
maksis (maksis) wrote :

updateCounts is not thread safe what comes to counting the hubs correctly.

 if(countType != COUNT_UNCOUNTED) {
  --counts[countType];
  countType = COUNT_UNCOUNTED;
 }

When two threads access that function simultaneously, the second thread may see the type as uncounted because it has been changed by the first thread. This means that the second thread won't decrease the hub count as it should.

 if(!aRemove) {
  if(getMyIdentity().isOp()) {
   countType = COUNT_OP;
  } else if(getMyIdentity().isRegistered()) {
   countType = COUNT_REGISTERED;
  } else {
   countType = COUNT_NORMAL;
  }
  ++counts[countType];
 }

The count is still incremented by both threads. When AirDC started to call this function simultaneously once per minute, it usually took a day or so before users started to get extra hubs in their tags.

Revision history for this message
poy (poy) wrote :

see rev 3315 (not tested).

Changed in dcplusplus:
importance: Undecided → High
status: New → Fix Committed
Revision history for this message
maksis (maksis) wrote :

Adding the async call probably caused another threading issue. Since the client is destroyed from the GUI thread, the async funtion may be executed for the deleted object because it doesn't go through the listener like the other calls from BufferedSocket. AirDC++ has been modified to pass the deletion function to the socket with the shutdown call as lambda (it might be more simple to pass that as void* though).

Revision history for this message
maksis (maksis) wrote :

Or possibly BufferedSocket should always have a child object that gets deleted when the shutdown function is handled... that would directly make it safe to use the async call for all sockets without taking care of the deletion.

Revision history for this message
poy (poy) wrote :

i thought async calls couldn't run after a disconnect request but a closer look showed me they could... rev 3316 should fix this.

Revision history for this message
maksis (maksis) wrote :

Hmm, is the fix correct? The async call could be in progress when the client object is deleted.

Revision history for this message
poy (poy) wrote :

would the following be enough?

the same lock is entered in the shutdown function, which is called before deleting Client objects.

=== modified file 'dcpp/BufferedSocket.cpp'
--- dcpp/BufferedSocket.cpp 2013-06-25 15:45:51 +0000
+++ dcpp/BufferedSocket.cpp 2013-06-25 17:00:48 +0000
@@ -454,7 +454,7 @@
    return false;

   } else if(p.first == ASYNC_CALL) {
- if(!disconnecting) { static_cast<CallData*>(p.second.get())->f(); }
+ if(!disconnecting) { Lock l(cs); static_cast<CallData*>(p.second.get())->f(); }
    continue;
   }

Revision history for this message
maksis (maksis) wrote :

That's probably stable.. but I also thought that using the listener lock might be more logical so that running async tasks wouldn't block the calling thread when new (async) tasks are queued (BufferedSocked could probably use lockfree queue instead of locking anyway). This hasn't been tested:

=== modified file dcpp/BufferedSocket.cpp
--- dcpp/BufferedSocket.cpp 2013-06-25 15:45:51 +0000
+++ dcpp/BufferedSocket.cpp 2013-06-25 17:36:31 +0000
@@ -454,7 +454,7 @@
    return false;

   } else if(p.first == ASYNC_CALL) {
- if(!disconnecting) { static_cast<CallData*>(p.second.get())->f(); }
+ if(!disconnecting) { Lock l(listenerCS); static_cast<CallData*>(p.second.get())->f(); }
    continue;
   }

=== modified file dcpp/BufferedSocket.h
--- dcpp/BufferedSocket.h 2013-06-25 15:45:51 +0000
+++ dcpp/BufferedSocket.h 2013-06-25 17:36:14 +0000
@@ -63,8 +63,8 @@

  static void putSocket(BufferedSocket* aSock) {
   if(aSock) {
+ aSock->shutdown();
    aSock->removeListeners();
- aSock->shutdown();
   }
  }

Revision history for this message
maksis (maksis) wrote :

No, my patch wouldn't work, better go with yours :p

Revision history for this message
poy (poy) wrote :

in rev 3318. i have a feeling this will need more tweaks... we shall see. :)

Revision history for this message
poy (poy) wrote :

Fixed in DC++ 0.828.

Changed in dcplusplus:
status: Fix Committed → Fix Released
Revision history for this message
maksis (maksis) wrote :

The async call in Client::info is unsafe since it doesn't check that the socket exists

Changed in dcplusplus:
status: Fix Released → New
Revision history for this message
poy (poy) wrote :

it's only being called from 2 places guarded by isConnected so i think it is safe... what do you think of the following patch?

=== modified file 'dcpp/Client.cpp'
--- dcpp/Client.cpp 2013-06-25 11:26:24 +0000
+++ dcpp/Client.cpp 2013-07-30 17:06:00 +0000
@@ -127,7 +127,9 @@
 }

 void Client::info() {
- sock->callAsync([this] { infoImpl(); });
+ if(isConnected()) {
+ sock->callAsync([this] { infoImpl(); });
+ }
 }

 void Client::send(const char* aMessage, size_t aLen) {

=== modified file 'dcpp/ClientManager.cpp'
--- dcpp/ClientManager.cpp 2013-06-25 16:04:21 +0000
+++ dcpp/ClientManager.cpp 2013-07-30 17:05:39 +0000
@@ -459,9 +459,7 @@
 void ClientManager::infoUpdated() {
  Lock l(cs);
  for(auto i: clients) {
- if(i->isConnected()) {
- i->info();
- }
+ i->info();
  }
 }

@@ -551,9 +549,7 @@
  }

  for(auto j: clients) {
- if(j->isConnected()) {
- j->info();
- }
+ j->info();
  }
 }

Revision history for this message
maksis (maksis) wrote :

Nevermind, I didn't see the implementation differences again. The patch would move the check to a more logical place :p

Changed in dcplusplus:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.