Crash under heavy load, 0.801

Bug #1029629 reported by LoRenZo
This bug affects 1 person
Affects: DC++
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

I can't really give detailed information about this; I didn't do anything special when the crash occurred.
Please find the crash log attached. I assume that it has not been fixed in the 2 newer releases.

If you need anything else from my side regarding this, let me know about it.

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :
poy (poy) wrote :

can you check that your debug information (.pdb file) is up-to-date? files & line numbers in the report are completely off.

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

I'm using dcpp_rev_3013_mingw_release_x86.zip that has been published on dcbase.org.
If it hasn't been compiled properly, then I guess this could be the reason for your finding.

eMTee (realprogger) wrote :
Changed in dcplusplus:
importance: Undecided → High
LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

Happened with r0351 as well. Log can be found attached.

eMTee (realprogger) wrote :

A modified (diff: http://pastie.org/4725853 ) test build of r3053 produces the same crash.

poy (poy) wrote :

good, that clears plugins as a possible trigger for these crashes.

the next obvious suspect is user matching; attached is a patch that disables it. note that you will lose your current user matching settings, so save them beforehand.

i have updated the boost atomic implementation in rev 3058. give it a try; it might help as it is often referenced in the crash logs.

the next time you encounter a crash, try to follow <https://answers.launchpad.net/dcplusplus/+faq/1865> in order to get traces of all the running threads.

since the crashes all seem to occur on reception of a message on an NMDC hub, it may be useful to try to isolate the originating hubs (if this is specific to a set of hubs at all, which might not be the case). it would also be interesting to know which messages were sent right before the crash.

eMTee (realprogger) wrote :

A tentative fix released in 0.801. Please reopen if you still experience this crash.

Changed in dcplusplus:
status: New → Fix Released
eMTee (realprogger)
summary: - Crash with r3013
+ Crash under heavy load, 0.801
LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

The issue has recurred with r3072.
I am going to try to gather additional information about it by using gdb the next time it happens.

eMTee (realprogger)
Changed in dcplusplus:
status: Fix Released → Incomplete
poy (poy) wrote :

since the crashes all seem to occur around the same code, and the atoi function is often referenced, let's see whether this is related to the old C API MinGW programs are linked with not being able to handle heavy loads.

try to run the attached threadtest.exe so that it runs for a few minutes, and see if you can get it to crash. the CPU use should be maxed out during the run.
i run it with the following values, feel free to tweak depending on your system: threadtest 50 20000000

poy (poy) wrote :

source code for threadtest.exe, to be dropped in the "utils" directory of a DC++ repository.
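
The attached source is not reproduced in this thread. As a rough illustration only, a stress test of this kind could look like the sketch below. It assumes the two command-line values are a thread count and a per-thread iteration count, which the thread does not state explicitly. The function name run_stress and the atoi round-trip check are hypothetical and not taken from the attachment.

```cpp
// Hypothetical sketch of a threaded atoi stress test.
// This is NOT the attached threadtest source; names and logic are assumptions.
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// Starts `threads` workers; each one converts integers to text and back
// through atoi `iterations` times. Returns how many round-trips gave a
// wrong value (0 is expected with a correctly working C runtime).
unsigned long long run_stress(unsigned threads, unsigned long long iterations) {
    assert(threads > 0);  // a run without workers would test nothing

    std::atomic<unsigned long long> mismatches{0};
    std::vector<std::thread> pool;
    pool.reserve(threads);

    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&mismatches, iterations, t] {
            for (unsigned long long i = 0; i < iterations; ++i) {
                // Keep the value small enough to always fit in an int.
                int expected = static_cast<int>((i + t) % 1000000);
                std::string text = std::to_string(expected);
                if (std::atoi(text.c_str()) != expected) {
                    mismatches.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }

    // Wait for every worker before reading the final count.
    for (auto& worker : pool) {
        worker.join();
    }
    return mismatches.load();
}
```

Under these assumptions, a call such as run_stress(50, 20000000) would correspond to the values suggested above. A crash or a non-zero return would point at the C runtime rather than at DC++ itself.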

eMTee (realprogger) wrote :

Tried it; I needed to use higher values so that it would not finish within half a minute. With params 100 80000000 it ran for about 5 minutes and finished without a crash.

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

Please find the full backtrace of r3074 under the following link: http://pastebin.com/kGKkzpUG .

iceman50 (bdcdevel) wrote :

I ran with the same params as eMTee and no crash

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

Here is a more detailed backtrace; this time I started the application using gdb: http://pastebin.com/EB27F6cs
Also, I have run the thread test with the same parameters that the others used, and there was no crash for me either.

poy (poy) wrote :

LoRenZo,

have you ever seen these crashes with debug builds (those that open a console window while the application is running)?

unlike release builds, debug ones aren't optimized. it is possible some optimizations are causing these crashes; if you can confirm that a debug build doesn't have this issue, the offending optimization level can be tracked down by editing it in the SConstruct file, line 27: possible optimization flags are -O3, -O2, -O1 and -O0 (aka no optimization - same as the debug build).

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

Well, I abandoned the debug builds not long ago, because I was having trouble opening huge files (they took forever to open), and during that time I checked quite a lot of them.

I can of course go back to them for at least a period of time, but in any case, I would like to perform each and every test the same way (i.e. connected to the same hubs), which is only possible if I have only one instance running.

I think I will definitely have time for testing those builds next week, but right now I am already running a couple of other tests. One of them will result in a different type of report, which I am preparing right now. You might want to check it out first; perhaps afterwards you will not find it necessary to have these builds running at all. If you conclude that it is unrelated, I am looking forward to running the special debug builds.

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

I am not sure whether the following information belongs here, but I just encountered crashes with all of the following versions: the latest release (3144) and its modified builds as well (O2, O1 and O0).
Before the crash occurred I was browsing 1 filelist and downloading at 2-7MB/s; the vast majority of the files were 14-96MB in size. The one thing I can think of that could be related is that my HashData.dat (183MB) and HashIndex.xml (11,5MB) files could not keep up with the data being flushed to the disk. For each build, the crash happened within 1-10 minutes after the connection(s) to the source(s) were established.
In each case I first received a Microsoft Visual C++ Runtime Library notification about the crash. After I pressed OK, I got the usual DC++ crash message. None of the 4 crashes resulted in a crash log being generated. I did, however, create a full backtrace by starting the release build with gdb. The related information can be found here: http://pastebin.com/y080FSqc .

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

I have tried to reproduce the same with the latest available debug build, but the downloads completed OK this time, and the client has been up and running ever since.

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

The debug build has crashed as well; this time I was running downloaded-filelist matching and file list refreshing in parallel.
The crash happened basically the same way as with the other builds (runtime error, and shortly after that the client stopped responding to anything). No crash log was generated this time either, and the same files were the last ones updated on disk as before.

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

I have encountered 100% CPU usage with r3146, release build. No crash happened; it just made my system completely unusable.

I would like to add that the best workaround I could find for such cases is to struggle a little with the system at its peak, open the task manager and set the priority of the client's process to low. This is the only way I am able to create a backtrace whenever this occurs without terminating the client or rebooting the system. I only mention this in the hope that it helps others who face the same issue as I do every once in a while.

The full backtrace can be found here: http://pastebin.com/NRw4mZgp .

LoRenZo (lorenzo-mailbox-deactivatedaccount) wrote :

Another quick update: 100% CPU usage happened once again with the r3146 release build, this time only 2 hours after the application was started. I did not perform any special action; the client was uploading silently in the background. I have checked the disk for useful evidence, but no private message or new filelist was to be found close to the time of the event. However, this time I saw that the crash reporter was trying to create a log, but it never got past the "Writing the stack trace..." part.
So, yet again, I have created a full backtrace, which I have compared with the previous one and found some slight differences. I would like to point out that at the end of the backtrace you will find what probably would have ended up in the crash log if the report could have been completed. Please have a look at it; it might get you closer to the real source of the problem. Please find the corresponding log here: http://pastebin.com/RrfGHD8V

As a sidenote: after setting the client's priority to low and restricting it to 1 thread of my CPU, I tried to find the thread responsible for the issue by browsing the client's threads with Process Explorer. I suspended the threads one by one until none of them were active, but no matter which one I resumed afterwards, the CPU activity was the same, so I could not narrow the log down to a single thread.

If I can do anything that brings you closer to fixing this issue, kindly let me know about it.

Launchpad Janitor (janitor) wrote :

[Expired for DC++ because there has been no activity for 60 days.]

Changed in dcplusplus:
status: Incomplete → Expired
eMTee (realprogger) wrote :

Tentatively fixed in r3213. Switching to MinGW-w64 for the next release.

Changed in dcplusplus:
status: Expired → Fix Committed
eMTee (realprogger) wrote :

Fixed in DC++ 0.811.

Changed in dcplusplus:
status: Fix Committed → Fix Released
alan cee (alancaulfield)
Changed in dcplusplus:
assignee: nobody → alan cee (alancaulfield)
eMTee (realprogger)
Changed in dcplusplus:
assignee: alan cee (alancaulfield) → nobody