Ubuntu

thunderbird uses 100% CPU and doesn't terminate if quit during connection

Reported by nobody on 2010-12-12
42
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Mozilla Thunderbird
Fix Released
Critical
thunderbird (Ubuntu)
High
Unassigned

Bug Description

I have the same problem: TB consumes almost 100% of one core all the time.

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7
uname -a: Linux golem 2.6.32-26-generic-pae #48-Ubuntu SMP Wed Nov 24 10:31:20 UTC 2010 i686 GNU/Linux
cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.1 LTS"

############
Binary package hint: thunderbird

If I quit thunderbird normally while in the 'looking up' or other stage of connecting to a mail server, the process does not terminate, and thunderbird-bin uses 100% CPU on waiting channel 0.

natty
thunderbird 3.1.7+build3+nobinonly-0ubuntu2
uname Linux delan1 2.6.37-8-server #21-Ubuntu SMP Sun Dec 5 18:13:15 UTC 2010 x86_64 GNU/Linux

I haven't found a reliable way to trigger this. Basically, when I shutdown Thunderbird, it goes 100%cpu and doesn't respond anymore, and I have to kill it.

The last time this happened, I managed to strace the running process. Here's what I got:

Process 4156 attached - interrupt to quit
read(3, 0xb625a058, 4096) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN}, {fd=44, events=POLLIN}, {fd=47, events=POLLIN}, {fd=10, events=POLLIN}, {fd=18, events=POLLIN}], 11, 0) = 1 ([{fd=18, revents=POLLIN}])
read(18, "\372", 1) = 1
gettimeofday({1273581961, 969398}, NULL) = 0
read(3, 0xb625a058, 4096) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN}, {fd=44, events=POLLIN}, {fd=47, events=POLLIN}, {fd=10, events=POLLIN}, {fd=18, events=POLLIN}], 11, 0) = 0 (Timeout)
gettimeofday({1273581961, 972734}, NULL) = 0
read(3, 0xb625a058, 4096) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|POLLPRI}, {fd=17, events=POLLIN}, {fd=44, events=POLLIN}, {fd=47, events=POLLIN}, {fd=10, events=POLLIN}, {fd=18, events=POLLIN}], 11, 0) = 0 (Timeout)

I filled 5mb of logs with the same sequence repeated over and over again, until I got bored and killed Thunderbird.

There's a few reports of similar issues but I'm confused by the insane number of calls to gettimeofday, and I don't think I can reliably consider this as a duplicate of other report issues involving 100% cpu at shutdown, so I'm reporting here in case we're missing something important.

I'm using the official b2 build FWIW.

strace output is not super useful unless you also happened to do an "ls -l /proc/4156/fd" and logged it. Most useful would be stack traces via gdb or 'perf' although I must concede I'm not clear on where the debug symbols for official linux builds end up... one would ideally have that already installed and available so those tools could use them. If there's a way to kill it to get breakpad to report it, that might also work...

protz, adding critical + hang. you can add hang keyword when creating the bug if you do "Show Advanced Fields" on the bug entry form.

I just managed to reproduce the bug! I also managed to kill the process using a SIGSEGV. That triggered breakpad and I submitted a crash report.

    bp-e1da6913-ec18-49cf-a591-b180f2100522 05/22/2010 12:08 PM

Plus, I also did ls -l and a strace -p. The files are here.

http://jonathan.protzenko.free.fr/bug565065

And the crash report is there.

http://crash-stats.mozilla.com/report/index/e1da6913-ec18-49cf-a591-b180f2100522

It might be an extension, but it's hard to tell.

bienvenu, you'll probably want to look at this crash report. It looks like nsMsgCompressIStream has gone off the rails and just spends its entire life being spoon fed the byte 0xfa which presumably causes it/zlib to think there is still more work to do. Unfortunately, it's not immediately obvious (to me) from the crash who keeps posting events to the main thread that keeps it alive, but I would assume it's related.

protz, are you using the fastmail imap server, or an other imap server that supports compress? You can try toggling mail.server.default.use_compress_deflate to see if it makes the crash go away.

Do you have empty trash/expunge inbox on exit set? I wonder if this is some failure to connection problems in the decompress code...

(In reply to comment #5)
> protz, are you using the fastmail imap server, or an other imap server that
> supports compress? You can try toggling
> mail.server.default.use_compress_deflate to see if it makes the crash go away.
>
How can I know if a IMAP server supports deflate? One is imap.googlemail.com, the other two are:

jonathan@nala:~ $ nc imap.ens-lyon.fr 143
* OK bubulcus Cyrus IMAP4 v2.2.13-Debian-2.2.13-14+lenny3 server ready
^C
jonathan@nala:~ $ nc imap.free.fr 143
* OK [CAPABILITY IMAP4REV1 X-NETSCAPE LOGIN-REFERRALS AUTH=LOGIN] IMAP4rev1 Free
^C

> Do you have empty trash/expunge inbox on exit set? I wonder if this is some
> failure to connection problems in the decompress code...

None of my accouts has this setting.

It's kind of hard to reproduce. I'll try a tcpdump next to see which connections are still open when it loops like that. I haven't any reliable way to reproduce this.

You'd need an imap protocol log to see what the capability response was for each server - https://wiki.mozilla.org/MailNews:Logging

Protz can we get somewhere with this bug ?

Not really. For a while I exported the right environment variables when starting Thunderbird, so that I would get a protocol dump when it got stuck, but I haven't been able to reproduce it in such conditions. I don't have a reliable way to reproduce, and as I said, it's very random.

David, do you need the protocol log right when it crashes, or just the first few lines?

I don't need a protocol log of the getting stuck part, just the initial part of the log where we ask the server what its capabilities are. But I believe gmail has added compress/deflate support.

Created attachment 466412
IMAP protocol log, first few seconds

Here's the log. It's not the same computer that experiences the bug, but it's the same setup for accounts.

so yeah, that confirms that your gmail server does support COMPRESS, so that would be consistent with asuth's diagnosis. But that's only marginally helpful. I suspect this is a linux-y issue, since we have a ton of gmail users, and have not heard other reports of this.

What can I do to help track down this issue?

You could try toggling
mail.server.default.use_compress_deflate to see if it makes the hang go away, though if it's random and hard to reproduce, that may not be that helpful. How random is it, in the sense of how frequent is the hang? It's unfortunate that turning on protocol logging seems to make this problem go away because my first suggestion would be to try to turn on other sorts of logging, but that may not be helpful.

It used to be ~twice a week, now I don't know, I'm using momo's mac as my primary machine :).

You can tell me which parts of logging to enable, and I'll make sure whenever I run Thunderbird on the faulty machine, these are enabled, just in case.

IMAP:5 is where I'd start, since it tells us what's happening with server communications at shutdown, and the other forms of logging (e.g., socket transport) are too fire-hosey.

Created attachment 466644
Hang log

I managed to reproduce the bug. Seems to happen after Tb has been open for a long time (overnight). Tb feels sluggish and unresponsive, feel like restarting it, and then the bug appears.

Tb went 100% cpu, waited ~10 seconds, and then killed it. Here's the protocol log.

Nikolay, Jay, does anything in the imap log pop out? :)

protz, can you reasonably rule out extensions? Which ones do you run?

That error is NS_ERROR_ABORT, which is the status we give when trying to close the transport because the app is getting shut down. In this case, I guess a stack trace of the hang might be helpful.

I feel like this is happening when some IMAP connection is open and my network connection is shut down (i.e. I loose the wifi connection, for instance).

I already put a crashreport in comment #3. What do you mean by stack trace? Do you want me to attach gdb to the running process and bt once it's killed? Do I need to run a debug build for this or is a shared build ok?

(In reply to comment #20)

> I already put a crashreport in comment #3.
Yeah, that didn't contain much useful that I could see.

> Do you want me to attach gdb to the running process and bt once it's killed?
Yes, I want to see what the threads are doing that's preventing the app from shutting down. So attach and break into the debugger.

> need to run a debug build for this or is a shared build ok?
If you can get symbols, a shared build is fine.

Download full text (12.0 KiB)

Yay! Managed to get a backtrace.

Please note that the symptoms are linked to my connection being particularly crappy (mobile phone tethering) and sockets timing out. I event got a "too many imap connections for this account" error message.

After I understood that Thunderbird was not closing properly, I attached gdb to it, killed Thunderbird, and ran a backtrace. Here comes the full backtrace.

0x00007f7ea666f0bd in read () from /lib/libpthread.so.0
(gdb) bt
#0 0x00007f7ea666f0bd in read () from /lib/libpthread.so.0
#1 0x00007f7ea538cbba in nsAppShell::EventProcessorCallback (
    source=<value optimized out>, condition=<value optimized out>,
    data=0x7f7e9b3d1fd0)
    at /home/jonathan/Code/comm-central/mozilla/widget/src/gtk2/nsAppShell.cpp:66
#2 0x00007f7ea10b86f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0
#3 0x00007f7ea10bc568 in ?? () from /lib/libglib-2.0.so.0
#4 0x00007f7ea10bc71c in g_main_context_iteration ()
   from /lib/libglib-2.0.so.0
#5 0x00007f7ea53a23c7 in nsBaseAppShell::DoProcessNextNativeEvent (this=0xf,
    mayWait=-1871547345)
    at /home/jonathan/Code/comm-central/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:161
#6 0x00007f7ea53a25b1 in nsBaseAppShell::OnProcessNextEvent (
    this=0x7f7e9b3d1fd0, thr=0x7f7e9b3364c0, mayWait=1,
    recursionDepth=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:299
#7 0x00007f7ea569fd11 in nsThread::ProcessNextEvent (this=0x7f7e9b3364c0,
    mayWait=1, result=0x7fff9072766c)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/threads/nsThread.cpp:517
#8 0x00007f7ea566fcd1 in NS_ProcessNextEvent_P (thread=0xf,
---Type <return> to continue, or q <return> to quit---
    mayWait=-1871547345) at nsThreadUtils.cpp:250
#9 0x00007f7ea4b5d8e6 in nsSyncStreamListener::WaitForData (
    this=0x7f7e80e7d250)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:58
#10 0x00007f7ea4b5d93a in nsSyncStreamListener::Available (
    this=0x7f7e80e7d250, result=0x7fff907276cc)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:160
#11 0x00007f7ea4b5d7c4 in nsSyncStreamListener::Read (this=0xf,
    buf=0x7f7e80e7f000 "", bufLen=4096, result=<value optimized out>)
    at /home/jonathan/Code/comm-central/mozilla/netwerk/base/src/nsSyncStreamListener.cpp:178
#12 0x00007f7ea5443293 in unsigned int NS_ReadLine<char, nsIInputStream, nsCAutoString>(nsIInputStream*, nsLineBuffer<char>*, nsCAutoString&, int*) ()
   from /home/jonathan/Code/objdir-comm-central/mozilla/dist/bin/libxul.so
#13 0x00007f7ea543783f in nsMsgDBFolder::GetMsgTextFromStream (
    this=<value optimized out>, stream=<value optimized out>,
    aCharset=<value optimized out>, bytesToRead=<value optimized out>,
    aMaxOutputLen=<value optimized out>,
    aCompressQuotes=<value optimized out>, aStripHTMLTags=1, aContentType=...,
    aMsgText=...)
    at /home/jonathan/Code/comm-central/mailnews/base/util/nsMsgDBFolder.cpp:533---Type <return> to continue, or q <return> to quit---
5
#14 0x00007f7ea56abc4a in NS_InvokeByIndex_P (that=0xf,
    methodIndex=2423419951, paramCo...

So that just means we're trying to get some message text for the new mail alert (most likely). It doesn't look like a hang, per se. Does linux display message text for the new mail alert?

I've had issues with the popup alert, yes, the most notable one being the popup flickering for a fraction of a second and not showing properly, although because I wasn't able to reliably trigger it, I never filed a bug.

I'll try to get more backtraces to see if we're always stuck in the same area.

(In reply to comment #23)
> It doesn't look like a hang, per se. Does linux display message text for the new mail alert?

what is this, if it isn't a hang?
if it's a linux new message pop up perf issue, see Bug 367431 - New Mail alert popup takes 100% CPU for 10 secs

(In reply to comment #25)

>
> what is this, if it isn't a hang?
I'm just saying that stack trace doesn't look like it's hung.

In the past, we've had bugs where the code on linux and windows that's supposed to shut down the app when the last window is closed gets the count wrong, and leaves the app spinning the event loop, but no window up. And the new mail alert window might confuse that count.

Happened again. The computer is an 8-core i7 Xeon, and I waited for 30s+ before attaching gdb, and the CPU kept going through the roof at 100% all the time, so I think it's definitely a hang. One more backtrace (not very helpful, I'm afraid).

(gdb) bt
#0 0x00007f674b4b2113 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=0)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00007f674c6dd2c9 in ?? () from /lib/libglib-2.0.so.0
#2 0x00007f674c6dd71c in g_main_context_iteration () from /lib/libglib-2.0.so.0
#3 0x00007f673b5dd337 in nsBaseAppShell::DoProcessNextNativeEvent (this=0x7f672e7aae10, mayWait=10)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:155
#4 0x00007f673b5dd5cd in nsBaseAppShell::OnProcessNextEvent (this=0x7f67337457c0, thr=0x7f6747b28550,
    mayWait=0, recursionDepth=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:311
#5 0x00007f674f807e05 in nsThread::ProcessNextEvent (this=0x7f6747b28550, mayWait=1, result=0x7fff00dc7d2c)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:508
#6 0x00007f674f7d439a in NS_ProcessNextEvent_P (thread=0x7f672e7aae10, mayWait=10) at nsThreadUtils.cpp:250
#7 0x00007f674f80815f in nsThread::Shutdown (this=0x7f671b0c4ee0)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:468
#8 0x00007f674f809058 in nsThreadManager::Shutdown (this=0x7f674fa6e880)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThreadManager.cpp:158
#9 0x00007f674f7d7534 in mozilla::ShutdownXPCOM (servMgr=0x7f6747ba9148)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/build/nsXPComInit.cpp:873
#10 0x00007f674fc94dc5 in ~ScopedXPCOMStartup (this=0x7fff00dc8440, __in_chrg=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:1052
#11 0x00007f674fc984c2 in XRE_main (argc=<value optimized out>, argv=<value optimized out>,
    aAppData=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:3527
#12 0x0000000000401b04 in main (argc=1, argv=0x7fff00dc8728)
    at /home/jonathan/Sources/comm-1.9.2/mail/app/nsMailApp.cpp:101

Is there anything useful you want me to do next time?

0x00007fda3be9f113 in *__GI___poll (fds=<value optimized out>,
    nfds=<value optimized out>, timeout=0)
    at ../sysdeps/unix/sysv/linux/poll.c:87
87 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
 in ../sysdeps/unix/sysv/linux/poll.c
(gdb) bt
#0 0x00007fda3be9f113 in *__GI___poll (fds=<value optimized out>,
    nfds=<value optimized out>, timeout=0)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00007fda3d0ca2c9 in ?? () from /lib/libglib-2.0.so.0
#2 0x00007fda3d0ca71c in g_main_context_iteration ()
   from /lib/libglib-2.0.so.0
#3 0x00007fda2f61b337 in nsBaseAppShell::DoProcessNextNativeEvent (
    this=0x7fda1c39d350, mayWait=10)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:155
#4 0x00007fda2f61b5cd in nsBaseAppShell::OnProcessNextEvent (
    this=0x7fda32b8a400, thr=0x7fda38528550, mayWait=0,
    recursionDepth=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:311
#5 0x00007fda401f4e05 in nsThread::ProcessNextEvent (this=0x7fda38528550,
    mayWait=1, result=0x7ffffc74e1cc)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:508
#6 0x00007fda401c139a in NS_ProcessNextEvent_P (thread=0x7fda1c39d350,
    mayWait=10) at nsThreadUtils.cpp:250
#7 0x00007fda401f515f in nsThread::Shutdown (this=0x7fda27ac3fb0)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThread.cpp:468
#8 0x00007fda401f6058 in nsThreadManager::Shutdown (this=0x7fda4045b880)
---Type <return> to continue, or q <return> to quit---
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/threads/nsThreadManager.cpp:158
#9 0x00007fda401c4534 in mozilla::ShutdownXPCOM (servMgr=0x7fda385a9148)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/xpcom/build/nsXPComInit.cpp:873
#10 0x00007fda40681dc5 in ~ScopedXPCOMStartup (this=0x7ffffc74e8e0,
    __in_chrg=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:1052
#11 0x00007fda406854c2 in XRE_main (argc=<value optimized out>,
    argv=<value optimized out>, aAppData=<value optimized out>)
    at /home/jonathan/Sources/comm-1.9.2/mozilla/toolkit/xre/nsAppRunner.cpp:3527
#12 0x0000000000401b04 in main (argc=1, argv=0x7ffffc74ebc8)
    at /home/jonathan/Sources/comm-1.9.2/mail/app/nsMailApp.cpp:101

Seems to be correlated with a flaky connection.

not sure if this is what you saw as well, Ben.

Yes, I saw exactly the same as in initial description, including the very same "\372".

Ledbol (ldosso) on 2010-12-13
description: updated

many major/hang bugs cite gettimeofday - Bug 533464, Bug 361730, Bug 508263. needs a deeper look. dup of bug 508263?

Download full text (3.6 KiB)

Created attachment 529281
GDB log

David, I've had another hang at shutdown, but not 100% cpu this time. I did a complete GDB session, and inspected every one of the 15 threads running. One is indeed hanging in the IMAP code. The full backtrace is near the end of the file. Relevant parts:

#5 nsPipeInputStream::Wait (this=0x7fe274372910)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/io/nsPipe3.cpp:652
No locals.
#6 0x00007fe2a1cebfd5 in nsPipeInputStream::ReadSegments (
    this=0x7fe274372910,
    writer=0x7fe2a1cec8fb <NS_CopySegmentToBuffer(nsIInputStream*, void*, char const*, P
    readCount=0x7fe26d9fc80c)
    at /home/jonathan/Code/comm-central/mozilla/xpcom/io/nsPipe3.cpp:777
        writeCount = 1924833280
        originalLen = <value optimized out>
        rv = <value optimized out>
        segment = 0x7fe26d9fc858 "\n"
        segmentLen = 32738
#7 0x00007fe2a20616b8 in nsMsgCompressIStream::Read (this=0x7fe275d70950,
    aBuf=0x7fe272baa000 "+ idling\r\n", aCount=7008, aResult=0x7fe26d9fc868)
    at /home/jonathan/Code/comm-central/mailnews/base/util/nsMsgCompressIStream.cpp:175
        bytesRead = 0
        rv = <value optimized out>
        rv = <value optimized out>
#8 0x00007fe2a2044619 in nsMsgLineStreamBuffer::ReadNextLine (
    this=0x7fe270a5d310, aInputStream=0x7fe275d70950,
    aNumBytesInLine=@0x7fe26d9fc928, aPauseForMoreData=@0x7fe26d9fc92c,
    prv=0x7fe26d9fc924, addLineTerminator=0)
    at /home/jonathan/Code/comm-central/mailnews/base/util/nsMsgLineBuffer.cpp:403
        startOfNewData = <value optimized out>
        i = <value optimized out>
        rv = <value optimized out>
        numBytesInStream = 7008
        numFreeBytesInBuffer = <value optimized out>
        numBytesCopied = 0
        nonBlockingStream = 0
        endOfLine = <value optimized out>
        startOfLine = 0x7fe272baa000 "+ idling\r\n"
#9 0x00007fe2a1fb81b8 in nsImapProtocol::CreateNewLineFromSocket (
    this=0x7fe2709c6000)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsImapProtocol.cpp:4718
        numBytesInLine = 0
        rv = 0
        kungFuGrip = {<nsCOMPtr_base> = {
            mRawPtr = 0x7fe275d70950}, <No data fields>}
        needMoreData = 0
        newLine = <value optimized out>
#10 0x00007fe2a1fc7b6e in nsImapServerResponseParser::GetNextLineForParser (
    this=0x7fe2709c6290, nextLine=0x7fe2709c62a0)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsImapServerResponseParser.cpp
        rv = 1
#11 0x00007fe2a2085569 in nsIMAPGenericParser::AdvanceToNextLine (
    this=0x7fe2709c6290)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsIMAPGenericParser.cpp:181
        ok = <value optimized out>
#12 0x00007fe2a20855ff in nsIMAPGenericParser::AdvanceToNextToken (
    this=0x7fe2709c6290)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsIMAPGenericParser.cpp:152
No locals.
#13 0x00007fe2a1fc93d2 in nsImapServerResponseParser::ParseIMAPServerResponse (
    this=0x7fe2709c6290, aCurrentCommand=<value optimized out>,
    aIgnoreBadAndNOResponses=0, aGreetingWithCapability=0x0)
    at /home/jonathan/Code/comm-central/mailnews/imap/src/nsImapServerResponseParser.cpp
        pl...

Read more...

this stack trace just means that we have an imap thread waiting for data - it doesn't mean anything other than that. Other threads could be blocking the shutdown process - I've seen a strange behavior with the UI thread and PSM where the UI thread is blocked on shutdown trying to clean up a PSM thread.

Created attachment 529838
possible fix for one case of hang on shutdown

I just had a hang on shutdown like this happen to me today. Poking around, I discovered that the url was not in the imap log, which meant we had run the url after the shutdown process had turned off logging, and I discovered that mailnews had been shut down. Since we ran the url anyway, I believe this meant that the account manager had been destroyed and then recreated, with m_shuttingDown set to false. I remembered from previous experience that sometimes we reload the accounts at shutdown, which means that every once in a while, some piece of code tries to do stuff after shutdown.

So this fix does a couple things:

1. Make the variables that track whether we're shutting down truly global
2. Move the check for shutting down lower down in the IMAP call stack (but still on the UI thread) so that various edge cases don't avoid the check. One edge case is queued urls, though I know that's not what happened to me or others, because queuing urls shows up in the protocol log.

try server builds here - http://ftp.mozilla<email address hidden> (if they succeed)

fixed on trunk - http://hg.mozilla.org/comm-central/rev/a217f63f0cee

I'm going to mark this fixed - the symptom is (usually) one imap thread blocked waiting for input, no other threads blocked, and a hang on shutdown. Protz, I'm not sure if this is what you are seeing, but I would be very interested to hear if you see it with builds going forward. As I mentioned before, the other symptom is that the imap url that's blocked doesn't show up in an imap protocol log because it was started after logging is torn down.

Sure. I haven't seen this happen frequently lately, but I'll make sure I try to get a relevant backtrace in case this happens again. Ehsan, you said your Thunderbird reliably had this issue, do you think you could try this out?

(In reply to comment #37)
> Sure. I haven't seen this happen frequently lately, but I'll make sure I try to
> get a relevant backtrace in case this happens again. Ehsan, you said your
> Thunderbird reliably had this issue, do you think you could try this out?

Sure. Do I need to download a trunk build for this? (From where?! :D)

Will that build blow away my profile? Specifically, should I make a backup before running it?

I mean, I can probably run the trunk build regularly if I can get some of my extensions working on it.

Ehsan, tomorrow's miramar nightly builds will have this fix in it - http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-miramar/

You can disable extension compatibility checking and most extensions will run. Binary extensions like Lightning or Enigmail aren't so lucky.

However, I thought your big issue was a hang after wakeup from sleep, not a hang on shutdown. This patch has no effect on the former, but there's work going on in Bug 508263 that may help with that. (Sorry, all these bugs are dirty, in that they conflate a bunch of issues)

Same here:

3.1.10+build1+nobinonly-0ubuntu0.11.04.1

Linux 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:50 UTC 2011 i686 i686 i386 GNU/Linux

Micah Gersten (micahg) wrote :

Thank you for reporting this. This bug has actually been fixed upstream in Thunderbird 3.3. Please report any other issues you may find.

Changed in thunderbird (Ubuntu):
importance: Undecided → High
status: New → Triaged
Changed in thunderbird:
importance: Unknown → Critical
status: Unknown → Fix Released
mejo (jonas-freesources) wrote :

This bug may have been fixed in thunderbird 3.3, but this version is not available in Natty yet. And I'm still discovering this bug. Thus I suggest to change the status back to 'Confirmed'.

Was this released in 5.0? Because I'm still seeing a hang on quit... now with Lion.

(In reply to comment #42)
> Was this released in 5.0? Because I'm still seeing a hang on quit... now
> with Lion.

Yes it was, but please file a new bug, especially as you've changed OS. It could be that you're seeing something different.

the hints that the patch is in v5 are comment 40 which cites Miramar, and Target Milestone=Thunderbird 3.3a4 - these are the precursor to v5.

If you still see a hang, please consult https://wiki.mozilla.org/Thunderbird:Testing:Shutdown_Hang

mejo (jonas-freesources) wrote :

please, can you push either thunderbird 5, or a version with backported fix for this issue to Natty? This bug is annoying. I often don't realize the thunderbird crash immediately, and wonder why cpu heat and cpu fan both run crazy, and the battery lifetime decreases that fast.
Sometimes I even left the laptop in that state for some time while doing something else, and was shocked that it was hot like burning when I came back. This is the main reason why I consider the bug as critical.

Chris Coulson (chrisccoulson) wrote :

If you want a more up-to-date version of Thunderbird on Ubuntu, I highly recommend using the thunderbird-stable PPA:

https://launchpad.net/~mozillateam/+archive/thunderbird-stable

Michael Nagel (nailor) wrote :

Is this still an issue with thunderbird 12 installed by default now?

Changed in thunderbird (Ubuntu):
status: Triaged → Incomplete
mejo (jonas-freesources) wrote :

at least for me, the bug disappeared with the upgrade to ubuntu 12.04 and thunderbird 12.

Changed in thunderbird (Ubuntu):
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.