thunderbird-bin crashed with SIGSEGV in nsQueryInterface::operator()

Bug #556829 reported by WolphFang on 2010-04-06
90
This bug affects 13 people
Affects Status Importance Assigned to Milestone
Mozilla Thunderbird
Fix Released
Critical
thunderbird (Fedora)
Invalid
Medium
thunderbird (Ubuntu)
Medium
Unassigned

Bug Description

Binary package hint: thunderbird

Ubuntu 10.4 / Clicked on Sent expand triangle against Google IMAP

ProblemType: Crash
DistroRelease: Ubuntu 10.04
Package: thunderbird 3.0.4+nobinonly-0ubuntu1
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic x86_64
NonfreeKernelModules: openafs nvidia
Architecture: amd64
Date: Tue Apr 6 16:27:17 2010
ExecutablePath: /usr/lib/thunderbird-3.0.4/thunderbird-bin
ProcCmdline: /usr/lib/thunderbird-3.0.4/thunderbird-bin
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Signal: 11
SourcePackage: thunderbird
StacktraceTop:
 nsQueryInterface::operator()(nsID const&, void**) const () from /usr/lib/thunderbird-3.0.4/libxpcom_core.so
 nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) () from /usr/lib/thunderbird-3.0.4/libxpcom_core.so
 ?? ()
 ?? ()
 ?? ()
Title: thunderbird-bin crashed with SIGSEGV in nsQueryInterface::operator()
UserGroups: adm admin audio cdrom dialout dip floppy fuse lpadmin plugdev sambashare video

(In reply to comment #1)
> related to any of these?
> *
I don't think so. I can't find any crash in nsImapIncomingServer::OnStopRunningUrl on crashstats nor something similar.

This is happening to me multiple times a day: https://bugzilla.redhat.com/show_bug.cgi?id=570041

Is this ticket getting the visibility required? Having thunderbird constantly segfault on me is surprising since it's always been fairly solid in the past.

Created an attachment (id=431555)
proposed fix

I don't know why this situation should happen, or how to test for it, but this should fix it.

Could this happen when getting new messages for non inbox folders while there are still folders waiting to update statue?

FWIW, I took a spin through crashes with OnStopRunningUrl in the stack and did not find any matches with it in the top couple frames.
* nsXPTCStubBase::Release() bp-2bab6ac8-4f77-4780-beeb-923892100308
* nsImapProtocol::SetupWithUrl(nsIURI*, nsISupports*) bp-fc01edc8-dbf5-42c8-846a-6faa92100306

(In reply to comment #5)
> Could this happen when getting new messages for non inbox folders while there
> are still folders waiting to update statue?

Well, we're running a STATUS url when this happens, and the incoming server is the listener for the url. AFAIK, that combination only happens when doing a STATUS to check non-inbox folders for new mail. We run other STATUS urls, but the imap incoming server is not the listener for those urls.

(From update of attachment 431555)
So, as discussed on IRC, this just wallpapers over the crash. The real cause is that each call to GetNewMessagesForNonInboxFolders triggers a chain of folder updates, but there's no protection against multiple chains simultaneously using the same array.

This crash is rather frequent for significant amount of Linux users (see https://bugzilla.redhat.com/show_bug.cgi?id=570655 for example). Could this be a blocker for next release (3.0.5)?

It's not showing up high in our crashstats for some reason. I am planning on fixing it, though I have to convince Neil of the rightness of the fix...the first thing is that I need to remove the assumption that the current folder is also the last folder in the stat array, which can be wrong when you get multiple calls to GetNewMessagesForNonInboxFolders.

I'm looking at related crash sigs/bugs and may have more info later. But I want add that Matej in https://bugzilla.redhat.com/show_bug.cgi?id=570655#c3 mentions their stack is the same as bug 411147. I haven't verified that. Not also, I wouldn't bank on bug 411147 being windows-only because my comment "windows only per crash-stats" simply means I didn't see any linux crashes.

I've just dug deep into crash-stats. I've found only one instance of this reported in the last two weeks:

bp-ab9ff0c2-3e7f-429d-a482-ad2a32100316

Therefore not blocking on this as that isn't anywhere near a top-crasher. I would say its wanted though.

jhorak: Are you sure this isn't something more specific to the Fedora build? An additional patch (I don't know what you guys do) or something?.

Also I notice from the first comment in that bug, it looks like your users are running a 64 bit build there and that's something we don't support/test/run at all at the moment (we don't even have nightlies). So you may want to take a look around and see if there's a 64-bit bug hiding somewhere.

I believe this is a case of linux users being much more likely to be checking all folders for new messages, because of server side filters, and being likely to have multiple accounts. I've seen this crash myself, and I'm only checking a few folders for new messages.

That's where it would be nice to receive crash stats from linux builds by distros.

(In reply to comment #14)
> That's where it would be nice to receive crash stats from linux builds by
> distros.

exactly.

My plan for fixing this is not to leave the currently statted folder in m_foldersToStat. This will make things a lot simpler as far as dealing with multiple actors on the array is concerned.

Created an attachment (id=435225)
less hacky but more complicated fix

This change makes it so the current STATUS operation doesn't have its folder in m_foldersToStat, which means we have to check if there are any folders left before doing the next stat. It also means we won't be potentially removing the wrong folder when the STATUS operation finishes. And I put some code to prevent us from putting the same folder in the array multiple times, which should ameliorate the case of the user hitting get new mail twice.

(From update of attachment 435225)
Although this obviously avoids removing the wrong folder, it doesn't actually stop multiple attempts to check for status, and in fact would make it harder to retrofit such a check.

>+ // if we get an error running the url, it's better
>+ // not to chain the next url.
>+ if (NS_FAILED(exitCode))
>+ break;
What about the remaining folders?

>+ nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
>+ m_foldersToStat.RemoveObjectAt(folderCount - 1);
>+ folder->UpdateStatus(this, nsnull);
Is UpdateStatus not always async?

>- if (imapFolder && !isServer)
>+ if (imapFolder && !isServer &&
>+ m_foldersToStat.IndexOfObject(imapFolder) == -1)
Nit: IndexOfObject is really slow because it QIs the argument and each member to nsISupports first. If you don't need that just use IndexOf instead.

(In reply to comment #17)
> (From update of attachment 435225 [details])
> Although this obviously avoids removing the wrong folder, it doesn't actually
> stop multiple attempts to check for status, and in fact would make it harder to
> retrofit such a check.

It does, by preventing duplicates from getting added to the array of folders to stat.

>
> >+ // if we get an error running the url, it's better
> >+ // not to chain the next url.
> >+ if (NS_FAILED(exitCode))
> >+ break;
> What about the remaining folders?

An error at this point is generally something that's not recoverable, e.g., the server is down, or we're shutting down. The remaining folders will still be in the array, so if we do STAT again, we'll get them on the next biff round.

>
> >+ nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
> >+ m_foldersToStat.RemoveObjectAt(folderCount - 1);
> >+ folder->UpdateStatus(this, nsnull);
> Is UpdateStatus not always async?

It is. I've changed m_foldersToStat to mean the folders still remaining to stat. Once we've started statting a folder, it is removed from the array.

>
> >- if (imapFolder && !isServer)
> >+ if (imapFolder && !isServer &&
> >+ m_foldersToStat.IndexOfObject(imapFolder) == -1)
> Nit: IndexOfObject is really slow because it QIs the argument and each member
> to nsISupports first. If you don't need that just use IndexOf instead.

OK, that's good to fix.

(In reply to comment #18)

> > Although this obviously avoids removing the wrong folder, it doesn't actually
> > stop multiple attempts to check for status, and in fact would make it harder to
> > retrofit such a check.
>
> It does, by preventing duplicates from getting added to the array of folders to
> stat.
>
I suppose it makes it so the currently stat'd folder could again be requested to be stat'd. That doesn't bother me. You might actually get a different answer the next time.

(In reply to comment #18)
> (In reply to comment #17)
> > (From update of attachment 435225)
> > Although this obviously avoids removing the wrong folder, it doesn't actually
> > stop multiple attempts to check for status, and in fact would make it harder to
> > retrofit such a check.
> It does, by preventing duplicates from getting added to the array of folders to
> stat.
Sorry for being unclear, I was referring to multiple checks on the same server, rather than the same folder. (If you clicked Get Messages too many times, could you max out the cached connections?)

> > >+ // if we get an error running the url, it's better
> > >+ // not to chain the next url.
> > >+ if (NS_FAILED(exitCode))
> > >+ break;
> > What about the remaining folders?
> An error at this point is generally something that's not recoverable, e.g., the
> server is down, or we're shutting down. The remaining folders will still be in
> the array, so if we do STAT again, we'll get them on the next biff round.
So the idea is that because you won't add them again, you don't need to remove them?

> > >+ nsCOMPtr<nsIMsgImapMailFolder> folder = m_foldersToStat[folderCount - 1];
> > >+ m_foldersToStat.RemoveObjectAt(folderCount - 1);
> > >+ folder->UpdateStatus(this, nsnull);
> > Is UpdateStatus not always async?
> It is. I've changed m_foldersToStat to mean the folders still remaining to
> stat. Once we've started statting a folder, it is removed from the array.
I was just wondering whether you could write
m_foldersToStat[folderCount - 1]->UpdateStatus(this, nsnull);
or is that being too sneaky?

(In reply to comment #19)
> (In reply to comment #18)
> > > Although this obviously avoids removing the wrong folder, it doesn't actually
> > > stop multiple attempts to check for status, and in fact would make it harder to
> > > retrofit such a check.
> > It does, by preventing duplicates from getting added to the array of folders to
> > stat.
> I suppose it makes it so the currently stat'd folder could again be requested
> to be stat'd. That doesn't bother me.
On the other hand you could perhaps work around that with a hybrid approach whereby you get the folder from the url (rather than the existing assumption that it's the last folder) and removing that from the array (helpfully RemoveObject returns PR_TRUE only if the object was in fact in the array.)

(In reply to comment #20)

> Sorry for being unclear, I was referring to multiple checks on the same server,
> rather than the same folder. (If you clicked Get Messages too many times, could
> you max out the cached connections?)

Maybe, but that would just mean that the requests would get queued. It wouldn't break anything. Ideally, we would prevent duplicate attempts, but the cure might be worse than the disease, if we get it wrong and permanently think we're doing a stat when we're not.

> So the idea is that because you won't add them again, you don't need to remove
> them?

Well, the idea was more that the next time around we should pick up where we left off. But since it's going to be the same set of folders the next time around, you're right that I might as well clear them.

> I was just wondering whether you could write
> m_foldersToStat[folderCount - 1]->UpdateStatus(this, nsnull);
> or is that being too sneaky?

Too sneaky, and that would cause issues if UpdateStatus failed synchronously, which is a possibility we should allow for (I guess that's what you were asking).

(In reply to comment #21)

> On the other hand you could perhaps work around that with a hybrid approach
> whereby you get the folder from the url (rather than the existing assumption
> that it's the last folder) and removing that from the array (helpfully
> RemoveObject returns PR_TRUE only if the object was in fact in the array.)

I started off doing just that, and decided not to, for a very good reason which I'll try to remember :-)

(In reply to comment #23)
> (In reply to comment #21)
> > On the other hand you could perhaps work around that with a hybrid approach
> > whereby you get the folder from the url (rather than the existing assumption
> > that it's the last folder) and removing that from the array (helpfully
> > RemoveObject returns PR_TRUE only if the object was in fact in the array.)
> I started off doing just that, and decided not to, for a very good reason which
> I'll try to remember :-)
Please do, that's the only remaining issue blocking my review!

Created an attachment (id=436199)
fix addressing comments

The idea of not having to find the currently STATUS'd folder and remove it when done was the main appeal, iirc. But it occurred to me that if I start with the first folder in the array instead of the last, finding the current folder should in general be quick, and the code could be simplified a bit...

(From update of attachment 436199)
>+ if (NS_FAILED(exitCode))
>+ {
>+ m_foldersToStat.Clear();
>+ break;
>+ }
>+ if (m_foldersToStat.Count() > 0)
>+ m_foldersToStat[0]->UpdateStatus(this, nsnull);
Nit: could use else or could just leave out the break
(the compiler should notice that the count will be zero anyway)

Created an attachment (id=436533)
patch landed

I just moved the break to the end of the case, because a case w/o a break is a bug waiting to happen.

fixed on trunk.

WolphFang (mjoyner-vbservices) wrote :

Binary package hint: thunderbird

Ubuntu 10.4 / Clicked on Sent expand triangle against Google IMAP

ProblemType: Crash
DistroRelease: Ubuntu 10.04
Package: thunderbird 3.0.4+nobinonly-0ubuntu1
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic x86_64
NonfreeKernelModules: openafs nvidia
Architecture: amd64
Date: Tue Apr 6 16:27:17 2010
ExecutablePath: /usr/lib/thunderbird-3.0.4/thunderbird-bin
ProcCmdline: /usr/lib/thunderbird-3.0.4/thunderbird-bin
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Signal: 11
SourcePackage: thunderbird
StacktraceTop:
 nsQueryInterface::operator()(nsID const&, void**) const () from /usr/lib/thunderbird-3.0.4/libxpcom_core.so
 nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) () from /usr/lib/thunderbird-3.0.4/libxpcom_core.so
 ?? ()
 ?? ()
 ?? ()
Title: thunderbird-bin crashed with SIGSEGV in nsQueryInterface::operator()
UserGroups: adm admin audio cdrom dialout dip floppy fuse lpadmin plugdev sambashare video

WolphFang (mjoyner-vbservices) wrote :
visibility: private → public

StacktraceTop:
 nsQueryInterface::operator() (this=<value optimized out>,
 nsCOMPtr_base::assign_from_qi (this=0x7fffae2e7eb0,
 nsImapIncomingServer::OnStopRunningUrl (
 nsMsgMailNewsUrl::SetUrlState (this=0x7ff2e2ea0c08,
 nsImapMailFolder::SetUrlState (this=0x7ff2f85c5000,

Changed in thunderbird (Ubuntu):
importance: Undecided → Medium
tags: removed: need-amd64-retrace
Max Bowsher (maxb) wrote :

Confirming. This happens to me several times a day.

Here is my gdb bt:
#0 nsQueryInterface::operator() (this=<value optimised out>, aIID=..., answer=0x7fffffffd208) at nsCOMPtr.cpp:47
#1 0x00007ffff7480c72 in nsCOMPtr_base::assign_from_qi (this=0x7fffffffd250, qi=..., iid=...) at nsCOMPtr.cpp:96
#2 0x0000000000c8f4ed in nsCOMPtr (this=0x7fffe7e9e740, url=<value optimised out>, exitCode=<value optimised out>) at ../../../mozilla/dist/include/xpcom/nsCOMPtr.h:572
#3 nsImapIncomingServer::OnStopRunningUrl (this=0x7fffe7e9e740, url=<value optimised out>, exitCode=<value optimised out>) at nsImapIncomingServer.cpp:2353
#4 0x0000000000d596c8 in nsMsgMailNewsUrl::SetUrlState (this=0x7fffd806cc08, aRunningUrl=<value optimised out>, aExitCode=0) at nsMsgMailNewsUrl.cpp:135
#5 0x0000000000c978c3 in nsImapMailFolder::SetUrlState (this=0x7fffe4c43800, aProtocol=0x7fffd81ce000, aUrl=0x7fffd806cc08, isRunning=0, statusCode=0)
    at nsImapMailFolder.cpp:6609
#6 0x00007ffff74baa05 in NS_InvokeByIndex_P (that=0x8000007f, methodIndex=17506960, paramCount=4294955528, params=0x1) at xptcinvoke_x86_64_linux.cpp:208
#7 0x00007ffff74b3a17 in nsProxyObjectCallInfo::Run (this=0x7fffd3703200) at nsProxyEvent.cpp:181
#8 0x00007ffff74afc4b in nsThread::ProcessNextEvent (this=0x7fffed23d9d0, mayWait=1, result=0x7fffffffd4fc) at nsThread.cpp:521
#9 0x00007ffff7484f3c in NS_ProcessNextEvent_P (thread=0x8000007f, mayWait=17506960) at nsThreadUtils.cpp:247
#10 0x000000000060e133 in nsBaseAppShell::Run (this=0x7fffe7a5d3a0) at nsBaseAppShell.cpp:170
#11 0x0000000000b38640 in nsAppStartup::Run (this=0x7fffe7a98640) at nsAppStartup.cpp:193
#12 0x0000000000449c72 in XRE_main (argc=<value optimised out>, argv=<value optimised out>, aAppData=<value optimised out>) at nsAppRunner.cpp:3321
#13 0x0000000000445690 in main (argc=1, argv=0x7fffffffdcf8) at nsMailApp.cpp:103

Changed in thunderbird (Ubuntu):
status: New → Confirmed
Max Bowsher (maxb) wrote :

I would like to propose that the patch from the linked mozilla bug be applied to the Ubuntu package. I have been running with a PPA build of this for several days now, and it makes the difference between Thunderbird being useable or not, by turning tens of crashes per day into none at all.

Changed in thunderbird:
status: Unknown → Fix Released

Can we have this fix in 3.0 line? It affects many of Fedora users.

(In reply to comment #29)
> Can we have this fix in 3.0 line? It affects many of Fedora users.

Probably - it is already flagged for approval, but as we haven't yet got a firm date for 3.0.5 I've not looked at the approvals yet.

FWIW, it also affects seamonkey 2.0.x

(In reply to comment #31)
> FWIW, it also affects seamonkey 2.0.x

That's why it is in Mailnews Core and not Thunderbird product...

fixed for 3.0.5

John Baron (jbaron42) wrote :

What is the status of this bug? It basically makes Thunderbird unusable on Ubuntu 10.04.

Max Bowsher (maxb) wrote :

I tried to suggest that it needed fixing before Lucid release, but was basically told to wait for 3.0.5:
http://irclogs.ubuntu.com/2010/04/15/%23ubuntu-mozillateam.html#t20:43

So I fixed it in my PPA for myself:
https://launchpad.net/~maxb/+archive/ppa

and gave up on pushing for an earlier fix in Lucid.

This will be fixed in Thunderbird 3.0.5 which is due out within a few
weeks. We will be pushing the pre-release to the Ubuntu Mozilla
Security PPA today or tomorrow:
https://launchpad.net/~ubuntu-mozilla-security/+archive/ppa

On 05/24/2010 10:40 AM, John Baron wrote:
>
> What is the status of this bug? It basically makes Thunderbird
> unusable on Ubuntu 10.04.
>

Micah Gersten (micahg) wrote :

Marking as triaged. This will be fixed in the 3.0.5 upload.

Changed in thunderbird:
milestone: none → 3.0.5
Changed in thunderbird (Ubuntu):
status: Confirmed → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thunderbird - 3.0.5+build2+nobinonly-0ubuntu1

---------------
thunderbird (3.0.5+build2+nobinonly-0ubuntu1) maverick; urgency=low

  * New upstream release v3.0.5 (THUNDERBIRD_3_0_5_BUILD2)
    - see USN-943-1
    - fix LP: #556829 - thunderbird-bin crashed with SIGSEGV in
      nsQueryInterface::operator()
    - fix LP: #571296 - Thunderbird 3 "Message Pane" always visible

  [ Micah Gersten <email address hidden> ]
  * Drop patch after upstream landing of (bmo: 544481) aka
    Build fails on Ubuntu Lucid Lynx using 'dash' shell
    - drop debian/patches/fix-build-glitch.patch
    - update debian/patches/series
 -- Micah Gersten <email address hidden> Wed, 26 May 2010 12:28:42 -0500

Changed in thunderbird (Ubuntu):
status: Triaged → Fix Released

*** Bug 564206 has been marked as a duplicate of this bug. ***

Thomas Babut (thbabut) wrote :

When will this be fixed respectively be uploaded for Lucid? Thanks.

Thunderbird 3.0.5 should be in Lucid.

On 07/08/2010 02:01 AM, Thomas Babut wrote:
> When will this be fixed respectively be uploaded for Lucid? Thanks.
>
>

Changed in thunderbird:
importance: Unknown → Critical
Changed in thunderbird (Fedora):
importance: Unknown → Medium
status: Unknown → Invalid
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.