frequent hanging

Bug #386326 reported by Nick Lally
This bug affects 2 people
Affects: thunderbird (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: thunderbird

I am not sure if this should be a duplicate of bug #110836 or bug #144437 or bug #264993 or bug #150578

I am seeing frequent hangs of thunderbird. I have multiple, shared IMAP accounts configured and several times a day thunderbird will completely hang.
An strace shows only futex() activity. For what it's worth I've attached a gdb backtrace from a hung process.

Revision history for this message
In , Ssitter (ssitter) wrote :

Do you see any error messages in the Error Console?

Revision history for this message
In , Chabrie (chabrie) wrote :

I see the following logs in the error console:

gCacheStyleSheet is not defined
chrome://lightning/content/calUtils.js
at line 111

and

standard has no properties
chrome://calendar/content/calUtils.js
at line 167

If anybody needs an IMAP account and a WebDAV account for testing, that is no problem. We could reproduce the error on several systems.

Revision history for this message
In , Ssitter (ssitter) wrote :

Frank, does the issue still exist using Lightning 0.7 Release Candidate 1?
<http://releases.mozilla.org/pub/mozilla.org/calendar/lightning/releases/0.7rc1/>

The last error could be related to Bug 396580 or Bug 396873.

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; fr; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Build Identifier: 2008031218

Error: [Exception... "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIHttpChannel.getRequestHeader]" nsresult: "0x80040111 (NS_ERROR_NOT_AVAILABLE)" location: "JS frame :: file:///C:/Users/mdelorme/AppData/Roaming/Thunderbird/Profiles/lj3gm3xo.default/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/js/calDavCalendar.js :: checkDavResourceType_oSC :: line 1287" data: no]
Source file: file:///C:/Users/mdelorme/AppData/Roaming/Thunderbird/Profiles/lj3gm3xo.default/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D/js/calDavCalendar.js
Line: 1287

Reproducible: Sometimes

Steps to Reproduce:
1. Happens very often at startup

I have to restart Thunderbird 3 or more times before it stops consuming 100% of my CPU!
How can I get more information?
Actual Results:
Lightning + Thunderbird hang.
I'm _not_ using a cached calendar.
The CPU is 100% used.
I noticed that when it hangs and I close Thunderbird, a message pops up: "a security connection has not been closed properly".
It hangs as soon as Lightning fetches the calendars: the CPU reaches 100% and it stays hung
until it has gone through all calendars.
What is strange is that Lightning fetches disabled calendars, even when alarms are not to be shown for these calendars.

Lightning + Thunderbird hang for at least 5 minutes with 23 CalDAV calendars.

Expected Results:
Lightning works properly as soon as I launch it.

I have no real idea of what is happening!
I'm not sure that the error log is related to this.

Revision history for this message
In , Ssitter (ssitter) wrote :

What Thunderbird version do you use? What Lightning version do you use? What CalDAV server and version do you use?

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

Lightning : Build Identifier: 2008031218
Thunderbird : version 2.0.0.12 (20080213)
Server : DavIcaL 0.9.1

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

Among my calendars, I have one that was cached, so this bug is probably the same as bug 412914

Sorry for polluting BMO

*** This bug has been marked as a duplicate of bug 412914 ***

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

With
lightning :2008031318
Thunderbird : version 2.0.0.12 (20080213)
Server : DavIcaL 0.9.1

With _no_ cached calendars (this time I checked carefully), Lightning hangs at startup.
Lightning no longer connects to DAViCal, and I cannot send email!

How can I give more information?

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

Sometimes it hangs, sometimes not, but when it hangs it does not come back to a normal state.

Revision history for this message
In , Bruno Browning (browning) wrote :

1) Do you have "Show Alarms" selected for your calendars? If so, does deselecting it (and restarting) have any effect?
2) Do you see anything in the error console? (you might want to create a new preference named calendar.debug.log and set it to true in order to get a bit more information)
3) Do you have access to the server logs? See anything there?

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

I confirm that it does not hang every time :-/
1) With or without alarms enabled (for all my calendars): same result and same log
2) Done; no difference between the hanging and non-hanging cases
3) I don't see anything in particular

It seems more difficult to make it hang when "Show Alarms" is not checked

Revision history for this message
In , Bruno Browning (browning) wrote :

In order to get the error reported in the bug description, we're most likely getting an error back from the server that we're not handling. Any chance you could do a wiretrace and tell us what that error is?

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

With alarms activated, I succeeded in making Lightning hang
Error Console : lots of "refresh completed with status 207"
Apache access_log : lots of
-----
82.227.98.214 - - [16/Mar/2008:14:19:55 +0100] "PROPFIND /cal/some_guy/home/ HTTP/1.1" 406
-----
I suspect this is to detect whether the server supports scheduling
then lots of
-----
82.227.98.214 - mdelorme [16/Mar/2008:14:20:20 +0100] "OPTIONS /cal/some_guy/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; fr; rv:1.8.1.12) Gecko/20080213 Lightning/0.8 Thunderbird/2.0.0.12"
-----
and lots of
-----
82.227.98.214 - mdelorme [16/Mar/2008:14:20:53 +0100] "REPORT /cal/some_guy/home/ HTTP/1.1" 207 14180 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; fr; rv:1.8.1.12) Gecko/20080213 Lightning/0.8 Thunderbird/2.0.0.12"
-----
apache error_log : nothing interesting there for REPORT query at least

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

I have the feeling (but that's just a feeling) that when the server does not answer quickly enough, Lightning hangs

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

The last time it hung, I got the following in the Apache log.
The console log was stuck on:
itemUri.spec = https://www.tennaxia.net/cal/mdelorme/home/db56b625-4196-41af-8f7c-fb74c0cbc63d.ics

And Apache access log :
82.227.98.214 - - [22/Mar/2008:11:29:02 +0100] "GET /cal/mdelorme/home/db56b625-4196-41af-8f7c-fb74c0cbc63d.ics HTTP/1.1" 401 1519 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; fr; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12"
82.227.98.214 - mdelorme [22/Mar/2008:11:29:07 +0100] "GET /cal/mdelorme/home/db56b625-4196-41af-8f7c-fb74c0cbc63d.ics HTTP/1.1" 404 28 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; fr; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12"

And the event doesn't exist

Revision history for this message
In , Bugzilla-babylonsounds (bugzilla-babylonsounds) wrote :

(In reply to comment #11)
> 82.227.98.214 - - [22/Mar/2008:11:29:02 +0100] "GET
> /cal/mdelorme/home/db56b625-4196-41af-8f7c-fb74c0cbc63d.ics HTTP/1.1" 401 1519
> "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; fr; rv:1.8.1.12) Gecko/20080201
> Firefox/2.0.0.12"

This is not Lightning accessing a file on your server. That's your browser.
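For anyone sorting through a mixed access log like the ones quoted in this bug, here is a minimal Python sketch (not from the original reporters) that tells Lightning requests apart from browser requests by User-Agent. It assumes Apache's combined log format as it appears in the excerpts above; the `classify` helper and its labels are made up for illustration.

```python
import re

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
# Size and the two trailing quoted fields are optional, since some quoted
# excerpts in this bug were truncated after the status code.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
    r'(?: (?P<size>\S+))?'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def classify(line):
    """Return (client, method, status) for one access-log line, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    agent = m.group("agent") or ""
    if "Lightning/" in agent:
        client = "lightning"
    elif "Firefox/" in agent:
        client = "browser"
    else:
        client = "unknown"
    return client, m.group("method"), int(m.group("status"))
```

Feeding it the two GET lines quoted above labels both as "browser", which matches the correction made in this comment.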

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

You're right. Seeing that it hung there, I tried to GET it with FF; my mistake :-/

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

It's very inconvenient; I need to restart Tb 3 or more times before it stops hanging.
Is there something I can do to help?
I am somewhat familiar with Eclipse, JavaScript, Firebug ...

Revision history for this message
In , Christoph E (mail-christoph-evers) wrote :

Today I got the latest Lightning 0.8 RC2 (I had been using 0.7 before) and have a problem very similar to Frank's:

Using an IMAP SSL account together with an https CalDAV calendar makes Thunderbird get stuck at 50% (dual core) CPU load. If I disable SSL in the IMAP account everything works fine. Unlike Frank, I can close Thunderbird normally. It just does not retrieve any email or calendar information. The error console does not report anything. Also, the problem only shows up after you restart Thunderbird (if you changed the settings). For instance, if you add a new https calendar to Lightning, this new calendar will work until you restart Thunderbird.

This problem does not occur in Lightning 0.7, only in the newer 0.8 RC2.

Revision history for this message
In , Temp2000 (temp2000) wrote :

I see a similar problem on a friend's computer, but in this case she's using an https connection to the webdav server and a secure SSL connection to the POP3 server (not IMAP). As Frank said, it seems to happen when she changes the calendar during a timed mail check. The timed mail checks happen every five minutes so she sees this problem at least once per day. In this case it doesn't matter if it's a meeting invite (she doesn't send or receive them).

The CPU was at 100% and I had to kill Thunderbird. Before I killed it I checked the current internet connections using Process Explorer. Thunderbird had open connections to both the POP3 server and the webdav server. The connections were stuck in the "Close_Waiting" state. Both servers are at FastMail.

This is on a single core 1.0 GHz CPU (WinXP), a high-speed internet connection, and the ICS file is over 100 KB. I forgot to check the Error Console but it's too late now because she wasn't happy so I switched her calendar to a local ICS file. She's using the 0.7 release version of Lightning and Thunderbird 2.0.0.9.

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

One of my coworkers at my company has also experienced a hang since moving to Lightning 0.8 with a CalDAV calendar.
No error log. The connection between Thunderbird + Lightning and the external server seems to be cut.

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

All of my coworkers are impacted by this; their CPU is at 100%.
We are now manually downgrading all of them to Lightning 0.7.
This is a very serious regression, I think.
If I can help, tell me what we can do, but I assure you that there is a really big problem here.
Lightning : 0.8
Thunderbird : version 2.0.0.12 (20080213)
Server : DavIcaL 0.9.1
This bug should not be UNCONFIRMED, as all of my employees have the same bug!

Revision history for this message
In , Dbo-moz (dbo-moz) wrote :

More investigation for 0.9 is wanted.

Revision history for this message
In , Bruno Browning (browning) wrote :

I haven't been able to reproduce this myself, but I'm reasonably certain that it's caused by attempting to fetch&parse multiple large CalDAV calendars at startup, since we now fetch the entire calendar into a memory cache then. We'll probably want to serialize those initial loads to avoid the CPU spike Maxime is reporting, possibly through a calICalDavSiteManager interface though I'd prefer to avoid that if possible. It would also be good to have the ability to do that initial load from a local CachedCalendar if present so that we only need to go to the net for deltas.
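To make the "serialize those initial loads" idea concrete, here is a small Python sketch. It is not Lightning code and says nothing about how a calICalDavSiteManager would actually look; `load_all`, `fetch`, and `max_parallel` are hypothetical names used only to show the gating pattern.

```python
import asyncio


async def load_all(calendars, fetch, max_parallel=1):
    """Fetch every calendar, allowing at most max_parallel loads at once."""
    gate = asyncio.Semaphore(max_parallel)
    results = {}

    async def load_one(name):
        # Wait for a free slot, so only max_parallel fetch+parse jobs
        # are in flight at any moment.
        async with gate:
            results[name] = await fetch(name)

    await asyncio.gather(*(load_one(name) for name in calendars))
    return results
```

With `max_parallel=1` the loads run strictly one after another, which trades a longer total startup time for a flatter CPU profile; a larger value would be a middle ground.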

It's possible that this is partly an issue with the webdav extension, so I'd be curious if the patch proposed in bug 416239 made any difference.

Revision history for this message
In , Stefan Max (stefan-max) wrote :

I am having the same problem (at least I think so), using the same Software (DavIcaL) with Thunderbird and Lightning.

I did a quick and dirty test with Thunderbird 3 and the latest Lightning nightly. The calendars did not show up either, but the high CPU load was gone.

What puzzled me was that if you delete the calendars from Lightning's list, restart Thunderbird and resubscribe to the calendars, they show up instantly and correctly.

The DavIcaL and Lightning logs did not show anything at all. Lightning only said the server did not support scheduling.

Revision history for this message
In , Temp2000 (temp2000) wrote :

Maxime, is Thunderbird configured to automatically download new email when Thunderbird starts? Do you use a secure connection (e.g. SSL) to your email server? What happens if you don't automatically check for new email when Thunderbird starts, does the problem disappear?

I'm thinking that this could be the same as bug 390036 and bug 428522.

I'm not sure that Lightning 0.8 is causing the problem. As I reported in one of those bugs, Lightning 0.7 can also show this behavior.

Revision history for this message
In , Bruno Browning (browning) wrote :

(In reply to comment #20)
> I'm thinking that this could be the same as bug 390036 and bug 428522.
>
> I'm not sure that Lightning 0.8 is causing the problem. As I reported in one
> of those bugs, Lightning 0.7 can also show this behavior.
>

Interesting. Maxime's original report of this bug did coincide rather precisely with a major change to the CalDAV provider that caused increased startup loads, but I suppose that could be a red herring. Since at least one and I think probably both of the bugs Pete cites involved ICS calendars, not CalDAV, I have to wonder if the patch proposed in bug 416239 would make a difference: if so that would point a finger at the webdav extension. Good to know even if we don't decide to accept that patch.

The error cited in the bug description here is caused by the CalDAV provider trying to fish the WWW-Authenticate header out of the response to a PROPFIND request, and failing. That header kind of really ought to be there at that point, and we've had other reports of odd things happening wrt authentication with DAViCal (bug 423767, bug 428034) so it's possible this is a DAViCal issue rather than a Mozilla one. It would be helpful if someone could wireshark the interaction leading up to this error.

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

(In reply to comment #20)
> Maxime, is Thunderbird configured to automatically download new email when
> Thunderbird starts?
YES
> Do you use a secure connection (e.g. SSL) to your email
> server?
YES
> What happens if you don't automatically check for new email when
> Thunderbird starts, does the problem disappear?
I can't test it anymore (see below), but when I had this problem, the bug appeared even if I restarted and no mail was downloaded
>
> I'm thinking that this could be the same as bug 390036 and bug 428522.
looks very much the same
>
> I'm not sure that Lightning 0.8 is causing the problem. As I reported in one
> of those bugs, Lightning 0.7 can also show this behavior.
In my case it never happens with 0.7
(In reply to comment #19)
> I haven't been able to reproduce this myself, but I'm reasonably certain that
> it's caused by attempting to fetch&parse multiple large CalDAV calendars at
> startup, since we now fetch the entire calendar into a memory cache then
You are probably right, because my administrator (it's not me anymore) has moved all of 2007's events into a backup calendar, so my current calendar is much smaller and the bug no longer occurs

Revision history for this message
In , Ssitter (ssitter) wrote :

The error message is also reported in Bug 429329 / Bug 428034. Same issue?

Revision history for this message
In , Rikka (riccardo-granchi) wrote :

On my system A (Ubuntu 8.04 + Thunderbird 2.0.0.14 + Lightning 0.8 + Provider for Google Calendar 0.4) Thunderbird hangs at startup,
on my system B (WinXP SP2 + Thunderbird 2.0.0.14 + Lightning 0.8 + Provider for Google Calendar 0.4) Thunderbird works fine.

I mention this to make clear that I'm not using DAViCal but Provider for Google Calendar, and one of my systems hangs on 80% of starts

Revision history for this message
In , Bernard-desruisseaux (bernard-desruisseaux) wrote :

Multiple users have reported this issue at Oracle.

I've been able to reproduce this issue multiple times on Windows XP SP2 with Thunderbird/2.0.0.14 setup with an IMAP account configured to use SSL and Lightning/0.8 setup with a CalDAV calendar configured to use HTTPS. The IMAP server and CalDAV server were on the same host (i.e., same host name).

The problem would occur when Thunderbird is accessing the IMAP server at the same time that Lightning is accessing the CalDAV server. A timing issue!

We've been able to reproduce the issue consistently by monitoring the CalDAV exchange in HTTPAnalyzer and clicking on the IMAP Inbox at just the right time (!) for the IMAP connection to be established in between the CALDAV:calendar-query REPORTs and the CALDAV:calendar-multiget REPORTs.

Most of the users that have reported this issue had Thunderbird configured to check for new messages at startup which would increase the likelihood that the IMAP connection is established at pretty much the same time as the CalDAV connection. That being said, on my machine (Dell Latitude D630, Intel Core2 Duo CPU, Windows XP SP2) the issue was easier to reproduce without checking for new messages at startup automatically, but rather by clicking on the IMAP Inbox at just the right time.

From what we've seen the size of the IMAP Inbox, or the number of calendar components in the CalDAV calendar collection didn't matter so much. That being said, their respective size would surely influence the time it takes the server to return its responses (and thus impact the timing between the IMAP and CalDAV access).

Using Process Explorer we were able to identify the thread that was taking all the CPU (50% on a Core2 Duo) in thunderbird.exe and look at the thread stack. From what we've seen the thread was looping in nsPrintSettings::GetStartPageRange. I don't know if that makes any sense or not.

Revision history for this message
In , Ssitter (ssitter) wrote :

(In reply to comment #25)
So it's indeed the same as Bug 390036 but for CalDAV? See Comment #20 above.

Revision history for this message
In , Bernard-desruisseaux (bernard-desruisseaux) wrote :

(In reply to comment #26)
> (In reply to comment #25)
> So it's indeed the same as Bug 390036 but for CalDAV? See Comment #20 above.
>

Indeed.

Revision history for this message
In , Bernard-desruisseaux (bernard-desruisseaux) wrote :

Multiple users have reported this issue at Oracle. See:

https://bugzilla.mozilla.org/show_bug.cgi?id=422618#c25

Using Process Explorer we were able to identify the thread that was taking all
the CPU (50% on a Core2 Duo) in thunderbird.exe and look at the thread stack.
From what we've seen the thread was looping in
nsPrintSettings::GetStartPageRange. I don't know if that makes any sense or
not.

Revision history for this message
In , Dmose (dmose) wrote :

This seems like a pretty serious failure mode; requesting blocking.

Revision history for this message
In , Ssitter (ssitter) wrote :

*** Bug 422618 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ssitter (ssitter) wrote :

Resolving as duplicate per Comment #22 and Comment #27.

*** This bug has been marked as a duplicate of bug 390036 ***

Revision history for this message
In , Ssitter (ssitter) wrote :

*** Bug 428522 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ssitter (ssitter) wrote :

Confirming per duplicates.

Revision history for this message
In , Dmose (dmose) wrote :

This seems like a pretty serious failure mode; requesting blocking.

Revision history for this message
In , Philipp-bugzilla (philipp-bugzilla) wrote :

I tried to reproduce this, but from reading the previous comments it seems this issue only happens under special circumstances, so it's quite obvious that I couldn't. I tried the following:

* gmail imap server
* my webdav https server

- accept an event from the gmail server into the webdav calendar
--> Fails, but without error. Might be something else, no hang

- read mail, add/modify events
--> No error, no hang

Is this maybe OS-dependent? Are there any sure-fire steps to reproduce this?

Revision history for this message
In , Christoph E (mail-christoph-evers) wrote :

Maybe it also depends on the IMAP server itself. For me it happens as soon as I have SSL enabled for both calendar and IMAP. It does not matter in which order I enable SSL (IMAP first or calendar first). It hangs immediately after restarting TB while connecting to the IMAP/calendar server. It can be reproduced on Win XP 32-bit and Ubuntu 64-bit.

Revision history for this message
In , Philipp-bugzilla (philipp-bugzilla) wrote :

What type of IMAP server do you use? What sort of webdav server do you use? Is the IMAP server on the same host as the SSL webdav server? What

I used gmail as imap and apache mod_dav for webdav.

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

(In reply to comment #13)
In my case
> What type of IMAP server do you use?
Cyrus
> What sort of webdav server do you use?
DaviCal
> Is
> the IMAP server on the same host as the SSL webdav server?
Yes

Revision history for this message
In , Christoph E (mail-christoph-evers) wrote :

(In reply to comment #13)
> What type of IMAP server do you use?
Don't know. Host is, for example, imap.strato.de

>What sort of webdav server do you use?
DaviCal Caldav server

>Is the IMAP server on the same host as the SSL webdav server?
No

Revision history for this message
In , Dmose (dmose) wrote :

See bug 422618 comment 25 for steps on how to reproduce.

Revision history for this message
In , Kai Engert (kaie) wrote :

the 50% busy sounds like a dual core computer with one cpu busy at 100% ?

although highly unlikely, just to be sure, can you trace the IMAP code and check whether it is doing additional i/o in the ssl test case?

can you use a packet sniffer and check whether there is constantly data being transferred, or whether the connection is idle?

if there is constantly data being transferred, one might use a tool like ssltap to snoop the ssl traffic to see what's going on.

is this specific to windows or happening on all platforms?

I know the SSL thread is doing a busy wait on SSL I/O under certain circumstances, if NSPR is unable to create a loopback socket for implementing the nspr pollable event

Revision history for this message
In , Jaap van Ginkel (j-a-vanginkel) wrote :

(In reply to comment #13)
> What type of IMAP server do you use?
Exchange server imaps

>What sort of webdav server do you use?
DaviCal Caldav server

>Is the IMAP server on the same host as the SSL webdav server?
No

Revision history for this message
In , bvdbos (bvdbos) wrote :

(In reply to comment #17)
> the 50% busy sounds like a dual core computer with one cpu busy at 100% ?

Judging by similar reports in Bugzilla, it seems so.

> I know the SSL thread is doing a busy wait on SSL I/O under certain
> circumstances, if NSPR is unable to create a loopback socket for implementing
> the nspr pollable event

Shouldn't the busy wait be something like 15 seconds max as a server-setting? I see comment 18, comment 14 and comment 15 (the three confirmed setups) all use Davical. Judging by http://wiki.davical.org/w/Road_Map, SSL encryption doesn't work for Davical?

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

(In reply to comment #19)

> Judging http://wiki.davical.org/w/Road_Map ssl-encryption doesn't work
> for Davical?

Davical uses Apache's mod_ssl to support SSL encryption (Davical is written in PHP and runs on Apache)
(In reply to comment #17)
> the 50% busy sounds like a dual core computer with one cpu busy at 100% ?
I confirm that on my computer the 50% is one core at 100%

Revision history for this message
In , bvdbos (bvdbos) wrote :

> Davical use the mod-SSL of Apache to support ssl-encryption (Davical is in PHP
> and use Apache)

I thought it had to be something like this :-) I see Maxime uses DaviCal 0.9.1, I wonder whether the version of the DaviCal server makes a difference. Bernard, which server does Oracle use? Jaap and Christoph, which versions do you use? As bug 416239 should be solved for caldav since 25-07, do you still see this with a recent nightly?

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

I use Lightning build 2008073119, Apple's CalendarServer SVN from 2008-01-11 on a Debian box, and Courier as IMAP, with both IMAPS and CalDav's https - and I see the same problem, in about 80% of all starts of thunderbird. AFAICS CalendarServer does not use Apache internally.

Revision history for this message
In , Bernard-desruisseaux (bernard-desruisseaux) wrote :

(In reply to comment #21)
> Bernard, which server does Oracle use?

We're using both the CalDAV and IMAP servers part of Oracle Beehive.

Revision history for this message
In , Jaap van Ginkel (j-a-vanginkel) wrote :

>Jaap and Christoph, which versions do you use?
>As bug 416239 should be solved for caldav since 25-07, do you still see this
>with a recent nightly?

ii rscds 0.9.5.1 DAViCal CalDAV Server

No errors on error console

100% CPU on a single-core box, 50% on a dual core

Very unresponsive but not totally dead; 99% CPU sometimes

Revision history for this message
In , Jaap van Ginkel (j-a-vanginkel) wrote :

Oh sorry Lightning version Nightly build 0.9.pre 2008073119

Revision history for this message
In , Bruno Browning (browning) wrote :

For CalDAV users, with recent nightlies there are two preferences that you could set, calendar.debug.log and calendar.debug.log.verbose, that will significantly increase the amount of data being logged; that might help getting to the bottom of this. Also, as Kai suggested in comment #17, it would be useful if someone could wireshark a machine with this problem to see if there is network traffic associated with the CPU load. His question as to platform is also a good one - does this happen on Windows only?

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

I created the two preference settings as integers in the option's config editor, with their value set to 10000 (the more the better? :-) Where should I see the output? There is not a single line in the error console, even though it's set to display "All".

I noticed that the same issue shows up without network connection, i.e. with all network interfaces turned off. Windows (and yes, I have only tested on windows so far) does not show any network traffic on its interface.

Would you know of a thunderbird binary with debug symbols? It should allow me to attach a debugger and tell you what keeps my CPU busy. Without it I can basically only tell you what dlls the 17 threads are in.

Revision history for this message
In , Bruno Browning (browning) wrote :

Sorry not to have been more explicit: those prefs want to be booleans set to 'true'. The .verbose one is not in 0.8; you'll want to use a fairly new nightly to take advantage of it.
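As a sketch of the corrected settings, the Python snippet below prints the two preferences as `user_pref(...)` lines with boolean values. Putting such lines in a `user.js` file in the profile is an assumption here; the thread itself only mentions creating the prefs in the config editor. `user_pref_line` and `DEBUG_PREFS` are illustrative names.

```python
import json


def user_pref_line(name, value):
    """Format one preference as a user.js line; Python bools become true/false."""
    return "user_pref({}, {});".format(json.dumps(name), json.dumps(value))


# Both prefs are booleans set to true, per the comment above.
DEBUG_PREFS = {
    "calendar.debug.log": True,
    "calendar.debug.log.verbose": True,
}

for pref, val in DEBUG_PREFS.items():
    print(user_pref_line(pref, val))
```

Setting them as integers (for example 10000) is the mistake being corrected here; the boolean form is what the calendar code is said to expect.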
FWIW I've not been able to reproduce this on Linux - and I've tried.
I don't know where one could find a debug binary short of building one.

Revision history for this message
In , Jaap van Ginkel (j-a-vanginkel) wrote :

Updated to the August 4th nightly.

I've been running with .verbose for a few hours now.

For some reason I can't reproduce the error anymore, though Thunderbird still eats CPU cycles every time I touch anything calendar-related. It gets sluggish, but no more 100% usage.

Nothing out of the ordinary in the logging. Even at 40% CPU there is no extra logging.

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

OK, got it (well, the debug output :-) working, thanks. The last two messages I see are

CalDAV: recv: <?xml version='1.0' encoding='UTF-8'?>
<multistatus xmlns='DAV:'>
  <response>
    <href>/calendars/users/axel/calendar/2e0b9500-0087-4e08-bb63-b880483c0fb9.ics</href>
    <propstat>
      <prop>
        <getetag>"ddbb339c33e6de012516a516690431c5"</getetag>
      </prop>
      <status>HTTP/1.1 200 OK</status>
    </propstat>
  </response>
[...]
  <response>
    <href>/calendars/users/axel/calendar/a4902178-1465-41ab-9ae1-ec37327367d6.ics</href>
    <propstat>
      <prop>
        <getetag>"f38036e9996d56df10039b49783c75e1"</getetag>
      </prop>
      <status>HTTP/1.1 200 OK</status>
    </propstat>
  </response>
  <response>
    <href>/calendars/users/axel/calendar/ba61862e-6ec9-4165-94ca-2fffe5a0a11f.ics</href>
    <propstat>
      <prop>
        <getetag>"eeee26c5605e6211d4f740e0f48edd79"</getetag>
      </prop>
      <status>HTTP/1.1 200 OK</status>
    </propstat>
  </response>
</multistatus>
CalDAV: send: <?xml version="1.0" encoding="UTF-8"?>
<calendar-multiget xmlns:D="DAV:" xmlns="urn:ietf:params:xml:ns:caldav">
  <D:prop>
    <D:getetag/>
    <calendar-data/>
  </D:prop>
  <href xmlns="DAV:">/calendars/users/axel/calendar/ba61862e-6ec9-4165-94ca-2fffe5a0a11f.ics</href>
[...]
  <href xmlns="DAV:">/calendars/users/axel/calendar/eb72ddf8-a764-42f9-aaab-ebecafe2f19c.ics</href>
  <href xmlns="DAV:">/calendars/users/axel/calendar/b650acf2-4e2d-4a9a-8c5f-732f0e15d1ba.ics</href>
  <href xmlns="DAV:">/calendars/users/axel/calendar/f65900c6-4e1a-4b92-9129-6e196a660775.ics</href>
</calendar-multiget>

After that it hangs with 100% CPU (on one core). So that probably doesn't help.
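For comparing a logged multistatus response against the hrefs in the following calendar-multiget request, a short Python sketch using only the standard library may help. It is an illustration, not Lightning's parser; `etags_by_href` is a made-up helper and it assumes a complete, well-formed response body (the log above is elided with `[...]`).

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree's notation for the DAV: namespace


def etags_by_href(multistatus_xml):
    """Map each <href> in a DAV: multistatus body to its <getetag> value."""
    root = ET.fromstring(multistatus_xml)
    result = {}
    for response in root.iter(DAV + "response"):
        href = response.findtext(DAV + "href")
        etag = response.findtext(".//" + DAV + "getetag")
        if href is not None and etag is not None:
            # ETags arrive wrapped in double quotes; strip them for comparison.
            result[href] = etag.strip('"')
    return result
```

The resulting dictionary keys can then be checked against the `<href>` entries of the multiget body to see which items Lightning decided to refetch.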

I now have a x86_64 debug build for linux - which doesn't show the problem :-( So to me it looks like it's windows only. I'll post my findings once I have the debug build for windows.

Revision history for this message
In , Kai Engert (kaie) wrote :

Axel, if it's really Windows only, that might support my theory.
Do you have some Security Firewall enabled on your Windows system, that does prevent Firefox from opening a server socket on the loopback device?

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

No firewall but the Windows one. Disabling it has no effect; thunderbird still hangs. Process Explorer tells me that thunderbird has opened four local sockets (say port 2283-2286). 2283 connects to 2284, 2285 to 2286. All connections are in the "established" state. That's the case both when thunderbird and lightning work and when they hang.

Revision history for this message
In , Kai Engert (kaie) wrote :

I had not realized (until now) that we're talking about a real hang, I had assumed we just waste the cpu cycles.

So we've got a real deadlock, and that's not likely to be related to the busy wait I have mentioned.

Ideally someone who is able to reproduce this would attach a debugger and get stack traces of all threads.

Revision history for this message
In , Kolargol00 (kolargol00) wrote :

I don't think this issue is Windows-only, see bug #428522. It happens on my Linux system.

IMAP servers are Courier and SurgeMail, CalDAV server is Bedework.

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

Created an attachment (id=333109)
backtrace of all threads when TB hangs

Thread 2892 eats up all the CPU time.

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

(From update of attachment 333109)
Sorry for the long wait. Attachment #333109 is the stack trace for all threads; for those without symbols I only quoted one stack frame (without symbols :-). The thread eating up the CPU time is thread 2892.

I don't see nsSocketTransportService::Run()'s variable "active" ever becoming false, so the while loop starting at line nsSocketTransportService2.cpp:532 is never left and instead runs continuously. But maybe that's the plan and instead it's some WaitFor123manyObject that fails - I didn't really look at the code yet.

Please let me know what variables you care about.

Revision history for this message
In , Dmose (dmose) wrote :

I believe mvl told me he'd be looking at this, so I'm taking the liberty of reassigning to him. If I misunderstood, please let me know.

Revision history for this message
In , Mvl (mvl) wrote :

I can't reproduce this, so it's hard for me to fix. Besides, I won't have time to work on this for at least a few days.

Revision history for this message
In , Kai Engert (kaie) wrote :

Axel, thanks for the stacks. I guess you are using the latest 1.8.1 code (MOZILLA_1_8_BRANCH) ?

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

Yes; both thunderbird and the calendar are MOZILLA_1_8_BRANCH.

Revision history for this message
In , Kai Engert (kaie) wrote :

Axel, yes I think it is intended that the while(active) loop in nsSocketTransportService2 is never left.

You say that loop is consuming all CPU while you are experiencing the deadlock.
cc'ing biesi who is our most experienced active developer of that code.

Maybe we can find a way to use logging that will tell us why that socket code is constantly active, rather than waiting for data to dispatch.
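
One way such logging could look, as a minimal sketch: count consecutive poll wake-ups that transfer no data and flag the loop once a limit is exceeded. The class name `SpinDetector`, its fields, and the threshold are hypothetical and not part of necko or NSPR; this only illustrates the idea in Python.

```python
from dataclasses import dataclass


@dataclass
class SpinDetector:
    """Counts consecutive poll wake-ups that moved no data (illustrative only)."""
    threshold: int = 1000   # hypothetical limit, tune as needed
    idle_wakeups: int = 0

    def record(self, ready_sockets: int, bytes_moved: int) -> bool:
        """Return True once the loop looks like a busy spin."""
        if ready_sockets > 0 and bytes_moved == 0:
            # Poll claimed readiness, but nothing was read or written.
            self.idle_wakeups += 1
        else:
            # Real progress (or a genuine idle wait) resets the counter.
            self.idle_wakeups = 0
        return self.idle_wakeups >= self.threshold
```

Calling `record()` once per loop iteration would let a log line be emitted only when the loop has spun without progress for a while, rather than on every wake-up.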

Revision history for this message
In , Kai Engert (kaie) wrote :

Christian, please see my comment 41. We experience a deadlock in socket transport.

Revision history for this message
In , Ssitter (ssitter) wrote :

Does the issue also exist on Trunk, i.e. testing with Shredder Alpha 2 and a Lightning 0.6a1 nightly build?

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

I did not test the trunk yet.

I poked around the code a bit, and this is what I believe happens: lightning's (SSL) socket is polled on read | write. I see that in nsSSLThread.cpp requestPoll(), the socket currently serviced by the thread is the IMAPS socket, so the switch on si->mThreadData->mSSLState gets evaluated. We hit ssl_idle. The problem is that si->mThreadData->mOneBytePendingFromEarlierWrite is true, so the poll returns immediately claiming that the lightning socket has something to work on, and resetting all other sockets' poll status.

The poll result "write" is now handled by trying to write to the Lightning SSL socket. That fails, though, because the IMAPS socket is blocking the SSL thread. Here is the relevant backtrace of the failure:

> pipnss.dll!nsSSLThread::requestWrite(nsNSSSocketInfo * si=0x04a66e68, const void * buf=0x04e7c9b5, int amount=323) Line 734 C++
  pipnss.dll!nsSSLIOLayerWrite(PRFileDesc * fd=0x03d6fa90, const void * buf=0x04e7c9b5, int amount=323) Line 1351 + 0x11 bytes C++
  nspr4.dll!PR_Write(PRFileDesc * fd=0x03d6fa90, const void * buf=0x04e7c9b5, int amount=323) Line 146 + 0x14 bytes C
  necko.dll!nsSocketOutputStream::Write(const char * buf=0x04e7c9b5, unsigned int count=323, unsigned int * countWritten=0x022dfe0c) Line 550 + 0x12 bytes C++
  necko.dll!nsHttpConnection::OnReadSegment(const char * buf=0x04e7c9b5, unsigned int count=323, unsigned int * countRead=0x022dfe0c) Line 524 + 0x26 bytes C++
  necko.dll!nsHttpTransaction::ReadRequestSegment(nsIInputStream * stream=0x046e0b80, void * closure=0x049cee60, const char * buf=0x04e7c9b5, unsigned int offset=0, unsigned int count=323, unsigned int * countRead=0x022dfe0c) Line 405 + 0x1c bytes C++
  xpcom_core.dll!nsMultiplexInputStream::ReadSegCb(nsIInputStream * aIn=0x04c554c8, void * aClosure=0x022dfe10, const char * aFromRawSegment=0x04e7c9b5, unsigned int aToOffset=0, unsigned int aCount=323, unsigned int * aWriteCount=0x022dfe0c) Line 288 + 0x29 bytes C++
  xpcom_core.dll!nsStringInputStream::ReadSegments(unsigned int (nsIInputStream *, void *, const char *, unsigned int, unsigned int, unsigned int *)* writer=0x002ff4f0, void * closure=0x022dfe10, unsigned int aCount=323, unsigned int * result=0x022dfe0c) Line 240 + 0x22 bytes C++
  xpcom_core.dll!nsMultiplexInputStream::ReadSegments(unsigned int (nsIInputStream *, void *, const char *, unsigned int, unsigned int, unsigned int *)* aWriter=0x02023ee0, void * aClosure=0x049cee60, unsigned int aCount=4096, unsigned int * _retval=0x022dfea4) Line 245 + 0x28 bytes C++
  necko.dll!nsHttpTransaction::ReadSegments(nsAHttpSegmentReader * reader=0x049f1670, unsigned int count=4096, unsigned int * countRead=0x022dfea4) Line 430 + 0x2b bytes C++
  necko.dll!nsHttpConnection::OnSocketWritable() Line 559 + 0x1e bytes C++
  necko.dll!nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream * out=0x049f1900) Line 770 + 0xb bytes C++
  necko.dll!nsSocketOutputStream::OnSocketReady(unsigned int condition=0) Line 490 C++
  necko.dll!nsSocketTransport::OnSocketReady(PRFileDesc * fd=0x03d6fa90, short outFlags=2) Line 1474 C++

Writing returns with an error code due to this piece of code (the line numbers ...


Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

I cannot answer Stefan's question on Shredder a2. Installing the lightning build from 2008-08-17 http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-trunk/windows-xpi/lightning.xpi with a newly installed Shredder alpha 2 build http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/3.0a2-candidates/build1/thunderbird-3.0a2.en-US.win32.installer.exe fails with "Lightning 0.6a1 could not be installed because it is not compatible with Shredder 3.0a2." Thunderbird and Shredder are in different install locations and they use different, separate profiles.

Isn't this supposed to be working or am I doing something obviously wrong? Should I be using a Thunderbird nightly trunk build instead of alpha2 because of bug 448753?

Revision history for this message
In , Dbo-moz (dbo-moz) wrote :

Removing bug from the blocking list, since there's no solution on the horizon and it's still not reproducible for most developers. We really feel sorry about that, but please understand that we need to move on.
Moreover, I don't yet see that this is something we could fix in calendar-land; it looks like a necko/platform bug to me, so a fix would presumably require a Thunderbird update.

Revision history for this message
In , Bugzilla-babylonsounds (bugzilla-babylonsounds) wrote :

We should add this bug to the release notes then.

Revision history for this message
In , Kai Engert (kaie) wrote :

Axel, thanks a lot for your very helpful analysis.

Let me describe the scenario, combined with SSL/PSM state

- application code talks to the PSM I/O layer,
  which talks to the NSS libSSL I/O layer

- the described bug happens after libSSL has signalled a short write,
  some bytes not yet flushed out to the socket

- libSSL expects that we call "write" again,
  giving it a chance to flush

- when we arrive in this state,
  PSM reports to the application level "-1 bytes written, would block"

- when the application level calls (write) again,
  PSM calls into libSSL, trying to flush

  I don't know what happens if libSSL is still unable to flush,
  as it appears to be in this bug scenario.

  I suspect it will tell us "would block".

- when the application level polls, while we are in this
  "short write, need flush" mode,
  the PSM layer will always signal "writeable" to the application level.

  It's done this way, because apparently when I wrote the code,
  I didn't know of a way to ensure we'll wake up, once the socket becomes
  writable again.

- Axel tells us, the application code constantly polls and attempts to write
  never succeeding, resulting in a deadlock.
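
The scenario above can be sketched as a toy model. This is not PSM or libSSL code; `PsmSocketModel`, `run_event_loop`, and the `WOULD_BLOCK` constant are made-up names, and the model assumes that a pending flush always makes poll report "writable", as described in the list.

```python
WOULD_BLOCK = -1


class PsmSocketModel:
    """Toy stand-in for the PSM layer of one SSL socket (not real PSM code)."""

    def __init__(self, flush_possible: bool):
        self.pending_flush = True            # a short write left bytes unflushed
        self.flush_possible = flush_possible

    def poll_writable(self) -> bool:
        # While a flush is pending, always report "writable" to the application.
        return self.pending_flush

    def write(self) -> int:
        if not self.flush_possible:
            return WOULD_BLOCK               # still unable to flush
        self.pending_flush = False           # flush succeeded
        return 1


def run_event_loop(sock: PsmSocketModel, max_iterations: int) -> int:
    """Return how many poll/write rounds were spent on the socket."""
    rounds = 0
    while rounds < max_iterations and sock.poll_writable():
        rounds += 1
        sock.write()
    return rounds
```

With `flush_possible=True` the loop ends after one round. With `flush_possible=False` it only stops at `max_iterations`, which is the busy loop Axel observed: poll never waits, and write never makes progress.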

Revision history for this message
In , Kai Engert (kaie) wrote :

So, it was necessary for me to write the previous paragraph, in order to refresh my memory about how the SSL interaction works.

Now I've seen Axel's statement, which is describing the cause for the deadlock:

- the IMAPS socket blocks the SSL thread
  (I don't know yet how this can happen)

- the calendar code tries to write SSL data to the calendar server,
  but fails, because our SSL thread currently only has a single worker.
  If the SSL thread is blocked on a read/write call, then other application
  requests for reading/writing SSL get rejected (postponed) with "would block"

The next step is, we must understand why the SSL thread is blocked by the IMAPS thread.
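
As a rough illustration of the single-worker behaviour described above (a sketch only; `SingleWorkerSslModel` and its methods are hypothetical and do not mirror the real nsSSLThread interface):

```python
WOULD_BLOCK = -1


class SingleWorkerSslModel:
    """Toy model: one worker, so at most one socket can be busy at a time."""

    def __init__(self):
        self.busy_socket = None   # name of the socket the worker is serving

    def start_blocking_read(self, sock: str) -> None:
        # A blocking read occupies the only worker until data arrives.
        self.busy_socket = sock

    def finish(self, sock: str) -> None:
        if self.busy_socket == sock:
            self.busy_socket = None

    def request_write(self, sock: str, amount: int) -> int:
        if self.busy_socket is not None and self.busy_socket != sock:
            return WOULD_BLOCK    # postponed: worker is serving another socket
        return amount
```

In this model, once `start_blocking_read("imaps")` has been called, every `request_write("caldav", ...)` returns `WOULD_BLOCK` until `finish("imaps")` runs. If the IMAPS read never completes, the calendar write is postponed forever.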

Revision history for this message
In , Nelson-bolyard (nelson-bolyard) wrote :

I wish that PSM didn't limit itself to a single thread for SSL.
libSSL certainly doesn't impose that limitation.

Revision history for this message
In , Kai Engert (kaie) wrote :

The SSL worker thread is designed in a way that would allow for additional worker threads; someone just has to find the time to write the additional code.

The decoupling into the SSL worker thread had been necessary, in order to allow us to callback into necko, while we are blocked in libssl, waiting for ocsp results.

When I implemented that decoupling I had decided to not increase the complexity of that development project further, and decided to postpone the introduction of multiple worker threads until necessary.

This is the first bug I've seen that really requires us to have more threads. Well, assuming that the analysis is correct.
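
To illustrate why more than one worker would help, here is a small sketch using Python's standard thread pool. It is not the PSM design; the job names are invented, and the blocked job simply stands in for an SSL read that never receives data.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def second_job_finishes(worker_count: int, wait_seconds: float = 0.5) -> bool:
    """Submit a job that blocks, then a quick one, and report whether the
    quick one completes while the first is still blocked."""
    release = threading.Event()
    started = threading.Event()

    def blocking_job():
        started.set()
        release.wait()            # stands in for a blocking read with no data
        return "imaps done"

    def quick_job():
        return "caldav done"

    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        pool.submit(blocking_job)
        started.wait()            # make sure a worker is occupied first
        quick = pool.submit(quick_job)
        try:
            quick.result(timeout=wait_seconds)
            return True
        except TimeoutError:
            return False
        finally:
            release.set()         # unblock so the pool can shut down
```

With one worker the quick job waits behind the blocked one and times out; with two workers it completes right away.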

Revision history for this message
In , Kai Engert (kaie) wrote :

David Bienvenu: On IRC you said, the IMAP code might do blocking I/O. You pointed me to function nsImapProtocol::CreateNewLineFromSocket(), which I indeed can see in the attached list of stacks.

I followed that code to nsPipeInputStream::Wait which says it is waiting for a pipe.

What kind of pipe is that? Is it the input socket/fd, or is it some helper pipe?

Revision history for this message
In , Bienvenu (bienvenu) wrote :

It's just a pipe on the input stream, see nsImapProtocol::SetupWithUrl. We create a transport on the io socket:
        rv = socketService->CreateTransport(&connectionType, connectionType != nsnull,
                                            *socketHost, socketPort, proxyInfo,
                                            getter_AddRefs(m_transport));

and then we open an input stream on that transport:
          rv = m_transport->OpenInputStream(nsITransport::OPEN_BLOCKING, 0, 0, getter_AddRefs(m_inputStream));
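
For readers less familiar with the distinction, the difference between a blocking read (as requested by OPEN_BLOCKING above) and a non-blocking one can be shown with plain Python sockets. This is only an analogy; it does not use necko streams, and `try_read` and `demo` are illustrative names.

```python
import select
import socket


def try_read(sock: socket.socket, size: int = 4096):
    """Non-blocking read: return bytes, or None if nothing is available yet."""
    sock.setblocking(False)
    try:
        return sock.recv(size)
    except BlockingIOError:
        return None               # a blocking socket would have waited here


def demo():
    reader, writer = socket.socketpair()
    try:
        first = try_read(reader)                 # nothing has been sent yet
        writer.sendall(b"* OK\r\n")
        select.select([reader], [], [], 1.0)     # wait until bytes are readable
        second = try_read(reader)
        return first, second
    finally:
        reader.close()
        writer.close()
```

A blocking reader in the position of the first `try_read` call would sit in `recv` until the peer sends something, which is the kind of wait that can tie up a shared worker thread.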

Revision history for this message
In , Jeroen van Disseldorp (dizzl) wrote :

I'm not familiar with TB or Lightning's internals, but I think I might have a repeatable test case for this situation. See below. Let me know if I can help with testing a fix.

Test case:
When I fire up Thunderbird with Lightning enabled, it always eats 50% of my dual core directly after starting up. This situation sometimes goes away after about 10-20 seconds, but sometimes not. It seems to be a race condition in Lightning's initialization.

In the first case, TB is blocked for the first 10-20 seconds. Mouse clicks are hardly accepted, and the only thing to do seems to be just wait until the 50% CPU use blows over. After that TB/Lightning are usable as one would expect.

In the latter case, TB is blocked completely. No mouse clicks are handled anymore, but I can close TB and the 50% CPU use goes down again.

Both cases seem to occur about 50% of the time.

Using:
TB 2.0.0.16 (from Ubuntu repository)
Lightning 0.9pre nightly (27 Aug build)
Plain IMAP over 143, with "Use TLS if available" checked
8 CALDAV calendars via HTTPS, using a Davical backend

Revision history for this message
In , Jeroen van Disseldorp (dizzl) wrote :

After some playing with my settings, I found that using plain IMAP (ie. switching off "Use TLS if available" and selecting "Never") makes the lock-ups disappear. Lightning still blocks Thunderbird's thread though in the first 10-20 seconds with 50% CPU use, but after this period all CalDav data is loaded and Thunderbird becomes responsive and usable.

Revision history for this message
In , John Pye (jdpipe) wrote :

I am seeing this problem too. I see 100% CPU usage; I presume that the OP reports 50% CPU because they have dual processor of some sort.

I just installed lightning 0.9 rc2 on Ubuntu Hardy and this problem still exists. I have two IMAP servers, one is secure (SSL) and the other is insecure. The calendar is secure (SSL) webDAV to a server in the same domain as the secure IMAP server (messagingengine.com).

Would have been great if this could be fixed for 0.9...

Revision history for this message
In , Manomi (manomi) wrote :

I have also been experiencing the same problem (100% CPU usage) since Lightning ver. 0.8 and still have it with ver. 0.9.
When I start Thunderbird (TB) on one PC, TB hangs for a while (10-15 minutes), and the problem recurs periodically. While TB hangs, TB-Lightning is communicating with a Google site (as reported by VirusBuster).
I tested in safe mode, but the problem kept appearing, so other programs (anti-virus, etc.) seem to be irrelevant to the problem.
I tested on two other computers with the same set of calendars; the problem did not come up. All three computers run Windows XP SP2. The main difference among them concerning TB is that the profile on the first computer is much larger and more complicated, with many multi-layered folders.
Therefore, after creating another profile on the first PC, I tested with the same set of calendars and found no problem. But when I copied the messages to the new profile on the first PC, the problem began occurring.
Another test was to deactivate some calendars. The problem stopped when I deactivated the largest calendar. I can safely use the other, smaller calendars.

Judging from these, I suspect the following:
- This problem happens only with a certain type of profile and a certain type of calendar.
- Such profiles must have many messages and a multi-layered folder structure, or other conditions.
- Such calendars must have many items.

Hopefully this information is useful. I will be happy to provide further details if needed.

Revision history for this message
In , Support-vanderhorn (support-vanderhorn) wrote :

Last week 4 users received Lightning-0.9-win.
Also 99% CPU usage single core / 50% CPU usage dual core.
Thunderbird version 2.0.0.16 and 2.0.0.17 on Win XP SP2.
Removing Lightning solved the CPU-load problem completely.
Reinstalling initially appeared to run fine, but the problem
came back at times we could not pin down.
We use Kerio Mail Server for IMAPS and Calendar.
No indications found in the logs there.
As a precaution, we asked our users not to accept the 0.9 version for now.

I am happy to provide more details if productive.

Revision history for this message
In , Ssitter (ssitter) wrote :

(In reply to comment #43)
> Does the issue also exists on Trunk, i.e. testing with Shredder Alpha 2
> and a Lightning 0.6a1 nightly build?

Did someone try to retest using Trunk builds as requested? Matching test builds can be found at <http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/> and <http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-comm-central/>.

Revision history for this message
In , Support-vanderhorn (support-vanderhorn) wrote :

(In reply to comment #59)
> (In reply to comment #43)
> > Does the issue also exists on Trunk, i.e. testing with Shredder Alpha 2
> > and a Lightning 0.6a1 nightly build?
>
> Did someone tried to retest using Trunk builds as requested? Matching test
> builds can be found at
> <http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/>
> and
> <http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-comm-central/>.

Just installed both of them, removed all other add-ons.
I will report as soon as anything strange occurs. Nico

Revision history for this message
In , Support-vanderhorn (support-vanderhorn) wrote :

(In reply to comment #60)
> (In reply to comment #59)
> > (In reply to comment #43)
> > > Does the issue also exists on Trunk, i.e. testing with Shredder Alpha 2
> > > and a Lightning 0.6a1 nightly build?
> >
> > Did someone tried to retest using Trunk builds as requested? Matching test
> > builds can be found at
> > <http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/>
> > and
> > <http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-comm-central/>.
>
> Just installed both of them, removed all other add-on's.
> I will report as soon as anything stange occurs. Nico

Attempt to open Junk E-mail gave the following error:
Unable to open the summary file for Junk E-mail. Perhaps there was an error on disk, or the full path is too long.

Later several other folders had the same problem, exit & restart did not solve this.

Therefore I went back to Thunderbird-2.0.0.17 and Lightning-0.9 for now. Nico

Revision history for this message
In , Support-vanderhorn (support-vanderhorn) wrote :

(In reply to comment #61)
> (In reply to comment #60)
> > (In reply to comment #59)
> > > (In reply to comment #43)
> > > > Does the issue also exists on Trunk, i.e. testing with Shredder Alpha 2
> > > > and a Lightning 0.6a1 nightly build?
> > >
> > > Did someone tried to retest using Trunk builds as requested? Matching test
> > > builds can be found at
> > > <http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/>
> > > and
> > > <http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-comm-central/>.
> >
> > Just installed both of them, removed all other add-on's.
> > I will report as soon as anything stange occurs. Nico
>
> Attempt to open Junk E-mail gave the following error:
> Unable to open the summary file for Junk E-mail. Perhaps there was an error on
> disk, or the full path is too long.
>
> Later several other folders had the same problem, exit & restart did not solve
> this.
>
> Therefore I went back to Thunderbird-2.0.0.17 and Lightning-0.9 for now. Nico

After going back to Thunderbird 2.0.0.17 and Lightning 0.9, I again got 99% CPU. Stopping and restarting TB seemed to solve it initially, but the hourglass appeared while the cursor was over the folder browser or message list, though not over the message itself. Switching back and forth between Calendar and Email made the hourglass disappear.

I have no insight into the code, but I hope my contributions are helpful anyway.

Revision history for this message
In , John Pye (jdpipe) wrote :

Does anyone know if the new 'CACHE (experimental)' feature allows one to side-step this bug?

This bug is a really major issue for my use of Lightning -- basically any day that I have a calendar reminder coming up when I first launch my email, I get the 100% CPU problem and I have to kill Thunderbird. Then I have to relaunch it, and wait a few minutes before I dismiss or snooze the reminder (I guess so that all other SSL connections can do their stuff, since only one SSL connection at a time has been implemented, IIUIC).

Revision history for this message
In , Dbo-moz (dbo-moz) wrote :

Caching triggers an initial sync (with network load) on startup. Thus it's unlikely to be a cure.

Revision history for this message
In , John Pye (jdpipe) wrote :

How about some way of delaying the initial sync by, say, a couple of minutes? Is there perhaps some hidden configuration option for that?

Revision history for this message
In , Jeroen van Disseldorp (dizzl) wrote :

At Stefan:
I tried the Thunderbird and Lightning trunk builds on 3 machines: my home workstation, work laptop and work workstation. All 3 perform significantly better. After startup there is a small blockage of around 10 seconds, but after the calendars are loaded, everything works smoothly. And much more snappy I must say.

The only annoyance of the trunk build is the fact that the CALDAV authentication dialogs keep popping up (passwords are stored correctly, so clicking OK works). Since this is a minor issue compared to the 50% CPU usage, I will stick to Shredder for the coming period.

Revision history for this message
In , Stefan Max (stefan-max) wrote :

I am experiencing this bug, too. (Thunderbird 2 and Lightning 0.8+)

Today I tried the trunk builds, and they worked for me. Pretty fast load times and no problems with hanging or cpu load. I tried this with Google Calendar and the new Google CalDAV implementation, both work fine.

CalDAV authentication dialogues keep popping up for me too, "native" Google Calendar does not prompt for passwords.

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

Is this bug closed on trunk?

Revision history for this message
In , Jeroen van Disseldorp (dizzl) wrote :

I don't know. In trunk I see similar behaviour, it is just not as disturbing. Although I do not know the code, what I think happens is the following:

1. TB starts up
2. Lightning starts downloading my CalDav calendars in parallel (I have 7 of them)
3. Every time a calendar has been downloaded, it loops through the list of appointments and starts processing them
4. TB operates as normal

For larger calendars, Step 3 can take up to 10-15 seconds, even on a Core Duo. During that time TB blocks completely (Compiz turns it black and white). I therefore assume that Step 3 is executed in the main thread, blocking the main window handling. After the processing, TB colorizes again and works as normal. This process only happens at TB startup.

I hope this makes sense for anyone familiar with the code.
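
If Step 3 really does run on the main thread (an assumption, as noted above), one common mitigation is to process the items in small batches and hand control back to the event loop between batches. The sketch below is generic Python, not Lightning code; `process_in_chunks` and `run_with_ui_ticks` are invented names.

```python
from typing import Callable, Iterator, List


def process_in_chunks(items: List[int],
                      handle: Callable[[int], int],
                      chunk_size: int) -> Iterator[List[int]]:
    """Yield results one chunk at a time so the caller can service
    UI events between chunks."""
    for start in range(0, len(items), chunk_size):
        yield [handle(item) for item in items[start:start + chunk_size]]


def run_with_ui_ticks(items, handle, chunk_size):
    """Drive the chunked processing and count the points where a real
    event loop could dispatch pending events."""
    results, ui_ticks = [], 0
    for chunk in process_in_chunks(items, handle, chunk_size):
        results.extend(chunk)
        ui_ticks += 1   # a real event loop would handle mouse clicks here
    return results, ui_ticks
```

The total work is unchanged, but the window would get a chance to repaint and handle clicks after each batch instead of only once the whole calendar has been processed.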

Revision history for this message
In , Mdelorme-tennaxia (mdelorme-tennaxia) wrote :

Thanks for your comment.
The current bug seems fixed in the trunk,
but what you observe, and what I observe as well, makes me think that Lightning doesn't perform well enough to be used at my company.

Revision history for this message
In , Paul-kolomiets (paul-kolomiets) wrote :

I've reproduced this with a debug build. It seems that an IMAP operation acquires the ssl_thread_singleton->mBusySocket blocker (see security/manager/ssl/src/nsSSLThread.cpp) and never releases it.

Revision history for this message
In , Dmose (dmose) wrote :

Paul, was that on trunk or branch?

Bernard/Frank, are you guys still seeing this with trunk builds?

Revision history for this message
In , Paul-kolomiets (paul-kolomiets) wrote :

Well, actually, it was reproduced for Thunderbird 2.0.0.12 sources.

Revision history for this message
In , Dmose (dmose) wrote :

Paul, can you try a trunk build (either a nightly, or from source) and see if you can reproduce it there?

Revision history for this message
In , Stefan Max (stefan-max) wrote :

I experienced this problem also. I tried out Thunderbird 3 Beta 1
"Mozilla/5.0 (Windows; U; Windows NT 6.0; de; rv:1.9.1b3pre) Gecko/20081204 Lightning/1.0pre Thunderbird/3.0b1"
and a corresponding Lightning build (probably from the same day) and had no problems whatsoever.

Revision history for this message
In , Paul-kolomiets (paul-kolomiets) wrote :

I apologize, I do not have enough time to try the trunk sources. Here are the exact steps to reproduce the bug:

0. Ensure that your imap & webdav accounts are using secure connection.
1. Start continuous imap operation, say synchronize for offline.
2. Try to upload a huge file to webdav server while imap operation is in progress.
3. Here it is. Enjoy :)

Revision history for this message
In , Sd4705 (sd4705) wrote :

I have XPSP3. When I start thunderbird (connecting to IMAP with SSL) with lightning (connecting to google cal via HTTPS) the program often hangs with 100% CPU utilization. I can restart thunderbird in safe mode, and it will successfully download my email. I then close and restart in normal mode, and it works OK. On a different computer, also with XPSP3 but much newer with more memory etc, I connect to the same email and google calendars, and I haven't encountered any problems so far.

Revision history for this message
In , Kai Engert (kaie) wrote :

I think I had proposed that one solution is to change the IMAP/mail code implementation to no longer use any blocking I/O. I guess such a change has not happened, and is probably unlikely to happen.

The other solution is to change the PSM code to use multiple SSL threads, instead of just one.

Revision history for this message
In , Bienvenu (bienvenu) wrote :

yes, it's unlikely that we're going to rewrite the imap code to use non-blocking i/o.

Revision history for this message
In , Nelson-bolyard (nelson-bolyard) wrote :

Given that NSS itself does not impose any single-threaded limitations on
users of SSL, and allows many threads to simultaneously do SSL, the fact
that some Mozilla code (which I gather is PSM) imposes a single-thread
limitation on the use of SSL is rather disappointing. I am willing to
work with Kai or anyone to remove that limitation.

Revision history for this message
In , Kai Engert (kaie) wrote :

(In reply to comment #80)
> Given that NSS itself does not impose any single-threaded limitations on
> users of SSL, and allows many threads to simultaneously do SSL, the fact
> that some Mozilla code (which I gather is PSM) imposes a single-thread
> limitation on the use of SSL is rather disappointing. I am willing to
> work with Kai or anyone to remove that limitation.

Long story, caused by Mozilla's networking code being single-threaded, and the need to allow a callback into http, while blocked on ssl (for ocsp).

When I implemented the fix to allow proxied ocsp requests (by allowing to call back into mozilla network code, and at the same time decoupling from the network layer), I had implemented a quite complicated patch.

At that time, in order to avoid additional complexity, I went with a single SSL worker thread.

Now the time has come to extend that to a pool of threads. During the last 2-3 days I worked on a patch, I'm mostly done, but I need to review my own code and identify a bug.

Problem is, this patch will change the core of PSM. It's not a mail patch, but a core patch. So you'd have to use a version of core gecko that contains this patch...

Revision history for this message
In , Kai Engert (kaie) wrote :

Are you still able to reproduce this with Thunderbird 3 and Lightning?

If I understand correctly, this bug is triggered when using both MAIL/SSL and CALENDAR/SSL.

While I saw this problem with TB 2, I can currently not reproduce with TB 3.
Can you?

Revision history for this message
In , Dennis Melentyev (dennis-melentyev) wrote :

hm...
How can I check it? Have no Lightning working at all:

TB3:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090211 Shredder/3.0b2pre (as of yesterday)

Add-ons:
- Lightning 1.0pre (build 20081104031354)
- Provider for Google Calendar 0.6pre (Requires additional items, can live w/o this)
- Quicktext 0.9.9.9

Not compatible with Shredder 3.0b2pre
No updates found.

Revision history for this message
In , Ssitter (ssitter) wrote :

(In reply to comment #83)
> - Lightning 1.0pre (build 20081104031354)

Try a current nightly build instead of sticking with an old one:
http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-trunk/

Revision history for this message
In , Dennis Melentyev (dennis-melentyev) wrote :

Thanks Stefan!

Silly me.
Just installed nightly build.
Works so far.

Revision history for this message
In , Kai Engert (kaie) wrote :

Are you saying the problem is gone?

Revision history for this message
In , Kai Engert (kaie) wrote :

Created an attachment (id=361985)
Patch v1

This is my first attempt to get the multiple worker threads implemented.

It seems to work for me, but unfortunately I see a crash when having the flash plugin installed, so I suspect this patch needs some more reviewing to find the bug.

But before we try to get this in, we must have reliable steps to reproduce the original deadlock/hang problem with Thunderbird 3 nightlies.

Revision history for this message
In , Dennis Melentyev (dennis-melentyev) wrote :

(In reply to comment #86)
> Are you saying the problem is gone?
Not yet - have to setup IMAP folder first.
Those problems described in bug 444537 and bug 458690 are not visible anymore
for me, but definitely were connected to SSL issues.

Revision history for this message
In , Kai Engert (kaie) wrote :

I would appreciate testing and feedback from someone who is able to reproduce this bug with TB 2.

I have been running Thunderbird 3 test versions, I have 3 IMAP/SSL accounts configured, I have 3 remote https calendar configured.

I configured all mail accounts to check for new mail every 1 minute, and to reload all remote calendars every one minute.

What I can see are slowdowns when using Thunderbird. It appears to stall for 1-2 seconds occasionally, probably as calendar data is being processed.

But I don't get any deadlocks, have been running this configuration for over 2 days. Linux.

Revision history for this message
In , Dmose (dmose) wrote :

Simon, is this something you can reproduce on trunk or 1.9.1?

Revision history for this message
In , Simon-at-orcl (simon-at-orcl) wrote :

One IMAPS account and 2 CalDAV accounts with HTTPS URLs. All connecting to the same host through a slow VPN connection (it's harder to reproduce if the network is fast). CPU is quad core, so I get 25% CPU usage when the bug happens.

With TB2:
1) Send a 500k mail to yourself, quit TB main window while it's sending (so TB quits right after the mail is sent)
2) Start TB again, click on the new mail header, quickly go to calendar pane and do a "reload calendars", go back to mail headers and click on the new mail again.

With my setup I can reproduce it about 50% of the time in the first 5 seconds of starting it up (After that it usually never locks up).

The same test with TB3 does not seem to cause any issue.

Revision history for this message
In , Dmose (dmose) wrote :

Since no one has yet been able to reproduce this bug on the trunk, removing [tbneeds]. If someone does manage to reproduce it, please re-add that keyword!

I wonder if the nsIThreadManager changes that happened after 1.8 are working in our favor.

Revision history for this message
In , Larry (larryoleary) wrote :

I appear to be having this same issue in TB version 2.0.0.19 (20090105) and Lightning 0.9.

I am using one IMAPS server and one CalDAVS calendar. On start-up TB goes to 100% CPU and has no network capabilities. If I remove the CalDAVS calendar and restart TB seems to work again.

So, from my perspective, it appears that Lightning doesn't work with CalDAV as it renders TB useless. I do not have non SSL calendar sources available so I wouldn't know that Lightning worked with non-SSL CalDAV.

Revision history for this message
In , Mteixeira (mteixeira) wrote :

I noticed the SSL issue while reading calendars from a Zimbra server.

Is there any way I can try the latest trunk using Thunderbird 2.0?

Revision history for this message
In , Kai Engert (kaie) wrote :

(In reply to comment #94)
> I noticed the SSL issue while reading calendars from a Zimbra server.
>
> Is there anyway I can try the latest trunk using Thunderbird 2.0?

Which latest trunk do you refer to?
As you say "latest trunk" with TB2, I guess you are referring to "Lightning trunk".
This combination won't help you, as the bug is in the core TB code.

Revision history for this message
In , Kai Engert (kaie) wrote :

(In reply to comment #92)
> Since noone has yet been able to reproduce this bug on the trunk, removing
> [tbneeds]. If someone does manage to reproduce it, please re-add that keyword!
>
> I wonder if the nsIThreadManager changes that happened after 1.8 are working in
> our favor.

I found one more difference. We never added the enhancement from bug 363455 to the 1.8 branch, so TB 2 does not have it. That patch is meant to improve handling of blocking sockets. It would be interesting to know if it helps for this bug. I backported the patch; you can find it in bug 363455 attachment 365683.
Would one of you who is still using TB 2 be able to try that patch?

Revision history for this message
In , Dmose (dmose) wrote :

Versions have moved since some of the past comments were written.

The case that's really most critical at this point is ensuring that the nightly versions built from the mozilla-central trunk of Lightning <http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/> (next Lightning will ship from 1.9.1, but we don't have builds for that yet) and Thunderbird <http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-1.9.1/> (Thunderbird 3.0 will ship from here) work well together and don't have this problem.

Note that as of this writing, comm-central hasn't yet branched, though it will in the not-too-distant future.

Revision history for this message
In , Dmose (dmose) wrote :

I've filed bug 481685 to track getting mozilla-1.9.1-based builds of Lightning.

Revision history for this message
In , Dmose (dmose) wrote :

It appears that I was confused, and we do already have 1.9.1 builds of Lightning at <http://ftp.mozilla.org/pub/mozilla.org/calendar/lightning/nightly/latest-comm-central>. Bug 481685 has more details for those who wish to keep up.

Revision history for this message
In , Wim-bos-be (wim-bos-be) wrote :

*** Bug 486818 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Huzaifas (huzaifas) wrote :

(In reply to comment #96)
> (In reply to comment #92)
> > Since noone has yet been able to reproduce this bug on the trunk, removing
> > [tbneeds]. If someone does manage to reproduce it, please re-add that keyword!
> >
> > I wonder if the nsIThreadManager changes that happened after 1.8 are working in
> > our favor.
>
> I found one more difference. We never added the enhancement from bug 363455 to
> 1.8 branch, so TB 2 does not have it. That patch is meant to improve handling
> of blocking sockets. It would be interesting to know if it helps for this bug.
> I backported the patch, you find it in bug 363455 attachment 365683 [details].
> Would someone of you who is still using TB 2 be able to try that patch?

Used this patch for TB 2.
The Cal+mail performance is much better now.
I still see CPU spikes now and then, but TB no longer hangs.

Revision history for this message
In , Dmose (dmose) wrote :

Thanks, Huzaifa, that's very helpful to know. Setting the version field appropriately, since there's no longer reason to believe that this bug applies to the trunk.

Revision history for this message
In , Misha Koshelev (misha680) wrote :

You know, I'm honestly wondering if it's the same bug, as I just turned mail checking back on and it still works. Maybe whatever the offending event in my calendar was has simply passed...

Revision history for this message
In , Ssitter (ssitter) wrote :

*** Bug 485649 has been marked as a duplicate of this bug. ***

Revision history for this message
In , bvdbos (bvdbos) wrote :

Perhaps someone could deliver 0.9.1 versions of Lightning with the backported fix and the fix from bug 363455 comment 16?

Revision history for this message
Nick Lally (nick-lally) wrote :

Binary package hint: thunderbird

I am not sure if this should be a duplicate of bug #110836 or bug #144437 or bug #264993 or bug #150578

I am seeing frequent hangs of thunderbird. I have multiple, shared IMAP accounts configured and several times a day thunderbird will completely hang.
An strace shows only futex() activity. For what it's worth I've attached a gdb backtrace from a hung process.

Revision history for this message
In , Lee-scalellc (lee-scalellc) wrote :

(In reply to comment #26 and #17)

> does this happen on Windows only?

No, this is also a problem on TB 2.0.0.21 / Lightning 0.9 on Mac OS X 10.5.7

I've found that once it happens, the only solution is to delete the https CalDAV calendar, quit TB, restart it, and add the calendar back in.

I'm connecting to a Google Apps calendar.

Revision history for this message
In , Misha Koshelev (misha680) wrote :

You are lucky. I can't get it to work even if I reimport Google Calendar. A fix would be very appreciated.

Misha

Revision history for this message
In , Mschroeder-mozilla (mschroeder-mozilla) wrote :

*** Bug 477088 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mschroeder-mozilla (mschroeder-mozilla) wrote :

*** Bug 483683 has been marked as a duplicate of this bug. ***

Revision history for this message
In , John Pye (jdpipe) wrote :

The bug is present on TB 2.0.0.23 on Linux Ubuntu 9.04.

I recently suffered serious data loss as a result of this bug. PLEASE PLEASE PLEASE could resources be allocated to applying this patch to current Thunderbird versions?

Revision history for this message
In , Philipp-bugzilla (philipp-bugzilla) wrote :

Given bug 363455, I think we can soon mark this bug as FIXED (by that bug). Leaving a couple of weeks grace period, please report if you can reproduce this on Lightning 1.0pre ONLY. We are aware that this is an issue for 0.9, but the only way to fix it would be to drive forward bug 363455's branch approval.

Revision history for this message
In , Wolfgang Sourdeau (wsourdeau) wrote :

(In reply to comment #111)
> Given bug 363455, I think we can soon mark this bug as FIXED (by that bug).
> Leaving a couple of weeks grace period, please report if you can reproduce this
> on Lightning 1.0pre ONLY. We are aware that this is an issue for 0.9, but the
> only way to fix it would be to drive forward bug 363455's branch approval.

I do not really agree with the above, especially since this bug has been marked as a 1.8-only bug. Even when Thunderbird 3 is released, people using Thunderbird 2 will still be stuck with it because it is ignored by its maintainers.

Revision history for this message
In , Vijitnair (vijitnair) wrote :

I have been able to reproduce this problem with TB 3.0b4 and Lightning 1.0pre (2009-10-27) nightly. OS - WinXP. I see the exact same symptoms: CPU usage goes to 50% (on a dual core machine). It takes 10-15 minutes for TB to start. Even after this it is very sluggish. CPU usage keeps fluctuating between 50% and 20%.

If I disable Lightning, then TB starts up just fine. I have tried this over and over (back and forth) and am quite positive that Lightning is causing this hang.

I only have one IMAP account configured (without SSL). When the hang happens, there is no data traffic, as the connection to the IMAP server is not available.

No WebDAV/CalDAV account has been configured. No google account has been configured.

Revision history for this message
In , Ssitter (ssitter) wrote :

(In reply to comment #113)
I don't think you are seeing this bug, because you are using neither remote calendars nor a secured mail server. Lightning 1.0pre test builds have a known issue that causes the calendar database to grow uncontrollably. The big database causes a similar slowdown. See Bug 521408 -> Bug 494140.

Revision history for this message
In , Kai Engert (kaie) wrote :

I convinced the drivers to approve the backported patch from bug 363455 to the thunderbird 2 branch. I'll check it in soon and nightly builds will contain the fix.

I'd like to ask everyone experiencing this problem for a favor. Please get the nightly build and test it. The patch manipulates some core communication code, and the decision makers were a bit scared to include this patch on an old stable branch.

So, we need to make sure this change really works. I'm looking forward to your understanding and testing.

I'll make another comment, once the builds are ready, with the link to the test builds.

Thanks.

Revision history for this message
In , Kai Engert (kaie) wrote :

Could you please test one of the builds named
  thunderbird-2.0.0.24pre.* (nightly prerelease builds)
from
ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8/

and let us know how it works for you?

Is this bug solved for you?
Do you see any functional regressions (new bugs) when using POP3/SSL, IMAP/SSL, SMTP/SSL (or TLS)?

Thanks!

Revision history for this message
In , Tmpjjl (tmpjjl) wrote :

The problem is not corrected for me using the build referenced in Comment #116. Installed and started without trouble. Turned on "Check for new messages at startup" on three email accounts. Restarted and Thunderbird hung up using 100% of one processor. Not all calendars had loaded.

Revision history for this message
In , Bugzilla-axel-naumann (bugzilla-axel-naumann) wrote :

Same here: the bug is not fixed by thunderbird-2.0.0.24pre on win32. Enabling lightning 0.9 will make thunderbird 100% busy during startup, disabling it will revert to the expected startup behavior. Please let us know whether we should check for regressions nevertheless, or whether the patch will be reverted.

Revision history for this message
In , Timporter (timporter) wrote :

This has been the bane of my life recently, because my calendar has grown to a substantial size and takes a while to upload any new appointments.
If I edit an appointment too soon after editing another appointment, I get 100% CPU and, worse, a truncated calendar is stored because the DAV upload is aborted!

Interrupting this DAV upload process with any other kind of SSL traffic aborts the transfer and causes the deadlock. Worse, terminating the TB process then empties the remote ICS file.
TB 2.0.23
L 0.9

Revision history for this message
Nedenom (nedenom) wrote :

If you happen to be using the Lightning add-on with a remote calendar over https together with an e-mail account over https it could be this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=390036.

Revision history for this message
gf (gf-interlinks-deactivatedaccount) wrote :

Hi Nick,
Thank you for having taken the time to report a problem with Ubuntu and Thunderbird. We are sorry that we do not always have the capacity to look at all reported bugs in a timely manner.

You made this bug report in 2009 regarding Thunderbird and there have been many changes in Ubuntu since that time. Your problem may have been fixed with some of the updates.

Could you confirm that this is no longer a problem and that we can close the ticket?

Or, if it still might be a problem, it would help us a lot if you could test it on a currently supported Ubuntu version. When you test it and it is still an issue, kindly upload the updated logs by running only once:
apport-collect 386326

and any other logs that are relevant for this particular issue.

G

Changed in thunderbird (Ubuntu):
status: New → Incomplete
Revision history for this message
Paul White (paulw2u) wrote :

Bug report didn't close due to bug watch
No reply to gf's request for information after 6 months
Previously no comments on report since 2010
Initial report fails to mention Ubuntu or Thunderbird versions but clearly both EOL
Marking "Invalid" to close and reduce backlog

Changed in thunderbird (Ubuntu):
status: Incomplete → Invalid