absence of network connection causes high CPU usage

Bug #182923 reported by Jean-Yves Lefort
This bug affects 3 people
Affects            Status  Importance  Assigned to  Milestone
Mail Notification  New     Undecided   Unassigned

Bug Description

See https://savannah.nongnu.org/bugs/?18367 for the original report in the old Mail Notification bug tracker.

Revision history for this message
Jason Heeris (detly) wrote :

Using Debian Etch, kernel is custom 2.6.22.

As per the suggestion on the old tracker, I compiled "mail-notification" (v 4.1.dfsg.1-2) with debugging symbols and profiled it. The results are attached. Let me know if you need any other info.

Revision history for this message
Jean-Yves Lefort (jylefort) wrote :

Thanks for the info, but unfortunately it does not shed more light on the problem. By the way, which mailboxes are you monitoring?

Revision history for this message
Jason Heeris (detly) wrote :

I'm monitoring two GMail, one IMAP, one POP.

Is there an additional level of debugging info that this can be compiled with? Since I'm compiling it all from (debian) source anyway, you're welcome to send me any patches that might be useful purely for debugging.

Revision history for this message
Jean-Yves Lefort (jylefort) wrote :

MN prints what it is doing when it is run with the -i option. Maybe you could attach that output. Other things to check:

- Is the program still responsive despite the CPU usage? Can you interact with the properties dialog?
- Is the problem triggered by a particular mailbox type? Does it also happen if you remove all the mailboxes?

Revision history for this message
Jason Heeris (detly) wrote :

To answer: the program is still responsive, I can open the preferences dialog and recheck mail by clicking on the blinking icon (depending on the settings). Oddly enough, I can only replicate the problem when I have three different types of mailboxes to check: POP3, IMAP and GMail. If there are only two, or, say a POP3 and two GMail, the CPU usage remains normal.

I've attached the output from using "-i" (I've altered my email addresses, though, for privacy).

Revision history for this message
Jens Rantil (jens-rantil) wrote :

Hi, I just wanted to say that I am also experiencing the same problem.

I experienced the problem with the mail-notification package that comes with Ubuntu (4.1.dfsg.1-2ubuntu1) and therefore downloaded the latest stable source, but the problem persists.

My system is:
-------------
$ mail-notification --version
Mail Notification version 5.0
Copyright (C) 2003-2008 Jean-Yves Lefort.

Mailbox backends:
  Evolution yes
  Gmail yes
  IMAP yes
  Maildir yes
  mbox yes
  MH yes
  Mozilla products yes
  POP3 yes
  Sylpheed yes
  Windows Live Hotmail yes
  Yahoo! Mail yes

POP3 and IMAP features:
  IPv6 yes
  SASL no
  SSL/TLS yes

$ uname -a
Linux supraflex 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux
-------------
Linux distro: Ubuntu 7.10 'Gutsy Gibbon'.

The program is still responsive, and I'm using two POP3-mailboxes.

When I boot the system I usually kill the process and start it when I have an internet connection and it works fine as long as I have a connection.

I have not tried to remove the mailboxes.

Revision history for this message
cl (flipper98-deactivatedaccount) wrote :

same problem here, 3 pop accounts monitoring every 15 mins.
i'm on gutsy too and compiled mn 5.0 myself.

Revision history for this message
Jens Rantil (jens-rantil) wrote :

I'm attaching a GDB-backtrace here.

Revision history for this message
Jean-Yves Lefort (jylefort) wrote :

This backtrace is not useful since it lacks locations. To generate a useful backtrace, you need to build MN with debugging symbols:

  ./jb configure cflags=-g sysconfdir=/etc
  ./jb build
  sudo ./jb install

Then repeat the gdb procedure. Note that to obtain the backtrace of each thread, it is easier to type:

  threads apply all bt

than to manually switch to each thread like you did.

By the way, the above commands are for MN 5.1.

Revision history for this message
Jens Rantil (jens-rantil) wrote :

Ah. Thanks Jean-Yves.

I'm uploading a new backtrace.

The backtrace still does not show very much info about mail-notification. Am I interpreting the results correctly that the bug might lie within some library other than mail-notification?

PS. You had a typo in your gdb command above. It should be:
     thread apply all bt

Revision history for this message
Jean-Yves Lefort (jylefort) wrote :

Yes, it could be a possibility. Maybe you can try to install the GNOME debugging symbols and get a couple of backtraces. Assuming you run Ubuntu:

  sudo apt-get install gnome-dbg
  export LD_LIBRARY_PATH=/usr/lib/debug:/usr/lib/debug/lib:/usr/lib/debug/usr/lib
  gdb mail-notification
  run
  <when it starts hogging the CPU, hit Control-C>
  thread apply all bt
  cont
  <wait a few seconds>
  thread apply all bt
  cont
  <wait a few seconds>
  thread apply all bt
  cont
  ...

Revision history for this message
Luciano Cavalheiro (lucc) wrote :

Hi,

here is my 1 cent contribution: for me the problem seems to be connected with hibernation. I've been using mail-notification for quite a while (about 2 years) along with kernel hibernation support, and the problem seems to be recurrent. Moreover, it often happens after returning from a long hibernation period.

After googling a little, I've found this post that seems connected to the mail-notification problem:
http://vvaradhan.blogspot.com/2007/04/eds-hibernate-100-cpu.html

My guess is that there is a problem with gdk_threads_add_timeout(), which possibly builds on g_timeout_add() from glib, and hibernation. Specifically, the callback re-scheduling code in glib may be getting lost due to the clock skew caused by hibernating.

Hope this helps somehow.

[]'s

          Lucc

Revision history for this message
Jean-Yves Lefort (jylefort) wrote :

Interesting. While it might not explain why people experience the problem on startup, this guess seems compatible with the various backtraces and profiling information thus far received. I will investigate.

Revision history for this message
Jens Rantil (jens-rantil) wrote :

Yes, I've only experienced the problem at startup. Since I never hibernate, I cannot speak to that.

I have some more information: the CPU usage seemed to start even though I had an internet connection at the time. When I tried to reproduce the bug the other day I disconnected my internet, restarted mail-notification and the bug did NOT show up. It seems that this bug is not related to whether I actually have a functional internet connection or not. Am I experiencing another bug?

Another thing: there actually seems to be a timeout to the CPU usage for me. I did not time it, but after maybe ~10 minutes the CPU usage goes down to normal.

Jean-Yves: I followed your instructions to get the gnome debug symbols, but gdb is still unwilling to give any symbols for gnome packages. I've paused the program multiple times and the backtrace given seems to be the same each time.

Revision history for this message
Luciano Cavalheiro (lucc) wrote :

Another +/- reasonable possibility would be related to a race condition in the gdk lock implementation as follows:

According to the gdk documentation, calls to timeout callbacks are done within the gdk lock.
However, the timeout mailbox checker of mail-notification may be holding the gdk lock for too long; e.g. in the absence of a connection, it may take a long time to fail to resolve DNS queries.
I didn't check how the gdk main loop synchronization is implemented; however, if it does some level of busy-waiting, that would yield the observed behavior.

This would be consistent with the problems both at start-up and when returning from hibernation.

I've tried a gdb session and the threads were either in g_main_context_query() or in a pthread mutex lock.
I have not been able to reproduce the bug since then, so I can't attach the backtrace.

Revision history for this message
Jean-Yves Lefort (jylefort) wrote :

No. MN never holds the GDK lock for a long time, nor while performing a blocking operation (such as name resolution). If it did, the UI would block, which is something the MN multi-threaded design is meant to avoid.

And even if some bug caused a thread to wait for the GDK lock for longer than intended, it would not cause high CPU usage since the GDK lock is a sleep lock, not a spin lock.

Revision history for this message
soro2005 (soro) wrote :

I also get the problem, and I mostly get it after suspend. Not always, but often. As far as I remember, I've always had it since I first started to use MN 1.5 years ago, and I've had it on all the Ubuntu versions I've installed over that time. I now also have it on a new computer with a fresh Ubuntu 8.04 64bit installation. I currently have 4 IMAP, 1 mbox, and 1 Gmail account monitored by it.

My impression is that the high CPU usage occurs NOT when the network connection is unequivocally down, but while network manager is acquiring a connection, or in one of these in-between states where there is a connection, but it is stale. In my university, for instance, the wireless network is open, but you only get out through a vpn server. If I connect to the network but not to the vpn, that's a pretty sure bet to get MN to race I believe. I don't have it close, though, to try it out. Or some routers which let you connect but don't serve you because of an access list restriction.

Thanks a lot, Jean-Yves, for this very useful program!

Revision history for this message
Mike Auty (mike-auty) wrote :

I can recreate this pretty reliably. I have mail-notification-5.4 with NetworkManager installed but not active (and not present on either the system or session bus). It's checking just one IMAP server, but that's not very important since this happens when there's no network connection (so no traffic's sent or received). If you remove the network connection and start mail-notification (-i) it will flash the error image. Right clicking and asking it to update now will trigger 100% CPU usage (on one core) and it'll just sit there like that.

I've attached the back traces from that, and I'm happy to provide more if they're not right, or it'll help solve this problem. I'm also experiencing the random crashes from bug 248125, and trying to isolate the circumstances that it occurs in. For both of these bugs I haven't found a source code repository or any experimental patches or anything? If there are any patches you'd like trying out, or if there's a repository somewhere to test out the latest versions and see if these problems have gone away, just say. I'm certainly happy to try out any testing or experimental patches that'll help get mail-notification rock solid. Thanks...

Revision history for this message
deli.ds (deli.ds) wrote :

replication on Jaunty with Mail Notification version 5.4 is as simple as Mike says. has there been any progress on this since Feb?

Revision history for this message
Richard Schwarting (aquarichy) wrote :

SUMMARY:
 In mn_client_session_run(), skipping close(session.s) when session.s was never set (memset has merely initialised it to 0) stops mail-notification from hogging my CPU when offline, according to top.

DETAILS:
I got tired of having this interrupt watching movies (my laptop is only fast enough to handle mplayer or mail-notification hogging the CPU, not both :D), so using ltrace, gdb, and sysprof, I decided that the problem was Evil and Trying to Hide.

So, after instrumenting the code with a few hundred mn_info() statements and a couple sleep(10)s (to find out when the CPU went crazy), and repeatedly trying Mike's reproduction instructions, I found a point at which the CPU starts going 100%.

In mn-client-session.c, in mn_client_session_run(), session is zeroed with memset, so its field session.s gets indirectly initialised to 0.

session.s might get set by client_session_connect(), if we get that far. When my network is disconnected, and this code is run, I don't get past the check after "addrinfo = resolve(&session);", though, so session.s isn't set, and I go to the end label.

The first thing done at the end: label is a check to see whether session.s >= 0. It will be, since memset indirectly initialised it to 0, so we call mn_close(session.s).

Now, it doesn't seem that we remain in a loop inside mn_close at all, but after calling close(fd), my CPU goes to 100%. That is, if in mn_close I do:
* sleep(10);
* do status = close(fd); while (status < 0 && errno == EINTR);
* // now the CPU goes up to 100%: not during the first sleep period, but visibly during the second, after close()
* sleep(10);

My change was to explicitly initialise session.s to -1 in mn_client_session_run(), so that we wouldn't attempt to close it with the value 0 if we never actually set it to a file.
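
For anyone following along, here is a minimal sketch of that change in isolation. It is not the MN source: struct demo_session, demo_session_init(), demo_session_cleanup() and the closed_something flag are hypothetical names made up for illustration, and only the field name s and the ">= 0" check at cleanup come from the description above. The point is just that memset leaves s at 0, which is a valid descriptor number (normally standard input), so a sentinel of -1 is needed to mean "never opened".

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, trimmed-down stand-in for MN's session structure;
 * only the socket field `s` is taken from the report. */
struct demo_session {
    int s;                /* socket descriptor, or -1 when not connected */
    int closed_something; /* illustration only: did cleanup call close()? */
};

/* Zero the structure as the report describes, then mark the socket as
 * "not open". Without the second step, s would stay 0, which passes
 * the ">= 0" check in the cleanup code. */
void demo_session_init(struct demo_session *session)
{
    assert(session != NULL);
    memset(session, 0, sizeof(*session));
    session->s = -1;
}

/* Mirrors the check described at the end: label, closing only a
 * descriptor that was really opened. The retry loop is the one quoted
 * from mn_close in this comment. */
void demo_session_cleanup(struct demo_session *session)
{
    assert(session != NULL);
    if (session->s >= 0) {
        int status;

        do
            status = close(session->s);
        while (status < 0 && errno == EINTR);

        session->s = -1;
        session->closed_something = 1;
    }
}
```

With the sentinel in place, an early jump to the cleanup code (for example when name resolution fails) leaves descriptor 0 alone, while a session that did connect is still closed as before.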

I'm not yet sure why the CPU would go to 100%. I'll poke into that now.

Revision history for this message
Richard Schwarting (aquarichy) wrote :

After almost 24 hours of running mail-notification, with sleeping and disconnecting, I haven't seen 100% CPU usage again, so I'm attaching a lame patch which might be totally wrong for setting session.s to -1. Ah well.
