gam_server consumes lots of cpu time

Bug #36581 reported by Martin J. Bligh on 2006-03-25
This bug affects 2 people
Affects          Status      Importance   Assigned to   Milestone
gamin            Won't Fix   Medium
gamin (Fedora)   Won't Fix   Medium
gamin (Ubuntu)               Low          Unassigned

Bug Description

gam_server pegs the cpu and drags the whole laptop to a crawl.

This is easy to reproduce - get a compact flash card full of camera pictures.
Put it in CF-PCMCIA adaptor ($10) and plug it into the PCMCIA slot of a
laptop. Mount the resultant drive (/dev/hda1 for me) and copy the files off.
Now watch top.

That level of cpu usage is unacceptable. You can't remove the gamin package
because gnome depends on it for some inane reason.

Description of problem:
For as-yet undetermined reasons, sometimes gam_server starts using
lots of CPU time (~80% on my Athlon XP 1466 MHz)

Version-Release number of selected component (if applicable):
gamin-0.0.9-1
everything else current rawhide

How reproducible:
Not sure what triggers it; I think this is the second time I've seen
it, but I didn't investigate the first time as I needed to boot into
a new kernel then anyway (yes, I know, I'm a bad tester!)

Additional info:
attaching strace to the process yields these same (I verified with
sort/uniq) two lines over and over in an infinitely repeating loop:

stat64("/home/wes/.kde/share/config/ksmserverrc", {st_dev=makedev(9,
0), st_ino=99361, st_mode=S_IFREG|0600, st_nlink=1, st
_uid=500, st_gid=500, st_blksize=4096, st_blocks=8, st_size=1492,
st_atime=2004/09/09-17:55:06, st_mtime=2004/09/09-16:20:2
0, st_ctime=2004/09/09-16:20:20}) = 0
stat64("/home/wes/.kde/share/config/kioslaverc", {st_dev=makedev(9,
0), st_ino=163182, st_mode=S_IFREG|0600, st_nlink=1, st
_uid=500, st_gid=500, st_blksize=4096, st_blocks=8, st_size=92,
st_atime=2004/09/11-01:37:47, st_mtime=2004/08/25-23:49:21,
 st_ctime=2004/08/25-23:49:21}) = 0

I then sent it a SIGHUP and it went back to normal. Could be
coincidence; does gam_server actually restart on that signal? (I
stupidly didn't think to leave strace attached when I did it)

I will set up gam_server to run with GAM_DEBUG and --notimeout and
all that; hopefully I can catch the behavior again.

Ok, I'm seeing it again. Unfortunately after two days of trying to
catch it in debug mode, I had rebooted the system for updates and
forgot to put gam_server in debug mode again :(

This time it's looping against the following two (different than
before) files:

stat64("/home/wes/.kde/share/config/knotify.eventsrc",
{st_dev=makedev(9, 0), st_ino=7314, st_mode=S_IFREG|0600, st_nlink=1,
st_uid=500, st_gid=500, st_blksize=4096, st_blocks=8, st_size=1085,
st_atime=2004/09/15-19:41:19, st_mtime=2004/08/29-20:53:09,
st_ctime=2004/08/29-20:53:09}) = 0
stat64("/home/wes/.kde/share/config/kpilot_vcalconduitsrc",
{st_dev=makedev(9, 0), st_ino=163235, st_mode=S_IFREG|0600,
st_nlink=1, st_uid=500, st_gid=500, st_blksize=4096, st_blocks=8,
st_size=57, st_atime=2004/08/25-23:49:33,
st_mtime=2004/08/25-23:49:33, st_ctime=2004/08/25-23:49:33}) = 0

I see there is a new glibc in rawhide today, so I'm going to install
it and hope it helps.

I am watching gam_server on rawhide 9-23-2004 system and it is using
up around 50% of the CPU!

Excellent, launch gdb /usr/libexec/gam_server, attach with the
PID of the process, look at what's happening and report. Just knowing
that you are seeing it, or a syscall trace, isn't that useful!

Daniel

I'm seeing it here as well. At most, gam_server is eating as much as
50-70% of the cpu.

http://www.gnome.org/~veillard/gamin/debug.html#Debugging1

  debug the problem and provide a trace. Also make sure you
have the latest version installed. What I said in comment #3
is still valid.

Daniel

Ok, caught it again, this time on gamin-0.0.14-1 (which is current
rawhide AFAIK)

Using your fancy new SIGUSR2 debug trick, I get a quickly growing
file with this line repeated forever:

node_remove_subscription()

It's nice that another SIGUSR2 turns it off again, because it was
threatening to fill my disk :O

Interestingly, now strace shows nothing, nada, nichts, rien. (well
it shows the debug prints if that's enabled). Different problem
causing the same high CPU usage, or just difference due to code
changes you've made?

Latest gdb backtrace, this time with debuginfo installed:

#0 0x00135e42 in __i686.get_pc_thunk.bx ()
from /usr/lib/libglib-2.0.so.0
#1 0x00155944 in g_node_is_ancestor (node=0x8123018,
descendant=0x8058498) at gnode.c:413
#2 0x0804af3a in gam_tree_remove (tree=0x80583c8, node=0x8123018) at
gam_tree.c:144
#3 0x0804b7d3 in remove_directory_subscription (node=0x8123018,
sub=0x811c4e8) at gam_poll.c:507
#4 0x0804cd56 in gam_poll_consume_subscriptions () at gam_poll.c:918
#5 0x0804fc64 in gam_dnotify_consume_subscriptions_real (data=0x0)
at gam_dnotify.c:212
#6 0x0014e848 in g_idle_dispatch (source=0x8129f00,
callback=0x8123018, user_data=0x8058498) at gmain.c:3802
#7 0x0014b4fb in g_main_context_dispatch (context=0x8057ee8) at
gmain.c:1942
#8 0x0014cf82 in g_main_context_iterate (context=0x8057ee8, block=1,
dispatch=1, self=0x8053018) at gmain.c:2573
#9 0x0014d22f in g_main_loop_run (loop=0x8059908) at gmain.c:2777
#10 0x0804aa28 in main (argc=1, argv=0xfefffa54) at gam_server.c:330
#11 0x001b7b03 in __libc_start_main (main=0x804a8f7 <main>, argc=1,
ubp_av=0xfefffa54, init=0x8050304 <__libc_csu_init>,
    fini=0xfefff9e0, rtld_fini=0xfefffa54, stack_end=0xfefffa4c)
at ../sysdeps/generic/libc-start.c:209
#12 0x08049fa1 in _start ()

Stepping through it in ddd/gdb, I notice that in gam_tree_remove, the
g_node_is_ancestor sanity check seems to be consistently failing. To
be specific, in g_node_is_ancestor, descendant->parent seems to
always be null (only data and next are non-null). Somewhere the
trees aren't getting built right, or are being systematically
corrupted...

I'm not resetting gam_server for the moment; email me if you want to
telnet in and gdb it or X ddd out to your host to check it out, since
it seems to be difficult to reproduce...
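
To make the failing sanity check concrete, here is a minimal sketch of an ancestor test that walks parent links, roughly what a check like g_node_is_ancestor has to do. The struct tnode type and the is_ancestor name are hypothetical stand-ins and are not taken from glib or gamin. It shows why a node whose parent pointer is NULL can never pass the check, whatever the rest of the tree looks like.

```c
#include <stddef.h>

/* Hypothetical stand-in for GNode; only the parent link matters here. */
struct tnode {
    struct tnode *parent;
};

/* Returns 1 if node is found on descendant's parent chain, else 0. */
int is_ancestor(const struct tnode *node, const struct tnode *descendant)
{
    if (node == NULL || descendant == NULL)
        return 0;

    while (descendant != NULL) {
        if (descendant->parent == node)
            return 1;
        descendant = descendant->parent;
    }
    return 0;   /* reached a node with no parent without meeting node */
}
```

With a detached node (parent == NULL) the loop body runs once and the function returns 0, which matches the behaviour described in the comment above.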

I'm about to go on the road... this helps, but you should not
wait for me.

  thanks,

Daniel

I again had the same problem too... not clear what triggers it. I was installing
the new kernel rpm and it was taking forever to install... when I did top I
saw gamin taking all the CPU. This has the potential of causing serious
hangs.

I also got this problem using Fedora Core 3 test 3 on a ProLiant
DL 145 (AMD64, Opteron) using x86_64, but here gam_server constantly uses
99.9% CPU. This causes a baseline load of ~1.5, which is very bad.

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 3440 root 25 0 6204 2268 4836 R 99.9 0.0 18:35.44 gam_server

[root@fc3-test ~]# ps aux | grep gam
root 3440 63.2 0.0 6204 2268 ? R 13:40 21:12 /usr/libexec/gam_server
root 5081 0.0 0.0 42308 760 pts/3 S+ 14:14 0:00 grep gam
[root@fc3-test ~]#

[root@fc3-test ~]# rpm -q gamin --qf '%{name}-%{version}-%{release}.%{arch}\n'
gamin-0.0.14-1.i386
gamin-0.0.14-1.x86_64
[root@fc3-test ~]#

This problem really should be solved before the final release. Maybe
this issue should also be marked as a possible blocker?

gamin-0.0.14-1, current rawhide, x86_64 dual opteron.
Output from SIGUSR2 shows it's looping through the following list of
files. Looks like a poll_file loop is stuck. Quick source read
points at gam_poll_scan_directory_internal and the for loop:

  for (l = children; l; l = l->next) {

Poll: poll_file for /usr/share/applications/gnome-accessibility.desktop called at 1097770113 delta 0 : 0
Poll: poll_file /usr/share/applications/gnome-accessibility.desktop unchanged 1097700841 0 : 1097700841 0
Poll: poll_file for /usr/share/applications/redhat-neat-control.desktop called at 1097770113 delta 0 : 0
Poll: poll_file /usr/share/applications/redhat-neat-control.desktop unchanged 1096989180 0 : 1096989180 0
Poll: poll_file for /usr/share/applications/redhat-rhn-up2date-config.desktop called at 1097770113 delta 0 : 0
Poll: poll_file /usr/share/applications/redhat-rhn-up2date-config.desktop unchanged 1095979273 0 : 1095979273 0

And gdb confirms this:

(gdb) bt
#0 0x0000002a95721945 in ?? ()
#1 0x0000000000404701 in poll_file (node=0x523b40) at stat.h:366
#2 0x0000000000404b46 in gam_poll_scan_directory_internal (dir_node=0x0,
    exist_subs=0x0, scan_for_new=1) at gam_poll.c:446
#3 0x0000000000404f33 in gam_poll_scan_callback (data=0x5303d0)
    at gam_poll.c:550
#4 0x00000036cc52942b in ?? ()
(gdb) list gam_poll.c:446
441 }
442 children = gam_tree_get_children(tree, dir_node);
443 for (l = children; l; l = l->next) {
444 node = (GamNode *) l->data;
445
446 fevent = poll_file(node);
447
448 if (gam_node_is_dir(node) &&
449 gam_node_has_flag(node, FLAG_NEW_NODE) &&
450 gam_node_get_subscriptions(node)) {

(gdb) print l
$1 = (GList *) 0x51ea60
(gdb) print l->next
$2 = (GList *) 0x523920
(gdb) print l->next->next
$3 = (GList *) 0x51ea78
(gdb) print l->next->next->next
$4 = (GList *) 0x51ea60
(gdb) p children
$5 = (GList *) 0x53ddd0
(gdb) p *(GamNode *)l->data
$6 = {path = 0x523c10
"/usr/share/applications/redhat-neat-control.desktop", subs = 0x0,
data = 0x530210, data_destroy = 0x404380 <gam_poll_data_destroy>,
flags = 0, node = 0x515f78, is_dir = 0}

This list is not NULL terminated.
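
For anyone wanting to check a suspect list without stepping through it by hand in gdb, a ring of next pointers like the three-element one printed above can be detected with Floyd's two-pointer walk. This is only a sketch: struct item and list_has_cycle are hypothetical names standing in for GList, and none of it comes from the gamin sources.

```c
#include <stddef.h>

/* Hypothetical stand-in for GList; only the next pointer matters here. */
struct item {
    void *data;
    struct item *next;
};

/* Returns 1 if following next pointers from head never reaches NULL. */
int list_has_cycle(const struct item *head)
{
    const struct item *slow = head;   /* advances one cell per step  */
    const struct item *fast = head;   /* advances two cells per step */

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;   /* fast lapped slow, so there is a ring */
    }
    return 0;           /* fast hit the NULL terminator */
}
```

The walk needs no extra memory and finishes in time proportional to the list length, so it is cheap enough to run before every traversal in a debug build.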

Thanks a lot ! This is what I was afraid of. The gam_server is not
multithreaded anymore, so such corruption should not be the result
of unguarded reentrancy. The children list is obtained by
children = gam_tree_get_children(tree, dir_node); which does

    GList *list = NULL;
    [...]
    for (i = 0; i < g_node_n_children(node); i++) {
        list = g_list_prepend(list, NODE_DATA(g_node_nth_child(node, i)));
    }

gam_tree_get_children() cannot loop, it should return a correct list.

The loop in gam_poll_scan_directory_internal() just emits events and
should not modify the list, which is built as a temporary structure;
neither l nor related list data are passed down to the recursive call to
gam_poll_scan_directory_internal().

I'm puzzled that we end up with some corruption there. Reading
g_list_prepend() code I don't see how this could fail. Except running
gam_server under valgrind to try to track a random memory access
error I don't see how to chase this in a deterministic way.
Annoying, very annoying !

Daniel

gam_server goes nuts here too with rawhide.

Yesterday I looked briefly at all the list manipulation areas,
but didn't see anything glaring either. It doesn't seem like random memory
access, however. The l->next pointers are valid, just creating a loop.
I think the directory was being changed while this happened (during
daily rawhide update). Is there any async event which could change
the tree? AFAICT, gam_tree_get_children() expects the parent node to
remain quiescent. You said it's not multithreaded, how about signal
driven? Any way for gam_tree_add() to happen during a
gam_tree_get_children so that the GNode sibling list changes while
building the GList?

The only async event is the dnotify signal. It is handled
by dnotify_signal_handler(), which pushes the file descriptor number
onto a GQueue and does a write to a local pipe. The pipe is hooked
to the mainloop and purely synchronous processing should be done from
there.
There is a comment that GQueue changes are not signal safe and something
else should be used. That's the only uncertainty I can detect in the
code, assuming there is only the main application thread running. The
fact that the problem seems to occur frequently on your fast SMP box
makes me wonder if there isn't something which still generates some
kind of reentrancy.
What puzzles me is that even if the node children list was modified
during gam_tree_get_children() the list might get duplicate or wrong
data pointers, but the l->next pointers should still be correct...

Daniel
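
As an illustration of the signal-safety concern raised above, here is a minimal sketch of a handler that touches no heap-backed queue at all and passes its payload through the pipe itself. This is a different technique from what gamin is described as doing (GQueue push plus pipe write), and the names wake_fds, wake_pipe_install and wake_pipe_drain_one are hypothetical. The only call made in signal context is write(2), which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int wake_fds[2] = { -1, -1 };   /* [0] read end, [1] write end */

/* Signal context: only write(2) is called, and errno is preserved for
 * whatever code the signal interrupted. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char) signo;
    ssize_t ignored = write(wake_fds[1], &byte, 1);

    (void) ignored;
    errno = saved_errno;
}

/* Create the pipe and route signo through it. Returns 0 on success. */
int wake_pipe_install(int signo)
{
    struct sigaction sa;

    if (pipe(wake_fds) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Main-loop side: call when the read end is readable. Returns the
 * signal number that was queued, or -1 on error. */
int wake_pipe_drain_one(void)
{
    unsigned char byte;

    assert(wake_fds[0] >= 0);
    return read(wake_fds[0], &byte, 1) == 1 ? (int) byte : -1;
}
```

In a real daemon the read end would be registered with the main loop, so all list and tree manipulation happens outside signal context. Whether that would have changed anything for this bug is not established in this thread.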

I rechecked the whole code path for
  children = gam_tree_get_children(tree, dir_node);
and how it is walked. I still can't understand how that data, which
is held in local variables of the subroutines, could form a loop or
be modified to that effect.
But to try to make progress I added some trick code detecting a
loop in the children list within gam_poll_scan_directory_internal()
to raise an error and break the loop if this happens.
I released a 0.0.15 version with that workaround
   http://www.gnome.org/~veillard/gamin/sources/
I would be very interested in feedback about this for those who had
troubles with 0.0.14 looping in their environment.
I don't consider the problem fixed though, it is a workaround until
I fully understand the problem.

Daniel

0.0.15 looped for me today. Since I did not have the -debuginfo
package, and yum did not like me, I can not provide further information.

I generated gamin-0.0.16 after fixing a couple of problems including
one in tree handling. I hammered on it seriously and could not reproduce
any kind of problem with it. I would very much appreciate if the
people who managed to get the looping effect could upgrade to 0.0.16
and report if they manage to reproduce the problem again:
   http://www.gnome.org/~veillard/gamin/sources/

  thanks,

Daniel

gamin 0.0.16 still loops

I can confirm this too but for the life of me have no idea what triggers it.
I am running smp kernel with hyperthreading and using kde/kdm as gui.

*** Bug 137439 has been marked as a duplicate of this bug. ***

Created attachment 105898
trace

Created attachment 105899
debug

Comment on attachment 105899
debug

May contain sensitive information, please respect privacy.

Created attachment 105934
another gdb trace

Well, you would need the debuginfo for the gdb trace.
g_pattern_match is called indirectly from poll_file() or
node_add_subscription() or node_remove_subscription().
Your log seems to indicate it is looping in
node_remove_subscription, which again points to an
error looping on a corrupted node list, this time a children
list within a directory...

Daniel

I've got the same problem with FC3 final. I am ripping CDs with
grip, running Kdevelop and listening to noatun when it happens.

I'm running kernel 2.6.9-1.667smp. This is the first time I've seen
it do this and I've been watching processes quite closely because I
had an issue with artsd going nuts. (Sound problem.)

I just killed gam_server in top. I think there were 2 gam_servers
running. I killed the first PID and a second one jumped to the top
of the list briefly. It had a different PID.

Let me know if there is anything I can do to help.

As with comment#26, I have not seen it previously (FC3-RC5)

FC3 final, x86_64 (Sun W1100z), 2.6.9-1.667 (single CPU)

Usage: KDE desktop (kontact, konqueror, kdevelop etc. K3b --ripping
four sets of FC3 CDs).

/usr/libexec/gam_server can eat up to 99% CPU
When switching on debug (kill -s SIGUSR2 pid), I see this:

# tail -f /tmp/gamin_debug_phCf
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
node_remove_subscription(▒(*)

I will watch it closely from now on.

Same thing here.

FC3 final, i686_32, kernel-2.6.9-1.667
Using KDE desktop.

I noticed that since I updated to FC3 final, many of my email
messages end up being duplicated (I am using kmail with maildir
format mailboxes).

Can it be related to problems with gamin ?

This happens to me if I run more than 1 kmail (say, on 2 different
machines using imap). In this case, nothing to do with gamin.

I have the same setup. Could it be KDE related?

I do not run KDE, and I do see this very rarely (not at all during the
last two (three?) weeks).

But if I recall correctly, k3b (which is about the only kde program I
use) liked to trigger it.

0.0.16 fixed it for me, thanks very much

I just released 0.0.17 where I have tried to cope with possible loops
in the second place where gam_tree_get_children() is called, and
also made more changes and checks in that function too.
  http://www.gnome.org/~veillard/gamin/sources/
I would appreciate if people having troubles could try that version
and report !

  thanks,

Daniel

Could you release RPMs in rawhide ?

Thanks,

Philippe

They are built and may show up within a day,

Daniel

I am using GNOME in FC3 Final, and I notice that gam_server is using
99-100% of CPU after I just ran K3B in GNOME.

try 0.0.17 see comment #34,

Daniel

Just a quick "me too."

FC3 release, fully updated as of this post. Running KDE, KMail, 2
instances of Konqueror as file manager, 2 idle command shells and
Firefox 1.0.

Most recent action: some file management stuff (moving them around).
 Also gedit.

I'm not sure where to go to get "rpms in rawhide" (comment #35) but
I'll look for it and install it if I find it.

> I'm not sure where to go to get "rpms in rawhide"
http://fedora.redhat.com/download/updates.html
This page explains the different stages of development and updates
after a release of Fedora Core has gone out:
  - Fedora updates
  - Proposed Fedora (aka testing)
  - Development (aka rawhide)

The 0.0.17 version of gamin is now in Fedora updates
http://download.fedora.redhat.com/pub/fedora/linux/core/updates/3

After upgrading to 0.0.17, I no longer see big hikes in CPU usage
like before.

However, I just noticed that there have also been messages like this
in my syslog for a while:
# grep gam /var/log/messages
Nov 16 20:10:45 foo kernel: gam_server[5241]: segfault at
0000000000000051 rip 00000000004038a7 rsp 0000007fbfffd3a8 error 4
Nov 17 07:16:11 foo kernel: gam_server[5844]: segfault at
0000000000000013 rip 00000000004038a7 rsp 0000007fbfffd3a8 error 4
Nov 17 08:02:32 foo kernel: gam_server[14902]: segfault at
000000000000000a rip 00000000004038a7 rsp 0000007fbfffd278 error 4
Nov 17 11:07:06 foo kernel: gam_server[25002]: segfault at
0000000000000013 rip 00000000004038a7 rsp 0000007fbfffd3a8 error 4
Nov 17 23:03:08 foo kernel: gam_server[4699]: segfault at
000000000000005c rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4
Nov 18 06:58:10 foo kernel: gam_server[3431]: segfault at
000000000000005c rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4
Nov 18 07:03:00 foo kernel: gam_server[3722]: segfault at
0000000000000008 rip 00000000004038a7 rsp 0000007fbfffd2c8 error 4
Nov 18 07:05:08 foo kernel: gam_server[4694]: segfault at
0000000000000050 rip 0000002a9557b920 rsp 0000007fbfffe480 error 4
Nov 19 22:09:06 foo kernel: gam_server[3447]: segfault at
000000000000005c rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4
Nov 19 22:09:26 foo kernel: gam_server[3863]: segfault at
0000000000000013 rip 00000000004038a7 rsp 0000007fbfffd2c8 error 4
Nov 19 22:38:32 foo kernel: gam_server[3935]: segfault at
00000015000003f8 rip 0000002a9557b920 rsp 0000007fbffff6c0 error 4
Nov 19 23:30:51 foo kernel: gam_server[12659]: segfault at
00000006000003f8 rip 0000002a9557b920 rsp 0000007fbffff700 error 4
Nov 20 09:13:34 foo kernel: gam_server[3445]: segfault at
0000000000000061 rip 00000000004038a7 rsp 0000007fbfffd3f8 error 4
Nov 20 20:47:23 foo kernel: gam_server[3411]: segfault at
0000000000000047 rip 0000002a9557b920 rsp 0000007fbfffe470 error 4
Nov 20 22:20:34 foo kernel: gam_server[19964] general protection
rip:4046bb rsp:7fbfffe640 error:0
Nov 20 22:20:45 foo kernel: gam_server[7190]: segfault at
00000060000003f8 rip 0000002a9557b6b1 rsp 0000007fbffff7c0 error 4
Nov 20 22:22:45 foo kernel: gam_server[10136]: segfault at
0000000000000066 rip 0000002a9557c3a4 rsp 0000007fbffff6a0 error 4

These have not gone away with 0.0.17.

I'm using gamin-0.0.17-1.FC3 and found this morning that gam_server
was using 100% of one CPU on my dual-CPU machine. I do not see the
segfaults Philippe posted though.

For crashes or 100% CPU usage on 0.0.17, please follow the instructions
at http://www.gnome.org/~veillard/gamin/debug.html to try to
provide feedback on what is happening.

Daniel

*** Bug 140701 has been marked as a duplicate of this bug. ***

So this is interesting, I have a huge (5K+ files), unorganized directory of
photographs on /mnt/ata0/www-images, and I don't have any .gamin config, so it's
polling since it's in /mnt/*, and the log file viewed after using the SIGUSR2
signal confirms that.

So I open up Konqueror (I'm using KDE) on that directory and see gam_server
using 20% of one CPU, it's constantly polling. Then I open one of the
photos with Kuickshow, and use the page up/page down keys to move back and forth
between images. Now gam_server's using 40% of one CPU. I open another Kuickshow
and repeat the previous step and gam_server's utilization goes to 79%. Another
Kuickshow, another 20% utilization.

Is this just an optimization issue? I'm assuming Konqueror and Kuickshow are
both gamin clients. Could gam_server be polling the same directory once for each
client?

Oops, I meant to say that the utilization goes up 20% for each gamin client, I
did not mean to say that it went from 40% to 79%, it always went up 18-20%.

w.r.t. comment #45 and #46, this is totally unrelated to the current bug,
so please open a new bug report if you want feedback on this !

Daniel

OK, after a few days without crash or 100% CPU usage, it happened
again.

gamin-0.0.17-1.FC3
kernel-2.6.9-1.681_FC3 x86_64

KDE-3.3.1 (compiled from sources)

CPU usage goes through the roof, the box freezes solid (cannot get a console or ssh
into the box, ping responds though), then after 5 minutes it goes back
to normal (at this time, 'top' still shows a load of 26.00).

Post-mortem (post-freezem actually) diagnosis:

1. /var/log/messages
Nov 26 10:43:26 mybox kernel: oom-killer: gfp_mask=0x1d2
Nov 26 10:43:30 mybox kernel: DMA per-cpu:
Nov 26 10:43:30 mybox kernel: cpu 0 hot: low 2, high 6, batch 1
Nov 26 10:43:30 mybox kernel: cpu 0 cold: low 0, high 2, batch 1
Nov 26 10:43:30 mybox kernel: Normal per-cpu:
Nov 26 10:43:30 mybox kernel: cpu 0 hot: low 32, high 96, batch 16
Nov 26 10:43:30 mybox kernel: cpu 0 cold: low 0, high 32, batch 16
Nov 26 10:43:30 mybox kernel: HighMem per-cpu: empty
Nov 26 10:43:30 mybox kernel:
Nov 26 10:43:30 mybox kernel: Free pages: 1516kB (0kB HighMem)
Nov 26 10:43:30 mybox kernel: Active:181 inactive:236594 dirty:0
writeback:235967 unstable:0 free:379 slab:13567 mapped:2424
pagetables:2311
Nov 26 10:43:30 mybox kernel: DMA free:4kB min:12kB low:24kB
high:36kB active:0kB inactive:9788kB present:16384kB
Nov 26 10:43:30 mybox kernel: protections[]: 0 0 0
Nov 26 10:44:06 mybox kernel: Normal free:1512kB min:1004kB
low:2008kB high:3012kB active:724kB inactive:936588kB
present:1031552kB
Nov 26 10:44:52 mybox kernel: protections[]: 0 0 0
Nov 26 10:45:13 mybox gpm[2410]: *** info [mice.c(1766)]:
Nov 26 10:47:03 mybox kernel: HighMem free:0kB min:128kB low:256kB
high:384kB active:0kB inactive:0kB present:0kB
Nov 26 10:47:07 mybox gpm[2410]: imps2: Auto-detected intellimouse
PS/2
Nov 26 10:47:07 mybox kernel: protections[]: 0 0 0
Nov 26 10:47:08 mybox kernel: DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4kB
Nov 26 10:47:08 mybox kernel: Normal: 108*4kB 3*8kB 4*16kB 1*32kB
1*64kB 7*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1512kB
Nov 26 10:47:09 mybox kernel: HighMem: empty
Nov 26 10:47:09 mybox kernel: Swap cache: add 276677, delete 40720,
find 4408/4496, race 0+0
Nov 26 10:47:09 mybox kernel: Out of Memory: Killed process 23735
(gam_server).

FWIW, I customized /etc/sysctl.conf with these lines added:
# Control shared memory size
# (added for PostgreSQL)
kernel.shmall = 134217728
kernel.shmmax = 134217728

# Do not overcommit memory
vm.overcommit_memory = 2

2. in $HOME/.xsession-errors:
gam_poll_scan_directory_internal(/home/user) loop detected
gam_poll_scan_directory_internal(/home/user) loop detected
gam_poll_scan_directory_internal(/home/user) loop detected
...
465 lines like this

Hope this helps

Philippe

Daniel, the SIGUSR2 trick is very nice, but as comment #6 points out,
it is *very* verbose (can fill 100MB in minutes), so it will exhaust
any reasonable partition pretty quickly, and the fact that it writes
in /tmp means bad consequences for the system when this fills up.

So it is not currently usable as a way to track gamin permanently. I
have to turn it on only for short periods of time, and sure enough,
these are not the times when things go bad.

I suggest two improvements:
  1- Have the log directory configurable (defaults to /tmp)
  2- Configure a MAX_SIZE for a log file, after which logs are
rotated, possibly with compressing old ones automatically.

Thanks,

Philippe
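
To make the two suggestions above concrete, here is a rough sketch of a debug log writer that takes its path from the caller and rotates once a size cap is reached. It is only an illustration of the request, not something gamin provides: struct capped_log and the capped_log_* functions are hypothetical, a single ".1" backup is kept, and compression of old logs is left out.

```c
#include <stdio.h>
#include <string.h>

struct capped_log {
    char path[256];       /* current log file, chosen by the caller  */
    char old_path[260];   /* path + ".1", target of each rotation    */
    long max_bytes;       /* rotate before the file grows past this  */
    long written;         /* bytes written to the current file       */
    int rotations;        /* how many times the file was rotated     */
    FILE *fp;
};

/* Returns 0 on success, -1 if the path is too long or cannot be opened. */
int capped_log_open(struct capped_log *log, const char *path, long max_bytes)
{
    if (strlen(path) >= sizeof log->path)
        return -1;
    strcpy(log->path, path);
    snprintf(log->old_path, sizeof log->old_path, "%s.1", log->path);
    log->max_bytes = max_bytes;
    log->written = 0;
    log->rotations = 0;
    log->fp = fopen(log->path, "w");
    return log->fp != NULL ? 0 : -1;
}

/* Appends one line, rotating first if it would exceed the cap. */
int capped_log_write(struct capped_log *log, const char *line)
{
    long len = (long) strlen(line);

    if (log->fp == NULL)
        return -1;
    if (log->written > 0 && log->written + len > log->max_bytes) {
        fclose(log->fp);
        log->fp = NULL;
        if (rename(log->path, log->old_path) != 0)
            return -1;
        log->fp = fopen(log->path, "w");
        if (log->fp == NULL)
            return -1;
        log->written = 0;
        log->rotations++;
    }
    if (fputs(line, log->fp) == EOF)
        return -1;
    log->written += len;
    return 0;
}

void capped_log_close(struct capped_log *log)
{
    if (log->fp != NULL) {
        fclose(log->fp);
        log->fp = NULL;
    }
}
```

With one backup file, disk usage stays bounded at roughly twice max_bytes, which would let debug output stay enabled for long periods without filling /tmp.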

Okay, I have tried to track and change all usage of GList which may
potentially result in the loop we are seeing. Basically the analysis
is that list elements are freed, put back in the free pool, reused, and
then modified through the pointer kept at the location where they were freed.
That's the only explanation I can find for getting a loop in the lists.
As a result I generated a new version with a lot of new cleanups;
maybe this time I got it for good. Version 0.0.18 is available
as usual from the download page
   http://www.gnome.org/~veillard/gamin/downloads.html

w.r.t. comment #49, the goal really is to find the bug, I don't
think gathering days of logs is a good idea :-\ and since it is
a race condition apparently (but how it is single-threaded) adding
the debugging code is likely to just avoid the problem.

Daniel
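
The free-pool scenario described above can be reproduced in miniature. The sketch below uses a tiny static pool with a LIFO free list, which is an assumption about how a pooled list allocator behaves and not a copy of glib's allocator; all names (struct cell, pool_reset, stale_write_demo and so on) are hypothetical. It shows one way a write through a stale pointer to a recycled cell can close a list into a ring.

```c
#include <assert.h>
#include <stddef.h>

struct cell {
    int data;
    struct cell *next;
};

#define POOL_SIZE 8

static struct cell pool[POOL_SIZE];
static struct cell *free_cells = NULL;

/* Thread every pool cell onto the free list. */
void pool_reset(void)
{
    size_t i;

    free_cells = NULL;
    for (i = 0; i < POOL_SIZE; i++) {
        pool[i].data = 0;
        pool[i].next = free_cells;
        free_cells = &pool[i];
    }
}

/* Pop the most recently freed cell, as a LIFO pool would. */
static struct cell *cell_alloc(int data)
{
    struct cell *c = free_cells;

    assert(c != NULL);   /* this sketch never exhausts the pool */
    free_cells = c->next;
    c->data = data;
    c->next = NULL;
    return c;
}

static void cell_free(struct cell *c)
{
    c->next = free_cells;
    free_cells = c;
}

struct cell *list_prepend(struct cell *list, int data)
{
    struct cell *c = cell_alloc(data);

    c->next = list;
    return c;
}

/* Counts cells, giving up with -1 once more than limit have been seen. */
int list_length_bounded(const struct cell *head, int limit)
{
    int n = 0;

    for (; head != NULL; head = head->next) {
        if (++n > limit)
            return -1;
    }
    return n;
}

/* Returns the bounded length of list Y, with or without the stale write. */
int stale_write_demo(int do_stale_write)
{
    struct cell *x, *stale, *y;

    pool_reset();
    x = list_prepend(NULL, 1);    /* list X = [1]                        */
    stale = x;                    /* a second reference to that cell     */
    cell_free(x);                 /* cell goes back to the pool          */

    y = list_prepend(NULL, 10);   /* reuses the freed cell: y == stale   */
    y = list_prepend(y, 20);      /* list Y = [20, 10]                   */

    if (do_stale_write)
        stale->next = y;          /* old owner "appends" Y to its tail   */

    return list_length_bounded(y, POOL_SIZE);
}
```

Without the stale write the length is 2. With it, the cell holding 10 points back at the head and the bounded walk gives up, the same symptom as the gdb session earlier in this report.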

Thanks for the quick response.

> Version 0.0.18 is available
Downloaded, built x86_64 RPM and installed.

Side note: in the changelog of src.rpm, there is no entry for
0.0-17.1

> w.r.t. comment #49, the goal really is to find the bug, I don't
> think gathering days of logs is a good idea :-\ and since it is
> a race condition apparently (but how it is single-threaded) adding
> the debugging code is likely to just avoid the problem

Well, currently gathering *any* data is pretty much impossible, given
how fast it writes in /tmp. User has to turn debug off in a hurry, so
the SIGUSR2 feature becomes sort of useless for users to help you.

Besides, the goal of debug is to find any bug, not only this one I
think.

Cheers,

Philippe

OK, it happens again as I speak

gamin-0.0.18-1 consumes all CPU
FC3 x86_64
kernel-2.6.9-1.681_FC3 x86_64
Using KDE

1. Top
top - 17:55:19 up 5 days, 6:29, 6 users, load average: 1.21, 0.62,
0.24
Tasks: 94 total, 2 running, 92 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.7% us, 1.3% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi,
0.0% si
Mem: 1024700k total, 980336k used, 44364k free, 159164k
buffers
Swap: 1534168k total, 808k used, 1533360k free, 437340k
cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23049 foobar 25 0 6824 2672 5060 R 97.2 0.3 2:52.77
gam_server
 5144 root 15 0 213m 122m 93m S 1.7 12.2 36:01.05 X
19834 foobar 16 0 154m 23m 150m S 0.7 2.3 2:06.04 kdeinit
    1 root 16 0 4736 616 4524 S 0.0 0.1 0:02.16 init
    2 root 34 19 0 0 0 S 0.0 0.0 0:00.39
ksoftirqd/0
    3 root 5 -10 0 0 0 S 0.0 0.0 0:03.10 events/0
    4 root 10 -10 0 0 0 S 0.0 0.0 0:00.00 khelper
    5 root 15 -10 0 0 0 S 0.0 0.0 0:00.00 kacpid
   42 root 5 -10 0 0 0 S 0.0 0.0 0:00.00
kblockd/0

2. kill -SIGUSR2 23049: the debug file is spitting this:
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
Queue Full
769 lines like this, it seems to print them by little groups every
few seconds.

3. Syslog:
$ sudo grep gam /var/log/messages
Nov 29 14:14:29 lw1 kernel: gam_server[3353]: segfault at
00000001000003f8 rip 0000002a95690920 rsp 0000007fbffff6e0 error 4
Dec 1 16:56:22 lw1 kernel: gam_server[3903]: segfault at
00000006000003f8 rip 0000002a956906b1 rsp 0000007fbffff5b0 error 4

Cheers,

Philippe

Queue Full is a report from the signal handler. There are more
than 500 kernel events stacked for processing.
Run gam_server under gdb, possibly started from a vt console
to try to find why there is a segfault or where it is looping.
  http://www.gnome.org/~veillard/gamin/debug.html

I will need a stack trace, this should be possible to find in
your case.

Daniel
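
For readers wondering what a "Queue Full" condition looks like in code, here is a minimal sketch of a fixed-capacity ring buffer of file descriptor numbers that refuses new entries once full. The capacity of 500 is an assumption borrowed from the figure in the comment above, and struct fd_queue, fd_queue_push and fd_queue_pop are hypothetical names, not gamin's actual queue.

```c
#include <stddef.h>

#define QUEUE_CAP 500   /* assumption: taken from the "500" quoted above */

struct fd_queue {
    int slots[QUEUE_CAP];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of entries stored  */
};

/* Returns 1 if stored, 0 if the queue was full and the event dropped. */
int fd_queue_push(struct fd_queue *q, int fd)
{
    if (q->count == QUEUE_CAP)
        return 0;   /* this is where a "Queue Full" report would go */
    q->slots[(q->head + q->count) % QUEUE_CAP] = fd;
    q->count++;
    return 1;
}

/* Returns 1 and stores the oldest fd in *fd, or 0 if the queue is empty. */
int fd_queue_pop(struct fd_queue *q, int *fd)
{
    if (q->count == 0)
        return 0;
    *fd = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 1;
}
```

A steady stream of rejected pushes means the consumer side is not draining, which fits a main loop stuck walking a corrupted list.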

Stack trace.

note: when gamin-0.0.18 was built (from the src.rpm), it was linked
with the copy of glib-2.0 that I compiled from sources together with
my KDE.

$ gdb gam_server 23049
GNU gdb Red Hat Linux (6.1post-1.20040607.43rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "x86_64-redhat-linux-gnu"...gam_server: No
such file or directory.

Attaching to process 23049
Reading symbols from /usr/libexec/gam_server...Reading symbols
from /usr/lib/debug/usr/libexec/gam_server.debug...done.
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
done.
Reading symbols from /opt/kde3.3.1/lib64/libglib-2.0.so.0...done.
Loaded symbols for /opt/kde3.3.1/lib64/libglib-2.0.so.0
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
0x0000002a956905b7 in g_list_last ()
from /opt/kde3.3.1/lib64/libglib-2.0.so.0
(gdb) where
#0 0x0000002a956905b7 in g_list_last ()
from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#1 0x0000002a9569069d in g_list_append ()
from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#2 0x0000000000403fc1 in gam_tree_get_children (tree=0x546000,
root=0x524a40) at gam_tree.c:265
#3 0x00000000004043ba in remove_directory_subscription
(node=0x51b340, sub=0x51d600) at gam_poll.c:559
#4 0x00000000004056b3 in gam_poll_consume_subscriptions () at
gam_poll.c:998
#5 0x0000000000408a73 in gam_dnotify_consume_subscriptions_real
(data=0x546000) at gam_dnotify.c:212
#6 0x0000002a9569423a in g_main_context_dispatch ()
from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#7 0x0000002a95696617 in g_main_context_iterate ()
from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#8 0x0000002a956969aa in g_main_loop_run ()
from /opt/kde3.3.1/lib64/libglib-2.0.so.0
#9 0x00000000004037c2 in main (argc=0, argv=0x0) at gam_server.c:340
#10 0x0000002a958314ca in __libc_start_main ()
from /lib64/tls/libc.so.6
#11 0x0000000000402b3a in _start ()
#12 0x0000007fbffff888 in ?? ()
#13 0x000000000000001c in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000007fbffffad1 in ?? ()
#16 0x0000000000000000 in ?? ()

>note: when gamin-0.0.18 was built (from the src.rpm), it was linked
>with the copy of glib-2.0 that I compiled from sources together with
>my KDE.

Just to clarify, this is glib-2.4.7.

Hummm ... If you recompile stuff by yourself, this may raise problems
that others using the pristine distro will not get. On the other hand
you then have the opportunity to rebuild glib with the following
configure flags which would help finding the exact location of the
corruption:

   --disable-mem-pools --enable-gc-friendly

then run again under gdb or valgrind. The problem comes from a
corrupted memory pool.

Daniel

I'm getting the same backtrace as comment #54, i386 up to date FC3 as of 1 Dec
except for having gamin 0.0.18 installed.

Created attachment 107793
yet another gdb backtrace

I tried the SIGUSR2 trick but absolutely nothing happens. gamin seems to be
sitting stuck yet using CPU...

This also happens in Nahant-1 which has 0.0.9-1. I'm running KDE, have
done little more than read email (evolution then later kmail), report
bugs (epiphany) and do konsole stuff.

The system is an Evectra Pentium III 600EB (so no SMP, no
hyperthreads) 128 MB RAM (so lots of swapping).

There doesn't seem to be a lot for me to add other than the datapoint
RHEL4 is impacted. I was about to trash the system and was scouting
round for valuables when I noticed it was somewhat sluggish.

Note that I have a similar report (looping) on Evolution. When I saw
Evolution was hogging the CPU this was pretty active too, and it may
be that the real problem was Gamin but I blamed E because it was the
most active at the times I checked.

w.r.t. comment #57 and #58

gam_tree_get_children basically does
    list = NULL;

     for all children
         list = g_list_append(list, children_data);

and the stack trace shows the generated list gets corrupted!
The error is somewhere else; this can only be rationally explained
if the list memory pool gets corrupted. As I said in #56,
running a specially compiled glib version is the best way to
reproduce the problem and catch the corruption when it happens rather
than its side effect.
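
To illustrate why a corrupted list makes a traversal spin forever, here is a minimal sketch in Python rather than gamin's C. The `Node` class and the `append` and `has_cycle` helpers are hypothetical names made up for this example, not gamin or GLib code. `append` walks to the tail the way a list append must, and `has_cycle` uses Floyd's tortoise-and-hare check to detect a list whose links loop back on themselves.

```python
from typing import Optional


class Node:
    """Hypothetical singly linked list node, standing in for a list cell."""

    def __init__(self, data: str) -> None:
        self.data = data
        self.next: Optional["Node"] = None


def append(head: Optional[Node], data: str) -> Node:
    """Append to the tail and return the head, like list = append(list, data)."""
    node = Node(data)
    if head is None:
        return node
    tail = head
    while tail.next is not None:  # never terminates if the list is circular
        tail = tail.next
    tail.next = node
    return head


def has_cycle(head: Optional[Node]) -> bool:
    """Floyd's check: True if following next pointers never reaches the end."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```

Running a check like `has_cycle` before walking a list turns a silent 100% CPU spin into a detectable error, which is roughly what attaching a debugger to a spinning process is meant to establish by hand.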

Ouch. I can rebuild glib but I can't constantly run gam_server under
gdb/valgrind. I can attach valgrind/gdb to gam-server after I notice
it's gone loco though.

The setup is that I am administering multi-user machines, so I
only see the aftermath - I don't know what steps are actually causing
this, nor can I force users to run things in a debug mode all the time.

Sitsofe, I assume you are running a normal Fedora Core kernel and
the glib2 also coming from Fedora Core, right ?
Did you reboot the machine after upgrading to 0.0.18 to be sure that
no process used an old gam_server?
I'm just trying to be 100% sure I'm not chasing something related to
inotify or a different release.

Daniel

Daniel, yes I am running a normal FC3 kernel (kernel-2.6.9-1.681_FC3 )
and normal glib2 (glib2-2.4.7-1 ). No I didn't reboot after upgrading
to gamin 0.0.18 but I know it probably isn't an old gam_server
because its start time is Dec02 which is after the RPMs install time
of Tue 30 Nov 2004 15:52:50 .

However if you are unconvinced I suppose I can reboot all the machines
and wait to see if this happens again...

Can people try 0.0.19 that I uploaded at
  http://www.gnome.org/~veillard/gamin/downloads.html
I did yet another pass at checking all GList usage which could
lead to any kind of list pool corruption. I reduced the GList API
used by gam_server to a very minimal set, and I added a copy of the
GList implementation directly in gam_server, disabling memory pools
and poisoning freed list items.
I have been hammering it for a couple of hours, and I'm still
unable to reproduce any crash or loop.
Please try 0.0.19 and report, as I'm running out of ideas about
what is happening or how to solve it,

  thanks,

Daniel

Sure thing. I'm going away for a few days so it will be mid next week
before I get back to you on this. Do you want machines to be rebooted
before reports are submitted back?

My one and only thought on this is are people using any binary
drivers? I don't think I've seen this 100% CPU usage happen (yet) on
a machine without nvidia binary drivers on it...

I have nvidia drivers but the kernel module was built in my machine.

I have seen this on a machine without nvidia drivers. I do have vmware
drivers (occasionally, not always), but I can not remember any
coincidence between vmware being loaded and gam eating CPU time.

Sitsofe, killall gam_server as root after the upgrade would do,
and it's unrelated to kernel drivers.

Daniel

> Can people try 0.0.19
Updated. Now testing.

I noticed that the /usr/libexec/gam_server process is *not* killed
upon exit from my (KDE) graphical session. In other words, logging
in/out repeatedly ends up using the very same process through
different graphical sessions. Shouldn't _all_ user processes started
with a graphical session be killed upon exit ?

I will compile glib with
 --disable-mem-pools --enable-gc-friendly
in the next few days with KDE-3.3.2.

> Hummm ... If you recompile stuff by yourself, this may raise problems
> that others using the pristine distro will not get.
Besides the fact that "pristine" distro users _are_ getting it, this
is _good_ as you noticed because non-standard builds may help
find/troubleshoot more bugs.

And since you mention the word "pristine", let me tell you that if
Redhat/Fedora's implementation of KDE were indeed pristine and not so
crippled (i.e downgraded menus/apps/configs resulting in a poor man's
common denominator with Gnome, inheriting in the process some of its
_bad_ user-interface-guidelines like the infamous
double-click-by-default), you would have more users running your
distro. Many KDE users have gone away from Redhat at the time
Bluecurve and the bright "unified-desktop" idea came out. Since I
always compile KDE from sources, it does not affect me that much but
I am a somewhat rare case of KDE enthusiast on Fedora.

Last but not least, did you consider using the C++ STL to replace
glib in gamin ?

Cheers,

Philippe

> I noticed that the /usr/libexec/gam_server process is *not* killed
> upon exit from my (KDE) graphical session.

  the server exits after 30 seconds without client connection.

> I will compile glib with
> --disable-mem-pools --enable-gc-friendly
> in the next few days with KDE-3.3.2.

  Not needed, 0.0.19 has its own copy of the GLib list code

> pristine and KDE

  Not my business, I don't use it; my point is reproducibility
of reports. I learnt for example that depending on the automake
version something as simple as gamin 0.0.18 gets compiled completely
differently. Pristine means that the bug report is valid for others
using the distro.

> did you consider using the C++ STL to replace glib in gamin

  the client side does *NOT* use glib. The client side of FAM
was using C++ STL, forcing all clients to load the library :-(
that's one of the reasons we rewrote the package altogether.

The server side is a standalone program, based on glib because
    - we know glib well
    - I don't want to code in C++

Daniel

I have not seen a problem since installing 0.0.17 (see comment #39).
Since I have nothing further to report, I am removing myself from the
CC: list.

I've just been running 0.0.19 for about a half-hour, trying to
reproduce this as well as bug 140920. So far it seems to be
dramatically better than 0.0.18. I see at most 8% CPU utilization,
even with many clients.

Hmm, I was able to trigger the 100% (well, 98-88%) CPU usage case
again this morning. Not sure what I did, and I can't get it to happen
again, but I did have to do a killall gam_server to recover. This is
with 0.0.19.

>> I noticed that the /usr/libexec/gam_server process is *not* killed
>> upon exit from my (KDE) graphical session.

> the server exits after 30 seconds without client connection.

Definitely not in my case. When I logout from
KDE, /usr/libexec/gam_server does not exit (I watched it for 10
minutes before killing it). I have verified from a root shell that
the user has no more processes on the machine
(except /usr/libexec/gam_server).

PS: Haven't reproduced the high-CPU usage yet with 0.0.19

Oops, after double checking, I *had* one stale process somewhere
which apparently established contact with gam_server after I logged
out.
Killing it allowed gam_server to exit by itself now.

How about kdm? Are you running it?

> How about kdm? Are you running it?
If the question is to me, the answer is yes (as root of course).

In versions up to and including 0.0.19, I've been able to trigger the
100% CPU usage problem by following these steps: I open a single
Konqueror window on a directory that contains a lot of photographs.
Then I open a photo with Kuickshow (right click image icon-> open with
-> Kuickshow) and use the mouse scrollwheel over the Kuickshow image
to move to the next or previous image in the directory. I usually open
about 10 individual Kuickshow applications and use the mouse wheel to
move between images in each one (not sure this moving between images
is important, but I think it may be.) Finally I close all the images
by selecting "Close All" from the KDE panel (having enabled the "Group
Similar tasks" panel option)

I, and other people at the organization I contribute my time to, have
 performed these same steps with RH9 and FC1 without seeing this problem.

I'm using gdm

Please provide the information about the process state,
gdb stack trace, and fragment of log generated using SIGUSR2 as
pointed out previously if you reproduce this with 0.0.19.

  http://www.gnome.org/~veillard/gamin/debug.html#Debugging1

If you have a reproducible way to trigger this, then switch
gam_server to debugging mode with SIGUSR2 before the problem occurs,
make it hang following your recipe, kill it and provide the debug
output found in /tmp as an attachment to this bug.

  thanks,

Daniel

About comment#80
I cannot trigger the 100% usage by your recipe.
Using KDE-3.3.2 compiled from sources (gcc-3.4.2) on FC3.

I tried on two marchitectures (i386: P4/768MB-RAM and x86_64:
Opteron/1GB-RAM), with a
directory containing 1010 images (all of them ~100kB 600x400 JPEGs):
open >10 kuickshow instances by right-clicking, scroll a bit in
each, close all -> exit fine.
Is this "a lot of photographs" by your standards ?

This has happened only twice after moving to 0.0.19, previously it
would happen consistently. In my case, "a lot" == about 10000 files.
(Don't ask me why they like to 'organize' their photos this way)

I don't think the architecture should matter, but the machine in
question is a dual PIII-800MHz with 1 GB RAM running KDE on a vanilla,
up-to-date, FC3 install.

Created attachment 108323
debug output from /tmp from before the 100% utilization occurred

w.r.t. comment #84

I looked at the logs, you are doing 2 bad things:
  1/ you ask FAM to watch a directory with 10,000+ files in it
  2/ that directory is under /mnt

 1/ means that when gam_server needs to check for modifications it
needs to stat all files in the directory to check for changes, which
amounts to 10,000 stat() calls and checks
 2/ gamin does not use the kernel notification API for directories
which may be temporary mounts like /mnt/... so it uses a 1 second
timeout and rechecks every time for changes.

  the conjunction of 1/ and 2/ means gam_server spends its time
checking your files. It's not really a software loop, not a bug
but how it was designed at the moment.
  It is not the same problem as why this bugzilla entry was opened.
You can probably avoid the problem by removing either 1/ or 2/
but I can't find a fix to your problem, based on the fact that
kernel dnotify must not be used on /mnt/... files and that maintaining
the FAM semantic on a 10,000 entry directory needs to stat all entries
in that directory if the kernel can't tell they were not modified.

   You're pushing the FAM API to the limit of what your computer can
handle, so this doesn't work well...

Daniel
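
The cost of the polling fallback described above can be sketched as follows. This is an illustrative Python model, not gamin's implementation: the `snapshot` and `diff` helpers and the event names are assumptions chosen for the example. The point is that every poll performs one stat() per directory entry, whether or not anything changed.

```python
import os
from typing import Dict, List, Tuple

# name -> (modification time, size); a hypothetical minimal change signature
Snapshot = Dict[str, Tuple[float, int]]


def snapshot(directory: str) -> Snapshot:
    """Stat every entry in `directory`: one stat() call per entry, per poll."""
    result: Snapshot = {}
    for name in os.listdir(directory):
        try:
            st = os.stat(os.path.join(directory, name))
        except FileNotFoundError:
            continue  # entry vanished between listdir() and stat()
        result[name] = (st.st_mtime, st.st_size)
    return result


def diff(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Compare two snapshots and report (event, name) pairs."""
    events: List[Tuple[str, str]] = []
    for name in sorted(new.keys() - old.keys()):
        events.append(("Created", name))
    for name in sorted(old.keys() - new.keys()):
        events.append(("Deleted", name))
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            events.append(("Changed", name))
    return events
```

With a 1 second timeout and a 10,000 entry directory, a loop that calls `snapshot` once per tick issues on the order of 10,000 stat() calls every second even when the directory is idle, which matches the sustained load reported for directories under /mnt.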

Yes, I believe I mentioned the conditions before. However, this
problem didn't exist in FC1, it's only after upgrading to FC3 that
I'm seeing it (all drives mounted under /mnt are exactly the same as
what I was using in FC1, I didn't modify them when I installed FC3.)
Also, the 100% utilization persists after I close the client, which I
think must be a bug.

For some reason I'm now able to reproduce this consistently using just
one Konqueror session (no Kuickviews). I haven't changed any
configuration or updated since upgrading to 0.0.19. I did reboot to
see if that had any effect, but it didn't.

For my part, I can easily disable gamin on /mnt/*, however this
problem seems like a regression from FC1

I've got gam_server taking 100% of the cpu as well.

It looks like it is triggered by a combination of a process that I
wrote that collects data from a machine and puts it in the directory
and accessing the same directory from Konqueror. This is just a guess
on my part as I haven't thoroughly tested it.

The directory now has 17,500 files, each about 250KB in size.

I don't have time to run tests on this, I just thought I'd share that
I've got a similar problem.

I've just seen this problem
maui ~ 1001# uname -a
Linux maui.ee.port.ac.uk 2.6.9-1.667 #1 Tue Nov 2 14:41:25 EST 2004
i686 i686 i386 GNU/Linux
maui ~ 1002# uptime
 20:43:49 up 17 days, 10:26, 3 users, load average: 1.60, 1.51, 1.09
Not seen until today !!!!
Full FC3 install
No help tying it down I'm afraid

I just experienced this on Fedora Core 3 Test 3, following a full "yum
update" (gamin-0.0.17-1.FC3).

Earlier in this bug, dnotify was mentioned. About 2-3 weeks ago I was
experiencing some problems with Courier's IMAP server when running on
very large Maildirs, and my searches led me to some posts that
implied there were some basic deficiencies in dnotify on Linux.
Perhaps there is some issue with these system calls?

Doesn't look like there is any resolution of this issue from reading
the comments above. I have the same problem when dealing with large
amounts of files (up to 100,000). It is pretty reproducible. Note
that I am not trying to view the files themselves, but just look at
directories which contain many files. I understand that, according to
comment 85, using FAM/gamin on a directory with this many files is not
advisable. But, I have seen no comments on how to turn it off. Is
that possible? If so, what are the repercussions?

Steps to reproduce:

1. Unzip archive with 10,000 - 100,000 files in it, into a folder.
2. View the folder (the machine thinks for a while, then reports the
number of files in the folder. Shortly after this gam_server starts
to take 100% of one CPU on a dual Xeon machine).

I am using gamin-0.0.15-1.x86_64 on a RHEL4-B2 machine, and I just had
gamin max out. This is with the x86_64 install on an Athlon64 3000+
Processor.

I don't currently have time to troubleshoot, but as I didn't see
Athlon64 mentioned before, I thought I would add the comment.

I suggest people try 0.0.20 as it has a potential fix for most
corruptions raised so far.
   http://www.gnome.org/~veillard/gamin/downloads.html

Daniel

I don't see any difference between .19 and .20 related to this bug. I
did 'killall gam_server' before testing .20 and verified that I had a
new PID before testing. When I closed the gam client, gam_server
continued to run using 100% of one CPU until I killed it after about
two minutes.

Ok, I (original reporter of this bug) haven't seen this bug in quite a while
now. But now that I think about it... around the time FC3 went final, I
rearranged my drive setup.

1. Everything (including / and /home) had been on a slow RAID 5 array of 5400
RPM IDE drives. When I installed FC3, I did a fresh install on a 10k SCSI drive.

2. I didn't move any of my junk over, just mounted the old array on /slow.
$ ls -R /slow/home/wes | wc -l
31647
$ ls -R /home/wes | wc -l
2516

3. The old install had been continuously hand-upgraded (using rpm, not
anaconda) since rhl8 or so.

Now, looking at comment #85 from Daniel... I wonder if maybe the original bug I
reported is indeed fixed. Now that I think about it, the later occurrences that
I saw (and didn't bother adding to here, because I saw nothing new in them at
the time), while it was still looping, it was looping over a large number of
files, not the small number I saw at first. My slow drive array, coupled with
gamin for some reason using the timer-based rescan instead of dnotify, might
explain it. (But in my comment #6 above I note dnotify in the backtrace, don't
remember whether that was a small or large # of files loop.)

So questions for Daniel:
A) Exactly what logic does gamin use to decide if it can use dnotify or not?
Has that changed at all? (I wonder if something about my setup due to item 3
above caused it to misidentify paths on my home directory and not use dnotify)
B) When gamin starts watching a directory, does it always do so recursively?
(/slow/home/wes only has a few hundred entries itself, it's all the subdirs of
stuff that makes it big)

Happy New Year, everyone. Let's see if we can't get this bug closed before we
hit 100 comments ;-)

I experienced the problem with gamin-0.0.24-1.FC3 in Fedora 3 on an HP compaq
workstation with an Intel Celeron processor. For an unobvious reason gam_server
starts to take 100% of the CPU time until it is killed along with nautilus.

I've just been subject to this bug. I'm using FC3+all updates. rpm installed is
gamin-0.0.25-1.FC3 and I'm on an 64-bit AMD platform. I fixed the problem by
restarting my X server. Killing the process on its own just resulted in it
respawning and, after a short period, using nearly 100% cpu again.

Unfortunately all I thought to do was strace the offending gam_server process.
It was spinning trying to read the file

/usr/local/share/applications/mimeinfo.cache

and a couple of other files from that directory, which don't exist on my system.
Their correct path does not have 'local' therein.

gamin sucks, how the **** do I turn it off??

I have this issue on my system too. gamin 0.0.25 on Fedora Core 3/AMD64.

My workaround is sending a SIGSTOP (killall -19 gam_server), which will freeze
the gam_server process. As soon as the copy process has finished, I send a
SIGCONT (killall -18 gam_server). gam_server then works just as expected,
consuming very little CPU time.
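
The same suspend and resume workaround can be scripted. The sketch below is a minimal Python illustration; the `pause` and `resume` helper names are made up for this example, and finding the PID of gam_server is left to the caller. It uses the symbolic signal names because the numeric values 19 and 18 are not the same on every platform.

```python
import os
import signal


def pause(pid: int) -> None:
    """Suspend a process. SIGSTOP cannot be caught or ignored by the target."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously stopped process continue running."""
    os.kill(pid, signal.SIGCONT)
```

A stopped process keeps its watches and connections, so clients see no events until `resume` is called. This only hides the load for the duration of a bulk copy; it does not address why the load occurs.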

I hope this bug will be found soon. It's rather annoying. :)

I'm running 32-bit Fedora Core 3, with all the latest updates according to
'yum', and was getting the 99% CPU usage by gam_server after I added some
pictures to one directory, and added some symlinks to some scripts in
.gnome2/nautilus-scripts/ which I then used to play with those images. All the
tricks above didn't work (including restarting it several times), but then I
upgraded to 0.0.26-1, restarted, and everything went back to normal.

I got the upgrade from:
http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/Fedora/RPMS/

$ rpm -qf /usr/libexec/gam_server
gamin-0.0.25-1.FC3.i386

I'm seeing the issue described in comment 98. I'll attach the debug output
generated by sending the daemon SIGUSR2.

Created attachment 114906
gamin debug output

The strace output is looping as below:

stat64("/usr/local/share/applications/defaults.list", 0xbfffe70c) = -1 ENOENT
(No such file or directory)
open("/usr/local/share/applications/defaults.list",
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or
directory)
stat64("/home/jorton/.local/share/applications/mimeinfo.cache", 0xbfffe70c) =
-1 ENOENT (No such file or directory)
open("/home/jorton/.local/share/applications/mimeinfo.cache",
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or
directory)
stat64("/home/jorton/.local/share/applications/defaults.list", 0xbfffe70c) = -1
ENOENT (No such file or directory)
open("/home/jorton/.local/share/applications/defaults.list",
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or
directory)
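
To confirm that a trace like the one above really is the same few calls repeating, the lines can be tallied, much like piping through sort and uniq. The following Python sketch is an assumption-laden illustration: the `summarize` helper and its regular expression are made up here, and it expects each strace record on a single unwrapped line.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the syscall name and its first quoted argument, e.g. stat64("/path"
CALL = re.compile(r'^(\w+)\("([^"]*)"')


def summarize(lines: Iterable[str]) -> Counter:
    """Count how often each (syscall, path) pair occurs in strace output."""
    counts: Counter = Counter()
    for line in lines:
        match = CALL.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

A handful of distinct pairs with very large, nearly equal counts suggests a tight loop over a small set of paths, while thousands of distinct paths each seen a few times points at the large-directory polling case instead.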

try to update to 0.0.26 which fixes crashes and CPU consumption, or even
better 0.1.0 which also fixes a bunch of other problems.

Daniel

Do you have a copy of this built for FC3 somewhere?

> Do you have a copy of this built for FC3 somewhere?
There should be an FC3 update if gamin-0.1.0 fixes problems. However, there is
none for the moment. You can just do an rpmbuild from the src.rpm.

For Daniel: I noticed that the following files disappeared from gamin-devel in
version 0.1.0:

/usr/lib64/libfam.la
/usr/lib64/libgamin-1.la

and this breaks things with libtool.
Therefore, upgrading FC3 to 0.1.0 for me is a no-no.

At the risk of being just another "me too", I'm getting this for the first time
after upgrading to FC4 (from 3). gamin-0.1.3-1.FC4.i386.rpm

Not sure what the USR2 trick everyone's on about is; didn't do anything for me.

In a week, I've caught it twice: once I strace'd it cycling through an infinite
loop of a directory's contents; the second time strace gave zero output.

Should this bug still be tagged "devel"?

I am seeing excessive CPU usage often with the Ubuntu packages of 0.1.5
(0.1.5-0ubuntu1) also. gam_server regularly goes fubar and consumes 99% CPU
time. It's highly irritating; it's come to the point that I have a cron job that
kills it every 15 minutes automatically.

Anyway, I tried the SIGUSR2 trick and got the output; is there any other useful
info I should attach here? I can't attach with gdb to the running process since
everything is compiled without the debug option.

Here's the output when debugging is turned on for gam_server,
http://albin.abo.fi/~ninylund/dump/gamin-debug/gamin_debug_Op1icj
http://albin.abo.fi/~ninylund/dump/gamin-debug/gamin_debug_Oszxi0
http://albin.abo.fi/~ninylund/dump/gamin-debug/gamin_debug_dFcUS5

Interesting... Are non Fedora/Red Hat gamin bugs OK here? Nikalus - did you
really reproduce this problem with Red Hat's rawhide or did you just choose that
because there was nothing else?

Oh bummer, I didn't realize that this is Red Hat's bugzilla. There was just a link on gamin's homepage
that took me here.

I guess I didn't use rawhide, since I've never heard of such a thing.

I see this as well, on my one processor Thinkpad laptop. I haven't noticed a
pattern that triggers the excessive CPU consumption, I just occasionally
notice the CPU usage meter in my system tray get pegged at 100% (either all
user time or all system time, the latter is probably gam_server calling poll
in a tight loop).

This bug is "interesting".

Originally there was a bug that caused gamin to really go into a tight 100% cpu
loop, looping over a circular list. However, we now believe that the circular
list bug is fixed, and other reports of this are mainly about gamin using lots of
CPU when polling a large directory or when getting lots of change events from
dnotify (e.g. when downloading something fast).

It would be nice if people seeing this could try to determine what sort of
problem they are seeing. I.e., when this happens, attach to gamin (with debuginfo
installed) and see if it's just spinning over one particular list forever.

What version do you think it was fixed in? Last one I investigated was comment#107

This happened to me in FC4 with gamin-0.1.0-1.1. CPU usage would peg for a
while, then be normal for a while, on the order of 30 to 90 seconds.

I was running an application that reads and writes a couple large files at a
time for several minutes. I was watching the directory in File Browser and
clicked Reload in the browser several times, and deleted a few files at a time,
a few minutes before I noticed the CPU usage. The directory has about 220 files,
and is in the /data partition which is mounted under /. It is a dual processor
system.

When I did kill -SIGUSR2 the CPU usage immediately went back to normal and
remained normal. I'm attaching the beginning of the debug file.

Created attachment 123024
beginning of gam_server log file

from FC4 gamin-0.1.0-1.1

I am seeing this problem with gamin-0.1.7-1.1 (FC5T2 + current rawhide).
I didn't have the debug package installed at the time. I will try to reproduce
and send gdb info.

I am seeing this bug a lot when I use konqueror under Gnome to browse for files
(ie: as a file manager) on my big NFS server. On a 2.4 P4 with 2GB of RAM it'll
go up to 70-90% CPU and stay there for dozens of seconds. It'll do this even
when I'm not doing anything with the folders that Konqueror is browsing.

It definitely is Konqueror triggering it because I don't use KDE apps normally,
and didn't used to, and never had this problem before. I'm not sure if the fact
I'm browsing a 2TB NFS server is exacerbating the problem.

FC3 gamin-0.1.1-3.FC3

This isn't just a FC bug:

http://www.gnusolaris.org/cgi-bin/trac.cgi/ticket/60
http://www.irclogs.ws/freenode/kde/30Oct2005/13.html

I just renamed and killed the gam_server binary and now all my gnome apps are
going nuts using 100% CPU:

27852 trevor 25 0 17664 3616 3120 R 16.7 0.2 0:44.11 gnome-settings-
30113 trevor 22 0 232m 65m 32m R 16.4 3.2 30:49.73 soffice.bin
10130 trevor 25 0 26320 8692 5360 R 14.4 0.4 1:34.74 gnome-panel
27932 trevor 25 0 19260 2484 2152 R 13.8 0.1 0:14.60 gnome-vfs-daemo
10674 trevor 25 0 142m 59m 15m R 13.1 2.9 13:41.91 galeon
27918 trevor 25 0 44808 5988 4832 R 12.8 0.3 1:00.80 nautilus

Seems to be stuck this way. As I kill those apps one by one the others take up
the slack to use up 100% CPU. Guess I have to restore the file.

The other thing, is I swear that over a week or two of an uninterrupted X
session that gam_server goes nuts more frequently. It may just be my perception
though.

Why is this 'file modification detection server' polling directories? This seems
to be suicide if there are large numbers of child nodes... why doesn't it just
listen to events fired by the kernel?

Some filesystems (i.e. NFS) don't fire events.

I saw this problem using 2.6.15-1.1833_FC4, gamin-0.1.1-3.FC4 . I was using
firefox-1.0.7-1.2.fc4 to download a 132 MB file from a server on our local 100
Mb lan. I am in gnome at the time. Here is a snippet from top:
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11222 dacker 16 0 123m 52m 20m R 30.2 5.2 2:27.62 firefox-bin
11067 dacker 16 0 2464 1208 872 R 19.6 0.1 0:01.93 gam_server
11083 dacker 15 0 35116 17m 11m S 6.0 1.7 0:03.45 nautilus

This happens if I save the file to the desktop. If I save it to my home, things
get better:
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11222 dacker 15 0 123m 52m 20m S 30.3 5.2 3:19.01 firefox-bin
11067 dacker 15 0 2464 1208 872 S 10.6 0.1 0:26.35 gam_server

Sitsofe Wheeler (sitsofe) wrote :

Ah this old chestnut. For reference see this bug over in Red Hat's bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=132354

Martin:

Could you follow http://www.gnome.org/~veillard/gamin/debug.html to get a debug log of gam_server when it happens and use Add Attachment to add it to this bug?

Matt Zimmerman (mdz) on 2006-04-12
Changed in gamin:
status: Unconfirmed → Needs Info
Frank Siegert (fsiegert) wrote :

Running Kubuntu Dapper up to date as of 2006-04-13, I see gam_server constantly at 15% CPU usage and echoing a lot of messages into ~/.xsession-errors (more than once a second):

    Failed to process 26 bytes from server
    invalid length 24902
    invalid length 24902
    invalid length 24902
    invalid length 24902
.... many many times ... and then:
    end from FAM server connection
    invalid length 24902
    invalid length 24902
    invalid length 24902
    invalid length 24902
    invalid length 24902
... many many times

Funnily enough, after a reboot I was not able to reproduce it (using a VFAT storage device, browsing pictures, whatever...). But then doing nothing while writing this bug report (except for a look at "less ~/.xsession-errors") suddenly gam_server went mad again. I don't know, if accessing ~/.xsession-errors could cause this problem, sounds too weird.

I followed the debugging instructions and sent gam_server a SIGUSR2, which produced a big file in /tmp, of which I will attach a few hundred lines, which seem to repeat afterwards. Maybe it will help to track down this annoying bug.

Btw... There seem to be a few duplicates of this: #30868, #31329, #3814

These are some lines from the gamin_debug_* file that is created in /tmp when you send a SIGUSR2 to gam_server.

geofs (geof) wrote :

When I use ktorrent (KDE BitTorrent client), gam_server uses up to 25% CPU (Athlon 1200 MHz, 512 MB).

I also have the same .xsession-errors messages

9172 geof 15 0 3280 1948 896 S 21.4 0.4 2:20.12 gam_server
23409 geof 16 0 50784 23m 18m R 9.1 4.7 0:04.48 ktorrent
 5939 root 15 0 62000 43m 5076 S 5.8 8.6 10:22.64 Xorg
 6028 geof 15 0 31972 15m 12m S 1.9 3.1 4:24.30 kded

I can provide the /tmp/gamin_debug_* file on request.
It contains interesting things like this:

inotify recieved 1 events
inotify: moved 1 events to event queue
inotify: got MODIFY on = /home/geof/.xsession-errors
inotify: Emitting Changed on /home/geof/.xsession-errors
inotify: Emitting Changed on /home/geof/.xsession-errors
Event to ktorrent : 1, 1, .xsession-errors Changed
Event to kded [kdeinit] : 194, 1, .xsession-errors Changed
inotify recieved 1 events
gam_incoming_conn_read called
accepted incoming connection: 8
Created connection 8
gam_client_conn_read called
failed to read() from client connection
Shutting down client socket 7

and

inotify recieved 68 events
inotify: moved 69 events to event queue
inotify: got MODIFY on = /home/geof/.xsession-errors
inotify: Emitting Changed on /home/geof/.xsession-errors
inotify: got IGNORED event for unknown wd 414637
inotify: got IGNORED event for unknown wd 414636
inotify: got IGNORED event for unknown wd 414635
(...)

But I can't say what all this stuff means yet.

geofs (geof) wrote :

Not very interesting, but I saw the CPU usage jump to 30% for gam_server. The torrent download is now finished and just seeding one file. Perhaps it's a ktorrent-related bug. By the way, gam_server shouldn't consume that much.

Frank Siegert (fsiegert) on 2006-04-30
Changed in gamin:
status: Needs Info → Confirmed
Daniel Hahler (blueyed) wrote :

I'm also experiencing this problem with Kubuntu-Dapper: it constantly uses about 40% CPU and produces the same .xsession-errors.

I don't know what the cause is, because a reboot would probably fix it again; killing gam_server does not: it gets restarted and produces the same errors.

I'm using a vanilla kernel (2.6.16.11), because of hibernation.

I am seeing this on fc5 - and I don't use NFS at all. It is just
the default(?) LVM2 root with ext3 and boot with ext3, and a few sshfs/fuse
mounted.

Haakon Nilsen (haakonn) wrote :

I too am being plagued by this. Using Kubuntu/Dapper (i386) on my AMD Athlon 64 3700+ system with 2GB ram, gam_server constantly consumes around 11.6% CPU according to top. It doesn't matter what applications are running or not. I tried to switch it to do polling on /* instead, which only tripled the CPU consumption. My .xsession-errors fills up with this:

invalid length 24902
end from FAM server connection
invalid length 24902
invalid length 24902
invalid length 24902
invalid length 24902
end from FAM server connection
invalid length 24902
invalid length 24902
invalid length 24902
...

I really hope this can get fixed in time for Dapper release, because it is quite annoying.

swoke (swoke) wrote :

I got the same problem.

This is quite boring.

I hope it will be fixed soon.

Daniel Hahler (blueyed) wrote :

It seems to be related to some KDE app, at least here. Perhaps KMail or Akregator, because twice already I've just closed some apps and CPU usage went down.

I'll look closer at which app fixes it next time.

Haakon Nilsen (haakonn) wrote :

>It seems to be related to some KDE app, at least here. Perhaps
>KMail or Akregator, because twice already I've just closed
>some apps and CPU usage went down.

I do run such apps as aKregator and amaroK. Closing amaroK did not change anything, but I didn't try aKregator. It's too late to try now, since I simply replaced the gam_server executable with a noop shell script to shut it up :-)

FroZtY (p-tommassen) wrote :

I also have this problem; shutting amaroK down solves it. According to the log files, amaroK is monitoring three paths:

/home/frosty (my home dir)
/etc/security
/etc/samba/smb.conf

The last two paths are monitored by nearly every KDE application, and they don't seem to create any problem. However, amaroK is the only program monitoring my ~, so it seems safe to assume that the problem lies here.

When monitoring the gamin debug output from amarok, it seems to report a change with .xsession-errors every time. This seems correct, as .xsession-errors is changed a lot, but it is changed by gam_server itself, filling it with the familiar messages:

invalid length 24902
 end from FAM server connection
 invalid length 24902
 invalid length 24902
 invalid length 24902
 invalid length 24902
 end from FAM server connection
 invalid length 24902
 invalid length 24902
 invalid length 24902
....

It seems that the solution can be found by stopping those messages from appearing, resulting in less frequent updates to .xsession-errors and thus, amusingly, in lower CPU usage.

I hope this helps anybody in solving this bug.

Greetings.

Sebastien Bacher (seb128) wrote :

that would be an issue in amarok, which should not monitor .xsession-errors; a music player has no need for that

Daniel Hahler (blueyed) wrote :

You are right, of course.

But somehow this seems to point out a design problem: any other app might monitor "~" and cause the same problem.

Sebastien Bacher (seb128) wrote :

so that app would need fixing too; stopping writing to logs because apps might monitor the log is not a reasonable option

Daniel Hahler (blueyed) wrote :

What if gam_server (or whatever gets the I-want-to-monitor-this requests) ignored requests for ".xsession-errors"?

Also, is this the cause of the initial error ("invalid length 24902")?

Martin J. Bligh (mbligh) wrote :

You can't expect to be able to isolate (and make exceptions for) every log file.

Gamin needs to do something sane (ie rate limit itself). Otherwise it will always be flaky as hell.
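
To make "rate limit itself" concrete, here is a minimal Python sketch of one way a monitor could coalesce change events. It is not gamin's code; the class name, its methods, and the one-second default are all made up for illustration. At most one event per path is forwarded per interval, and anything suppressed in between is delivered once when the interval has passed.

```python
import time


class EventCoalescer:
    """Forward at most one change event per path per `min_interval` seconds.

    Hypothetical helper for illustration only; gamin does not expose this class.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_sent = {}   # path -> time the last event was forwarded
        self._pending = set()  # paths with a held-back event still to deliver

    def on_change(self, path):
        """Return True if the event should be forwarded now, False if held back."""
        now = self.clock()
        last = self._last_sent.get(path)
        if last is None or now - last >= self.min_interval:
            self._last_sent[path] = now
            self._pending.discard(path)
            return True
        self._pending.add(path)
        return False

    def flush_due(self):
        """Return the paths whose held-back event may now be delivered."""
        now = self.clock()
        due = [p for p in self._pending
               if now - self._last_sent[p] >= self.min_interval]
        for p in due:
            self._last_sent[p] = now
            self._pending.discard(p)
        return sorted(due)
```

With something like this in front of the client connection, a log file that is rewritten hundreds of times per second would still generate only about one notification per interval, whichever file it happens to be.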

Sebastien Bacher (seb128) wrote :

gamin could use some work, but note that it's not used by Ubuntu on dapper, so it's not a priority for the Ubuntu team at the moment; probably something that could be worked on upstream though

Wolfgang Hoffmann (woho-woho) wrote :

Could you elaborate on that? http://packages.ubuntu.com/dapper/base/ubuntu-desktop lists gamin as required. I installed a kubuntu daily dapper-live-i386.iso of last week and have gamin installed by default, showing the above bug.

I cannot tell what triggers it; I tried to reproduce it but failed. It happens randomly. I have neither KMail nor Akregator running on this system.

Besides the CPU load, filling up .xsession-errors is a serious problem, mine is 114 MB right now.

mvisa (mikko-puolikuu) wrote :

Dapper Release Candidate:

 5274 foo 15 0 2940 1356 752 S 18.6 0.1 5199:33 gam_server

(constant 20% cpu)

-rw------- 1 foo bar 1679968547 2006-05-29 13:11 .xsession-errors

(1.6 GB)

Frank Siegert (fsiegert) wrote :

Seems to me like a major bug: Has a severe impact on a small (?) portion of Ubuntu users. If developers don't consider this as major, I apologize, and you can set it back to normal.

Sebastien Bacher (seb128) wrote :

as written before dapper doesn't use gam_server, but maybe kubuntu still uses it?

FroZtY (p-tommassen) wrote :

The only programs registered to gamin on my computer are KDE programs. While it indeed impacts Kubuntu users the most, I assume Ubuntu users that run some KDE programs are also affected by this.

An ugly workaround is to make .xsession-errors write-protected; it prevents the file from changing, and gam_server doesn't detect it as changing anymore. Of course, this solution is far from desirable if you need the log.

Frank Siegert (fsiegert) wrote :

What do you mean by "dapper doesn't use gamin"? edubuntu-desktop depends on it, as do many KDE apps in Kubuntu (like k3b, which is probably used by a few non-Kubuntu users as well). After all it is in main and many main-packages depend on it.

Is it possible that all these packages can use another file alteration monitor than gamin?

Sebastien Bacher (seb128) wrote :

I mean that gam_server is not used when you do an Ubuntu installation (ie: the one with GNOME), it doesn't mean the bug should not be fixed but that's just not a priority for the Ubuntu desktop team right now

Wolfgang Hoffmann (woho-woho) wrote :

I don't quite understand. Does the Ubuntu desktop team care only about Ubuntu, not Kubuntu?

If you do a standard Kubuntu 6.06 install, you're hit by this bug. This will concern many users, and should be a priority, I think. Otherwise Kubuntu CDs will ship with this bug ...

Sebastien Bacher (seb128) wrote :

what I call "desktop team" is the people working on the GNOME desktop; that issue should be worked on by the kubuntu team if that's a priority for them

Vyacheslav Rodionov (bepcyc) wrote :

This is a very annoying bug for laptop users ;(

I use latest packages of Kubuntu Dapper, but still have this issue.

Now I can't leave my laptop powered on for the whole night ;(

I guess this is purely a gamin bug and not the fault of other programs, because sometimes I am not running any program but gam_server is still active.

I hope that bug will be fixed in the release. I don't really want to switch distros because of such a small bug ;(

This is debug output of gamin on a fresh kubuntu dapper drake flight 7 install:
root@rhee:/tmp# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=6.06
DISTRIB_CODENAME=dapper
DISTRIB_DESCRIPTION="Ubuntu (The Dapper Drake Release) Development Branch"
root@rhee:/tmp# uname -a
Linux rhee 2.6.15-21-386 #1 PREEMPT Fri Apr 21 16:43:33 UTC 2006 i686 GNU/Linux
root@rhee:/tmp#

I'm running into high CPU utilization by gamin, about 30%. KDE definitely seems to want to monitor a lot of stuff, but it really drags down performance.

Fuji (rfujimoto) wrote :

--I mean that gam_server is not used when you do an Ubuntu installation (ie: the one with GNOME), it doesn't mean the bug should not be fixed but that's just not a priority for the Ubuntu desktop team right now

It is in main, this software is supposed to be tested. Can this be reassigned then to whomever is in charge of "admin" or whomever this software is a priority for (Kubuntu devs maybe...)?

I've found I have to kill all konqueror's (or any application that will poll $HOME). When killing konq, remember to search for any preloaded ones.

Rocco Stanzione (trappist) wrote :

I now know why this is happening, at least to me. All my music is on a remote samba share, mounted under /mnt. AmaroK monitors the directory for changes using gamin. Gamin expects things under /mnt to be temporarily mounted, so instead of the MUCH more efficient kernel dnotify method of watching for updates, it actually polls all the files in the mounted directory every second. In my case, it probably can't even list all the files in one second, so it's constantly working as hard as it can, in userspace, to do its thing. I'm going to request (upstream) a compile-time option to specify the directory or directories to poll without dnotify, so we can package it as /media, where we expect our temporary mounts to be (the gamin author, I think, is a redhat guy, and expects them under /mnt). Changing to "needs info" because I wonder how many of the rest of you have a similar setup (directories with lots of files under /mnt) that would allow us to close this bug if we get the new compile-time option and a new package for it.
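
To show why polling a large directory every second is so expensive, here is a minimal Python sketch of what a poll-based monitor has to do on each cycle. It is not gamin's implementation; the function names are invented for illustration. The point is that every cycle costs one stat per file, whether or not anything changed, whereas kernel notification costs nothing while the directory is idle.

```python
import os


def snapshot(directory):
    """Stat every entry in `directory`.

    Returns ({name: (mtime_ns, size)}, number_of_stat_calls).
    """
    state = {}
    calls = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            calls += 1  # one stat per file, on every single cycle
            try:
                st = os.stat(entry.path)
            except OSError:
                continue  # entry vanished between listing and stat
            state[entry.name] = (st.st_mtime_ns, st.st_size)
    return state, calls


def diff(old, new):
    """Return the sorted names created, deleted, or modified between two snapshots."""
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))
```

A music collection with tens of thousands of files on a slow network share means tens of thousands of such calls per cycle, which matches the "can't even list all the files in one second" behaviour described above.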

Changed in gamin:
status: Confirmed → Needs Info
Edu (martinez-bernal) wrote :

I have tons of mp3s on a fat32 partition, but amarok is not the only thing that makes gam_server eat CPU (in my case). It happens apparently for no reason. Of course, running amarok makes gam_server go mad for sure.

Andre Heßling (ahessling) wrote :

I have a similar configuration.
Under /media I have mounted 3 ntfs partitions which have lots of files on them.
My entire music collection is located on one of the ntfs partitions, which is accessed by amarok and therefore apparently by gamin.
I am quite sure that I noticed high CPU usage one time after I copied a (small) file from one of my ntfs partitions to my root partition using Krusader.

Tom Lawton (tlawton) wrote :

Brand new install of Kubuntu
No networked devices, nothing under /mnt at all
Root is xfs, /home is xfs, /boot is ext2 - all partitions on a single HD (400Gb Maxtor)

gam_server is constantly using ~16% CPU and repeatedly writing to .xsession-errors

Failed to process 26 bytes from server
invalid length 24902
invalid length 24902
invalid length 24902
invalid length 24902
invalid length 24902
Failed to process 26 bytes from server
invalid length 24902
invalid length 24902
invalid length 24902
invalid length 24902
invalid length 24902

etc

OR

invalid length 24902
invalid length 24902
invalid length 24902
invalid length 24902
end from FAM server connection
invalid length 24902
invalid length 24902
invalid length 24902
invalid length 24902
end from FAM server connection

This file is growing at a rate of about 1Mbyte every 10 minutes

I think this may be a separate bug to the original 'lots of files under /mnt' bug, but it's the same as a few people have mentioned above.

From my fresh-install kubuntu system with a rapidly-filling ~/.xsession-errors and gam_server on a constant 16% CPU

Gard Spreemann (gspreemann) wrote :

The same thing happens here (relatively fresh Kubuntu Dapper install). It seems to be brought on either by KTorrent, amaroK or both.
I replaced the gam_server executable with an empty shell script, so now my system behaves normally. Of course, various applications complain about a lacking FAM, but other than that, it seems fine.

matt_hargett (matt-use) wrote :

This bug is killing my laptop's battery life, generating heat, and making drive accesses slow. I'm not running amarok, ktorrent, or anything like that. I've got kontact and konq open to a web page and that's it.

 I'll give $50 to whomever fixes it and gets it into the KUbuntu 6.06 main repository.

matt_hargett (matt-use) wrote :

Adding the following lines to the /etc/gamin/gaminrc fixed this problem for me.

none /var/log/*
fsset ext3 notify
fsset ext2 notify

If anyone is using NFS, they'll have to uncomment the config line about NFS in there as well.

jmspeex (jean-marc-valin) wrote :

Until the problem is fixed, I think Ubuntu should include an option to just disable gamin without having to remove the binary. Considering the little benefit it provides (haven't really noticed a difference after removing it), gamin causes far more harm than it helps. Actually disabling it by default would be even better.

See also bug 196444 with regards to constant 4% CPU usage for no reason and
*huge* mem leak. Probably related. Bugs confirmed in FC3, 4 and 5. gam_server
blows.

(In reply to comment #120)
> Some filesystems (i.e. NFS) don't fire events.

Perhaps, but even NFS must call through the kernel, I believe.

Unless there is really some exception to this (even on newer kernels), it seems
like gam_server needs to switch into some kind of a callback mode and notify its
client via those messages... instead of polling directories. Retain the polling
mode for really old kernels, that's fine..

I ran a few tests. Killing all konqueror and any other KDE apps I could find
(there were none) didn't help. gam_server still eats a constant 3.5-4.5% CPU on
my 2.4 P4. I tried killing/restarting gam_server after that and it immediately
starts up again and still eats 3.5-4.5%. I can't easily umount my NFS as it's
used for important server processes.

Lews (dorrin) wrote :

I'm getting this problem on Edgy, running pure GNOME. Adding "fsset ext3 notify" to the bottom of /etc/gamin/gaminrc also fixed it for me. Alacarte and gnome-panel both set this off.

Another possible common thread: another reporter says they have a 2TB fs
(local). I have a 2TB fs (non-local) mounted over NFS. Do other reporters have
large fs's? Also, I run SMP kernel.

Nuno Lucas (ntlucas) wrote :

Same problem as everyone else, but noticed a strange thing on doing a "ps -x".

As shown in the attachment, esd and some evolution programs are running too.

I'm using a fully updated Kubuntu Dapper and never started evolution of my own will, so I find that very strange...

Nuno Lucas (ntlucas) wrote :

Just to add that I never had noticed this before, except today, after installing eclipse (from the eclipse site and installed in my home directory) and "toying" around with it some time.

After closing it (and the CPU settled down because I don't have enough memory nor CPU for eclipse) that's when I noticed the CPU constant usage at 30-40%.

My machine: P4@1.7GHz - 512MB

Nuno Lucas (ntlucas) wrote :

grrr... forgot the gamin_debug file...

It's >3MB uncompressed in just a few seconds, so hope it helps...

arty (spamtrap-paradise) wrote :

This is my case. I'm not sure what my point is really, except that Ubuntu/Kubuntu Dapper is not usable for me because the performance is way too slow.

If I mount an encrypted drive with a large mp3 collection on it, and then start Krusader, CPU usage goes from the 10-20% range to the 95-100% range on a Duron 800.
My root, home, and swap are also encrypted, so there will be some overhead there as well.

I am running Ubuntu with KDE installed (cos the Kubuntu install wouldn't let me partition my drive the way I wanted to...or I wasn't using it properly ...I'm a linux noob)

I am running KDE partly because Gnome was slow and the file browser would crash when I deleted a directory (and other applications crashed), and I liked the look of krusader. I did a new install in case my crashes were self inflicted.

If I select gamin in adept manager to uninstall, and preview the changes, removing gamin will remove the ubuntu desktop along with 11 pages of other apps. There is lots of Gnome stuff there.

Ubuntu was very slow for me even before I added KDE, but I wasn't aware of this bug so can't say for sure that this was the cause....but if I was a betting man...

Same here. Using reiser filesystem for partitions. Laptop with Pentium M processor. I can see that gam_server uses 99.9% CPU when I start amaroK, but after some seconds it drops to a normal 0.4% during playtime. Here comes the weird part: sometimes gam_server uses more than 80% for no reason (e.g. amaroK is in the system tray and not used) and quitting amaroK doesn't help. During this odd behaviour gam_server apparently writes "invalid length <some numbers>" repeatedly into ~/.xsession-errors.

arty (spamtrap-paradise) wrote :

I renamed gam_server to gam_server.disabled and my system is now running well.

I don't know what the consequences of this action are. Someone else said that they had done it and they had had few problems.
..from this page...
http://www.ubuntuforums.org/showthread.php?t=123132

So do it at your own risk.

For me, the choice was to rename gam_server or stop using ubuntu/kubuntu.

we're slowly removing gam_server from kubuntu.

Wolfgang Hoffmann (woho-woho) wrote :

That's good. It's still a problem for me on Dapper.

Just to let you know that, although the bug is pending for a long time now, work on this still is appreciated :-)

I want to add a "me too":
RHEL4 on quad-cpu x86_64 server, I have a 2TB+ volume (over lvm2) as well,
gamin-0.1.1-3.EL4
gamin-0.1.1-3.EL4
I run NFS server, but there are also some directories NFS-mounted to this server.

I see I put myself in the cc-list in May, but honestly I haven't seen the problem
recently (on FC5, still only local LVM volumes < 80GB + sshfs/fuse mounts).
I assume that the bug has either been fixed since, or that I have somehow managed
to work around it - I have switched off nautilus in the session manager, for
example, since I don't use nautilus at all anyway.

I have all my music and other data on a fat32 partition, also mounted under /media, but I think nothing monitors it (I don't know if banshee uses gamin, but I have it closed). The problem starts just after logon.

It's gnome-panel which gets CPU hungry. However, killing gam_server temporarily solves the problem. But whenever I change something with the Menu editor, the panel gets CPU hungry again (and the applications menu flickers).

Another "me too" here.
Two dual-core CPUs, x86_64 workstation running 2.6.9-34 kernel (Red Hat ES 4
Workstation) with 8Gb RAM.
I have a couple of 2+Tb LVM arrays.
gam_server using 85-95% of one CPU.

Not sure when this issue started or what set it off.

Anders Aagaard (aagaande) wrote :

what is gamin being replaced with? Fam?

Using ktorrent and tracker at the same time my cpu usage with gamin stayed low, but it was using over 1gb memory. So I'd love to know :P

Another "me too".
Dual Xeon fileserver running 2.6.9-34.0.1.ELsmp with 4Gb RAM. Attached to FC
SAN (multi-TB). Many other servers NFS-mount and CIFS-mount to this box.
gam_server using 80-95% of one CPU constantly.

Sebastien Bacher (seb128) wrote :

Do you still get the bug with Ubuntu 7.04?

mvisa (mikko-puolikuu) wrote :

Seems not to occur on feisty anymore. .xsession-errors is 70KB. Although there's no gami* process running either.

Well, it does. I recently upgraded my Xubuntu installation to Feisty and yesterday I noticed my gam_server taking 60-99% of the cpu constantly. But kierfayt's workaround seemed to do the trick for now.

me too.
gamin-0.1.1-4.EL4
thunderbird-1.5.0.10-0.1.el4.centos
kernel-2.6.9-42.0.8.EL

i used to see this a lot on a prior system running courier-imap. running dovecot
on this one. wasn't seeing the gam_server hang much recently until recent yum
update when among others thunderbird upgraded (to above) from
thunderbird-1.5.0.9. now i frequently find gam_server eating all available CPU.
 kill it, and it immediately gets respawned. kill thunderbird, and gam_server
quiets down. relaunch thunderbird and things are fine again for a couple days.
 if you want more info let me know what would be helpful.

Changed in gamin:
status: Needs Info → Unconfirmed

This is not a "Medium" problem - it is a very severe problem. In years of
running Linux, I have never seen a package perform like this - it's practically
a virus.

gam_server constantly causes problems, and I must renice it. It seems to have a
terrible interaction with KDE. This has been going on for too long, a package
needs to be created to remove this software or it needs to be fixed!

This is not a medium bug - just do a google search on gam_server!!!!!

i have reduced the effects of this bug by (1) launching a cron job every 15
minutes to renice all gam_server processes to bottom priority, and (2) backing
out thunderbird from 1.5.0.10 to 1.5.0.9, which for whatever reason seems to
encounter the bug far less frequently. but of course, the bug remains.
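
For anyone wanting to try the renice workaround, here is a rough Python sketch of such a cron job. It is not the script used above; the function names are invented. It assumes Python 3 and a kernel that provides /proc/<pid>/comm, which the older kernels mentioned in this report may not. It only mitigates the symptom; the process keeps spinning at low priority.

```python
import os


def pids_named(name, proc_root="/proc"):
    """Return sorted PIDs whose command name (<proc_root>/<pid>/comm) equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited, or its entry is not readable
    return sorted(pids)


def renice_all(name, niceness=19, proc_root="/proc"):
    """Set the nice value of every matching process; return the PIDs changed."""
    changed = []
    for pid in pids_named(name, proc_root):
        try:
            os.setpriority(os.PRIO_PROCESS, pid, niceness)
            changed.append(pid)
        except OSError:
            pass  # not permitted for this process, or it exited meanwhile
    return changed
```

Calling renice_all("gam_server") from a script scheduled every 15 minutes would reproduce step (1). An unprivileged user can only lower the priority of their own processes, so a system-wide job would have to run as root.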

Logged into the gnome desktop with my homedir mounted from NFS, gam_server is clocking up more and more processor time, and is the 3rd highest cpu-time user (behind firefox and Xorg).

It storms the NFS server with file and dir stat() calls, more than ~100, every ~3 seconds...that's getting expensive!
I have no nautilus windows open, except (of course) the background desktop.

Let me know if you'd like any debugging. Distro is Ubuntu Feisty 7.04 and gamin is version 0.1.8-1ubuntu3 with 6 days of uptime.

A quick strace shows:

$ strace -p `pidof gam_server`
poll([{fd=3, events=POLLIN}, {fd=4, events=0}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}, {fd=9, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=8, events=POLLIN}, {fd=11, events=POLLIN}, {fd=4, events=POLLIN}, {fd=14, events=POLLIN}], 12, 1000) = 0
poll([{fd=3, events=POLLIN}, {fd=4, events=0}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}, {fd=9, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=8, events=POLLIN}, {fd=11, events=POLLIN}, {fd=4, events=POLLIN}, {fd=14, events=POLLIN}], 12, 0) = 0
stat("/home/daniel/.config/menus/preferences-merged", 0x7fffcf91eff0) = -1 ENOENT (No such file or directory)
open("/home/daniel/.config/menus/preferences-merged", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOENT (No such file or directory)
<other menu files>
stat("/home/daniel/Desktop", {st_mode=S_IFDIR|0755, st_size=16384, ...}) = 0
open("/home/daniel/Desktop", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 15
fstat(15, {st_mode=S_IFDIR|0755, st_size=16384, ...}) = 0
fcntl(15, F_SETFD, FD_CLOEXEC) = 0
getdents64(15, /* 58 entries */, 32768) = 1960
getdents64(15, /* 0 entries */, 32768) = 0
close(15) = 0
stat("/home/daniel/Desktop/diff", {st_mode=S_IFREG|0664, st_size=107008, ...}) = 0
<many stat calls>
stat("/home/daniel", {st_mode=S_IFDIR|0755, st_size=229376, ...}) = 0
open("/home/daniel", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 15
fstat(15, {st_mode=S_IFDIR|0755, st_size=229376, ...}) = 0
fcntl(15, F_SETFD, FD_CLOEXEC) = 0
getdents64(15, /* 107 entries */, 32768) = 3688
getdents64(15, /* 0 entries */, 32768) = 0
close(15) = 0
stat("/home/daniel/.metacity", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat("/home/daniel/.profile", {st_mode=S_IFREG|0664, st_size=850, ...}) = 0
<many many stat calls>
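
A trace like this gets long quickly, so it can help to count which paths are being hit most often (the original report did the same by hand with sort/uniq). Below is a small Python sketch for that; the function name and regular expression are my own, and it only understands calls whose first argument is a quoted path, as in the stat and open lines above.

```python
import re
from collections import Counter

# Matches lines such as: stat("/home/daniel/.profile", {...}) = 0
CALL_WITH_PATH = re.compile(r'^(\w+)\("([^"]*)"')


def summarize(lines):
    """Count (syscall, path) pairs in strace output.

    Lines whose first argument is not a quoted path (poll, fstat, close, ...)
    are skipped.
    """
    counts = Counter()
    for line in lines:
        m = CALL_WITH_PATH.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts
```

Saving the trace to a file with strace's -o option and passing the open file to summarize() gives a Counter; its most_common(20) shows the 20 busiest paths, which should make it obvious which directories gam_server is polling.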

Ivo Jimenez (ivotron) wrote :

I bought an enclosure for a 200 GB HDD and same thing happened. The workaround from klerfayt solved it.

Martin J. Bligh (mbligh) wrote :

This is still broken. Closing the bug doesn't fix it.

Paul Hoell (hoellp) wrote :

Since today I have this issue on Ubuntu gutsy.
I think about deactivating it, but if needed, I could get some info before. If you need anything, mail me.

Gamin is up to version 0.1.9 now in F8 and rawhide; F7 has 0.1.8, and even RHEL4
has been updated as far as 0.1.7. Older distro releases are EOL/in maintenance
support at best (i.e. go grab an SRPM and update it yourself) There have been
no new reports added to this bug in half a year. I personally haven't seen it
in *years*. Is anyone experiencing this with a semi-current version of gamin?
Or can we finally close this one?

We have just begun upgrading our compute farm to RHEL4 and we are seeing this
problem.

We have a fairly complex NFS setup with several netapps volumes.

I'm not the sysadmin, so I don't have root access.

To give a little insight, we have set up a single machine with freenx and are
using it for our local site session server. I am only paying attention to this
machine right now, but I know that other RHEL4 machines have had issues prior to
this when users were starting VNC sessions before the freenx transition.

We did not have this problem with RHEL3.

lngl0116:/home/kbingham-> rpm -qf /usr/libexec/gam_server
gamin-0.1.7-1.2.EL4
gamin-0.1.7-1.2.EL4

Here are the contents of the gaminrc file:
lngl0116:/home/kbingham-> more /etc/gamin/gaminrc
# configuration for gamin
# Can be used to override the default behaviour.
# notify filepath(s) : indicate to use kernel notification
# poll filepath(s) : indicate to use polling instead
# fsset fsname method poll_limit : indicate what method of notification for the filesystem
# kernel - use the kernel for notification
# poll - use polling for notification
# none - don't use any notification
#
# the poll_limit is the number of seconds
# that must pass before a resource is polled again.
# It is optional, and if it is not present the previous
# value will be used or the default.

fsset nfs poll 10 # use polling on nfs mounts and poll once every 10 seconds

Not all users are seeing this run out of control:

lngl0116:/home/kbingham-> ps -eaf | grep gam_server
ssirun 1096 1 0 2007 ? 00:05:31 /usr/libexec/gam_server
hsales 2055 1 0 Jan02 ? 00:00:56 /usr/libexec/gam_server
szanatta 2882 1 0 Jan03 ? 00:00:11 /usr/libexec/gam_server
dreed 3710 1 0 Jan03 ? 00:00:22 /usr/libexec/gam_server
jkoller 3801 1 0 2007 ? 00:00:12 /usr/libexec/gam_server
jlawson 6022 1 0 Jan04 ? 00:01:40 /usr/libexec/gam_server
wstrickl 8332 1 36 Jan05 ? 11:33:38 /usr/libexec/gam_server
nphillip 9248 1 0 2007 ? 00:00:41 /usr/libexec/gam_server
nmysore 13140 1 55 Jan04 ? 1-04:11:33 /usr/libexec/gam_server
rkhan 13352 1 0 2007 ? 00:01:54 /usr/libexec/gam_server
bonfanti 23065 1 0 Jan03 ? 00:00:21 /usr/libexec/gam_server
bcruiksh 24066 1 0 Jan03 ? 00:00:08 /usr/libexec/gam_server
mbarnes 24673 1 0 Jan02 ? 00:00:20 /usr/libexec/gam_server
bgreiner 24736 1 54 Jan04 ? 1-01:59:05 /usr/libexec/gam_server
kbingham 25283 1 0 Jan05 ? 00:00:01 /usr/libexec/gam_server
kbingham 25419 21283 0 15:28 pts/4 00:00:00 grep gam_server
mfalkinb 26903 1 0 Jan05 ? 00:00:01 /usr/libexec/gam_server
lphillip 27510 1 0 Jan04 ? 00:00:38 /usr/libexec/gam_server
jkeefer 29207 1 48 Jan05 ? 18:10:39 /usr/libexec/gam_server

Any suggestions?

UPDATE to my comment #135:

We did not see this previously:

http://kbase.redhat.com/faq/FAQ_85_11914.shtm

Our sysadmin is doing the upgrade to gamin-0.1.7-1.4.EL and we will see if we
have any additional issues.

2.6.9-42.0.10.ELsmp #1 SMP Tue Feb 27 09:40:21 EST 2007 x86_64 x86_64 x86_64
GNU/Linux

Sorry, Wes - I'm running 0.1.9 on a production server and experience the runaway
problem.

gam_server will behave for a few hours - sometimes a few days.

I wrote a custom daemon that uses the fam-2.7.0 library. gam_server is, of
course, required by fam.

As a temporary solution a cron job now stops my daemon. Doing this is not
enough, however. I still have to kill gam_server, and then restart my daemon.
(I'm a little worried about the effect of killing gam_server in the midst of
some operation).

Is there a better alternative? My daemon monitors a few directories and
triggers actions when files appear. gam runs in polling mode because I don't
want to re-build the kernel on the production box. Maybe this isn't a problem
if it runs from the kernel?

I've tried the config file trick:

/etc/gamin/gaminrc:

fsset ext3 poll 5

Still, no joy.

Changed in gamin:
status: Unknown → Fix Committed
Changed in gamin:
status: New → Triaged
Changed in gamin:
status: Unknown → New

Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Sebastien Bacher (seb128) wrote :

the bug has had no activity for a year and no new duplicates, so it's not a high-importance one

Changed in gamin (Ubuntu):
importance: High → Low

This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Changed in gamin (Fedora):
status: Fix Committed → Won't Fix
Changed in gamin:
importance: Unknown → Medium
Changed in gamin (Fedora):
importance: Unknown → Medium
Changed in gamin:
status: New → Won't Fix