High load average using an 8.04a4 NIS/Autofs setup copied from 7.10

Bug #190737 reported by robogeek
14
Affects | Status | Importance | Assigned to | Milestone
Sun Java | Invalid | Undecided | Unassigned |
Tracker | Fix Released | Medium | |
autofs (Ubuntu) | Confirmed | Low | Unassigned |
tracker (Ubuntu) | Fix Released | Undecided | Unassigned |

Bug Description

I have a NIS/Autofs setup which works on 7.10. I require NIS/Autofs to access filesystems in Sun's internal network. With this setup, as soon as I cause automounted file systems to be mounted, the system quickly goes into a high load average and becomes extremely unresponsive. The System Monitor app says the CPU is pegged at 100%, memory use stays modest rather than climbing, and "top" reports the hald and automount processes using most of the CPU.

/etc/defaultdomain: Has the appropriate NIS domain name

/etc/yp.conf: Has no config

/etc/auto.master has:
--------------------------
/misc /etc/auto.misc --timeout=60
#/smb /etc/auto.smb
#/misc /etc/auto.misc
/net /etc/auto.net
+auto.master
--------------------------
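For readers unfamiliar with autofs: each non-comment line of /etc/auto.master names a mount point, the map that serves it, and optional mount options, while a line starting with `+` pulls in the map of that name from the name service (here NIS). A minimal Python sketch of that reading, assuming the simple one-entry-per-line format shown above (it ignores line continuations and the rest of the autofs map grammar), lists the autofs roots that anything walking the mount table would see. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MasterEntry:
    mount_point: Optional[str]  # None for "+map" include lines
    map_name: str
    options: List[str]


def parse_auto_master(text: str) -> List[MasterEntry]:
    """Parse simple one-line auto.master entries (a sketch, not the full autofs grammar)."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out entry
        if line.startswith("+"):
            # "+auto.master" includes the master map from NIS (or another nsswitch source)
            entries.append(MasterEntry(None, line[1:], []))
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: no map given
        entries.append(MasterEntry(fields[0], fields[1], fields[2:]))
    return entries
```

With the file above this yields /misc (with `--timeout=60`), /net, and an include of the NIS auto.master, which is where most of the mount points in this report come from.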

Martin Pitt (pitti) wrote :

Disclaimer: I have never used autofs before, so sorry if this sounds dumb. Can you please give a complete description of your setup, so that we can replicate it locally and debug it?

Alternatively (if you prefer that), can you get the output of strace -p <pid of wild automount process>?
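When several automount processes are running, one rough way to pick the "wild" one is to compare accumulated CPU time. The Python sketch below does that by reading /proc/<pid>/stat, which is Linux-specific; the helper names are hypothetical, and total CPU time since process start is only a proxy for "spinning right now".

```python
import os
from typing import Optional, Tuple


def parse_stat(line: str) -> Tuple[int, str, str, int]:
    """Return (pid, comm, state, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces, so split on the last ')'
    head, _, rest = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = rest.split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15 of the stat line
    state = fields[0]
    utime, stime = int(fields[11]), int(fields[12])
    return int(pid_str), comm, state, utime + stime


def busiest(name: str, proc: str = "/proc") -> Optional[int]:
    """PID of the process named `name` with the most accumulated CPU time, or None."""
    best = None  # (ticks, pid)
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, _state, ticks = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if comm == name and (best is None or ticks > best[0]):
            best = (ticks, pid)
    return None if best is None else best[1]
```

The PID returned by `busiest("automount")` is then a candidate for `strace -p <pid>`.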

robogeek (david-herron-sun) wrote :

I'm doing some debugging. The above setup is sufficient, but it depends on the NIS setup we have here inside Sun. That setup is really comprehensive, spanning all of Sun's systems and offices, and we have many NFS-mounted file systems that are described in NIS maps. It would be difficult to describe much of it ...

The key thing I found this morning is that /var/log/syslog was being flooded with messages like those in the picture.

A behavior I did not mention is that as soon as I trigger an automount (e.g. "ls /net/sqe1"), the file browser (nautilus) pops up multiple windows, one for each exported directory.

Later on, the log gets other messages saying '>> mount.nfs: mumble:/export/jestest failed, reason given by server: Permission denied'. There are a lot of those messages, for different mount points, each saying 'Permission denied'.

I eventually managed to recover the system with 'killall nautilus'. An earlier 'killall automount' did not help; only when I killed nautilus did the load average return to normal.

robogeek (david-herron-sun) wrote :

The next thing I found to turn off is Searching and Indexing (I think that's the name of the preference). It's a new entry under Preferences, and I turned off all of its options. At least one of them has to do with automatically indexing mounted file systems.

The current thing I'm looking at is a large number of processes for which "ps auxfwww" reports the command line as: /usr/lib/gvfs/gvfsd-trash --spawner 1.9 /org/gtk/gvfs/execute_spaw/1
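To put a number on "a large number of processes", the command lines can be tallied straight from /proc instead of reading through `ps auxfwww` output. This is a Linux-only Python sketch with a hypothetical helper name; it groups identical command lines, so repeated gvfsd-trash spawns show up as one entry with a count.

```python
import os
from typing import Dict


def count_by_cmdline(needle: str, proc: str = "/proc") -> Dict[str, int]:
    """Count live processes whose command line contains `needle`, grouped by full command line."""
    counts: Dict[str, int] = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process went away, or we may not read it
        # arguments are NUL-separated; kernel threads have an empty cmdline
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if needle in cmd:
            counts[cmd] = counts.get(cmd, 0) + 1
    return counts
```

For example, `count_by_cmdline("gvfsd-trash")` run a few minutes apart would show whether the number of spawned processes keeps growing.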

The system load average is still very high.

FWIW, to help you understand the extent to which we use NIS/automounted NFS: if I run "mount | wc -l" it gives me a number around 3091. I caused only a few (approx. 10) automounts myself, and those triggered a number of other automounts.
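A breakdown by filesystem type can show how many of those lines are autofs trigger points and how many are real NFS mounts. The Python sketch below assumes the /proc/mounts layout, where the third whitespace-separated field is the filesystem type. On 8.04, `mount` prints /etc/mtab rather than /proc/mounts, so the two totals need not match exactly.

```python
from collections import Counter


def mounts_by_type(mounts_text: str) -> Counter:
    """Tally /proc/mounts-style lines by filesystem type (the third field)."""
    counts: Counter = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

Usage: `with open("/proc/mounts") as f: print(mounts_by_type(f.read()).most_common())`.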

robogeek (david-herron-sun) wrote :

Next things...

"mount | wc -l" is returning an ever-increasing number.

I found the Searching and Indexing preference dialog, and it looked like a related feature. I've turned off or turned down all the options, but this made no difference in behavior.

Something I didn't mention before is that for every automounted directory a desktop icon appears.

Paul Smith (psmith-gnu) wrote :

I was just coming here to report the issue of the .Trash directories as well. However, I only have about 15 automount entries on my system, so I don't see the same kind of load problems as robogeek. I just noticed that my daemon.log file is filling up with autofs errors:

Apr 4 16:14:41 psmith-ubeta automount[3093]: failed to mount /export/autofs/.Trash
Apr 4 16:14:41 psmith-ubeta automount[3094]: failed to mount /export/autofs/.Trash-10490
Apr 4 16:14:41 psmith-ubeta automount[3099]: >> /sbin/showmount: can't get address for .Trash
Apr 4 16:14:41 psmith-ubeta automount[3099]: lookup(program): lookup for .Trash failed
Apr 4 16:14:41 psmith-ubeta automount[3099]: failed to mount /net/.Trash
Apr 4 16:14:41 psmith-ubeta automount[3105]: >> /sbin/showmount: can't get address for .Trash-10490
Apr 4 16:14:41 psmith-ubeta automount[3105]: lookup(program): lookup for .Trash-10490 failed
Apr 4 16:14:41 psmith-ubeta automount[3105]: failed to mount /net/.Trash-10490
Apr 4 16:14:41 psmith-ubeta automount[3111]: failed to mount /opt/net/.Trash
Apr 4 16:14:41 psmith-ubeta automount[3112]: failed to mount /opt/net/.Trash-10490

etc. Note that 10490 is my UID. This seems to happen every time automount tries to mount something.
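To see which autofs roots are being probed, and how often, the "failed to mount" lines can be grouped by their parent directory. The Python sketch below assumes the exact message format quoted above; other autofs versions may word it differently, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "automount[3093]: failed to mount /export/autofs/.Trash"
FAILED = re.compile(r"automount\[\d+\]: failed to mount (\S+)")


def trash_failures(log_lines: List[str]) -> Dict[str, List[str]]:
    """Group failed .Trash / .Trash-<uid> mounts by the autofs root they were tried under."""
    by_root: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        m = FAILED.search(line)
        if not m:
            continue  # showmount / lookup messages and unrelated lines
        root, _, leaf = m.group(1).rpartition("/")
        if leaf == ".Trash" or leaf.startswith(".Trash-"):
            by_root[root].append(leaf)
    return dict(by_root)
```

Fed the daemon.log excerpt above, every root shows up with the same pair: `.Trash` and `.Trash-10490`.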

It looks like some kind of bad interaction between autofs and the new trashcan capabilities I've heard about (maybe gvfs). It might not be an autofs problem at all, but I don't know what part of the system is trying to access the .Trash directory on every filesystem, or where that process, whatever it is, gets its list of filesystems to test.
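The pairs of paths in these logs match the layout in the freedesktop.org Trash specification, which has implementations look on each mounted filesystem (`$topdir`) for a shared `$topdir/.Trash` holding a per-UID subdirectory, and for a per-user `$topdir/.Trash-$uid`. If some desktop component walks the mount table and treats every autofs root as a `$topdir`, each check of those paths becomes an autofs lookup for a key such as `.Trash` that no map contains. The Python sketch below is a hypothetical helper, not code from gvfs or tracker; it only generates the paths such a scan would touch.

```python
import os
from typing import Iterable, List


def trash_candidates(mount_points: Iterable[str], uid: int) -> List[str]:
    """Per-mount trash locations that the freedesktop.org Trash spec tells implementations to check."""
    paths = []
    for top in mount_points:
        # method 1: an admin-created shared $topdir/.Trash with a per-user $uid subdirectory
        paths.append(os.path.join(top, ".Trash"))
        # method 2: a per-user $topdir/.Trash-$uid directory
        paths.append(os.path.join(top, f".Trash-{uid}"))
    return paths
```

For the roots in the log above and UID 10490, this produces exactly the `/net/.Trash`, `/net/.Trash-10490`, ... pairs that automount reports as failed mounts.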

This is a big problem for people trying to use Ubuntu in any kind of traditional enterprise environment, where autofs is ubiquitous.

robogeek (david-herron-sun) wrote :

Oh, I don't seem to have mentioned this: it appears the real culprit is trackerd. It looked to me as if trackerd was trying to mount every automount point and index everything everywhere.

Paul Smith (psmith-gnu) wrote :

This is impacting me quite a bit as well. I don't have as many automount points as robogeek, but I see the same issues: my logs are full of trackerd (I suppose) trying to read <mount>/.Trash, where <mount> is an automount point configured in auto.master.

You might check this Gnome bug: http://bugzilla.gnome.org/show_bug.cgi?id=511474. It contains a patch (although it's pasted inline rather than attached, so it's awkward to apply). I haven't tried it, but it seems to be on point for this bug.

Claudio Bernardini (claudiob) wrote :

The same is happening to me on every Ubuntu 8.04 client with automount NFSv4 maps served from an LDAP server.

May 31 10:29:39 myhost automount[6637]: failed to mount /home/.Trash
May 31 10:29:39 myhost automount[6638]: failed to mount /home/.Trash-1535

The system load gets high, after which the PC hangs and needs a reboot. I'm still investigating whether this behaviour is the cause.

Paul Smith (psmith-gnu) wrote :

I'm seeing the exact same behavior as Claudio. I did a bunch of investigation and here is what I found:

For the last week or so, almost every morning when I come into work my system is hung up in a strange way. I can move my mouse but I never get asked for my password to unlock my screen. I can C-A-F1 etc. to get back to a console but after I type my username at the login prompt, I never get asked for a password and then that console is locked up. If I have a console session already logged in from the day before, then I can use it for a while but eventually some command will lock hard; can't ^C, can't ^Z, can't kill -9, nothing.

If I press C-A-D to reboot, the system starts to come down but then hangs, hard, while stopping automount. Reset just tries to reboot again and hangs in the same place. I have to power the system off and on completely. Bummer.

So, I tried debugging this problem. First, I logged in as root on every console (F1-F6). The next morning when the system was hung, I found a command that hung (just "ls") and then I ran it in another console under strace.

It turns out the command opens /proc/mounts, which succeeds, and then calls read(2). The read system call never returns, and once a process is in that state there is no way to kill it at all. I also note that the load on the system is very high, typically over 7, yet top shows no processes chewing CPU. There are also some "duplicate" automount processes running (that is, more than one for the same map). After I reboot, of course, everything is fine.
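A high load average with an idle CPU fits the way Linux computes load: it counts tasks that are runnable plus tasks in uninterruptible sleep ("D" state), and a task blocked like this in read(2) ignores even kill -9. One plausible reading, not a confirmed diagnosis, is that each hung command adds to the load while using no CPU. The Python sketch below (hypothetical helper, Linux only) lists D-state tasks from /proc; on a badly wedged system, reading /proc may itself block.

```python
import os
from typing import List, Tuple


def stuck_processes(proc: str = "/proc") -> List[Tuple[int, str]]:
    """List (pid, comm) for tasks in uninterruptible sleep ("D" state)."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited mid-scan
        # comm sits in parentheses and may contain spaces; the state follows the last ')'
        head, _, rest = line.rpartition(")")
        comm = head.partition(" (")[2]
        state = rest.split()[0]  # field 3 of /proc/<pid>/stat
        if state == "D":
            stuck.append((int(entry), comm))
    return stuck
```

If the count of D-state tasks tracks the load average, the load is coming from blocked readers rather than from busy processes.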

Last night I started all the consoles and in one of them I wrote a little shell script that ran `date`, then did cat /proc/mounts, then slept for 15 seconds, then did it again. I piped the output to a file.
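The watchdog loop described above can be sketched in Python as follows. This is an illustrative rewrite, not the original shell script; the function names and the fsync call are additions. Forcing each record to disk means the last complete timestamp survives the hard power cycle needed to recover, and that timestamp bounds when the hang began.

```python
import os
import time
from datetime import datetime
from typing import Optional


def snapshot(mounts_path: str = "/proc/mounts") -> str:
    """One log record: a timestamp, then the mount table (like `date; cat /proc/mounts`)."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(mounts_path) as f:
        table = f.read()  # on the hung system, this read(2) is where things stalled
    return f"=== {stamp} ({len(table.splitlines())} mounts)\n{table}"


def watch(out_path: str, interval: float = 15.0, iterations: Optional[int] = None,
          mounts_path: str = "/proc/mounts") -> None:
    """Append a snapshot every `interval` seconds, forever if `iterations` is None."""
    n = 0
    while iterations is None or n < iterations:
        record = snapshot(mounts_path)
        with open(out_path, "a") as out:
            out.write(record)
            out.flush()
            os.fsync(out.fileno())  # make sure the record reaches disk before a power-off
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval)
```

Run from a root console as `watch("/root/mounts.log")`; the mount count in each header also shows whether the table keeps growing.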

I found that the hang happened last night at 22:51 EDT. There was nothing interesting in the messages log, but in syslog I find a lot of messages right around that time trying to do this silly .Trash stuff:

Jun 2 22:36:02 psmithub -- MARK --
Jun 2 22:50:57 psmithub automount[29205]: failed to mount /opt/net/.Trash
Jun 2 22:50:57 psmithub automount[29206]: failed to mount /opt/net/.Trash-10490
Jun 2 22:50:57 psmithub automount[29207]: failed to mount /export/autofs/.Trash
Jun 2 22:50:57 psmithub automount[29208]: failed to mount /export/autofs/.Trash-10490
Jun 2 22:51:00 psmithub automount[29217]: failed to mount /nfs/.Trash
Jun 2 22:51:00 psmithub automount[29218]: failed to mount /nfs/.Trash-10490
Jun 2 22:51:00 psmithub automount[29219]: failed to mount /mnt/.Trash
Jun 2 22:51:00 psmithub automount[29220]: failed to mount /mnt/.Trash-10490
Jun 2 22:51:00 psmithub automount[29221]: >> /sbin/showmount: can't get address for .Trash
Jun 2 22:51:00 psmithub automount[29221]: lookup(program): lookup for .Trash failed
Jun 2 22:51:00 psmithub automount[29221]: failed to mount /net/.Trash
Jun 2 22:51:00 psmithub automount[29228]: >> /sbin/showmount: can't get address for .Trash-10490
Jun 2 22:51:00 psmithub automount[29228]: lookup(program): lookup for .Trash-10490 failed
Jun 2 22:51:00 psmithub automount[29228]: failed to mount /net/.Trash-10490
Jun 2 22:51:00 psmith...

Millard73 (miturria-eecs) wrote :

I'm also seeing the log entries. We're about to deploy Ubuntu in an environment with a good number of users (about 3300) and a couple of hundred desktops. All their home areas and a large number of other file systems are automounted. I haven't seen the slowdown and high load averages yet, but I've also not "loosed" Ubuntu Hardy on the masses yet. Does anyone know the status of this bug? I'd really appreciate an update.

Changed in sun-java:
status: New → Invalid
Changed in tracker:
status: Unknown → New
Chuck Short (zulcss)
Changed in autofs (Ubuntu):
importance: Undecided → Low
status: Incomplete → Confirmed
Changed in tracker:
status: New → Fix Released
Changed in tracker:
importance: Unknown → Medium
Changed in tracker (Ubuntu):
status: New → Fix Released