strigidaemon needs exclude lists, defaults eat up too many resources

Bug #137753 reported by Christian Vogler
Affects: strigi (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Gutsy, x86-64, on a laptop

I have an existing home directory with over a quarter million files. Strigi's default setting of indexing everything under the user's home directory is far too broad: it eats all available disk bandwidth and considerable CPU time, and it takes up multiple gigabytes of disk space as well. I let it run for more than four hours, and even then the daemon had not finished its automatically started indexing run.

Restricting strigi to everything under ~/.kde fares somewhat better, but even then there are more than 50,000 files to consider. Moreover, every time a file changes, such as when a new IMAP e-mail arrives, strigi eats up CPU time for several seconds. For a laptop on battery power this is not desirable. In addition, there is little reason to index directories such as CVS, .svn and so on.

All things considered, it seems to me that strigi urgently needs include and exclude lists, and furthermore that those lists should ship with sensible defaults, before it can be considered ready for prime time in Ubuntu. At the moment, the user's view is more or less that the system frequently slows to a crawl for no apparent reason.

Jan de Visser (jan-de-visser) wrote :

Another problem: it clobbers the filesystem cache. If you're doing development, for example, it is quite common to have your complete source tree in the kernel's filesystem cache, which makes builds fast. Strigi loads whatever it is indexing into the cache, happily evicting everything I just put there.
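A minimal Python sketch of one way an indexer could limit this, assuming a Linux/POSIX system: read each file, then call `os.posix_fadvise` with `POSIX_FADV_DONTNEED` so the kernel may discard the pages it just read rather than keep them at the expense of the user's working set. The helper name `read_without_polluting_cache` is hypothetical, and nothing here claims strigi works this way. The advice is only a hint the kernel may ignore, and it cannot undo evictions that already happened during the read.

```python
import os


def read_without_polluting_cache(path: str, chunk_size: int = 1 << 20) -> bytes:
    """Read a whole file, then advise the kernel to drop its cached pages.

    Hypothetical helper for illustration; not part of strigi.
    POSIX_FADV_DONTNEED is advisory: the kernel may ignore it.
    """
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            chunks.append(chunk)
        # Offset 0 with length 0 covers the whole file. posix_fadvise is
        # missing on some platforms (e.g. macOS), so skip it there.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return b"".join(chunks)
```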

gpothier (gpothier) wrote :

Note that by default, strigi also indexes ~/.strigi; that directory should be on an exclude list.

David Miller (djmdave) wrote :

Looking at ~/.strigi/daemon.conf shows that any file or folder beginning with a dot will not be indexed, so (in theory) gpothier's comment shouldn't be true.

Excluding CVS folders should just be a case of adding:

 <filters>
   <filter pattern='CVS/' include='0'>
   </filter>
 </filters>
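To illustrate how an ordered exclude list like the one above might be applied while walking a home directory, here is a hedged Python sketch. It assumes first-match-wins ordering, shell-style glob patterns, a trailing `/` meaning "directories only", and that anything no filter matches gets indexed. These mirror the behaviour described in this thread but are assumptions, not a description of strigi's actual matcher; `FILTERS`, `is_indexed` and `walk_indexable` are hypothetical names.

```python
import os
from fnmatch import fnmatchcase

# Hypothetical filter list: (pattern, include). A trailing "/" marks a
# directory-only pattern. Order matters: the first matching filter wins.
FILTERS = [
    (".*/", False),    # hidden directories, e.g. ~/.strigi
    ("CVS/", False),   # CVS metadata directories
    (".svn/", False),  # already caught by ".*/" above; listed for clarity
    (".*", False),     # hidden files
]


def is_indexed(name: str, is_dir: bool, filters=FILTERS) -> bool:
    """Return True if a file or directory name should be indexed."""
    for pattern, include in filters:
        if pattern.endswith("/"):
            # Directory-only pattern: ignore it for plain files.
            if is_dir and fnmatchcase(name, pattern[:-1]):
                return include
        elif fnmatchcase(name, pattern):
            return include
    return True  # no filter matched: index by default


def walk_indexable(root: str):
    """Yield indexable file paths under root, pruning excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if is_indexed(d, is_dir=True)]
        for name in filenames:
            if is_indexed(name, is_dir=False):
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place is what keeps the walker from ever descending into an excluded tree such as `CVS/` or `~/.strigi`, so excluded directories cost almost nothing instead of being read and then discarded.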

PerJensen (per-net-es) wrote :

I agree that the default directories to search include too much. It would also be nice if the daemon ran with a higher nice value, so that it would get out of the way when other programs need the CPU. The high CPU usage, by the way, makes my Compaq Presario very loud!
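Launching the daemon by hand as `nice -n 19 strigidaemon` should have a similar effect from a shell. The Python sketch below shows the equivalent from inside a process; `lower_own_priority` is a hypothetical helper, not part of strigi. Niceness only affects CPU scheduling, so on its own it does not address the disk-bandwidth problem from the original report.

```python
import os


def lower_own_priority(target: int = 19) -> int:
    """Raise this process's nice value toward `target`; return the new value.

    Hypothetical helper, not part of strigi. os.nice() takes an increment,
    not an absolute value, so compute the difference from the current value.
    Unprivileged processes may raise their nice value but not lower it,
    which is all a background indexer needs.
    """
    current = os.nice(0)  # an increment of 0 just reports the current value
    if target > current:
        return os.nice(target - current)
    return current
```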

sun-wukong (sun-wukong) wrote :

Same here with Gutsy beta 64-bit on a Core 2 Duo system with 2 GB of RAM.

Strigi's CPU usage slowly climbs to 100%, monopolizing one core. Thank goodness for dual-core chips; otherwise the system would hang.
But this is not the way it should be. Indexing should start when the PC is idle and stop when user activity starts again. At least, this should be the default behaviour, probably with a warning for users who search for something before everything has been indexed.
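A minimal sketch of "index only while idle" in Python. It swaps in the 1-minute load average (`os.getloadavg`, Unix only) as a rough proxy for idleness, since detecting real user activity needs desktop-specific APIs. The function `index_when_idle` and its `max_load`/`poll_seconds` parameters are hypothetical, not strigi options; the `sleep` and `loadavg` arguments exist only so the behaviour can be tested without waiting.

```python
import os
import time


def index_when_idle(paths, index_one, max_load=0.5, poll_seconds=5.0,
                    sleep=time.sleep, loadavg=os.getloadavg):
    """Index `paths` one at a time, pausing while the system is busy.

    Hypothetical sketch: the load average stands in for "user is idle".
    The check runs only between files, so one large file is never
    interrupted mid-way. Returns the number of files indexed.
    """
    done = 0
    for path in paths:
        # Back off until the 1-minute load average drops to max_load or below.
        while loadavg()[0] > max_load:
            sleep(poll_seconds)
        index_one(path)
        done += 1
    return done
```

For example, `index_when_idle(files, indexer, max_load=0.3)` would process `files` only while the load average stays at or below 0.3.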

Cameron Garnham (da2ce7) wrote :

I've been trying to use strigi to index quite a large number of files, e.g. around 150,000 files totalling 800 GB, and all it seems to do is make the computer completely sluggish! I think strigi does not scale very well to large indexes, which makes it very annoying in real use.

Michael (michaeljt) wrote :

Agreed. I tried out Strigi on Gutsy RC and found that it slows down the system unacceptably, including foreground processes (it runs at nice 0). I can imagine someone trying out Kubuntu, starting it by mistake, and not knowing why their system is so slow. That would not make a very good impression.

Harald Sitter (apachelogger) wrote :

It has exclude filters now (at least in combination with Nepomuk).

Changed in strigi:
status: New → Fix Released