recoll slows system to a crawl

Bug #718427 reported by gpk on 2011-02-13
This bug affects 15 people
Affects: recoll (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: recoll

I have a lot of files, and recoll slows the system to a crawl when indexing. Opening up a terminal takes 8 seconds. Getting the "file" menu on a web browser requires you to hold the button down for 4 seconds. Basically, it's hammering my disk. CPU usage is not high, but the I/O is. This is a reasonably capable system: a 2-core CoreDuo, 4 gig of memory, a 7200 RPM disk. It ought not be slow.

Recoll should make use of ionice to reduce its I/O priority. And, if that's not enough, it ought to sleep in between operations that use the disk. It should be really really easy to make it a good citizen instead of a resource hog.

Basically, if it continues to annoy me while indexing, off it goes into the bit bucket.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: recoll (not installed)
ProcVersionSignature: Ubuntu 2.6.32-28.55-generic 2.6.32.27+drm33.12
Uname: Linux 2.6.32-28-generic x86_64
Architecture: amd64
Date: Sun Feb 13 22:19:09 2011
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.1)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: recoll

gpk (gpk-kochanski) wrote :

I should add that the .recoll directory is on a compressed btrfs file system. Don't know if that has anything to do with it, but even so, recoll needs to be a better citizen and not hog the disk.

gpk (gpk-kochanski) wrote :

I just experimented: running

ionice -c2 -n7 nice -19 recoll

does the trick beautifully. It means that the indexing is no longer annoying.

Note: This is not an entirely satisfactory workaround, because it can make the search user interface slow. The correct solution is for the recoll code to call ionice and nice when it spawns off the indexing process. I'll provide a patch, if anyone cares.

Miles Colman (mcolman) wrote :

Thanks, this helped me. Half a year late, but if you just run ionice on recollindex, the user interface should still be fast, right?

Before that, I tried mounting my disk with noatime, and that helped a bit.
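
A minimal sketch of that idea, assuming the indexer is started by hand rather than from the GUI: only recollindex is wrapped, so the search interface keeps its normal priority. The function name low_priority and the DRY_RUN switch are made up for illustration; the ionice and nice settings are the ones from the command above, with the niceness written as "-n 19".

```shell
# Run a command with best-effort I/O priority 7 (the lowest in that class)
# and CPU niceness 19. With DRY_RUN=1 the command line is only printed.
# low_priority and DRY_RUN are illustrative names, not part of recoll.
low_priority() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "ionice -c2 -n7 nice -n 19 $*"
    else
        ionice -c2 -n7 nice -n 19 "$@"
    fi
}
```

Calling "low_priority recollindex" would then run one indexing pass at low priority, while "recoll" itself is started the usual way.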

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in recoll (Ubuntu):
status: New → Confirmed
Alfagulf (alfagulf) wrote :

I just removed tracker and installed recoll on my Ubuntu 11.10 64-bit system.
Following the instructions regarding Periodic Indexing, I indexed my system once and then added "recollindex -m" to my Application Start-up list to enable real-time indexing.

I noticed that as soon as the indexing starts, memory usage climbs to 100% (I have 6GB), and as if that were not enough, swap usage also climbs to about 20%. During this time, the system becomes very slow and unresponsive!

After a few minutes, memory usage drops and the system's responsiveness improves, but it is still rather slow: it takes more time than usual to start applications.

I am placing a lot of hope in recoll, especially now that the Google Desktop search application is no longer compatible with Oneiric and tracker is rather useless!

Can someone help?

Thanks

Alfagulf (alfagulf) wrote :

OK, I found this link which answered the memory usage problem:
http://www.freelists.org/post/recoll-user/Resource-use-GNOME-integration,1

Basically, I enabled logging and found the large file (About 5GB) which caused recollindex to use so much memory.
After skipping the file, memory usage is reasonable now.
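
For anyone who would rather look for such files directly than read the log, here is a rough sketch using find. It is an alternative to the logging approach described above, not what was actually done here; the function name large_files and the size threshold are placeholders.

```shell
# List regular files under directory $1 that are larger than $2 MiB.
# large_files is an illustrative helper; both arguments are up to the user.
large_files() {
    find "$1" -type f -size +"$(( $2 * 1024 ))"k
}
```

For example, "large_files "$HOME" 1024" would list everything above 1 GiB in the home directory, which gives a short list of candidates to exclude from indexing.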

Manual adjustment isn't really a practical solution. It's nice to know that the problem seems to be triggered by individual large files, but in order to be really useful, recoll needs an algorithm that can handle that case.

Please note that the big text file issue is, as far as I know, fixed in newer Recoll versions. Backports are provided from the recoll PPA:
https://launchpad.net/~recoll-backports/+archive/recoll-1.15-on

jf

Alfagulf (alfagulf) wrote :

Dear Jean,

I am using Recoll 1.15.9 + Xapian 1.2.5 from ubuntu repository.
The link you provided has a newer version (1.16.2-0~ppa1~oneiric1)

Shouldn't Ubuntu be alerted to this issue so that they include the fixed version in their repository?
Thanks

Gilles Schintgen (shigi) wrote :

I think I'm seeing the same behaviour as the original reporter: recoll is causing huge amounts of IO, which really doesn't play nice with the rest of the system. I recently moved my /home from an SSD (unencrypted) to a classic HDD (luks-encrypted) and now my system often slows down to a (temporary) halt: programs being grayed out while actively working with them.

Whenever this happens I see massive iowait caused by recoll.

I'm running Ubuntu 14.04 on an i5 with 8GB of RAM and a Seagate Momentus 7200.4 drive.
(The system itself resides on a first-gen Sandforce SSD)

Felipe Castillo (fcastillo.ec) wrote :

I'm running Ubuntu 14.04 on an i5 with 16GB of RAM and a 7200rpm drive, and I also experience this problem. I thought that with 16GB there shouldn't be any issues, but it gets to the point that even swap has to be used, and I can't do anything; my system collapses. I try moving the mouse and it takes forever (up to a minute) for it to move just a little bit.
I see the spike in memory when the recoll indexing status says it's under "Purge".

Is there something we can do to fix this?

I am trying recoll again after more than a year, but it seems that I will have to remove it again.
It is making my system (i5/16GB RAM) slow as molasses.

Can anyone suggest an alternative indexing system?

Georg Lipps (georg-lipps) wrote :

I also observed heavy system usage. I ran the indexer in real-time mode, but this slowed the system down too much, especially at startup. As a simple workaround I now run the indexer only once an hour.
There should be a way for the indexer to really run in the background (ionice) while the GUI does not. Any suggestions?
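
One way the hourly workaround could be combined with ionice, sketched under the assumption that cron is used for scheduling (the comment above does not say how the hourly run is set up). The function name hourly_index_entry is invented for this example; "ionice -c3" selects the idle I/O class.

```shell
# Print a crontab entry that starts one low-priority indexing pass at the
# top of every hour. hourly_index_entry is an illustrative name.
hourly_index_entry() {
    printf '%s\n' '0 * * * * ionice -c3 nice -n 19 recollindex'
}
```

The entry could be appended to the current user's crontab with "(crontab -l 2>/dev/null; hourly_index_entry) | crontab -". The GUI is not involved, so it keeps its normal priority.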

Hi,

All: what recoll version are you running?

The indexer is niced and ioniced in recent versions. The GUI is not, but it does not do any significant work (except possibly during an actual search).

Depending on the recoll version, one thing to try would be to configure down the number of threads that the indexer can use.

In my case, I ended up putting recollindex in the daily anacron jobs. The problem is that, whatever I do, when recollindex is running I clearly notice "hiccups" (mostly under a second, sometimes one or two seconds) when doing things that are normally instantaneous. It happens, for example, when editing (big) files with vim, or accessing/reading mboxes in thunderbird, or a lot of other things.
I do not really think it is a problem in recoll, by the way; there must be some locking/delay going on when inotify signals recoll to read the files for indexing, and Linux with ext4 has never been good at parallel disk loads, especially if you have several disks around. I am at a loss trying to debug this thing...
Version: 1.20.1-1~ppa2~trusty1 on Ubuntu 14.04 up-to-date.

@romano: did you check the recollindex process size when the problem occurred?

Paging/swapping would have been the most probable explanation for what you were seeing.

It seems unlikely that "ordinary" I/O from the ioniced recollindex could cause problems, but eviction of your applications' memory pages by excessive memory usage from recollindex or its helpers quite possibly could.
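
A small sketch of how the process size could be checked on Linux while a slowdown is happening, reading the VmRSS field from /proc. The helper names rss_kb and report_rss are made up for this example. Helper programs started by the indexer run under their own names and would need a separate call.

```shell
# Extract the resident set size in kB from a /proc/<pid>/status file.
# rss_kb and report_rss are illustrative names, not recoll tools.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Report the resident size of every running process named $1
# (default: recollindex).
report_rss() {
    name=${1:-recollindex}
    for comm in /proc/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        [ "$(cat "$comm" 2>/dev/null)" = "$name" ] || continue
        dir=${comm%/comm}
        echo "pid ${dir#/proc/}: $(rss_kb "$dir/status") kB resident"
    done
}
```

Running "report_rss" a few times during a hiccup should show whether the resident size is anywhere near the amount of installed RAM.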

By the way, I never experience this on my system, so there must be something in my usage pattern or configuration which prevents it, but I'd be quite interested in fixing the problem anyway.

In any case, getting rid of the problem by using batch indexing is a sane approach: it's quite rare that you would need to search for very recent files.

I have version 1.20.1-1~ppa2~utopic1 on Ubuntu 14.10, and for a long time I have seen this problem.

With ionice I checked that all the recollindex processes had the 'idle' priority but the problem persisted. Then I read that ionice only works with the CFQ kernel scheduler (http://serverfault.com/questions/485549/ionice-idle-is-ignored#602384), and on my system the scheduler was 'deadline'. So I changed the default to CFQ (http://askubuntu.com/questions/78682/how-do-i-change-to-the-noop-scheduler#82452). With this change the I/O usage for recoll is still close to 100% (which makes sense, since the disk is otherwise idle). My applications still block, although they seem to recover in less time; it is still very annoying.
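
For others who want to check which scheduler their disk uses, a minimal sketch follows. The kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. The function name active_scheduler is invented here, and the device name sda in the usage note is an assumption.

```shell
# Print the active I/O scheduler from a file formatted like
# /sys/block/<dev>/queue/scheduler, e.g. "noop deadline [cfq]".
# Prints nothing if no entry is bracketed.
# active_scheduler is an illustrative name.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}
```

For instance, "active_scheduler /sys/block/sda/queue/scheduler" would print "deadline" on a system configured like mine was. Writing a scheduler name into the same file as root switches it until the next reboot.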

I think that maybe my problem is that as recoll reads files in the disk, these get cached by the OS, and thus files from other active applications are taken out of the cache. Could this be it? Is there any way that the files read by recollindex do not get cached by the OS?

By the way, how can I reduce the number of threads of the recollindex?

> By the way, how can I reduce the number of threads of the recollindex?

Short answer: adjust thrTCounts in the config file.
See http://www.lesbonscomptes.com/recoll/usermanual/usermanual.html#RCL.INSTALL.CONFIG.RECOLLCONF.IDXTHREADS
or the comments in the default recoll.conf in /usr/share/recoll/examples
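
A hedged sketch of what such an adjustment could look like. As far as the manual section linked above describes it, the thread counts are only taken into account when the first thrQSizes value is not 0 (0 means autoconfiguration), and "thrQSizes = -1 -1 -1" disables multithreading altogether. The helper name set_index_threads and the values in the usage note are illustrative, not recommended settings.

```shell
# Append indexing-thread settings to a recoll configuration file.
#   $1: configuration file, for example ~/.recoll/recoll.conf
#   $2: queue depths for the three stages, e.g. "2 2 2"
#   $3: thread counts for the three stages, e.g. "1 1 1"
# set_index_threads is an illustrative helper, not a recoll command.
set_index_threads() {
    printf 'thrQSizes = %s\nthrTCounts = %s\n' "$2" "$3" >> "$1"
}
```

For example, "set_index_threads ~/.recoll/recoll.conf '2 2 2' '1 1 1'" would ask for one thread per stage. The function only appends, so check the file first for existing thrQSizes or thrTCounts entries.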

About preventing caching for files read by indexing, this can't be fully controlled because a large part of the data is not read by recollindex but by the helper applications.

For files read by recollindex itself, there was an attempt to prevent caching, based on an O_STREAMING flag which does not seem to be implemented on any current system. I guess that fadvise() could now be used instead.

In steady state, little indexing should take place, so a file would have to be really huge to have a significant effect on the page cache. I guess that a really big mbox file could fall in this case though.

What would be really useful would be if someone could correlate recoll activity with the system perturbation. At log level 4, most relevant information will be printed to the log file. I'd gladly do it, except that I can't seem to be able to reproduce the issue.
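
One possible way to gather the system side of that correlation on Linux, offered as a rough sketch: sample the cumulative iowait counter from /proc/stat with a timestamp, then compare the timestamps against the indexer log. The function names iowait_ticks and sample_iowait are made up for this example.

```shell
# Extract the cumulative iowait counter (in clock ticks) from a file
# formatted like /proc/stat. On the aggregate "cpu" line it is the fifth
# number after the label. Both function names here are illustrative.
iowait_ticks() {
    awk '$1 == "cpu" { print $6 }' "$1"
}

# Print "epoch-seconds iowait-ticks" every $1 seconds (default 1)
# until interrupted.
sample_iowait() {
    while :; do
        echo "$(date +%s) $(iowait_ticks /proc/stat)"
        sleep "${1:-1}"
    done
}
```

Running "sample_iowait 1 > iowait.log &" while recollindex is active gives a series in which sudden jumps between consecutive samples mark the moments worth looking up in the recoll log.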
