Comment 16 for bug 131983

Jamie Lokier (jamie-shareable) wrote :

This version is improved, and clearly a lot of work has gone into it. But I still see problems with it affecting the rest of the system.

I deselected "Enable Indexing" and "Enable Watching" in the Indexing Preferences (first tab, General), but it still indexed, even after rebooting. Because of this, I had to uninstall tracker completely to get my system usable with the old version.

I decided to give this new version a try, as it's clearly had a lot of work done. So I installed it, and this is what I found. I'm using tracker-0.6.2-0ubuntu3:

1. I see it still insists on indexing even though the "Enable Indexing" preference is deselected. This is either a bug or a misleading UI. It means if I don't want indexing, even temporarily, I either have to "killall -9 trackerd" or uninstall the package. It would be much better to be able to disable indexing, and re-enable it when I don't mind the impact.

2. The first time it ran (the new version), my laptop slowed to a crawl after about 5-10 minutes. I thought this might be the old disk I/O problems, but it turned out to be excessive VM usage. trackerd was using 900MB virtual memory, of which 704MB was RSS. I have 1GB RAM total, and a Gnome desktop plus Firefox needs about half of that, and so everything ran very slowly. I think needing 704MB RSS is excessive for an indexer, and probably indicates a bug. I got out of this by killing trackerd (-9).

3. The next time it ran was after a power cycle. This time, for a couple of hours it stayed quite small (10MB RSS), nice. But it was using 100% CPU, and hardly any I/O according to the Disk Usage monitor. A quick strace shows it doing repeated SQLite commits (and creating and unlinking a temporary log file with each commit). But crucially: it's doing this and no other system calls. This means it's doing a lot of commits, but not indexing anything. It's not reading my filesystem at all. Also, presumably it should never use 100% CPU for a sustained long time if it's on the maximum throttle setting (which it is).

4. Despite the low I/O activity according to Disk Usage monitor, it's actually I/O bound. What's happening is that every small SQLite write creates a log file on /tmp, writes to that, calls fsync, writes the main file, calls fsync on that, then unlinks the log file. Perhaps it isn't obvious: those sequential fsyncs on the main database will be causing tracker to run a lot more slowly than usual, and they also force the disk head to remain close to the filesystem logging area (fsync only has to commit the log, nothing else). I noticed with the earlier version that _this_ is sometimes the cause of "kills disk I/O": not the reading for indexing, not the inotify watching, but the continuous rapid rate of fsync calls on the database. The solution to this is to aggregate many db writes into single transactions, to reduce the fsync rate safely. For an indexing application like this, you can use a timer to decide when enough writes have been gathered and a commit should be done, so that fsyncs are rate limited by time.
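To make the commit-per-write pattern concrete, here is a minimal sketch using Python's standard sqlite3 module. It is not tracker's code: the table name, columns and row count are made up for illustration. It writes the same rows twice, once committing after every insert and once inside a single transaction, and reports how many commits each approach needed and how long it took.

```python
import os
import sqlite3
import tempfile
import time


def write_rows(path, rows, batch):
    """Insert rows into a fresh database at `path`.

    With batch=False every insert is committed on its own; with batch=True
    all inserts share one transaction. Returns (commits, row_count, seconds).
    """
    conn = sqlite3.connect(path)
    # Hypothetical schema, only here to have something to write to.
    conn.execute("CREATE TABLE words (word TEXT, file_id INTEGER)")
    conn.commit()

    commits = 0
    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO words VALUES (?, ?)", row)
        if not batch:
            conn.commit()  # one full commit per small write
            commits += 1
    if batch:
        conn.commit()  # one commit for the whole batch
        commits += 1
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]
    conn.close()
    return commits, count, elapsed


def compare(n=200):
    """Run both strategies on the same data and return both results."""
    rows = [("word%d" % i, i) for i in range(n)]
    with tempfile.TemporaryDirectory() as d:
        per_write = write_rows(os.path.join(d, "a.db"), rows, batch=False)
        batched = write_rows(os.path.join(d, "b.db"), rows, batch=True)
    return per_write, batched


if __name__ == "__main__":
    for label, (commits, count, secs) in zip(("per-write", "batched"), compare()):
        print("%-9s commits=%d rows=%d time=%.3fs" % (label, commits, count, secs))
```

Both runs end up with the same rows in the database; only the number of commits differs. The timings depend heavily on the disk and filesystem, so treat them as an indication rather than a benchmark.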

So, I still find that I cannot use tracker on my laptop for now. But I have these suggestions, which might make it possible to use in future:

1. Fix the occasional massive memory usage. I suspect this is a bug you would want to fix anyway because tracker is advertised as a small, low footprint program.

2. Fix the Indexing Preferences so that deselecting "Enable Indexing" actually does disable indexing, until you turn it on again. I could understand if the preference didn't have any effect until trackerd is restarted (although that would not be ideal), but this doesn't turn off indexing even after a reboot, which makes no sense.

3. Fix the state where it's spending 100% CPU doing lots of small writes to the database with fsync commits, without apparently doing any filesystem indexing. Is this caused by the SQLite incremental BLOB writes, perhaps? Perhaps this would be fixed by the next item:

4. Don't do a full commit after every write to the database. Aggregate them in transactions, so that a disk commit (fsync) happens at a limited rate. Ideally limit the rate using a timer, plus a limit on the amount of uncommitted data. This will make a big difference to disk I/O for other applications in some circumstances because of interactions with disk seeks, even when it looks like there's very little I/O caused by trackerd in statistics. But even better: it will probably make trackerd much faster at writing to the database, and use much less CPU and less power, all of which can only be good.
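A rough sketch of what I mean by suggestion 4, again in Python with the standard sqlite3 module rather than anything taken from tracker. The class name BatchedWriter and the parameters max_delay and max_pending are my own invention, and the default limits are arbitrary. Writes accumulate in one open transaction, and a commit happens only when the oldest uncommitted write is older than max_delay seconds or when max_pending writes have piled up.

```python
import sqlite3
import time


class BatchedWriter:
    """Group writes into transactions and commit at a limited rate.

    Hypothetical helper: commits when either the oldest uncommitted write
    is at least `max_delay` seconds old, or `max_pending` writes are waiting.
    """

    def __init__(self, conn, max_delay=5.0, max_pending=1000,
                 clock=time.monotonic):
        self.conn = conn
        self.max_delay = max_delay
        self.max_pending = max_pending
        self.clock = clock          # injectable so the policy can be tested
        self.pending = 0            # writes since the last commit
        self.first_write_at = None  # time of the oldest uncommitted write
        self.commits = 0

    def execute(self, sql, params=()):
        # sqlite3 opens a transaction implicitly before the first INSERT,
        # UPDATE or DELETE, so the writes below share one transaction
        # until flush() commits it.
        self.conn.execute(sql, params)
        if self.pending == 0:
            self.first_write_at = self.clock()
        self.pending += 1
        self.maybe_commit()

    def maybe_commit(self):
        """Commit if a limit was reached. Call this from a timer as well."""
        if self.pending == 0:
            return False
        too_many = self.pending >= self.max_pending
        too_old = self.clock() - self.first_write_at >= self.max_delay
        if too_many or too_old:
            self.flush()
            return True
        return False

    def flush(self):
        """Commit whatever is pending, e.g. on shutdown."""
        if self.pending:
            self.conn.commit()
            self.commits += 1
            self.pending = 0
            self.first_write_at = None
```

The indexer would route its writes through execute(), call maybe_commit() from a periodic timer in its main loop so that a quiet period still gets committed, and call flush() on shutdown. The trade-off is that a crash can lose up to max_delay seconds of index updates, which seems acceptable for an index that can be rebuilt from the files.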

Finally, here's a couple of suggestions which aren't showstoppers for me, but may be useful:

5. I noticed that trackerd says "Tracker version 0.6.1" but the package installed is 0.6.2-0ubuntu3.

6. The SQLite log file is created in /tmp, which is volatile: it's empty after a reboot. This seems to defeat the purpose of a log file, which is to be able to recover the structure of the database file, and committed data, after a system crash. If the log file is not there after a crash and reboot, then the database file may have a corrupt structure. Or does SQLite not need the log file to ensure the database structure after a crash? In which case, why is it created? ;-)

Thanks for all your work so far. It's great to see improvements have been made in response to earlier feedback, and you've obviously put a lot of work in.

I will always disagree that "power users can just disable tracker", though, especially when the UI doesn't actually do that. Besides, power users, and people with lots of documents and text, are surely the people who would find tracker most useful! Much better would be if it worked well for everyone, and it looks like that may be the case eventually :-)