Rebuilding collection eats up too much memory

Bug #133554 reported by Fernán González
Affects: amarok (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: amarok

When I rebuild my collection, there's a process exploring my collection which takes around 250MB of RAM. I only have 512 MB so this really hinders the system's performance. In fact I left the room and when I came back, after using the touchpad, my screen didn't "turn on" until 5 or 10 minutes later.

I have around 20,000 tracks on an external hard drive (NTFS).

Jeff Mitchell (jefferai) wrote :

This is most likely not an issue with Amarok's scanner but with your NTFS driver. What NTFS driver are you using?

You should *NOT* be using Captive. If you are, stop using it immediately (besides extremely bad performance it can do bad things to your filesystem).

The best thing to do is install FUSE and use NTFS-3G. Let us know what your setup is like.

David Jaša (dejv) wrote :

I can confirm high CPU usage even with ntfs-3g when files are modified (i.e. if I retag them). It's worth looking at a system monitor to see which process is the real culprit.

Fernán González (fernangonzalez) wrote :

Jeff Mitchell: How can I tell which NTFS driver I'm using? I only checked the boxes in ntfs-config.

David Jaša: I am experiencing that problem too.

Jeff Mitchell (jefferai) wrote :

I don't know what ntfs-config is. A quick Google shows that it has something to do with a GUI way to set up ntfs-3g, since it's so difficult to set up. You could have Googled this yourself. Or, learn how to use "mount" or /etc/mtab.

What version of ntfs-3g are you using? Try "man dpkg"

You should try some other things. Doing the collection scan reads a bit of each file, so maybe ntfs-3g on your system isn't performing well. Try figuring out what's really eating up your CPU; "htop" may help as it's much more friendly than "top".
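For anyone following along, here is a minimal Python sketch of the "mount" / /etc/mtab check mentioned above. It assumes an ntfs-3g mount shows up with the filesystem type "fuseblk" or "ntfs-3g" and the kernel driver with "ntfs"; that set of type names is an assumption, and other FUSE filesystems can also report "fuseblk". The device and mount point in the sample text are made up.

```python
# Sketch: pick out NTFS-looking entries from /etc/mtab-style text.
# The type names below are an assumption, not an exhaustive list.
NTFS_TYPES = {"ntfs", "ntfs-3g", "fuseblk"}

def ntfs_mounts(mtab_text):
    """Return (device, mount_point, fs_type) for entries that look like NTFS."""
    found = []
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        if fs_type in NTFS_TYPES:
            found.append((device, mount_point, fs_type))
    return found

# Hypothetical sample in /etc/mtab format (not taken from the reporter's system).
SAMPLE_MTAB = """\
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/sdb1 /media/music fuseblk rw,nosuid,nodev,allow_other 0 0
"""

print(ntfs_mounts(SAMPLE_MTAB))
```

On a real system the same function could be fed the contents of /etc/mtab instead of the sample text.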

Szabolcs Szakacsits (szaka) wrote :

Please note that the original bug submission was about high memory, not CPU usage. High CPU usage can happen with any file system in certain cases during heavy file activity.

So, which process is using 250MB while Amarok is running?

The "screen off" effect is a known kernel problem; you probably have too much swap and a slow disk and CPU.

Fernán González (fernangonzalez) wrote :

Jeff Mitchell: I just checked Synaptic to see what my NTFS driver was. Package "ntfs-3g" is at its latest version (1:1.328-1) and the description says it's for FUSE.

Szabolcs Szakacsits: the process using so much memory is "amarokcollectionscanner".

I have tried rebuilding my collection again; I hope this adds more information to this report. amarokcollectionscanner was only taking 11 MB in the beginning. When it had scanned 50% of the collection it was using 223 MB, and the system was unusable. Around 59% it became usable again and the scan kept going, but the process had disappeared; only amarokapp remained open, using quite a lot of CPU, though its memory usage was fine.
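Memory figures like these can also be sampled without a GUI monitor. The following is a rough Python sketch that reads the resident set size from the text of /proc/<pid>/status on Linux. The sample text is invented to match the 223 MB figure above; it is not output captured from the reporter's machine.

```python
def rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     228352 kB"
            return int(line.split()[1])
    return None  # kernel threads and exited processes have no VmRSS line

# Hypothetical excerpt of a status file, for illustration only.
SAMPLE_STATUS = "Name:\tamarokcollectio\nVmRSS:\t  228352 kB\n"

print(rss_kib(SAMPLE_STATUS) // 1024, "MiB")
```

To watch a live process, the text would come from reading /proc/<pid>/status for the scanner's PID every few seconds while the scan runs.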

Jeff Mitchell (jefferai) wrote :

That's a very old version of ntfs-3g. They've had six stable releases since then, fixing bugs ranging from metadata issues and permissions problems to outright corruption. Again: Google is your friend.

What version of TagLib do you have installed? When was the package built?

Amarok won't keep scanning the collection if there's no amarokcollectionscanner process going, so if amarokcollectionscanner disappears at 59% it may have died without actually scanning your whole collection.

You should look in the directory ~/.kde/share/apps/amarok/ and see what's there. There are normally some collection scanner-related files in there. If you watch that directory when you start a full scan, and you watch those files (e.g. with "tail -f"), you might see it getting stuck and using huge memory on a particular file or group of files. If this is the case, try moving that file (or files) out of your collection directory and run the scan again, and see if it gets past that point without the huge memory usage issue. If it does, it may hang up on another file...rinse and repeat.

See how that goes; if it does help, then we should find a way for you to get that file(s) to me so that I can do some debugging on it.

Fernán González (fernangonzalez) wrote :

Thanks for your assistance Jeff. Further down the post you can find results from tailing during the scan, I hope they're helpful.

The problem with the newer versions of ntfs-3g is that I can't get them through Synaptic or Ubuntu updates. I am not very experienced at installing these things on my own, so I'll have to wait.

I think Amarok kept scanning the collection despite amarokcollectionscanner's crash, since the progress bar kept advancing and the computer became unusable again at some point. Maybe it relaunched the process?

About my TagLib, I have 1.4-4build1, but I can't find the date when it was built.

The files I found in the amarok folder you mentioned regarding collections are these:

collection.db
collection_scan.files
collection_scan.log

-First try: I tried using "tail -f collection_scan.log", but only one line showed during the scan, and it was already in the file before the scan began; it was an m4a file.

-Second try: I moved that m4a file to another folder that is not part of the collection, but I still had the same problem. This time, a .torrent file appeared in that log file.

-Third try: I didn't move this .torrent file, and tailed collection_scan.files (I emptied it first), and it got stuck at 52%, when it found this .torrent file according to tail.

Does it give any hint? Tell me if you would like to have these two files so you can test it yourself and how I can get them to you.
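The tail checks above can also be scripted. This is a small Python sketch that returns the last non-empty line of a scanner file such as collection_scan.log. It assumes, based only on what was observed in this thread, that the last line names the file the scanner was working on; the log format is not documented here. The demo writes a throwaway file with a placeholder path rather than touching a real Amarok directory.

```python
import os
import tempfile

def last_logged_entry(log_path):
    """Return the last non-empty line of the file, or None if there is none."""
    last = None
    with open(log_path, errors="replace") as f:
        for line in f:
            if line.strip():
                last = line.strip()
    return last

# Demo against a temporary file; the path written into it is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    demo_log = os.path.join(tmp, "collection_scan.log")
    with open(demo_log, "w") as f:
        f.write("/home/user/example.torrent\n")
    print(last_logged_entry(demo_log))
```

Calling it repeatedly while the scan runs and noting when the value stops changing gives roughly the same information as watching "tail -f".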

Jeff Mitchell (jefferai) wrote :

Fernán,

You should pester the Ubuntu packagers for updated ntfs-3g. I'm really surprised they're not on top of it, considering the risks of bugs in filesystem drivers.

Anyways, Amarok will indeed relaunch the scanner process a number of times.

As for the .m4a and .torrent files...

Amarok distributes addons to TagLib that should handle m4a files, but there's always a chance it's buggy...that, or the Ubuntu maintainers may remove that capability from the build, in which case the collection scanner can't handle those files...might want to check with the package maintainers about that.

So, what appears to be happening is that the scanner is hanging on non-music files, or music files it can't parse for some reason. I've seen this before, but I'm not sure of the cause...as far as I'm aware, it's supposed to detect files it can't handle and skip past them...but I could be wrong, as I didn't write the scanner code. Sometimes there are legit music files that may have had tags improperly written...or sometimes there are legit music files, or files that TagLib should skip over, where some bug or another in TagLib causes it to go haywire. Unfortunately there hasn't been an official TagLib release since 1.4, even though there have been a ton of patches and bugfixes to the source code, so how many of those bugfixes you have depends on when the package was built, and what patches they added (or if they took a snapshot of the Subversion tree). So another thing you could try is pestering the TagLib packager to make a package of a current Subversion snapshot, and if they do it, see if that helps anything.

But in the end, I think your best bet is to clean out the files that the scanner seems to be stopping on. Each time you hit a file that it seems stuck on, move it to an alternate directory and try again. If you do this, I'd like to try to get some of those files from you so that I can test them out here and try to fix the problems (if these are the cause).
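As a rough illustration of the move-it-aside step, here is a Python sketch that moves one suspect file into a separate directory and refuses to overwrite anything already there. The function name and the directory names in the demo are hypothetical; it runs against a temporary directory rather than a real collection.

```python
import os
import shutil
import tempfile

def quarantine(suspect_path, quarantine_dir):
    """Move a suspect file out of the collection and return its new path."""
    os.makedirs(quarantine_dir, exist_ok=True)
    target = os.path.join(quarantine_dir, os.path.basename(suspect_path))
    if os.path.exists(target):
        raise FileExistsError(target)  # do not overwrite an earlier file
    return shutil.move(suspect_path, target)

# Demo with throwaway paths; "collection" and "quarantine" are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    collection = os.path.join(tmp, "collection")
    os.makedirs(collection)
    suspect = os.path.join(collection, "example.torrent")
    open(suspect, "w").close()
    moved = quarantine(suspect, os.path.join(tmp, "quarantine"))
    print(os.path.exists(suspect), os.path.exists(moved))
```

Keeping the moved files together in one place also makes it easy to package them up later for debugging.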

Hope that helps.

Fernán González (fernangonzalez) wrote :

Thanks. I've already requested newer versions of these packages. I will keep up the rinse-and-repeat method; I hope it's not many files. So far I have 2 (it takes a lot of time every time I do this). When I get some more I'll get back to you if you want to check.

So our possibilities here are: ntfs driver, taglib or amarok (collection scanner) bug.

I think we can leave the ntfs driver out of this since I just realised that the .torrent file I mentioned was in my home directory, not on my NTFS external hard drive.

I was wondering if you had a newer TagLib version/build than I do (1.4-4build1), so you could check my files out to see if this was just a TagLib problem already fixed or still a possible amarok bug. If you are interested in doing this, tell me how many of these files you would like to get to test this and how to get them to you.

Jeff Mitchell (jefferai) wrote :

I don't use Ubuntu so I can't really comment on whether my taglib is newer, but it's a distinct possibility. I'd be happy to check them out. Best way is to contact me on IRC at ferai or jefferai...let me know if that doesn't work for you.

Jeff Mitchell (jefferai) wrote :

Okay, I took the nine files you gave me (m4a, mp3, and 6 .torrent files) and put them in a directory (along with the .tar.gz file too), and had Amarok scan it. No problems whatsoever. Scanning took an instant and the MP3 file showed up in the collection (the m4a file didn't, but it appears to have no metadata, or else I couldn't read it...it did play correctly though and I do know people with collections with many m4a files so I think that's a local issue).

So here's what I would do:

1) Move these files off of the ntfs-3g partition, and onto a normal ext3/reiser/etc partition. Put them in a folder there, configure Amarok to scan that folder as well, and see what happens. If it suddenly works, and moving them back to a folder on the ntfs-3g partition makes them not work again, then that narrows the problem down significantly.
2) Upgrade Amarok to 1.4.7 if you aren't running it already, or if no package is available pester your distro. Just cause, the latest version of Amarok is always the greatest :-)
3) If it still persists, try bugging for a new taglib snapshot...or check out the sources from SVN and build it yourself and see if that helps.

If you've done all that and the problem's not solved, I don't have any good answers for you by this point (use Gentoo? :-) )

Fernán González (fernangonzalez) wrote :

Nice that you checked that out so fast.

1) Ironically, none of those files I sent you was in the NTFS partition, they were in my home directory. The rest of my music is in the NTFS partition, though. That's why I think this doesn't have to do with ntfs-3g.

2) Yes, I have Amarok 1.4.7 :).

3) Already tried that, so let's hope they answer :)

So, I guess I can't do much but wait until I get a newer TagLib, try again and come back with the outcome. Thanks for your assistance.

David Jaša (dejv) wrote :

Do you still experience difficulties with NTFS drives in such situations? As for me, this was settled in most cases in Gutsy and Hardy (I'm using ntfs-3g).

Fernán González (fernangonzalez) wrote :

Unfortunately I am not using Linux on my laptop at the moment because of issues with my hard drive. But thanks for asking anyway!

Lydia Pintscher (lydia-pintscher) wrote :

Marking as fixed since the original reporter is unable to reproduce any longer and others report it as fixed.
Please reopen if it still is a problem for you.

Changed in amarok:
status: New → Fix Released