Scanning media crawls or stalls with large number of files

Bug #1303072 reported by Claire Dechon
This bug affects 1 person
Affects: Basenji
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I like the Basenji program, as it is a bit quicker than CDCAT and has the thumbnails feature. The only downside I've experienced on a couple of systems is that the program really crawls or completely stalls when I am scanning my data DVDs. It generally stops at about 750 files. I don't know code, but is there a cache limitation somewhere that is causing this, and could I configure a larger cache? How can I get past this for unlimited media scanning?

Revision history for this message
Patrick Ulbrich (pulb) wrote :

Could you try to launch Basenji from a terminal (basenji --debug) and check if there are any debug messages?
Please also try to disable thumbnail extraction and disable symlinks in the prefs dialog.

Revision history for this message
Claire Dechon (c-dechon) wrote :

Thanks for the alternative way to start. I didn't have any errors (aside from an occasional "unable to open file, permission denied" and a few font messages which have to do with the Tango setup). It would seem that disabling the symlinks did the trick so far. What are symlinks and what do they do?

Revision history for this message
Patrick Ulbrich (pulb) wrote :

Symlinks (symbolic links, see http://en.wikipedia.org/wiki/Symlink) are special files that have no content and point to other files. With symlink support enabled, Basenji tries to resolve those links to their target files by
1) storing all scanned files in an in-memory lookup table during scanning
2) storing all symlinks in an array in memory during scanning
3) doing the actual symlink -> targetfile lookup and database storage *after* scanning completed successfully.

It's important to note that the more symlinks your scanned media contains, the longer step 3) takes to complete. Unfortunately there is no progress indication for this step. So it's very likely that your DVDs contain a lot of symlinks and Basenji appears to be stalling while it actually resolves and stores all the symlinks. It's also possible (but not that likely) that Basenji runs out of memory while filling the in-memory lookup table during scanning. You can verify that easily by monitoring Basenji's memory usage in your system monitor application.

If you disable symlink support, Basenji simply skips all symlinks, i.e. they won't be included in your index database.
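The three steps described above can be illustrated with a minimal Python sketch. This is not Basenji's actual code; the function name `resolve_symlinks`, the data shapes, and the use of list positions as record IDs are assumptions made purely for illustration.

```python
import posixpath


def resolve_symlinks(files, symlinks):
    """Resolve symlinks against the set of scanned files.

    files:    iterable of scanned file paths (hypothetical input)
    symlinks: list of (link_path, target) pairs collected during the scan
    Returns (resolved, dangling), where resolved holds (link_path, file_id).
    """
    # Step 1: in-memory lookup table of every scanned file (path -> record id).
    lookup = {path: file_id for file_id, path in enumerate(files)}

    resolved, dangling = [], []
    # Step 3: the lookup runs only after the whole scan has finished, so its
    # cost grows with the number of symlinks gathered in step 2.
    for link_path, target in symlinks:
        # A relative target is interpreted relative to the link's directory;
        # an absolute target replaces the directory part entirely.
        full = posixpath.normpath(
            posixpath.join(posixpath.dirname(link_path), target)
        )
        if full in lookup:
            resolved.append((link_path, lookup[full]))
        else:
            dangling.append(link_path)
    return resolved, dangling
```

In this sketch the memory cost comes from `lookup`, which holds one entry per scanned file, while the post-scan delay comes from the loop over `symlinks`.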

Revision history for this message
Claire Dechon (c-dechon) wrote :

OK, so it happened again, on one of the larger disks. Preferences were set to: thumbnails yes, hash no, metadata no, sound no, symlinks disabled. There was no error in the terminal during the freeze. The scan never showed as completed, and it had scanned about half of the disk. I checked the system tasks and it showed RSS 47.9 MB and VM 947.8 MB. There must apparently be some other memory limit it works with. I have 16 GB of RAM on my system. Is it swap space, the evince thumbnail database size, or something else?

I just tried it again with thumbnails disabled and it progressed. Before, it had stalled on a PDF with thumbnail generation enabled. The files after that were simple JPGs, nothing it has had problems with before in either case. Going without thumbnails is not worth it, as they are really important for the many pictures I have, so I'm back to square one trying to figure out why it stalled.

Revision history for this message
Patrick Ulbrich (pulb) wrote :

What is the last "Indexing file XYZ" output if you rescan the media with basenji --debug?
Can you reproduce the problem with a newly created database (File menu -> New Database)?

Revision history for this message
Patrick Ulbrich (pulb) wrote :

Stupid question: do you have enough free hard disk space?

Revision history for this message
Patrick Ulbrich (pulb) wrote :

Please also check if there are any evince or totem thumbnailer processes running and keeping the CPU busy.
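For anyone who prefers a script to the system monitor, here is a rough Python sketch of that check. It assumes a Linux system with a procps-style `ps`, and the process names `evince-thumbnailer` and `totem-video-thumbnailer` are assumptions about what the thumbnailers are called on a given distribution. The helper names are made up for this example.

```python
import subprocess

# Assumed thumbnailer executable names; adjust for your distribution.
THUMBNAILERS = ("evince-thumbnailer", "totem-video-thumbnailer")


def find_thumbnailers(ps_output, names=THUMBNAILERS):
    """Return (pid, cpu_percent, command_line) for matching processes."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # first line is the ps header
        parts = line.split(None, 2)          # pid, %cpu, full command line
        if len(parts) < 3:
            continue
        pid, cpu, args = parts
        if any(name in args for name in names):
            hits.append((int(pid), float(cpu), args))
    return hits


def list_running_thumbnailers():
    """Query ps for all processes and filter for thumbnailers."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_thumbnailers(out)
```

Calling `list_running_thumbnailers()` while the scan appears frozen should show whether a thumbnailer is still alive and how much CPU it is using.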

Revision history for this message
Hanni Hernandez (medusa569) wrote :

OK guys, three answers for all. I have a new 1 TB HD and have used only about 9.7% of it. I reproduced the problem on the last problem disk with both the old database and a new database. The results were the same: freezing on the same file with no error notice from debug. The same task was running both times in the task manager, and that was evince. Below are the file name where the program froze and the task showing in the task manager at the time, in both conditions.

[VolumeDB DBG]: Indexing file '/media/laptemp/BACK/RCA RemoteControl.pdf'

evince-thumbnailer -s 128 file:///media/laptemp/BACK/RC%RemoteControl.pdf/tmp/.gnome_thumbnail.M7JFEX

Revision history for this message
Hanni Hernandez (medusa569) wrote :

BTW, I suppose I should mention that my memory usage was about 610 MB out of 15967 MB, and the CPU usage was about 3% max at the time of the freeze. Swap is 0 and 1630GB cache.

Revision history for this message
Patrick Ulbrich (pulb) wrote :

Thanks for your tests. The mem usage is pretty high but I suppose that's because your media has many files which need to be cached in memory for the symlink lookup.

The actual problem seems to be the evince thumbnailer process which is utilized by Basenji to create the PDF thumbnails. I'm afraid that's something I can't do much about. One workaround would be to remove the problematic file temporarily, another would be to try killing the process. You could also disable thumbnails completely for that specific media, but as you mentioned already that'd be no option for you.
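The "kill the process" workaround can also be automated in principle by running an external thumbnailer under a time limit. The following Python sketch shows the general technique only; it is not how Basenji invokes the thumbnailer, and the function name `run_with_timeout` and the 30-second default are assumptions.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30):
    """Run an external command; return True on success, False on failure or hang."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this,
        # so a hung thumbnailer does not linger.
        return False
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return False
    return result.returncode == 0
```

A caller could pass a command shaped like the one seen in this report (`evince-thumbnailer -s 128 <input URI> <output file>`) and simply skip the thumbnail for any file where the function returns False, letting the scan continue.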

Could you report your observations in the Evince bugtracker? It's available here: https://bugzilla.gnome.org/enter_bug.cgi?product=evince

Revision history for this message
Hanni Hernandez (medusa569) wrote :

I will look into that bug site you gave me. If I find a solution I'll let you know. However, I'd also like to cast a vote for the recommendation of having a selection of file types for which to produce thumbnails. Someone else suggested that, and I have been thinking about it for a long time. Let me again say thank you for your program Basenji! It howls :-)
