Include the HASH cache to speed future searches

Bug #242699 reported by Kapis
Affects: SubDownloader
Status: Confirmed
Importance: Wishlist
Assigned to: Kapis

Bug Description

Reported by capiscuas, Jun 04, 2008

The hash values of the AVI files would be saved in a cache, which would
speed up future searches for the same AVIs.

This was implemented for the old 1.2.9 version by a fan.
http://forum.opensubtitles.org/viewtopic.php?t=145&postdays=0&postorder=asc&start=150

Revision history for this message
Rolf Leggewie (r0lf) wrote :

Is this really worth the trouble of more code? The way I understand it, several GB worth of video take only a few seconds to hash (the file is not hashed in its entirety).
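
For context, a partial hash of this kind can be sketched as follows. This is only an illustration, assuming a scheme in the style of the OpenSubtitles hash (file size plus a 64-bit checksum of the first and last 64 KiB). The name partial_hash is hypothetical and is not taken from the SubDownloader sources.

```python
import os
import struct

CHUNK = 64 * 1024  # bytes read from each end of the file (assumed value)


def partial_hash(path):
    """Hash only the head and tail of a file, plus its size (hypothetical helper)."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK:
        raise ValueError("file too small for this scheme")
    total = size
    with open(path, "rb") as f:
        for offset in (0, size - CHUNK):
            f.seek(offset)
            data = f.read(CHUNK)
            # Sum the chunk as little-endian 64-bit integers, wrapping at 64 bits.
            for (value,) in struct.iter_unpack("<Q", data):
                total = (total + value) & 0xFFFFFFFFFFFFFFFF
    return "%016x" % total
```

Because only 128 KiB are read per file regardless of its size, the cost is dominated by opening and seeking in the file rather than by its length.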

Changed in subdownloader:
status: New → Incomplete
Revision history for this message
Kapis (capiscuas) wrote :

I agree. Currently SubDownloader hashes many GB of videos very quickly, and coding this would add effort and complexity to the software.

Changed in subdownloader:
importance: Undecided → Wishlist
status: Incomplete → Won't Fix
Revision history for this message
James (owyjxnimlbcm) wrote :

I added this feature to the previous version. I don't get the term "trouble of more code". The code changes I made were sent to the author, so the code already exists.

Hashing ~100GB of movies takes a long time (1.7GHz, 1GB RAM, 7200 ATA disk), and searching in stored .txt files is a lot quicker. Anyway, the user can choose whether to enable or disable this option.

I vote 100% yes for this feature.

Revision history for this message
Kapis (capiscuas) wrote :

Hi James, SD2.0 won't use relative text files to store information; we are using Qt preferences for that now, so we'll have to modify your code to make it fit the new policy.

I agree that this feature may be useful for SD fans.

Can you help us to integrate it?

Thanks.

Revision history for this message
James (owyjxnimlbcm) wrote :

I would like to, but I have reinstalled my computer and don't have Python/Qt installed. Unfortunately I lost the source code too, and http://www.edisk.cz/stahni/44900/Modified_files.zip_13.13KB.html doesn't work anymore.

But the changes were very simple, and maybe you have the source code which I once sent to you. I only had problems with the whole Python/Qt thing (file handling, compiling, final file size, etc.).

I would like to help, but I don't know if (and how much) time I will have to write this. I must finish my diploma work, and I have free time for it only on weekends.

I can try this (I guess I must download some py/qt and the sources?), but an experienced py/qt programmer can implement this much quicker than me.

It is only about one additional variable: if it is true, then before hashing check for filename.hash; if it exists, skip hashing and load the stored string; if it doesn't exist, hash the file and afterwards store the hash in filename.hash.
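
The logic described above can be sketched roughly like this. It is a minimal illustration, not the lost patch: the function name get_hash_cached, the use_cache flag, and the choice to name the sidecar file by appending ".hash" to the video path are all assumptions, and compute_hash stands in for whatever hashing function the program already has.

```python
import os


def get_hash_cached(path, compute_hash, use_cache=True):
    """Return the hash for `path`, reusing a sidecar .hash file when allowed.

    `compute_hash` is any callable taking a path and returning a hash string.
    """
    sidecar = path + ".hash"  # assumed naming for "filename.hash"
    if use_cache and os.path.exists(sidecar):
        # Cached value found: skip hashing and load the stored string.
        with open(sidecar, "r") as f:
            return f.read().strip()
    value = compute_hash(path)
    if use_cache:
        # No cached value yet: store the freshly computed hash for next time.
        with open(sidecar, "w") as f:
            f.write(value)
    return value
```

With use_cache set to False the function behaves exactly like the plain hashing call, which matches the idea of a preference the user can switch on or off.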

Revision history for this message
James (owyjxnimlbcm) wrote :

Does anyone have my modified getHash function?

Revision history for this message
James (owyjxnimlbcm) wrote :

OK, I installed py/qt and the Eric4 IDE and typed some stuff.
Modified files are:
gui\preferences.py
gui\preferences_ui.py
videofile.py

All my changes are commented with the word James. For me, saved hashes work.

http://img232.imageshack.us/img232/9342/snap1dk5.jpg

Kapis (capiscuas)
Changed in subdownloader:
assignee: nobody → capiscuas
importance: Wishlist → Low
status: Won't Fix → In Progress
Revision history for this message
James (owyjxnimlbcm) wrote :

But I can't compile the sources :( windows_installer.py doesn't work for me:

line 34 error:
ImportError: No module named subdownloader..

Any help? :)

Revision history for this message
opensubtitles (j-admin-opensubtitles-org) wrote :

I like the idea of storing hashes. Some users use SD with network drives, where they have 500+ GB of movies, so if it is offline, it is a lot faster. Also, I am not sure how the hashes thing is implemented: is it written in each movie dir, or is it in one file stored where SubDownloader is?

Revision history for this message
eduo (eduo) wrote :

James: Latest code revision should fix all the troubles with "module named subdownloader" (path issues in the way modules were being called).

I have all my movies in network drives. Scanning 500GB of movies, in dozens of directories, for a thousand or so files (TV episodes) takes around 1 minute. That's BLAZINGLY fast, all things considered. And that's, also, a fringe use that is not common (nor should it be).

There are some problems with caching hashes, which are not easily solvable from a User Interface point of view:

-How is the cache built? How can you know what's in the cache or not? Can you add to the cache? Delete from the cache?

-How does the program deal with changed videofiles? Checking the cache against the existing file defeats the purpose of the cache but if you download a better version of a movie the cache is not valid any more.

-How does the program handle moved/renamed movie files? How does it handle if the network drive is mounted slightly different?

-What happens if subtitles are found against the cached hashes, but the network drive is not mounted and the preference is there for downloading to the same path as the movie?

I like the idea of caches, I've always liked caches. But I think it's not simple or straightforward. Especially from a User Interface point of view (which is what I'm referring to in all cases, I already know how to implement it technically, that's not the issue).

Revision history for this message
Kapis (capiscuas) wrote :

Hi James, I think we need to improve the way of handling the hashes a bit. As eduo mentions, the full file path of the video cannot be used as the index for our hashes because it's common for users to move their files.

Also, having one .hash file per video file is not very clean, I think. I'm thinking of it this way:

I propose using Qt Settings for faster storage of the hashes;
the index can be filesize+filename, and the stored information will be the hash.

It is rare for a filename to be changed, so this double key can assure us that we are talking about the same video file.

A second possibility is to have just one hash file, where each line holds our filesize+filename index followed by the hash value. I think this will be a slower procedure because it requires reading all the lines one by one.
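
As a rough sketch of the second possibility, the single hash file could be read once into a dictionary keyed by filesize+filename, so lookups do not rescan the file line by line. Everything here is hypothetical: the names cache_key and HashCache, the "size|name" key layout, and the tab-separated line format are assumptions made for illustration only.

```python
import os


def cache_key(path):
    """Build the filesize+filename index; the directory is deliberately ignored."""
    return "%d|%s" % (os.path.getsize(path), os.path.basename(path))


class HashCache:
    """One text file, one 'key<TAB>hash' entry per line (assumed format)."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        self.entries = {}
        if os.path.exists(cache_file):
            with open(cache_file, "r", encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if "\t" in line:
                        key, value = line.rsplit("\t", 1)
                        self.entries[key] = value

    def get(self, path):
        # Returns None when the video has no cached hash.
        return self.entries.get(cache_key(path))

    def put(self, path, value):
        self.entries[cache_key(path)] = value

    def save(self):
        with open(self.cache_file, "w", encoding="utf-8") as f:
            for key, value in self.entries.items():
                f.write("%s\t%s\n" % (key, value))
```

Since the key leaves out the directory, a video that is moved to another folder still hits the cache, while a renamed or replaced file of a different size does not.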

Revision history for this message
eduo (eduo) wrote :

I don't believe using the settings is a good idea; settings should be just that and should not change once set.

If I wanted this functionality (I don't), I'd vote for a separate database (sqlite) for this.

Also, it would be a good idea to consider an expiration date for cached hashes. If more than, say, a month has passed since the hash was captured into the cache, then try to refresh it. If it can't be refreshed, then it is deleted.
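
A separate sqlite cache with expiry could look something like the sketch below. The table layout, the function names (open_cache, store, lookup), and the 30-day MAX_AGE are assumptions for illustration; refresh is a caller-supplied function that returns a new hash or None when the file cannot be reached.

```python
import sqlite3
import time

MAX_AGE = 30 * 24 * 3600  # "say, a month", expressed in seconds (assumed value)


def open_cache(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes ("
        "key TEXT PRIMARY KEY, hash TEXT NOT NULL, stored_at REAL NOT NULL)"
    )
    return conn


def store(conn, key, value, now=None):
    now = time.time() if now is None else now
    conn.execute("INSERT OR REPLACE INTO hashes VALUES (?, ?, ?)", (key, value, now))
    conn.commit()


def lookup(conn, key, refresh, now=None):
    """Return the cached hash; refresh expired entries, delete them if that fails."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT hash, stored_at FROM hashes WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    value, stored_at = row
    if now - stored_at <= MAX_AGE:
        return value
    fresh = refresh(key)  # e.g. rehash the video if it is reachable
    if fresh is None:
        conn.execute("DELETE FROM hashes WHERE key = ?", (key,))
        conn.commit()
        return None
    store(conn, key, fresh, now)
    return fresh
```

Passing now explicitly is only a convenience for trying the expiry behaviour without waiting a month.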

I think a note needs to be made. We're talking about two different things here:

1.-Cached hashes: This is storing the hash (the time-consuming part of the process) so it doesn't need to be generated every time for existing videofiles. This caching assumes the videofile is available but speeds up the process.

2.-Offline access: This is not caching AT ALL. This means keeping a separate database with the necessary information to check for subtitles without having the video files available at the moment. An example would be storing the videofiles in a "multimedia portable drive". You may have it connected to the TV most of the time, but you want to check if subtitles have become available for the files in it.

The processes for both may be similar, but they are two very distinct situations that need to be dealt with separately. The first focuses on overall speed, the second on availability.

Kapis (capiscuas)
Changed in subdownloader:
importance: Low → Wishlist
status: In Progress → Confirmed