Comment 22 for bug 919424

eMTee (realprogger) wrote : Re: /rebuild does not update HashData

It's been a long time, but it looks like I've finally been able to figure out this problem.

Regarding the original report, the part of the complaint about hashdata.dat is invalid. The file is rebuilt correctly, but the initial size of the binary file is 1 MiB, so it will never shrink below that. With a hashdata file larger than that, rebuilding works as expected.

The reason why rebuilding is more effective after a restart has been explained very well by maksis, and restarting is still the best practice: when shared files are removed, their hash information is not removed from the internal memory map containing the hash indexes. At restart the memory map is freshly synced with what is found in the filesystem, so checking which items are in use and which are obsolete is much more effective at that point.

Therefore, if you remove files from the share, rehash and then do a rebuild in the same session, the rebuild will not make your hashindex or hashdata file any slimmer, unless there is obsolete data left over from items removed in previous sessions.
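To make the session effect concrete, here is a minimal sketch, not the actual DC++ code. All names in it (HashIndex, rebuildIndex, the sample paths and TTH strings) are made up for illustration. It assumes the rebuild keeps every entry that the in-memory view still considers live, and that this view only forgets removed files after a restart.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the hash index: file path -> TTH root.
using HashIndex = std::map<std::string, std::string>;

// Hypothetical rebuild: keep only the entries whose path is in the
// "live" view the client currently holds in memory.
HashIndex rebuildIndex(const HashIndex& index,
                       const std::set<std::string>& liveView) {
    HashIndex kept;
    for (const auto& entry : index) {
        if (liveView.count(entry.first) != 0) {
            kept.insert(entry);
        }
    }
    return kept;
}

// Sample data: three hashed files, one of which ("/share/c.bin")
// is removed from the share during the session.
HashIndex sampleIndex() {
    return {{"/share/a.bin", "TTH-A"},
            {"/share/b.bin", "TTH-B"},
            {"/share/c.bin", "TTH-C"}};
}

// Assumption: within the session the in-memory view is never pruned,
// so the removed file is still listed.
std::set<std::string> viewInSameSession() {
    return {"/share/a.bin", "/share/b.bin", "/share/c.bin"};
}

// Assumption: after a restart the view is rebuilt from the filesystem,
// so the removed file is gone.
std::set<std::string> viewAfterRestart() {
    return {"/share/a.bin", "/share/b.bin"};
}
```

With this model, rebuildIndex(sampleIndex(), viewInSameSession()) keeps all three entries, while rebuildIndex(sampleIndex(), viewAfterRestart()) drops the one for the removed file.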

Regarding what is removed from hashindex and what isn't, david.son is right, and so is the logic of the solution he recommended. Currently only items with an unshared TTH are removed.
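A rough illustration of what "only items with an unshared TTH" means in practice, again as a sketch and not the real implementation. The names pruneByTth and pruneByPath are hypothetical, and the path-based variant is only my assumption of what a stricter check could look like; it may not match david.son's proposal exactly.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical index layout: file path -> TTH root.
using PathToTth = std::map<std::string, std::string>;

// The rule as described: an entry survives as long as its TTH is
// still shared by any file.
PathToTth pruneByTth(const PathToTth& index,
                     const std::set<std::string>& sharedTths) {
    PathToTth kept;
    for (const auto& entry : index) {
        if (sharedTths.count(entry.second) != 0) {
            kept.insert(entry);
        }
    }
    return kept;
}

// Assumed stricter variant: an entry survives only if that exact
// path is still shared.
PathToTth pruneByPath(const PathToTth& index,
                      const std::set<std::string>& sharedPaths) {
    PathToTth kept;
    for (const auto& entry : index) {
        if (sharedPaths.count(entry.first) != 0) {
            kept.insert(entry);
        }
    }
    return kept;
}

// Sample data: two copies of the same content plus one unrelated file.
PathToTth sampleEntries() {
    return {{"/a/film.mkv", "TTH-1"},
            {"/b/film-copy.mkv", "TTH-1"},
            {"/c/old.iso", "TTH-2"}};
}
```

If only /a/film.mkv is still shared, pruneByTth(sampleEntries(), {"TTH-1"}) keeps both entries carrying TTH-1, including the one for the unshared copy, while pruneByPath(sampleEntries(), {"/a/film.mkv"}) keeps just the one entry.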

The puzzling thing is that the code responsible for this is pretty logical and would easily allow doing what david.son expects.

The first version of the rebuild code that actually does something with hashindex (and not just with hashdata) was added in https://bazaar.launchpad.net/~dcplusplus-team/dcplusplus/trunk/revision/545
It has been refactored and simplified several times since, but the feature we are missing here was never added, even though it would have been a very small and logically fitting change...

Why? I'm not sure. Maybe it was a simple oversight, or rather, back in the day, the exceptionally talented people who created and shaped DC++ thought it better to have a somewhat larger index file, with items kept for possible reuse, than to risk re-hashing. This might have been pretty logical 20 years ago, considering the CPU and storage speeds and capacities of consumer computers at the time. An average share consisted of a few thousand files back then...

I'll test the fix on larger shares and will most probably add it to the next version of DC++. Testers are welcome, if there's still anyone who cares...