Comment 0 for bug 2009492

eMTee (realprogger) wrote : Certain types of changes in the share do not trigger a Bloom filter update, which makes the changed files temporarily unsearchable

<eMTee> So the reason you get no result for a changed file (same path, different content) in the share after re-hashing is that the hub requests a new Bloom filter only if the number of shared files has changed in the INF coming from the client. In common cases, like sharing an updated binary or editing a text file and reindexing, this would not happen at all.
<eMTee> The Bloom request is triggered only by SF (number of shared files) in the INF, not by SS (share size). See https://sourceforge.net/p/adchpp/code/ci/default/tree/plugins/Bloom/src/BloomManager.cpp#L98
<eMTee> And even with SS added to the check there, we're still not completely out of the woods: if the share change is a same-path, same-size, different-content change, it still fails. A minor edit of a text file or a change to fixed-size metadata, e.g. an MP3 ID3v1 tag, results in exactly this scenario.
<eMTee> You could even change your entire share in this special way, and as long as the sizes and the number of files stay the same, you won't provide hits at all until you make some other kind of share change or reconnect to the hub.
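To make the gap concrete, here is a minimal sketch of the two trigger conditions discussed above. It is not the actual BloomManager code: `InfUpdate`, `requestsBloomOnSfOnly` and `requestsBloomOnSfOrSs` are hypothetical names, and the model assumes an INF update carries only the fields whose values changed.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the share-related fields in an INF update.
// A field is empty when the INF did not carry it, i.e. the value is unchanged.
struct InfUpdate {
    std::optional<std::uint64_t> sharedFiles; // SF
    std::optional<std::uint64_t> shareSize;   // SS
};

// Behaviour described in the report: only an SF change triggers a request.
bool requestsBloomOnSfOnly(const InfUpdate& inf) {
    return inf.sharedFiles.has_value();
}

// Suggested minimum: an SF or an SS change triggers a request.
bool requestsBloomOnSfOrSs(const InfUpdate& inf) {
    return inf.sharedFiles.has_value() || inf.shareSize.has_value();
}
```

An update that carries only SS (a file replaced by one of a different size) is caught by the second check but not the first. A same-size, same-count change produces an update with neither field, so both checks miss it.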

[2023-02-28 09:03] <eMTee> So the Bloom request is based on an inadequate signal that doesn't cover all cases.
[2023-02-28 09:07] <eMTee> SS should also be hooked in at the very least, but a perfect solution would be something that signals a share change in general: the number of re-hashes in the current client session, or the last rehash timestamp. These signals would be adequate for requesting a new Bloom filter in all cases where one is needed.
[2023-02-28 09:11] <eMTee> Of course the client could force sending an INF SF after every rehash if it supports BLOM, but that's pretty ugly to implement in DC++ and, more importantly, it is against the protocol: you send INFs only if some values change, and in the special cases we're investigating this would mean sending multiple INF SFs with the same value.
[2023-02-28 09:13] <eMTee> "Each time this is received, it means that the fields specified have been added or updated." in https://adc.sourceforge.io/ADC.html#_inf
[2023-02-28 09:17] <eMTee> If an extension is allowed to specify new INF fields, then a last rehash timestamp field would probably be the cleanest solution for this, both protocol-wise and implementation-wise...
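As a rough illustration of the timestamp idea, the hub side could look something like the sketch below. No such INF field exists in ADC today; `UserBloomState`, `shouldRequestBloom` and the notion of an announced rehash timestamp are all assumptions made for this example.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical per-user state kept by the hub.
struct UserBloomState {
    // Last rehash timestamp announced by the client, if any (assumed field).
    std::optional<std::int64_t> lastSeenRehash;
};

// Returns true when the announced timestamp differs from the stored one,
// meaning the hub should request a fresh Bloom filter. Updates the state.
bool shouldRequestBloom(UserBloomState& state, std::int64_t announcedRehash) {
    if (state.lastSeenRehash == announcedRehash) {
        return false; // same rehash as before, the filter is still current
    }
    state.lastSeenRehash = announcedRehash;
    return true;
}
```

Because the timestamp changes on every rehash, the value sent in the INF is always a genuine update, so this would stay within the "fields have been added or updated" rule quoted above.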

Within the currently defined standards, another possibility is some client-side trickery: an ugly hack that slightly fakes SF or SS (e.g. by incrementing one of them by 1) in each of these special share change cases, so that a Bloom update is triggered.
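A minimal sketch of that hack, assuming the client remembers the SF value it last announced. `fileCountAfterRehash` is a hypothetical helper, not existing DC++ code, and it is meant to be called only after a rehash that actually changed the share. It nudges SF rather than SS because SF is the only field the hub currently checks.

```cpp
#include <cstdint>

// Hypothetical helper: picks the SF value to announce after a rehash that
// changed the share. If the real count equals the value announced last time,
// it is bumped by 1 so the INF carries a changed SF and the hub requests a
// new Bloom filter. Otherwise the real count already differs and is used.
std::uint64_t fileCountAfterRehash(std::uint64_t realFiles,
                                   std::uint64_t lastAnnouncedFiles) {
    if (realFiles == lastAnnouncedFiles) {
        return realFiles + 1; // deliberately off by one
    }
    return realFiles;
}
```

With a real count of 100, repeated same-count changes make the announced value alternate between 101 and 100, so each one still differs from the previous announcement. The cost is that the hub sees a file count that is off by one half of the time.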