Return more relevant search results

Bug #1265889 reported by maksis
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
AirDC++   Fix Released   Medium       Unassigned
DC++      Confirmed      Medium       Unassigned

Bug Description

Currently the client returns results for the items that are first found. It could prefer returning results from different folders and select the returned results in a better way (compare the match positions and the number of matched words, possibly the modify date and download counts and so on).

Night (night.)
Changed in airdcpp:
importance: Undecided → Medium
status: New → Confirmed
Night (night.) wrote :

Even just preferring results that actually match the search term, instead of whatever lies under the folder that matched, would be a nice improvement for starters.

maksis (maksis) wrote :

It seems quite clear that the client should move on to other locations when the list of search terms left to match becomes empty after matching a directory name. But what should happen if there aren't enough matching items to reach the maximum result count? Should it then enter those fully matched directories and return items from them?

In a way this is also a response to http://forum.dcbase.org/viewtopic.php?f=55&t=695

Google doesn't support wildcards/regexp and it's still quite widely used, so making things harder for the users doesn't feel like the preferred alternative to go with (given the issues with the current implementation).

The worst search case I can think of is when the user wants to find items whose title consists of plain number(s) or some really common word(s) that match lots of shared items. Preferring matches where the search terms are found at the beginning of the item name should help with this. The client could check that the order of the matched words is the same and that they are matched right after each other (words can be repeated in the name though). Word separators among the shared items may differ, so including those directly in the search string may not be enough.
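To make the idea above concrete, here is a minimal sketch of such a scoring function. It is only an illustration, not the code of either client: the names `tokenize` and `relevance` and the point values are hypothetical, and it simplifies by using the first occurrence of each word, so repeated words in a name are not handled.

```python
import re

def tokenize(name):
    """Split a name into lowercase words on any non-alphanumeric separator."""
    return [w for w in re.split(r"[\W_]+", name.lower()) if w]

def relevance(name, terms):
    """Hypothetical score from 0 to 5; higher means more relevant."""
    words = tokenize(name)
    terms = [t.lower() for t in terms]
    if not words or not terms:
        return 0
    # First position of each term in the name, or None when it is missing
    positions = [words.index(t) if t in words else None for t in terms]
    if None in positions:
        return 0
    score = 2  # base: every term matched a whole word
    if positions[0] == 0:
        score += 1  # the match starts at the beginning of the name
    if positions == sorted(positions):
        score += 1  # terms appear in the same order as in the query
    if all(b - a == 1 for a, b in zip(positions, positions[1:])):
        score += 1  # terms are matched right after each other
    return score
```

Because the name is split on any separator, "The.Matrix.1999" and "The Matrix 1999" score the same for the terms "the" and "matrix", while "Matrix_The_1999" scores lower because the order differs.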

poy (poy)
Changed in dcplusplus:
importance: Undecided → Medium
status: New → Confirmed
maksis (maksis) wrote :

https://github.com/airdcpp/airgit/commit/c395d994b87f55c35a4584551d65a2c34df574a2

My implementation will also ignore partial matches from the parent directory names if they are all less than 3 characters long (I can't think of cases where such matches would be wanted). It won't go deeper when all words are found in the directory name, unless the "file" type is used for searching (or extensions are set). There may be changes in that regard though....
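The traversal rules described above could look roughly like the following sketch. This is one reading of the description, not the code from the linked commit: the `search` function, the nested-dict share layout (a dict is a directory, `None` marks a file) and the `files_only` flag are all assumptions made for illustration, and terms are matched as plain substrings.

```python
def search(tree, terms, files_only=False, path=""):
    """Return the paths of matching items in a nested-dict share."""
    terms = [t.lower() for t in terms]
    results = []
    for name, child in tree.items():
        full = f"{path}/{name}"
        lowered = name.lower()
        matched = [t for t in terms if t in lowered]
        remaining = [t for t in terms if t not in lowered]
        if child is None:  # file: a hit only when nothing is left to match
            if not remaining:
                results.append(full)
            continue
        if not remaining:  # directory name matched every remaining word
            if files_only:
                # "file" type search: enter the directory and return its files
                results += search(child, [], True, full)
            else:
                results.append(full)  # return the directory, don't go deeper
            continue
        # Partial match: drop it when all matched words are under 3 characters
        if matched and all(len(t) < 3 for t in matched):
            remaining = terms
        results += search(child, remaining, files_only, full)
    return results
```

For example, with a directory named "s1" and the terms "s1" and "episode", the short partial match is ignored, so a file "episode.mkv" inside it is not returned unless its own name contains both words.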

Statistics: http://pastebin.com/yvrJ3n09

Changed in airdcpp:
status: Confirmed → Fix Committed
maksis (maksis) wrote :

Changing only the search responses wasn't a complete solution, as clients will still receive less relevant replies if there is nothing better available :p I also made it order the displayed search replies based on a combination of the word matching scores and the number of hits. This also makes it much easier to test the scoring system. The current version will also check word boundaries when calculating the relevance scores.
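As a rough illustration of the ordering described above, the sketch below scores whole-word matches higher than matches inside a longer word, then sorts replies by that score with the hit count as a tie-breaker. The function names, the point values and the way the two numbers are combined are assumptions; the actual formula is not stated in this report.

```python
import re

def boundary_score(name, term):
    """2 if term appears as a whole word, 1 if only inside a longer word, else 0."""
    name, term = name.lower(), term.lower()
    if term not in name:
        return 0
    # Lookarounds instead of \b so that "_" also counts as a word separator
    pattern = r"(?<![^\W_])" + re.escape(term) + r"(?![^\W_])"
    return 2 if re.search(pattern, name) else 1

def rank(replies, terms):
    """Order (name, hits) replies by word score first, then by hit count."""
    def key(reply):
        name, hits = reply
        return (sum(boundary_score(name, t) for t in terms), hits)
    return sorted(replies, key=key, reverse=True)
```

With the term "matrix", a reply named "Matrixes.doc" ends up below "The_Matrix" even if it has far more hits, because the word only matches inside a longer word.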

Changed in airdcpp:
status: Fix Committed → Fix Released
John Olsen (jroart-c)
Changed in dcplusplus:
status: Confirmed → Fix Released
eMTee (realprogger)
Changed in dcplusplus:
status: Fix Released → Confirmed