similar words could cause redundant results

Bug #354265 reported by Michael Wayne Goodman on 2009-04-03
Affects: glot
Importance: Low
Assigned to: Unassigned

Bug Description

Francis pointed out that similar head words might cause redundant results. For instance, we might have:

きれい
綺麗
奇麗
綺麗な
綺麗だ
etc.

All of those represent the same word "kirei" (pretty / clean). The first three could (and likely will) be listed as alternates in the same dictionary. The 4th and 5th have different grammatical features: the 4th has a connector な, and the 5th has a copula ("is pretty"). These may appear in other dictionaries. With these, a search for "pretty" could return all of them.

Similar effects could appear if head words are case-sensitive, so that "rose" (flower) and "Rose" (proper name) are distinct entries. That distinction may be preferred, but a dictionary might also capitalize all of its words, regardless of whether each is a proper or common noun.
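
To make the case issue concrete, here is a minimal sketch of grouping head words that differ only by case without altering the stored spellings. The function name group_by_casefold is made up for illustration and is not part of glot.

```python
from collections import defaultdict


def group_by_casefold(headwords):
    """Group head words that differ only by case.

    The original spellings are kept as-is; only the grouping key is folded.
    (Hypothetical helper, not an existing glot function.)
    """
    groups = defaultdict(list)
    for word in headwords:
        groups[word.casefold()].append(word)
    return dict(groups)


groups = group_by_casefold(["rose", "Rose", "ROSE", "tulip"])
```

Whether "rose" and "Rose" should end up in one group is exactly the open question above; the sketch only shows that the comparison can be done at search time while the data stays unchanged.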

Rather than trying to sanitize or massage the data on import (or as postprocessing), we should seek to filter the search results. This is because we do not want to change the data in any way, as this would break consistency with the source. Also, with filtering, we could take the Google-search approach and only show the top results, along with a link (or button) that says "show omitted results".
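
A rough sketch of what such a filter might look like follows. Everything in it is an assumption made for illustration: the Result type, the ALTERNATES table (which a real implementation would presumably derive from the alternates listed in each dictionary), and the naive stripping of a trailing な or だ, which would wrongly shorten words that legitimately end in those characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    headword: str    # stored exactly as it appears in the source dictionary
    dictionary: str  # hypothetical identifier for the source dictionary


# Hypothetical alternates table mapping written variants to one comparison key.
ALTERNATES = {"綺麗": "きれい", "奇麗": "きれい"}

# Naive assumption: a trailing connector or copula can simply be stripped.
TRAILING = ("な", "だ")


def canonical_key(headword):
    """Return a key used only for comparison; the stored data is not changed."""
    base = headword
    for suffix in TRAILING:
        if base.endswith(suffix) and len(base) > len(suffix):
            base = base[: -len(suffix)]
            break
    return ALTERNATES.get(base, base).casefold()


def filter_results(results):
    """Split results into those shown and those hidden as near-duplicates."""
    shown, omitted, seen = [], [], set()
    for result in results:
        key = canonical_key(result.headword)
        if key in seen:
            omitted.append(result)
        else:
            seen.add(key)
            shown.append(result)
    return shown, omitted


results = [
    Result("きれい", "dict_a"),
    Result("綺麗", "dict_a"),
    Result("奇麗", "dict_a"),
    Result("綺麗な", "dict_b"),
    Result("綺麗だ", "dict_c"),
    Result("美しい", "dict_a"),
]
shown, omitted = filter_results(results)
```

The omitted list is what a "show omitted results" link would reveal. Because the key is computed at search time, the imported entries stay consistent with their sources.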

Changed in glot:
importance: Undecided → Low