similar words could cause redundant results
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
glot | New | Low | Unassigned |
Bug Description
Thanks to a tip from Francis, we might see similar head words causing redundant results. For instance, we might have:
きれい
綺麗
奇麗
綺麗な
綺麗だ
etc.
All of these represent the same word, "kirei" (pretty / clean). The first three could (and likely will) be listed as alternates in the same dictionary. The fourth and fifth carry extra grammatical features: 綺麗な has the connector な, and 綺麗だ has the copula だ ("is pretty"). These may appear in other dictionaries. With these, a search for "pretty" could return all of them.
Similar effects could arise if we treat words as case-sensitive, so that "rose" (flower) and "Rose" (proper name) are distinct. That may be preferred, but a dictionary might also capitalize all of its words, regardless of whether they are proper or common nouns.
Rather than trying to sanitize or massage the data on import (or as postprocessing), we should seek to filter the search results. This is because we do not want to change the data in any way, as this would break consistency with the source. Also, with filtering, we could take the Google-search approach and only show the top results, along with a link (or button) that says "show omitted results".
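To make the filtering idea concrete, here is a minimal sketch. It is not glot's actual code: the `Entry` record, its `reading` field, the `SUFFIXES` tuple, and the `filter_results` function are all hypothetical names introduced for illustration. It assumes each search hit carries a kana reading (or the headword itself for non-Japanese entries). Stored entries are never modified; hits are only split into a shown list and an omitted list.

```python
from dataclasses import dataclass


# Hypothetical record shape; glot's real schema may differ.
@dataclass(frozen=True)
class Entry:
    headword: str  # form exactly as printed in the source dictionary
    reading: str   # kana reading, or the headword itself for other languages


# Assumed trailing markers: な (connector) and だ (copula).
SUFFIXES = ("な", "だ")


def filter_results(entries, fold_case=False):
    """Split entries into (shown, omitted) without modifying any entry."""

    def norm(text):
        # Case folding is optional because "rose" vs "Rose" may be meaningful.
        return text.casefold() if fold_case else text

    readings = {norm(e.reading) for e in entries}

    def key(entry):
        reading = norm(entry.reading)
        for suffix in SUFFIXES:
            stem = reading[: -len(suffix)]
            # Strip a suffix only when the bare stem is itself among the
            # results, so unrelated words that merely end in な are kept.
            if reading.endswith(suffix) and stem in readings:
                return stem
        return reading

    shown, omitted, seen = [], [], set()
    for entry in entries:
        k = key(entry)
        if k in seen:
            omitted.append(entry)  # candidates for "show omitted results"
        else:
            seen.add(k)
            shown.append(entry)
    return shown, omitted
```

With the five "kirei" forms above as input, only the first hit would be shown and the other four would land in the omitted list, which a "show omitted results" link could then reveal. Grouping by reading alone would also merge true homophones, so a real filter would probably need to compare glosses or parts of speech as well.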
Changed in glot:
importance: Undecided → Low