Comment 5 for bug 1350831

Srey Seng (sreyseng) wrote : Re: Browse index punctuation causing multiple entries

I am not sure where to make the index normalization changes that the comments in the back-end imply.

However, I was able to fold "duplicate" entries into one by changing the re-ingest logic. At the point where it decides whether to insert a new browse entry, it now compares only on sort_value from the browse_entry table, instead of on both value and sort_value.

With the original comparison, the insertion criterion uses both the actual value and the sort_value. Two headings can share the same sort_value (the normalized form) while their values still differ. Each one then triggers a new insertion into the browse_entry table, so near-identical entries appear side by side in browse results.
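A minimal Python model of the two insertion criteria may make the difference concrete. This is not Evergreen's ingest code: `sort_value`, `ingest`, and the two matching functions are hypothetical, and the normalizer (lowercase, strip punctuation, collapse whitespace) is a simplified stand-in for the real index normalization.

```python
import string

def sort_value(value: str) -> str:
    """Hypothetical normalizer: drop punctuation, lowercase, collapse spaces."""
    stripped = value.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())

def match_on_value_and_sort_value(entry):
    # Original criterion: an entry matches only if both fields are equal.
    return entry

def match_on_sort_value(entry):
    # Workaround criterion: an entry matches if the normalized form is equal.
    return entry[1]

def ingest(values, key):
    """Insert a (value, sort_value) entry only when no existing entry has
    the same key; return the entries in insertion order."""
    entries, seen = [], set()
    for v in values:
        entry = (v, sort_value(v))
        k = key(entry)
        if k not in seen:
            seen.add(k)
            entries.append(entry)
    return entries

headings = ["Twain, Mark", "Twain, Mark.", "Twain, Mark,"]
print(len(ingest(headings, match_on_value_and_sort_value)))  # 3 entries
print(len(ingest(headings, match_on_sort_value)))            # 1 entry
```

All three headings normalize to `twain mark`, so matching on sort_value alone yields a single browse entry, while matching on both fields yields three.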

With this workaround, however, entries that share the same sort_value (normalized form) are treated as the same entry and do not trigger a new insertion into the browse table. One potential downside: if, for example, you have three similar entries that differ only in punctuation, whichever is ingested first is the one displayed in browse results, because the rest are folded into it.
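The order dependence can be sketched the same way. Again, `sort_value` and `fold_on_sort_value` are hypothetical helpers, not Evergreen functions; the sketch only shows that when entries are keyed on sort_value, the first value seen for each key becomes the displayed one.

```python
import string

def sort_value(value: str) -> str:
    """Hypothetical normalizer: drop punctuation, lowercase, collapse spaces."""
    stripped = value.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())

def fold_on_sort_value(values):
    """Map each sort_value to the display value of the first entry ingested
    with it; later entries with the same sort_value are folded in."""
    display = {}
    for v in values:
        display.setdefault(sort_value(v), v)
    return display

print(fold_on_sort_value(["Twain, Mark.", "Twain, Mark", "Twain, Mark,"]))
# {'twain mark': 'Twain, Mark.'}
print(fold_on_sort_value(["Twain, Mark,", "Twain, Mark.", "Twain, Mark"]))
# {'twain mark': 'Twain, Mark,'}
```

The same three headings produce a different displayed form depending only on ingest order.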

This workaround requires, at minimum, a re-ingest of the browse entries (and possibly a complete wipe of the existing browse entries before that re-ingest).