Did you mean: diacritics cause erroneous search suggestions, resulting in no hits
Bug #1931625 reported by Michele Morgan
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | New | Undecided | Unassigned |
Bug Description
Search suggestions are not derived properly when there are diacritics.
In a concerto database, records exist with the following terms with diacritics:
Bartók, Béla
Dohnányi, Ernst
Konzertstück
Élegie
In the marcxml, the entries for these terms are:
Bartók, Béla,
Dohnányi, Ernst
Konzertstück
Élegie
Searching with the following keyword search terms produces the following suggestions:
Search term - Suggestion
bartock - bart
dohnini - dohn
konzertstock - konzertst
alegie - legie
Searching on any of these suggestions returns no hits.
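One pattern worth noting: each suggestion matches the longest unaccented run of letters in the corresponding term. The sketch below only illustrates that observation. It is not Evergreen's suggestion code, and the function name `longest_ascii_run` is made up for this example.

```python
import re
import unicodedata


def longest_ascii_run(term: str) -> str:
    """Return the longest run of plain ASCII letters in a term, lowercased.

    Hypothetical helper for illustration only; it does not reproduce how
    Evergreen builds its suggestion dictionary.
    """
    # Normalize to NFC so each accented letter is one precomposed character.
    composed = unicodedata.normalize("NFC", term)
    # Accented letters fall outside [A-Za-z], so they act as split points.
    runs = re.findall(r"[A-Za-z]+", composed)
    return max(runs, key=len).lower() if runs else ""


for term in ["Bartók", "Dohnányi", "Konzertstück", "Élegie"]:
    print(term, "->", longest_ascii_run(term))
# Bartók -> bart
# Dohnányi -> dohn
# Konzertstück -> konzertst
# Élegie -> legie
```

The output lines up with the four suggestions reported above, which would be consistent with the indexed words being broken at the accented character.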
tags: added: didyoumean
At first blush, that looks like broken marcxml content in concerto. Those aren't UTF-8 characters, but XML-entity-encoded Latin-1 code page values. We should only be storing actual UTF-8 in the database (with the exception of the Famous Five, the characters &, <, >, ", and ', that need to be escaped in XML).
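As a rough illustration of the cleanup described here, the following Python sketch rewrites numeric character references as the actual characters while leaving the five XML-special characters escaped. The function name and the sample inputs are assumptions for this example, since the exact entity forms stored in concerto are not shown in this report. It also operates on a plain string rather than through a MARC or XML parser, which a real fix would more likely use.

```python
import re

# The five characters that must stay escaped in XML text.
XML_PREDEFINED = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}

# Matches decimal (&#233;) and hexadecimal (&#xE9;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(marcxml: str) -> str:
    """Replace numeric character references with the characters they name.

    Hypothetical helper: references that resolve to one of the five
    XML-special characters are rewritten as the predefined named entity
    so the result is still safe to embed in XML.
    """
    def _replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        char = chr(code_point)
        return XML_PREDEFINED.get(char, char)

    return _NUMERIC_REF.sub(_replace, marcxml)


# Assumed sample input; the real stored form may differ.
print(decode_numeric_refs("Bart&#xF3;k, B&#233;la"))
# Bartók, Béla
```

Named entities such as `&amp;` are left untouched because the pattern only matches references that start with `&#`.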