Search highlighting not working for words with diacritics

Bug #1762363 reported by Linda Jansova
This bug affects 7 people
Affects: Evergreen
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

When words such as "veda" (ASCII-only) are entered into the TPAC search box, Evergreen 3.1.0 correctly highlights the search term in the search results.

However, when words such as "sborník" (containing a non-ASCII character "í") are entered, the word fails to appear highlighted in search results.

Example:
https://eg-test.osvobozena-knihovna.cz/eg/opac/results?query=sborn%C3%ADk&qtype=keyword&fi%3Asearch_format=&locg=1&detail_record_view=0

Linda Jansova (skolkova-s) wrote :
Eva Cerninakova (ece) wrote :

I have tested this in EG 3.1.1 and in EG master. In both cases the problem still exists.
Words containing non-ASCII characters are not highlighted in search results.

If the search phrase contains both a word without non-ASCII characters and a word with non-ASCII characters, only the word without non-ASCII characters is highlighted in the search results (see the image).

Kathy Lussier (klussier)
tags: added: displayfields highlighting
Mike Rylander (mrylander) wrote :

This is a side effect of long-standing (and, until highlighting, user-invisible) search code.
Specifically, it's because we force the use of search_normalize() against both indexed and search input terms, which strips diacritics. This is addressable, but unfortunately fairly non-trivial.
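
To illustrate the mechanism described above, here is a minimal Python sketch. It is not Evergreen code: strip_diacritics() is only a hypothetical stand-in for what search_normalize() is described as doing (stripping diacritics), and both highlight functions are invented for illustration. The sketch assumes single-word queries and simple <b> markup. It shows why a term that has been normalized can fail to match the un-normalized display text, and one possible way to compare on normalized forms while wrapping the original text.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical stand-in for a search normalizer: lowercase, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def highlight_normalized_term(display: str, query: str) -> str:
    """Look for the normalized query literally in the raw display text.

    A query word with diacritics is normalized to its bare form, which does
    not occur in display text that still carries the diacritics.
    """
    term = strip_diacritics(query)
    return re.sub(
        re.escape(term),
        lambda m: f"<b>{m.group(0)}</b>",
        display,
        flags=re.IGNORECASE,
    )


def highlight_diacritic_aware(display: str, query: str) -> str:
    """Compare word by word on normalized forms, but wrap the original word."""
    term = strip_diacritics(query)
    # Compose characters first so that \w+ keeps accented letters inside one word.
    display = unicodedata.normalize("NFC", display)

    def wrap(match: re.Match) -> str:
        word = match.group(0)
        return f"<b>{word}</b>" if strip_diacritics(word) == term else word

    return re.sub(r"\w+", wrap, display)


print(highlight_normalized_term("Veda a technika", "veda"))
print(highlight_normalized_term("Sborník prací", "sborník"))
print(highlight_diacritic_aware("Sborník prací", "sborník"))
```

In this sketch the first call highlights "Veda", the second leaves "Sborník prací" untouched, and the third highlights "Sborník". How the real fix should be structured inside Evergreen is a separate question that this sketch does not address.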

Changed in evergreen:
status: New → Confirmed
Michele Morgan (mmorgan) wrote :

This is a problem for catalogs in all languages. Words and names with diacritics are not uncommon in any catalog. Here are a few examples of words and names containing diacritics from our English-language catalog:

Authors, artists, etc.:
Charlotte Brontë
Carlos Castañeda
Federico García Lorca
François Rabelais
Mary GrandPré
Paul Cézanne
Camille Saint-Saëns

Names and loan words from French and other languages:
Esmé
Renée
exposé
fin de siècle
Les Misérables

I'm attaching screenshots of searches of some of the above terms to illustrate the issue.

Michele Morgan (mmorgan) wrote :

Catalog search for Charlotte Brontë

Michele Morgan (mmorgan) wrote :

Catalog search for Paul Cézanne

Michele Morgan (mmorgan) wrote :

Catalog search for exposé

Michele Morgan (mmorgan) wrote :

Catalog search for Les Misérables

tags: added: opac
removed: tpac
Eva Cerninakova (ece) wrote :

This is still an issue in the Bootstrap OPAC; see the attachment.

Linda Jansova (skolkova-s) wrote :

This is still the case in the most current Evergreen release; confirmed using a Mobius test server available at https://bugsquash2.mobiusconsortium.org/eg/opac/home. Please see the attached screenshot.

Christine Morgan (cmorgan-z) wrote :

Confirming that this is still an issue. Testing on https://terran-main.gapines.org/ during Bug Squashing Week.

Linda Jansova (skolkova-s) wrote :