Acquisition entries starting with uppercase letter of extended ASCII character won't be found using lowercase letter

Bug #1968196 reported by Eva Cerninakova
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Evergreen
Confirmed
Undecided
Unassigned

Bug Description

In Angular acquisition search interface, when Title of work, Author or any other search entry starts with extended ASCII character (i.e. the first entry character is written by uppercase letter), the entry won't be found if the search query contains only lowercase letters.
E.g. entry "Čestina" won't be found if the "čeština" is searched.
Some more details gives Gallen in the comment to the bug 1843951.

Tags: acq acq-search
tags: added: acq-search
tags: added: acq
Revision history for this message
Linda Jansova (skolkova-s) wrote :

I have just tested the behavior on Blake's bugsquash2 test server using https://bugsquash2.mobiusconsortium.org/eg2/cs-CZ/staff/acq/po/8 (a purchase order entitled "Čeština - test". Using the search interface (https://bugsquash2.mobiusconsortium.org/eg2/cs-CZ/staff/acq/search) I can confirm that when one searches for a purchase order name starting with Čeština (uppercase Č), the line items are successfully found but when čeština (lowercase č) is searched for, no results show up.

Changed in evergreen:
status: New → Confirmed
Revision history for this message
Jeff Davis (jdavis-sitka) wrote (last edit ):

It looks like acq search uses the Postgres ILIKE operator, which is supposed to do case-insensitive string matching. I did a search where Line Item Attribute - Title of work contains "Čestina" and it produced this cstore call:

CALL: open-ils.cstore open-ils.cstore.json_query.atomic {"order_by":{"jub":{"id":{}}},"from":{"jub":{"acqlia":{"field":"lineitem","type":"left","fkey":"id"}}},"limit":10,"offset":0,"select":{"jub":[{"column":"id","transform":"distinct"}]},"where":{"-and":[{"+jub":{"id":{">=":"0"}}},{"-exists":{"select":{"acqlia":["id"]},"where":{"-and":[{"lineitem":{"=":{"+jub":"id"}}},{"definition":"1"},{"attr_value":{"ilike":"%\u010cestina%"}}]},"from":"acqlia"}}]}}

I tested ILIKE directly (using the psql command-line client in a UTF-8 environment) in an Evergreen database that uses C collation, and it didn't handle case folding properly for the example provided:

evergreen=# select case when 'čestina' ilike 'Čestina' then true else false end;
 case
------
 f
(1 row)

evergreen=# select case when 'hello' ilike 'Hello' then true else false end;
 case
------
 t
(1 row)

However, specifying a collation directly in the query seemed to work:

evergreen=# select case when 'čestina' collate "C.utf8" ilike 'Čestina' collate "C.utf8" then true else false end;
 case
------
 t
(1 row)

(Comment edited with new results since there was a typo in my previous tests.)

Revision history for this message
Linda Jansova (skolkova-s) wrote :

Thank you very much for looking into this issue, Jeff!

Revision history for this message
Jeff Davis (jdavis-sitka) wrote :

One solution would be to add a tsvector search index column for searchable acq fields containing normalized values. That's what we do for patron name searches: you're not searching the various name fields directly, you're searching the normalized values in the actor.usr.name_kw_tsvector column. With patron names the normalization is done using the evergreen.unaccent_and_squash function; that might suffice for acq search as well, but there are probably more robust approaches, e.g. https://www.2ndquadrant.com/en/blog/unicode-normalization-in-postgresql-13/

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.