ISBN search only finds records with both 10 and 13 digit ISBNs

Bug #1743613 reported by Jason Boyer
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Evergreen
Invalid
Undecided
Unassigned

Bug Description

Eg 3.0.2

After receiving reports of records not showing up for ISBN searches I noticed that the SQL query generated for ISBN searches becomes a search for the 10 digit and 13 digit ISBNs in a single record because these lines:

to_tsquery('simple', COALESCE(NULLIF( '(' || btrim(regexp_replace(translate_isbn1013(E'9780899978482'),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), '')) AS tsq,
      to_tsquery('simple', COALESCE(NULLIF( '(' || btrim(regexp_replace(translate_isbn1013(E'9780899978482'),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), '')) AS tsq_rank

turn that ISBN into this ts_query:

'9780899978482' & '0899978487'

Which does exactly what it looks like, requires both values to be considered a match.

This can be corrected by changing the regexp_replace call to use a pipe rather than an ampersand:

to_tsquery('simple', COALESCE(NULLIF( '(' || btrim(regexp_replace(translate_isbn1013(E'9780899978482'),E'(?:\\s+|:)','|','g'),'&|') || ')', '()'), '')) AS tsq,
      to_tsquery('simple', COALESCE(NULLIF( '(' || btrim(regexp_replace(translate_isbn1013(E'9780899978482'),E'(?:\\s+|:)','|','g'),'&|') || ')', '()'), '')) AS tsq_rank

I assume this change should be limited to ISBNs because of their unusual nature but I don't have time to look into that today and wanted to get this down while it's fresh in my mind.

Revision history for this message
Mike Rylander (mrylander) wrote :

I don't think we want to do a blanket change there, as that is a construct used by all normalizers.
 It would be wrong to join the output of, say, the date range normalizer with | instead of &. I think we'll need to teach the normalizer mapping table and the code that constructs this clause about how the output should be joined.

Revision history for this message
Mike Rylander (mrylander) wrote :

Hrm... hold on a second here. If the normalizer was properly applied to the data coming out of the record, the index_vector column in mife would contain both versions, and the search would match.

Jason, did you perhaps recently add the modifier without performing a reingest after?

Revision history for this message
Elaine Hardy (ehardy) wrote :

I did an ISBN search for a record in the PINES database with just a 10 digit ISBN and was able to retrieve the record.

Revision history for this message
Jason Boyer (jboyer) wrote :

Nothing has been changed on our normalizers or the isbn field recently and there was a reingest after the last upgrade that required it but I don't recall ISBN handling changing in ages. (Manually forcing full reingests didn't help with the examples I've been given.) I'll double check some things.

Revision history for this message
Jason Boyer (jboyer) wrote :

Ah ha.
What has changed recently is our definition of oils_tsearch2. (applies normalizers to metabib.identifier_field_entry to create the index_vector.)

I changed it to this shrimpy version from the seed data (which I *thought* was current at the time but must have been out of date.)
CREATE OR REPLACE FUNCTION public.oils_tsearch2()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
NEW.index_vector = to_tsvector((TG_ARGV[0])::regconfig, NEW.value);
RETURN NEW;
END;
$function$

Which obviously knows nothing about any normalizers. I'll update our function with a proper definition and start a reingest to mop up.

So this bug is invalid, but the reason I was looking for possible oils_tsearch2 issues is that a table was getting null values from the trigger... So maybe I'll file another bug soon? Have to see. :-/

Changed in evergreen:
status: New → Invalid
Revision history for this message
Mike Rylander (mrylander) wrote :

Ah, yep. that is a long-dead version that is superseded in the baseline schema by a script that follows the one defining that version.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.