alternative title indexing is broken in 2.5

Bug #1233343 reported by Ben Shum
This bug affects 1 person
Affects: Evergreen
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Evergreen master as of 2013-09-07

When the browse search code was added to master as a new feature for 2.5, it looks like we accidentally broke alternative title indexing. This was especially noticeable for MARC tag 246 alternative title information, which "disappeared" from the title index and could no longer be found via title searches.

Some examples in the concerto dataset that ships with new master:

Record id 34 has Italienisches Konzert (245) and Italian concerto, BWV 971 (246). If you search for "Italian concerto, BWV 971" in title search, the record doesn't show up. Looking at metabib.title_field_entry for that source bib, I can only see fields 5 (uniform title), 6 (proper title), and 31 (browse title). The one missing is 4 (alternative title).
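The check described above can be sketched as a query. This is a minimal sketch, assuming the stock Evergreen schema, where metabib.title_field_entry has source, field, and value columns and config.metabib_field maps field ids to names. Those column names are not stated in this report, so adjust them if your schema differs.

```sql
-- List the title index entries for the example bib (id 34).
-- Assumed columns: title_field_entry(source, field, value),
-- metabib_field(id, name).
SELECT tfe.field,
       cmf.name,
       tfe.value
  FROM metabib.title_field_entry tfe
  JOIN config.metabib_field cmf ON cmf.id = tfe.field
 WHERE tfe.source = 34
 ORDER BY tfe.field;
```

On an affected system, the result would be expected to lack a row for field 4 (alternative title).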

This only affects master/2.5. 2.4 and previous are unaffected.

Ben Shum (bshum) wrote :

Thanks to Mike for a working branch that seems to address this issue: working/user/miker/get-all-alternative-titles

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/get-all-alternative-titles

I ran this on my test server running master bibs. The reingest in the upgrade SQL did not flesh out the alternative titles, so I did a full reingest of the example record with:

UPDATE config.internal_flag SET enabled = TRUE WHERE name = 'ingest.reingest.force_on_same_marc';
UPDATE biblio.record_entry SET id = id WHERE id = 34;
UPDATE config.internal_flag SET enabled = FALSE WHERE name = 'ingest.reingest.force_on_same_marc';

Doing that full reingest did create a field 4 alternative title entry for the bib record example I mentioned from the concerto dataset.

So maybe we just need to have a fuller reingest action to fix the issue.
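To gauge how many records a fuller reingest would need to fix, something like the following read-only query might help. It is a sketch under assumptions, not part of either branch: it assumes the metabib.full_rec view exposes record and tag columns, it uses field id 4 for alternative title as noted in the description, and it looks only at tag 246 even though other tags may also feed the alternative title index.

```sql
-- Find bibs that carry a 246 tag but have no alternative title
-- (field 4) entry in the title index.
-- Assumed columns: full_rec(record, tag),
-- title_field_entry(source, field).
SELECT DISTINCT fr.record
  FROM metabib.full_rec fr
 WHERE fr.tag = '246'
   AND NOT EXISTS (
         SELECT 1
           FROM metabib.title_field_entry tfe
          WHERE tfe.source = fr.record
            AND tfe.field = 4
       )
 ORDER BY fr.record;
```

If record 34 from the concerto dataset appears in the output before the reingest and is gone afterward, the reingest did its job for that record.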

Changed in evergreen:
status: New → Confirmed
importance: Undecided → Medium
milestone: none → 2.5.0-rc
tags: added: indexing metabib
Lebbeous Fogle-Weekley (lebbeous) wrote :

Ben, how's this for a more appropriate reingest? working/collab/senator/get-all-alternative-titles

That one's broader (as in, across search, browse, and facet) on one axis but narrower on another (only deals with the alternative title field, not all fields). Would you be able to test its performance on a largeish dataset you might have handy?

I'm pretty sure it should be faster than a plain full reingest, but I might have missed something. If it is faster, it also makes a strong suggestion about some code in metabib.reingest_metabib_field_entries() that should be divided into a new, separate function.

Thanks!

Dan Wells (dbw2)
tags: added: 2.5-release-blocker
Remington Steed (rjs7)
summary: - alternative title indexing in broken in 2.5
+ alternative title indexing is broken in 2.5
Dan Wells (dbw2)
Changed in evergreen:
assignee: nobody → Dan Wells (dbw2)
Dan Wells (dbw2) wrote :

Not sure about the speed, but the updated upgrade script does work, and the number of production sites that will need to run it is vanishingly small (since the 2.5 upgrade requires reingesting anyway).

More generally, I would be interested to know whether the function is appreciably faster than a full reingest (like Lebbeous, I am betting that it is). But that investigation can take place outside this bug if anyone wants to take it on.

Thanks, Ben, Mike, and Lebbeous!

Changed in evergreen:
status: Confirmed → Fix Committed
assignee: Dan Wells (dbw2) → nobody
Ben Shum (bshum) wrote :

To Dan,

The broader reingest that Lebbeous suggested cut our reingest time from a previous estimate of 16 hours or so to maybe 8 hours or less. So it helped, but it still took a while to complete. I didn't get exact timing on the test server, though.

Might be worth exploring further at a later date.

Dan Wells (dbw2)
Changed in evergreen:
status: Fix Committed → Fix Released