Use all subfield values to link authority records to bibs

Bug #1245944 reported by Mike Rylander
Affects     Status         Importance   Assigned to   Milestone
Evergreen   Fix Released   Critical     Unassigned
2.3         Fix Released   Undecided    Unassigned
2.4         Fix Released   Undecided    Unassigned

Bug Description

Given an Evergreen instance with two authority records loaded, one being more specific than the other via a repeated subdivision subfield, we must make sure that we use all the bib-supplied subfield values when attempting to auto-link to the correct authority. Otherwise, the "shorter" authority record may be selected as appropriate, and data in the bib record would be lost.
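
To illustrate the failure mode, here is a minimal Python sketch. It is not the linking script's code: the record IDs, headings, and helper names are made up for illustration, and the "first value per subfield code" key is only an assumption about how a lossy comparison might behave.

```python
# Hypothetical authority headings, modeled as lists of (subfield code, value).
AUTHORITIES = {
    101: [("a", "Libraries"), ("x", "History")],
    102: [("a", "Libraries"), ("x", "History"), ("x", "20th century")],
}

# Hypothetical bib heading with a repeated $x subdivision.
BIB_HEADING = [("a", "Libraries"), ("x", "History"), ("x", "20th century")]


def first_value_key(subfields):
    """Lossy key: keeps only the first value seen for each subfield code."""
    key = {}
    for code, value in subfields:
        key.setdefault(code, value)
    return tuple(sorted(key.items()))


def all_values_key(subfields):
    """Full key: every (code, value) pair counts, including repeats."""
    return tuple(sorted(subfields))


def find_matches(bib_subfields, key_fn):
    """Return the IDs of authorities whose key equals the bib heading's key."""
    wanted = key_fn(bib_subfields)
    return [auth_id for auth_id, sf in AUTHORITIES.items() if key_fn(sf) == wanted]


print(find_matches(BIB_HEADING, first_value_key))  # [101, 102]
print(find_matches(BIB_HEADING, all_values_key))   # [102]
```

With the lossy key, the shorter record 101 is an equally valid candidate, so it can be picked; with all values considered, only 102 matches.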

I consider this pretty serious, as bib data is changed in a way that makes reverting particularly difficult, and the problem can go unnoticed until an authority ingest (forcing authority propagation) mangles a ton of data.

Additionally, if a previously linked "short" authority record has not yet asserted itself and a re-run of the linking script does not find that record (it won't in the case described above), the linking script does not remove the old $0.

Here's a branch that (1) considers all subfield values when linking and (2) adds a --refresh flag to the authority linking script to strip target bib records of all $0 subfields before searching for a best match.

Top 2 commits of: http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/link-using-all-subfield-values
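
A rough Python sketch of what the --refresh behavior described above amounts to. This is an illustration under assumptions, not the code in the branch: `relink`, `strip_control_numbers`, and the `lookup` callable are hypothetical names, and the plain numeric $0 value is a simplification.

```python
def strip_control_numbers(subfields):
    """Drop every $0 subfield from a heading's (code, value) list."""
    return [(code, value) for code, value in subfields if code != "0"]


def relink(subfields, lookup, refresh=False):
    """Re-link one bib field.

    lookup is a hypothetical callable that takes the heading subfields
    (without $0) and returns an authority ID, or None if nothing matches.
    """
    heading = strip_control_numbers(subfields)
    match = lookup(heading)
    if match is not None:
        # A best match was found: link to it.
        return heading + [("0", str(match))]
    # No match: without refresh the stale $0 survives; with refresh it is gone.
    return heading if refresh else list(subfields)


field = [("a", "Libraries"), ("x", "History"), ("x", "20th century"), ("0", "101")]
no_match = lambda heading: None

print(relink(field, no_match))                # old $0 101 is left in place
print(relink(field, no_match, refresh=True))  # old $0 is stripped
```

The point of stripping first is that a stale link to the "short" record cannot linger when the new, stricter search finds no match.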

Tags: pullrequest
Galen Charlton (gmc) wrote :

My commentary on the proposed branch upon eyeballing it: using all of the subfields in the bib field to look up a matching authority record is an improvement over the status quo.

It's still not perfect, though: the order of subfields in the bib and authority headings is ignored, and there is no check that the subject thesauri of the bib and authority headings match. Those concerns are long-standing, however, and reasonably the topic of a separate bug.
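
For anyone following along, a small Python sketch of the two remaining gaps. The function names, headings, and thesaurus codes are hypothetical, and this is not how the linking script is written; it only contrasts an order-insensitive comparison with a stricter one.

```python
def unordered_match(bib, auth):
    """Order-insensitive: same (code, value) pairs in any order."""
    return sorted(bib) == sorted(auth)


def strict_match(bib, auth, bib_thesaurus, auth_thesaurus):
    """Stricter: same pairs in the same order, and the same thesaurus."""
    return list(bib) == list(auth) and bib_thesaurus == auth_thesaurus


bib = [("a", "Art"), ("x", "History"), ("z", "France")]
auth = [("a", "Art"), ("z", "France"), ("x", "History")]

print(unordered_match(bib, auth))              # True
print(strict_match(bib, auth, "lcsh", "lcsh")) # False: subfield order differs
print(strict_match(bib, bib, "lcsh", "mesh"))  # False: thesauri differ
```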

Galen Charlton (gmc) wrote :

Mentioning a comment Mike made in a separate discussion -- a better approach in the long run is to base authority lookup on authority.simple_heading() or the like. Of course, that's tantamount to rewriting a good chunk of the linking script, so ... Captain! NEED MORE TUITS!

Mike Rylander (mrylander) wrote :

Heads up! I force-pushed a tiny bug fix for a thinko/typo spotted by Dan Wells during testing. Pull again if you're planning to test, please and thank you.

Dan Wells (dbw2)
Changed in evergreen:
assignee: nobody → Dan Wells (dbw2)
Dan Wells (dbw2) wrote :

Tested with some help from Remington S., looks good. Pushed from master through rel_2_3. Thanks, Mike!

Changed in evergreen:
status: New → Fix Committed
assignee: Dan Wells (dbw2) → nobody
Dan Wells (dbw2)
Changed in evergreen:
status: Fix Committed → Fix Released