Use all subfield values to link authority records to bibs
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Evergreen | Fix Released | Critical | Unassigned | |
| 2.3 | Fix Released | Undecided | Unassigned | |
| 2.4 | Fix Released | Undecided | Unassigned | |
Bug Description
Given an Evergreen instance with two authority records loaded, one more specific than the other by way of a repeated subdivision subfield, we must make sure that we use all of the subfield values supplied by the bib when attempting to auto-link it to the correct authority record. Otherwise, the "shorter" authority record may be selected as the match, and data in the bib record can be lost.
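To make the failure mode concrete, here is a simplified Python (3.9+) model. It is not Evergreen's linker code. It assumes, purely for illustration, a lossy lookup that keeps only the first value of each subfield code; the real pre-fix behavior may differ in detail. The headings, record IDs (`short`, `long`) and helpers `first_value_key` / `all_values_key` are hypothetical.

```python
# Simplified model of heading matching; not Evergreen's implementation.
# A heading is a list of (subfield_code, value) pairs, e.g. from a 650 field.
Heading = list[tuple[str, str]]


def first_value_key(heading: Heading) -> tuple[tuple[str, str], ...]:
    """Hypothetical lossy key: keeps only the first value of each subfield code."""
    seen: dict[str, str] = {}
    for code, value in heading:
        seen.setdefault(code, value)  # later repeats of the same code are dropped
    return tuple(sorted(seen.items()))


def all_values_key(heading: Heading) -> tuple[tuple[str, str], ...]:
    """Key built from every subfield value, so repeated subdivisions count."""
    return tuple(sorted(heading))


def match(bib: Heading, authorities: dict[str, Heading], key) -> list[str]:
    """Return the IDs of authority records whose key equals the bib heading's key."""
    bib_key = key(bib)
    return [aid for aid, auth in authorities.items() if key(auth) == bib_key]


# Made-up headings: "long" adds a repeated $x subdivision to "short".
authorities = {
    "short": [("a", "Music"), ("x", "History and criticism")],
    "long": [("a", "Music"), ("x", "History and criticism"), ("x", "Theory")],
}
bib = [("a", "Music"), ("x", "History and criticism"), ("x", "Theory")]

print(match(bib, authorities, first_value_key))  # ['short', 'long']: ambiguous
print(match(bib, authorities, all_values_key))   # ['long']
```

In this model, a linker that simply takes the first candidate from the lossy lookup would pick `short`; once authority propagation rewrites the bib heading from that record, `$x Theory` is gone from the bib.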
I consider this pretty serious, as bib data is changed in a way that makes reverting particularly difficult, and the problem can go unnoticed until an authority ingest (forcing authority propagation) mangles a ton of data.
Additionally, if a previously linked "short" authority record has not yet asserted itself, and a re-run of the linking script would not find the previously linked record (it won't, in the case described above), then the linking script does not remove the old $0.
Here's a branch that (1) considers all subfield values when linking and (2) adds a --refresh flag to the authority linking script to strip target bib records of all $0 subfields before searching for a best match.
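The two changes can be sketched in the same simplified Python (3.9+) model. This is not the branch's actual code: `strip_zero`, `heading_key` and `relink` are hypothetical names, the `refresh` parameter merely stands in for the `--refresh` flag, and "best match" is simplified to "exactly one record whose full set of non-$0 subfield values matches". Subfield order is ignored, which the review comment on this bug says the branch also does.

```python
# Simplified model of the two changes; all names are hypothetical.
Heading = list[tuple[str, str]]  # (subfield_code, value) pairs


def strip_zero(heading: Heading) -> Heading:
    """Drop every $0 subfield (the link to an authority record)."""
    return [(code, value) for code, value in heading if code != "0"]


def heading_key(heading: Heading) -> tuple[tuple[str, str], ...]:
    """Match on every non-$0 subfield value; subfield order is ignored."""
    return tuple(sorted(strip_zero(heading)))


def relink(bib: Heading, authorities: dict[str, Heading], refresh: bool = False) -> Heading:
    """Link the bib heading to a uniquely matching authority record, if any."""
    if refresh:
        bib = strip_zero(bib)  # like --refresh: discard existing links before searching
    key = heading_key(bib)
    matches = [aid for aid, auth in authorities.items() if heading_key(auth) == key]
    if len(matches) != 1:
        # No unique match: without refresh, an old (possibly stale) $0 survives here.
        return bib
    return strip_zero(bib) + [("0", matches[0])]


short_auth = [("a", "Music"), ("x", "History and criticism")]
long_auth = [("a", "Music"), ("x", "History and criticism"), ("x", "Theory")]
# A bib heading that was earlier mis-linked to the "short" record (made-up IDs).
stale_bib = long_auth + [("0", "short")]

print(relink(stale_bib, {"short": short_auth}))                     # old $0 "short" kept
print(relink(stale_bib, {"short": short_auth}, refresh=True))       # $0 removed
print(relink(stale_bib, {"short": short_auth, "long": long_auth}))  # linked to "long"
```

The first call reproduces the stale-link problem: matching on all values correctly rejects `short`, but nothing removes the old $0. Stripping $0 up front, as the refresh option does, means a failed search leaves the heading unlinked rather than wrongly linked.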
Top 2 commits of: http://
Changed in evergreen:
assignee: nobody → Dan Wells (dbw2)
Changed in evergreen:
status: Fix Committed → Fix Released
My commentary on the proposed branch upon eyeballing it: using all of the subfields in the bib field to look up a matching authority record is an improvement over the status quo.
It's still not perfect, though: the order of subfields in the bib and authority headings is ignored, as is whether the subject thesauri of the bib and authority headings match. Those concerns are long-standing, however, and are reasonably the topic of a separate bug.
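For comparison, a stricter check covering those two gaps might look like the following Python (3.9+) sketch. It is hypothetical and not part of the proposed branch: the `Heading` class, the thesaurus labels (`"lcsh"`, `"mesh"`), and the assumption that the thesaurus has already been extracted from each record (in MARC, from the bib field's second indicator and the authority record's fixed fields) are all illustrative, and `subfields` is assumed to exclude $0.

```python
# Hypothetical stricter comparison; not part of the proposed branch.
from dataclasses import dataclass


@dataclass(frozen=True)
class Heading:
    thesaurus: str  # e.g. "lcsh" or "mesh"; assumed already extracted from the record
    subfields: tuple[tuple[str, str], ...]  # (code, value) pairs in field order, no $0


def unordered_key(h: Heading) -> tuple[tuple[str, str], ...]:
    """All values, order ignored, thesaurus ignored (the looser comparison)."""
    return tuple(sorted(h.subfields))


def strict_match(bib: Heading, auth: Heading) -> bool:
    """Require the same thesaurus and the same subfields in the same order."""
    return bib.thesaurus == auth.thesaurus and bib.subfields == auth.subfields


auth = Heading("lcsh", (("a", "Music"), ("x", "Theory"), ("x", "History and criticism")))
bib_reordered = Heading("lcsh", (("a", "Music"), ("x", "History and criticism"), ("x", "Theory")))
bib_mesh = Heading("mesh", auth.subfields)

print(unordered_key(bib_reordered) == unordered_key(auth))  # True
print(strict_match(bib_reordered, auth))                    # False: order differs
print(strict_match(bib_mesh, auth))                         # False: thesaurus differs
```

The looser key treats all three headings as the same; the strict check accepts only a heading with the same thesaurus and the same subdivision order.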