auth-to-auth linking broken when there are multiple 5xx entries

Bug #1312945 reported by Srey Seng
This bug affects 3 people
Affects    Status        Importance  Assigned to  Milestone
Evergreen  Fix Released  Medium      Unassigned
2.5        Fix Released  Medium      Unassigned

Bug Description

EG Master

On a clean install of master, I load in three authority records (shortened below to only show the 100/400/500 tags). I also loaded in some sample books that contain records written by all three authors.

Auth #1:
100 1\$aPeters, Elizabeth,$d1927-2013
400 1\$aPiters, Ėlizabet,$d1927-2013
500 1\$wnnnc$aMertz, Barbara

Auth #2:
100 1\$aMichaels, Barbara,$d1927-2013
500 1\$wnnnc$aMertz, Barbara

Auth #3:
100 1\$aMertz, Barbara
500 1\$wnnnc$aMichaels, Barbara,$d1927-2013
500 1\$wnnnc$aPeters, Elizabeth,$d1927-2013

I ran the following scripts, in the order listed: (1) authority_control_fields.pl and (2) authority_authority_linker.pl.

Note that Auth #3 has two 500 entries. After the linking process, when I view the authority record for Auth #3 via the staff client, I see the first 500 entry linked to Auth #1. However, I see two potential issues here:

(1) Only one 500 entry for Auth #3 is linked. Based on some research, I was expecting to see both 500 entries linked to their associated named authority records.
(2) The first 500 entry for Auth #3 references Michaels, Barbara, from Auth #2, not Peters, Elizabeth, from Auth #1. So the only 500 entry for Auth #3 that is linked appears to be linked to the wrong See Also reference.

Attached are the bibs and authorities I loaded into the empty database. All records were obtained from the Library of Congress (bibs via a Z39.50 client, authorities from the online website). I have also attached screen captures of the resulting authority records after the linking process, as well as a screen capture of the authority.authority_linking table.

------
Bib/Auth records loaded into DB
https://bugs.launchpad.net/evergreen/+bug/1312945/+attachment/4097067/+files/01_bib_peters_michaels_mertz.mrc
https://bugs.launchpad.net/evergreen/+bug/1312945/+attachment/4097068/+files/01_auth_peters_michaels_mertz.mrc

Screen Captures of Authority records after linking
https://bugs.launchpad.net/evergreen/+bug/1312945/+attachment/4097063/+files/authority_authoirty_linking.png
https://bugs.launchpad.net/evergreen/+bug/1312945/+attachment/4097064/+files/mertz_barbara_linked.png
https://bugs.launchpad.net/evergreen/+bug/1312945/+attachment/4097065/+files/michaels_barbara_linked.png
https://bugs.launchpad.net/evergreen/+bug/1312945/+attachment/4097066/+files/peters_elizabeth_linked.png
------

Again, I am new to authorities in general, and the findings above might be due to my misunderstanding or incorrect system setup. But, just wanted to put this out there, in case it is a real issue.

Revision history for this message
Srey Seng (sreyseng) wrote :
tags: added: authority
Revision history for this message
Srey Seng (sreyseng) wrote :
description: updated
Revision history for this message
Srey Seng (sreyseng) wrote :

Update on findings:

In the file authority_authority_linker.pl, when the target marc is retrieved from the database like so...

    my $target_rec = ($per_src_target_cache->{$src} ||=
        $e->retrieve_authority_record_entry($target)) or
        die $e->die_event;

...the linking indicators (denoted by the $0) in the authority marc are incorrect.
However, when the target marc is retrieved like so...

    my $target_rec =
        $e->retrieve_authority_record_entry($target) or
        die $e->die_event;

...the linking indicators (denoted by the $0) in the authority marc are correct. The only difference between the two calls is that the working call drops the "$per_src_target_cache->{$src} ||=" cache lookup.

I do not know the purpose of "($per_src_target_cache->{$src}", any thoughts? In the meantime, I am moving forward with the assumption that its removal is safe.
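
One way to read the two snippets above: the cache is keyed on the source record ($src) rather than on the target, so once one target has been fetched for a given source, every later lookup for that source returns the same cached record. The following is a minimal Python sketch of that pattern, not the Evergreen code itself. The function names and the dict-based stand-in for a record are hypothetical, and the real script's processing order may differ.

```python
from typing import Dict, List


def fetch_record(record_id: int) -> dict:
    """Hypothetical stand-in for a database fetch of an authority record."""
    return {"id": record_id}


def resolve_cached_by_source(src: int, targets: List[int]) -> List[int]:
    """Mimics `$cache->{$src} ||= fetch($target)`: the key is the SOURCE id."""
    cache: Dict[int, dict] = {}
    resolved = []
    for target in targets:
        # Only the first target for this source is ever fetched;
        # later targets silently reuse that cached record.
        if src not in cache:
            cache[src] = fetch_record(target)
        resolved.append(cache[src]["id"])
    return resolved


def resolve_uncached(src: int, targets: List[int]) -> List[int]:
    """Mimics the working call: fetch each target directly."""
    return [fetch_record(target)["id"] for target in targets]


if __name__ == "__main__":
    # Source record 3 with two targets, 2 and 1.
    print(resolve_cached_by_source(3, [2, 1]))
    print(resolve_uncached(3, [2, 1]))
```

If this reading is right, a source record with several 5xx entries would have all of them resolved against whichever target was fetched first, which would be consistent with the wrong link seen on Auth #3.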

Revision history for this message
Mike Rylander (mrylander) wrote :

I ran into this bug as well. In fact, there are two. The cache is one bug, and there's another whereby different uses of one term would fail to be linked. Here's a branch that addresses both:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp1312945_auth-auth-linking-fix

tags: added: pullrequest
Changed in evergreen:
importance: Undecided → Medium
milestone: none → 2.6.1
Revision history for this message
Srey Seng (sreyseng) wrote :

Hi Mike,

That's great! I was only able to fix the cache bug in the authority_authority_linker.pl itself by using the working call in the previous comment. But it does work in terms of modifying the authority record to show the linking indicators. I am not familiar with the second bug you found (different uses of one term would fail to be linked) and not sure how to test that aspect w/o further guidance and/or examples.

However, the updated authority_authority_linker.pl does not appear to take into consideration the function authority.calculate_authority_linking. That function gets called from authority.indexing_ingest_or_delete() each time an authority entry is modified (which happens when authority_authority_linker.pl runs), and it currently cannot handle records with multiple linkable fields sharing a MARC tag (e.g., several 500 entries). It processes only one link per MARC tag (the first one it encounters) and populates the authority.authority_linking table with that one result.

I am currently working on a solution for fixing the function "authority.calculate_authority_linking", as that was where I was seeing the additional failure to properly return the correct linking rows for insertion into the authority.authority_linking table.

Here's my WIP branch with the modified function (based off of your branch):

Branch:
http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/sreyseng/lp1312945_calculate-authority-linking-multiple-tags

I am not sure how to commit SQL changes, but I've taken the function from the db and modified it with the changes.

The thought of this being a bug is based on my assumption that the authority.authority_linking table should show the following entries, given that Auth #3 Mertz, Barbara (id 3) contains 500 links for Auth #2 Michaels, Barbara (id 2) and Auth #1 Peters, Elizabeth (id 1):

 source | target | field
--------+--------+-------
      3 |      2 |    21
      3 |      1 |    21

As is, without modifying the function "authority.calculate_authority_linking", the authority.authority_linking table would only have an entry for the first linked tag after running the auth linking process or updating the auth record:

 source | target | field
--------+--------+-------
      3 |      2 |    21
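
To make the difference between the two tables concrete, here is a small Python sketch of the two behaviours. It is only an illustration and does not reflect how the stored function is actually written. The `Field` type, both function names, and the idea of a pre-resolved `target` id are hypothetical. The field id 21 and the record ids are taken from the example above.

```python
from typing import List, NamedTuple, Optional, Tuple

Row = Tuple[int, int, int]  # (source, target, field)


class Field(NamedTuple):
    tag: str
    heading: str
    target: Optional[int]  # id of the linked authority record, None if unlinked


def links_first_per_tag(source: int, fields: List[Field], field_id: int) -> List[Row]:
    """Keeps only the first linked field for each MARC tag (the reported behaviour)."""
    seen_tags = set()
    rows = []
    for f in fields:
        if f.target is None or f.tag in seen_tags:
            continue
        seen_tags.add(f.tag)
        rows.append((source, f.target, field_id))
    return rows


def links_all(source: int, fields: List[Field], field_id: int) -> List[Row]:
    """Emits one row for every linked field, even when tags repeat."""
    return [(source, f.target, field_id) for f in fields if f.target is not None]


# Auth #3 from the example: one 100 heading and two linked 500 entries.
AUTH_3 = [
    Field("100", "Mertz, Barbara", None),
    Field("500", "Michaels, Barbara, 1927-2013", 2),
    Field("500", "Peters, Elizabeth, 1927-2013", 1),
]

if __name__ == "__main__":
    print(links_first_per_tag(3, AUTH_3, 21))
    print(links_all(3, AUTH_3, 21))
```

The first function yields the single row shown in the second table; the second yields both rows from the first table.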

-----------------
 --reference--
-----------------
Auth #1:
100 1\$aPeters, Elizabeth,$d1927-2013
400 1\$aPiters, Ėlizabet,$d1927-2013
500 1\$wnnnc$aMertz, Barbara

Auth #2:
100 1\$aMichaels, Barbara,$d1927-2013
500 1\$wnnnc$aMertz, Barbara

Auth #3:
100 1\$aMertz, Barbara
500 1\$wnnnc$aMichaels, Barbara,$d1927-2013
500 1\$wnnnc$aPeters, Elizabeth,$d1927-2013
-----------------

I am still actively testing the modified authority.calculate_authority_linking function but more eyes and all pointers appreciated!

Revision history for this message
Mike Rylander (mrylander) wrote :

Srey,

Indeed! Good catch. I'll clean up your addition (leaving your authorship) and sign off, and add the change to the baseline schema.

WATCH THIS SPACE ...

[ NOTE: The inter-authority linking script is really meant only to make a best-effort at linking authorities ... a cataloger may still be needed for tricky cases. The stored proc is a separate, core bit, though. Again, good work. ]

Revision history for this message
Mike Rylander (mrylander) wrote :

I changed your implementation a little to simplify the loop construction, but left your authorship and signed off. I made it a collab branch, so we can add new commits to the same branch, but if it tests well for you (it does for me) then a separate signoff branch can be built by cherry picking with signoffs.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/miker/lp1312945_auth-auth-linking-fix

Now I get the appropriate contents in authority.authority_linking:

  id  | source | target | field
------+--------+--------+-------
 6262 |  13740 |  13964 |    25
 6263 |  13740 |  13742 |    25

Revision history for this message
Srey Seng (sreyseng) wrote :

Hi Mike,

I did a clean install from master with the three commits cherry-picked in (Baseline schema update | LP#1312945: authority.calculate_authority_linking and.. | LP#1312945: Cache less agressivelly and look for all... ).

They test well for me, both in modifying the authority record itself and in inserting the correct number of links into the authority_linking table.

My signoff branch:
http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/sreyseng/lp1312945_auth-auth-linking-fix-signoff

Thanks!

Revision history for this message
Galen Charlton (gmc) wrote :

Pushed to master, rel_2_6, and rel_2_5, along with a regression test for the change to the stored function. Thanks, Mike and Srey!

Changed in evergreen:
status: New → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released