Authority Records with Long Subfields Fail to Load

Bug #1922567 reported by Jason Stephenson
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Medium
Unassigned
3.6
Fix Released
Medium
Unassigned

Bug Description

Evergreen version: 3.2.10+
OpenSRF version: N/A
PostgreSQL version: 9.6

CWMARS recently received a batch update of bibliographic and authority records from a vendor. The following error was output while loading the authority records:

DBD::Pg::db do failed: ERROR: index row size 2984 exceeds maximum 2712 for index "authority_full_rec_value_index"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
CONTEXT: SQL statement "INSERT INTO authority.full_rec (record, tag, ind1, ind2, subfield, value)
SELECT record, tag, ind1, ind2, subfield, value FROM authority.flatten_marc( auth_id )"
PL/pgSQL function authority.reingest_authority_full_rec(bigint) line 4 at SQL statement
SQL statement "SELECT authority.reingest_authority_full_rec(NEW.id)"
PL/pgSQL function authority.indexing_ingest_or_delete() line 72 at PERFORM at /home/opensrf/migration-tools/eg_staged_bib_overlay line 625.
DBD::Pg::db do failed: ERROR: index row size 2984 exceeds maximum 2712 for index "authority_full_rec_value_index"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
CONTEXT: SQL statement "INSERT INTO authority.full_rec (record, tag, ind1, ind2, subfield, value)
SELECT record, tag, ind1, ind2, subfield, value FROM authority.flatten_marc( auth_id )"
PL/pgSQL function authority.reingest_authority_full_rec(bigint) line 4 at SQL statement
SQL statement "SELECT authority.reingest_authority_full_rec(NEW.id)"
PL/pgSQL function authority.indexing_ingest_or_delete() line 72 at PERFORM at /home/opensrf/migration-tools/eg_staged_bib_overlay line 625.
Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle host=db1;dbname=evergreen;port=5432 at /home/opensrf/migration-tools/eg_staged_bib_overlay line 625.

Investigation revealed that this error is caused by 1 authority record with a 670$b that is 5,090 characters long. This error causes the remaining 6,445 to also not load. They had to be loaded after the troublesome record was identified and removed from the input.

Looking in the database, I have determined that a potential fix is to change the authority_full_rec_value_index and authority_full_rec_value_tpo_index definitions to match the analogous indexes on the metabib.real_full_rec table.

I will add a branch with schema changes, db upgrade, and release notes shortly.

summary: - Aurhotiry Records with Long Subfields Fail to Load
+ Authority Records with Long Subfields Fail to Load
Revision history for this message
Elaine Hardy (ehardy) wrote :

Jason,

What authority record is this? Perhaps it needs to be turned into the authority police?

Revision history for this message
Jason Stephenson (jstephenson) wrote :

Branch pushed to https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1922567-long-authority-full-rec-values-fix

It includes the db ugprade, schema changes in the base files, a SQL regression test, and a release note.

You can use the regression test to verify and to test the fix. To verify the bug with the regression test, build the schema with a branch other than the above, then run the regression test. It should fail under these conditions. To test the fix, run the db upgrade script, and then run the regression test again. It should pass this time.

Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
no longer affects: evergreen/3.7
Changed in evergreen:
milestone: none → 3.7-beta2
tags: added: pullrequest
Revision history for this message
Jason Stephenson (jstephenson) wrote :

CWMARS is using this branch in production as of 2021-04-07.

Changed in evergreen:
milestone: 3.7-rc → 3.7.1
Changed in evergreen:
milestone: 3.7.1 → 3.7.2
Revision history for this message
Galen Charlton (gmc) wrote :

For testing purposes, I've attached a copy of the authority record from the PgTAP test case.

Elaine, in this case it looks like the authority police would be LC's internal affairs division - the record in question is https://id.loc.gov/authorities/names/n50030508.html

Revision history for this message
Elaine Hardy (ehardy) wrote :

Since the record came via OCLC, I will report as an error to them and ask if the 670 an be cleaned up a little. It looks like someone cut and pasted the entire wikidata entry for Erenburg.

Reporting to OCLC might get a faster fix than to LC

Revision history for this message
Jennifer Weston (jweston) wrote (last edit ):

Loaded successfully! (Thanks, Galen, for the authority record file).

Edited to add testing interface note -- Tested on: https://festivus.evergreencatalog.com/eg2/en-US/staff/cat/vandelay

I have tested this code and consent to signing off on it with my name, Jennifer Weston, and my email address, <email address hidden>

tags: added: signedoff
Revision history for this message
Galen Charlton (gmc) wrote :

Pushed down to rel_3_6. Thanks, Jason and Jennifer!

Noting that I did some testing to verify that the index recreation would take a reasonable amount of time in a database with several hundred thousand authority records, which is why I am comfortable with the backport.

Changed in evergreen:
importance: Undecided → Medium
no longer affects: evergreen/3.5
Changed in evergreen:
assignee: nobody → Galen Charlton (gmc)
assignee: Galen Charlton (gmc) → nobody
status: New → Fix Committed
Elaine Hardy (ehardy)
tags: added: cat-authority
removed: authority cataloging
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.