Authority merge times out when too many records are involved

Bug #1193490 reported by Steve Callender on 2013-06-21
This bug affects 7 people
Affects: Evergreen
Importance: Medium
Assigned to: Unassigned

Bug Description

Tested in 2.3

The server times out when doing an authority merge that involves a lot of bib records.

The authority.merge_records function turns on the ingest.reingest.force_on_same_marc setting in order to re-ingest bibs, but if there are too many, the re-ingest takes too long and the actual API call times out waiting for data.

One change that would reduce the amount of work: it looks like the function unnecessarily re-ingests the target bib records instead of just the source ones.

In the function, right before the step

-- 3. Temporarily set reingest on same to TRUE

The original target records should be pulled out before the re-ingest so we just do source records. They then can be added back in after the re-ingest.

I'm not sure of the best way to do this though. I don't think moving them to a temp table is a good idea. Maybe a local variable? Maybe there is a better way to accomplish this by re-writing the code here.

In my testing, I was trying to merge a record that had 4 entries into one that had 400, and received the timeout and failed on the merge.
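
To put rough numbers on the saving, here is a toy Python model of the case above. It is not Evergreen code: the record ids, the `bib_links` mapping and the `bibs_to_reingest` helper are all made up for illustration, and it only counts which bibs would be touched under each approach.

```python
# Toy model: which bibs would be re-ingested by a merge?
# bib_links maps an authority record id to the set of bib ids linked to it.
# All ids and the helper name are hypothetical, not part of Evergreen.

def bibs_to_reingest(bib_links, source_id, target_id, skip_target=True):
    """Return the bib ids that merging source into target would re-ingest."""
    source_bibs = set(bib_links.get(source_id, ()))
    target_bibs = set(bib_links.get(target_id, ()))
    if skip_target:
        # Only bibs whose authority link actually moves need re-ingesting.
        return source_bibs
    # Behaviour described in this report: both sides get re-ingested.
    return source_bibs | target_bibs


bib_links = {
    1: set(range(1000, 1004)),  # source authority: 4 bibs
    2: set(range(2000, 2400)),  # target authority: 400 bibs
}

print(len(bibs_to_reingest(bib_links, 1, 2, skip_target=False)))  # 404
print(len(bibs_to_reingest(bib_links, 1, 2, skip_target=True)))   # 4
```

In this model, skipping the target's bibs cuts the re-ingest from 404 records to 4, which is the "lightly used merged into heavily used" case. Merging in the other direction would still touch 400 records.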

Steve

Mike Rylander (mrylander) wrote :

Another option, which would help in the situation you encountered, would be to split the force-on-same flag into two, one each for bib and authority. Then use just the authority force-on-same flag inside the merge function, and because only the bibs that were pointing to the new master authority record were changed, only those would end up being updated via the cascade of trigger-induced reingest.

However, this only helps the "lightly used merged into heavily used" case. A heavily used authority being merged into another authority will suffer the same timeout possibility. To address the general case, I think we may need to consider a reingest queue. This would be useful more generally for upgrade-time and configuration-caused reingests, as well.

There are many roads we could take to such a thing, among them:
  * An A/T reactor
  * A cron-fired script that reads a queue table and generates a script to cause the reingest
  * A LISTEN/NOTIFY client for Postgres that reacts by asynchronously firing a reingest as needed, based on a queue table
  * Other, fancier things...

All of these things will require, I believe, one common ability; they need to be able to inform the main reingest trigger that, instead of doing its work, it should instead simply insert the record id into a queue and move on. That can take the form of a global flag (ingest.reingest.asynchronous, maybe) that the code would use in situations where queuing for async reingest is known to be the preferred method, and even by admins to cause all reingest to be performed async.
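
A minimal sketch of that common ability, as an in-memory Python model rather than real trigger code. The flag name is the one floated above. The deque standing in for a queue table, and the `on_record_update`, `reingest` and `drain_queue` names, are assumptions made up for this illustration.

```python
from collections import deque

# Toy in-memory model of the proposed behaviour; not Evergreen code.
# The flag name follows the suggestion above; everything else is hypothetical.
flags = {"ingest.reingest.asynchronous": False}
reingest_queue = deque()  # stands in for a queue table
reingested = []           # ids that have actually been re-ingested


def reingest(record_id):
    """Placeholder for the expensive re-ingest work."""
    reingested.append(record_id)


def on_record_update(record_id):
    """Stands in for the main reingest trigger."""
    if flags["ingest.reingest.asynchronous"]:
        # Async mode: note the id (once) and return without doing the work.
        if record_id not in reingest_queue:
            reingest_queue.append(record_id)
    else:
        reingest(record_id)


def drain_queue(batch_size=100):
    """Stands in for the cron job or listener; processes one batch."""
    done = 0
    while reingest_queue and done < batch_size:
        reingest(reingest_queue.popleft())
        done += 1
    return done


flags["ingest.reingest.asynchronous"] = True
for bib_id in range(400):
    on_record_update(bib_id)

print(len(reingested), len(reingest_queue))  # 0 400
print(drain_queue(batch_size=100))           # 100
print(len(reingested), len(reingest_queue))  # 100 300
```

With the flag on, the 400 updates return without doing any reingest work, and whatever consumes the queue (any of the options listed above) can catch up in batches outside the original API call.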

Thoughts?

tags: added: cataloging reingest timeout

Has there been any further progress with this bug?

Yamil (ysuarez) on 2014-01-14
tags: added: authority
Ben Shum (bshum) on 2014-02-16
Changed in evergreen:
status: New → Triaged
importance: Undecided → Medium
tags: added: performance
Yamil (ysuarez) wrote :

For those interested, Berklee College of Music solicited a quote from ESI to fix this bug. I am looking for partners to fund the implementation of the fix. Feel free to email me for more information about the quote.

Thanks in advance,
Yamil
