Evergreen

Authority merge times out when too many records

Bug #1193490 reported by Steve Callender on 2013-06-21

This bug report is a duplicate of: Bug #1979071: Queued Ingest functionality.

This bug affects 15 people

Affects		Status	Importance	Assigned to	Milestone
	Evergreen	Triaged	Medium	Unassigned

Bug Description

Tested in 2.3

The server times out when doing an authority merge that involves a lot of bib records.

The authority.merge_records function turns on the ingest.reingest.force_on_same_marc setting in order to re-ingest bibs, but if there are too many, the re-ingest takes too long and the actual API call times out waiting for data.
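
The flag handling described above can be pictured as a temporary toggle around the update step. This is only a toy Python model, not Evergreen's PL/pgSQL: the `flags` dict stands in for wherever the setting is stored, and `forced_reingest` and `reingest_needed` are hypothetical helper names.

```python
from contextlib import contextmanager

# Toy stand-in for the stored setting (an assumption for illustration).
flags = {"ingest.reingest.force_on_same_marc": False}

@contextmanager
def forced_reingest(flag_store, name="ingest.reingest.force_on_same_marc"):
    """Enable the flag for the duration of the block, then restore its old value."""
    previous = flag_store[name]
    flag_store[name] = True
    try:
        yield
    finally:
        flag_store[name] = previous

def reingest_needed(marc_changed, flag_store):
    """A record is reingested if its MARC changed or the force flag is on."""
    return marc_changed or flag_store["ingest.reingest.force_on_same_marc"]

with forced_reingest(flags):
    # While the flag is on, even a record with unchanged MARC is reingested.
    inside = reingest_needed(marc_changed=False, flag_store=flags)
outside = reingest_needed(marc_changed=False, flag_store=flags)
```

While the flag is on, every touched record is reingested whether or not its MARC changed, which is why the amount of work grows with the number of linked bibs.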

One thing that could be changed to reduce the amount of work: it looks like the function unnecessarily re-ingests the target bib records instead of just the source ones.

In the function, right before the line:

-- 3. Temporarily set reingest on same to TRUE

The original target records should be pulled out before the re-ingest so we just do source records. They then can be added back in after the re-ingest.

I'm not sure of the best way to do this though. I don't think moving them to a temp table is a good idea. Maybe a local variable? Maybe there is a better way to accomplish this by re-writing the code here.
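
To make the proposal concrete, here is a rough sketch of the set arithmetic, in Python rather than the function's own PL/pgSQL. The link table and record ids are made up; the point is only that bibs already linked to the target are held aside before the forced re-ingest. `bibs_to_reingest` is a hypothetical helper, not part of Evergreen.

```python
# Hypothetical bib-to-authority links: authority id -> set of linked bib ids.
links = {
    "source": set(range(1, 5)),        # 4 bibs, as in the test case in this report
    "target": set(range(100, 500)),    # 400 bibs
}

def bibs_to_reingest(links, source, target, skip_target_bibs=True):
    """Return the bib ids that the merge would re-ingest."""
    moved = set(links[source])              # bibs re-pointed from source to target
    already_on_target = set(links[target])  # bibs whose links do not change
    if skip_target_bibs:
        return moved
    return moved | already_on_target

everything = bibs_to_reingest(links, "source", "target", skip_target_bibs=False)
source_only = bibs_to_reingest(links, "source", "target")
print(len(everything), len(source_only))  # 404 4
```

With these made-up numbers the forced re-ingest drops from 404 records to 4, which is the saving the suggestion is after.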

In my testing, I was trying to merge a record that had 4 entries into one that had 400, and received the timeout and failed on the merge.

Steve

Mike Rylander (mrylander) wrote on 2013-06-25:

Another option, which would help in the situation you encountered, would be to split the force-on-same flag into two, one for each of bib and authority. Then use just the authority force-on-same flag inside the merge function, and because only the bibs that were pointing to the new master authority record were changed, only those would end up being updated via the cascade of trigger-induced reingest.
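
A rough sketch of the two-flag idea, as a Python toy model rather than trigger code. The flag names ending in `.bib` and `.authority` are invented here for illustration, and `should_reingest` is a hypothetical helper.

```python
# Hypothetical split flags; the merge would switch on only the authority one.
flags = {
    "ingest.reingest.force_on_same_marc.bib": False,
    "ingest.reingest.force_on_same_marc.authority": True,
}

def should_reingest(record_type, marc_changed, flags):
    """Reingest when the MARC changed, or when the force flag for this record type is on."""
    forced = flags["ingest.reingest.force_on_same_marc." + record_type]
    return marc_changed or forced

# Bibs whose MARC was changed by the merge map to True; unchanged bibs map to False.
bibs = {"changed-1": True, "changed-2": True, "unchanged-1": False, "unchanged-2": False}
reingested = sorted(b for b, changed in bibs.items() if should_reingest("bib", changed, flags))
authority_forced = should_reingest("authority", False, flags)
print(reingested)  # ['changed-1', 'changed-2']
```

In this model the authority record is still forced through reingest, while bibs are reingested only when their own MARC actually changed.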

However, this only helps the "lightly used merged into heavily used" case. A heavily used authority being merged into another authority will suffer the same timeout possibility. To address the general case, I think we may need to consider a reingest queue. This would be useful more generally for upgrade-time and configuration-caused reingests, as well.

There are many roads we could take to such a thing, among them:
  * An A/T reactor
  * A cron-fired script that reads a queue table and generates a script to cause the reingest
  * A LISTEN/NOTIFY client for Postgres that reacts by asynchronously firing a reingest as needed, based on a queue table
  * Other, fancier things...

All of these things will require, I believe, one common ability; they need to be able to inform the main reingest trigger that, instead of doing its work, it should instead simply insert the record id into a queue and move on. That can take the form of a global flag (ingest.reingest.asynchronous, maybe) that the code would use in situations where queuing for async reingest is known to be the preferred method, and even by admins to cause all reingest to be performed async.
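
For discussion, a minimal sketch of that common ability in Python. The flag name `ingest.reingest.asynchronous` is the one floated above; the in-memory queue, `on_record_update`, and `drain_queue` are hypothetical stand-ins for a queue table, the reingest trigger, and whichever worker (A/T reactor, cron script, LISTEN/NOTIFY client) ends up consuming it.

```python
from collections import OrderedDict

flags = {"ingest.reingest.asynchronous": True}
reingest_queue = OrderedDict()   # record id -> None; keeps order, drops duplicates
reingested = []                  # records that have actually been reingested

def reingest(record_id):
    """Placeholder for the expensive ingest work."""
    reingested.append(record_id)

def on_record_update(record_id):
    """Stand-in for the trigger: queue the id when async is on, else do the work now."""
    if flags["ingest.reingest.asynchronous"]:
        reingest_queue[record_id] = None
    else:
        reingest(record_id)

def drain_queue(batch_size=100):
    """Stand-in for the worker: process up to batch_size queued ids, oldest first."""
    done = 0
    while reingest_queue and done < batch_size:
        record_id, _ = reingest_queue.popitem(last=False)
        reingest(record_id)
        done += 1
    return done

for rid in [7, 8, 7, 9]:
    on_record_update(rid)        # returns immediately; nothing is reingested yet
pending = len(reingest_queue)    # the duplicate 7 collapses into one entry
processed = drain_queue(batch_size=2)
```

The caller that triggered the updates never waits on the ingest itself, so a merge touching many bibs would return quickly and the queue would be worked off in batches afterwards.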

Thoughts?

tags:

added: cataloging reingest timeout

Gislaine Hamelin (gislaine-hamelin) wrote on 2013-07-25:

Has there been any further progress with this bug?

Yamil (ysuarez) on 2014-01-14

tags:

added: authority

Ben Shum (bshum) on 2014-02-16

Changed in evergreen:
status:	New → Triaged
importance:	Undecided → Medium
tags:	added: performance

Yamil (ysuarez) wrote on 2014-06-09:

For those interested, Berklee College of Music solicited a quote from ESI to fix this bug. I am looking for partners to fund the implementation of the fix. Feel free to email me for more information about the quote.

Thanks in advance,
Yamil

Elaine Hardy (ehardy) on 2021-10-14

tags:

added: cat-authority
removed: authority cataloging

Andrea Neiman (aneiman) wrote on 2022-06-21:

This is addressed by the Queued Ingest work, bug 1979071.

Andrea Neiman (aneiman) wrote on 2023-06-07:

Marking this as duplicate of bug 1979071, Queued Ingest, which was released with Evergreen 3.11.
