Re-indexing deleted records upon modification

Bug #797238 reported by George Duimovich
This bug affects 1 person
Affects: Evergreen
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

EG 2.0.6
PG 8.4

I don't think this is a bug, but more of an inquiry about expected behavior.

We identify some MARC tag that needs removal or updating. In this case, we updated/moved some 659's to be 653's, and then completely removed remaining 659's, impacting some 23,000 bibs. We don't filter out deleted records from this update since we want the updates to be complete (even the teeny tiny space savings count).

We then notice about 3500 rows where tag='659' remained in metabib.real_full_rec, and further investigation shows that these rows are associated with bib record IDs that have been deleted (e.g. active = f, deleted = t).
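For illustration, the check described above can be sketched as a join between the two tables. This is only a minimal sketch, not the query actually run: it uses Python's sqlite3 with attached in-memory databases named biblio and metabib as stand-ins for the PostgreSQL schemas, and the column names (record, tag, deleted), the helper names and the demo rows are all assumptions made for the example. On PostgreSQL the deleted flag is a boolean, so the comparison would be written against true rather than 1.

```python
import sqlite3


def count_orphaned_tag_rows(conn, tag):
    """Count metabib.real_full_rec rows for `tag` whose bib record is deleted.

    Table and column names are assumed for this sketch.
    """
    query = """
        SELECT COUNT(*)
          FROM metabib.real_full_rec AS mfr
          JOIN biblio.record_entry AS bre ON bre.id = mfr.record
         WHERE mfr.tag = ?
           AND bre.deleted = 1
    """
    return conn.execute(query, (tag,)).fetchone()[0]


def build_demo_db():
    """Build a tiny stand-in database with hypothetical demo rows."""
    conn = sqlite3.connect(":memory:")
    # Attached in-memory databases play the role of PostgreSQL schemas.
    conn.execute("ATTACH DATABASE ':memory:' AS biblio")
    conn.execute("ATTACH DATABASE ':memory:' AS metabib")
    conn.execute(
        "CREATE TABLE biblio.record_entry "
        "(id INTEGER PRIMARY KEY, active INTEGER, deleted INTEGER)"
    )
    conn.execute(
        "CREATE TABLE metabib.real_full_rec "
        "(id INTEGER PRIMARY KEY, record INTEGER, tag TEXT, value TEXT)"
    )
    # Record 1 is live; records 2 and 3 are deleted.
    conn.executemany(
        "INSERT INTO biblio.record_entry VALUES (?, ?, ?)",
        [(1, 1, 0), (2, 0, 1), (3, 0, 1)],
    )
    # The live record was re-ingested (659 became 653); the deleted ones were not.
    conn.executemany(
        "INSERT INTO metabib.real_full_rec VALUES (?, ?, ?, ?)",
        [
            (1, 1, "653", "local term"),
            (2, 2, "659", "old term"),
            (3, 3, "659", "old term"),
            (4, 3, "245", "a title"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(count_orphaned_tag_rows(demo, "659"))
```

In the demo data only the deleted records still carry 659 rows, which mirrors the situation reported here: the MARC was changed for every record, but the index rows for deleted records stayed as they were.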

Questions:
Would a full re-indexing of the database update these rows in metabib.real_full_rec even though they are associated with deleted bibs? There are quicker shortcuts for removing these rows via SQL in lieu of re-indexing, but I'm wondering whether updates that touch deleted records should also update some of the related metabib tables.

Over time, and especially when performing many direct SQL cleanups and updates, there would appear to be a small optimization (for space, etc.) to be gained by cleaning up related metabib entries even for deleted records following large search-and-replace operations. So we'll just document this and follow up with manual cleanups of the metabib indexes, but we're curious about alternatives.

Mike Rylander (mrylander) wrote :

Evergreen won't ever re-ingest a deleted record. Once deleted the last state is essentially written in stone. If you undelete it, it will be re-ingested, though.

Also, note that updating a record to save a few bytes by removing a tag will actually end up using more space, for two reasons: the auditor table gets a new row, and the old row will still be in the table taking up space until that specific space is reclaimed by a subsequent vacuum. If space savings is the primary reason for including deleted records in the update, it's actually better and (in light of the above intended skipping of deleted records during re-ingest) more correct to leave deleted records out of the update.

As for cleaning up metabib.full_rec, changing the data there would be counter-productive, as reports expect to be able to pull information on "deleted" records from the super-simple record extracts view (in an "at the time of deletion" state).

--miker

Changed in evergreen:
status: New → Invalid