Re-indexing deleted records upon modification
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Invalid | Undecided | Unassigned |
Bug Description
EG 2.0.6
PG 8.4
I don't think this is a bug; it's more an inquiry about expected behavior.
We identified some MARC tags that needed removal or updating. In this case, we updated/moved some 659's to be 653's, and then completely removed the remaining 659's, impacting some 23,000 bibs. We didn't filter deleted records out of this update, since we wanted the updates to be complete (even the teeny tiny space savings count).
We then noticed about 3,500 rows where tag='659' remained in metabib.full_rec.
Questions:
Would a full re-indexing of the database update these rows in metabib?
Over time, and when performing many direct SQL cleanups/updates, there would appear to be some small optimization (for space, etc.) to be gained by cleaning up related metabib entries, even for deleted records, following large search-and-replace operations. We'll just document this and follow up with manual cleanups of the metabib indexes, but we're curious about alternatives.
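For reference, a query along these lines can count the leftover index rows that belong to deleted bibs — a sketch only, assuming the stock Evergreen schema, where metabib.full_rec's record column references biblio.record_entry.id and biblio.record_entry carries a deleted flag:

```sql
-- Count leftover 659 index entries whose bib record is deleted.
-- Assumes stock Evergreen schema: metabib.full_rec.record -> biblio.record_entry.id,
-- and biblio.record_entry.deleted is a boolean flag.
SELECT COUNT(*)
  FROM metabib.full_rec mfr
  JOIN biblio.record_entry bre ON bre.id = mfr.record
 WHERE mfr.tag = '659'
   AND bre.deleted;
```

Dropping the COUNT(*) in favor of mfr.* would list the individual rows instead.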
Evergreen won't ever re-ingest a deleted record. Once deleted, its last state is essentially written in stone. If you undelete it, it will be re-ingested, though.
Also, note that updating a record to save a few bytes by removing a tag will actually end up using more space, for two reasons: the auditor table gets a new row, and the old row will still be in the table taking up space until that specific space is reclaimed by a subsequent vacuum. If space savings is the primary reason for including deleted records in the update, it's actually better, and (in light of the intentional skipping of deleted records during re-ingest) more correct, to exclude them.
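To see the auditor-side cost concretely, one can count the history rows kept for a given bib — a sketch, assuming the stock audit table name auditor.biblio_record_entry_history and a hypothetical bib id of 123; every UPDATE to the live record adds a row here:

```sql
-- Each UPDATE to biblio.record_entry appends a row to its auditor
-- history table, so a "space-saving" edit actually grows the database.
-- 123 is a hypothetical bib record id.
SELECT COUNT(*)
  FROM auditor.biblio_record_entry_history
 WHERE id = 123;
```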
As for cleaning up metabib.full_rec, changing the data there would be counter-productive, as reports expect to be able to pull information on "deleted" records from the super-simple record extracts view (in an "at the time of deletion" state).
--miker