Make corporate author index less strict
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Evergreen | New | Wishlist | Unassigned | | |
Bug Description
In supported versions of Evergreen up to and including 2.8, the default author|corporate index is quite strict, at least for added entries. If I understand correctly, a MARC 710 field will be indexed only if that same field also has a subfield $e containing 'creator' or a subfield $4 containing 'aut' or 'cre' (see bug 1073217).
It seems to me that this is *too* strict in practice. I suspect most libraries will have a lot of 710 fields with no $e or $4 subfield. At Sitka, for example, fewer than 2% of our records with 710 fields have those subfields.
I propose that we replace the existing author|corporate index with something broader. One option would be to create a new MARC-based author|corporate index on 110 $abcdq and 710 $abcdq. (I don't see an obvious way to implement this as a MODS-based index without modifying the XSLT in config.
Changed in evergreen: | |
importance: | Undecided → Wishlist |
tags: | added: cataloging |
I'm pretty heavily against relaxing the stock index definition. Precision is important; running a very specific search against the corporate author index that includes tons of records with matching 710s that are *not* corporate authors (because, without the $e or $4 relator subfields, you have no idea what the 710 is actually trying to specify) would reflect poorly on Evergreen.
For contrasting data in the wild, we currently have:
2,636,880 records in total
894,030 records with a 710 field
465,000 records with a 710 field that _do_ have a $e or $4 subfield
460,689 records with a 710 field with a $e or $4 subfield with a value of 'pbl' or 'publisher'
3,647 records with a 710 field with a $e or $4 subfield with a value of 'aut', 'cre', or 'creator'
and 340,353 records with a 110 field
-- From queries like:
SELECT COUNT(*) FROM (
    SELECT DISTINCT(record)
    FROM metabib.full_rec
    WHERE tag = '710'
        AND subfield IN ('e', '4')
        AND value IN ('aut', 'cre', 'creator')
) AS x;
A stock "corporate author" index that indexes 710 as corporate author without using the relator subfields would pollute our index by taking it from the 344K valid entries to well more than twice the number of entries.
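For anyone who wants to check the same thing against their own data, the complementary count (records with a 710 but no relator subfield on any 710) could be sketched as below. It uses only the metabib.full_rec columns from the query above. As far as I can tell, full_rec does not record which 710 instance a subfield row came from, so this counts per record rather than per field.

```sql
-- Count records that have at least one 710 row
-- but no $e or $4 row on any 710 field.
SELECT COUNT(*) FROM (
    SELECT record
    FROM metabib.full_rec
    WHERE tag = '710'
    GROUP BY record
    HAVING SUM(CASE WHEN subfield IN ('e', '4') THEN 1 ELSE 0 END) = 0
) AS x;
```

With the figures listed above, this should come out near 894,030 - 465,000 = 429,030 for our data.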
I am, however, in favour of beefing up the documentation on how to change the stock index definitions, or create new ones, to suit local requirements. The 110/710 fields would make a good example if you wanted to demonstrate creating a "related corporate name" search index.
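As a rough starting point for such a demonstration, a local "related corporate name" index might look something like the sketch below. The index name, label, and XPath are illustrative assumptions, not a tested definition. It also assumes the 'marcxml' format and 'marc' namespace prefix that I believe the stock MARC-based index definitions use, so check them against config.xml_transform on your own system before relying on it.

```sql
-- Hypothetical local index definition: name, label, and XPath are assumptions.
-- Leaves the stock author|corporate index untouched.
INSERT INTO config.metabib_field
    (field_class, name, label, format, xpath, search_field, facet_field)
VALUES (
    'author',
    'corporate_related',          -- hypothetical index name
    'Related Corporate Name',     -- hypothetical label
    'marcxml',
    -- Subfields a, b, c, d, q of any 110 or 710 field, with no relator check.
    $$//marc:datafield[@tag='110' or @tag='710']/marc:subfield[contains('abcdq', @code)]$$,
    TRUE,
    FALSE
);
```

Existing records would presumably need to be reingested before the new index returns anything. It is also worth checking whether each matched subfield ends up as its own index entry or is joined per field.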