Make corporate author index less strict

Bug #1437069 reported by Jeff Davis
This bug affects 3 people
Affects    Status  Importance  Assigned to  Milestone
Evergreen  New     Wishlist    Unassigned

Bug Description

In supported versions of Evergreen up to and including 2.8, the default author|corporate index is quite strict, at least for added entries. If I understand correctly, a MARC 710 field will be indexed only if that same field also has a subfield $e containing 'creator' or a subfield $4 containing 'aut' or 'cre' (see bug 1073217).
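The rule described above can be sketched as a small predicate. This is only an illustration of the behaviour as I understand it, not Evergreen's actual implementation: the field representation (a tag plus a list of `(code, value)` pairs) and the function name `passes_strict_relator_check` are assumptions made for this example.

```python
# Hypothetical, simplified stand-in for a MARC field: a tag string plus a
# list of (subfield_code, value) pairs. Not Evergreen's real data model.

def passes_strict_relator_check(tag, subfields):
    """Return True if a 710 field carries a creator-type relator.

    Mirrors the rule as described in this report: $e containing 'creator',
    or $4 containing 'aut' or 'cre'.
    """
    if tag != "710":
        return False
    for code, value in subfields:
        text = value.strip().lower()
        if code == "e" and "creator" in text:
            return True
        if code == "4" and ("aut" in text or "cre" in text):
            return True
    return False


# A 710 with a creator relator code passes; one without a relator does not.
with_relator = [("a", "Example Society"), ("4", "cre")]
without_relator = [("a", "Example Society")]
print(passes_strict_relator_check("710", with_relator))
print(passes_strict_relator_check("710", without_relator))
```

Under this rule, a 710 with no $e or $4 at all is never indexed, which is the case the rest of this report is concerned with.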

It seems to me that this is *too* strict in practice. I suspect most libraries will have a lot of 710 tags with no $e or $4 subfield. At Sitka, for example, fewer than 2% of our records with 710s have those subfields.

I propose that we replace the existing author|corporate index with something broader. One option would be to create a new MARC-based author|corporate index on 110 $abcdq and 710 $abcdq. (I don't see an obvious way to implement this as a MODS-based index without modifying the XSLT in config.xml_transform.) Sitka could just do this locally, but I think this change would benefit the broader EG user community.
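To make the proposed extraction concrete, here is a rough sketch of what "index 110 $abcdq and 710 $abcdq" would select from a record. It is not an Evergreen index definition. The `(code, value)` field representation and the `corporate_heading` helper are hypothetical, and the sample heading is invented for illustration.

```python
# Tags and subfield codes taken from the proposal above.
INDEXED_TAGS = {"110", "710"}
INDEXED_CODES = set("abcdq")


def corporate_heading(tag, subfields):
    """Join the $a $b $c $d $q values of a 110/710 field, ignoring relators.

    Returns None for other tags or when no indexed subfield is present.
    """
    if tag not in INDEXED_TAGS:
        return None
    parts = [value.strip() for code, value in subfields if code in INDEXED_CODES]
    parts = [p for p in parts if p]
    return " ".join(parts) if parts else None


# Invented sample field: the $e relator is ignored rather than required.
field = [("a", "Example Library."), ("b", "Cataloguing Group"), ("e", "publisher")]
print(corporate_heading("710", field))
```

Note that this broader rule indexes the field whether or not a relator subfield is present, so it also picks up 710s whose relator says something other than creator.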

Tags: cataloging
Dan Scott (denials) wrote :

I'm pretty heavily against relaxing the stock index definition. Precision is important; running a very specific search against the corporate author index that includes tons of records with matching 710s that are *not* corporate authors (because, without the $e or $4 relator subfields, you have no idea what the 710 is actually trying to specify) would reflect poorly on Evergreen.

For contrasting data in the wild, we currently have:

2,636,880 records in total
  894,030 records with a 710 field
  465,000 records with a 710 field that _do_ have a $e or $4 subfield
  460,689 records with a 710 field with a $e or $4 subfield with a value of 'pbl' or 'publisher'
    3,647 records with a 710 field with a $e or $4 subfield with a value of 'aut', 'cre', or 'creator'

and 340,353 records with a 110 field

-- From queries like:
SELECT COUNT(*) FROM (
    SELECT DISTINCT(record)
    FROM metabib.full_rec
    WHERE tag = '710'
        AND subfield IN ('e', '4')
        AND value IN ('aut', 'cre', 'creator')
) AS x;

A stock "corporate author" index that indexes 710 as corporate author without using the relator subfields would pollute our index by taking it from the 344K valid entries to well more than twice the number of entries.
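The arithmetic behind that estimate can be checked with the counts quoted above. This sketch assumes the per-record counts can simply be added, treating one record as roughly one index entry and ignoring records that carry both a 110 and a 710, so the result is an approximation rather than an exact index size.

```python
# Counts quoted earlier in this comment.
records_with_110 = 340_353
records_with_710_creator = 3_647   # 710 with $e/$4 of 'aut', 'cre', or 'creator'
records_with_710_any = 894_030     # every record with a 710 field

# Current strict index: 110s plus creator-tagged 710s.
current_entries = records_with_110 + records_with_710_creator

# Relaxed index: 110s plus every 710, regardless of relator.
relaxed_entries = records_with_110 + records_with_710_any

growth = relaxed_entries / current_entries
print(f"{current_entries:,} -> {relaxed_entries:,} ({growth:.2f}x)")
```

Under those assumptions the index grows from about 344K to about 1.23M entries, roughly 3.6 times its current size.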

I am, however, in favour of beefing up the documentation on how to change the stock index definitions, or create new index definitions, to suit local requirements, and think the 110/710 field would make for a good example if you wanted to provide a demonstration of creating a "related corporate name" search index.

Don Butterworth (don-butterworth) wrote :

I would like to speak in favor of Jeff Davis' recommendation. The use of subfield "e Relator Term" in the 110 and 710 fields is "optional". See:
* Bibliographic Formats and Standards: Field 110 - Corporate Name
http://www.oclc.org/bibformats/en/1xx/110.html
* Bibliographic Formats and Standards: Field 710 - Corporate Name
http://www.oclc.org/bibformats/en/7xx/710.html

It has not been common practice in the North American cataloging community to include $e in any author field until the advent of RDA. Because of this practice, the vast, vast majority of 110 and 710 entries are genuine corporate authors and need to be included in the Author Browse index.

Further, I would argue that $e should not be included when compiling the Results Screen of the Author Browse Index. Example:

Wesley, John (202)
Wesley, John, editor (48)
Wesley, John, joint author (32)
Wesley, John, jt. author (8)

--- Prefer ---

Wesley, John (290)
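The merge suggested above amounts to dropping the relator term before counting. A minimal sketch, assuming headings arrive as display strings with the relator appended after a comma: the `RELATOR_TERMS` list and both helper names are hypothetical and cover only the terms in this example. A real fix would more likely leave $e out when the browse heading is built.

```python
from collections import Counter

# Assumed relator terms, limited to those in the example above.
RELATOR_TERMS = ("editor", "joint author", "jt. author")


def strip_relator(heading):
    """Remove a trailing ', <relator term>' from a browse heading."""
    for term in RELATOR_TERMS:
        suffix = ", " + term
        if heading.endswith(suffix):
            return heading[: -len(suffix)]
    return heading


def merge_headings(counts):
    """Sum record counts for headings that differ only by relator term."""
    merged = Counter()
    for heading, n in counts.items():
        merged[strip_relator(heading)] += n
    return dict(merged)


browse = {
    "Wesley, John": 202,
    "Wesley, John, editor": 48,
    "Wesley, John, joint author": 32,
    "Wesley, John, jt. author": 8,
}
print(merge_headings(browse))  # {'Wesley, John': 290}
```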

Changed in evergreen:
importance: Undecided → Wishlist
tags: added: cataloging