Comment 1 for bug 1437069

Dan Scott (denials) wrote :

I'm pretty heavily against relaxing the stock index definition. Precision is important; running a very specific search against the corporate author index that includes tons of records with matching 710s that are *not* corporate authors (because, without the $e or $4 relator subfields, you have no idea what the 710 is actually trying to specify) would reflect poorly on Evergreen.

For contrasting data in the wild, we currently have:

2,636,880 records in total
  894,030 records with a 710 field
  465,000 records with a 710 field that _do_ have a $e or $4 subfield
  460,689 records with a 710 field with a $e or $4 subfield with a value of 'pbl' or 'publisher'
    3,647 records with a 710 field with a $e or $4 subfield with a value of 'aut', 'cre', or 'creator'

and 340,353 records with a 110 field

-- From queries like:
SELECT COUNT(*) FROM (
    SELECT DISTINCT(record)
    FROM metabib.full_rec
    WHERE tag = '710'
        AND subfield IN ('e', '4')
        AND value IN ('aut', 'cre', 'creator')
) AS x;

A stock "corporate author" index that indexes every 710 as a corporate author, ignoring the relator subfields, would pollute our index: it would grow from roughly 344K valid entries (the 340,353 records with a 110 plus the 3,647 records with a creator-type 710) to well over twice that number.
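
To check that comparison on another data set, something like the following might do. It is only a sketch: it assumes the same metabib.full_rec layout as the query above (one row per subfield, with record, tag, subfield, and value columns), and the column aliases strict_records and relaxed_records are made up for illustration. Because it counts distinct records, the strict figure may come out slightly below the sum of the two separate counts if some records carry both a 110 and a creator-type 710.

```sql
-- Sketch only: compares the strict and relaxed definitions side by side.
SELECT
    -- Strict: any 110, plus 710s whose $e or $4 marks a creator role
    (SELECT COUNT(DISTINCT record)
     FROM metabib.full_rec
     WHERE tag = '110'
         OR (tag = '710'
             AND subfield IN ('e', '4')
             AND value IN ('aut', 'cre', 'creator'))
    ) AS strict_records,
    -- Relaxed: any 110 or 710, regardless of relator subfields
    (SELECT COUNT(DISTINCT record)
     FROM metabib.full_rec
     WHERE tag IN ('110', '710')
    ) AS relaxed_records;
```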

I am, however, in favour of beefing up the documentation on how to change the stock index definitions, or create new ones, to suit local requirements. I think the 110/710 fields would make a good example if you wanted to provide a demonstration of creating a "related corporate name" search index.