Name search field entries include "role" value, bloating metabib.author_title_field_entry

Bug #1277895 reported by Dan Scott
This bug affects 3 people
Affects   | Status  | Importance | Assigned to | Milestone
Evergreen | Triaged | Medium     | Unassigned  |

Bug Description

* Evergreen master as of 2014-02-08
* PostgreSQL 9.1 / 9.3
* Ubuntu 12.04 / Fedora 20

metabib.author_field_entry is currently populated with approximately twice as many rows as it should be, and some of those rows include extraneous text ("creator"). This is a problem for two reasons: it bloats the table and its indexes, and a search for "creator" will return wrong results and will likely cause a table scan.

The current definition of the author:personal index field is:

8 | author | personal | Personal Author | //mods32:mods/mods32:name[@type='personal' and mods32:role/mods32:roleTerm[text()='creator']]

The MODS32 name section looks like the following:

<name type="personal">
  <namePart>Girdlestone, Cuthbert Morton</namePart>
  <namePart type="date">1895-1975</namePart>
  <role><roleTerm authority="marcrelator" type="text">creator</roleTerm></role>
</name>
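The author:personal XPath ends at `mods32:name[...]`, so it selects the whole `<name>` element rather than its `<namePart>` children. The sketch below assumes Python's standard `xml.etree.ElementTree` and the un-namespaced sample from this report; it approximates the element's whitespace-normalised string value (roughly XPath's `normalize-space(string(.))`) to show how "creator" ends up in the indexed value. It illustrates XPath string-value semantics only and is not Evergreen's ingest code; `string_value` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Sample based on the MODS32 <name> element above, without the mods32
# namespace (as in the bug report) to keep the sketch short.
SAMPLE = """<name type="personal">
  <namePart>Girdlestone, Cuthbert Morton</namePart>
  <namePart type="date">1895-1975</namePart>
  <role><roleTerm authority="marcrelator" type="text">creator</roleTerm></role>
</name>"""


def string_value(elem: ET.Element) -> str:
    """Hypothetical helper: approximate normalize-space(string(elem)) by
    joining every non-blank descendant text node with single spaces."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


name = ET.fromstring(SAMPLE)
# Matching the whole <name> element pulls in the <roleTerm> text too.
print(string_value(name))  # Girdlestone, Cuthbert Morton 1895-1975 creator
```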

The resulting entries in metabib.author_field_entry look like this:

evergreen=# select id, source, field, value from metabib.author_field_entry WHERE value ~ 'Girdles';
 id | source | field | value
----+--------+-------+------------------------------------------------
  5 | 3 | 8 | Girdlestone, Cuthbert Morton 1895-1975
  6 | 3 | 8 | Girdlestone, Cuthbert Morton 1895-1975 creator
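To gauge the bloat, one could flag rows whose value is another row's value (same `source` and `field`) with the role term appended. The Python sketch below works on the two rows from the psql output above; `role_suffixed_duplicates` and the fixed `" creator"` suffix are hypothetical, and other role terms would need their own handling.

```python
from collections import defaultdict

# Rows copied from the psql output above: (id, source, field, value).
ROWS = [
    (5, 3, 8, "Girdlestone, Cuthbert Morton 1895-1975"),
    (6, 3, 8, "Girdlestone, Cuthbert Morton 1895-1975 creator"),
]

# Hypothetical: only the role term seen in this report is checked.
ROLE_SUFFIX = " creator"


def role_suffixed_duplicates(rows):
    """Return ids of rows whose value equals another row's value (same
    source and field) with ROLE_SUFFIX appended."""
    values = defaultdict(set)
    for _row_id, source, field, value in rows:
        values[(source, field)].add(value)
    flagged = []
    for row_id, source, field, value in rows:
        if value.endswith(ROLE_SUFFIX):
            base = value[: -len(ROLE_SUFFIX)]
            if base in values[(source, field)]:
                flagged.append(row_id)
    return flagged


print(role_suffixed_duplicates(ROWS))  # [6]
```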

So we need to index only /name/namePart and not /name/role.
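A minimal sketch of that idea, again assuming Python's `xml.etree.ElementTree` and the un-namespaced sample: join only the `<namePart>` children of the matched `<name>` and ignore `<role>`. `name_parts_value` is a hypothetical helper; with real namespaced MODS you would pass a prefix map, e.g. `findall("mods32:namePart", {"mods32": "http://www.loc.gov/mods/v3"})`. How Evergreen's index definition should express this (for example, whether several matched `namePart` nodes become one entry or several) is left open.

```python
import xml.etree.ElementTree as ET

# Same un-namespaced sample of the MODS32 <name> element.
SAMPLE = """<name type="personal">
  <namePart>Girdlestone, Cuthbert Morton</namePart>
  <namePart type="date">1895-1975</namePart>
  <role><roleTerm authority="marcrelator" type="text">creator</roleTerm></role>
</name>"""


def name_parts_value(name_elem: ET.Element) -> str:
    """Hypothetical helper: join the text of direct <namePart> children,
    skipping <role> entirely."""
    parts = [(p.text or "").strip() for p in name_elem.findall("namePart")]
    return " ".join(p for p in parts if p)


name = ET.fromstring(SAMPLE)
print(name_parts_value(name))  # Girdlestone, Cuthbert Morton 1895-1975
```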

Tags: search bloat
Ben Shum (bshum) changed in evergreen:
  status: New → Triaged
  importance: Undecided → Medium
Elaine Hardy (ehardy):
  tags: added: bloat search