non-filing indicators break title search relevance in non-English titles
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Fix Released | High | Unassigned |
Bug Description
* Evergreen 2.0.6 (reproduced in master)
* OpenSRF 2.0.0
* PostgreSQL 9.0
The default XPath expression for title indexing relies on the MODS32 titleInfo element, which captures the nonSort and title elements along with the whitespace between them.
This is generally fine for English titles, which have non-filing indicators (245 indicator 2) for titles like "The foobar" or "A foobar", where the space between the non-filing article that goes into nonSort and the filing remainder that goes into title is expected. The resulting value in metabib.
However, in French and other languages, it is typical for non-filing indicators to be used for titles like "l'Histoire" - in which case, nonSort gets "l'" and title gets "Histoire", and the resulting value in metabib.
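The effect can be reproduced outside the database with a small Python sketch. The MODS fragment, the function name `title_info_text`, and the whitespace normalization step are assumptions made for illustration; this is not Evergreen's actual indexing code.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical MODS fragment for a 245 field whose non-filing indicator is 2.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <nonSort>l'</nonSort>
    <title>Histoire</title>
  </titleInfo>
</mods>"""


def title_info_text(mods_xml):
    """Return the whitespace-normalized string value of titleInfo."""
    root = ET.fromstring(mods_xml)
    title_info = root.find(f"{{{MODS_NS}}}titleInfo")
    # itertext() yields the element text plus the whitespace between elements.
    raw = "".join(title_info.itertext())
    # Assumption: runs of whitespace collapse to a single space before indexing.
    return " ".join(raw.split())


print(title_info_text(SAMPLE_MODS))  # prints: l' Histoire
```

The whitespace between nonSort and title survives as a space, so the indexed value no longer matches a search for "l'Histoire".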
One step towards a fix is to extract the nodeset for the discrete child elements of titleInfo instead of titleInfo itself, avoiding the whitespace-only text nodes:
UPDATE config.
However, then the default joiner that we pass to biblio.
Consequently, we can add a condition to the joiner clause in biblio.
IF raw_text IS NOT NULL AND idx.field_class <> 'title' THEN
This still isn't perfect for relevance, as the keyword field entries still get "l' Histoire" - but it vastly improves title search for our French titles.
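The joiner condition can be sketched in Python as follows. The function `join_field_values` is a hypothetical stand-in loosely modeled on the loop the condition belongs to, not the actual PL/pgSQL, and the single-space default joiner is an assumption.

```python
def join_field_values(values, field_class, joiner=" "):
    """Concatenate extracted node values, skipping the joiner for title fields."""
    raw_text = None
    for value in values:
        # Mirrors the proposed condition: no joiner between title nodes.
        if raw_text is not None and field_class != "title":
            raw_text += joiner
        raw_text = (raw_text or "") + value
    return raw_text


# Values as they would come from the discrete nonSort and title elements.
node_values = ["l'", "Histoire"]

print(join_field_values(node_values, "title"))    # prints: l'Histoire
print(join_field_values(node_values, "keyword"))  # prints: l' Histoire
```

With the joiner skipped for title fields, any space in an English title such as "The foobar" has to come from the extracted values themselves; the sketch does not address that case.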
For an example MARC record to test with, see http://
Changed in evergreen: | |
importance: | Undecided → High |
Changed in evergreen: | |
milestone: | none → 2.2.0 |
Changed in evergreen: | |
status: | New → In Progress |
assignee: | nobody → Dan Scott (denials) |
Changed in evergreen: | |
assignee: | Dan Scott (denials) → nobody |
Changed in evergreen: | |
milestone: | 2.2.0alpha1 → 2.2.0alpha2 |
Changed in evergreen: | |
status: | Fix Committed → Fix Released |
I went with a different approach: modify MODS32 to include a titleNonfiling element that ignores the non-filing indicators and gives you the full title as a single, unmodified string.
Repo: working fix-nonfiling- titles
Branch: user/dbs/
Note: the upgrade script does not currently include a "reingest titles that have non-filing indicators and apostrophes" upgrade action.
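As a rough illustration of the titleNonfiling idea, the Python sketch below splits a 245 $a value by its non-filing indicator and also keeps the unsplit string. The function name `split_title`, the dictionary layout, and the exact split rule are assumptions for illustration; the real change lives in the MODS32 stylesheet.

```python
def split_title(subfield_a, nonfiling_chars):
    """Split a title by its non-filing character count and keep the whole string."""
    return {
        # Leading characters skipped for filing, e.g. "l'" or "The ".
        "nonSort": subfield_a[:nonfiling_chars],
        # Remainder of the title used for filing.
        "title": subfield_a[nonfiling_chars:],
        # Whole title, ignoring the non-filing indicator entirely.
        "titleNonfiling": subfield_a,
    }


parts = split_title("l'Histoire", 2)
print(parts["titleNonfiling"])  # prints: l'Histoire
```

Indexing the unsplit value avoids any dependence on how nonSort and title are later joined.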