Incorrect merge of title prefix in search engine

Bug #422355 reported by solrize
This bug report is a duplicate of: Bug #262265: Merge Title Prefix into Title Field.
This bug affects 1 person
Affects: Open Library
Status: Confirmed
Importance: High
Assigned to: solrize

Bug Description

http://openlibrary.org/search?q=thebest finds a lot of wrongly indexed titles, e.g. "thebest of leigh brackett", because the title_prefix "the" has been concatenated with "best" without a separating space. There are similar errors for other prefixes. I thought we were not supposed to insert a space because of prefixes like L-apostrophe ("L'"), which attach directly to the next word. Is there a precise rule we are supposed to use?

solrize (solrize)
description: updated
solrize (solrize)
Changed in openlibrary:
assignee: nobody → solrize (solrize)
importance: Undecided → High
status: New → Confirmed
solrize (solrize) wrote :

It looks like we rely on an explicit trailing blank in title prefixes like "The ". That is easy to get wrong. I think inserting a space should be the default, and we should have a way to suppress it, perhaps an alternate prefix field. Are there any examples other than L-apostrophe (as in "L'Hotel du Nord") that shouldn't get a space? Maybe I should insert the space for every prefix except that one.
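
A minimal sketch of that rule in Python, to make the proposal concrete. The function name `merge_title` and the `NO_SPACE_ENDINGS` tuple are made up for illustration and are not taken from the Open Library code. It assumes the only prefixes that attach directly are ones ending in an apostrophe or hyphen, which may turn out to be incomplete.

```python
# Hypothetical helper, not the actual Open Library indexer code.
# Assumption: prefixes ending in these characters (e.g. "L'") attach
# directly to the title; every other prefix gets exactly one space.
NO_SPACE_ENDINGS = ("'", "\u2019", "-")


def merge_title(prefix, title):
    """Join a title prefix and a title into one display/search string."""
    prefix = (prefix or "").strip()
    title = (title or "").lstrip()
    if not prefix:
        return title
    if prefix.endswith(NO_SPACE_ENDINGS):
        # Elided prefix such as "L'": no space before the title.
        return prefix + title
    # Default: insert a single space, whether or not the stored
    # prefix already carried a trailing blank.
    return prefix + " " + title


print(merge_title("The", "best of leigh brackett"))
print(merge_title("L'", "Hotel du Nord"))
```

Stripping the prefix first means "The" and "The " give the same result, so the stored data no longer has to carry the trailing blank.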

Edward Betts (edwardbetts) wrote :

Please read https://bugs.launchpad.net/openlibrary/+bug/262265; it contains other examples.

Maybe we should merge these bugs.

solrize (solrize) wrote :

Hmm, Karen writes:

a list of title prefixes (with language, but unfortunately not using the
ISO language code!)

http://www.loc.gov/marc/bibliographic/bdapndxf.html

Should we just strip out those prefixes automatically?
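
For discussion, here is a rough sketch of what automatic stripping could look like. The `ARTICLES` tuple is a tiny illustrative subset I typed in by hand, not the list from the LoC page above, and `strip_initial_article` is a hypothetical name. It also ignores the language of the record, so it would wrongly strip a leading word that is an article in one language but a real word in another.

```python
# Illustrative subset only; the real list would come from the LoC
# appendix and should be keyed by language.
ARTICLES = ("the", "a", "an", "l'", "le", "la", "les")


def strip_initial_article(title):
    """Return the title without a leading article, if one is present."""
    lowered = title.lower()
    # Try longer articles first so "les" is tested before "le".
    for article in sorted(ARTICLES, key=len, reverse=True):
        if article.endswith("'"):
            # Elided article: attached directly to the next word.
            if lowered.startswith(article) and len(title) > len(article):
                return title[len(article):]
        elif lowered.startswith(article + " "):
            # Ordinary article: must be followed by a space.
            return title[len(article) + 1:].lstrip()
    return title


print(strip_initial_article("The Best of Leigh Brackett"))
print(strip_initial_article("L'Hotel du Nord"))
```

Requiring the trailing space for ordinary articles keeps titles like "Theory of Games" intact.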
