strip accent characters in solr index

Bug #540866 reported by Anand Chitipothu
This bug affects 1 person
Affects: Open Library
Status: New
Importance: High
Assigned to: Edward Betts

Bug Description

Most international records from the Library of Congress have titles and author names in a romanized form with accented characters. People cannot find these records through search unless we add accent-less versions of the names and titles to the solr index.

For example: http://upstream.openlibrary.org/authors/OL617A/Śrī_Śrī
Written in English, this author's name becomes "Sri Sri". I knew that an entry for this author existed in OL, and it still took me more than an hour to find the record.
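
A minimal sketch of how the accent-less form could be derived before indexing, assuming Python and the standard `unicodedata` module. The function names `strip_accents` and `searchable_forms` are hypothetical and are not taken from the Open Library code; this only illustrates the idea of indexing both forms.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks removed (hypothetical helper)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def searchable_forms(name):
    """Return the original name plus its accent-less form, if different."""
    forms = [name]
    plain = strip_accents(name)
    if plain != name:
        forms.append(plain)
    return forms


print(searchable_forms("Śrī Śrī"))
```

With this sketch, "Śrī Śrī" yields both the original form and "Sri Sri", so either spelling would match if both were indexed. Letters that have no Unicode decomposition, such as "ø" or "ł", pass through unchanged and would need separate handling.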

Changed in openlibrary:
milestone: none → upstream
assignee: nobody → Edward Betts (edwardbetts)
importance: Undecided → High
summary: - strip ascent characters in solr index
+ strip accent characters in solr index
description: updated
Edward Betts (edwardbetts) wrote :

http://openlibrary.org/authors/OL617A has "Sri Sri" as an alternative name, so it matches. We need another example

Anand Chitipothu (anandology) wrote : Re: [Bug 540866] Re: strip accent characters in solr index

On 23-Jun-10, at 10:29 PM, Edward Betts wrote:

> http://openlibrary.org/authors/OL617A has "Sri Sri" as an alternative
> name, so it matches. We need another example

http://openlibrary.org/authors/OL204764A/Yarraṃśeṭṭi_Śāyi

Edward Betts (edwardbetts) wrote :

We need to rebuild the solr index to fix this bug.
