Comment 2 for bug 5417

Stuart Bishop (stub) wrote: Re: [Bug 5417] Find accented forms when searching (e.g. Carlos Perelló Marín with "perello")

Christian Reis wrote:
> Public bug report changed:
> https://launchpad.net/malone/bugs/5417
>
> Comment:
> I think we might have an easier way out: just smash the fti indexes to
> contain only non-accented versions of the characters, and then
> convert the query strings provided to the fti helper. I'm guessing,
> though, and Stuart, as usual, will have a better idea.

I think we would need to smash the values going into the indexes. tsearch2
is designed to work in only a single locale and encoding, so it isn't going
to be any help to us here.
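The key point is that the same smashing has to happen on both sides: once when values go into the index, and again on the query string. A minimal sketch, assuming a simple NFKD-decompose-and-drop-combining-marks approach; `smash`, `index_terms` and `matches` are hypothetical helpers for illustration, not the actual `canonical.encoding.ascii_smash()` implementation or the real fti code:

```python
import unicodedata


def smash(text: str) -> str:
    """Hypothetical stand-in for ascii_smash(): decompose accented
    characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text: str) -> set:
    """Smash and lowercase a value before it goes into the (toy) index."""
    return set(smash(text).lower().split())


def matches(query: str, terms: set) -> bool:
    """Apply the same smashing to the query so both sides agree."""
    return all(word in terms for word in smash(query).lower().split())


terms = index_terms("Carlos Perelló Marín")
print(matches("perello", terms))  # True
print(matches("Perelló", terms))  # True
```

If only one side were smashed, "Perelló" typed by a user would no longer match a smashed index entry, which is why the index values and the query must go through the same function.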

We already have code to do the deaccenting:
canonical.encoding.ascii_smash() handles the European Latin-based character
sets. You're still stuffed with character sets that don't have an ASCII
equivalent, such as Coptic, Greek or most of the Asian languages.
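To illustrate the limitation, here is a small sketch using the same hypothetical decompose-and-strip approach (an assumption, not necessarily how `ascii_smash()` works internally). Stripping accents turns Latin text into ASCII, but Greek letters stay Greek and CJK characters are untouched, because there is no accent to remove:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical deaccenting helper: NFKD-decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for sample in ["Marín", "Ελληνικά", "日本語"]:
    smashed = strip_accents(sample)
    status = "ASCII" if smashed.isascii() else "not ASCII"
    print(sample, "->", smashed, status)
```

Only "Marín" comes out as ASCII ("Marin"); the Greek sample loses its accent but remains non-ASCII ("Ελληνικα"), and the Japanese sample is unchanged.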

If we want to proceed, the logic of canonical.encoding.ascii_smash() needs to
be brought into the database environment by embedding it in the ftq() function.
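As a rough sketch of what embedding the smashing into the query side might look like, assuming ftq() turns a free-text query into tsquery syntax by ANDing its terms together: `ftq_sketch` and `smash` below are hypothetical names, and the real ftq() may tokenize and combine terms differently. The same `smash` would also have to be applied to the values fed into the full-text indexes.

```python
import re
import unicodedata


def smash(text: str) -> str:
    """Hypothetical stand-in for ascii_smash(): strip combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ftq_sketch(query: str) -> str:
    """Hypothetical ftq()-like helper: smash accents, lowercase, keep word
    characters only, and AND the terms together in tsquery syntax."""
    words = re.findall(r"\w+", smash(query).lower())
    return " & ".join(words)


print(ftq_sketch("Perelló Marín"))  # perello & marin
```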

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/