Comment 6 for bug 5417

Christian Reis (kiko) wrote : Re: [Bug 5417] Find accented forms when searching (e.g. Carlos Perelló Marín with "perello")

On Wed, Dec 07, 2005 at 02:03:07AM -0000, Stuart Bishop wrote:
> I think we would need to smash the values going into the indexes - tsearch2
> is designed to only work in a single locale and encoding so it isn't going
> to be any help to us here.

Yes, that's what I had intended to say.

The Portuguese use cases are simple -- every accented character is
transliterated to its unaccented form, so àéíõü would become aeiou.
This makes searches work a lot better, because people often omit the
accents when typing, for various reasons.
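
As a rough sketch of that generic case (this is not the actual
ascii_smash code; strip_accents is a made-up name for illustration),
one could decompose each character with Python's unicodedata module and
drop the combining marks:

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed.

    Hypothetical helper, for illustration only.
    """
    # NFD splits e.g. "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so those are dropped.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("àéíõü"))                 # aeiou
print(strip_accents("Carlos Perelló Marín"))  # Carlos Perello Marin
```

The same function would have to be applied both to the indexed values
and to the search terms, so that "perello" and "Perelló" end up
comparable.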

Looking at the ascii_smash code, it is pretty easy to fix specific cases
where it gets things wrong -- just add an exception to the mapping. I
suspect it does the right thing for most of the cases, so perhaps we
could proceed with using this, and have people tell us when we get it
wrong so we can adjust.
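
To illustrate what "add an exception to the mapping" could look like,
here is a minimal sketch. It assumes a generic accent-stripping
fallback plus a small override table; EXCEPTIONS, smash and the entries
shown are hypothetical and are not taken from ascii_smash itself:

```python
import unicodedata

# Hypothetical exception table: characters the generic fallback below
# would leave untouched, because they do not decompose into a base
# letter plus a combining mark.
EXCEPTIONS = {
    "ß": "ss",
    "ø": "o",
    "Ø": "O",
    "æ": "ae",
    "Æ": "AE",
}


def smash(text):
    """Map text towards ASCII, consulting EXCEPTIONS first."""
    out = []
    for ch in text:
        if ch in EXCEPTIONS:
            # A reported wrong case is fixed by adding an entry here.
            out.append(EXCEPTIONS[ch])
            continue
        # Generic case: decompose and drop the combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    return "".join(out)


print(smash("Perelló"))  # Perello
print(smash("Straße"))   # Strasse
```

With this shape, adjusting a case someone reports is a one-line change
to the table rather than a change to the logic.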

> If we want to proceed, canonical.encoding.ascii_smash() needs to be brought
> into the database environment by embedding the logic into the ftq() method.

That sounds like a plan. Would it involve copying the code, or could we
still work from a single codebase?
--
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3376 0125