Comment 1 for bug 383102

Robert Collins (lifeless) wrote : Re: [Bug 383102] [NEW] bzr search can't find non-ascii text

 status triaged
 importance medium

> As you can see, search tries to match Unicode text against a plain file
> (cp1251-encoded). It could never match.

So, if the file content were UTF-8, it would be fine. Is there some way
bzr-search can determine the encoding of a file at the time it indexes
it? I know we can use the BOM for Unicode text files. Perhaps there is a
library out there that does a good job of this.
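As a rough illustration of the detection order suggested above, here is a minimal sketch using only the Python standard library. It is not bzr-search code: `guess_encoding` and `decode_content` are hypothetical names, and the `cp1251` fallback is an assumption taken from this bug report. It trusts a BOM if present, then tries strict UTF-8, then falls back to a configured legacy encoding. A detection library (chardet, for example) could replace the last step.

```python
import codecs

# UTF-32 BOMs are checked before UTF-16 ones because
# BOM_UTF32_LE (FF FE 00 00) begins with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data, fallback="cp1251"):
    """Hypothetical helper: guess the encoding of a file's raw bytes.

    Returns (encoding, number_of_bom_bytes_to_skip).
    1. A leading Unicode BOM is trusted.
    2. Otherwise, bytes that decode cleanly as UTF-8 are assumed UTF-8.
    3. Otherwise `fallback` is used (an assumption: there is no
       universally correct guess for legacy 8-bit encodings).
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    try:
        data.decode("utf-8")
        return "utf-8", 0
    except UnicodeDecodeError:
        return fallback, 0


def decode_content(data, fallback="cp1251"):
    """Decode raw bytes to str, stripping any BOM that was detected."""
    encoding, skip = guess_encoding(data, fallback)
    return data[skip:].decode(encoding)
```

For example, `decode_content("привет".encode("cp1251"))` returns `"привет"`, because those bytes are not valid UTF-8 and the fallback kicks in. The weak point is step 2: short cp1251 text can occasionally happen to be valid UTF-8, so this is a heuristic rather than a guarantee.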

bzr-search needs a fixed index it can look up quickly, so it needs to
generate Unicode terms from the files it indexes. To date it's been
pretty simplistic and assumed all content was UTF-8, which is clearly not
true :).
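To make the "Unicode terms" point concrete, here is a minimal sketch of the indexing side, again not taken from bzr-search. `terms_from_bytes` and `build_index` are hypothetical names, the UTF-8-then-`cp1251` decoding rule is an assumption for illustration, and the "index" is just an in-memory dict. The key idea is that the bytes are decoded *before* tokenizing, so a Unicode query term can match a cp1251 file.

```python
import re

# For str patterns, \w matches Unicode word characters in Python 3,
# so Cyrillic and accented Latin words both become terms.
_WORD = re.compile(r"\w+")


def terms_from_bytes(data, fallback_encoding="cp1251"):
    """Hypothetical helper: turn raw file bytes into a set of Unicode terms.

    Assumes UTF-8 if the bytes decode cleanly, otherwise decodes with
    `fallback_encoding` (an assumption), replacing undecodable bytes.
    Terms are case-folded so lookups are case-insensitive.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode(fallback_encoding, errors="replace")
    return {m.group(0).casefold() for m in _WORD.finditer(text)}


def build_index(files):
    """Map each Unicode term to the set of file names that contain it."""
    index = {}
    for name, data in files.items():
        for term in terms_from_bytes(data):
            index.setdefault(term, set()).add(name)
    return index
```

Usage: with `files = {"a.txt": "Привет мир".encode("cp1251"), "b.txt": b"hello world"}`, `build_index(files)["привет"]` is `{"a.txt"}`. Under a UTF-8-only assumption, `a.txt` would fail to decode and its terms would never be indexed.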

-ROb