bzr search plugin


Comment 1 for bug 383102

Robert Collins (lifeless) wrote on 2009-06-04: Re: [Bug 383102] [NEW] bzr search can't find non-ascii text

status triaged
importance medium

> As you could see search tries to search unicode text in the plain file
> (cp1251 encoded). It's never could match.

So, if the file content was utf8 it would be fine. Is there some way
bzr-search can determine the encoding of the file at the time it indexes
it? I know we can use the BOM for unicode text files. Perhaps there is a
library out there that can do a good job.
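As a rough illustration of the BOM idea only (not bzr-search code), the check could look like the sketch below. The helper name `sniff_bom` is hypothetical; the BOM constants come from Python's standard `codecs` module. It says nothing about files without a BOM, which is the case for the cp1251 file in this report.

```python
import codecs

# Order matters: the UTF-32 LE BOM begins with the same two bytes as the
# UTF-16 LE BOM, so the UTF-32 entries must be checked first.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if `data` starts with a known BOM, else None.

    Hypothetical helper: `data` is the raw bytes of the file being indexed.
    """
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None
```

A `None` result means only that no BOM was present, so a legacy 8-bit encoding would still need some other hint or a detection library.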

bzr-search needs a fixed index it can look things up in quickly, so it
needs to generate unicode terms from the files it indexes. To date it's
been pretty simplistic and assumed all content was utf8, which is
clearly not true :).
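To make the failure mode concrete, here is a minimal sketch of decoding with a fallback before generating terms. It is not how bzr-search is implemented: the function names, the `fallback_encodings` parameter, and the choice of cp1251 as a fallback are all assumptions for illustration, and the sketch is written for Python 3.

```python
import re


def decode_for_indexing(data, fallback_encodings=("cp1251",)):
    """Decode raw file bytes to text, trying strict utf8 first.

    `fallback_encodings` is a hypothetical, caller-supplied guess. This is
    not real detection: an 8-bit codec such as cp1251 accepts almost any
    byte sequence, so a wrong guess produces wrong terms silently.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallback_encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep indexing, replacing undecodable bytes.
    return data.decode("utf-8", errors="replace")


def terms(text):
    """Split text into lowercase unicode word terms."""
    return re.findall(r"\w+", text.lower())
```

With this, the same query term is produced from a utf8 file and from a cp1251 file holding the same text, which is what an index of unicode terms needs.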

-Rob