Comment 5 for bug 383102

Alexander Belchenko (bialix) wrote : Re: [Bug 383102] [NEW] bzr search can't find non-ascii text

Robert Collins wrote:
> On Thu, 2009-06-04 at 09:01 +0000, Alexander Belchenko wrote:
>> I think bzr-search should use files content "as is", without decoding it
>> to unicode. Because there is currently no way to absolutely correctly
>> guess encoding and bzr has no file properties to attach this sort of
>> info to the committed content.
>
> This works too; OTOH it would be good to handle things like png with
> metadata headers more sensibly.

Well, searching for non-ASCII text with my encoding patch *does not* work
for me. I don't know how to inspect your raw indices, but I can provide
a test branch with Russian text.

>> In qbzr we're using special command-line option --encoding to specify
>> file content encoding for diff/annotate. This approach works well.
>> Default encoding is utf-8 there.
>
>> I suggest providing a similar option for the search command, e.g.
>>
>> bzr search тест --encoding cp1251
>>
>> so this encoding argument will be used to encode command-line (unicode)
>> argument тест to some specific encoding and then used verbatim to search.
>>
>> Does that make sense to you?
>
> It certainly works better with bzr's lack of knowledge of file
> encodings. But how will bzr-search know how to output the file's
> contents sensibly? (For the hit preview).

There is only one way today: show it "as is". This is how annotate, cat
and diff work today, and people don't complain. I don't see a way
to make it better without file properties.
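
To make the --encoding idea quoted above concrete, here is a minimal
sketch in Python. It is not bzr-search code: the function names
encode_term and search_raw are hypothetical, and it assumes the search
runs over raw file bytes rather than over an index. The term from the
command line (unicode) is encoded once with the given encoding and then
matched verbatim against the undecoded content.

```python
def encode_term(term, encoding="utf-8"):
    """Encode a unicode search term into the bytes expected in the file."""
    return term.encode(encoding)


def search_raw(content, term, encoding="utf-8"):
    """Return the byte offsets of every occurrence of term in raw content.

    content is never decoded; only the search term is encoded.
    """
    needle = encode_term(term, encoding)
    if not needle:
        return []
    hits = []
    start = content.find(needle)
    while start != -1:
        hits.append(start)
        start = content.find(needle, start + 1)
    return hits


# File content as committed: cp1251 bytes, kept "as is".
content = "это тест\n".encode("cp1251")

print(search_raw(content, "тест", "cp1251"))  # [4]
print(search_raw(content, "тест", "utf-8"))   # []
```

With the right encoding the term is found at byte offset 4; with the
default utf-8 the same term produces different bytes and nothing
matches, which is why the user would have to pass the encoding
explicitly.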