Hi Jim,

Thanks for looking into this. I think there are more people who have run 
into this, so there are probably a few dupes of this report.

On 06/01/12 18:41, Jim Cresswell wrote:
> Public bug reported:
>
> Files containing valid UTF-8 characters not in the ASCII character set
> crash the Keyword Plugin on processing, exiting with UnicodeDecodeError
> error. An example file is attached. An example traceback is
>
> bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode
> byte 0xe2 in position 4964: ordinal not in range(128)
>
> Traceback (most recent call last):
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
>      return the_callable(*args, **kwargs)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
>      ret = run(*run_argv)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
>      return self.run(**all_cmd_args)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
>      return self._operation.run_simple(*args, **kwargs)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
>      return _do_with_cleanups(
>    File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
>      result = func(*args, **kwargs)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
>      accelerator_tree, hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
>      hardlink=hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
>      accelerator_tree=accelerator_tree, hardlink=hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
>      delta_from_tree=delta_from_tree)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
>      delta_from_tree)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
>      accelerator_tree, hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
>      ContentFilterContext(tree_path, tree))
>    File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
>      chunks = filter.writer(chunks, context)
>    File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
>      return _kw_expander(chunks, context)
>    File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
>      encoder=encoder)]
>    File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
>      return result + rest
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
>
> Currently I'm working around the error by altering expand_keywords to
> skip the file on error as follows:
It should be possible to actually do something sensible in this case, 
rather than skipping the keyword substitution altogether.

I think we should probably just treat the file as a byte stream, and
insert the expanded keyword values encoded as UTF-8 (for lack of
something better).

The best way to fix this is probably to add a test that reproduces the
problematic scenario, and then make that test pass. Perhaps the fix is
as simple as adding .encode('utf-8') in the right few places, but I'm
not sure. Let me know if I can give you any more pointers.
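
To make the byte-stream idea a bit more concrete, here is a rough sketch.
It is not the plugin's actual code: the function name, the regex and the
"$Name: value $" output format are all made up for illustration, and it
needs a newer Python than the 2.4 in your traceback because of the b""
literals. The point is only that the file content is never decoded, and
unicode keyword values are encoded before they are joined with it.

```python
import re

# Hypothetical pattern (an assumption, not taken from the plugin):
# matches $Keyword$ or an already expanded $Keyword: old value $ in raw bytes.
_KW_RE = re.compile(br"\$([A-Za-z_]+)(?::[^$\n]*)?\$")


def expand_keywords_bytes(content, values, encoding="utf-8"):
    """Expand keywords in a byte string without decoding the content.

    `values` maps keyword names to replacement values; unicode values are
    encoded with `encoding` so that only byte strings are concatenated.
    """
    def _replace(match):
        name = match.group(1).decode("ascii")
        if name not in values:
            # Unknown keyword: leave the original bytes untouched.
            return match.group(0)
        value = values[name]
        if not isinstance(value, bytes):
            value = value.encode(encoding)
        return b"$" + match.group(1) + b": " + value + b" $"

    return _KW_RE.sub(_replace, content)
```

A regression test could then feed it content containing a non-ASCII
character such as U+2019, whose UTF-8 encoding starts with the 0xe2 byte
from your traceback, together with a non-ASCII keyword value, and check
that the result is still a valid UTF-8 byte string.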

Cheers,

Jelmer