Hi Jim,

Thanks for looking into this. I think more people have run into this,
so there are probably a few dupes of this report.

On 06/01/12 18:41, Jim Cresswell wrote:
> Public bug reported:
>
> Files containing valid UTF-8 characters not in the ASCII character set
> crash the Keyword Plugin on processing, exiting with UnicodeDecodeError
> error. An example file is attached. An example traceback is
>
> bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode
> byte 0xe2 in position 4964: ordinal not in range(128)
>
> Traceback (most recent call last):
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
>     return the_callable(*args, **kwargs)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
>     ret = run(*run_argv)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
>     return self.run(**all_cmd_args)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
>     return self._operation.run_simple(*args, **kwargs)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
>     return _do_with_cleanups(
>   File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
>     result = func(*args, **kwargs)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
>     accelerator_tree, hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
>     hardlink=hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
>     accelerator_tree=accelerator_tree, hardlink=hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
>     delta_from_tree=delta_from_tree)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
>     delta_from_tree)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
>     accelerator_tree, hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
>     ContentFilterContext(tree_path, tree))
>   File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
>     chunks = filter.writer(chunks, context)
>   File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
>     return _kw_expander(chunks, context)
>   File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
>     encoder=encoder)]
>   File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
>     return result + rest
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
>
> Currently I'm working around the error by altering expand_keywords to
> skip the file on error as follows:

It should be possible to do something sensible in this case, rather
than skipping the keyword substitution altogether. I think we should
probably just treat the file as a byte stream and insert the keyword
expansion encoded as UTF-8 (for lack of something better).

The best way to go about fixing this is probably to add a test that
reproduces the problematic scenario, and then fix that. Perhaps the fix
is as simple as adding .encode('utf-8') in a few (right) places, but I'm
not sure.

Let me know if I can give you any more pointers.

Cheers,

Jelmer