Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Bazaar Keywords Plugin | New | Undecided | Unassigned | | |
Bug Description
Files containing valid UTF-8 characters outside the ASCII character set crash the Keyword Plugin during processing, which exits with a UnicodeDecodeError. An example file is attached. An example traceback is:
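bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
    return _do_with_cleanups(
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
    hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
    delta_from_tree=delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
    delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
    ContentFilterContext(tree_path, tree))
  File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
    chunks = filter.writer(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
    return _kw_expander(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
    encoder=encoder)]
  File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
    return result + rest
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)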
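For context, this is the error Python 2 raises when a unicode object is concatenated with a byte string that holds non-ASCII bytes: the byte string is implicitly decoded with the ASCII codec first. The minimal sketch below makes that implicit step explicit so that it also runs on Python 3. The sample strings and the helper name `concat_like_python2` are illustrative assumptions, not taken from the plugin, and this report does not establish which operand of `result + rest` is unicode in the real failure.

```python
# Hypothetical stand-in for file content: UTF-8 bytes containing an en dash
# (U+2013), whose first encoded byte is 0xe2, the byte named in the traceback.
rest = u"a \u2013 b".encode("utf-8")

# Hypothetical stand-in for an already expanded keyword held as text.
result = u"$Date: 2012-01-06 $ "


def concat_like_python2(text, raw_bytes):
    """Mimic Python 2's `unicode + str`, which decodes the str as ASCII."""
    return text + raw_bytes.decode("ascii")


try:
    concat_like_python2(result, rest)
    failure = None
except UnicodeDecodeError as err:
    # Keep a reference so the exception can be inspected after the block.
    failure = err

print(failure)
```

Decoding the same bytes as UTF-8 instead of ASCII succeeds, which is why the file itself can be perfectly valid while the concatenation still fails.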
Currently I'm working around the error by altering expand_keywords to skip the file on error as follows:
def expand_keywords(s, keyword_dicts, context=None, encoder=None, style=None):
    """Replace raw style keywords with another style in a string.

    Note: If the keyword is already in the expanded style, the value is
    not replaced.

    :param s: the string
    :param keyword_dicts: an iterable of keyword dictionaries. If values
        are callables, they are executed to find the real value.
    :param context: the parameter to pass to callable values
    :param style: the style of expansion to use or None for the default
    :return: the string with keywords expanded
    """
    _expanded_style = _keyword_
    result = ''
    rest = s
    while (True):
        match = _KW_RAW_
        if not match:
            break
        result += rest[:match.
        keyword = match.group(1)
        expansion = _get_from_
        if callable(
            try:
                expansion = expansion(context)
            except AttributeError, err:
                if 'error' in debug.debug_flags:
                    trace.
                        expansion, keyword, err)
                expansion = "(evaluation error)"
        if expansion is None:
            # Unknown expansion - leave as is
            result += match.group(0)
            rest = rest[match.end():]
            continue
        if '$' in expansion:
            # Expansion is not safe to be collapsed later
            expansion = "(value unsafe to expand)"
        if encoder is not None:
            expansion = encoder(expansion)
        params = {'name': keyword, 'value': expansion}
        result += _expanded_style % params
        rest = rest[match.end():]
    """ BODGE: Handle UTF-8 characters not included in ASCII by skipping the file """
    try:
        finalResult = result + rest
    except:
        e = sys.exc_info()[1]
        print 'It all went wrong: %s' % e
        print 'skipping file\n'
        return s
    else:
        return finalResult
Obviously this is a pretty hacky 'solution'. I am very new to Python and would be grateful for any advice :) The environment is 64-bit Red Hat.
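One possible direction for a less drastic workaround, sketched under assumptions: keep the whole expansion in bytes and encode any text value before it is joined to the file content, so that no implicit ASCII decode takes place. The pattern `KW_RAW`, the function `expand_keywords_bytes`, the `$Name: value $` output format and the UTF-8 default are all hypothetical simplifications written in Python 3 syntax. This is not the plugin's code and has not been tested against bzr.

```python
import re

# Hypothetical simplified pattern for raw keywords such as $Author$;
# the plugin's real pattern is not shown in this report.
KW_RAW = re.compile(rb"\$([A-Za-z_]+)\$")


def expand_keywords_bytes(content, keywords, encoding="utf-8"):
    """Expand $Name$ markers in a byte string and return a byte string.

    Text values are encoded before concatenation, so bytes are only ever
    joined to bytes and non-ASCII file content passes through untouched.
    """
    def substitute(match):
        name = match.group(1).decode("ascii")
        value = keywords.get(name)
        if value is None:
            # Unknown keyword: leave the raw marker as is.
            return match.group(0)
        if isinstance(value, str):
            value = value.encode(encoding)
        # Assumed expanded style: $Name: value $
        return b"$" + match.group(1) + b": " + value + b" $"

    return KW_RAW.sub(substitute, content)


sample = u"price \u2013 $Author$ $Unknown$\n".encode("utf-8")
expanded = expand_keywords_bytes(sample, {"Author": u"Zo\u00eb"})
print(expanded.decode("utf-8"))
```

The design choice here is to pick one string type for the whole operation rather than catch the failure afterwards. Whether bytes or text is the better choice for the real plugin depends on what its content filter is expected to return.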
Hi Jim,
Thanks for looking into this. I think there are more people who have run
into this, so there are probably a few dupes of this report.
On 06/01/12 18:41, Jim Cresswell wrote:
> Public bug reported:
>
> Files containing valid UTF-8 characters not in the ASCII character set
> crash the Keyword Plugin on processing, exiting with UnicodeDecodeError
> error. An example file is attached. An example traceback is
>
> bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode
> byte 0xe2 in position 4964: ordinal not in range(128)
>
> Traceback (most recent call last):
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
>     return the_callable(*args, **kwargs)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
>     ret = run(*run_argv)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
>     return self.run(**all_cmd_args)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
>     return self._operation.run_simple(*args, **kwargs)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
>     return _do_with_cleanups(
>   File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
>     result = func(*args, **kwargs)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
>     accelerator_tree, hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
>     hardlink=hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
>     accelerator_tree=accelerator_tree, hardlink=hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
>     delta_from_tree=delta_from_tree)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
>     delta_from_tree)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
>     accelerator_tree, hardlink)
>   File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
>     ContentFilterContext(tree_path, tree))
>   File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
>     chunks = filter.writer(chunks, context)
>   File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
>     return _kw_expander(chunks, context)
>   File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
>     encoder=encoder)]
>   File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
>     return result + rest
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
>
> Currently I'm working around the error by altering expand_keywords to
> skip the file on error as follows:
It should be possible to actually do something sensible in this case,
rather than skipping the keyword substitution altogether.
I think we shou...