Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin

Bug #912856 reported by Jim Cresswell
This bug affects 1 person
Affects                 | Status | Importance | Assigned to | Milestone
Bazaar Keywords Plugin  | New    | Undecided  | Unassigned  |

Bug Description

Files containing valid UTF-8 characters outside the ASCII character set crash the Keyword Plugin during processing, and bzr exits with a UnicodeDecodeError. An example file is attached. An example traceback is:

bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
    return _do_with_cleanups(
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
    hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
    delta_from_tree=delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
    delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
    ContentFilterContext(tree_path, tree))
  File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
    chunks = filter.writer(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
    return _kw_expander(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
    encoder=encoder)]
  File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
    return result + rest
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
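The failing line is `return result + rest`. In Python 2, concatenating a `unicode` object with a `str` makes Python decode the `str` with the default ASCII codec. Here `rest` is presumably a slice of the raw file bytes, while `result` most likely became `unicode` because some keyword expansion value (for example a committer name or path) was `unicode`. Any non-ASCII byte in the file, such as 0xe2 (the UTF-8 lead byte of many punctuation characters, e.g. U+2019 encodes as E2 80 99), then triggers this error. The sketch below reproduces the same message in Python 3 by doing that ASCII decode explicitly; the sample content is made up for illustration. Python 3 no longer converts implicitly, so a plain `bytes + str` raises `TypeError` instead.

```python
def ascii_decode_error(data):
    """Return the message Python raises when `data` is decoded as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        return str(err)
    return None


def concat_raises_type_error(data, text):
    """Python 3 refuses bytes + str outright instead of decoding implicitly."""
    try:
        data + text
    except TypeError:
        return True
    return False


# Made-up sample: U+2019 (right single quotation mark) is b"\xe2\x80\x99" in UTF-8.
content = "Author: $Author$ \u2019quoted\u2019\n".encode("utf-8")
print(ascii_decode_error(content))
# 'ascii' codec can't decode byte 0xe2 in position 17: ordinal not in range(128)
print(concat_raises_type_error(content, "Jim Cresswell"))  # True
```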

Currently I'm working around the error by altering expand_keywords to return the file unexpanded when the error occurs, as follows:

def expand_keywords(s, keyword_dicts, context=None, encoder=None, style=None):
    """Replace raw style keywords with another style in a string.

    Note: If the keyword is already in the expanded style, the value is
    not replaced.

    :param s: the string
    :param keyword_dicts: an iterable of keyword dictionaries. If values
      are callables, they are executed to find the real value.
    :param context: the parameter to pass to callable values
    :param encoder: a callable applied to each expansion value, or None
    :param style: the style of expansion to use, or None for the default
    :return: the string with keywords expanded
    """
    _expanded_style = _keyword_style_registry.get(style)
    result = ''
    rest = s

    while True:
        match = _KW_RAW_RE.search(rest)
        if not match:
            break
        result += rest[:match.start()]
        keyword = match.group(1)
        expansion = _get_from_dicts(keyword_dicts, keyword)
        if callable(expansion):
            try:
                expansion = expansion(context)
            except AttributeError, err:
                if 'error' in debug.debug_flags:
                    trace.note("error evaluating %s for keyword %s: %s",
                               expansion, keyword, err)
                expansion = "(evaluation error)"
        if expansion is None:
            # Unknown expansion - leave as is
            result += match.group(0)
            rest = rest[match.end():]
            continue
        if '$' in expansion:
            # Expansion is not safe to be collapsed later
            expansion = "(value unsafe to expand)"
        if encoder is not None:
            expansion = encoder(expansion)
        params = {'name': keyword, 'value': expansion}
        result += _expanded_style % params
        rest = rest[match.end():]

    # BODGE: handle UTF-8 characters not included in ASCII by skipping the
    # file, i.e. returning its content with no keywords expanded.
    try:
        finalResult = result + rest
    except UnicodeDecodeError, e:
        print 'It all went wrong: %s' % e
        print 'skipping file\n'
        return s
    else:
        return finalResult

Obviously this is a pretty hacky 'solution'. I am very new to Python and would be grateful for any advice :) The environment is 64-bit Red Hat.
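A less drastic approach than skipping the file is to keep the whole computation in bytes: never decode the file content, and encode each keyword value to UTF-8 before splicing it in, so nothing ever has to be implicitly decoded. The sketch below shows this in Python 3 for clarity. It is not the plugin's code: the function name `expand_keywords_bytes`, the simplified `$Keyword$` regex, the `$Name: value $` output format, and the plain dictionary of text values are all assumptions. In the Python 2 code quoted above, the analogous change would presumably be encoding a `unicode` `expansion` to UTF-8 before `_expanded_style % params`, assuming `_expanded_style` is a byte string.

```python
import re

# Hypothetical raw-keyword pattern: "$Name$" with an ASCII-only name.
# In a bytes pattern, \w matches only [A-Za-z0-9_].
_KW_RAW_RE = re.compile(rb"\$([\w\-]+)\$")


def expand_keywords_bytes(content, keywords, encoding="utf-8"):
    """Expand $Name$ markers in `content` (bytes) without decoding it.

    Assumptions: `content` is the raw file bytes; `keywords` maps names
    (str) to text values (str), and missing names are left untouched.
    Each value is encoded to bytes before substitution, so the result
    stays bytes and non-ASCII file content is passed through unchanged.
    """
    def replace(match):
        name = match.group(1).decode("ascii")  # safe: pattern is ASCII-only
        value = keywords.get(name)
        if value is None:
            return match.group(0)  # unknown keyword: leave as is
        if "$" in value:
            value = "(value unsafe to expand)"  # mirrors the plugin's guard
        return b"$" + match.group(1) + b": " + value.encode(encoding) + b" $"

    return _KW_RAW_RE.sub(replace, content)


sample = "Author: $Author$ \u2019quoted\u2019\n".encode("utf-8")
print(expand_keywords_bytes(sample, {"Author": "Jim Cresswell"}))
# b'Author: $Author: Jim Cresswell $ \xe2\x80\x99quoted\xe2\x80\x99\n'
```

Because the file bytes are never decoded, this works for any ASCII-compatible file encoding; the only encoding choice is for the inserted values, which this sketch writes as UTF-8 and which could mismatch a file stored in, say, Latin-1.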

Tags: ascii
Jim Cresswell (jim-cresswell) wrote :
Jelmer Vernooij (jelmer) wrote : Re: [Bug 912856] [NEW] Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin

Hi Jim,

Thanks for looking into this. I think other people have run into this as
well, so there are probably a few duplicates of this report.

On 06/01/12 18:41, Jim Cresswell wrote:
> Currently I'm working around the error by altering expand_keywords to
> skip the file on error as follows:
It should be possible to actually do something sensible in this case,
rather than skipping the keyword substitution altogether.

I think we shou...

