Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin

Bug #912856 reported by Jim Cresswell
This bug affects 1 person
Affects: Bazaar Keywords Plugin
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Files containing valid UTF-8 characters outside the ASCII character set crash the Keyword Plugin during processing; bzr exits with a UnicodeDecodeError. An example file is attached. An example traceback is:

bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
    return _do_with_cleanups(
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
    hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
    delta_from_tree=delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
    delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
    ContentFilterContext(tree_path, tree))
  File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
    chunks = filter.writer(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
    return _kw_expander(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
    encoder=encoder)]
  File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
    return result + rest
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
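
The last frame fails on `result + rest`. In Python 2, concatenating a unicode string with a byte string makes the interpreter decode the bytes with the ASCII codec first, which is consistent with the error above. The following is a minimal sketch of that behaviour, not code from the plugin: the sample values are invented, the assumption that `result` had become unicode is mine, and the implicit decode is written out explicitly so that the snippet also runs on Python 3.

```python
# Sketch only: reproduces the ASCII decode that Python 2 performs implicitly
# when a unicode string is concatenated with a byte string.

def implicit_concat(result, rest):
    """Mimic Python 2's `unicode + str`: decode the bytes as ASCII first."""
    return result + rest.decode('ascii')

# Assumption: a keyword expansion has turned `result` into a unicode string.
result = u'$Date: 2012-01-06 $'
# Raw file bytes containing a UTF-8 encoded right single quote (0xe2 0x80 0x99).
rest = u' \u2019quoted\u2019'.encode('utf-8')

try:
    implicit_concat(result, rest)
    message = None
except UnicodeDecodeError as err:
    message = str(err)

print(message)
```

If this reading is right, the file content itself is fine; the failure comes from mixing text and bytes in one concatenation.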

Currently I'm working around the error by altering expand_keywords to skip the file on error as follows:

def expand_keywords(s, keyword_dicts, context=None, encoder=None, style=None):
 """Replace raw style keywords with another style in a string.

 Note: If the keyword is already in the expanded style, the value is
 not replaced.

 :param s: the string
 :param keyword_dicts: an iterable of keyword dictionaries. If values
   are callables, they are executed to find the real value.
 :param context: the parameter to pass to callable values
 :param style: the style of expansion to use, or None for the default
 :return: the string with keywords expanded
 """
 _expanded_style = _keyword_style_registry.get(style)
 result = ''
 rest = s

 while (True):
  match = _KW_RAW_RE.search(rest)
  if not match:
   break
  result += rest[:match.start()]
  keyword = match.group(1)
  expansion = _get_from_dicts(keyword_dicts, keyword)
  if callable(expansion):
   try:
    expansion = expansion(context)
   except AttributeError, err:
    if 'error' in debug.debug_flags:
     trace.note("error evaluating %s for keyword %s: %s",
      expansion, keyword, err)
    expansion = "(evaluation error)"
  if expansion is None:
   # Unknown expansion - leave as is
   result += match.group(0)
   rest = rest[match.end():]
   continue
  if '$' in expansion:
   # Expansion is not safe to be collapsed later
   expansion = "(value unsafe to expand)"
  if encoder is not None:
   expansion = encoder(expansion)
  params = {'name': keyword, 'value': expansion}
  result += _expanded_style % params
  rest = rest[match.end():]

 # BODGE: handle non-ASCII UTF-8 bytes by skipping keyword expansion for the file
 try:
  finalResult = result + rest
 except UnicodeDecodeError, e:
  print 'It all went wrong: %s' % e
  print 'skipping file\n'
  return s
 else:
  return finalResult

Obviously this is a pretty hacky 'solution'. I am very new to Python and would be grateful for any advice :) The environment is 64-bit Red Hat.

Tags: ascii
Jim Cresswell (jim-cresswell) wrote :
Jelmer Vernooij (jelmer) wrote : Re: [Bug 912856] [NEW] Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin

Hi Jim,

Thanks for looking into this. I think there are more people who have run
into this, so there are probably a few dupes of this report.

On 06/01/12 18:41, Jim Cresswell wrote:
> Public bug reported:
>
> Files containing valid UTF-8 characters not in the ASCII character set
> crash the Keyword Plugin on processing, exiting with UnicodeDecodeError
> error. An example file is attached. An example traceback is
>
> bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode
> byte 0xe2 in position 4964: ordinal not in range(128)
>
> Currently I'm working around the error by altering expand_keywords to
> skip the file on error as follows:
It should be possible to actually do something sensible in this case,
rather than skipping the keyword substitution altogether.

I think we shou...
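
One way to keep expanding keywords instead of skipping the file is to treat the content as bytes throughout and encode each expansion before inserting it. The sketch below is hypothetical and is not the fix proposed in this reply, whose text is cut off above. `KW_RAW_RE`, `to_bytes` and `expand_keywords_bytes` are illustrative names, the pattern only approximates the plugin's `_KW_RAW_RE`, and UTF-8 is an assumed default encoding. It also omits the plugin's check for `$` inside an expansion.

```python
import re

# Hypothetical pattern for raw keywords such as $Author$; the plugin's own
# _KW_RAW_RE may differ.
KW_RAW_RE = re.compile(br'\$([\w\-]+)\$')


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding text strings with the given encoding."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def expand_keywords_bytes(content, keywords, encoding='utf-8'):
    """Expand $Name$ to $Name: value $ without ever decoding content."""
    def replace(match):
        name = match.group(1)
        value = keywords.get(name.decode('ascii'))
        if value is None:
            # Unknown keyword: leave it as is.
            return match.group(0)
        return b'$' + name + b': ' + to_bytes(value, encoding) + b' $'
    return KW_RAW_RE.sub(replace, content)


# Usage: non-ASCII bytes in the file and in the expansion pass through intact.
content = u'caf\u00e9 \u2019 $Author$ $Unknown$\n'.encode('utf-8')
expanded = expand_keywords_bytes(content, {'Author': u'Jos\u00e9'})
```

Because nothing is ever decoded, bytes that are not valid in any particular encoding are copied through unchanged, and only the inserted values depend on the chosen encoding.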

