Bazaar Keywords Plugin

Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin

Bug #912856 reported by Jim Cresswell on 2012-01-06

This bug affects 1 person

Affects                 Status  Importance  Assigned to  Milestone
Bazaar Keywords Plugin  New     Undecided   Unassigned

Bug Description

Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin on processing, exiting with a UnicodeDecodeError. An example file is attached. An example traceback is:

bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
    return _do_with_cleanups(
  File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
    hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
    delta_from_tree=delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
    delta_from_tree)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
    accelerator_tree, hardlink)
  File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
    ContentFilterContext(tree_path, tree))
  File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
    chunks = filter.writer(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
    return _kw_expander(chunks, context)
  File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
    encoder=encoder)]
  File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
    return result + rest
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
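
Byte 0xe2 is the lead byte of a three-byte UTF-8 sequence. For example, the right single quotation mark U+2019 encodes as e2 80 99. In Python 2, `result + rest` can only raise UnicodeDecodeError if one operand is a `unicode` object and the other a byte `str`: Python 2 then implicitly decodes the byte string with the default ASCII codec. The traceback does not show which operand is `unicode`. A keyword value returned as `unicode` (which would turn `result` into `unicode` once it is appended) is a plausible guess, not a confirmed cause. The minimal sketch below uses Python 3, where that implicit coercion no longer happens, so the decode is written out explicitly to reproduce the same message. The sample string is illustrative only. The reported position 4964 is an offset into the real file content, whereas here it is 2.

```python
# Python 2 silently decodes a byte string as ASCII when it is combined with
# a unicode string (as in ``result + rest``). Python 3 never does this
# implicitly, so the decode is written out here to reproduce the failure.

# "it’s" in UTF-8: U+2019 encodes as the three bytes e2 80 99.
rest = "it\u2019s".encode("utf-8")

try:
    rest.decode("ascii")  # what Python 2 does behind the scenes
except UnicodeDecodeError as err:
    message = str(err)
else:
    message = None

print(message)
# 'ascii' codec can't decode byte 0xe2 in position 2: ordinal not in range(128)

# Decoding with the codec the file actually uses works fine.
print(rest.decode("utf-8") == "it\u2019s")  # True
```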

Currently I'm working around the error by altering expand_keywords to skip the file on error as follows:

def expand_keywords(s, keyword_dicts, context=None, encoder=None, style=None):
    """Replace raw style keywords with another style in a string.

    Note: If the keyword is already in the expanded style, the value is
    not replaced.

    :param s: the string
    :param keyword_dicts: an iterable of keyword dictionaries. If values
        are callables, they are executed to find the real value.
    :param context: the parameter to pass to callable values
    :param encoder: optional callable applied to each expansion value
    :param style: the style of expansion to use, or None for the default
    :return: the string with keywords expanded
    """
    _expanded_style = _keyword_style_registry.get(style)
    result = ''
    rest = s

    while True:
        match = _KW_RAW_RE.search(rest)
        if not match:
            break
        result += rest[:match.start()]
        keyword = match.group(1)
        expansion = _get_from_dicts(keyword_dicts, keyword)
        if callable(expansion):
            try:
                expansion = expansion(context)
            except AttributeError, err:
                if 'error' in debug.debug_flags:
                    trace.note("error evaluating %s for keyword %s: %s",
                        expansion, keyword, err)
                expansion = "(evaluation error)"
        if expansion is None:
            # Unknown expansion - leave as is
            result += match.group(0)
            rest = rest[match.end():]
            continue
        if '$' in expansion:
            # Expansion is not safe to be collapsed later
            expansion = "(value unsafe to expand)"
        if encoder is not None:
            expansion = encoder(expansion)
        params = {'name': keyword, 'value': expansion}
        result += _expanded_style % params
        rest = rest[match.end():]

    # BODGE: handle UTF-8 characters not included in ASCII by skipping
    # keyword expansion for the whole file. Only the decode error is
    # caught, so unrelated bugs still surface.
    try:
        finalResult = result + rest
    except UnicodeDecodeError, e:
        print 'It all went wrong: %s' % e
        print 'skipping file\n'
        return s
    else:
        return finalResult

Obviously this is a pretty hacky 'solution'. I am very new to Python and would be grateful for any advice :) The environment is 64-bit Red Hat.

Jim Cresswell (jim-cresswell) wrote on 2012-01-06:

Example file with valid UTF-8 non-ASCII characters (75 bytes, application/x-httpd-php)

Jelmer Vernooij (jelmer) wrote on 2012-01-07: Re: [Bug 912856] [NEW] Files containing valid UTF-8 characters not in the ASCII character set crash the Keyword Plugin

Hi Jim,

Thanks for looking into this. I think there are more people who have run 
into this, so there are probably a few dupes of this report.

On 06/01/12 18:41, Jim Cresswell wrote:
> Public bug reported:
>
> Files containing valid UTF-8 characters not in the ASCII character set
> crash the Keyword Plugin on processing, exiting with UnicodeDecodeError
> error. An example file is attached. An example traceback is
>
> bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode
> byte 0xe2 in position 4964: ordinal not in range(128)
>
> Traceback (most recent call last):
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
>      return the_callable(*args, **kwargs)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 1126, in run_bzr
>      ret = run(*run_argv)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
>      return self.run(**all_cmd_args)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 713, in run
>      return self._operation.run_simple(*args, **kwargs)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 134, in run_simple
>      return _do_with_cleanups(
>    File "/usr/lib64/python2.4/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
>      result = func(*args, **kwargs)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 1328, in run
>      accelerator_tree, hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/branch.py", line 1452, in create_checkout
>      hardlink=hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/bzrdir.py", line 1293, in create_workingtree
>      accelerator_tree=accelerator_tree, hardlink=hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/workingtree_4.py", line 1475, in initialize
>      delta_from_tree=delta_from_tree)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2417, in build_tree
>      delta_from_tree)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2516, in _build_tree
>      accelerator_tree, hardlink)
>    File "/usr/lib64/python2.4/site-packages/bzrlib/transform.py", line 2586, in _create_files
>      ContentFilterContext(tree_path, tree))
>    File "/usr/lib64/python2.4/site-packages/bzrlib/filters/__init__.py", line 170, in filtered_output_bytes
>      chunks = filter.writer(chunks, context)
>    File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 281, in _normal_kw_expander
>      return _kw_expander(chunks, context)
>    File "/home/jim/.bazaar/plugins/keywords/keywords.py", line 276, in _kw_expander
>      encoder=encoder)]
>    File "/home/user1/.bazaar/plugins/keywords/keywords.py", line 244, in expand_keywords
>      return result + rest
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4964: ordinal not in range(128)
>
> Currently I'm working around the error by altering expand_keywords to
> skip the file on error as follows:
It should be possible to actually do something sensible in this case, 
rather than skipping the keyword substitution altogether.

I think we should probably just consider the file a byte stream, and add 
the keyword expansion as utf-8-encoded (for lack of something better).

The best way to go about fixing this is probably to add a test that
reproduces the problematic scenario, and then fix that. Perhaps
fixing it is as simple as adding .encode('utf-8') in a few (right)
places, but I'm not sure. Let me know if I can give you any more pointers.

Cheers,

Jelmer
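
Jelmer's suggestions can be sketched together: treat the file as a byte stream, encode only the keyword values as UTF-8, and start from a test that reproduces the problem. This is a minimal Python 3 sketch, not the plugin's code. `KW_RAW_RE`, `expand_keywords_bytes` and the `$Name: value $` output format are hypothetical stand-ins for the plugin's `_KW_RAW_RE`, `expand_keywords` and expanded style. A plain dict of text values also replaces the plugin's keyword dictionaries and callable values.

```python
import re
import unittest

# Hypothetical raw-keyword pattern ($Keyword$); the plugin's _KW_RAW_RE may differ.
KW_RAW_RE = re.compile(rb"\$(\w+)\$")


def expand_keywords_bytes(data, values, encoding="utf-8"):
    """Expand $Keyword$ markers in ``data`` without ever decoding it.

    ``data`` is the raw file content (bytes); ``values`` maps keyword names to
    text (str). Only the values are encoded, so any non-ASCII bytes in the
    file pass through unchanged.
    """
    def replace(match):
        name = match.group(1)  # bytes; \w in a bytes pattern is ASCII-only
        value = values.get(name.decode("ascii"))
        if value is None:
            return match.group(0)  # unknown keyword: leave it as is
        if "$" in value:
            value = "(value unsafe to expand)"  # same guard as the plugin
        return b"$" + name + b": " + value.encode(encoding) + b" $"

    return KW_RAW_RE.sub(replace, data)


class NonAsciiContentTest(unittest.TestCase):

    def test_non_ascii_bytes_survive_expansion(self):
        # U+2019 encodes as e2 80 99, the lead byte seen in the report.
        data = "<?php // it\u2019s $Author$ \u00e9t\u00e9\n".encode("utf-8")
        expanded = expand_keywords_bytes(data, {"Author": "J\u00f6rg"})
        self.assertEqual(
            expanded,
            "<?php // it\u2019s $Author: J\u00f6rg $ \u00e9t\u00e9\n".encode("utf-8"),
        )

    def test_unknown_keyword_left_alone(self):
        data = "caf\u00e9 $Nope$".encode("utf-8")
        self.assertEqual(expand_keywords_bytes(data, {}), data)
```

Because the file content is only ever sliced and joined as bytes, no decode takes place and the file's own encoding does not matter. UTF-8 is assumed only for the inserted values. Saved as a module, the tests can be run with `python -m unittest <module>`.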
