UnicodeDecodeError from bzr version if platform contains non-ASCII
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Fix Released | High | Vincent Ladeuil |
Bug Description
https:/
The bzr version command can traceback if there are non-ASCII characters in the system's platform information.
bzr: ERROR: exceptions.
Traceback (most recent call last):
File "/usr/lib64/
return the_callable(*args, **kwargs)
File "/usr/lib64/
ret = run(*run_argv)
File "/usr/lib64/
return self.run(
File "/usr/lib64/
return self._operation
File "/usr/lib64/
self.cleanups, self.func, *args, **kwargs)
File "/usr/lib64/
result = func(*args, **kwargs)
File "/usr/lib64/
result = func(*args, **kwargs)
File "/usr/lib64/
show_
File "/usr/lib64/
to_file.write(" Platform: %s\n" % platform.
File "/usr/lib64/
self.
File "/usr/lib64/
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 66: ordinal not in range(128)
bzr 2.5.1 on python 2.7.5 (Linux-
fedora-
arguments: ['/usr/bin/bzr', 'version']
plugins: bash_completion
fastimport[
news_
encoding: 'utf-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
*** Bazaar has encountered an internal error. This probably indicates a
bug in Bazaar. You can help us fix it by filing a bug report at
https:/
including this traceback and a description of the problem.
Bazaar (bzr) 2.5.1
Python interpreter: /usr/bin/python 2.7.5
Python standard library: /usr/lib64/
Platform is being taken from the stdlib platform.platform() function. On Fedora this is reading the information from /etc/fedora-
If the name inside of this file contains non-ASCII characters, then a non-ascii byte string (str) is passed into bzrlib.
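The failure mode can be modelled in isolation. This is only a sketch: the platform string is made up for illustration, and the explicit `decode("ascii")` stands in for the implicit ASCII decode that Python 2 performs when a byte str is handed to an encoder that expects unicode (Python 3 rejects such a mix with a TypeError instead).

```python
# Hypothetical platform string with one non-ASCII character, held as UTF-8 bytes.
platform_bytes = u"Linux-with-Schr\xf6dinger".encode("utf-8")


def implicit_ascii_decode(data):
    """Model the implicit ASCII decode applied to a byte str in Python 2."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        # Same error class and the same offending byte (0xc3) as in the traceback.
        return "failed: %s" % exc


result = implicit_ascii_decode(platform_bytes)
print(result)
```

The first byte of the UTF-8 encoding of "ö" is 0xc3, which is why that byte value shows up in the reported error.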
The easiest fix for this is to make sure to transform the string into unicode before handing it to the StreamWriter. Modifying ui.text.
def write(self, to_write):
if isinstance(
I'm not submitting this as a patch because most projects have their own helper function that does the unicode transformation and their own policies on what the default encoding should be (utf-8 and the user's locale setting being the top two choices I've seen). The important thing to note is that you have a suitable "errors=" setting so that bytes that are undecodable in the default chosen encoding do not end up causing a traceback when simply printing something for the user to read on the screen.
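To illustrate the point about "errors=", here is a minimal sketch of such a helper. The name `to_text` and the utf-8 default are assumptions made for this example, not anything taken from bzrlib; a real project would plug in its own policy for the default encoding.

```python
def to_text(value, encoding="utf-8", errors="replace"):
    """Return text, decoding byte strings without raising on bad bytes.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


# Valid UTF-8 for U+00E9 followed by a byte (0xff) that is never valid UTF-8.
raw = b"Fedora \xc3\xa9 \xff"

lenient = to_text(raw)  # undecodable byte becomes U+FFFD

try:
    to_text(raw, errors="strict")
    strict_raised = False
except UnicodeDecodeError:
    # With errors="strict" the bad byte would still abort the output.
    strict_raised = True
```

With "replace" the user sees a replacement character in place of the undecodable byte, rather than a traceback.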
Related branches
- John A Meinel: Approve
- Diff: 105 lines (+39/-8), 3 files modified
  bzrlib/tests/test_version.py (+32/-5)
  bzrlib/version.py (+3/-3)
  doc/en/release-notes/bzr-2.6.txt (+4/-0)
Changed in bzr:
assignee: nobody → Vincent Ladeuil (vila)
Changed in bzr:
status: Confirmed → In Progress
Changed in bzr:
milestone: none → 2.6b3
status: In Progress → Fix Released
Changed in bzr:
milestone: 2.6b3 → 2.6.0
Two further comments to this issue:
First, an update to the code that I think is correct: I noticed that the TextUIOutputStream has an encoding attribute. I believe we can use this in the write() method something like this:
    def write(self, to_write):
        self.ui_factory.clear_term()
        if isinstance(to_write, str):
            # Assume ascii if encoding is undefined because it's the only encoding that is
            # guaranteed not to traceback in the StreamWriter.
            encoding = self.encoding or 'ascii'
            to_write = unicode(to_write, encoding=encoding, errors='replace')
        self.wrapped_stream.write(to_write)
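The same idea can be tried outside of bzrlib with a small stand-in class. This is a sketch under assumptions: `DecodingOutputStream` is a hypothetical name, the `clear_term()` call is left out, `io.StringIO` plays the role of the wrapped stream, and `bytes` is used in the check so that it behaves the same on Python 2 and Python 3.

```python
import io


class DecodingOutputStream(object):
    """Hypothetical stand-in for the output stream class discussed above."""

    def __init__(self, wrapped_stream, encoding=None):
        self.wrapped_stream = wrapped_stream
        self.encoding = encoding

    def write(self, to_write):
        if isinstance(to_write, bytes):
            # Fall back to ascii when no encoding is configured.
            encoding = self.encoding or "ascii"
            to_write = to_write.decode(encoding, "replace")
        self.wrapped_stream.write(to_write)


sink = io.StringIO()
# Encoding known: the bytes decode cleanly.
DecodingOutputStream(sink, encoding="utf-8").write(b"  Platform: caf\xc3\xa9\n")
# Encoding unset: each non-ASCII byte is replaced instead of raising.
DecodingOutputStream(sink).write(b"  Platform: caf\xc3\xa9\n")
output = sink.getvalue()
```

Neither call raises; the ascii fallback only costs readability of the non-ASCII characters.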
You probably noticed the comment about falling back to ascii if encoding wasn't specified. I wasn't sure in what instances encoding wouldn't be set and whether we might be able to be a bit more lenient (and use utf-8 or the user's locale rather than ascii). I tried to run bzr selftests to determine that. Unfortunately, that led to a second issue:
I ran bzr selftests both with and without the changes I mentioned to ui.text.py::TextUIOutputStream.write(). I found that there are a few selftest failures without the patch applied, which point out the issue with using a non-ascii byte str here, but with the patch there are many more failures. After instrumenting the code I found information like this:
45.5s write: type(to_write)= <type 'unicode'>
45.5s write: repr(to_write)= 'u\'"\\u0422\\u0435\\u0441\\u04422"\\n\''
45.5s write: self.wrapped_stream= <bzrlib.tests.StringIOWrapper object>
This is showing that the selftests that are failing with the patch are using a tests.StringIOWrapper instead of a StreamWriter. I believe that this means the test cases are incorrect. cStringIO can't accept unicode strings which are non-ascii:
http://docs.python.org/2/library/stringio.html#cStringIO.StringIO
So StringIOWrapper needs to be given a byte str object while StreamWriter needs to be given unicode type objects. The selftests seem to be incorrectly trying to substitute StringIOWrapper for StreamWriter so the tests themselves are broken, not the code.
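The mismatch between the two kinds of stream can be shown with standard-library pieces only. This is an analogy rather than a reproduction: `io.BytesIO` stands in for a byte-oriented buffer such as the one behind StringIOWrapper, and `codecs.getwriter("utf-8")` builds a StreamWriter on top of it; bzrlib's own test classes are not used here.

```python
import codecs
import io

# The same text that appears in the instrumented output above.
text = u"\u0422\u0435\u0441\u04422\n"

byte_buffer = io.BytesIO()
try:
    # A byte-oriented buffer refuses text outright.
    byte_buffer.write(text)
    direct_write_ok = True
except TypeError:
    direct_write_ok = False

# A StreamWriter accepts text and encodes it before it reaches the buffer.
writer = codecs.getwriter("utf-8")(byte_buffer)
writer.write(text)
stored = byte_buffer.getvalue()
```

A test that swaps one kind of stream for the other therefore has to change what it writes as well: bytes for the raw buffer, text for the StreamWriter.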