Bazaar

Bug #1195783
Comment #2

Toshio Kuratomi (toshio) wrote on 2013-07-04:

Okay... here's a haaaaack that works around the tracebacks. I also ran into a second problem with the code; the comment in this sample explains both. I'm going to apply this to the Fedora package on the assumption that text/ui.py only affects the user interface, so it's fine to use errors='replace' there. If that's not the case, someone please stop me! I wouldn't want to corrupt someone's branch with replacement characters by mistake. (But "bzr version" issuing a traceback looks really bad for bzr, so I figure it's worth the risk.)

You could apply this to bzr if you want, but the right fix is probably much more invasive:

1) Re-evaluate the test cases and switch to using some form of StreamWriter around a StringIO instead of a raw StringIO.
2) Either change the calling code to make sure it always sends byte strs to text/ui.py::write(), or add something to write() that makes sure a unicode string can be encoded in the StreamWriter's encoding.
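As a rough illustration of those two steps, here is a minimal Python 3 sketch. bzr was Python 2 at the time, so io.BytesIO stands in for the byte-oriented StringIO. The helper names make_test_stream() and can_encode() are hypothetical and not part of bzr.

```python
import codecs
import io


def make_test_stream(encoding="ascii"):
    # Step 1 (hypothetical helper): wrap the in-memory buffer in a
    # StreamWriter so that, as on a real terminal, everything written
    # must be encoded to `encoding`.
    buffer = io.BytesIO()
    return codecs.getwriter(encoding)(buffer), buffer


def can_encode(text, encoding):
    # Step 2 (one possible check, hypothetical): True if `text` can be
    # encoded strictly, i.e. without any replacement characters.
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# A raw text buffer accepts any string, so encoding bugs go unnoticed...
io.StringIO().write("caf\u00e9")

# ...while the wrapped buffer raises the same error a user would see.
writer, buffer = make_test_stream("ascii")
print(can_encode("caf\u00e9", "ascii"))  # False
try:
    writer.write("caf\u00e9")
except UnicodeEncodeError:
    print("the test stream caught the encoding problem")
```

The point of step 1 is that tests then fail in the same way users' terminals do, instead of silently passing against a buffer that never encodes anything.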

        try:
            self.wrapped_stream.write(to_write)
        except UnicodeError:
            # Hack around several problems:
            # StreamWriters cannot handle non-ASCII byte strs, so we have to
            # make sure that they are given a unicode string.
            #
            # If we have a unicode string containing characters not available
            # in the stream's encoding, then we'll traceback here. So we have
            # to round trip to bytes and back to unicode, getting rid of the
            # characters we can't handle along the way.
            encoding = self.encoding or 'ascii'
            if not isinstance(to_write, unicode):
                # bytes -> unicode; undecodable bytes become U+FFFD
                to_write = unicode(to_write, encoding, 'replace')
            # unicode -> bytes -> unicode; characters the stream's encoding
            # cannot represent (including any U+FFFD from above) become '?'
            to_write = to_write.encode(encoding, 'replace').decode(encoding)
            self.wrapped_stream.write(to_write)
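On Python 3 the same fallback might look roughly like the sketch below. OutputStreamSketch is a hypothetical stand-in for the patched output stream class, not bzr's code. One difference from the Python 2 hack: Python 3 StreamWriters do not accept bytes at all, so byte strs are decoded before the first write attempt rather than inside the except clause.

```python
import codecs
import io


class OutputStreamSketch:
    """Hypothetical stand-in for a UI output stream wrapping a StreamWriter."""

    def __init__(self, wrapped_stream, encoding=None):
        self.wrapped_stream = wrapped_stream
        self.encoding = encoding

    def write(self, to_write):
        encoding = self.encoding or "ascii"
        if isinstance(to_write, bytes):
            # StreamWriters only take str on Python 3; undecodable bytes
            # become U+FFFD here.
            to_write = to_write.decode(encoding, "replace")
        try:
            self.wrapped_stream.write(to_write)
        except UnicodeEncodeError:
            # Round trip to bytes and back so that characters the encoding
            # cannot represent become '?', then retry.
            to_write = to_write.encode(encoding, "replace").decode(encoding)
            self.wrapped_stream.write(to_write)


buffer = io.BytesIO()
out = OutputStreamSketch(codecs.getwriter("ascii")(buffer), "ascii")
out.write("caf\u00e9")
out.write(b" ok")
print(buffer.getvalue())  # b'caf? ok'
```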

Without this patch, the bzr selftest results (with qbzr installed) are:

FAILED (failures=10, errors=29, known_failure_count=62)
1093 tests skipped

With the patch, the bzr selftest results are:

FAILED (failures=7, errors=25, known_failure_count=62)
1093 tests skipped

You might also be interested in these functions from the kitchen library:

* getwriter() -- A replacement for codecs.getwriter() that aims to provide a StreamWriter replacement that does not traceback.
  * Docs: http://pythonhosted.org/kitchen/api-text-converters.html#kitchen.text.converters.getwriter
  * Code: http://bzr.fedorahosted.org/bzr/kitchen/devel/annotate/head:/kitchen/text/converters.py#L301
* byte_string_valid_encoding -- Detect whether a byte str is valid in a particular encoding.
  * Docs: http://pythonhosted.org/kitchen/api-text-misc.html#kitchen.text.misc.byte_string_valid_encoding
  * Code: http://bzr.fedorahosted.org/bzr/kitchen/devel/annotate/head:/kitchen/text/misc.py#L343
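For a sense of what those two functions do, here is a rough standard-library approximation. This is not kitchen's code and may differ from it in details (for example in how invalid byte strs are handled); valid_in_encoding() and TolerantWriter are hypothetical names used only for illustration.

```python
import io


def valid_in_encoding(byte_string, encoding="utf-8"):
    """Hypothetical helper: True if byte_string decodes cleanly in encoding."""
    try:
        byte_string.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


class TolerantWriter:
    """Hypothetical writer that accepts str or bytes and replaces anything
    it cannot encode instead of raising."""

    def __init__(self, stream, encoding="utf-8"):
        self.stream = stream  # a binary stream, e.g. io.BytesIO
        self.encoding = encoding

    def write(self, obj):
        if isinstance(obj, str):
            # Characters the encoding cannot represent become '?'.
            data = obj.encode(self.encoding, "replace")
        elif valid_in_encoding(obj, self.encoding):
            # Bytes that are already valid pass through unchanged.
            data = obj
        else:
            # Invalid bytes: decode with U+FFFD replacements, then re-encode
            # (U+FFFD itself becomes '?' in encodings that lack it).
            data = obj.decode(self.encoding, "replace").encode(self.encoding, "replace")
        self.stream.write(data)


buffer = io.BytesIO()
writer = TolerantWriter(buffer, "ascii")
writer.write("caf\u00e9")  # buffer now holds b"caf?"
```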

If you don't want another dependency but would like to use some of that code, it is licensed under the LGPLv2+, so you should be able to copy it. If the Canonical Contributor Agreement is a concern for doing that: I have signed the Agreement, but we'd have to check who contributed to those functions to see whether I was the only contributor.
