Logging with a non-ascii string raises UnicodeError in the standard logging package

Bug #317779 reported by philyoon on 2009-01-16
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Low
Unassigned

Bug Description

This is due to the fact that logging.StreamHandler.emit method tries to encode the input with the 'encoding' attribute of the stream. I think you should not use StreamWriter as a file-like object, since StreamWriter isn't intended to accept str inputs. The fix for this issue follows:

=== modified file 'bzrlib/trace.py'
--- bzrlib/trace.py 2009-01-08 17:53:46 +0000
+++ bzrlib/trace.py 2009-01-16 07:39:12 +0000
@@ -269,9 +269,7 @@
         r'%Y-%m-%d %H:%M:%S')
     # after hooking output into bzr_log, we also need to attach a stderr
     # handler, writing only at level info and with encoding
- writer_factory = codecs.getwriter(term_encoding)
- encoded_stderr = writer_factory(sys.stderr, errors='replace')
- stderr_handler = logging.StreamHandler(encoded_stderr)
+ stderr_handler = logging.StreamHandler(sys.stderr)
     stderr_handler.setLevel(logging.INFO)
     logging.getLogger('bzr').addHandler(stderr_handler)

philyoon (philyoon) wrote :

Whitespaces in the description seem to be merged. Attaching the diff file.

INADA Naoki (songofacandy) wrote :

It may be happen because non-ascii byte string was logged.
All of non-ascii string should be unicode str.

Martin Pool (mbp) wrote :

I'm trying to get rid of this use of logging in https://code.edge.launchpad.net/~mbp/bzr/remove-logging so this bug should be taken into account there, and vice versa.

John A Meinel (jameinel) wrote :

Martin, do you have an idea of the status of this bug now that some of your logging changes have landed?

Changed in bzr:
importance: Undecided → Low
status: New → Incomplete
Martin Pool (mbp) wrote :

I wouldn't assume it's fixed because not all of that work has landed yet. We do still using logging.

Changed in bzr:
status: Incomplete → Confirmed
tags: added: logging trace unicode
Martin Pool (mbp) on 2010-03-15
Changed in bzr:
assignee: nobody → Martin Pool (mbp)
status: Confirmed → In Progress
Martin Pool (mbp) wrote :

You can reproduce this from a Python shell with

>>> trace.note(u"\u68ee".encode("utf-8"))
Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 791, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

just emitting a unicode object does work, and it gets converted according to the error handling of the converter:

>>> trace.note(u"\u68ee")
?

With this patch applied, in a C locale, you cannot log unicode strings, but you can log non-ascii byte strings and they get passed through as such:

In [10]: trace.note(u"\u68ee hello")
Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 791, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

In [11]: trace.note(u"\u68ee hello".encode('utf-8'))
森 hello

I'm not sure that's really an improvement though. I think generally we want the input to the log functions to be unicode strings, and for them to have errors=replace treatment going to the log file or stderr.

Can you tell me more about the context where you hit this?

Changed in bzr:
assignee: Martin Pool (mbp) → nobody
status: In Progress → Confirmed
Jelmer Vernooij (jelmer) on 2017-11-09
tags: added: check-for-breezy
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Bug attachments