python-oops-tools

analyse_oops_reports raised an UnicodeEncodeError loading an oops from the filesystem

Bug #891186 reported by Diogo Matsubara on 2011-11-16

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	python-oops-tools	Fix Released	Critical	Martin Packman

Bug Description

Traceback (most recent call last):
  File "bin/analyse_error_reports", line 49, in <module>
    oopstools.scripts.analyse_error_reports.main()
  File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/scripts/analyse_error_reports.py", line 137, in main
    summary.renderHTML(fp)
  File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/dbsummaries.py", line 681, in renderHTML
    section.renderHTML(fp)
  File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/dbsummaries.py", line 307, in renderHTML
    self._renderGroups(fp, html=True)
  File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/dbsummaries.py", line 281, in _renderGroups
    group.renderHTML(fp)
  File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/dbsummaries.py", line 177, in renderHTML
    % (self.count, _escape(self.etype), _escape(self.evalue)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 78: ordinal not in range(128)

The oops that caused the error: OOPS-fc231a0a7471d55e72e091f0bcc135eb

Related branches

lp:~lifeless/python-oops-tools/bug-891186

lp:~gz/python-oops-tools/dbsummaries_utf8_encode_891186

Merged into lp:python-oops-tools at revision 29

Graham Binns (community): Approve (code) on 2011-12-13

Diogo Matsubara (matsubara) on 2011-11-16

Changed in python-oops-tools:
status:	New → Triaged
importance:	Undecided → Critical

Revision history for this message

Martin Packman (gz) wrote on 2011-11-16:

This is a pretty simple problem.

oopstools/oops/models.py line 153
exception_value = models.CharField(max_length=MAX_EVALUE_LEN)

Django deals in unicode.

  oopstools/scripts/analyse_error_reports.py lines 136-138
    fp = open(options.html, 'wb')
    summary.renderHTML(fp)
    fp.close()

Opens a file with no encoding specified.

  oopstools/oops/dbsummaries.py lines 175-176
    fp.write('<div class="exc">%d <b>%s</b>: %s</div>\n'
             % (self.count, _escape(self.etype), _escape(self.evalue)))

Here `_escape()` is just a dumb cgi.escape wrapper that handles '&<>' and nothing else, and returns unicode when given unicode. Any value being written that can't be coerced to ascii will throw.

Using a better means of creating html here would be nice, but just settling on an encoding and making string values written conform to it would help.

Diogo Matsubara (matsubara) on 2011-11-16

no longer affects:

oops-tools

Revision history for this message

Martin Packman (gz) wrote on 2011-11-16:

Possible fix, requires callers to be consistent about passing unicode Edit (623 bytes, text/plain)

Revision history for this message

Robert Collins (lifeless) wrote on 2011-11-16:

That fix assumes all values will be unicode - something that isn't (currently) guaranteed. referrer is probably (or should be) bytestrings [its an http header: they are bytestrings on the wire, and for all upstream python posturing, a unicode http header is ill defined; particularly as there is now a bis recommendation for how to put unicode into a header: headers that are badly encoded will still exist].

Essentially we have a mix of data sources, so I think this needs to be type checking.

Revision history for this message

Robert Collins (lifeless) wrote on 2011-11-16:

patch (no test case) Edit (576 bytes, text/plain)

What would really help here is a failing test. The attached patch passes all existing tests, so isn't a regression.

Revision history for this message

Robert Collins (lifeless) wrote on 2011-11-16:

I've attached a branch as well, I won't propose for merge because its lacking a test - that might be something to add. I'd create an OOPS on the fly for this test I think.

Revision history for this message

Martin Packman (gz) wrote on 2011-11-17:

Adding an isinstance there is one way Robert, but you still get a page out at the end with no well-defined encoding, which may display incorrectly in browsers and such like. If any current inputs aren't unicode, or may have other problematic bytes such as control characters, it would be best to escape those to a form that will actually display usefully in a webpage before writing them out. That could either be done in the _escape function, or ideally much earlier.

Revision history for this message

Robert Collins (lifeless) wrote on 2011-11-17: Re: [Bug 891186] Re: analyse_oops_reports raised an UnicodeEncodeError loading an oops from the filesystem

On Thu, Nov 17, 2011 at 2:35 PM, Martin Packman
<email address hidden> wrote:
> Adding an isinstance there is one way Robert, but you still get a page
> out at the end with no well-defined encoding, which may display

Thats not (at all) true AFAICT.

Revision history for this message

Selene ToyKeeper (toykeeper) wrote on 2011-11-17:

diff from merge 63908 Edit (863 bytes, text/plain)

I tried to merge a fix for the same problem a while back... it got rejected because it allows oops-tools to accept out-of-spec data (which was kind of the point).

Anyway, the implementation was very simple, and I've attached it.

Revision history for this message

Martin Packman (gz) wrote on 2011-11-17:

Yes Selene, your merge proposal was mentioned on IRC and is basically the same just from the other end. Will try and make sure something gets landed this time around. :)

Revision history for this message

Diogo Matsubara (matsubara) wrote on 2011-11-21:

#10

I added a branch with a test case that reproduces the problem: lp:~matsubara/oops-tools/891186-unicodeencodeerror

Revision history for this message

Diogo Matsubara (matsubara) wrote on 2011-11-21:

#11

When I merge in your patch Robert, I get another failure, this time with the Contains matcher.

======================================================================
ERROR: test_renderHTML_with_unicode_data (test_dbsummaries.TestWebAppErrorSummary)
test_dbsummaries.TestWebAppErrorSummary.test_renderHTML_with_unicode_data
----------------------------------------------------------------------
_StringException: Traceback (most recent call last):
  File "/home/matsubara/devel/canonical/oops-tools/891186-unicodeencodeerror/src/oopstools/oops/test/test_dbsummaries.py", line 52, in test_renderHTML_with_unicode_data
    self.assertThat(fp.read(), Contains(u'some unicode char (\xe1)'))
  File "/home/matsubara/devel/canonical/oops-tools/trunk/eggs/testtools-0.9.12-py2.7.egg/testtools/testcase.py", line 400, in assertThat
    mismatch = matcher.match(matchee)
  File "/home/matsubara/devel/canonical/oops-tools/trunk/eggs/testtools-0.9.12-py2.7.egg/testtools/matchers.py", line 606, in match
    if self.needle not in matchee:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2272: ordinal not in range(128)

Revision history for this message

Martin Packman (gz) wrote on 2011-11-21:

#12

You can only read and write bytes from cStringIO.StringIO, so the problem with this assertion is it's comparing bytes with unicode:

fp.seek(0)
self.assertThat(fp.read(), Contains(u'some unicode char (\xe1)'))

I'd write it instead asserting the UTF-8 byte string:

self.assertThat(fp.getvalue() Contains("some unicode char (\xc3\xa1)"))

Or with `.decode("utf-8")` after the getvalue if you want to keep the unicode Contains which is neater, especially with older testtools versions.

Revision history for this message

Martin Packman (gz) wrote on 2011-11-21:

#13

Robert, the problem with your version of the change is you're baking in the bug for later. If non-ascii bytestrings need to be output (which doesn't seem to be the case currently), you get the following situation:

>>> "%s %s" % (_escape(u"\xa7"), _escape("\xa7"))
'\xc2\xa7 \xa7'
>>> _.decode("utf-8")
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 3: invalid start byte

Creating a html with any non-utf-8 inputs will result in a page that is likely to display incorrectly, because it contains a mix of different encodings. Avoiding that by ensuring text inputs are all unicode is neater.

Revision history for this message

Robert Collins (lifeless) wrote on 2011-11-21:

#14

On Tue, Nov 22, 2011 at 3:54 AM, Martin Packman
<email address hidden> wrote:
> Robert, the problem with your version of the change is you're baking in
> the bug for later. If non-ascii bytestrings need to be output (which
> doesn't seem to be the case currently), you get the following situation:
>
>>>> "%s %s" % (_escape(u"\xa7"), _escape("\xa7"))
> '\xc2\xa7 \xa7'
>>>> _.decode("utf-8")
> Traceback (most recent call last):
> ...
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 3: invalid start byte
>
> Creating a html with any non-utf-8 inputs will result in a page that is
> likely to display incorrectly, because it contains a mix of different
> encodings. Avoiding that by ensuring text inputs are all unicode is
> neater.

The reality is that they aren't all unicode, and we need to escape
them in some sensible fashion. I don't want to go around in circles on
this - I will have a look today at the escaping more closely; the
basic constraints I see are;
- crap data should be understandable (e.g. repr() style output)
- the report should generate
- we don't want any xss attack vectors.

Right now, because we're *not* using the django template engine for
this, its quite possible that all three goals are not reached.

-Rob

Martin Packman (gz) on 2011-12-09

Changed in python-oops-tools:
assignee:	nobody → Martin Packman (gz)
status:	Triaged → In Progress

Martin Packman (gz) on 2011-12-19

Changed in python-oops-tools:
status:	In Progress → Fix Committed

Robert Collins (lifeless) on 2012-02-01

Changed in python-oops-tools:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Patches

Add patch

Remote bug watches

Bug watches keep track of this bug in other bug trackers.