Comment 14 for bug 891186

Robert Collins (lifeless) wrote : Re: [Bug 891186] Re: analyse_oops_reports raised an UnicodeEncodeError loading an oops from the filesystem

On Tue, Nov 22, 2011 at 3:54 AM, Martin Packman
<email address hidden> wrote:
> Robert, the problem with your version of the change is you're baking in
> the bug for later. If non-ascii bytestrings need to be output (which
> doesn't seem to be the case currently), you get the following situation:
>
>>>> "%s %s" % (_escape(u"\xa7"), _escape("\xa7"))
> '\xc2\xa7 \xa7'
>>>> _.decode("utf-8")
> Traceback (most recent call last):
>  ...
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 3: invalid start byte
>
> Creating an HTML page from any non-UTF-8 inputs will result in a page
> that is likely to display incorrectly, because it contains a mix of
> different encodings. Avoiding that by ensuring all text inputs are
> unicode is neater.
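Martin's suggestion is to normalise every text input to unicode before formatting and escaping. That way the page ends up in a single encoding. Below is a minimal sketch of that idea in Python 3, where `str` plays the role of Python 2's `unicode`.

The helper names `to_text` and `render_pair` are hypothetical and do not come from the oops tools. The choice of `errors="replace"`, which swaps undecodable bytes for U+FFFD, is an assumption made so that one bad field cannot stop the report from generating.

```python
import html


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: normalise a report field to str up front.

    Bytes are decoded as UTF-8; any undecodable byte becomes U+FFFD
    (assumption: losing the raw byte is acceptable here).
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return str(value)


def render_pair(a, b):
    # Escape only after both inputs are text, so the output has one encoding.
    return "%s %s" % (html.escape(to_text(a)), html.escape(to_text(b)))


page = render_pair("\xa7", b"\xa7")
encoded = page.encode("utf-8")  # one consistent encoding for the whole page
```

Unlike the mixed-encoding string in the session above, `encoded` decodes cleanly. The trade-off is that the original byte `0xa7` is no longer visible in the output.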

The reality is that they aren't all unicode, and we need to escape
them in some sensible fashion. I don't want to go around in circles on
this; I will look more closely at the escaping today. The basic
constraints I see are:
 - crap data should be understandable (e.g. repr() style output)
 - the report should generate
 - we don't want any xss attack vectors.
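One way all three constraints could be met at once is sketched below. It is a hypothetical Python 3 helper, `escape_field`, not the code the oops tools actually use.

It accepts either text or bytes and decodes bytes as UTF-8. Any bytes that are not valid UTF-8 are kept visible as `\xNN` escapes via the standard `backslashreplace` error handler, which gives repr()-style output for crap data instead of an exception. Every field is then passed through `html.escape`, so no field can inject markup.

```python
import html


def escape_field(value):
    """Hypothetical escaping helper for report fields (a sketch).

    - str is used as-is;
    - bytes are decoded as UTF-8, with undecodable bytes kept visible
      as \\xNN escapes (repr()-style) rather than raising or dropping;
    - the result is always HTML-escaped, closing off markup injection.
    """
    if isinstance(value, bytes):
        text = value.decode("utf-8", errors="backslashreplace")
    else:
        text = str(value)
    return html.escape(text, quote=True)


# Usage: mixed inputs yield one consistent str, safe to encode as UTF-8.
fields = ["\xa7", b"\xa7", b"<script>alert(1)</script>"]
escaped = [escape_field(f) for f in fields]
page = " ".join(escaped).encode("utf-8")
```

Note the design choice in this sketch. It keeps the malformed byte readable (`\xa7`) instead of replacing it. It also does the HTML escaping in one place, which a template engine with autoescaping would otherwise handle.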

Right now, because we're *not* using the django template engine for
this, it's quite possible that we are not meeting all three goals.

-Rob