testtools should be able to cope with non-ascii tracebacks

Bug #501166 reported by Martin Packman
This bug affects 6 people
Affects: testtools
Status: Fix Released
Importance: Critical
Assigned to: Martin Packman

Bug Description

The testtools.content.TracebackContent class is not robust against tracebacks where the exception value, a script filename, or a script line contains non-ascii text. The particular problem is with the lines:

    value = self._result._exc_info_to_string(err, test)
    super(TracebackContent, self).__init__(content_type,
        lambda:[value.encode("utf8")])

Here value is a str object. Trying to encode it first decodes it to unicode using the default encoding (generally ascii) before encoding as UTF-8, and throws if the str is not 7-bit clean.
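The failure mode can be reproduced outside Python 2. The following is a minimal sketch in Python 3 that emulates the implicit decode step explicitly; the helper name `py2_style_encode` is hypothetical and is not part of testtools.

```python
# Python 3 emulation of what Python 2 did for str.encode("utf8"):
# the byte string was first decoded with the default codec (ascii).
# py2_style_encode is a hypothetical helper, not a testtools API.
def py2_style_encode(value: bytes, default_encoding: str = "ascii") -> bytes:
    # Implicit step in Python 2: bytes -> unicode using the default encoding.
    text = value.decode(default_encoding)
    # Explicit step the caller asked for: unicode -> UTF-8 bytes.
    return text.encode("utf8")


ascii_traceback = b"ValueError: bad input"
latin1_traceback = "ValueError: caf\u00e9".encode("latin-1")

# A 7-bit clean traceback round-trips unchanged.
ascii_result = py2_style_encode(ascii_traceback)

# A traceback with a non-ascii byte fails in the implicit decode,
# before the UTF-8 encode is ever reached.
try:
    py2_style_encode(latin1_traceback)
except UnicodeDecodeError as exc:
    failure = exc
else:
    failure = None
```

The point of the sketch is that the exception is a decode error even though the code only asked for an encode.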

This code is fixable, however I think the mime-types model for test metadata is overly complicated, fragile, and generally a bad idea.

Robert Collins (lifeless) wrote : Re: [Bug 501166] [NEW] testtools should be able to cope with non-ascii tracebacks
Robert Collins (lifeless) wrote : Re: [Bug 501166] [NEW] testtools should be able to cope with non-ascii tracebacks

On Mon, 2009-12-28 at 23:35 +0000, Martin [gz] wrote:
> Public bug reported:
>
> The testtools.content.TracebackContent class is not robust against
> tracebacks where either the exception value, a script filename, or a
> script line contains non-ascii text. The particular problem is with the
> lines:
>
> value = self._result._exc_info_to_string(err, test)
> super(TracebackContent, self).__init__(content_type,
> lambda:[value.encode("utf8")])
>
> Here value is a str object, trying to encode it uses the default
> encoding (generally ascii) to decode it to unicode before encoding as
> UTF-8 and throws if the str is not 7-bit clean.

So, for value to be a 'str' and not 7-bit clean:
 - you need an ascii file system encoding
 - you need non-ascii paths in the file system
 - and ascii symbol names throughout

or
 - test data that raises an AssertionError which has a str that is not
ascii.

Are there other causes for a non-ascii 'str' value there?

Anyhow, clearly we should fix this

 status triaged
 importance critical

I suggest that fixing it 'the right way' will be tricky for Python2.x,
because up to Python 2.7 str(Exception) blows up badly with unicode
args in a few ways. So we probably need to change the stringification
logic to handle this case and return a 'unicode' value not a 'str'
value. However I think we currently use that largely as-is from Python
core. I can't look at this right now, but I think it's very important to
fix.
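One way to picture "stringification that returns text" is a helper that never relies on an implicit codec. This is only a sketch in Python 3 under stated assumptions: the function name `exception_message_as_text` is hypothetical, UTF-8 is an assumed guess for byte arguments, and undecodable bytes are replaced rather than raising.

```python
# Hypothetical helper: build a text (unicode) message from an exception
# without ever relying on an implicit default-encoding conversion.
def exception_message_as_text(exc: BaseException, encoding: str = "utf-8") -> str:
    parts = []
    for arg in exc.args:
        if isinstance(arg, bytes):
            # Assumption: guess an encoding and replace what cannot be decoded,
            # so formatting a failure can never itself fail.
            parts.append(arg.decode(encoding, "replace"))
        else:
            parts.append(str(arg))
    return "%s: %s" % (type(exc).__name__, ", ".join(parts))


utf8_message = exception_message_as_text(AssertionError("caf\u00e9".encode("utf-8")))
latin1_message = exception_message_as_text(AssertionError("caf\u00e9".encode("latin-1")))
```

Replacing undecodable bytes loses information, so a real fix would want to know the right encoding instead of guessing.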

> This code is fixable, however I think the mime-types model for test
> metadata is overly complicated, fragile, and generally a bad idea.

Well, it's been hugely successful for me already in the few places I've
deployed it, so with respect: I disagree. For instance, it's already
solved a major bug with subunit + bzr selftest which shows up readily in
selftest --parallel. The code is simple, nearly trivial, and the
specifications (for e.g. MIME) are readily available allowing rampant
reuse by other tools.

Backtraces and test data in /general/ are for showing to humans but also
need to be machine processable (e.g. to look for common patterns or
encode and transmit to remote machines) and MIME has had many many
successes at bridging these two needs including both HTTP and SMTP.

-Rob

Changed in testtools:
importance: Undecided → Critical
status: New → Triaged
Martin Packman (gz) wrote :

As mentioned, I can also get testtools to break with a non-ascii test script name (no other requirements), or non-ascii text on a line of the script where an exception is raised (either latin-1 or as per the encoding set at the top of the file).

This is python 2 specific, but assuming that every stringified traceback conforms to a single encoding is incorrect, even if this particular mistake with trying to encode a str object is fixed. As there is no meaningful encoding, content-type, or anything else mime-y, the envelope is pointless. Also note, the 'language' header used by this code 1) is deprecated 2) should be a two-letter code 3) is for actual languages, not programming languages.

Robert Collins (lifeless) wrote :

From the duplicate:

Martin [gz] says:
 - str(traceback) is bytes
 - in those bytes [here]:
   - filenames are in mbcs
   - code lines are in the encoding of the file that the source is from
   - and the bytes for the error are in the encoding of the error - e.g. totally arbitrary.

We have to defend against three sets of different encodings.
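To illustrate why one codec cannot cover all three parts, here is a rough Python 3 sketch that decodes a filename and a source line separately. The helper names are hypothetical; `tokenize.detect_encoding` is the standard library function that reads a PEP 263 coding declaration, and the filesystem encoding stands in for "mbcs" on other platforms.

```python
import io
import sys
import tokenize


# Hypothetical helper: decode one line of a source file using the
# encoding declared in the file itself (PEP 263 coding cookie).
def decode_source_line(source_bytes: bytes, lineno: int) -> str:
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    lines = source_bytes.splitlines()
    return lines[lineno - 1].decode(encoding, "replace")


# Hypothetical helper: filenames follow the filesystem encoding,
# which is independent of any source file's declared encoding.
def decode_filename(filename_bytes: bytes) -> str:
    return filename_bytes.decode(sys.getfilesystemencoding(), "replace")


source = "# -*- coding: latin-1 -*-\nraise ValueError('caf\u00e9')\n".encode("latin-1")
line = decode_source_line(source, 2)
name = decode_filename(b"test_example.py")
```

The third part, the error bytes, has no declared encoding at all, so any decoding of it is a guess.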

Robert Collins wrote 20 hours ago: #2
Looks like we need to port a copy of the underlying things, and get the encoding etc sorted out in the root data.

Martin Packman (gz) wrote :

Ugh, okay, I've got this. The fix discussed with Robert at UDS is to essentially backport the Python 3 string semantics to Python 2, so tracebacks are formatted as unicode. This involves messing with a lot of core modules, but can be done without monkey patching.
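For reference, the Python 3 semantics being targeted look roughly like this. It is a sketch of the target behaviour only, not the testtools patch, and `format_traceback_as_utf8` is a hypothetical name.

```python
import sys
import traceback


# Hypothetical helper: format the traceback as text first, then encode
# exactly once at the boundary where bytes are actually needed.
def format_traceback_as_utf8(exc_info) -> bytes:
    # In Python 3, traceback.format_exception returns a list of text strings.
    text = "".join(traceback.format_exception(*exc_info))
    return text.encode("utf8")


try:
    raise ValueError("caf\u00e9")
except ValueError:
    payload = format_traceback_as_utf8(sys.exc_info())
```

Because the traceback is already text, the single explicit encode cannot trigger a hidden ascii decode.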

Changed in testtools:
assignee: nobody → Martin [gz] (gz)
Jonathan Lange (jml)
Changed in testtools:
status: Triaged → In Progress
Changed in testtools:
milestone: none → next
status: In Progress → Fix Released