IVLE does not serve output from cgitb

Bug #979598 reported by Marco Lui on 2012-04-12

Bug Description

Steps to reproduce

Attempt to serve the following:

import cgitb; cgitb.enable(); raise ValueError

Expected outcome:
output of cgitb served as text/html

Actual outcome:
warning about invalid CGI header, followed by

/usr/lib/python2.6/cgitb.py:173: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  value = pydoc.html.repr(getattr(evalue, name))
<!--: spam
Content-Type: text/html

David Coles (dcoles) wrote :

Interesting. I guess the cgitb module is trying to generate output that is valid as both CGI and HTML. Unfortunately I don't believe that "<!--" is a valid field name for an HTTP header, which is why IVLE rejects it (http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2 states that "field-name" must be a valid token, and http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2 explicitly lists "<" as a separator, which cannot appear in a token). This is probably an upstream Python bug.
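
To make the token rule concrete, here is a minimal sketch of the field-name check described in RFC 2616 sections 2.2 and 4.2. The helper name is_valid_field_name is made up for illustration; it is not IVLE's actual code.

```python
# Separator characters from RFC 2616 section 2.2 (includes space and tab).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_valid_field_name(name):
    """Return True if name is an RFC 2616 token (hypothetical helper)."""
    if not name:
        return False  # a token needs at least one character
    for ch in name:
        code = ord(ch)
        if code < 32 or code >= 127:
            return False  # control characters, DEL and non-ASCII are excluded
        if ch in SEPARATORS:
            return False  # separators such as "<" cannot appear in a token
    return True


print(is_valid_field_name("Content-Type"))
print(is_valid_field_name("<!--"))
```

Under this check "Content-Type" passes, while "<!--" fails on its first character.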

Despite this, not working with Python's built-in cgitb module is a pretty poor experience.

The second warning is due to the cgitb module needing to be updated in the Python standard library. Looks like it's fixed in newer versions of Python, but in the meantime you can silence it using the warnings module (http://docs.python.org/library/warnings.html).

Changed in ivle:
status: New → Triaged
importance: Undecided → Medium

Matt Giuca (mgiuca) wrote :

Whoa. I didn't know cgitb existed. But this looks incredibly broken. (As in, cgitb is broken, not us.)

The source for cgitb is quite clear that they are intending to do this -- the "<!--" isn't just some weird bug.

def reset():
    """Return a string that resets the CGI and browser to a known state."""
    return '''<!--: spam
Content-Type: text/html

<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> -->
<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> --> -->
</font> </font> </font> </script> </object> </blockquote> </pre>
</table> </table> </table> </table> </table> </font> </font> </font>'''

This makes absolutely no sense. Firstly, as David said, it is a syntax error. RFC 3875 defines the 'field-name' production as a sequence of characters not including a "<" sign. So any conforming CGI server (such as IVLE) should NOT parse this.

Secondly, I don't see the point of it. There are two scenarios:
1. An exception is thrown before any bytes have been written to stdout.
2. An exception is thrown after at least one byte has been written to stdout.

In scenario #1, the cgitb module may be useful -- it prints out a formatted exception message in lieu of the normal CGI output. Except that the "<!--: spam" line is causing a syntax error. It would work fine without that line.
In scenario #2, there is no way to "reset" the browser. The comment in the reset() function is nonsensical -- once bytes have been written to stdout (especially once a blank line has been sent), there is no way to "undo" them and set the browser back to a clean page. Any subsequent bytes will become part of the page being rendered. The idea that "<!--" is going to "comment out" the rest of the page and reset the browser... I just don't see how it ever could have worked in any browser or CGI server.
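
The two scenarios can be sketched with a toy CGI header parser. This is only an illustration under assumptions: split_cgi_output is a hypothetical helper, not IVLE's real parser, and RESET_START copies just the first lines of the reset() string quoted above.

```python
import re

# RFC 2616 "token": printable ASCII minus the separator characters.
TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$")

# First lines of the string returned by cgitb.reset(), as quoted above.
RESET_START = (
    "<!--: spam\n"
    "Content-Type: text/html\n"
    "\n"
    '<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> -->\n'
)


def split_cgi_output(output):
    """Split raw CGI output into (headers, body); reject bad field names."""
    head, _, body = output.partition("\n\n")  # first blank line ends headers
    headers = []
    for line in head.split("\n"):
        name, colon, value = line.partition(":")
        if not colon or not TOKEN.match(name):
            raise ValueError("invalid CGI header: %r" % line)
        headers.append((name, value.strip()))
    return headers, body


# Scenario 1: the exception fires before any output, so the reset string is
# the very first thing a strict server sees.
try:
    split_cgi_output(RESET_START)
    scenario_1 = "accepted"
except ValueError:
    scenario_1 = "rejected"

# Scenario 2: headers and some HTML were already written, so the reset string
# lands in the body and cannot undo what was sent.
headers, body = split_cgi_output(
    "Content-Type: text/html\n\n<p>partial" + RESET_START
)

print(scenario_1)
print(headers)
```

In the second case the parser only ever sees the script's own header; everything cgitb emits, including its second "Content-Type" line, is body text.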

In any case, IVLE already has a very similar functionality. We catch exceptions raised by student CGI scripts and display them nicely in HTML. I don't remember what we do if an exception is raised once bytes have already been written, but I am sure that cgitb doesn't help. I would recommend not using it.

Marking this bug as invalid.

Changed in ivle:
status: Triaged → Invalid

David Coles (dcoles) wrote :

Matt: I think the intent here was that if you had already written the headers and some HTML, then this might produce somewhat valid graphical output (due to the way it comments out the headers and closes any potentially open tags). It's ugly, but in most "you can get away with almost any broken HTML" browsers, it'll probably give you something a little visually cleaner.

Doing this correctly would require the cgitb module to be aware of the output state of the page (have headers been sent and what is the state of the HTML output), so it could generate a valid output - not possible in general CGI.

A quick workaround is to monkey-patch the cgitb module:

>>> import cgitb
>>> cgitb._reset = cgitb.reset
>>> cgitb.reset = lambda: cgitb._reset().replace('<!--: spam\n','')
>>> cgitb.enable()
>>> raise ValueError

Though this will cause an extra header to be printed in the output if you've already sent the CGI header.

David Coles (dcoles) wrote :

Marco: Or even better, just make sure you print your own "Content-Type: text/html" CGI header before enabling the cgitb module. That way the "<!--: spam" will just be treated as an HTML comment, your browser will happily ignore it, and IVLE won't complain that you sent a bad CGI header, since you didn't. :)

Marco Lui (saffsd) wrote :

Thanks for that. I figured that it was a strange behavior on cgitb's behalf, and I was curious to see what you guys thought. Good to know that according to standards it's cgitb that's broken. I wonder what CGI gateways actually allow HTML comments before headers are received - that's the only way cgitb's approach could work.

In any case, the workaround involving ensuring headers are printed and warnings are suppressed is usable. For future reference, suppress the deprecation warnings with
import warnings;warnings.filterwarnings("ignore",category=DeprecationWarning)
One thing to note is that the warnings.catch_warnings context manager is ineffective - I expect that this is a threading issue?
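
A possible explanation other than threading is scoping: warnings.catch_warnings restores the previous filters when its with block exits, and an uncaught exception leaves the block before the hook installed by cgitb.enable() runs. The sketch below only simulates that ordering. fake_hook is a stand-in for the cgitb handler, and the outer recording block exists purely to count warnings.

```python
import warnings


def fake_hook():
    # Stand-in for the code in cgitb that triggers the DeprecationWarning.
    warnings.warn("BaseException.message has been deprecated",
                  DeprecationWarning)


def warnings_emitted(call_hook_inside_block):
    """Count warnings that get through, depending on where the hook runs."""
    with warnings.catch_warnings(record=True) as seen:
        warnings.simplefilter("always")  # record everything not ignored
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore", DeprecationWarning)
                if call_hook_inside_block:
                    fake_hook()  # the "ignore" filter is still active here
                raise ValueError("boom")
        except ValueError:
            # The inner block has exited, so its filter is already gone.
            if not call_hook_inside_block:
                fake_hook()
    return len(seen)


print(warnings_emitted(True))
print(warnings_emitted(False))
```

A process-wide warnings.filterwarnings call, as in the one-liner above, is not undone on the way out, which would explain why it works where the context manager does not.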

Another option is to invoke cgitb.enable(format="text"), which causes cgitb to produce text-only output, which is however unreadable in the browser - view source is usable though. IVLE's rendering of exceptions after bytes have been written shares the same problem, as the exception is written without any formatting.
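
If the plain-text traceback should be readable in the page itself rather than only in view source, one option is to escape it and wrap it in a pre element. A rough sketch, assuming Python 3 (html.escape; Python 2 would use cgi.escape instead). The helper name traceback_as_html is hypothetical and is part of neither cgitb nor IVLE.

```python
import html
import traceback


def traceback_as_html(exc):
    """Format an exception as escaped text inside <pre> (hypothetical helper)."""
    text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    # Escaping keeps characters like "<" in the message from being read as
    # markup; <pre> preserves the line breaks of the traceback.
    return "<pre>" + html.escape(text) + "</pre>"


try:
    raise ValueError("bad <value>")
except ValueError as exc:
    page_fragment = traceback_as_html(exc)

print(page_fragment)
```

This only helps when the fragment ends up somewhere the browser is still rendering HTML; it does nothing about output that was already sent.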

Thanks for your comments!

Matt Giuca (mgiuca) wrote :

David (or anyone else): Can you actually remember what IVLE's behaviour is when an exception is raised after headers have been printed? I tried to find the code that handles it last night, but I've got no idea where it is (it doesn't seem to be in ivle/cgihandler.py -- there is an exception handler there but it's for unexpected exceptions only, not student code exceptions).

William Grant (wgrant) wrote :

Just tested locally. The behaviour in either case is the same: the traceback on stderr is treated as a normal part of the script output, and sent to the browser. But if it happens before the first output then IVLE's headerless warning appears, causing the traceback to be treated as plaintext.
