Potential bug in Python cgi.FieldStorage can lead to problematic memory growth

Bug #143308 reported by Chris McDonough
2
Affects Status Importance Assigned to Milestone
Zope 2
Invalid
Medium
Chris McDonough

Bug Description

This is actually a potential Python bug but I'm submitting it here because it's likely we'll need to fix it within Zope for some period of time either by including our own cgi module or by monkeypatching.

When a POST request is used to submit a multipart/form-data form which includes an HTML "file" input element, the resulting multipart input is parsed using Python's cgi.FieldStorage class.

There appear to be two problems with cgi.FieldStorage, both which can manifest themselves as "memory hogs".

The first problem is that if the "content-length" of any part of the multipart input is not provided, FieldStorage appears to default to using a StringIO object to hold the output for the parsing of an individual part. As far as I can tell, individual elements of a multipart/form-data input stream are not required to be decorated with their lengths within an enclosed header ( see http://www.faqs.org/rfcs/rfc1867.html). Although they are not prevented from doing so, but they are not required to do so by the RFC. This seems to mean that every file upload from clients that do not supply a Content-Length header for the individual elements of a multipart input will end up as StringIO objects, and thus in RAM. That said, I've only tested with Firefox, but it does not supply these headers for individual parts of the multipart message. In practice, this appears to mean that all files uploaded via the multipart/form-data HTTP POST protocol will end up entirely in RAM, at least when uploaded by some class of clients, including one of the two major browsers.

This could be fixed by not using a StringIO object at all, but instead by always using a tempfile. This is a heavy handed fix and may be a "speed killer" so it might be better to find another solution that does some heuristic based on the overall length of the input stream.

Another less-commonly-replicated problem in FieldStorage can cause the entire contents of an uploaded file to be read into RAM.

When the uploaded file does not contain any newlines (individual parts of multipart input are not required to contain newlines), the entirety of that part will be read into RAM, as FieldStorage uses "readline()" to attempt to read a chunk of a file at a time. This appears to be a problem within the methods "read_lines_to_eof", "read_lines_to_outerboundary", and "skip_lines". Even if the RFC that defines "multipart/*" encodings required that "well-behaved" content was required to be encoded in such a way that it contained newlines every reasonable number of characters, I'm not sure that it's reasonable to have a strategy that depends on file parts ending properly in newlines when the input comes from arbitrary sources and thus may be intentionally "hostile", so the practice of using "readline()" within these methods should probably cease (or maybe the methods should go away altogether).

I am in the process of attempting to fix this.

Tags: bug zope
Revision history for this message
Chris McDonough (chrism-plope) wrote :

> The first problem is that if the "content-length" of any part of the
> multipart input is not provided, FieldStorage appears to default to using
> a StringIO object to hold the output for the parsing of an individual
> part. As far as I can tell, individual elements of a multipart/form-data
> input stream are not required to be decorated with their lengths within
> an enclosed header ( see http://www.faqs.org/rfcs/rfc1867.html).
> Although they are not prevented from doing so, but they are not required
> to do so by the RFC. This seems to mean that every file upload from
> clients that do not supply a Content-Length header for the individual
> elements of a multipart input will end up as StringIO objects, and thus
> in RAM. That said, I've only tested with Firefox, but it does not supply
> these headers for individual parts of the multipart message. In
> practice, this appears to mean that all files uploaded via the
> multipart/form-data HTTP POST protocol will end up entirely in RAM, at
> least when uploaded by some class of clients, including one of the two
> major browsers.

This analysis is false. FieldStorage only uses a StringIO to store data up to 1000 characters, and thereafter switches to a tempfile.

The analysis of the "newline" problem is still true.

Revision history for this message
Chris McDonough (chrism-plope) wrote :

A fix for this bug has been submitted to the Python SF bug tracker as http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470

Does anyone have an opinion on how (if?) we should fix this for Zope while this patch is being reviewed and before a new Python version is released? The risk of not fixing it is that it is possible for people who have access to make POST requests to Zope to consume arbitrary amounts of memory on the server upon which it is hosted. I haven't yet determined whether the "cgi-maxlen" zope.conf entry is helpful for mitigating this risk.

Revision history for this message
Florent Guillaume (efge) wrote :

I think cgi-maxlen will protect against this, that's what it's for.

Revision history for this message
Chris McDonough (chrism-plope) wrote :

I have checked and setting cgi-maxlen does indeed prevent egregiously large POST uploads from browsers that send the appropriate content-length. However some applications require egregiously large uploads to be accepted (intranet applications or file repository applications), so setting it to a value that would always prevent available server memory from being consumed isn't a generic solution for all users because it would prevent legitimate uploads from occurring.

That said, this bug has existed for eaons in Python (and continues to exist despite the Guido-reviewed patch sitting at http://sourceforge.net/tracker/index.php?func=detail&aid=1112549&group_id=5470&atid=105470 ) so I don't believe it's a showstopper for a release unless we want to ship our own version of cgi.py or monkey-patch the existing cgi.py.

Revision history for this message
Florent Guillaume (efge) wrote :

Changes: submitter email, edited transcript, importance (critical => medium)

Revision history for this message
Colin Watson (cjwatson) wrote :

The zope2 project on Launchpad has been archived at the request of the Zope developers (see https://answers.launchpad.net/launchpad/+question/683589 and https://answers.launchpad.net/launchpad/+question/685285). If this bug is still relevant, please refile it at https://github.com/zopefoundation/zope2.

Changed in zope2:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.