Potential bug in Python cgi.FieldStorage can lead to problematic memory growth
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Zope 2 | Invalid | Medium | Chris McDonough | |
Bug Description
This is actually a potential Python bug but I'm submitting it here because it's likely we'll need to fix it within Zope for some period of time either by including our own cgi module or by monkeypatching.
When a POST request is used to submit a multipart/form-data form which includes an HTML "file" input element, the resulting multipart input is parsed using Python's cgi.FieldStorage class.
There appear to be two problems with cgi.FieldStorage, both of which can manifest themselves as "memory hogs".
The first problem is that if the "content-length" of any part of the multipart input is not provided, FieldStorage appears to default to using a StringIO object to hold the output for the parsing of an individual part. As far as I can tell, individual elements of a multipart/form-data input stream are not required to be decorated with their lengths within an enclosed header (see http://www.faqs.org/rfcs/rfc1867.html).
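For illustration, the sketch below builds a one-part multipart/form-data body of roughly the shape a browser sends. The helper name `build_multipart_body`, the boundary string and the field and file names are all hypothetical; the only point is that the part carries Content-Disposition and Content-Type headers but no Content-Length of its own, so a parser has to find the end of the part by scanning for the boundary.

```python
def build_multipart_body(field_name, filename, payload, boundary):
    """Build a one-part multipart/form-data body (hypothetical helper).

    The part headers deliberately carry no Content-Length; the end of
    the part is marked only by the closing boundary line.
    """
    lines = [
        b"--" + boundary,
        b'Content-Disposition: form-data; name="' + field_name
        + b'"; filename="' + filename + b'"',
        b"Content-Type: application/octet-stream",
        b"",                       # blank line separates headers from data
        payload,
        b"--" + boundary + b"--",  # closing boundary
        b"",
    ]
    return b"\r\n".join(lines)


body = build_multipart_body(b"upload", b"data.bin", b"x" * 16, b"BOUNDARY123")
```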
This could be fixed by not using a StringIO object at all, and instead always using a tempfile. This is a heavy-handed fix and may be a "speed killer", so it might be better to find another solution that applies some heuristic based on the overall length of the input stream.
Another less-commonly-
When the uploaded file does not contain any newlines (individual parts of multipart input are not required to contain newlines), the entirety of that part will be read into RAM, as FieldStorage uses "readline()" to attempt to read a chunk of a file at a time. This appears to be a problem within the methods "read_lines_
I am in the process of attempting to fix this.
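The newline problem described above can be reproduced without the cgi module at all. The following is a minimal sketch, not the FieldStorage code itself: it uses an in-memory `io.BytesIO` as a stand-in for the request stream, and the 64 KiB `CHUNK` value is an arbitrary assumption. It contrasts an unbounded `readline()` with a size-limited `readline(size)`, which is one possible mitigation.

```python
import io

CHUNK = 1 << 16  # 64 KiB cap per read; an arbitrary, assumed value

# A 1 MiB "upload" that contains no newline at all.
payload = b"x" * (1 << 20)

# Unbounded readline(): with no newline to stop at, a single call
# returns the entire payload, so the whole part sits in memory at once.
unbounded = io.BytesIO(payload).readline()

# Bounded readline(size): each call returns at most CHUNK bytes,
# so memory use per read stays capped regardless of the content.
stream = io.BytesIO(payload)
largest = 0
total = 0
while True:
    line = stream.readline(CHUNK)
    if not line:
        break
    largest = max(largest, len(line))
    total += len(line)
```

With the bounded variant the reader still sees every byte, but never more than `CHUNK` bytes in one piece.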
> The first problem is that if the "content-length" of any part of the
> multipart input is not provided, FieldStorage appears to default to using
> a StringIO object to hold the output for the parsing of an individual
> part. As far as I can tell, individual elements of a multipart/form-data
> input stream are not required to be decorated with their lengths within
> an enclosed header (see http://www.faqs.org/rfcs/rfc1867.html).
> They are not prevented from doing so, but they are not required
> to do so by the RFC. This seems to mean that every file upload from
> clients that do not supply a Content-Length header for the individual
> elements of a multipart input will end up as StringIO objects, and thus
> in RAM. That said, I've only tested with Firefox, but it does not supply
> these headers for individual parts of the multipart message. In
> practice, this appears to mean that all files uploaded via the
> multipart/form-data HTTP POST protocol will end up entirely in RAM, at
> least when uploaded by some class of clients, including one of the two
> major browsers.
This analysis is false. FieldStorage only uses a StringIO to store data up to 1000 characters, and thereafter switches to a tempfile.
The analysis of the "newline" problem is still true.
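The StringIO-then-tempfile behaviour noted above can be sketched as follows. This is an illustration of the general pattern only, not the cgi module's implementation: the class name `SpoolingBuffer` and its attributes are hypothetical, and the 1000-byte threshold is taken from the figure quoted in the comment.

```python
import io
import tempfile


class SpoolingBuffer:
    """Hypothetical illustration: hold data in memory until a write
    would push it past `threshold` bytes, then move it to a temp file."""

    def __init__(self, threshold=1000):
        self.threshold = threshold
        self.file = io.BytesIO()
        self.in_memory = True

    def write(self, data):
        if self.in_memory and self.file.tell() + len(data) > self.threshold:
            spooled = tempfile.TemporaryFile()
            spooled.write(self.file.getvalue())  # carry over buffered bytes
            self.file = spooled
            self.in_memory = False
        self.file.write(data)


buf = SpoolingBuffer()
buf.write(b"a" * 600)
small_in_memory = buf.in_memory   # 600 bytes: still held in RAM
buf.write(b"b" * 600)
large_in_memory = buf.in_memory   # 1200 bytes: moved to a temp file
buf.file.seek(0)
contents = buf.file.read()
buf.file.close()
```

The standard library's `tempfile.SpooledTemporaryFile` provides a similar roll-over behaviour through its `max_size` argument.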