Streaming big files through rocket can cause silent corruption

Bug #938261 reported by Nick Name on 2012-02-21
This bug affects 2 people
Affects: Rocket Web Server

Bug Description

Version of Python used: Any (specifically, tested on 2.6 and 2.7)
Version of Rocket used: 1.2.4 - web2py (single file) build, but this is a generic rocket issue as well
HTTP client used to discover the bug: demonstration below uses wget

(Taken from discussion at )

How to reproduce: you need a WSGI worker that produces its output in parts (that is, it returns a list or yields parts from a generator), e.g. web2py's "static" file server (which goes through WSGI and does not use the FileSystemWorker).

Make sure that a large payload is produced, and that it is made of many small parts, e.g. put a 10MB file in web2py/applications/welcome/static/ (web2py will use 64K parts by default).
Consume the file slowly, e.g. wget --limit-rate=100k http://localhost:8000/welcome/static/ ; at that rate the whole file takes about 100 seconds to download, even on localhost.
Let the file download for 10 seconds, then pause wget (e.g. suspend it with Ctrl-Z on Linux/OS X).
Wait 20 seconds.
Let it continue (e.g. type 'fg' if you suspended it with Ctrl-Z).
Notice that when it reaches the end, wget complains about missing bytes, reconnects and downloads the rest of the file (and is happy with it). However, the file is corrupt: a block (or many blocks) is missing from the middle, and the last few blocks are repeated (by the second wget connection; if you prevent wget from resuming, the file is simply shorter).
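To make the corruption easy to spot without web2py, a stand-in WSGI app along the following lines could be used. This is only a sketch: the names chunked_app, make_chunk and first_bad_chunk are hypothetical, it is written for Python 3.5+ (the report itself was tested on 2.6 and 2.7), and how the app is registered with Rocket is left out. Each 64K part starts with its own index, so a missing or repeated block shows up as a tag mismatch in the downloaded file.

```python
# Hypothetical stand-in for web2py's static file server (not part of Rocket
# or web2py): a WSGI app that streams a large payload as many small parts.
CHUNK_SIZE = 64 * 1024          # 64K parts, as in the report
TOTAL_SIZE = 10 * 1024 * 1024   # 10MB payload, as in the report
TAG_LEN = 8                     # width of the zero-padded index tag


def make_chunk(index, size=CHUNK_SIZE):
    """Return `size` bytes that begin with the zero-padded chunk index."""
    tag = b'%08d' % index       # bytes %-formatting needs Python 3.5+
    return tag + b'.' * (size - len(tag))


def chunked_app(environ, start_response):
    """WSGI app that yields the payload one tagged chunk at a time."""
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    for index in range(TOTAL_SIZE // CHUNK_SIZE):
        yield make_chunk(index)


def first_bad_chunk(payload, size=CHUNK_SIZE):
    """Return the index of the first chunk whose tag is wrong, else None."""
    for index in range(len(payload) // size):
        start = index * size
        if payload[start:start + TAG_LEN] != b'%08d' % index:
            return index
    return None
```

After a download, passing the file contents to first_bad_chunk() gives the position where the stream first went wrong, or None if the file is intact.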
A better idea of where the problem lies can be had from the following ugly patch (applied against web2py's "one file" build of Rocket):

@@ -1929,6 +1929,9 @@ class WSGIWorker(Worker):
                 self.conn.sendall(b('%x\r\n%s\r\n' % (len(data), data)))
+        except socket.timeout:
+            self.closeConnection = True
+            print 'Exception lost'
         except socket.error:
             # But some clients will close the connection before that
             # resulting in a socket error.
Running the same experiment with the patched build shows that the file gets corrupted whenever 'Exception lost' is printed to web2py's terminal.
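The effect of a swallowed timeout can be reproduced in isolation, outside Rocket. The sketch below is an illustration under assumptions, not Rocket code: it uses socket.socketpair() in place of a real HTTP connection, the names drain and demo_lost_bytes are made up, and it targets Python 3. The "server" side times out while the "client" is not reading, ignores the exception, and then sends the next part; the client ends up with a prefix of the first part followed directly by the next one.

```python
import socket


def drain(sock, idle=0.2):
    """Read from sock until nothing arrives for `idle` seconds (or EOF)."""
    sock.settimeout(idle)
    parts = []
    while True:
        try:
            data = sock.recv(65536)
        except socket.timeout:
            break
        if not data:
            break
        parts.append(data)
    return b''.join(parts)


def demo_lost_bytes(first_size=8 * 1024 * 1024, timeout=0.2):
    """Swallow a sendall() timeout, keep sending, and report what arrived."""
    server, client = socket.socketpair()
    try:
        server.settimeout(timeout)
        first = b'A' * first_size
        timed_out = False
        try:
            # Nobody is reading yet (the client is "paused"), so the socket
            # buffer fills up and sendall() gives up after `timeout` seconds.
            server.sendall(first)
        except socket.timeout:
            timed_out = True          # swallowed, as in the bug
        got_first = drain(client)     # the client resumes reading
        server.sendall(b'NEXT')       # the server carries on with the next part
        got_next = drain(client)
        return timed_out, len(first), got_first, got_next
    finally:
        server.close()
        client.close()
```

The tail of the first part never reaches the client, and nothing on the sending side records how much was lost.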

Discussion: The only way to use sendall() reliably is to terminate the connection immediately upon any error (including a timeout), because there is no way to know how many bytes were sent. (The documentation states clearly that the number of bytes sent is unknown; it does not spell out the implication that reliable recovery is therefore impossible.) However, there are sendall() calls all over the code, and some failed sendall() calls are followed by further sendall() calls on the same connection. The worst offender seems to be WSGIWorker.write(), but I'm not sure the other sendall() calls are safe either.
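One way to enforce "terminate on any error" is to route every write through a small guard. The following is a minimal sketch, assuming Python 3; GuardedSender and stream are hypothetical names and this is not how Rocket is structured. After the first failed sendall() the socket is closed and every later write is refused, so a half-sent part can never be followed by more data.

```python
import socket


class GuardedSender(object):
    """Hypothetical wrapper: once sendall() fails, the connection is dead."""

    def __init__(self, sock):
        self.sock = sock
        self.failed = False

    def sendall(self, data):
        if self.failed:
            raise socket.error('connection already failed; refusing to send')
        try:
            self.sock.sendall(data)
        except socket.error:
            # socket.timeout is a subclass of socket.error, so timeouts
            # land here too. The number of bytes sent is unknown, so the
            # only safe move is to close and never write again.
            self.failed = True
            self.sock.close()
            raise


def stream(sender, parts):
    """Send parts in order and stop at the first failure.

    Returns the number of parts that were sent completely.
    """
    sent = 0
    for part in parts:
        try:
            sender.sendall(part)
        except socket.error:
            break
        sent += 1
    return sent
```

With this guard the client sees a truncated response and a closed connection, which it can detect, instead of a stream with a hole in the middle.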

Temporary workarounds: increase SOCKET_TIMEOUT significantly (the default is 1 second; bump it to e.g. 10), and do not swallow socket.timeout in WSGIWorker.write().

Increasing the chunk size is NOT helpful, because it only changes the number of bytes sent before the first loss (at a given bandwidth); from that point on, the problem is the same.

Changed in rocket:
assignee: nobody → Tim (tdfarrell)