Throttle memory/net I/O

Bug #931211 reported by Drew Smathers
This bug affects 1 person

Affects: Bafload
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Currently no throttling is done for uploads, which is very bad for memory performance. Implement a pluggable throttling strategy to prevent too much file data from being held in memory at the same time.

Revision history for this message
Drew Smathers (djfroofy) wrote:

Related conversation from Twisted IRC:

13:18 djfroofy: anyone have chops with unix mmap? writing some twisted (txaws) code and want to do some optimizations for multipart upload where i *think* mmap might
                 be of use
13:19 lifeless: sure. uhm, don't do it.
13:19 ivan: funny, I was playing with mmap today trying to make an optimized line reader and it was several times slower than the simplest Python
13:20 lifeless: unlike read(), mmap blocks the process -> can't be deferred to a thread sensibly
13:20 lifeless: your local IO will be about a billion times faster than your network
13:20 ivan: interesting
13:20 lifeless: so whatever you save on memory, you'll pay for with seek latency if you get any serious load at all (and the seek latency turning into total process
                 halts)
13:20 djfroofy: lifeless: hehe ... yeah, ok
13:21 lifeless: f.read() -> allowThreads, read(), stopThread()
13:21 lifeless: mmap -> pagefault, stop, wait :)
13:21 djfroofy: lifeless: thanks for the sanity check. the idea was to do optimization on mp upload, mmap the different parts of the file. rather than reading in 5MB
                 chunks at a time
13:22 djfroofy: any other ideas that don't involve using mmap?
13:22 lifeless: well, what are you trying to optimise ?
13:22 lifeless: not 'what it does', but 'what part of the thing'
13:22 lifeless: speed, memory, cpu, robustness, ...
13:22 djfroofy: so for uploading a 20GB file for example, not loading that into memory buffers
13:22 djfroofy: memory + speed
13:23 lifeless: probably you want to contribute support for multipart upload
13:23 djfroofy: lifeless: yes
13:23 lifeless: then pick a memory size allowance and work on that size chunks
13:23 djfroofy: lifeless: i have
13:23 djfroofy: contributed support or mp that is
13:23 lifeless: cool
13:23 lifeless: yeah, I saw something go by
13:24 djfroofy: inchoate as it is: https://code.launchpad.net/txaws
13:24 lifeless: from there, just pick a decent size - e.g. 1MB, and work in that size chunks
13:24 djfroofy: lifeless: from my understanding the minimum for upload_part is 5MB except for the last part
13:25 lifeless: your system page cache will behave approximately the same as if you mmapped, and your process with however many chunks you allow to be inflight at once
                 held in memory, will be your total memory pressure
13:25 lifeless: (oh, and the tcp socket)
13:26 djfroofy: lifeless: right, so without mmap (which is bad for the good reasons you gave me), i assume the right strategy is to throttle how many parts are held
                 in memory
13:26 lifeless: right (you'd have to do that with mmap too BTW)
13:27 lifeless: because otherwise you're basically /asking/ for your process to be arbitrarily swapped out by the VM subsystem, and that works terribly for python
                 programs
13:28 lifeless: for python to stay fast you want your total ...

