Comment 7 for bug 1189808

Andrew Starr-Bochicchio (andrewsomething) wrote :

Some hints from our friends at OpenHatch:

We noticed the memory consumption issue in migrate-upload-data (https://bugs.launchpad.net/dat-overview/+bug/1189808) and have done some work to reduce it (https://github.com/openhatch/oh-greenhouse/blob/master/greenhouse/uploads/management/commands/migrate-upload-data.py), which I think will be helpful for you.

The SQL you currently execute loads all of the data into memory. We changed the code to keep a current time pointer, execute queries that grab uploads later than that pointer, limit each query to a chunk size of 5000, and then update the time pointer for the next query. With a chunk size of 5000 it currently uses only about 55MB (30MB for the process and 25MB for the 5000-row chunk query). This Stack Overflow post helped a lot: http://stackoverflow.com/questions/14144408/memory-efficient-constant-and-speed-optimized-iteration-over-a-large-table-in
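
For anyone who wants to see the idea without reading the whole command, here is a rough sketch of the time-pointer approach. It is not the code from the linked repository: the `uploads` table, its `id` and `ts` columns, the function names, and the use of an in-memory SQLite database are all assumptions made so the example runs on its own. It also adds an `id` tie-breaker, which the description above does not mention, so that rows sharing a timestamp are not skipped at a chunk boundary.

```python
import sqlite3


def iter_upload_chunks(conn, chunk_size=5000):
    """Yield lists of (id, ts) rows, holding at most chunk_size rows at once.

    Hypothetical schema: uploads(id INTEGER PRIMARY KEY, ts TEXT), where ts
    is an ISO-8601 string so that string order matches time order.
    """
    # The pointer starts before every row: "" sorts before any timestamp.
    last_ts, last_id = "", -1
    while True:
        rows = conn.execute(
            "SELECT id, ts FROM uploads "
            "WHERE ts > ? OR (ts = ? AND id > ?) "
            "ORDER BY ts, id LIMIT ?",
            (last_ts, last_ts, last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Advance the pointer to the last row seen for the next query.
        last_id, last_ts = rows[-1]


def build_demo_db(n_rows):
    """Create an in-memory table with deliberately repeated timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE uploads (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO uploads (id, ts) VALUES (?, ?)",
        [(i, "2013-06-%02d" % (i % 28 + 1)) for i in range(n_rows)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db(60)
    for chunk in iter_upload_chunks(demo, chunk_size=7):
        print(len(chunk), chunk[0], chunk[-1])
```

Because each query only asks for rows after the pointer and stops at the limit, memory use depends on the chunk size rather than on the size of the table. An index on the ordering columns would likely be needed for this to stay fast on a large table.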