Some hints from our friends at OpenHatch:
We noticed the issue on migrate-upload-data memory consumption (https://bugs.launchpad.net/dat-overview/+bug/1189808) and have done some work on reducing the memory consumption (https://github.com/openhatch/oh-greenhouse/blob/master/greenhouse/uploads/management/commands/migrate-upload-data.py) that I think will be helpful for you.
The SQL you currently execute loads all of the data into memory. We changed the code to keep a current-time pointer, run queries that grab only uploads later than that pointer, limit each query to a chunk size of 5000, and then update the pointer for the next query. With a chunk of 5000 it currently uses only about 55MB (30MB for the process and 25MB for the 5000-row chunk query). This Stack Overflow post helped a lot: http://stackoverflow.com/questions/14144408/memory-efficient-constant-and-speed-optimized-iteration-over-a-large-table-in
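The time-pointer chunking described above can be sketched roughly as follows. This is a minimal illustration, not the actual migrate-upload-data code (which uses Django); the `uploads` table, its `uploaded_at` and `package` columns, and the function name are all hypothetical, and plain sqlite3 stands in for the real database layer:

```python
import sqlite3

CHUNK_SIZE = 5000  # the chunk size the note above settled on


def iter_uploads_chunked(conn, chunk_size=CHUNK_SIZE):
    """Yield upload rows oldest-first while keeping at most one
    chunk of rows in memory at a time.

    A "current time pointer" starts before every timestamp; each
    query fetches the next `chunk_size` rows newer than the pointer,
    then the pointer advances to the last timestamp seen.
    """
    pointer = ""  # sorts before any non-empty timestamp string
    while True:
        rows = conn.execute(
            "SELECT uploaded_at, package FROM uploads "
            "WHERE uploaded_at > ? "
            "ORDER BY uploaded_at "
            "LIMIT ?",
            (pointer, chunk_size),
        ).fetchall()
        if not rows:
            break
        yield from rows
        # Last row's timestamp becomes the pointer for the next query.
        pointer = rows[-1][0]
```

One caveat with the strict `>` comparison: if several rows share the exact timestamp that ends a chunk, the remainder of that group would be skipped on the next query, so the approach assumes timestamps are effectively unique (or that the chunk boundary is otherwise handled).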