oq-engine: Optimize hazard curve export
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenQuake (deprecated) | Fix Released | High | Lars Butler |
Bug Description
See bug #1097676 for details. In short, the method we use to serialize hazard curve XML does not scale to large data sets: it is slow and it consumes a lot of memory.
A simple refactoring that writes the XML in a more iterative (incremental) fashion should fix it.
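A minimal sketch of incremental XML writing, assuming Python's standard-library `xml.sax.saxutils.XMLGenerator`. This is not the actual NRML or oq-engine serializer: the `write_curves` helper, the `(lon, lat, poes)` tuple layout, and the element and attribute names are hypothetical simplifications. Each curve is written as soon as it is pulled from the iterable, so memory use does not grow with the number of sites.

```python
import io
from typing import Iterable, Sequence, TextIO, Tuple
from xml.sax.saxutils import XMLGenerator

# Hypothetical curve layout: (longitude, latitude, probabilities of exceedance).
Curve = Tuple[float, float, Sequence[float]]


def write_curves(curves: Iterable[Curve], out: TextIO) -> int:
    """Stream curves to `out`, holding only the current curve in memory.

    Element and attribute names are simplified placeholders, not the
    real NRML schema. Returns the number of curves written.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("hazardCurves", {})
    count = 0
    for lon, lat, poes in curves:
        gen.startElement("hazardCurve", {"lon": str(lon), "lat": str(lat)})
        gen.characters(" ".join(str(p) for p in poes))  # escaped by XMLGenerator
        gen.endElement("hazardCurve")
        count += 1
    gen.endElement("hazardCurves")
    gen.endDocument()
    return count


if __name__ == "__main__":
    # A generator stands in for a database cursor: curves are produced lazily.
    fake = ((float(i), 45.0, [0.1, 0.01]) for i in range(3))
    buf = io.StringIO()
    print(write_curves(fake, buf))
    print(buf.getvalue())
```

With a real data source, `curves` would need to be lazily evaluated as well (for example, a Django queryset consumed via `.iterator()`); otherwise the full result set is still materialized before writing starts.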
Update:
Apparently the slowness is not the fault of the XML writer. The oq-engine code is responsible for both the poor speed and the high memory consumption. See the comments below for details.
Update 2:
All of these profiling tests were run on my MacBook Pro, with the following specs:
- 8 GB RAM
- SSD
- 2 GHz Intel Core i7 CPU (quad-core, Hyper-Threading)
Update 3:
The code version used as a baseline for this activity: https:/
Changed in openquake:
- status: New → In Progress
- summary: "nrml: optimize hazard curve serialization" → "oq-engine: optimize hazard curve serialization"
- description: updated
Changed in openquake:
- importance: Undecided → High
- assignee: nobody → Lars Butler (lars-butler)
- description: updated
Changed in openquake:
- status: In Progress → Fix Committed
- tags: added: hazard optimization
Changed in openquake:
- status: Fix Committed → Fix Released
As described in bug #1097676, I'm basing my profiling activities for this bug on Test 2 (the variant with 500k sites).
To start profiling, I first ran the full calculation. Then, I wrote a small script which performs only the `export` phase:
##########
# export.py
##########
from memory_profiler import profile
from openquake.calculators.hazard import ClassicalHazardCalculator as CHC
from openquake.db import models


# @profile makes memory_profiler report line-by-line memory usage of main().
@profile
def main():
    # Load the job from the full calculation run earlier and run only the export phase.
    job = models.OqJob.objects.get(id=1)
    calc = CHC(job)
    calc.export(exports='xml')


if __name__ == '__main__':
    main()
##########
Note the use of `memory_profiler` (http://pypi.python.org/pypi/memory_profiler). This utility can be combined with cProfile, like so:
$ python -m cProfile -s time export.py
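The same two measurements (time per function and memory) can also be collected from inside Python. The sketch below is an alternative, not the setup used in this report: it swaps `memory_profiler` for the standard-library `tracemalloc` module, which reports the peak memory allocated through Python's allocators rather than line-by-line usage, and it sorts the `cProfile` report by internal time, matching `-s time`. The `profile_call` helper is hypothetical.

```python
import cProfile
import io
import pstats
import tracemalloc
from typing import Callable, Tuple


def profile_call(func: Callable[[], object], top: int = 10) -> Tuple[str, int]:
    """Run `func` once under cProfile and tracemalloc.

    Returns the cProfile report sorted by internal time (the same
    ordering as `python -m cProfile -s time`) and the peak traced
    memory in bytes.
    """
    profiler = cProfile.Profile()
    tracemalloc.start()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("time").print_stats(top)
    return stream.getvalue(), peak


if __name__ == "__main__":
    report, peak = profile_call(lambda: [list(range(1000)) for _ in range(100)])
    print(report)
    print(f"peak traced memory: {peak} bytes")
```

For example, `profile_call(main)` could wrap the `main()` function from `export.py` (with the `@profile` decorator removed) to get both numbers in a single run.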