oq-engine: Optimize hazard curve export

Bug #1099763 reported by Lars Butler
Affects: OpenQuake (deprecated)
Status: Fix Released
Importance: High
Assigned to: Lars Butler

Bug Description

See bug #1097676 for details. In short, the method we're using to serialize hazard curve XML is inefficient for large data sets: it's slow and it's a memory hog.

A simple refactoring that writes the XML in a more iterative fashion should fix it.
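As a rough illustration of what "writing in a more iterative fashion" can mean, here is a minimal sketch that streams one curve at a time to the output instead of building the whole document in memory first. It uses only the standard library's `XMLGenerator`. The element names (`hazardCurves`, `hazardCurve`), the `fake_curves` generator, and the `write_curves` helper are hypothetical and are not the actual NRML schema or oq-engine code.

```python
import io
from xml.sax.saxutils import XMLGenerator


def fake_curves(n_sites):
    """Yield (lon, lat, poes) one site at a time (made-up data)."""
    for i in range(n_sites):
        yield (i * 0.1, i * 0.2, [0.9, 0.5, 0.1])


def write_curves(out, curves):
    """Write each curve to `out` as soon as it is produced.

    Only one curve is held in memory at a time, so memory use does
    not grow with the number of sites.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("hazardCurves", {})  # hypothetical root element
    count = 0
    for lon, lat, poes in curves:
        attrs = {"lon": "%.1f" % lon, "lat": "%.1f" % lat}
        gen.startElement("hazardCurve", attrs)
        gen.characters(" ".join("%.1f" % p for p in poes))
        gen.endElement("hazardCurve")
        count += 1
    gen.endElement("hazardCurves")
    gen.endDocument()
    return count


buf = io.StringIO()
n_written = write_curves(buf, fake_curves(3))
xml_text = buf.getvalue()
```

In a real export, `out` would be an open file and `curves` would be a lazy iterator over database rows, so nothing forces the full result set into memory.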

Update:
Apparently the slowness is not the fault of the XML writer. The oq-engine code itself is responsible for both the lack of speed and the high memory consumption. See comments below for details.

Update 2:
All of these profiling tests were run on my MacBook Pro, with the following specs:
- 8 GB RAM
- SSD
- 2 GHz Intel i7 CPU (quad core, hyperthreading)

Update 3:
The code version used as a baseline for this activity: https://github.com/larsbutler/oq-engine/tree/c6ebf83330f1e1d2f76ca85505a864bdc9d9d33e

Changed in openquake:
status: New → In Progress
summary: - nrml: optimize hazard curve serialization
+ oq-engine: optimize hazard curve serialization
description: updated
Changed in openquake:
importance: Undecided → High
assignee: nobody → Lars Butler (lars-butler)
Revision history for this message
Lars Butler (lars-butler) wrote : Re: oq-engine: optimize hazard curve serialization

As defined in bug #1097676, I'm basing my profiling activities for this bug on Test 2 (the variant with 500k sites).

To start profiling, I first ran the full calculation. Then, I wrote a small script which performs only the `export` phase:

##########
# export.py
##########
from memory_profiler import profile

from openquake.calculators.hazard import ClassicalHazardCalculator as CHC
from openquake.db import models

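@profile
def main():
    # Load the job from the full calculation run beforehand (assumed to be id=1).
    job = models.OqJob.objects.get(id=1)
    calc = CHC(job)
    # Run only the export phase, writing the results as XML.
    calc.export(exports='xml')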

if __name__ == '__main__':
    main()
##########

Note the use of `memory_profiler`: http://pypi.python.org/pypi/memory_profiler. It can be used together with cProfile, like so: $ python -m cProfile -s time export.py
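For reference, the same kind of time-sorted report can also be produced from inside a script rather than from the command line. The following is a minimal sketch using only the standard library's `cProfile` and `pstats`. `export_stub` is a hypothetical stand-in for the real export call and does not touch oq-engine.

```python
import cProfile
import io
import pstats


def export_stub(n):
    """Stand-in workload; replace with the real export call."""
    return sum(len(str(i)) for i in range(n))


# Collect timing data only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
result = export_stub(10000)
profiler.disable()

# Sort by internal time, like `-s time` on the command line,
# and capture the report as text instead of printing to stdout.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("time").print_stats(10)
report_text = report.getvalue()
```

Wrapping only the export call this way keeps import and setup time out of the report, which can make the hot spots easier to read.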

Lars Butler (lars-butler) wrote :

Prior to any optimizations, this is the profiling report. Note that this includes a memory profile.

Lars Butler (lars-butler) wrote :

The same test again, with a simple memory optimization: https://github.com/larsbutler/oq-engine/commit/c99ec83c88163b73b854f0c3af14f86f0b1a99f5

Note that the execution time does not differ greatly, but the memory consumption is much more reasonable: down to 228 MB from 1639 MB.

Lars Butler (lars-butler) wrote :

Same test again, with another optimization. Memory consumption is only marginally better, but more importantly, speed increases by a factor of 1.5.

https://github.com/larsbutler/oq-engine/commit/9edd658720959dc65e5afa6d9867becc3db2793f

Lars Butler (lars-butler) wrote :

Same test again, with a third and final optimization: https://github.com/larsbutler/oq-engine/commit/8555c2bdc535f34bf0051ed44d0b4531c55c53c2

Here we see execution time drop to 1/3 of the original time. Memory consumption is stable at about 1/8 of the original level.

description: updated
summary: - oq-engine: optimize hazard curve serialization
+ oq-engine: Optimize hazard curve export
Lars Butler (lars-butler) wrote :

Prior to any optimizations, this is the profiling report from the full calculation (500k sites).

Lars Butler (lars-butler) wrote :

This is the profiling report after applying all 3 optimizations above.

Execution time drops from 36.5 minutes to 31.3 minutes, a reduction of about 14%. It's a small but obvious improvement.

Revision history for this message
Lars Butler (lars-butler) wrote :
Changed in openquake:
milestone: none → 0.9.1
description: updated
Changed in openquake:
status: In Progress → Fix Committed
tags: added: hazard optimization
Changed in openquake:
status: Fix Committed → Fix Released