We need to profile/optimise the risk calculator
Bug #797615 reported by Muharem Hrnjadovic
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenQuake (deprecated) | Fix Released | High | beatpanic |
Bug Description
The main reason we did not make risk calculations available in the June-demo-GUI was the lack of performance in the risk engine. We need to analyse/profile the latter and enhance it by removing the performance issues identified.
Changed in openquake:
status: New → Confirmed
importance: Undecided → High
tags: added: risk
tags: added: performance
Changed in openquake:
milestone: none → 0.4.1
Changed in openquake:
milestone: 0.4.1 → none
Changed in openquake:
milestone: none → 0.4.2
Changed in openquake:
assignee: nobody → beatpanic (kpanic)
Changed in openquake:
status: Confirmed → In Progress
Changed in openquake:
status: Fix Committed → Fix Released
I did some tests on the Classical PSHA calculator; here is a brief description of what I found:
* (this applies to all calculators) We read the exposure file twice.
The first time we: 1. read the assets inside the region we want to compute and store the related sites, 2. split the set of sites into blocks for later processing (openquake.job.Job._partition, openquake.job.Job._read_sites_from_exposure). The second time we: 1. read the assets inside the region, 2. store the assets in the KVS for later processing (openquake.job.general.RiskJobMixin.store_exposure_assets).
I think we should do all of this in one pass, as sketched below.
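As a rough illustration only (not the actual OpenQuake API: the `rows` iterable, `in_region` test and `store_asset` callback below are placeholders for the NRML parser, the region check and the KVS write), a single-pass read could look like this:

```python
from typing import Callable, Dict, Iterable, List, Tuple

Site = Tuple[float, float]  # (longitude, latitude)

def load_exposure_once(rows: Iterable[Tuple[Site, Dict]],
                       in_region: Callable[[Site], bool],
                       store_asset: Callable[[Site, Dict], None],
                       block_size: int) -> List[List[Site]]:
    """Read the exposure once: store assets and collect sites in a single loop."""
    seen, sites = set(), []
    for site, asset in rows:
        if not in_region(site):
            continue
        store_asset(site, asset)      # previously done in a second pass over the file
        if site not in seen:          # keep each site only once
            seen.add(site)
            sites.append(site)
    # split the collected sites into blocks for later processing
    return [sites[i:i + block_size] for i in range(0, len(sites), block_size)]
```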
* (this applies to all calculators) When collecting the sites related to the assets, we save the same site multiple times.
When we collect the sites related to the assets we want to compute (openquake.job.Job._read_sites_from_exposure), if multiple assets are defined on the same site, we save that site multiple times (lines 281-282). I think we should use a set here, not a list; see the sketch below.
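For example (a hypothetical snippet, with `assets` and its field names standing in for whatever _read_sites_from_exposure actually iterates over), deduplicating with a set while preserving order:

```python
def sites_from_assets(assets):
    """Return the distinct sites the assets are defined on, preserving order."""
    seen, sites = set(), []
    for asset in assets:
        site = (asset["lon"], asset["lat"])   # hypothetical asset layout
        if site not in seen:   # a plain list would store this site once per asset
            seen.add(site)
            sites.append(site)
    return sites

# three assets on two distinct sites yield only two sites:
assets = [{"lon": 9.1, "lat": 45.4}, {"lon": 9.1, "lat": 45.4}, {"lon": 9.2, "lat": 45.5}]
assert sites_from_assets(assets) == [(9.1, 45.4), (9.2, 45.5)]
```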
* (this applies to all calculators) We compute the same assets multiple times.
This is something I haven't proved yet with sample configuration files, but: 1. we collect (and split into blocks) all the sites related to the assets without any particular order, 2. we store the assets in the KVS using the grid point as key, i.e. we translate the site into a grid point using the cell size of the region. This means that all the assets defined on sites falling into the same grid point are stored together (openquake.job.general.RiskJobMixin.store_exposure_assets, line 127).
Now, let's assume we have:
BLOCK1, with site X that translates into grid point K
BLOCK2, with site Y that translates into grid point K
The sites both translate into the same grid point because they are closer than the cell size of the region.
Since inside the mixins (Classical, Probabilistic and Deterministic) we loop over the grid points defined in the block we are processing (for example openquake.risk.job.classical_psha.ClassicalPSHABasedMixin.compute_risk, line 76), we compute the assets that fall into grid point K twice.
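A small, made-up example of how this can happen (the cell size, coordinates and `to_grid_point` helper below are illustrative, not the engine's actual code):

```python
import math

CELL_SIZE = 0.1  # degrees; stands in for the region cell size

def to_grid_point(site, cell_size=CELL_SIZE):
    """Snap a (lon, lat) site to a grid point, as the KVS key does."""
    lon, lat = site
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))

site_x = (9.12, 45.41)   # processed as part of BLOCK1
site_y = (9.17, 45.46)   # processed as part of BLOCK2

# Both sites are closer together than the cell size, so they share grid point K;
# each block containing one of them re-computes the assets stored under K.
assert to_grid_point(site_x) == to_grid_point(site_y)
```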
* Classical PSHA optimization
The main output of this computation is a loss ratio curve. The inputs are:
- hazard curve, related to the site we use to compute the curve
- vulnerability function, related to the asset we use to compute the curve (that is defined on a specific site)
The scientific module that is able to compute this curve is openquake.risk.classical_psha_based (the main function is openquake.risk.classical_psha_based.compute_loss_ratio_curve).
I checked the performance of this module without connecting it to all the engine processing (i.e., mixins), and I found that:
- the computation of the LREM (a step to obtain the curve) is complex and thus slow, but can be cached, because it depends just on the vulnerability function (openquake.risk.classical_psha_based._compute_lrem) and many assets share the same function. By slow I mean many seconds.
- with a cached LREM, the computation is reasonably fast, around 20ms on my machine.
I did some tests by simply storing the computed LREMs in memory and I was a...
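For reference, a minimal sketch of that kind of cache (hypothetical code: `compute_lrem` stands in for openquake.risk.classical_psha_based._compute_lrem, and the vulnerability function is assumed to be hashable, e.g. a tuple of its IML and loss-ratio values):

```python
_LREM_CACHE = {}

def cached_lrem(vuln_function, compute_lrem):
    """Return the LREM for a vulnerability function, computing it at most once."""
    if vuln_function not in _LREM_CACHE:   # many assets share the same function
        _LREM_CACHE[vuln_function] = compute_lrem(vuln_function)   # the slow step
    return _LREM_CACHE[vuln_function]
```

functools.lru_cache would give the same effect, again provided the vulnerability function is hashable.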