We need to profile/optimise the risk calculator

Bug #797615 reported by Muharem Hrnjadovic
Affects:     OpenQuake (deprecated)
Status:      Fix Released
Importance:  High
Assigned to: beatpanic
Milestone:   0.4.3

Bug Description

The main reason we did not make risk calculations available in the June-demo-GUI was the lack of performance in the risk engine. We need to analyse/profile the latter and enhance it by removing the performance issues identified.

Changed in openquake:
status: New → Confirmed
importance: Undecided → High
tags: added: risk
tags: added: performance
Changed in openquake:
milestone: none → 0.4.1
Andrea Cerisara (acerisara) wrote :

I did some tests on the Classical PSHA calculator; here is a brief description of what I found:

* (this applies to all calculators) We read the exposure file twice.

The first time we: 1. read the assets inside the region we want to compute and store the related sites; 2. split the set of sites into blocks for later processing (openquake.job.Job._partition, openquake.job.Job._read_sites_from_exposure). The second time we: 1. read the assets inside the region; 2. store the assets in the KVS for later processing (openquake.job.general.RiskJobMixin.store_exposure_assets).

I think we should do all of this in one pass.
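
A single pass could look roughly like the sketch below; every name in it (read_and_store_exposure, region.contains, kvs_client.store_asset) is a hypothetical placeholder, not the actual engine API:

    # Hypothetical single-pass sketch: collect the sites for block
    # partitioning and store the assets in the KVS in the same loop,
    # so the exposure file is parsed only once.
    def read_and_store_exposure(exposure_reader, region, kvs_client, block_size):
        sites = set()
        for site, asset in exposure_reader:      # one parse of the file
            if not region.contains(site):
                continue
            sites.add(site)                      # sites for partitioning
            kvs_client.store_asset(site, asset)  # replaces the second pass
        # split the collected sites into blocks, as Job._partition does today
        sites = list(sites)
        return [sites[i:i + block_size]
                for i in range(0, len(sites), block_size)]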

* (this applies to all calculators) When collecting the sites related to the assets, we save the same site multiple times.

When we collect the sites related to the assets we want to compute (openquake.job.Job._read_sites_from_exposure), if multiple assets are defined on the same site, we save the same site multiple times (lines 281-282). I think we should use a set here, not a list.
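
A minimal illustration of the difference, with string stand-ins for the real Site objects:

    # Three assets, two of them defined on the same site:
    assets_in_region = [('site_a', 'asset_1'), ('site_a', 'asset_2'),
                        ('site_b', 'asset_3')]

    # With a list, 'site_a' is appended once per asset:
    sites = []
    for site, asset in assets_in_region:
        sites.append(site)               # ['site_a', 'site_a', 'site_b']

    # With a set, each site is stored exactly once (this assumes the
    # real Site objects are hashable):
    sites = set()
    for site, asset in assets_in_region:
        sites.add(site)                  # {'site_a', 'site_b'}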

* (this applies to all calculators) We compute the same assets multiple times.

This is something I haven't proved yet with sample configuration files, but: 1. we collect (and split into blocks) all the sites related to the assets, in no particular order; 2. we store the assets in the KVS using the grid point as key, i.e. we translate each site into a grid point using the cell size of the region. This means that all the assets defined on sites falling into the same grid point are stored together (openquake.job.general.RiskJobMixin.store_exposure_assets, line 127).

Now, let's assume we have:

BLOCK1, with site X that translates into grid point K
BLOCK2, with site Y that translates into grid point K

Both sites translate into the same grid point because they are closer together than the cell size of the region.

Since inside the mixins (Classical, Probabilistic and Deterministic) we loop over the grid points defined in the block we are processing (for example openquake.risk.job.classical_psha.ClassicalPSHABasedMixin.compute_risk, line 76), we compute the assets that fall into grid point K twice.
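
One possible fix, sketched below with made-up helper names (partition_by_grid_point, region.grid_point): translate the sites into grid points and deduplicate them before splitting the work into blocks, so each point, and therefore each group of assets, lands in exactly one block.

    # Hypothetical sketch: partition by distinct grid point instead of
    # by site, so the assets stored under a given point are computed by
    # exactly one block.
    def partition_by_grid_point(sites, region, block_size):
        # Sites X and Y that both translate into grid point K collapse
        # into a single entry here.
        points = list({region.grid_point(site) for site in sites})
        return [points[i:i + block_size]
                for i in range(0, len(points), block_size)]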

* Classical PSHA optimization

The main output of this computation is a loss ratio curve. The inputs are:

- hazard curve, related to the site we use to compute the curve
- vulnerability function, related to the asset we use to compute the curve (the asset is defined on a specific site)

The scientific module that computes this curve is openquake.risk.classical_psha_based (the main function is openquake.risk.classical_psha_based.compute_loss_ratio_curve).

I checked the performance of this module without connecting it to all the engine processing (i.e., mixins), and I found that:

- the computation of the LREM (a step to obtain the curve) is complex and thus slow (many seconds), but it can be cached, because it depends only on the vulnerability function (openquake.risk.classical_psha_based._compute_lrem) and many assets share the same function (see the sketch after this list).
- with a cached LREM, the computation is reasonably fast, around 20ms on my machine.
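
The cache could be as simple as the sketch below; the wrapper and the assumed _compute_lrem signature are illustrative, not the engine's actual code:

    # In-memory cache for the LREM, keyed on the vulnerability function:
    # assets sharing the same function reuse the matrix instead of
    # recomputing it for many seconds each time.
    _lrem_cache = {}

    def cached_compute_lrem(vuln_function, steps):
        # Keyed on object identity for simplicity; a content-based key
        # would also work if the functions are hashable.
        key = (id(vuln_function), steps)
        if key not in _lrem_cache:
            _lrem_cache[key] = _compute_lrem(vuln_function, steps)
        return _lrem_cache[key]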

I did some tests by simply storing the computed LREMs in memory and I was a...


Changed in openquake:
milestone: 0.4.1 → none
John Tarter (toh2)
Changed in openquake:
milestone: none → 0.4.2
beatpanic (kpanic) wrote :

A Memoizer was implemented and is now in master; it caches _compute_lrem. We should still implement tests for the Memoizer, though (a sketch of a possible test follows).
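
The sketch below tests a stand-in class with the interface I assume the Memoizer in master has (a decorator caching results per argument tuple); swap in the real import when adding the test to the tree:

    import unittest

    class Memoizer(object):
        """Stand-in with the assumed interface of the Memoizer in
        master: a decorator that caches results per argument tuple."""

        def __init__(self, func):
            self.func = func
            self.cache = {}

        def __call__(self, *args):
            if args not in self.cache:
                self.cache[args] = self.func(*args)
            return self.cache[args]

    class MemoizerTestCase(unittest.TestCase):

        def test_second_call_is_served_from_the_cache(self):
            calls = []

            @Memoizer
            def expensive(value):
                calls.append(value)
                return value * 2

            self.assertEqual(4, expensive(2))
            self.assertEqual(4, expensive(2))  # cached, no second call
            self.assertEqual([2], calls)       # the function ran only once

    if __name__ == '__main__':
        unittest.main()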

I'm also benchmarking and trying to find a possible solution to the slowness of plotting thousands of SVG files; in particular, the loop in __plot_single in CurvePlotter (openquake/output/curve.py) is a bottleneck. It writes slowly to the file system and occupies 100% of one of my CPUs.

beatpanic (kpanic) wrote :

17:38 < beatpanic> vitorsilva, hey vitor, just a little question it makes sense
                   to serialize thousands of little svg files when plotting
                   loss ratio curves in the risk side?
17:39 < vitorsilva> not at all beatpanic ... honestly loss ratio curves are not
                    even used... they are just used to compute loss curves...
17:41 < beatpanic> vitorsilva, I say this because it is a big performance
                   problem, for each curve plotted in svg format we spend a lot
                   of time in the end to just serialize that stuff
17:41 < vitorsilva> feel free to remove that step beatpanic
17:41 < beatpanic> vitorsilva, ok thanks a lot

Based on this discussion, the plotting of SVG files could be removed. It's still not clear whether I can remove both the loss ratio plotting and the loss curve plotting; more updates tomorrow, probably.

beatpanic (kpanic) wrote :

After a brief chat with Helen, removing the SVG plot code is the way to go.

beatpanic (kpanic) wrote :

https://github.com/gem/openquake/pull/343 covers another speed boost in the risk part, this time related to the probabilistic event-based calculator.

beatpanic (kpanic)
Changed in openquake:
assignee: nobody → beatpanic (kpanic)
beatpanic (kpanic)
Changed in openquake:
status: Confirmed → In Progress
John Tarter (toh2) wrote :

I have closed this bug as multiple optimizations have been completed successfully. Future work should be done under a Blueprint, then on a bug-by-bug basis.
Thanks
John

Changed in openquake:
status: In Progress → Fix Committed
milestone: 0.4.2 → 0.4.3
Changed in openquake:
status: Fix Committed → Fix Released