OpenQuake (deprecated)

Assess Hazard Computation Time (Classical PSHA)

Bug #832952 reported by Lars Butler on 2011-08-24

This bug affects 1 person

Affects                 Status     Importance  Assigned to  Milestone
OpenQuake (deprecated)  Won't Fix  High        Unassigned   OpenQuake (deprecated) 0.7.0

Bug Description

Devise an algorithm for estimating computation time based on Classical PSHA Hazard calculation parameters.

----------
Parameters
----------

The following parameters affect computation time (some more than others):

SITES or (REGION_VERTEX and REGION_GRID_SPACING)
INTENSITY_MEASURE_LEVELS
INCLUDE_AREA_SOURCES
TREAT_AREA_SOURCE_AS
AREA_SOURCE_DISCRETIZATION
INCLUDE_GRID_SOURCES
TREAT_GRID_SOURCE_AS
INCLUDE_FAULT_SOURCE
FAULT_RUPTURE_OFFSET
FAULT_SURFACE_DISCRETIZATION
RUPTURE_FLOATING_TYPE
INCLUDE_SUBDUCTION_FAULT_SOURCE
SUBDUCTION_RUPTURE_OFFSET
SUBDUCTION_SURFACE_DISCRETIZATION
SUBDUCTION_RUPTURE_FLOATING_TYPE
NUMBER_OF_LOGIC_TREE_SAMPLES
QUANTILE_LEVELS
COMPUTE_MEAN_HAZARD_CURVE
POES

For more details about _how_ each of these parameters affects the computation time, have a look at this table:
https://docs.google.com/spreadsheet/ccc?key=0AgmeiGIi49FLdEVaMEZ2S1VUOWUwanMzQW0zWDNkbFE&hl=en_US#gid=0

--------------
Analysis Cases
--------------

There are 3 cases which can be tested to analyze computation time.

Worst Case:
  - Compute on a list of sites
  - The list of sites is equal to all of the locations defined in the source model.
    - In other words, we compute hazard for sites which are right on top of each source (1 site per source).

Reasonable Case 1:
- Compute on a rectangular region just large enough to contain all of the source sites in a source model.
- TODO: What should the grid spacing be?

Reasonable Case 2:
- Given the same region constraints defined in 'Reasonable case 1', pick a random list of sites.
- The number of sites chosen shall be equal to the number of sources (thus, equal to the number of sites in 'Worst case').
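
As a rough illustration of how the site lists for 'Worst case' and 'Reasonable case 2' could be built, here is a minimal sketch. It assumes each source can be reduced to a single (longitude, latitude) point and that the region does not cross the antimeridian; the function names and the sample coordinates are hypothetical and not part of OpenQuake.

```python
import random


def bounding_box(source_locations):
    """Return (min_lon, min_lat, max_lon, max_lat) enclosing all source locations."""
    lons = [lon for lon, _ in source_locations]
    lats = [lat for _, lat in source_locations]
    return min(lons), min(lats), max(lons), max(lats)


def random_sites(source_locations, seed=None):
    """Pick as many random sites inside the bounding box as there are sources."""
    rng = random.Random(seed)
    min_lon, min_lat, max_lon, max_lat = bounding_box(source_locations)
    return [
        (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
        for _ in source_locations
    ]


# Hypothetical source locations as (longitude, latitude) pairs.
sources = [(-122.0, 37.0), (-121.5, 38.2), (-120.3, 36.4)]

# Worst case: one site directly on top of each source.
worst_case_sites = list(sources)

# Reasonable case 2: the same number of sites, placed at random in the region.
reasonable_case_2_sites = random_sites(sources, seed=42)
```

Passing a fixed seed keeps the random site list reproducible between timing runs, which makes the measurements easier to compare.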

These cases are simplified to assume that all sources are taken into account. It should be noted that there is one parameter (MAXIMUM_DISTANCE) which determines whether or not a source is taken into account. For example:

MAXIMUM_DISTANCE = 200.0  # kilometers
for site in sites:
    for source in sources:
        # distance() stands in for whatever site-to-source distance measure is used
        if distance(site, source) > MAXIMUM_DISTANCE:
            continue  # ignore the source for this site

I suspect that this is meant to keep computations within realistic bounds. For instance, if you were to compute the seismic hazard of the entire USA, it would not make any sense for sites in California to consider sources in New York.
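
To make the filtering concrete, the following is a minimal runnable sketch. It assumes sites and sources are plain (longitude, latitude) points and uses the haversine great-circle distance; how the engine actually measures distance to extended sources (areas, faults) is not stated here, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAXIMUM_DISTANCE = 200.0  # kilometers


def great_circle_km(a, b):
    """Haversine distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sources_within_range(site, sources, max_km=MAXIMUM_DISTANCE):
    """Keep only the sources that are no farther than max_km from the site."""
    return [s for s in sources if great_circle_km(site, s) <= max_km]


# One degree of longitude on the equator is roughly 111 km, five degrees roughly 556 km.
nearby = sources_within_range((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0)])
```

The number of (site, source) pairs that survive this filter is likely a better predictor of computation time than the raw product of sites and sources.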

------------------
Test data required
------------------

- Source model containing at least 1 of each type of source (point, area, simple fault, complex fault)
- Source model logic tree
  - 1 branch should be sufficient; what really matters is the total number of sources
- GMPE logic tree
  - TODO: What role do GMPEs play in Classical hazard calculations?
- Configuration files for each case (Worst Case, Reasonable Case 1, Reasonable Case 2)

-------------
Hazard Curves
-------------

Hazard curves are the primary output of a hazard calculation.

There are 3 types of curves:
  - Hazard curves
  - Mean curves
  - Quantile curves

One of the measures we may want to report (pre-calculation) is the estimated number of hazard curves which will be produced. (TODO: Figure out how to estimate hazard curve calculation _time_ based on the estimated curve _number_.) For Classical PSHA, the total number of curves can be estimated as follows:

total_curves = num_of_sites
    * num_of_logic_tree_samples
    * (num_of_quantile_levels or 1)
    * (2 if COMPUTE_MEAN_HAZARD_CURVE==true else 1)
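
The estimate can be written as a small function. This sketch reproduces the formula exactly as stated in this description (all factors multiplied together); whether the engine really produces that many curves is not confirmed here, and the function and argument names are hypothetical.

```python
def estimate_total_curves(num_sites, num_logic_tree_samples,
                          num_quantile_levels, compute_mean):
    """Estimate the number of hazard curves using the formula from this report."""
    # "num_of_quantile_levels or 1": no quantile levels counts as a factor of 1.
    quantile_factor = num_quantile_levels or 1
    # COMPUTE_MEAN_HAZARD_CURVE doubles the estimate when enabled.
    mean_factor = 2 if compute_mean else 1
    return num_sites * num_logic_tree_samples * quantile_factor * mean_factor


# 1000 sites, 10 logic tree samples, 3 quantile levels, mean curves enabled.
with_mean_and_quantiles = estimate_total_curves(1000, 10, 3, True)

# Same job with no quantile levels and no mean curves.
curves_only = estimate_total_curves(1000, 10, 0, False)
```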

-----------
Hazard Maps
-----------

Hazard maps are less computationally expensive to produce than hazard curves.

Maps are derived from curve data by interpolating an IML value from each curve given a fixed PoE value. When compared to hazard curve computation, the cost of interpolation is less significant, but we still need to take this into account (particularly for calculations over a large set of sites).

The number of hazard maps produced by OpenQuake is equal to the number of POES values defined (if any).
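
The interpolation step can be sketched as follows. This is a minimal illustration assuming a curve is given as ascending IML values with matching non-increasing PoE values, and assuming plain linear interpolation; the scheme the engine actually uses (for example, interpolation in log space) is not stated here, and `iml_at_poe` is a hypothetical name.

```python
def iml_at_poe(imls, poes, target_poe):
    """Linearly interpolate the IML at which a hazard curve reaches target_poe.

    imls must be ascending and poes non-increasing, with equal lengths.
    Returns None when target_poe lies outside the range covered by the curve.
    """
    if len(imls) != len(poes) or len(imls) < 2:
        raise ValueError("imls and poes must have the same length (at least 2)")
    if target_poe > poes[0] or target_poe < poes[-1]:
        return None
    for i in range(len(imls) - 1):
        upper, lower = poes[i], poes[i + 1]
        if lower <= target_poe <= upper:
            if upper == lower:
                return imls[i]  # flat segment: take its first IML
            fraction = (upper - target_poe) / (upper - lower)
            return imls[i] + fraction * (imls[i + 1] - imls[i])
    return None


# One hypothetical curve and two PoE values; one map value per PoE per site.
curve_imls = [0.1, 0.2, 0.4]
curve_poes = [0.5, 0.2, 0.05]
map_values = [iml_at_poe(curve_imls, curve_poes, poe) for poe in (0.5, 0.1)]
```

Each map costs one such lookup per site, so the map-building cost in this sketch scales with the number of sites times the number of POES values.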

-----------------------------
Breakdown of calculation time
-----------------------------

Initialization time:
  - Engine startup
  - Processing job input
  - Loading the KVS cache

Calculation time:
- Computation of hazard curves and maps

Results creation time:
- Serializing map and curve data to the specified output destination (DB or XML)

Total time = initialization time + calculation time + results creation time
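
To measure the three phases separately, something like the following could be used. This is only a sketch: the phase bodies are placeholders, and `timed` and `phase_seconds` are hypothetical names rather than anything in the OpenQuake code base.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase.
phase_seconds = {}


@contextmanager
def timed(phase):
    """Add the wall-clock time spent inside the with-block to phase_seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        phase_seconds[phase] = phase_seconds.get(phase, 0.0) + elapsed


with timed("initialization"):
    pass  # placeholder: engine startup, job input processing, KVS cache loading

with timed("calculation"):
    pass  # placeholder: hazard curve and map computation

with timed("results creation"):
    pass  # placeholder: serializing curves and maps to DB or XML

# Total time = initialization time + calculation time + results creation time
total_seconds = sum(phase_seconds.values())
```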

Lars Butler (lars-butler) on 2011-08-24

Changed in openquake:
milestone:	none → 0.4.3
assignee:	nobody → Lars Butler (lars-butler)
status:	New → Confirmed
status:	Confirmed → In Progress

Lars Butler (lars-butler) on 2011-08-24

description:	updated

Lars Butler (lars-butler) on 2011-08-30

Changed in openquake:
importance:	Undecided → Medium

Lars Butler (lars-butler) on 2011-09-01

Changed in openquake:
assignee:	Lars Butler (lars-butler) → nobody
status:	In Progress → Confirmed

John Tarter (toh2) on 2011-09-06

Changed in openquake:
milestone:	0.4.3 → 0.4.4

John Tarter (toh2) wrote on 2011-09-22:

Marco,
Can you take a look at this and maybe get it started?
This is something that Damiano has been wanting for quite some time
Thanks
John

Changed in openquake:
assignee:	nobody → beatpanic (kpanic)

John Tarter (toh2) on 2011-10-05

Changed in openquake:
milestone:	0.4.4 → 0.4.5

John Tarter (toh2) on 2011-10-25

Changed in openquake:
importance:	Medium → High

beatpanic (kpanic) on 2011-10-25

Changed in openquake:
status:	Confirmed → In Progress

beatpanic (kpanic) on 2011-10-25

description:	updated

John Tarter (toh2) on 2011-11-01

Changed in openquake:
milestone:	0.4.5 → 0.4.6

John Tarter (toh2) on 2011-12-09

Changed in openquake:
milestone:	0.4.6 → 0.5.0

John Tarter (toh2) wrote on 2012-01-11:

This bug was added to the GME Hazard engine blueprint to record the requirement that job timing can be captured.

Changed in openquake:
milestone:	0.5.0 → 0.5.1

John Tarter (toh2) on 2012-01-11

Changed in openquake:
status:	In Progress → New
assignee:	beatpanic (kpanic) → nobody

John Tarter (toh2) on 2012-01-20

Changed in openquake:
milestone:	0.5.1 → 0.6.0

John Tarter (toh2) on 2012-03-05

Changed in openquake:
milestone:	0.6.0 → 0.7.0

Lars Butler (lars-butler) wrote on 2013-03-11:

Since we've added detailed progress reporting to the OQ Engine, I don't think this is terribly important anymore. Reporting the progress is enough; trying to predict computation time ahead of time is very, very hard to do and doesn't really give us that much of a benefit.

I'm going to set this to `Won't fix`.

Changed in openquake:
status:	New → Won't Fix
