Assess Hazard Computation Time (Classical PSHA)

Bug #832952 reported by Lars Butler
This bug affects 1 person
Affects: OpenQuake (deprecated)
Status: Won't Fix
Importance: High
Assigned to: Unassigned

Bug Description

Devise an algorithm for estimating computation time based on Classical PSHA Hazard calculation parameters.

----------
Parameters
----------

The following parameters affect computation time (some more than others):

SITES or (REGION_VERTEX and REGION_GRID_SPACING)
INTENSITY_MEASURE_LEVELS
INCLUDE_AREA_SOURCES
TREAT_AREA_SOURCE_AS
AREA_SOURCE_DISCRETIZATION
INCLUDE_GRID_SOURCES
TREAT_GRID_SOURCE_AS
INCLUDE_FAULT_SOURCE
FAULT_RUPTURE_OFFSET
FAULT_SURFACE_DISCRETIZATION
RUPTURE_FLOATING_TYPE
INCLUDE_SUBDUCTION_FAULT_SOURCE
SUBDUCTION_RUPTURE_OFFSET
SUBDUCTION_SURFACE_DISCRETIZATION
SUBDUCTION_RUPTURE_FLOATING_TYPE
NUMBER_OF_LOGIC_TREE_SAMPLES
QUANTILE_LEVELS
COMPUTE_MEAN_HAZARD_CURVE
POES

For more details about _how_ each of these parameters affects the computation time, have a look at this table:
https://docs.google.com/spreadsheet/ccc?key=0AgmeiGIi49FLdEVaMEZ2S1VUOWUwanMzQW0zWDNkbFE&hl=en_US#gid=0

--------------
Analysis Cases
--------------

There are 3 cases which can be tested to analyze computation time.

Worst Case:
  - Compute on a list of sites
  - The list of sites is equal to all of the locations defined in the source model.
    - In other words, we compute hazard for sites which are right on top of each source (1 site per source).

Reasonable Case 1:
  - Compute on a rectangular region just large enough to contain all of the source sites in a source model.
  - TODO: What should the grid spacing be?

Reasonable Case 2:
  - Given the same region constraints defined in 'Reasonable case 1', pick a random list of sites.
    - The number of sites chosen shall be equal to the number of sources (thus, equal to the number of sites in 'Worst case').
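
As a rough sketch, the site lists for the three cases could be generated as follows. This is only an illustration: it assumes each source can be reduced to a single (lon, lat) location, which glosses over area and fault sources, it ignores regions crossing the antimeridian, and the function names are hypothetical. The grid spacing is left as a parameter because its value is still an open TODO.

```python
import random

def bounding_box(source_locations):
    """Smallest lon/lat rectangle containing every source location."""
    lons = [lon for lon, _ in source_locations]
    lats = [lat for _, lat in source_locations]
    return min(lons), min(lats), max(lons), max(lats)

def worst_case_sites(source_locations):
    """Worst Case: one site directly on top of each source."""
    return list(source_locations)

def grid_sites(source_locations, spacing):
    """Reasonable Case 1: regular grid over the bounding box.

    `spacing` is in decimal degrees (an assumption; the value is a TODO).
    """
    min_lon, min_lat, max_lon, max_lat = bounding_box(source_locations)
    n_lon = int((max_lon - min_lon) / spacing) + 1
    n_lat = int((max_lat - min_lat) / spacing) + 1
    return [(min_lon + i * spacing, min_lat + j * spacing)
            for j in range(n_lat) for i in range(n_lon)]

def random_sites(source_locations, seed=None):
    """Reasonable Case 2: as many random sites as there are sources."""
    min_lon, min_lat, max_lon, max_lat = bounding_box(source_locations)
    rng = random.Random(seed)
    return [(rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
            for _ in source_locations]
```

Passing a fixed `seed` to `random_sites` keeps Reasonable Case 2 repeatable between timing runs, which matters if the timings are to be compared.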

These cases assume, for simplicity, that every source contributes to every site. In practice, one parameter (MAXIMUM_DISTANCE) determines whether or not a source is taken into account for a given site. For example:

MAXIMUM_DISTANCE = 200.0 # kilometers
for site in sites:
    for source in sources:
        if distance between site and source is > MAXIMUM_DISTANCE:
            ignore the source

I suspect that this is meant to keep computations within realistic bounds. For instance, if you were to compute the seismic hazard of the entire USA, it would not make any sense for sites in California to consider sources in New York.
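
A runnable version of the filter above might look like this. It is a sketch under stated assumptions: sites and sources are both plain (lon, lat) points, and the distance is a great-circle (haversine) distance on a spherical Earth. The distance measure the engine actually uses is not specified here, and the helper names are hypothetical.

```python
import math

MAXIMUM_DISTANCE = 200.0  # kilometers
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def great_circle_km(a, b):
    """Haversine distance between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def sources_per_site(sites, sources, max_distance=MAXIMUM_DISTANCE):
    """Map each site to the sources that are NOT ignored for it."""
    return {site: [src for src in sources
                   if great_circle_km(site, src) <= max_distance]
            for site in sites}
```

The number of (site, source) pairs that survive this filter is probably a better proxy for the work to be done than the raw product of site count and source count.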

------------------
Test data required
------------------

- Source model containing at least 1 of each type of source (point, area, simple fault, complex fault)
- Source model logic tree
  - 1 branch should be sufficient; what really matters is the total number of sources
- GMPE logic tree
  - TODO: What role do GMPEs play in Classical hazard calculations?
- Configuration files for each case (Worst Case, Reasonable Case 1, Reasonable Case 2)

-------------
Hazard Curves
-------------

Hazard curves are the primary output of a hazard calculation.

There are 3 types of curves:
  - Hazard curves
  - Mean curves
  - Quantile curves

One of the measures we may want to report (pre-calculation) is the estimated number of hazard curves which will be produced. (TODO: Figure out how to estimate hazard curve calculation _time_ based on the estimated curve _number_.) For Classical PSHA, the total number of curves can be estimated as follows:

total_curves = (num_of_sites
    * num_of_logic_tree_samples
    * (num_of_quantile_levels or 1)
    * (2 if COMPUTE_MEAN_HAZARD_CURVE else 1))
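
Wrapped in a function, the estimate can be tried on concrete numbers. This simply restates the formula above; the function name and the example values are made up for illustration.

```python
def estimate_total_curves(num_of_sites, num_of_logic_tree_samples,
                          num_of_quantile_levels, compute_mean_hazard_curve):
    """Estimated curve count, following the formula in this report."""
    return (num_of_sites
            * num_of_logic_tree_samples
            * (num_of_quantile_levels or 1)      # no quantiles: factor of 1
            * (2 if compute_mean_hazard_curve else 1))

# Example: 1000 sites, 10 logic tree samples, 3 quantile levels, mean curves on.
print(estimate_total_curves(1000, 10, 3, True))   # 60000
# Same job with no quantile levels and no mean curves.
print(estimate_total_curves(1000, 10, 0, False))  # 10000
```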

-----------
Hazard Maps
-----------

Hazard maps are less computationally expensive to produce than hazard curves.

Maps are derived from curve data by interpolating an IML value from each curve given a fixed PoE value. When compared to hazard curve computation, the cost of interpolation is less significant, but we still need to take this into account (particularly for calculations over a large set of sites).

The number of hazard maps produced by OpenQuake is equal to the number of POES values defined (if any).
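
To give a feel for the per-site cost, here is a minimal sketch of pulling one map value out of one curve. It assumes plain linear interpolation between curve points, with IMLs ascending and PoEs non-increasing; the engine may well interpolate differently (for example in log space), and the function names are hypothetical.

```python
def interpolate_iml(imls, poes, target_poe):
    """IML at which a hazard curve reaches target_poe (linear interpolation).

    Assumes imls are ascending and poes are non-increasing.
    """
    if target_poe > poes[0] or target_poe < poes[-1]:
        raise ValueError("target PoE lies outside the curve")
    for i in range(len(imls) - 1):
        poe_a, poe_b = poes[i], poes[i + 1]
        if poe_b <= target_poe <= poe_a:
            if poe_a == poe_b:  # flat segment, nothing to interpolate
                return imls[i]
            frac = (poe_a - target_poe) / (poe_a - poe_b)
            return imls[i] + frac * (imls[i + 1] - imls[i])
    raise ValueError("PoE values are not non-increasing")

def hazard_map(curves_by_site, imls, poe):
    """One map for one PoE: a single interpolated IML per site."""
    return {site: interpolate_iml(imls, poes, poe)
            for site, poes in curves_by_site.items()}
```

Producing all maps means calling `hazard_map` once per POES value, so the interpolation cost grows with the number of sites times the number of POES values.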

-----------------------------
Breakdown of calculation time
-----------------------------

Initialization time:
  - Engine startup
  - Processing job input
  - Loading the KVS cache

Calculation time:
  - Computation of hazard curves and maps

Results creation time:
  - Serializing map and curve data to the specified output destination (DB or XML)

Total time = initialization time + calculation time + results creation time
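
Before trying to predict these three terms, it would help to measure them on real jobs. A small helper like the following could do that; it is only a sketch, the `timed` helper and phase names are hypothetical, and the commented-out calls stand in for whatever the engine actually runs in each phase.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(phase, timings):
    """Add the wall-clock seconds spent in the block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[phase] = timings.get(phase, 0.0) + elapsed

timings = {}
with timed("initialization", timings):
    pass  # engine startup, job input processing, KVS cache loading
with timed("calculation", timings):
    pass  # computation of hazard curves and maps
with timed("results creation", timings):
    pass  # serialization to DB or XML

total_time = sum(timings.values())
```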

Changed in openquake:
milestone: none → 0.4.3
assignee: nobody → Lars Butler (lars-butler)
status: New → Confirmed
status: Confirmed → In Progress
description: updated
Changed in openquake:
importance: Undecided → Medium
Changed in openquake:
assignee: Lars Butler (lars-butler) → nobody
status: In Progress → Confirmed
John Tarter (toh2)
Changed in openquake:
milestone: 0.4.3 → 0.4.4
John Tarter (toh2) wrote :

Marco,
Can you take a look at this and maybe get it started?
This is something that Damiano has been wanting for quite some time
Thanks
John

Changed in openquake:
assignee: nobody → beatpanic (kpanic)
John Tarter (toh2)
Changed in openquake:
milestone: 0.4.4 → 0.4.5
John Tarter (toh2)
Changed in openquake:
importance: Medium → High
beatpanic (kpanic)
Changed in openquake:
status: Confirmed → In Progress
beatpanic (kpanic)
description: updated
John Tarter (toh2)
Changed in openquake:
milestone: 0.4.5 → 0.4.6
John Tarter (toh2)
Changed in openquake:
milestone: 0.4.6 → 0.5.0
John Tarter (toh2) wrote :

This bug was added to the GEM Hazard engine blueprint to record the requirement to be able to capture job timing.

Changed in openquake:
milestone: 0.5.0 → 0.5.1
John Tarter (toh2)
Changed in openquake:
status: In Progress → New
assignee: beatpanic (kpanic) → nobody
John Tarter (toh2)
Changed in openquake:
milestone: 0.5.1 → 0.6.0
John Tarter (toh2)
Changed in openquake:
milestone: 0.6.0 → 0.7.0
Lars Butler (lars-butler) wrote :

Since we've added detailed progress reporting to the OQ Engine, I don't think this is terribly important anymore. Reporting the progress is enough; trying to predict computation time ahead of time is very hard to do and doesn't really give us that much of a benefit.

I'm going to set this to `Won't fix`.

Changed in openquake:
status: New → Won't Fix