Assess Hazard Computation Time (Classical PSHA)
Bug Description
Devise an algorithm for estimating computation time based on Classical PSHA Hazard calculation parameters.

Parameters

The following parameters affect computation time (some more than others):
SITES or (REGION_VERTEX and REGION_
INTENSITY_
INCLUDE_
TREAT_AREA_
AREA_SOURCE_
INCLUDE_
TREAT_GRID_
INCLUDE_
FAULT_RUPTURE_
FAULT_SURFACE_
RUPTURE_
INCLUDE_
SUBDUCTION_
SUBDUCTION_
SUBDUCTION_
NUMBER_
QUANTILE_LEVELS
COMPUTE_
POES
For more details about _how_ each of these parameters affects the computation time, have a look at this table:
https://

Analysis Cases

There are 3 cases which can be tested to analyze computation time.
Worst Case:
 Compute on a list of sites
 The list of sites is equal to the full set of locations defined in the source model.
 In other words, we compute hazard for sites which are right on top of each source (1 site per source).
Reasonable Case 1:
 Compute on a rectangular region just large enough to contain all of the source sites in a source model.
 TODO: What should the grid spacing be?
Reasonable Case 2:
 Given the same region constraints defined in 'Reasonable case 1', pick a random list of sites.
 The number of sites chosen shall be equal to the number of sources (and thus equal to the number of sites in 'Worst Case').
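The site lists for the two reasonable cases could be generated along these lines. This is only a sketch: it assumes each source can be reduced to a single (lon, lat) location, it ignores regions that cross the antimeridian, and the helper names (bounding_box, grid_sites, random_sites) are hypothetical, not part of OpenQuake. The grid spacing is left as a parameter because it is still an open question.

```python
import math
import random


def bounding_box(source_sites):
    """Smallest lon/lat rectangle containing every (lon, lat) source location."""
    lons = [lon for lon, _ in source_sites]
    lats = [lat for _, lat in source_sites]
    return min(lons), min(lats), max(lons), max(lats)


def grid_sites(source_sites, spacing):
    """Reasonable Case 1: regular grid over the bounding box.

    `spacing` is in degrees (an assumption). If the box extent is not a
    multiple of `spacing`, the last row/column stops short of the far edge.
    """
    min_lon, min_lat, max_lon, max_lat = bounding_box(source_sites)
    n_lon = int(math.floor((max_lon - min_lon) / spacing + 1e-9)) + 1
    n_lat = int(math.floor((max_lat - min_lat) / spacing + 1e-9)) + 1
    return [
        (min_lon + i * spacing, min_lat + j * spacing)
        for j in range(n_lat)
        for i in range(n_lon)
    ]


def random_sites(source_sites, seed=None):
    """Reasonable Case 2: one random site inside the box per source."""
    rng = random.Random(seed)
    min_lon, min_lat, max_lon, max_lat = bounding_box(source_sites)
    return [
        (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
        for _ in source_sites
    ]
```
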
These cases are simplified to assume that all sources are taken into account. It should be noted that there is one parameter (MAXIMUM_DISTANCE) which determines whether or not a source is taken into account. For example:
MAXIMUM_DISTANCE = 200.0  # kilometers
for site in sites:
    for source in sources:
        if distance between site and source > MAXIMUM_DISTANCE:
            ignore the source
I suspect that this is meant to keep computations within realistic bounds. For instance, if you were to compute the seismic hazard of the entire USA, it would not make any sense for sites in California to consider sources in New York.
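For estimation purposes, the effect of MAXIMUM_DISTANCE can be approximated by counting how many site/source pairs survive the filter. The sketch below assumes that every source is represented by a single (lon, lat) point and that distance is the great-circle (haversine) distance; the engine's actual distance measure for area and fault sources may well differ. The names distance_km and contributing_pairs are hypothetical.

```python
import math

MAXIMUM_DISTANCE = 200.0  # kilometers
EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def distance_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # min() guards against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def contributing_pairs(sites, sources, max_distance=MAXIMUM_DISTANCE):
    """Count the site/source pairs that are not ignored by the distance filter."""
    return sum(
        1
        for site in sites
        for source in sources
        if distance_km(site, source) <= max_distance
    )
```

Comparing contributing_pairs(sites, sources) with len(sites) * len(sources) gives a rough idea of how far a real configuration is from the simplified "all sources count" assumption.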

Test data required

 Source model containing at least 1 of each type of source (point, area, simple fault, complex fault)
 Source model logic tree
 1 branch should be sufficient; what really matters is the total number of sources
 GMPE logic tree
 TODO: What role do GMPEs play in Classical hazard calculations?
 Configuration files for each case (Worst Case, Reasonable Case 1, Reasonable Case 2)

Hazard Curves

Hazard curves are the primary output of a hazard calculation.
There are 3 types of curves:
 Hazard curves
 Mean curves
 Quantile curves
One of the measures we may want to report (before the calculation starts) is the estimated number of hazard curves which will be produced. (TODO: Figure out how to estimate hazard curve calculation _time_ based on the estimated curve _number_.) For Classical PSHA, the total number of curves can be estimated as follows:
total_curves = num_of_sites
* num_of_
* (num_of_
* (2 if COMPUTE_

Hazard Maps

Hazard maps are less computationally expensive to produce than hazard curves.
Maps are derived from curve data by interpolating an IML value from each curve given a fixed PoE value. When compared to hazard curve computation, the cost of interpolation is less significant, but we still need to take this into account (particularly for calculations over a large set of sites).
The number of hazard maps produced by OpenQuake is equal to the number of POES values defined (if any).
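To give a feel for the per-site cost of map generation, here is a minimal sketch of the interpolation step. It assumes a curve is stored as two parallel lists (IMLs in ascending order, PoEs in non-increasing order) and uses plain linear interpolation; the engine may interpolate differently (for example in log space). The function names iml_at_poe and hazard_maps are hypothetical.

```python
def iml_at_poe(imls, poes, target_poe):
    """Linearly interpolate the IML at which a hazard curve reaches target_poe.

    imls: intensity measure levels, ascending.
    poes: probabilities of exceedance for those levels, non-increasing.
    """
    if not poes[-1] <= target_poe <= poes[0]:
        raise ValueError("target PoE lies outside the range of the curve")
    for i in range(len(imls) - 1):
        upper, lower = poes[i], poes[i + 1]
        if lower <= target_poe <= upper:
            if upper == lower:  # flat segment: any IML on it matches
                return imls[i]
            fraction = (upper - target_poe) / (upper - lower)
            return imls[i] + fraction * (imls[i + 1] - imls[i])
    raise ValueError("curve PoEs are not monotonically non-increasing")


def hazard_maps(curves_by_site, imls, poes_of_interest):
    """Build one map (site -> IML) per requested PoE value."""
    return {
        poe: {
            site: iml_at_poe(imls, curve, poe)
            for site, curve in curves_by_site.items()
        }
        for poe in poes_of_interest
    }
```

The work is one short scan per site per POES value, so the cost grows linearly with both the number of sites and the number of maps requested.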

Breakdown of calculation time

Initialization time:
 Engine startup
 Processing job input
 Loading the KVS cache
Calculation time:
 Computation of hazard curves and maps
Results creation time:
 Serializing map and curve data to the specified output destination (DB or XML)
Total time = initialization time + calculation time + results creation time
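To calibrate any estimate against real runs, each phase can be timed separately. The following sketch shows one way to collect the three figures; the phase callables passed to run_job are placeholders for the real engine steps, and all names here are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(phase, timings):
    """Add the wall-clock duration of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[phase] = timings.get(phase, 0.0) + elapsed


def run_job(initialize, calculate, write_results):
    """Run the three phases in order and return their durations in seconds."""
    timings = {}
    with timed("initialization", timings):
        initialize()      # engine startup, job input, KVS cache
    with timed("calculation", timings):
        calculate()       # hazard curves and maps
    with timed("results", timings):
        write_results()   # serialization to DB or XML
    return timings


def total_time(timings):
    """Total time = initialization + calculation + results creation."""
    return sum(timings.values())
```
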
