UHS: Create empty HDF5 result file(s) (async task)

Bug #897149 reported by Lars Butler
This bug affects 1 person
Affects: OpenQuake (deprecated)
Status: Fix Released
Importance: High
Assigned to: Lars Butler

Bug Description

The results of the UHS calculator shall be stored in matrix form inside an HDF5 file. At the beginning of the calculation, the result files (1 file per PoE) should be created with empty datasets.

The datasets should be sized according to the calculation parameters:
  - For each file, there should be 1 dataset per site
  - Each dataset will be a 2D array consisting of N rows and M columns (where N is the number of logic tree samples and M is the number of UHS periods)
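
As a rough illustration of the layout described above, here is a minimal sketch of a per-file creation routine. The function names (dataset_name, dataset_shape, create_result_file) are hypothetical and not taken from the OpenQuake code base; the 'site_%s' naming follows the test snippet later in this report.

```python
def dataset_name(site_index):
    # Naming scheme borrowed from the test snippet in this report.
    return 'site_%s' % site_index


def dataset_shape(n_samples, periods):
    # N rows (logic tree samples) by M columns (UHS periods).
    return (n_samples, len(periods))


def create_result_file(path, n_sites, n_samples, periods):
    """Create one HDF5 file with a preallocated dataset for every site."""
    # Imported here so the two helpers above can be used without h5py.
    import h5py
    import numpy

    shape = dataset_shape(n_samples, periods)
    with h5py.File(path, 'w') as h5_file:
        for site_index in range(n_sites):
            h5_file.create_dataset(
                dataset_name(site_index), shape=shape, dtype=numpy.float64)
    return path
```

With 100 samples and 4 periods, every dataset in the file would have shape (100, 4).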

The reason why I want to create the empty datasets upfront is so that result writers can simply write to the result file without having to a) check if a dataset exists; or b) resize/append to a dataset.

In my initial investigation, I found that creating a series of empty files (with datasets) took a significant amount of time (several seconds). Here's a snippet of code I used to test file/dataset creation:

import h5py
import numpy

if __name__ == "__main__":

    # Probabilities of exceedance; one result file is created per PoE.
    poes = [0.1, 0.02, 0.03, 0.04]

    # UHS periods; one dataset column per period.
    periods = [0.0, 0.25, 0.5, 0.75]

    sites = 10000
    samples = 100

    files = [h5py.File('poe_%s.h5' % p, 'w') for p in poes]

    for f in files:
        # One empty (samples x periods) dataset per site.
        for site in range(sites):
            f.create_dataset('site_%s' % site, dtype=numpy.float64, shape=(samples, len(periods)))
        f.close()

This creates 4 files, each containing 10000 datasets. The size of each dataset does not seem to affect the run time of this code.

The above code takes approximately 7 seconds to run, which is why this should be implemented as a set of async tasks (1 per PoE). The purpose of each task will be to simply create an empty file for a given PoE (located in an NFS dir) with empty datasets.

These tasks must run to completion before the main UHS calculation starts.
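
The "one task per PoE, all finished before the calculation starts" requirement could be sketched as follows. This is only an illustration: it uses a standard-library thread pool as a stand-in for whatever task queue the calculator actually uses, and the names file_name_for_poe and create_all_result_files are hypothetical. The create_file argument is any callable that builds the file at the given path.

```python
from concurrent.futures import ThreadPoolExecutor


def file_name_for_poe(poe):
    # Same naming pattern as the test snippet in this report.
    return 'poe_%s.h5' % poe


def create_all_result_files(poes, create_file):
    """Run create_file(path) once per PoE and block until all are done."""
    paths = [file_name_for_poe(poe) for poe in poes]
    # max(1, ...) avoids a ValueError when the PoE list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        futures = [pool.submit(create_file, path) for path in paths]
        # result() waits for each task and re-raises any exception from it,
        # so a failed file creation prevents the caller from continuing.
        return [future.result() for future in futures]
```

The main calculation would be started only after create_all_result_files returns, which guarantees that every result file already exists.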

For more details on calculator workflow, see bug #888171.

Changed in openquake:
importance: Undecided → High
status: New → In Progress
assignee: nobody → Lars Butler (lars-butler)
Lars Butler (lars-butler) wrote :
Changed in openquake:
status: In Progress → Fix Committed
Changed in openquake:
status: Fix Committed → Fix Released