OpenQuake (deprecated)

Revisit binary disaggregation matrix result structure

Bug #884250 reported by Lars Butler on 2011-10-31

Affects	Status	Importance	Assigned to	Milestone
OpenQuake (deprecated)	Fix Released	Medium	Lars Butler	OpenQuake (deprecated) 0.4.6

Bug Description

Disaggregation NRML results are currently structured on the assumption of one HDF5 file per result type, per site, per realization, per PoE, and so on. For example:
<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns:gml="http://www.opengis.net/gml"
      xmlns="http://openquake.org/xmlns/nrml/0.2"
      gml:id="IDXXX">
    <disaggregationResultField poE="0.1" IMT="PGA" endBranchLabel="1" gml:id="ID000">
            <disaggregationResultNode gml:id="ID001">
                    <site>
                        <gml:Point gml:id="ID002">
                                <gml:pos>0.0 0.0</gml:pos>
                        </gml:Point>
                    </site>
                    <disaggregationMatrixSet groundMotionValue="0.25">
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudePMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistancePMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistanceEpsilonPMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonPMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonTectonicRegionTypePMF" path="/path"/>
                    </disaggregationMatrixSet>
            </disaggregationResultNode>
    </disaggregationResultField>
</nrml>

However, this can produce an enormous number of files in a large calculation. For that reason, the Python code that writes the data subsets (magdistpmf, latlonmagpmf, etc.) writes all result types for a given site + PoE + realization to a single file, with each data subset named according to its type (magdistpmf, for example).
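
To put rough numbers on the file-count concern, here is a minimal sketch. The calculation size (sites, realizations, PoEs) is purely hypothetical and does not come from any real job; only the five PMF type names are taken from the NRML example above.

```python
from math import prod

# Hypothetical calculation size, chosen only for illustration.
counts = {"sites": 1000, "realizations": 100, "poes": 3}

# The five PMF types that appear in the NRML example.
pmf_types = [
    "MagnitudePMF",
    "MagnitudeDistancePMF",
    "MagnitudeDistanceEpsilonPMF",
    "LatitudeLongitudeMagnitudeEpsilonPMF",
    "LatitudeLongitudeMagnitudeEpsilonTectonicRegionTypePMF",
]


def files_one_per_type(counts, pmf_types):
    """One HDF5 file per result type, per site, per realization, per PoE."""
    return prod(counts.values()) * len(pmf_types)


def files_grouped(counts):
    """One HDF5 file per site + PoE + realization, holding all result types."""
    return prod(counts.values())


print(files_one_per_type(counts, pmf_types))  # 1500000
print(files_grouped(counts))                  # 300000
```

Under these assumed sizes, grouping cuts the file count by a factor equal to the number of PMF types.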

Technically, we could keep the NRML structure the same and simply specify the same path for a collection of results. Like so:
<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns:gml="http://www.opengis.net/gml"
      xmlns="http://openquake.org/xmlns/nrml/0.2"
      gml:id="IDXXX">
    <disaggregationResultField poE="0.1" IMT="PGA" endBranchLabel="1" gml:id="ID000">
            <disaggregationResultNode gml:id="ID001">
                    <site>
                        <gml:Point gml:id="ID002">
                                <gml:pos>0.0 0.0</gml:pos>
                        </gml:Point>
                    </site>
                    <disaggregationMatrixSet groundMotionValue="0.25">
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudePMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistancePMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistanceEpsilonPMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonPMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonTectonicRegionTypePMF" path="/same/path/to/matrices.h5"/>
                    </disaggregationMatrixSet>
            </disaggregationResultNode>
    </disaggregationResultField>
</nrml>

Should we leave it like this (to allow the flexibility of using either a single shared file or separate files)?
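
For what it's worth, a reader of the unchanged structure can cope with both cases by grouping the PMF types by path. The following is only a sketch using the standard library, not code from OpenQuake: the function name `pmf_types_by_path` is made up for illustration, and the sample is a trimmed copy of the NRML above (no site element, two PMF types).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NRML_NS = "http://openquake.org/xmlns/nrml/0.2"

# Trimmed copy of the example above, kept short for illustration.
SAMPLE = """
<nrml xmlns:gml="http://www.opengis.net/gml"
      xmlns="http://openquake.org/xmlns/nrml/0.2"
      gml:id="IDXXX">
    <disaggregationResultField poE="0.1" IMT="PGA" endBranchLabel="1" gml:id="ID000">
        <disaggregationResultNode gml:id="ID001">
            <disaggregationMatrixSet groundMotionValue="0.25">
                <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudePMF" path="/same/path/to/matrices.h5"/>
                <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistancePMF" path="/same/path/to/matrices.h5"/>
            </disaggregationMatrixSet>
        </disaggregationResultNode>
    </disaggregationResultField>
</nrml>
"""


def pmf_types_by_path(nrml_text):
    """Map each HDF5 path to the list of PMF types that point at it."""
    root = ET.fromstring(nrml_text.strip())
    grouped = defaultdict(list)
    tag = "{%s}disaggregationMatrixBinaryFile" % NRML_NS
    for elem in root.iter(tag):
        grouped[elem.get("path")].append(elem.get("disaggregationPMFType"))
    return dict(grouped)


print(pmf_types_by_path(SAMPLE))
```

With one file per type, every path maps to a single PMF type; with a shared file, one path maps to all of them, so the same reader handles both layouts.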

Damiano Monelli (monelli) wrote on 2011-11-02:

To avoid duplicating the same path several times, we can change the schema so that the disaggregationMatrixSet element contains only one path.
The PMF types contained in the HDF5 file can be reported as attributes of the disaggregationResultField element, because they are common to all the nodes (that is, locations) in the file.
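
A rough sketch of what that layout could look like, built with the standard library. The attribute name `disaggregationPMFTypes` and the helper `build_result_field` are hypothetical, chosen here for illustration; the schema that actually shipped may differ. Namespaces and the site element are left out to keep it short.

```python
import xml.etree.ElementTree as ET


def build_result_field(pmf_types, nodes):
    """Build a result field with PMF types listed once and one path per node.

    pmf_types: list of PMF type names shared by every node in the file.
    nodes: iterable of (ground_motion_value, hdf5_path) pairs, one per site.
    """
    field = ET.Element(
        "disaggregationResultField",
        {
            "poE": "0.1",
            "IMT": "PGA",
            "endBranchLabel": "1",
            # Hypothetical attribute: PMF types declared once at field level.
            "disaggregationPMFTypes": " ".join(pmf_types),
        },
    )
    for gmv, path in nodes:
        node = ET.SubElement(field, "disaggregationResultNode")
        matrix_set = ET.SubElement(
            node, "disaggregationMatrixSet", {"groundMotionValue": str(gmv)}
        )
        # A single file reference per matrix set, so the path is not repeated.
        ET.SubElement(matrix_set, "disaggregationMatrixBinaryFile", {"path": path})
    return field


field = build_result_field(
    ["MagnitudePMF", "MagnitudeDistancePMF"],
    [(0.25, "/same/path/to/matrices.h5")],
)
print(ET.tostring(field, encoding="unicode"))
```

The trade-off is that this layout gives up the option of one file per PMF type, since each matrix set can only name one path.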

Lars Butler (lars-butler) on 2011-11-03

Changed in openquake:
status:	New → Confirmed
importance:	Undecided → Medium
assignee:	nobody → Lars Butler (lars-butler)
milestone:	none → 0.4.6

Lars Butler (lars-butler) on 2011-11-04

Changed in openquake:
status:	Confirmed → In Progress

Lars Butler (lars-butler) on 2011-11-07

Changed in openquake:
status:	In Progress → Fix Committed

Lars Butler (lars-butler) on 2013-04-08

Changed in openquake:
status:	Fix Committed → Fix Released

Related blueprints

Disaggregation calculator
