Revisit binary disaggregation matrix result structure

Bug #884250 reported by Lars Butler
Affects: OpenQuake (deprecated)
Status: Fix Released
Importance: Medium
Assigned to: Lars Butler

Bug Description

Disaggregation NRML results are currently structured on the assumption of one HDF5 file per result type, per site, per realization, per PoE, and so on. For example:
<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns:gml="http://www.opengis.net/gml"
      xmlns="http://openquake.org/xmlns/nrml/0.2"
      gml:id="IDXXX">
    <disaggregationResultField poE="0.1" IMT="PGA" endBranchLabel="1" gml:id="ID000">
            <disaggregationResultNode gml:id="ID001">
                    <site>
                        <gml:Point gml:id="ID002">
                                <gml:pos>0.0 0.0</gml:pos>
                        </gml:Point>
                    </site>
                    <disaggregationMatrixSet groundMotionValue="0.25">
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudePMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistancePMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistanceEpsilonPMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonPMF" path="/path"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonTectonicRegionTypePMF" path="/path"/>
                    </disaggregationMatrixSet>
            </disaggregationResultNode>
    </disaggregationResultField>
</nrml>

However, this can produce a very large number of files in a large calculation. For this reason, the Python code that writes the data subsets (magdistpmf, latlonmagpmf, etc.) writes all result types for a given site + PoE + realization to a single file, with each data subset named according to its type (magdistpmf, for example).

Technically, we could keep the NRML structure the same and simply specify the same path for a collection of results. Like so:
<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns:gml="http://www.opengis.net/gml"
      xmlns="http://openquake.org/xmlns/nrml/0.2"
      gml:id="IDXXX">
    <disaggregationResultField poE="0.1" IMT="PGA" endBranchLabel="1" gml:id="ID000">
            <disaggregationResultNode gml:id="ID001">
                    <site>
                        <gml:Point gml:id="ID002">
                                <gml:pos>0.0 0.0</gml:pos>
                        </gml:Point>
                    </site>
                    <disaggregationMatrixSet groundMotionValue="0.25">
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudePMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistancePMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="MagnitudeDistanceEpsilonPMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonPMF" path="/same/path/to/matrices.h5"/>
                        <disaggregationMatrixBinaryFile disaggregationPMFType="LatitudeLongitudeMagnitudeEpsilonTectonicRegionTypePMF" path="/same/path/to/matrices.h5"/>
                    </disaggregationMatrixSet>
            </disaggregationResultNode>
    </disaggregationResultField>
</nrml>

Should we leave it like this, to keep the flexibility of using either a single shared file or separate files?

Damiano Monelli (monelli) wrote :

To avoid repeating the same path several times, we could change the schema so that the disaggregationMatrixSet element contains only one path.
The PMF types contained in the HDF5 file can then be reported as attributes of the disaggregationResultField element, because they are common to all the nodes (that is, locations) in the file.
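
For illustration only, the proposal might look roughly like the sketch below. The attribute name "pmfTypes", its space-separated value format, and the placement of "path" as an attribute on disaggregationMatrixSet are all assumptions made for this example; none of them is taken from the NRML 0.2 schema or from the fix that was eventually released.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<nrml xmlns:gml="http://www.opengis.net/gml"
      xmlns="http://openquake.org/xmlns/nrml/0.2"
      gml:id="IDXXX">
    <!-- Assumption: "pmfTypes" is a hypothetical attribute listing the PMF
         types stored in every HDF5 file referenced by this result field. -->
    <disaggregationResultField poE="0.1" IMT="PGA" endBranchLabel="1" gml:id="ID000"
            pmfTypes="MagnitudePMF MagnitudeDistancePMF MagnitudeDistanceEpsilonPMF LatitudeLongitudeMagnitudeEpsilonPMF LatitudeLongitudeMagnitudeEpsilonTectonicRegionTypePMF">
            <disaggregationResultNode gml:id="ID001">
                    <site>
                        <gml:Point gml:id="ID002">
                                <gml:pos>0.0 0.0</gml:pos>
                        </gml:Point>
                    </site>
                    <!-- Assumption: the single path is carried as an attribute;
                         a single child element would serve equally well. -->
                    <disaggregationMatrixSet groundMotionValue="0.25" path="/same/path/to/matrices.h5"/>
            </disaggregationResultNode>
    </disaggregationResultField>
</nrml>
```

With a layout along these lines, each node carries one path instead of five, and a reader learns which data subsets to expect in the file from the enclosing disaggregationResultField.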

Changed in openquake:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Lars Butler (lars-butler)
milestone: none → 0.4.6
Changed in openquake:
status: Confirmed → In Progress
Changed in openquake:
status: In Progress → Fix Committed
Changed in openquake:
status: Fix Committed → Fix Released