Ubuntu packages for libjhdf4-java+libjhdf5-java and h5py are conflicting and causing weird crashes

Bug #882637 reported by Lars Butler
This bug affects 1 person
Affects: OpenQuake (deprecated)
Status: Fix Released
Importance: Critical
Assigned to: Lars Butler

Bug Description

I am currently developing the main disaggregation task (bug # 860443). (At the time this bug was filed, this code had not yet landed in master.) Part of this task calls the DisaggregationCalculator class (in the Java code), which serializes a 5D disaggregation matrix to an HDF5 file. This code uses the libjhdf5-java and libjhdf4-java Ubuntu packages to serialize the file.

Some Python code that requires the python-h5py Ubuntu package was recently added. Since this code landed in master, I have been getting the following errors when I run the test suite:

$ ./run_tests -v
Test multiple add entry calls ... ok
test_add_entry_different_keys (tests.bulk_insert_unittest.BulkInserterTestCase) ... ok
test_flush (tests.bulk_insert_unittest.BulkInserterTestCase) ... ok
test_flush_geometry (tests.bulk_insert_unittest.BulkInserterTestCase) ... ok
Verify that :py:function:`openquake.kvs.cache_gc` is called. ... ok
Test that :py:function:`bin.cache_gc.clear_job_data` raises ... ok
Given the test data, make sure that ... ok
Test the proper behavior of ... ok
Test the proper behavior of ... ok
test_get_complex_fault_surface (tests.db_loader_unittest.NrmlModelLoaderTestCase) ... ok
Test that the ... ok
test_get_simple_fault_surface (tests.db_loader_unittest.NrmlModelLoaderTestCase) ... ok
test_parse_mfd_complex_fault (tests.db_loader_unittest.NrmlModelLoaderTestCase) ... ok
test_parse_mfd_simple_fault (tests.db_loader_unittest.NrmlModelLoaderTestCase) ... ok
test_parse_simple_fault_src (tests.db_loader_unittest.NrmlModelLoaderTestCase) ... ok
Test serialization of a single simple fault source with an ... ok
Similar to test_serialize, except the test input data includes a ... ok
For each model in the 'admin' schema, test for proper db routing ... ok
For each model in the 'admin' schema, test for proper db routing ... ok
For each model in the 'eqcat' schema, test for proper db routing ... ok
For each model in the 'eqcat' schema, test for proper db routing ... ok
For each model in the 'hzrdi' schema, test for proper db routing ... ok
For each model in the 'hzrdi' schema, test for proper db routing ... ok
For each model in the 'hzrdr' schema, test for proper db routing ... ok
For each model in the 'hzrdr' schema, test for proper db routing ... ok
For each model in the 'oqmif' schema, test for proper db routing ... ok
For each model in the 'oqmif' schema, test for proper db routing ... ok
For each model in the 'riski' schema, test for proper db routing ... ok
For each model in the 'riski' schema, test for proper db routing ... ok
For each model in the 'riskr' schema, test for proper db routing ... ok
For each model in the 'riskr' schema, test for proper db routing ... ok
For each model in the 'uiapi' schema, test for proper db routing ... ok
For each model in the 'uiapi' schema, test for proper db routing ... ok
The deterministic calculator is triggered. ... ok
test_loads_the_rupture_model (tests.deterministic_hazard_unittest.DeterministicEventBasedMixinTestCase) ... ok
The hazard subsystem is able to trigger multiple computations. ... ok
test_simple_computation_using_the_java_calculator (tests.deterministic_hazard_unittest.DeterministicEventBasedMixinTestCase) ... ok
The hazard subsystem stores the computed gmfs in kvs. ... ok
test_the_number_of_calculation_must_be_greater_than_zero (tests.deterministic_hazard_unittest.DeterministicEventBasedMixinTestCase) ... ok
test_the_same_calculator_is_used_between_multiple_invocations (tests.deterministic_hazard_unittest.DeterministicEventBasedMixinTestCase) ... ok
test_transforms_a_java_gmf_to_dict (tests.deterministic_hazard_unittest.DeterministicEventBasedMixinTestCase) ... ok
test_when_measure_type_is_mmi_we_store_as_is (tests.deterministic_hazard_unittest.DeterministicEventBasedMixinTestCase) ... ok
test_when_measure_type_is_not_mmi_exp_is_stored (tests.deterministic_hazard_unittest.DeterministicEventBasedMixinTestCase) ... ok
Exercise the deterministic risk job and make sure it runs end-to-end. ... ok
Exercises the function ... ok
Exercises the function ... ok
Test construction of a Double[] (Java array) from a list of floats. ... ok
Test the core function of the main disaggregation task. ... HDF5: infinite loop closing library
      D,T,FD,P,FD,P,FD,P,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E

If I completely remove the tests for the Python HDF5 code (tests/disagg_subsets_unittest.py), the test suite no longer crashes in the way shown above.

After a little more investigation, I found that I can reproduce this error by simply importing `h5py` and running the tests which call the DisaggregationCalculator (which serializes HDF5 on the Java side).

This works:

$ (export DJANGO_SETTINGS_MODULE="openquake.settings"; nosetests tests/disaggregation_unittest.py)
..
----------------------------------------------------------------------
Ran 2 tests in 12.008s

OK

If I simply add an `import h5py` to this file (disaggregation_unittest.py), this happens:

$ (export DJANGO_SETTINGS_MODULE="openquake.settings"; nosetests tests/disaggregation_unittest.py)
.HDF5: infinite loop closing library
      D,T,FD,P,FD,P,FD,P,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E,E

Lars Butler (lars-butler) wrote :

al-maisan has been able to reproduce this error on his Ubuntu machine. Only Ubuntu systems are affected.

Lars Butler (lars-butler) wrote :

libhdf5 v1.8.7 does not appear to have this issue (as reported by Anton, who is running Gentoo). The version of libhdf5 packaged for Ubuntu is v1.8.4; this would indicate that the bug was fixed after 1.8.4.

Anton found a debian package for the newer version, which we may be able to use: http://packages.debian.org/experimental/libhdf5-7

Changed in openquake:
status: New → Confirmed
Changed in openquake:
importance: Undecided → Critical
John Tarter (toh2)
Changed in openquake:
milestone: none → 0.4.5
assignee: nobody → Lars Butler (lars-butler)
Lars Butler (lars-butler) wrote :

After doing a bit of research, we found an alternate version of libhdf5 (http://packages.debian.org/experimental/libhdf5-7). I can install it, but here's the problem: installing it replaces libhdf5-serial-1.8.4 along with the packages that depend on it (such as libjhdf5-java and libjhdf4-java), and it breaks a few other packages required by openquake (such as python-gdal). So it's a packaging nightmare.

So, over lunch I came up with a different solution: remove all HDF5 serialization from the Java side altogether. Instead, just return the 5D matrix from the Java disaggregation calculator to the Python code through jpype and serialize it in Python with h5py. I'm going to try this fix now.
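
For illustration only, here is a rough sketch of what the Python-side serialization could look like. It is not the code from the actual fix. The function names (`to_nested_lists`, `shape_of`, `save_matrix`) and the dataset name `disagg_matrix` are made up for this example, and it assumes the 5D array returned through jpype can be iterated like a nested Python sequence and has no empty dimension.

```python
def to_nested_lists(matrix, depth=5):
    """Copy a nested sequence (for example a Java double[][][][][]
    proxied by jpype) into plain Python lists of floats."""
    if depth == 0:
        return float(matrix)
    return [to_nested_lists(sub, depth - 1) for sub in matrix]


def shape_of(nested, depth=5):
    """Return the dimensions of a rectangular, non-empty nested list."""
    dims = []
    level = nested
    for _ in range(depth):
        dims.append(len(level))
        level = level[0]
    return tuple(dims)


def save_matrix(path, matrix, dataset_name="disagg_matrix"):
    """Write the 5D matrix to an HDF5 file from the Python side.

    `dataset_name` is a hypothetical name chosen for this sketch.
    """
    # Imported here so the helpers above stay usable without h5py installed.
    import h5py

    data = to_nested_lists(matrix)
    with h5py.File(path, "w") as h5_file:
        # h5py converts the nested lists to a float64 array on write.
        h5_file.create_dataset(dataset_name, data=data, dtype="float64")
    return shape_of(data)
```

With this approach only h5py touches libhdf5 in the Python process, so the Java HDF5 bindings are no longer needed for this task.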

Lars Butler (lars-butler) wrote :

Resolved by this pull request: https://github.com/gem/openquake/pull/559
Note: This pull also includes code for bug # 860443.

Changed in openquake:
status: Confirmed → In Progress
John Tarter (toh2)
Changed in openquake:
milestone: 0.4.5 → 0.4.6
Changed in openquake:
status: In Progress → Fix Committed
Paul Henshaw (paul-sl-henshaw) wrote :

New URL for PR following repository rename:

https://github.com/gem/oq-engine/pull/559

Changed in openquake:
status: Fix Committed → Fix Released