unexpected numerical lossiness during decoding
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
libECBUFR |
Fix Committed
|
High
|
cpb |
Bug Description
Encoder:
double kelvin = 273.15;
double drybulb_c = 0.0; /* Celsius */
double drybulb_k = drybulb_c + kelvin; /* Kelvin */
int desc = 12101;
double scale = 2;
double reference = 0.
int BUFR_int = drybulb_k * pow(10,scale) - reference;
In the BUFR message, this is being coded as 27315. Which, I think, we can all agree is a sufficiently accurate representation of 0.0 C irrespective of any intermediate IEEE 754 magic.
In other words, the encoder looks good.
Decoder:
int BUFR_int = 27315;
double drybulb_k = (BUFR_int + reference) / pow(10,scale);
double drybulb_c = drybulb_k - kelvin;
Now, here's the thing... irrespective of the IEEE 754 encoding of the kelvin constant, we've supposedly kept all our transformations in the same 64-bit IEEE 754 form _except_ where we convert to and from the integral BUFR encoding. And the value we're encoding, 273.15, is fully representable in the BUFR definition we're using. So, if the BUFR encoding/decoding sequence is stable, it _should_ be perfectly safe to say:
assert( drybulb_c == 0.0 );
Sound reasonable?
Well, that's not what's happening. What's happening is that the double representation of the decoded kelvin_k is 273.149994 (which, even though 64-bit can't exactly represent 273.15, is less than half the precision it should be able to get), which gives me a drybulb_c of -0.000006.
So, point being is _something_ in the decoding process appears to be lossy at the API level. What goes in is _not_ what's coming out.
Now, there may be a perfectly logical reason; it could be something as trivial as the (BUFR_int + reference) / pow(10,scale) calculation being done in a 32-bit floating point space.
Changed in libecbufr: | |
status: | New → Fix Committed |
Vanh confirms:
You are right, currently, libecbufr decode and store most of the values
in FLOAT32.
And when you obtain a double by calling the function _get_dvalue( )
bufr_descriptor
The returned value is simply a cast from float to double.
You can replicate the problem you mention here outside of libecbufr just
by casting a float to double.
float fval = 273.15;
double dval = 273.15;
double casted_val = fval;
double drybulb_c = dval - casted_val;
and you will end up with drybulb_c = -0.000006
The solution may be to do all decoding and store using double only.
You can do that by changing functions bufr_datatype_ to_valtype( ) to_valtype( )
and bufr_encoding_
Replace all VALTYPE_FLT32 by VALTYPE_FLT64