It should be a fused multiply add; you may need to use -ffast-math or
something to get the compiler to generate the proper instruction.
However, one can see from target/ppc/translate/fp-impl.inc.c:
    /* fmadd - fmadds */
    GEN_FLOAT_ACB(madd, 0x1D, 1, PPC_FLOAT);
through to _GEN_FLOAT_ACB:
    gen_helper_f##op(t3, cpu_env, t0, t1, t2); \
    if (isfloat) { \
        gen_helper_frsp(t3, cpu_env, t3); \
    } \
That right there is a double-precision fma followed by a round
to single precision. This pattern is replicated for all single
precision operations, and is of course wrong.
I believe that correct results may be obtained by having
single-precision helpers that first convert the double-precision
input into a single-precision input using helper_tosingle(),
perform the required operation, then convert the result back to
double-precision using helper_todouble().
The manual says:
# For single-precision arithmetic instructions, all input values
# must be representable in single format; if they are not, the
# result placed into the target FPR, and the setting of
# status bits in the FPSCR and in the Condition Register
# (if Rc=1), are undefined.
The tosingle/todouble conversions are exact and bit-preserving.
They are used by load-single and store-single to convert between
the single-precision in-memory value and the double-precision
register value. Therefore the input given to float32_add using this
conversion would be exactly the same as if we had been given the
value unmolested from a memory input.
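As a quick sanity check of the "exact and bit-preserving" claim, the sketch below widens a single-precision bit pattern to double and narrows it back, using ordinary C casts as an assumed stand-in for the todouble/tosingle helpers. The function names are invented for illustration. NaN patterns are deliberately left out, because a C cast may quiet a signaling NaN, which the bit-level helpers would not be expected to do.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern and back. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Returns 1 if widening to double and narrowing back to single
 * reproduces the original bit pattern (non-NaN inputs only). */
int roundtrip_exact(uint32_t bits)
{
    float f = bits_float(bits);
    double d = (double)f;              /* widening: always exact */
    return float_bits((float)d) == bits;
}
```

Every single-precision value, including denormals, is exactly representable in double, so the round trip should succeed for patterns such as 0x3f800000 (1.0), 0x00000001 (smallest denormal) and 0x7f7fffff (largest finite value).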
I don't know what real ppc hw does -- whether it takes all of
the double-precision input bits and rounds to 23 bits, like the
old 80387 hardware does, or truncates the input as I propose.
But for architectural results we don't have to care, because
of the UNDEFINED escape clause.