suboptimal x86-64 single-float-bits
As noted by Lutz Euler on sbcl-devel:
"SINGLE-FLOAT-BITS may be problematic as in it reads 64 bits from the
single-stack which AMD advises against if only 32 bits have been written
to the same address beforehand (this is a 'narrow-to-wide store-to-load
forwarding restriction'). I have not yet evaluated how often this case
occurs and whether it poses a performance problem.
The code SINGLE-FLOAT-BITS generates could be improved anyway.
For example, when the source is on the single-stack it generates
    MOV RDX, [RBP-8]
    SHL RDX, 32
    SAR RDX, 32
which is a funny way to sign-extend a value and where
    MOVSXD RDX, DWORD PTR [RBP-8]
would be better -- a change that happens to shorten the read from the
single-stack from 64 to 32 bits, too, thus addressing the issue at hand."
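
The equivalence claimed above can be checked with a small C model. This is
only a sketch, not SBCL code: the function names are hypothetical, the
stack slot is modelled as a plain 64-bit value, and the arithmetic shift is
emulated by hand so the result does not depend on compiler behaviour. It
shows that the MOV/SHL/SAR sequence and MOVSXD yield the same register
contents regardless of what the upper 32 bits of the slot hold.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of: MOV RDX, [RBP-8] ; SHL RDX, 32 ; SAR RDX, 32
   `slot` stands for the full 64 bits read from the stack slot. */
static int64_t shl_sar_32(uint64_t slot)
{
    uint64_t shifted = slot << 32;      /* SHL: upper half is discarded */
    uint64_t result = shifted >> 32;    /* logical shift back down */
    if (shifted >> 63)                  /* SAR replicates the sign bit */
        result |= 0xFFFFFFFF00000000ULL;
    int64_t out;
    memcpy(&out, &result, sizeof out);  /* reinterpret bits as signed */
    return out;
}

/* Model of: MOVSXD RDX, DWORD PTR [RBP-8]
   Only the low 32 bits of the slot are read, then sign-extended. */
static int64_t movsxd_32(uint64_t slot)
{
    uint32_t low = (uint32_t)slot;      /* 32-bit read */
    int32_t as_signed;
    memcpy(&as_signed, &low, sizeof as_signed);
    return (int64_t)as_signed;          /* sign extension to 64 bits */
}

/* Spot check: 0x3F800000 is 1.0f and 0xBF800000 is -1.0f; the upper
   halves are arbitrary stale data. */
static void check_equivalence(void)
{
    const uint64_t samples[] = {
        0x000000003F800000ULL, 0xDEADBEEF3F800000ULL,
        0x00000000BF800000ULL, 0xDEADBEEFBF800000ULL,
        0xFFFFFFFF00000000ULL, 0x0123456780000000ULL,
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(shl_sar_32(samples[i]) == movsxd_32(samples[i]));
}
```

Calling check_equivalence() from a test driver exercises both models. The
model says nothing about the store-to-load forwarding cost itself, which
would have to be measured on real hardware.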