Float truncate generates redundant machine code
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
SBCL |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
In SBCL 1.1.8.53, under x86-64, the following function
(defun f (x)
(declare (optimize speed)
(type (double-float 1.0d0 4.0d0) x))
(nth-value 0 (truncate x)))
compiles to:
CVTTSD2SI RCX, XMM0 ; no-arg-parsing entry point
SHL RCX, 1
MOV RAX, RCX
SAR RAX, 1
XORPD XMM0, XMM0
CVTSI2SD XMM0, RAX
MOV RDX, RCX
MOV RSP, RBP
CLC
POP RBP
RET
The integer result of the truncation is redundantly converted to a double float in XMM0.
In 1.0.27.46 (just some old version I happen to have readily compiled) this worked better:
CVTTSD2SI RDX, XMM0 ; no-arg-parsing entry point
SHL RDX, 3
MOV RSP, RBP
CLC
POP RBP
RET
In some SBCL versions inbetween these two a comparable function accidentally used generic arithmetic (see bug #489388). This was fixed in 1.0.34.11 (and a few following commits) and since then this redundant code is generated.
Could the redundant instructions be optimized away?
Changed in sbcl: | |
status: | New → Fix Released |
(values (truncate (the double-float x))) also generates suboptimal code, because SB-KERNEL: %DOUBLE- FLOAT is not marked as flushable (because it can signal errors) and the transform tries to be smart, but fails.