ASH lacks vops for efficient right-shift by non-constant amount
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
SBCL | Fix Released | Wishlist | Unassigned | |
Bug Description
The best vop translation of a known fixnum-to-fixnum right-shift falls back to the more general fixnum-to-fixnum translation, which handles an 'amount' of unknown sign, even when the front-end knows the amount to be non-positive. In particular, this makes LDB slower than it needs to be.
This simple example should be reducible to the analogous case for Y being known non-negative:
(Noting that ASH-DERIVE-TYPE-AUX correctly avoids the need for (THE FIXNUM ...) around the whole expression.)
* (disassemble '(lambda (x y) (ash (the fixnum x) (the (integer -60 0) y))))
; disassembly for (LAMBDA (X Y))
; 02B882A2: 48D1FA SAR RDX, 1 ; no-arg-parsing entry point
; A5: 48D1F9 SAR RCX, 1
; A8: 4885C9 TEST RCX, RCX
; AB: 7913 JNS L1
; AD: 48F7D9 NEG RCX
; B0: 4883F93F CMP RCX, 63
; B4: 7605 JBE L0
; B6: B93F000000 MOV ECX, 63
; BB: L0: 48D3FA SAR RDX, CL
; BE: EB03 JMP L2
; C0: L1: 48D3E2 SHL RDX, CL
; C3: L2: 48D1E2 SHL RDX, 1
; C6: 488BE5 MOV RSP, RBP
; C9: F8 CLC
; CA: 5D POP RBP
; CB: C3 RET
NIL
Expected: an instruction stream similar to the following, with a right shift in place of SHL (SAR rather than SHR, since X is signed):
* (disassemble '(lambda (x y) (the fixnum (ash (the fixnum x) (the (integer 0 20) y)))))
; disassembly for (LAMBDA (X Y))
; 02C2FEF2: 48D1F9 SAR RCX, 1 ; no-arg-parsing entry point
; 5: 48D3E2 SHL RDX, CL
; 8: 488BE5 MOV RSP, RBP
; B: F8 CLC
; C: 5D POP RBP
; D: C3 RET
I hope that those disassemblies are legible. The columns never seem to come out right for me with copy-n-paste.
Paul Khuong said that "It should be an easy project [too.]"
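To make the requested reduction concrete, here is a minimal sketch of the case in question. The names RIGHT-SHIFT-ONLY and CHECK-RIGHT-SHIFT-ONLY are hypothetical and exist only for illustration; they are not SBCL internals. The sketch relies only on the standard guarantee that ASH with a negative count behaves like FLOOR by the corresponding power of two, so a compiler that knows Y is non-positive needs only the arithmetic-right-shift path.

```lisp
;; Sketch only: RIGHT-SHIFT-ONLY is a hypothetical name, not part of SBCL.
(defun right-shift-only (x y)
  (declare (type fixnum x)
           (type (integer -60 0) y))
  ;; Y is declared non-positive, so only the right-shift path is reachable:
  ;; in principle no sign test and no left-shift branch are required.
  (ash x y))

;; Portable check of the semantics: shifting right by N bits is the same
;; as FLOOR by 2^N, for negative as well as non-negative X.
(defun check-right-shift-only ()
  (loop for x in (list 0 1 -1 12345 -12345
                       most-positive-fixnum most-negative-fixnum)
        always (loop for n from 0 to 60
                     always (= (right-shift-only x (- n))
                               (floor x (expt 2 n))))))
```

Calling (check-right-shift-only) should return T on any conforming implementation. Running (disassemble 'right-shift-only) is one way to see whether a given SBCL build still emits the TEST/JNS branch shown above.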
Changed in sbcl:
status: New → Confirmed
importance: Undecided → Wishlist
tags: added: arch-x86 compiler-ir2 easy optimization x86-64
Changed in sbcl:
status: Confirmed → In Progress
Changed in sbcl:
status: Fix Committed → Fix Released
Basically achieved by 60deeb7 (Simpler word-sized variable right shifts on x86 and x86-64).