Implement AND of fixnum = w. logxor/logior

Bug #1847348 reported by Paul F. Dietz
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Won't Fix
Undecided
Unassigned

Bug Description

Wishlist: implement a short sequence of fixnum = comparisons with logxor and logior:

(and (= a b) (= c d)) ==> (eql (logior (logxor a b) (logxor c d)) 0)

On x86 the code is both shorter and faster (due to no conditional branches).

Example:

(defun f6 (x y z)
  (declare (type (integer 0 1000) x y z))
  (= x y z))

(defun f7 (x y z)
  (declare (type (integer 0 1000) x y z))
  (eql (logior (logxor x y) (logxor y z)) 0))

* (disassemble 'f6)
; disassembly for F6
; Size: 36 bytes. Origin: #x52D7B486
; 86: 4839D7 CMP RDI, RDX ; no-arg-parsing entry point
; 89: 740B JEQ L1
; 8B: BA17001050 MOV EDX, #x50100017 ; NIL
; 90: L0: 488BE5 MOV RSP, RBP
; 93: F8 CLC
; 94: 5D POP RBP
; 95: C3 RET
; 96: L1: 4839FE CMP RSI, RDI
; 99: BA17001050 MOV EDX, #x50100017 ; NIL
; 9E: 41BB4F001050 MOV R11D, #x5010004F ; T
; A4: 490F44D3 CMOVEQ RDX, R11
; A8: EBE6 JMP L0
NIL
* (disassemble 'f7)
; disassembly for F7
; Size: 33 bytes. Origin: #x52D7B516
; 16: 4831FA XOR RDX, RDI ; no-arg-parsing entry point
; 19: 4831F7 XOR RDI, RSI
; 1C: 4809FA OR RDX, RDI
; 1F: 4885D2 TEST RDX, RDX
; 22: BA17001050 MOV EDX, #x50100017 ; NIL
; 27: 41BB4F001050 MOV R11D, #x5010004F ; T
; 2D: 490F44D3 CMOVEQ RDX, R11
; 31: 488BE5 MOV RSP, RBP
; 34: F8 CLC
; 35: 5D POP RBP
; 36: C3 RET

Revision history for this message
Paul F. Dietz (paul-f-dietz) wrote :

The place this would really shine is string=.

Revision history for this message
Douglas Katzman (dougk) wrote :

I'm pretty sure that this is not a transformation that a C compiler would make by itself for a similar fragment such as "if (a == b && b == c) baz(c);"
Code size isn't a compelling reason,; it could go either direction. Adding a use of the comparison as a predicate makes the algebraic version take more bytes

* (disassemble '(lambda (x y z)(declare (fixnum x y z)) (if (= x y z) (foo x))))
; disassembly for (LAMBDA (X Y Z))
; Size: 49 bytes. Origin: #x52B4C12C ; (LAMBDA (X Y Z))
; 2C: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 30: 488945F8 MOV [RBP-8], RAX
; 34: 4839DF CMP RDI, RBX
; 37: 740B JEQ L1
; 39: L0: BA17001050 MOV EDX, #x50100017 ; NIL
; 3E: 488BE5 MOV RSP, RBP
; 41: F8 CLC
; 42: 5D POP RBP
; 43: C3 RET
; 44: L1: 4839FE CMP RSI, RDI
; 47: 75F0 JNE L0
; 49: 488BD3 MOV RDX, RBX
; 4C: B902000000 MOV ECX, 2
; 51: FF7508 PUSH QWORD PTR [RBP+8]
; 54: B8A2804B50 MOV EAX, #x504B80A2 ; #<FDEFN FOO>
; 59: FFE0 JMP RAX
; 5B: CC10 INT3 16 ; Invalid argument count trap

* (disassemble '(lambda (x y z)(declare (fixnum x y z)) (if (eql (logior (logxor x y) (logxor y z)) 0) (foo x))))
; disassembly for (LAMBDA (X Y Z))
; Size: 59 bytes. Origin: #x52B4C05C ; (LAMBDA (X Y Z))
; 5C: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 60: 488945F8 MOV [RBP-8], RAX
; 64: 488BD3 MOV RDX, RBX
; 67: 4831FA XOR RDX, RDI
; 6A: 488BC7 MOV RAX, RDI
; 6D: 4831F0 XOR RAX, RSI
; 70: 4809C2 OR RDX, RAX
; 73: 4885D2 TEST RDX, RDX
; 76: 740B JEQ L0
; 78: BA17001050 MOV EDX, #x50100017 ; NIL
; 7D: 488BE5 MOV RSP, RBP
; 80: F8 CLC
; 81: 5D POP RBP
; 82: C3 RET
; 83: L0: 488BD3 MOV RDX, RBX
; 86: B902000000 MOV ECX, 2
; 8B: FF7508 PUSH QWORD PTR [RBP+8]
; 8E: B8A2804B50 MOV EAX, #x504B80A2 ; #<FDEFN FOO>
; 93: FFE0 JMP RAX
; 95: CC10 INT3 16 ; Invalid argument count trap

Revision history for this message
Douglas Katzman (dougk) wrote :

Writing a version of string= that uses vector operations would be the simpler thing than trying to implement algebraic transforms to yield branch-free code (or fewer branches).
There isn't any framework in the compiler in which it fits.

In terms of bang-for-the-buck, we would be far better off with a few very simple tricks to produce branchless code from certain conditional expressions, particularly when a function just wants to return a genuine boolean (T or NIL) but implements it with jumps now. It could be something like:
 set :ne eax # return T if eax is nonzero
 and eax, 1 # since the 'set' operation only affects the low 8 bits
 mov eax, [somewhere+8*eax]
where 'somewhere' is a fixed address whose first 2 words hold T and NIL. That kind of thing.
I would want to see the effect on the branch miss rate but it sounds like it should be a winner.
This issue per se is a wont-fix.

Changed in sbcl:
status: New → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.