Bug #1847348 “Implement AND of fixnum = w. logxor/logior” : Bugs : SBCL

Revision history for this message

Paul F. Dietz (paul-f-dietz) wrote on 2019-10-08:

#1

The place this would really shine is string=.

Revision history for this message

Douglas Katzman (dougk) wrote on 2019-11-15:

#2

I'm pretty sure that this is not a transformation that a C compiler would make by itself for a similar fragment such as "if (a == b && b == c) baz(c);"
Code size isn't a compelling reason,; it could go either direction. Adding a use of the comparison as a predicate makes the algebraic version take more bytes

* (disassemble '(lambda (x y z)(declare (fixnum x y z)) (if (= x y z) (foo x))))
; disassembly for (LAMBDA (X Y Z))
; Size: 49 bytes. Origin: #x52B4C12C ; (LAMBDA (X Y Z))
; 2C: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 30: 488945F8 MOV [RBP-8], RAX
; 34: 4839DF CMP RDI, RBX
; 37: 740B JEQ L1
; 39: L0: BA17001050 MOV EDX, #x50100017 ; NIL
; 3E: 488BE5 MOV RSP, RBP
; 41: F8 CLC
; 42: 5D POP RBP
; 43: C3 RET
; 44: L1: 4839FE CMP RSI, RDI
; 47: 75F0 JNE L0
; 49: 488BD3 MOV RDX, RBX
; 4C: B902000000 MOV ECX, 2
; 51: FF7508 PUSH QWORD PTR [RBP+8]
; 54: B8A2804B50 MOV EAX, #x504B80A2 ; #<FDEFN FOO>
; 59: FFE0 JMP RAX
; 5B: CC10 INT3 16 ; Invalid argument count trap

* (disassemble '(lambda (x y z)(declare (fixnum x y z)) (if (eql (logior (logxor x y) (logxor y z)) 0) (foo x))))
; disassembly for (LAMBDA (X Y Z))
; Size: 59 bytes. Origin: #x52B4C05C ; (LAMBDA (X Y Z))
; 5C: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 60: 488945F8 MOV [RBP-8], RAX
; 64: 488BD3 MOV RDX, RBX
; 67: 4831FA XOR RDX, RDI
; 6A: 488BC7 MOV RAX, RDI
; 6D: 4831F0 XOR RAX, RSI
; 70: 4809C2 OR RDX, RAX
; 73: 4885D2 TEST RDX, RDX
; 76: 740B JEQ L0
; 78: BA17001050 MOV EDX, #x50100017 ; NIL
; 7D: 488BE5 MOV RSP, RBP
; 80: F8 CLC
; 81: 5D POP RBP
; 82: C3 RET
; 83: L0: 488BD3 MOV RDX, RBX
; 86: B902000000 MOV ECX, 2
; 8B: FF7508 PUSH QWORD PTR [RBP+8]
; 8E: B8A2804B50 MOV EAX, #x504B80A2 ; #<FDEFN FOO>
; 93: FFE0 JMP RAX
; 95: CC10 INT3 16 ; Invalid argument count trap

I'm pretty sure that this is not a transformation that a C compiler would make by itself for a similar fragment such as "if (a == b && b == c) baz(c);"
Code size isn't a compelling reason,; it could go either direction. Adding a use of the comparison as a predicate makes the algebraic version take more bytes

* (disassemble '(lambda (x y z)(declare (fixnum x y z)) (if (= x y z) (foo x))))
; disassembly for (LAMBDA (X Y Z))
; Size: 49 bytes. Origin: #x52B4C12C                          ; (LAMBDA (X Y Z))
; 2C:       498B4510         MOV RAX, [R13+16]                ; thread.binding-stack-pointer
; 30:       488945F8         MOV [RBP-8], RAX
; 34:       4839DF           CMP RDI, RBX
; 37:       740B             JEQ L1
; 39: L0:   BA17001050       MOV EDX, #x50100017              ; NIL
; 3E:       488BE5           MOV RSP, RBP
; 41:       F8               CLC
; 42:       5D               POP RBP
; 43:       C3               RET
; 44: L1:   4839FE           CMP RSI, RDI
; 47:       75F0             JNE L0
; 49:       488BD3           MOV RDX, RBX
; 4C:       B902000000       MOV ECX, 2
; 51:       FF7508           PUSH QWORD PTR [RBP+8]
; 54:       B8A2804B50       MOV EAX, #x504B80A2              ; #<FDEFN FOO>
; 59:       FFE0             JMP RAX
; 5B:       CC10             INT3 16                          ; Invalid argument count trap

* (disassemble '(lambda (x y z)(declare (fixnum x y z)) (if (eql (logior (logxor x y) (logxor y z)) 0) (foo x))))
; disassembly for (LAMBDA (X Y Z))
; Size: 59 bytes. Origin: #x52B4C05C                          ; (LAMBDA (X Y Z))
; 5C:       498B4510         MOV RAX, [R13+16]                ; thread.binding-stack-pointer
; 60:       488945F8         MOV [RBP-8], RAX
; 64:       488BD3           MOV RDX, RBX
; 67:       4831FA           XOR RDX, RDI
; 6A:       488BC7           MOV RAX, RDI
; 6D:       4831F0           XOR RAX, RSI
; 70:       4809C2           OR RDX, RAX
; 73:       4885D2           TEST RDX, RDX
; 76:       740B             JEQ L0
; 78:       BA17001050       MOV EDX, #x50100017              ; NIL
; 7D:       488BE5           MOV RSP, RBP
; 80:       F8               CLC
; 81:       5D               POP RBP
; 82:       C3               RET
; 83: L0:   488BD3           MOV RDX, RBX
; 86:       B902000000       MOV ECX, 2
; 8B:       FF7508           PUSH QWORD PTR [RBP+8]
; 8E:       B8A2804B50       MOV EAX, #x504B80A2              ; #<FDEFN FOO>
; 93:       FFE0             JMP RAX
; 95:       CC10             INT3 16                          ; Invalid argument count trap

Revision history for this message

Douglas Katzman (dougk) wrote on 2020-12-12:

#3

Writing a version of string= that uses vector operations would be the simpler thing than trying to implement algebraic transforms to yield branch-free code (or fewer branches).
There isn't any framework in the compiler in which it fits.

In terms of bang-for-the-buck, we would be far better off with a few very simple tricks to produce branchless code from certain conditional expressions, particularly when a function just wants to return a genuine boolean (T or NIL) but implements it with jumps now. It could be something like:
set :ne eax # return T if eax is nonzero
and eax, 1 # since the 'set' operation only affects the low 8 bits
mov eax, [somewhere+8*eax]
where 'somewhere' is a fixed address whose first 2 words hold T and NIL. That kind of thing.
I would want to see the effect on the branch miss rate but it sounds like it should be a winner.
This issue per se is a wont-fix.

Changed in sbcl:
status:	New → Won't Fix

SBCL

Implement AND of fixnum = w. logxor/logior

Bug Description

Other bug subscribers

Remote bug watches