sb-simd: Trouble with sse+xmm0 encoding

Bug #2015329 reported by ari pro
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Fix Released
Undecided
Unassigned

Bug Description

There is an issue with the sse+xmm0 encoding in https://github.com/sbcl/sbcl/blob/724690647ba5011f98afbea6243eb95627a0663e/contrib/sb-simd/code/define-instruction-vops.lisp#L111-L142 , which affects all of the -if and -blend functions in the sb-simd-sse4.1 package. The trouble is that xmm0 is sometimes used to store the first argument, x. The routine as it is written moves z into xmm0 without checking if xmm0 is in use, which can clobber x. Here is a test case:

(require "SB-SIMD")
(setf sb-ext:*evaluator-mode* :compile)
(macrolet
    ((fudge ()
       `(sb-simd-sse4.1::s64.2-blend
  (sb-simd-sse4.1:make-s64.2 -34359738367 -4294967295)
  (sb-simd-sse4.1:make-s64.2 -35184372088833 +31)
  (sb-simd-sse4.1:make-u64.2 18014398509481984 8))))
  (values (fudge) (let ((bar (fudge))) bar)))
=> #<SIMD-PACK +18014398509481984 +8>, #<SIMD-PACK -34359738367 -4294967295>

Here is a snippet of the output of the following:
(flet ((foo (a b mask)
  (declare (optimize (debug 1)))
  (sb-simd-sse4.1::s64.2-blend a b mask)))
  (sb-disassem:disassemble-code-component #'foo))

; 4ED: F30F6F5DB8 MOVDQU XMM3, [RBP-72]
; 4F2: 660F6F5201 MOVDQA XMM2, [RDX+1]
; 4F7: F30F6F45C8 MOVDQU XMM0, [RBP-56]
; 4FC: 660F6FC2 MOVDQA XMM0, XMM2
; 500: 660F6FD0 MOVDQA XMM2, XMM0
; 504: 660F3810D3 PBLENDVB XMM2, XMM3, XMM0

It appears that XMM2 (mask) overwrites XMM0 (a).

Compare that to the snippet from
(flet ((foo (a b mask)
  (declare (optimize (debug 3)))
  (sb-simd-sse4.1::s64.2-blend a b mask)))
  (sb-disassem:disassemble-code-component #'foo))

; 2DF: 660F6F6801 MOVDQA XMM5, [RAX+1]
; 2E4: 66410F6F6001 MOVDQA XMM4, [R8+1]
; 2EA: 660F6F5A01 MOVDQA XMM3, [RDX+1]
; 2EF: 41807D0800 CMP BYTE PTR [R13+8], 0 ; thread.stepping
; 2F4: 7402 JEQ L3
; 2F6: CC0E INT3 14 ; single-step trap (before)
; 2F8: L3: 660F6FC3 MOVDQA XMM0, XMM3
; 2FC: 660F6FD5 MOVDQA XMM2, XMM5
; 300: 660F3810D4 PBLENDVB XMM2, XMM4, XMM0

and, given (s64.2-blend a b mask) == (s64.2-if mask b a), note that
                        z (mask) y (b) x (a)
                 ------------------|----------------|-------------
(values
 (sb-simd:s64-if 18014398509481984 -35184372088833 -34359738367)
 (sb-simd:s64-if 8 31 -4294967295))
=> -34359738367, -4294967295

(values
 (sb-simd:s64-if 18014398509481984 -35184372088833 18014398509481984)
 (sb-simd:s64-if 8 31 8))
=> 18014398509481984, 8

which explains the behavior of the test case.

I attached a patch which modifies the routine to check if xmm0 is in use before assigning to it. The test suite currently passes with and without the current patch. But if you apply the diff below and then run the test suite (or just the code following the diff), you will see that the tests for the sse4.1 -if functions fail with the current code and pass with this patch.

---
 contrib/sb-simd/test-suite/test-suite.lisp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/sb-simd/test-suite/test-suite.lisp b/contrib/sb-simd/test-suite/test-suite.lisp
index 79fba2e9f..d7d9a7c5a 100644
--- a/contrib/sb-simd/test-suite/test-suite.lisp
+++ b/contrib/sb-simd/test-suite/test-suite.lisp
@@ -62,7 +62,7 @@
   (let ((name (intern-test-name test-name)))
     `(prog1 ',name
        (defun ,name ()
- (declare (optimize (debug 3) (safety 3)))
+ (declare (optimize (debug 1) (safety 3)))
          (with-test-harness
            (enter-test ',name)
            ,@body))
--
2.39.1

;; From tests/sb-simd.impure.lisp. Adjust the pathname if necessary.
(with-compilation-unit ()
  (dolist (file '("packages.lisp"
    "numbers.lisp"
    "utilities.lisp"
    "test-suite.lisp"
    "test-arefs.lisp"
    "test-simple-simd-functions.lisp"
    "test-horizontal-functions.lisp"
    "test-hairy-simd-functions.lisp"
    "test-packages.lisp"))
    (load (merge-pathnames file #P"../contrib/sb-simd/test-suite/"))))
(dolist (test sb-simd-test-suite::*tests*)
  (when (and (search "SSE4.1" (symbol-name test))
      (search "IF" (symbol-name test)))
    (handler-case (funcall test)
      (simple-error (c) (format t ": ~A~%" c)))))

One more thing: if you open contrib/sb-simd/code/define-instruction-vops.lisp in Emacs with the latest version of Slime, and you load the slime-sbcl-exts contrib, running (indent-sexp) on the file will the change the indentation of all of the :generator sections. See the last three commits of https://github.com/slime/slime/commits/master/contrib/slime-sbcl-exts.el (43d62a36ecd2b2ed1349388a9abdacca9b100c05 to 0470fc048fbd7987a25413be37f4e0efd93c204f).

Revision history for this message
ari pro (aripro) wrote :
Stas Boukarev (stassats)
Changed in sbcl:
status: New → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.