Performance regression related to inlining

Bug #1960081 reported by Michael Kappert
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
SBCL      Fix Released   Undecided    Unassigned

Bug Description

The following code (without any inline/notinline declarations) takes 0.5s on SBCL 2.1.7 but 16.5s on 2.1.8 - 2.2.1 (current):

(defun rad (x)
  (declare (double-float x))
  (* (* 2d0 pi) (* x (/ 1 360d0))))

(time
 (dotimes (k 100000000)
   (cis (rad 90d0))))

Declaiming RAD inline achieves the performance of 2.1.7 without inline declarations.
Surprisingly, declaiming RAD notinline is even faster.

Repeatable test case:

Save to test-inlining.cl and execute on 2.1.7 and 2.1.8 with
sbcl --no-userinit --eval '(progn (compile-file "test-inlining.cl") (load "test-inlining"))'
Add inline declarations and repeat.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Run with
;;; /opt/sbcl-2.2.1/bin/sbcl --no-userinit --eval '(progn (compile-file "test-inlining.cl") (load "test-inlining"))'
;;; OS: Fedora 34, Fedora 35

(defpackage :regression
  (:use :common-lisp))

(in-package :regression)

;; -- Performance regression when not using any inline declarations
;; -- Inlining RAD eliminates the performance regression
;; (declaim (inline rad))
;; -- Declaim notinline is faster than declaim inline ?!
;; (declaim (notinline rad))
(defun rad (x)
  (declare (double-float x))
  (* (* 2d0 pi) (* x (/ 1 360d0))))

;; SBCL 2.1.4, 2.1.6, 2.1.7:
;; no inline/notinline 1.9s
;; SBCL 2.2.0, 2.2.1, 2.1.8:
;; no inline/notinline: 16.5s
;; declaim inline rad: 1.5s
;; declaim notinline rad: 0.5s
(time
 (dotimes (k 100000000)
   (cis (rad 90d0))))

;;; EOF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

CL-USER> *features*
(:SWANK :CL-FAD :BORDEAUX-THREADS :THREAD-SUPPORT CFFI-FEATURES:FLAT-NAMESPACE
 CFFI-FEATURES:X86-64 CFFI-FEATURES:UNIX :CFFI CFFI-SYS::FLAT-NAMESPACE
 :SPLIT-SEQUENCE :SBCL-USES-SB-ROTATE-BYTE :QUICKLISP :ASDF3.3 :ASDF3.2
 :ASDF3.1 :ASDF3 :ASDF2 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE
 :X86-64 :GENCGC :64-BIT :ANSI-CL :COMMON-LISP :ELF :IEEE-FLOATING-POINT :LINUX
 :LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES :SB-LDB :SB-PACKAGE-LOCKS :SB-THREAD
 :SB-UNICODE :SBCL :UNIX)
--------------------------------------------------------------------------------
uname -a
Linux aguas-13 5.15.18-100.fc34.x86_64 #1 SMP Sat Jan 29 13:00:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
--------------------------------------------------------------------------------

Stas Boukarev (stassats) wrote :

Is the better performance of 2.1.7 because it throws away the body of dotimes?
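
One way to look into this question is to inspect the compiled loop with the standard DISASSEMBLE function and see whether a call to CIS (or its inlined arithmetic) is still present. The following is only a sketch: the function name LOOP-WITH-UNUSED-RESULT is hypothetical and not part of the report, and the disassembly differs between SBCL versions and platforms, so no output is shown.

```lisp
(defun rad (x)
  (declare (double-float x))
  (* (* 2d0 pi) (* x (/ 1 360d0))))

;; Hypothetical helper: the same loop as in the report, wrapped in a
;; function so that it can be disassembled. The value of CIS is discarded.
(defun loop-with-unused-result (n)
  (declare (fixnum n))
  (dotimes (k n)
    (cis (rad 90d0))))

;; Prints the machine code of the compiled function. If the compiler has
;; deleted the unused CIS call, the loop body is (nearly) empty.
(disassemble 'loop-with-unused-result)
```

Comparing the listings from 2.1.7 and 2.1.8 would show whether the two versions compile the loop body differently.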

Changed in sbcl:
status: New → Invalid
Michael Kappert (mak08) wrote :

(defvar *acc* 0d0)
(defun rad (x)
  (declare (double-float x))
  (incf *acc*
        (* (* 2d0 pi) (* x (/ 1 360d0)))))
(time
 (dotimes (k 100000000)
   (cis (rad 90d0))))
(format t "~a~%" *acc*)

==> Prints 1.57079632812136d8 in both 2.1.7 and 2.1.8 but it takes 6s in 2.1.7 vs. 24s in 2.1.8.
Am I still overlooking something?

Changed in sbcl:
status: Invalid → In Progress
Stas Boukarev (stassats) wrote :

You are not using the result of CIS.
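
A variant of the test that does consume the value of CIS could look like the sketch below. It assumes that accumulating the results into a complex sum is enough to keep the compiler from discarding the call. BENCH-CIS is a hypothetical helper that does not appear in the report, and the iteration count is reduced here; the report uses 100000000.

```lisp
(defun rad (x)
  (declare (double-float x))
  (* (* 2d0 pi) (* x (/ 1 360d0))))

;; Hypothetical helper: sums the results of CIS so that each value is used
;; and the call cannot be dropped as dead code.
(defun bench-cis (n)
  (declare (fixnum n))
  (let ((sum #c(0d0 0d0)))
    (declare (type (complex double-float) sum))
    (dotimes (k n sum)
      (incf sum (cis (rad 90d0))))))

;; Raise the count to 100000000 to compare timings with the report.
(time (print (bench-cis 1000000)))
```

Because the sum is returned and printed, timings from this version reflect the cost of CIS itself rather than that of an empty loop.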

Stas Boukarev (stassats) wrote :

This seems to be some weird AVX2 register interaction.

Stas Boukarev (stassats)
Changed in sbcl:
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released