SBCL

Performance regression related to inlining

Bug #1960081 reported by Michael Kappert on 2022-02-04

This bug affects 1 person

Affects   Status         Importance   Assigned to   Milestone
SBCL      Fix Released   Undecided    Unassigned

Bug Description

The following code (without any inline/notinline declarations) takes 0.5s on SBCL 2.1.7 but 16.5s on 2.1.8 - 2.2.1 (current):

(defun rad (x)
  (declare (double-float x))
  (* (* 2d0 pi) (* x (/ 1 360d0))))

(time
 (dotimes (k 100000000)
   (cis (rad 90d0))))

Declaiming RAD inline achieves the performance of 2.1.7 without inline declarations.
Surprisingly, declaiming RAD notinline is even faster.

Repeatable test case:

Save the file below as test-inlining.cl and run it on 2.1.7 and 2.1.8 with
sbcl --no-userinit --eval '(progn (compile-file "test-inlining.cl") (load "test-inlining"))'
Add inline declarations and repeat.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Run with
;;; /opt/sbcl-2.2.1/bin/sbcl --no-userinit --eval '(progn (compile-file "test-inlining.cl") (load "test-inlining"))'
;;; OS: Fedora 34, Fedora 35

(defpackage :regression
  (:use :common-lisp))

(in-package :regression)

;; -- Performance regression when not using any inline declarations
;; -- Inlining RAD eliminates the performance regression
;; (declaim (inline rad))
;; -- Declaim notinline is faster than declaim inline ?!
;; (declaim (notinline rad))
(defun rad (x)
  (declare (double-float x))
  (* (* 2d0 pi) (* x (/ 1 360d0))))

;; SBCL 2.1.4, 2.1.6, 2.1.7:
;; no inline/notinline: 1.9s
;; SBCL 2.2.0, 2.2.1, 2.1.8:
;; no inline/notinline: 16.5s
;; declaim inline rad: 1.5s
;; declaim notinline rad: 0.5s
(time
 (dotimes (k 100000000)
   (cis (rad 90d0))))

;;; EOF
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

CL-USER> *features*
(:SWANK :CL-FAD :BORDEAUX-THREADS :THREAD-SUPPORT CFFI-FEATURES:FLAT-NAMESPACE
CFFI-FEATURES:X86-64 CFFI-FEATURES:UNIX :CFFI CFFI-SYS::FLAT-NAMESPACE
:SPLIT-SEQUENCE :SBCL-USES-SB-ROTATE-BYTE :QUICKLISP :ASDF3.3 :ASDF3.2
:ASDF3.1 :ASDF3 :ASDF2 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE
:X86-64 :GENCGC :64-BIT :ANSI-CL :COMMON-LISP :ELF :IEEE-FLOATING-POINT :LINUX
:LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES :SB-LDB :SB-PACKAGE-LOCKS :SB-THREAD
:SB-UNICODE :SBCL :UNIX)
--------------------------------------------------------------------------------
uname -a
Linux aguas-13 5.15.18-100.fc34.x86_64 #1 SMP Sat Jan 29 13:00:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
--------------------------------------------------------------------------------

Stas Boukarev (stassats) wrote on 2022-02-05:

Is the better performance of 2.1.7 because it throws away the body of dotimes?

Changed in sbcl:
status:	New → Invalid

Michael Kappert (mak08) wrote on 2022-02-05:

(defvar *acc* 0d0)
(defun rad (x)
  (declare (double-float x))
  (incf *acc*
        (* (* 2d0 pi) (* x (/ 1 360d0)))))
(time
 (dotimes (k 100000000)
   (cis (rad 90d0))))
(format t "~a~%" *acc*)

==> Prints 1.57079632812136d8 in both 2.1.7 and 2.1.8 but it takes 6s in 2.1.7 vs. 24s in 2.1.8.
Am I still overlooking something?

Changed in sbcl:
status:	Invalid → In Progress

Stas Boukarev (stassats) wrote on 2022-02-05:

You are not using the result of CIS.
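
As an illustration of this point, here is a minimal sketch of a variant that consumes the value returned by CIS, so the compiler has no grounds to discard the call as dead code. The names RUN-BENCHMARK and SUM are hypothetical and not part of the original report. The type declaration assumes SBCL, where PI is a double-float. No timings for this variant were measured in this thread.

```lisp
;; Sketch only: RUN-BENCHMARK and SUM are illustrative names,
;; not taken from the original report.
(defun rad (x)
  (declare (double-float x))
  (* (* 2d0 pi) (* x (/ 1 360d0))))

(defun run-benchmark (n)
  (declare (fixnum n))
  ;; Assumes SBCL, where PI is a double-float, so CIS returns
  ;; a (complex double-float).
  (let ((sum #C(0d0 0d0)))
    (declare (type (complex double-float) sum))
    (dotimes (k n sum)
      ;; Accumulating the result keeps the call to CIS live.
      (incf sum (cis (rad 90d0))))))
```

Evaluating (time (run-benchmark 100000000)) would mirror the iteration count used in the report and return the accumulated sum, which can then be printed or inspected.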

Stas Boukarev (stassats) wrote on 2022-02-05:

This seems to be some weird AVX2 register interaction.

Stas Boukarev (stassats) on 2022-02-05

Changed in sbcl:
status:	In Progress → Fix Committed

Christophe Rhodes (csr21-cantab) on 2022-02-26

Changed in sbcl:
status:	Fix Committed → Fix Released
