strange performance behavior of mem-aref

Bug #1833150 reported by Chen Ruofan
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
CFFI
Won't Fix
Wishlist
Unassigned

Bug Description

I'm using CFFI from quicklisp whose version seems 0.20.0 and SBCL 1.5.3. The system is Linux version 4.19.49-1-MANJARO.

In my computer the following code consumes about 0.5 seconds
----------------------------------------------------------------
(defparameter data (foreign-alloc :double :count 1000000))
(defparameter dtype :double)
(time (loop for i from 0 below 1000000
            do (mem-aref data dtype i))) ; fixed as per comment #1
Evaluation took:
  2.434 seconds of real time
  0.005647 seconds of total run time (0.005329 user, 0.000318 system)
  0.25% CPU
  3 forms interpreted
  6,063,810,392 processor cycles
  4 page faults
  1,162,272 bytes consed
-----------------------------------------------------------------

However, if replace dtype by :double directly then the time consumed is just 0.002 seconds. It is almost 250 times faster
-----------------------------------------------------------------
(time (loop for i from 0 below 1000000
            do (mem-aref data :double i)))
Evaluation took:
  0.002 seconds of real time
  0.002469 seconds of total run time (0.002190 user, 0.000279 system)
  100.00% CPU
  6,150,333 processor cycles
  0 bytes consed
------------------------------------------------------------------

I don't if it is a bug or feature. It greatly affects the efficiency of accessing the slots of object built upon CFFI anyway.

Revision history for this message
Chen Ruofan (physcrf) wrote :

Sorry, the first part of code is pasted wrong, it is
------------------------------------------------------------
(time (loop for i from 0 below 1000000
       do (mem-aref data dtype i)))
Evaluation took:
  0.447 seconds of real time
  0.445206 seconds of total run time (0.445087 user, 0.000119 system)
  [ Run times consist of 0.019 seconds GC time, and 0.427 seconds non-GC time. ]
  99.55% CPU
  1,112,480,384 processor cycles
  143,980,224 bytes consed
------------------------------------------------------------

Luís Oliveira (luismbo)
description: updated
Revision history for this message
Luís Oliveira (luismbo) wrote :

It's not a feature, of course :-), but not quite a bug either. When the type passed to MEM-AREF is not known at compile-time, it has to do a bunch of calculations at runtime. The same applies to virtually every other CFFI function.

We could try to speed up the slow path. For starters, we could run the statistical profiler with you microbenchmark and see if there's any low-hanging fruit. But, it's usually the case that the type is known at compile time.

I'll tentatively mark this issue as "Won't Fix", but perhaps you have a strong use-case that'll motivate us to change this to a Wishlist item. :-) What's your use case?

Changed in cffi:
status: New → Incomplete
status: Incomplete → Won't Fix
importance: Undecided → Wishlist
Revision history for this message
Chen Ruofan (physcrf) wrote :

Basically I'm trying to write a numerical array object in Common Lisp named narray, see https://github.com/physcrf/narray. I plan to use this narray as the basis of further scientific libraries.

The class narray contains the type information and thus we can define a row-major-nref for narray which you don't need to specify the type when using it. This row-major-nref is, of course, based on mem-aref, and the type is given to mem-aref at run time. When I tested matrix manipulation code, I found it is much slower than cl:row-major-aref.

Thanks for your reply. However, I'm still confused. The code (time (mem-aref data dtype 0)) consumes 738,180 processor cycles while (time (mem-aref data :double 0)) consumes only 3,150 processor cycles. The evaluation of (time dtype) takes 2,214 processor cycles. Why dtype and :double codes differ that much.

Maybe it has something to do with Common Lisp itself, and I'll try some other way to speed up the accessor.

Revision history for this message
Luís Oliveira (luismbo) wrote :

Have a look at the MEM-AREF function definition vs its compiler macro definition. The latter only kicks in when the type argument is constant. Profiling the slow MEM-AREF call should give you an idea of what's using all of those processor cycles.

Revision history for this message
Chen Ruofan (physcrf) wrote :

Thanks, I'll look the definition.

Revision history for this message
Chen Ruofan (physcrf) wrote :

Now the problem is solved by defining individual aref function for each data type. It is not an obvious way but it works.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.