Steel Bank Common Lisp

[PATCH] MAP-INTO is O(n^2) for lists

Reported by James M. Lawrence on 2012-05-17
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Medium
Unassigned

Bug Description

(let ((data (make-array 50000)))
  (time (map-into data #'identity data))
  nil)
=> Evaluation took:
     0.000 seconds of real time

(let ((data (make-list 50000)))
  (time (map-into data #'identity data))
  nil)
=> Evaluation took:
     7.533 seconds of real time

I have rewritten MAP-INTO to use (MAP NIL ...).

In addition to fixing lists, the new MAP-INTO shaves 30% off execution time for vectors in my benchmarks.

I checked the generic sequence code in MAP-INTO by removing the vector and list typecases and re-running tests.

Nikodemus Siivola (nikodemus) wrote :

This looks very nice!

Nikodemus Siivola (nikodemus) wrote :

Very minor nit: tests should be wrapped in a WITH-TEST. I know nothing in map-tests.impure.lisp uses it yet, but we're glacially moving in the direction of all tests using it -- so all new tests should use it.

General vector leg would benefit from WITH-ARRAY-DATA. There are also more heroic measures that could be taken, eg. widetag dispatch, but WITH-ARRAY-DATA gets most of the way there.

Performance comparison with the open-coded version from seqtran.lisp would be interesting. I suspect it still wins big for unboxed vectors, but probably not so much for others.

These are non-critical observations. If you want to push this further, I'm happy to offer pointers, but I think this can go in pretty much as-is otherwise. Obviously nothing prevents from going further at a later date either. :)

Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
importance: Undecided → Medium
status: New → In Progress
James M. Lawrence (llmjjmll) wrote :

Hi, thanks. I experimented with saetp dispatching but haven't found a
satisfactory strategy. I have a vested interest in making MAP-INTO
fast, so it's still something I want to pursue.

The problem I saw with dispatch tables a la vector-subseq* is that the
MAP NIL lambda in MAP-INTO has to keep state, i.e. the index variable.
It's also used to adjust the fill pointer later. We could put a
special variable inside the defuns in the dispatch table for MAP-INTO,
but would incrementing a special variable in a critical inner loop
spoil our efforts?

We could punt on O(1) dispatch, going instead with the worst case of
22 fixnum comparisons,

;;; Expands to a CASE form in which the DISPATCHER macro is called for
;;; each possible array element type of ARRAY.
(defmacro %dispatch-on-element-type (array dispatcher)
  `(case (%other-pointer-widetag ,array)
     ,@(map 'list
            (lambda (saetp)
              `(,(sb!vm:saetp-typecode saetp)
                 (,dispatcher ,(sb!vm:saetp-specifier saetp))))
            sb!vm:*specialized-array-element-type-properties*)))

That macro makes it easy to dispatch to closures. But given that SBCL
should scale both ways -- big arrays and tiny arrays -- the fixnum
comparisons may be spoilers.

I've updated the patch to use WITH-ARRAY-DATA, added some more tests,
and sprinkled some WITH-TESTs around map-tests.impure.lisp.

Is the SIMPLE-STRING dispatch of any value? I was following what the
compiler notes seemed to indicate. If (AREF (THE SIMPLE-VECTOR FOO) N)
is eventually transformed to the equivalent of (SVREF FOO N) then the
aref/svref parameter would be needless, but the disassemblies show a
difference.

Nikodemus Siivola (nikodemus) wrote :

Re. SIMPLE-STRING: hard to say for sure without benchmarking, really.

Oftentimes having both (SIMPLE-ARRAY CHARACTER (*)) and (SIMPLE-ARRAY BASE-CHAR (*)) in their own legs is worth it -- but I'm not really sure if MAP-INTO sees that much use with strings in the first place.

Nikodemus Siivola (nikodemus) wrote :

   (AREF (THE SIMPLE-VECTOR FOO) N)

should compile identically to

  (SVREF FOO N)

and if shape is

  (etypecase foo
    (simple-vector ...))

than just plain AREF should compile to identical code as well.

...operative word being "should". The AREF optimization paths are a bit too hairy for my taste, and it's always possible that something that should be firing isn't -- or the other way around.

James M. Lawrence (llmjjmll) wrote :

Array type dispatching should be a separate commit, if it happens at
all. In the meantime, handling the simple vector case has a
significant benefit and seems good enough for now.

AREF on an inferred simple-vector is about 9% slower than a direct
SVREF call, but since they are supposed to be equivalent I'll avoid
cluttering code with workarounds, so I've removed the aref/svref
parameter.

Feel free to suggest changes/fixes.

Nikodemus Siivola (nikodemus) wrote :

commit 151b7b5db692eb7c089e92100df0b037418e8d27
Author: James M. Lawrence <email address hidden>
Date: Thu May 17 19:16:54 2012 -0400

    fix MAP-INTO performance

    * remove the O(n^2) algorithm for lists

    * use (MAP NIL ...) for all sequence types

    * avoid unnecessary LENGTH calls

    * update fill pointer after mapping succeeds, not before

    * add tests for MAP-INTO (there were none!)

    * add some WITH-TESTs to tests/map-tests.impure.lisp

Changed in sbcl:
assignee: Nikodemus Siivola (nikodemus) → nobody
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers