[PATCH] MAP-INTO is O(n^2) for lists

Bug #1001043 reported by James M. Lawrence
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Fix Released
Medium
Unassigned

Bug Description

(let ((data (make-array 50000)))
  (time (map-into data #'identity data))
  nil)
=> Evaluation took:
     0.000 seconds of real time

(let ((data (make-list 50000)))
  (time (map-into data #'identity data))
  nil)
=> Evaluation took:
     7.533 seconds of real time

I have rewritten MAP-INTO to use (MAP NIL ...).

In addition to fixing lists, the new MAP-INTO shaves 30% off execution time for vectors in my benchmarks.

I checked the generic sequence code in MAP-INTO by removing the vector and list typecases and re-running tests.

Tags: review
Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

This looks very nice!

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

Very minor nit: tests should be wrapped in a WITH-TEST. I know nothing in map-tests.impure.lisp uses it yet, but we're glacially moving in the direction of all tests using it -- so all new tests should use it.

General vector leg would benefit from WITH-ARRAY-DATA. There are also more heroic measures that could be taken, eg. widetag dispatch, but WITH-ARRAY-DATA gets most of the way there.

Performance comparison with the open-coded version from seqtran.lisp would be interesting. I suspect it still wins big for unboxed vectors, but probably not so much for others.

These are non-critical observations. If you want to push this further, I'm happy to offer pointers, but I think this can go in pretty much as-is otherwise. Obviously nothing prevents from going further at a later date either. :)

Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
importance: Undecided → Medium
status: New → In Progress
Revision history for this message
James M. Lawrence (llmjjmll) wrote :

Hi, thanks. I experimented with saetp dispatching but haven't found a
satisfactory strategy. I have a vested interest in making MAP-INTO
fast, so it's still something I want to pursue.

The problem I saw with dispatch tables a la vector-subseq* is that the
MAP NIL lambda in MAP-INTO has to keep state, i.e. the index variable.
It's also used to adjust the fill pointer later. We could put a
special variable inside the defuns in the dispatch table for MAP-INTO,
but would incrementing a special variable in a critical inner loop
spoil our efforts?

We could punt on O(1) dispatch, going instead with the worst case of
22 fixnum comparisons,

;;; Expands to a CASE form in which the DISPATCHER macro is called for
;;; each possible array element type of ARRAY.
(defmacro %dispatch-on-element-type (array dispatcher)
  `(case (%other-pointer-widetag ,array)
     ,@(map 'list
            (lambda (saetp)
              `(,(sb!vm:saetp-typecode saetp)
                 (,dispatcher ,(sb!vm:saetp-specifier saetp))))
            sb!vm:*specialized-array-element-type-properties*)))

That macro makes it easy to dispatch to closures. But given that SBCL
should scale both ways -- big arrays and tiny arrays -- the fixnum
comparisons may be spoilers.

I've updated the patch to use WITH-ARRAY-DATA, added some more tests,
and sprinkled some WITH-TESTs around map-tests.impure.lisp.

Is the SIMPLE-STRING dispatch of any value? I was following what the
compiler notes seemed to indicate. If (AREF (THE SIMPLE-VECTOR FOO) N)
is eventually transformed to the equivalent of (SVREF FOO N) then the
aref/svref parameter would be needless, but the disassemblies show a
difference.

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

Re. SIMPLE-STRING: hard to say for sure without benchmarking, really.

Oftentimes having both (SIMPLE-ARRAY CHARACTER (*)) and (SIMPLE-ARRAY BASE-CHAR (*)) in their own legs is worth it -- but I'm not really sure if MAP-INTO sees that much use with strings in the first place.

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

   (AREF (THE SIMPLE-VECTOR FOO) N)

should compile identically to

  (SVREF FOO N)

and if shape is

  (etypecase foo
    (simple-vector ...))

than just plain AREF should compile to identical code as well.

...operative word being "should". The AREF optimization paths are a bit too hairy for my taste, and it's always possible that something that should be firing isn't -- or the other way around.

Revision history for this message
James M. Lawrence (llmjjmll) wrote :

Array type dispatching should be a separate commit, if it happens at
all. In the meantime, handling the simple vector case has a
significant benefit and seems good enough for now.

AREF on an inferred simple-vector is about 9% slower than a direct
SVREF call, but since they are supposed to be equivalent I'll avoid
cluttering code with workarounds, so I've removed the aref/svref
parameter.

Feel free to suggest changes/fixes.

Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

commit 151b7b5db692eb7c089e92100df0b037418e8d27
Author: James M. Lawrence <email address hidden>
Date: Thu May 17 19:16:54 2012 -0400

    fix MAP-INTO performance

    * remove the O(n^2) algorithm for lists

    * use (MAP NIL ...) for all sequence types

    * avoid unnecessary LENGTH calls

    * update fill pointer after mapping succeeds, not before

    * add tests for MAP-INTO (there were none!)

    * add some WITH-TESTs to tests/map-tests.impure.lisp

Changed in sbcl:
assignee: Nikodemus Siivola (nikodemus) → nobody
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.