SBCL

[PATCH] MAP-INTO is O(n^2) for lists

Bug #1001043 reported by James M. Lawrence on 2012-05-17

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	SBCL	Fix Released	Medium	Unassigned

Bug Description

(let ((data (make-array 50000)))
  (time (map-into data #'identity data))
  nil)
=> Evaluation took:
     0.000 seconds of real time

(let ((data (make-list 50000)))
  (time (map-into data #'identity data))
  nil)
=> Evaluation took:
     7.533 seconds of real time

I have rewritten MAP-INTO to use (MAP NIL ...).

In addition to fixing lists, the new MAP-INTO shaves 30% off execution time for vectors in my benchmarks.

I checked the generic sequence code in MAP-INTO by removing the vector and list typecases and re-running tests.

Tags:

Revision history for this message

Nikodemus Siivola (nikodemus) wrote on 2012-05-18:

This looks very nice!

Revision history for this message

Nikodemus Siivola (nikodemus) wrote on 2012-05-18:

Very minor nit: tests should be wrapped in a WITH-TEST. I know nothing in map-tests.impure.lisp uses it yet, but we're glacially moving in the direction of all tests using it -- so all new tests should use it.

General vector leg would benefit from WITH-ARRAY-DATA. There are also more heroic measures that could be taken, eg. widetag dispatch, but WITH-ARRAY-DATA gets most of the way there.

Performance comparison with the open-coded version from seqtran.lisp would be interesting. I suspect it still wins big for unboxed vectors, but probably not so much for others.

These are non-critical observations. If you want to push this further, I'm happy to offer pointers, but I think this can go in pretty much as-is otherwise. Obviously nothing prevents from going further at a later date either. :)

Changed in sbcl:
assignee:	nobody → Nikodemus Siivola (nikodemus)
importance:	Undecided → Medium
status:	New → In Progress

Revision history for this message

James M. Lawrence (llmjjmll) wrote on 2012-05-19:

Hi, thanks. I experimented with saetp dispatching but haven't found a
satisfactory strategy. I have a vested interest in making MAP-INTO
fast, so it's still something I want to pursue.

The problem I saw with dispatch tables a la vector-subseq* is that the
MAP NIL lambda in MAP-INTO has to keep state, i.e. the index variable.
It's also used to adjust the fill pointer later. We could put a
special variable inside the defuns in the dispatch table for MAP-INTO,
but would incrementing a special variable in a critical inner loop
spoil our efforts?

We could punt on O(1) dispatch, going instead with the worst case of
22 fixnum comparisons,

;;; Expands to a CASE form in which the DISPATCHER macro is called for
;;; each possible array element type of ARRAY.
(defmacro %dispatch-on-element-type (array dispatcher)
  `(case (%other-pointer-widetag ,array)
     ,@(map 'list
            (lambda (saetp)
              `(,(sb!vm:saetp-typecode saetp)
                 (,dispatcher ,(sb!vm:saetp-specifier saetp))))
            sb!vm:*specialized-array-element-type-properties*)))

That macro makes it easy to dispatch to closures. But given that SBCL
should scale both ways -- big arrays and tiny arrays -- the fixnum
comparisons may be spoilers.

I've updated the patch to use WITH-ARRAY-DATA, added some more tests,
and sprinkled some WITH-TESTs around map-tests.impure.lisp.

Is the SIMPLE-STRING dispatch of any value? I was following what the
compiler notes seemed to indicate. If (AREF (THE SIMPLE-VECTOR FOO) N)
is eventually transformed to the equivalent of (SVREF FOO N) then the
aref/svref parameter would be needless, but the disassemblies show a
difference.

Revision history for this message

Nikodemus Siivola (nikodemus) wrote on 2012-05-20:

Re. SIMPLE-STRING: hard to say for sure without benchmarking, really.

Oftentimes having both (SIMPLE-ARRAY CHARACTER (*)) and (SIMPLE-ARRAY BASE-CHAR (*)) in their own legs is worth it -- but I'm not really sure if MAP-INTO sees that much use with strings in the first place.

Revision history for this message

Nikodemus Siivola (nikodemus) wrote on 2012-05-20:

(AREF (THE SIMPLE-VECTOR FOO) N)

should compile identically to

(SVREF FOO N)

and if shape is

(etypecase foo
(simple-vector ...))

than just plain AREF should compile to identical code as well.

...operative word being "should". The AREF optimization paths are a bit too hairy for my taste, and it's always possible that something that should be firing isn't -- or the other way around.

Revision history for this message

James M. Lawrence (llmjjmll) wrote on 2012-05-21:

0001-fix-MAP-INTO-performance.patch Edit (12.2 KiB, text/plain)

Array type dispatching should be a separate commit, if it happens at
all. In the meantime, handling the simple vector case has a
significant benefit and seems good enough for now.

AREF on an inferred simple-vector is about 9% slower than a direct
SVREF call, but since they are supposed to be equivalent I'll avoid
cluttering code with workarounds, so I've removed the aref/svref
parameter.

Feel free to suggest changes/fixes.

Revision history for this message

Nikodemus Siivola (nikodemus) wrote on 2012-05-27:

commit 151b7b5db692eb7c089e92100df0b037418e8d27
Author: James M. Lawrence <email address hidden>
Date: Thu May 17 19:16:54 2012 -0400

fix MAP-INTO performance

* remove the O(n^2) algorithm for lists

* use (MAP NIL ...) for all sequence types

* avoid unnecessary LENGTH calls

* update fill pointer after mapping succeeds, not before

* add tests for MAP-INTO (there were none!)

* add some WITH-TESTs to tests/map-tests.impure.lisp

Changed in sbcl:
assignee:	Nikodemus Siivola (nikodemus) → nobody
status:	In Progress → Fix Committed

Christophe Rhodes (csr21-cantab) on 2012-08-11

Changed in sbcl:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Patches

0001-fix-MAP-INTO-performance.patch Edit

Add patch

Remote bug watches

Bug watches keep track of this bug in other bug trackers.