Enumerations are implemented inefficiently

Bug #250094 reported by Gwen Weinholt
4
Affects Status Importance Assigned to Milestone
Ikarus Scheme
Fix Committed
Medium
Abdulaziz Ghuloum

Bug Description

Enumerations are currently implemented as lists. Enum-sets would be more appropriately (and efficiently) implemented as numbers with an attached universe.
enum-set-union would be bitwise-ior, enum-set-intersection would be bitwise-and, enum-set-member? would be bitwise-bit-set? and so on. The syntax that define-enumeration creates would further help optimize enum-set operations at compile time.
An implementation that translates enum-sets into numbers would make enum-sets much more interesting. Not long ago I needed just this in order to implement liveness analysis (where registers are perfect to put in enum-sets, and a bitwise-based implementation makes at least an order of magnitude difference in performance).
From what I see there's nothing that would prevent an implementation like this, it might even be the intended way to implement it.

Related branches

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote :

All of this seems correct. The current implementation of enum-sets was done while I was trying to understand their specification (i.e., with total disregard to efficiency). One problem I see is that some enum operations (e.g., enum-set=?, enum-set-subset?) may take enum-sets of different types and compares both the values of the enum-sets as well as their universes. I haven't thought much of how to do this efficiently (any ideas?).

Changed in ikarus:
assignee: nobody → aghuloum
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
leppie (leppie) wrote :

> (any ideas?).

Too difficult/long-winded to explain, so I implemented it instead :-) (i need it too in IronScheme, as mine is limited to 64 symbols currently).

I didnt really test it, and didnt add many checks, but you will get the idea. Except for gensym, its R6RS portable :)

Revision history for this message
leppie (leppie) wrote :

Oops, made a typo in enum-set-projection :) (was returning incorrect enum type)

Revision history for this message
Gwen Weinholt (weinholt) wrote :

I also had trouble explaining how to implement it, so I implemented enum-set=?. I'm not sure which approach is more efficient, mine or leppie's.
leppie: I'm having a little trouble with your enum-set=?:
> (enum-set=? (make-enumeration '(black))
            (make-enumeration '(white)))

Unhandled exception
 Condition components:
   1. &error
   2. &who: apply
   3. &message: "not a procedure"
   4. &irritants: (#f)

I'm going to bed now, may the best enum-set=? win. :)

Revision history for this message
leppie (leppie) wrote :

Ha! Damn C# :-)

the if (v2*) must be if v2* !

Updated file.

Revision history for this message
leppie (leppie) wrote :

I added checks, and fixed a bug with make-enumeration not handling duplicate symbols correctly.

The ironscheme references can just be replaced with ikarus. You can attach a record printer too, but I dont know how to do that on ikarus.

Cheers

leppie

Revision history for this message
Michael D. Adams (mdmkolbe) wrote :

leppie, there are a lot of places the implementation you posted can and should be improved.

Those enummap, etc. hashtables introduces a unnecessary memory leak (e.g. "(let loop () (make-enumeration '(foo bar)))").

Having enumordermap contain a list of symbols instead of a hashtable of symbols makes some operations asymptotically slower than they could be, e.g., enum-indexer.

The enum-set-difference function is supposed to be asymmetric not symmetric difference resulting from xor.

The enum-set-complement function should be more directly implemented as a bitwise not, but whether that is effective might depend on how you implementation bignums.

Revision history for this message
leppie (leppie) wrote :

Thanks Michael

The memory leak isn't really a concern (for me anyways), as make-enumeration will create a new record every time (I cant gc records in my case) anyways which is probably more than a few bytes in a hashtable.

I have optimized enum-indexer to use bitwise-length instead (thanks).

I am not sure about the symmetry issue. From the R6RS example it seems assymetric, but it makes no mention of it. How does this assymetry work?

For enum-set-complement, I will need to see how it is done with bitwise-not. I assume something like (- (bitwise-not bignum)).

Cheers

leppie

PS: there are a few other bugs I have found and fixed, I will update the file once I am happy with it :-)

Revision history for this message
leppie (leppie) wrote :

Ok, I think I figure it all out :)

Here is an updated version (with more checks).

- enum-set-indexer now use bitwise-length
- enum-set-difference is now assymetric (result of (bitwise-xor v1 (bitwise-and v1 v2)))
- enum-set-complement now use bitwise-not (result of (bitwise-and (bitwise-not v) u) where u is the universe value).

Cheers

leppie

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote : Re: [Bug 250094] Re: Enumerations are implemented inefficiently

On Jul 24, 2008, at 2:42 AM, leppie wrote:

> The memory leak isn't really a concern (for me anyways), as make-
> enumeration will create a new record every time (I cant gc records
> in my
> case) anyways which is probably more than a few bytes in a hashtable.

Just because one part of the system is broken doesn't mean that
you should implement other parts of the system improperly. One
day you'd have to fix your records GC bug, and you would like to
have enumerations work as a consequence of that (synergistic bug
fixes).

I agree with Michael that no globals (hash tables or otherwise)
should be used to implement enumeration types (or record types
or whatever) unless absolutely required (e.g., non-generative
record types, global libraries, symbols, gensyms, etc.).

Aziz,,,

Revision history for this message
leppie (leppie) wrote :

> One day you'd have to fix your records GC bug

I explained a bit wrong, I cant GC the record-type once created (instances/records are fine, no problem there).

It's a .NET runtime limitation, nothing I can do. I am not really worried about a few bytes 'leaked' due to misuse or abuse. It's good enough for me, there are bigger fish to fry :-)

> I agree with Michael that no globals (hash tables or otherwise)
> should be used to implement enumeration types (or record types
> or whatever) unless absolutely required (e.g., non-generative
> record types, global libraries, symbols, gensyms, etc.).

So you saying I should have a structure like a cons with the car having the integer value and the cdr carrying shared info (hashtable's etc)?

From what I can see this does not involve globals or record-types or gensyms.

Ok, lets try that :)

Cheers

leppie

Revision history for this message
leppie (leppie) wrote :

> a structure like a cons with the car having the integer value and the cdr carrying shared info (hashtable's etc)

I guess it would be better for that structure to be a record-type.

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote :

On Jul 24, 2008, at 10:52 AM, leppie wrote:

> So you saying I should have a structure like a cons with the car
> having
> the integer value and the cdr carrying shared info (hashtable's etc)?

Yeah. Something like

(define-record-type enum-universe
   (fields id symbols symbol->index-hashtable index->symbol-vector))

(define-record-type enum-set
   (fields bits universe))

Revision history for this message
leppie (leppie) wrote :

Ok, how about this now?

Instead of enum-universe I use a cons, dont need more.

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote :

On Jul 24, 2008, at 11:37 AM, leppie wrote:

> Instead of enum-universe I use a cons, dont need more.

You don't. :) Probably you can't serialize/deserialize enum-sets
as a consequence.

Revision history for this message
leppie (leppie) wrote :

> Probably you can't serialize/deserialize enum-sets
as a consequence.

At least I dont have a problem with that on .NET :-)

I decided to go for an enum-universe record-type eventually, as I wanted to store the universe-value instead of deriving by counting the symbols.

Updated file.

Cheers

leppie

Revision history for this message
Michael D. Adams (mdmkolbe) wrote :

Minor notes:
 - for enum-set-difference (bitwise-and v1 (bitwise-not v2)) may be more efficient
 - for enum-set-complement (bitwise-not v) should be enough (i.e. you don't need the bitwise-and) since whether values not in u are set or not shouldn't matter.
 - as a consequence of the above the universe value can just be considered -1; there is no need to store it.

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote :

On Jul 24, 2008, at 12:48 PM, leppie wrote:

> At least I dont have a problem with that on .NET :-)

Are we talking about the same thing? If you compare types
(cons pairs) using eq?, you lose eq?-ness when you serialize
and deserialize.

Revision history for this message
leppie (leppie) wrote :

Michael: Thanks for the notes, I didnt realize that works too :) (I was not taking negative numbers into my equations/thinking :-) )

> Are we talking about the same thing? If you compare types
> (cons pairs) using eq?, you lose eq?-ness when you serialize
> and deserialize.

I dont think we are :) But from what I can see I would lose eq?-ness with records too, as eq? only check pointer equality on records. It could be different, I have not really tested it, only speaking from past experience of this scenario.

So do you in Ikarus, do a pointer 'fixup' on a record to be eq? once deserialized?

Cheers

leppie

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote :

On Jul 25, 2008, at 12:05 AM, leppie wrote:

> So do you in Ikarus, do a pointer 'fixup' on a record to be eq? once
> deserialized?

No, of course not. I have an id field in the type that can be
compared with eq?.

Revision history for this message
leppie (leppie) wrote :

> No, of course not. I have an id field in the type that can be
> compared with eq?.

Ok, I see (I was wondering what that id field did), still so many options to investigate :-)

Thanks

leppie

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote :

Fixed in revision 1568. Please do tell me if this makes any difference (better or worse) in performance.

Changed in ikarus:
status: Confirmed → Fix Committed
Revision history for this message
Gwen Weinholt (weinholt) wrote :

It certainly didn't hurt. enum-set-member? seems three times faster, and the constructor is five times faster, in my case. For my particular application this improvement is lost in the noise apparently. I think it would help a bit if (enum-set-member (set-name id) x) didn't have to lookup the index of id at runtime, and if (set-constructor ids ...) became a compile-time constant. But it's difficult to tell how much of an improvement it would be, since Ikarus doesn't have a profiler.
I'm using enum-sets sort of like C programs often use #define's and bitwise operations together to store flags in an integer. I don't see a reason why enum-sets couldn't be similarly efficient when used like that.
It's just a possibility, and maybe it isn't worth it for most use cases.

Revision history for this message
Abdulaziz Ghuloum (aghuloum) wrote :

On Aug 5, 2008, at 6:20 AM, Göran Weinholt wrote:

> I think it would help a bit if (enum-set-member (set-name id) x)
> didn't have to lookup the index of id at runtime,

The design of enums does not allow for that directly because (set-
name id) expands to a symbol and that symbol can serve as key in any
enum. For example:

 > (define-enumeration E0 (x y z) make-E0)
 > (E0 x)
x
 > (define-enumeration E1 (q x r) make-E1)
 > (E1 x)
x
 > (enum-set-member? (E0 x) (make-E1 r x))
#t

If (E0 x) returned a special "enum-key" value instead of a symbol,
then the index can be stored in that key. But as we operate on
symbols only, all we can do is perhaps speculate that (E0 x) will be
used in E0 enums. I don't know if that's worth it.

> and if (set-constructor ids ...) became a compile-time constant.

This might be worth investigating.

> I'm using enum-sets sort of like C programs often use #define's and
> bitwise operations together to store flags in an integer. I don't
> see a reason why enum-sets couldn't be similarly efficient when
> used like that.

Well, they have extra baggage that C's enums or #define'd constants
don't carry, and they are run-time constructs. The compile-time
define-enumeration construct does very little to help the compiler as
it simply expands to run time calls. I'll keep it at the back of my
head to see if I can improve it a little. Also keep in mind that
I'm implementing enums in terms of R6RS records that I also have not
gotten around to optimizing yet. Once records become faster, enums
will get a little boost. But we'll have to wait for that too.

Changed in ikarus:
milestone: none → 0.0.4
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.