wanted: nicer hash-table thread safety policy

Bug #727615 reported by Nikodemus Siivola on 2011-03-02
This bug affects 1 person
Affects: SBCL
Importance: Medium
Assigned to: Unassigned

Bug Description

While our current policy is not unreasonable, it makes doing something elementary -- eliminating strictly unsafe hash-table accesses from a codebase -- far harder than it should be.

I propose the following:

1. Hash-tables should be synchronized by default, so that portable library code has a prayer of doing the right thing. Code that cannot pay the overhead of synchronization can still request unsynchronized tables explicitly.

2. Synchronized hash-tables need to be safe to modify while iteration is happening in another thread.

WITH-LOCKED-HASH-TABLE is still needed to make interning into a hash-table work right:

  (with-locked-hash-table (table)
    (or (gethash key table)
        (setf (gethash key table) value)))

but that is a user-code correctness issue, not a "will this break horribly" issue like the points above.
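
For concreteness, this is roughly what the explicit opt-in plus the locked interning idiom look like with SBCL's existing :SYNCHRONIZED keyword and SB-EXT:WITH-LOCKED-HASH-TABLE (the names *TABLE* and INTERN-VALUE are just for illustration):

  ;; Request a synchronized table explicitly at creation time.
  (defvar *table* (make-hash-table :test 'equal :synchronized t))

  ;; Even with a synchronized table, the check-then-store has to run
  ;; under the table lock to be atomic.
  (defun intern-value (key value)
    (sb-ext:with-locked-hash-table (*table*)
      (or (gethash key *table*)
          (setf (gethash key *table*) value))))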

Attila Lendvai (attila-lendvai) wrote :

if concurrent access can cause heap corruption and remote (in time) crashes (as opposed to lost/duplicate entries), then ignore the rest.

but i'd say that if the overhead is +50% or more, then please don't make hash-tables thread-safe by default.

my reasoning: there are zillions of thread safety issues, and concurrent access to hash-tables is just one of them. making them fail-safe will not help much, while it imposes a performance penalty on all the other code that was written with threads in mind.

there already is a compile-time :sb-hash-table-debug feature for building an sbcl that catches concurrent access, in return for some performance penalty on each access.

Nikodemus Siivola (nikodemus) wrote :

Heap corruptions should not occur as a direct consequence, but the hash-table structure itself may be corrupted -- which may violate assumptions elsewhere in code, causing other essentially unpredictable issues.

I'll keep performance in mind, and we'll see what the cost turns out to be.

In the worst case, maybe there's a call for *hash-table-synchronized-default* or similar.
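
(If it came to that, the knob would presumably be something like the following hypothetical sketch -- no such variable exists today:)

  ;; Hypothetical: a special variable that MAKE-HASH-TABLE would consult
  ;; when :SYNCHRONIZED is not supplied explicitly.
  (defvar *hash-table-synchronized-default* t)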

James Y Knight (foom) wrote :

This sounds like a *very* bad idea. Synchronizing hash-table access does not magically make your code thread-safe.

Do you have examples of real-life code which would be correct except for hashtables being unsynchronized? (it's easy to construct example cases, of course)

Java made that mistake (java.lang.Hashtable), and then fixed it (java.util.HashMap).

A global variable to modify the behavior sounds even worse...

Roman Marynchak wrote :

Well, the idea itself is not so bad. LispWorks has already implemented it: http://www.lispworks.com/documentation/lw60/LW/html/lw-608.htm

If there is a way to do something like this without serious performance issues, it is a nice feature. It should be very helpful when you have a ton of legacy code which has to be adapted to use SMP. I work with one CL system which has around 1M LOC, and I have seen a lot of very complex synchronization issues. Synchronized data structures make SMP problems easier to analyze, which is critical for large systems and large teams (teams which do not have a know-it-all guy who can fix any SMP-related system bug in a week). When you have 50 hash-tables and 5-10 worker threads accessing them with quite complex logic behind the scenes, every little bit of SMP support from the implementation makes your life significantly easier. At the very least it may help to debug the problem (though I don't claim it solves the issues magically).

Regarding the real-life example, imagine a hash-table which holds a structure called 'relation' as a value, and a relation name as a key. The relation is something like '(A RELATED-TO B), and the name is a symbol. There are 10 rules, each running in its own thread, which are interested in this relation. Two of them can change it, the other eight can only read it. All rules are launched every second; they run for 50ms and then sleep for the rest of the interval. If the hash-table is thread-safe, this works perfectly, because the maximum action delay after a _final_ relation change is 1 second, which is not a problem (relation changes are allowed to overwrite one another without any action). But if the table is not thread-safe and a relation value becomes garbage, we are likely to end up in the debugger.
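
A minimal sketch of that scenario, assuming a synchronized table and SB-THREAD; COMPUTE-RELATION and ACT-ON are invented stand-ins for the actual rule logic:

  (defvar *relations* (make-hash-table :synchronized t))

  ;; A writer rule: overwrites the relation once per second; last write
  ;; wins, which is acceptable in this design.
  (defun writer-rule ()
    (loop (setf (gethash 'some-relation *relations*) (compute-relation))
          (sleep 1)))

  ;; A reader rule: acts on whatever relation value is current.
  (defun reader-rule ()
    (loop (let ((relation (gethash 'some-relation *relations*)))
            (when relation (act-on relation)))
          (sleep 1)))

  ;; Each rule runs in its own thread, e.g.:
  ;; (sb-thread:make-thread #'writer-rule :name "writer rule")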

I could give more examples, but I am not pushing this feature very hard; I just think that it is _quite reasonable_.

If we are fighting for ultimate performance, make this feature a build option. But don't throw it out altogether.

On Wed, Mar 2, 2011 at 11:12 PM, Roman Marynchak
<email address hidden> wrote:

> Regarding the real-life example, imagine a hash-table which holds a
> structure called 'relation' as a value, and a relation name as a key.
> The relation is something like '(A RELATED-TO B), and the name is a
> symbol. There are 10 rules, each running in its own thread, which are
> interested in this relation. Two of them can change it, the other eight
> can only read it. All rules are launched every second; they run for
> 50ms and then sleep for the rest of the interval. If the hash-table is
> thread-safe, this works perfectly, because the maximum action delay
> after a _final_ relation change is 1 second, which is not a problem
> (relation changes are allowed to overwrite one another without any
> action). But if the table is not thread-safe and a relation value
> becomes garbage, we are likely to end up in the debugger.

Just because it's possible to come up with a convoluted situation where
synchronized hash-tables would actually provide a useful guarantee assuming
very strict timing discipline, it doesn't mean that they'd be useful
generally. I maintain that in the real world the synchronization protocol
for a hash table is very often going to be more complicated than just a
lock on the primitive operation. So the effect of synchronization by default
is:

- A false sense of security for users who don't really understand safe concurrent programming.
- A performance loss for users who have to do their own synchronization anyway.
- A performance loss for users who don't need any synchronization.

I think the only principled defaults are either "just don't corrupt the
whole image" or "detect unsafe concurrent accesses and signal an error".

--
Juho Snellman
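
To make Juho's middle point concrete with a hedged example (*COUNTS* and BUMP are invented): even if every GETHASH and (SETF GETHASH) is individually synchronized, the compound read-modify-write below still races, because two threads can both read the old count before either stores the new one.

  (defvar *counts* (make-hash-table :synchronized t))

  ;; Lost updates are possible here despite per-operation locking: the
  ;; increment as a whole needs to be atomic, not just its two halves.
  (defun bump (key)
    (setf (gethash key *counts*)
          (1+ (gethash key *counts* 0))))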

Nikodemus Siivola (nikodemus) wrote :

Any comments re. point 2?

Gábor Melis (melisgl) wrote :

Nikodemus Siivola <email address hidden> writes:

> Any comments re. point 2?

For the record, I agree with James and Juho on point 1.

On point 2, yes, it would be a nice property but at what cost?

Nikodemus Siivola (nikodemus) wrote :

A few different strategies for #2:

- Read/write lock -- block writers during iteration.

- Pushing writes onto an update list if iteration is happening, and having the iterator apply the queued updates when it finishes (see the sketch below).

- A mostly-lockless synchronized table, which can be iterated safely while it is being updated.
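
A rough illustration of the second strategy, at user level rather than inside the hash-table internals where it would really live. All names are invented, and the sketch assumes at most one iterator at a time:

  (defstruct itable
    (storage (make-hash-table :test 'equal))
    (lock (sb-thread:make-mutex))
    (iterating nil)
    (pending '()))

  (defun itable-put (key value table)
    (sb-thread:with-mutex ((itable-lock table))
      (if (itable-iterating table)
          ;; Don't mutate storage under the iterator; queue the write.
          (push (cons key value) (itable-pending table))
          (setf (gethash key (itable-storage table)) value))))

  (defun itable-map (fn table)
    (sb-thread:with-mutex ((itable-lock table))
      (setf (itable-iterating table) t))
    (unwind-protect
         (maphash fn (itable-storage table))
      ;; Iteration finished: apply the queued writes in order.
      (sb-thread:with-mutex ((itable-lock table))
        (setf (itable-iterating table) nil)
        (dolist (update (nreverse (shiftf (itable-pending table) '())))
          (setf (gethash (car update) (itable-storage table))
                (cdr update))))))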

Nikodemus Siivola (nikodemus) wrote :

Additional bits on the agenda:

* Printing a hash-table should show whether it's synchronized or not (see the example below).

* Isolate the most common failure modes of unsynchronized hash-table accesses and tweak the errors that occur to state the likely cause.
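
For the first item, the printed representation could look something like this (hypothetical output, exact format to be decided):

  #<HASH-TABLE :TEST EQL :COUNT 0 :SYNCHRONIZED T {1002C3ECA3}>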
