wanted: nicer hash-table thread safety policy

Bug #727615 reported by Nikodemus Siivola on 2011-03-02
This bug affects 1 person
Affects: SBCL
Importance: Medium
Assigned to: Unassigned

Bug Description

While our current policy is not unreasonable, it makes doing something elementary -- eliminating strictly unsafe hash-table accesses from a codebase -- far harder than it should be.

I propose the following:

1. Hash-tables should be synchronized by default, so that portable library code has a prayer of doing the right thing. Code that cannot pay the overhead of synchronization can still request unsynchronized tables explicitly.

2. Synchronized hash-tables need to be safe to modify while iteration is happening in another thread.

WITH-LOCKED-HASH-TABLE is still needed to make interning into a hash-table work right:

  (with-locked-hash-table (table)
    (or (gethash key table)
        (setf (gethash key table) value)))

but that is a user-code correctness issue, not a "will this break horribly" issue like the points above.
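
For concreteness, this is roughly what the explicit opt-in plus the locked interning idiom look like with SBCL's existing :SYNCHRONIZED keyword and SB-EXT:WITH-LOCKED-HASH-TABLE (the names *TABLE* and INTERN-VALUE are just for illustration):

  ;; Request a synchronized table explicitly at creation time.
  (defvar *table* (make-hash-table :test 'equal :synchronized t))

  ;; Even with a synchronized table, the check-then-store has to run
  ;; under the table lock to be atomic.
  (defun intern-value (key value)
    (sb-ext:with-locked-hash-table (*table*)
      (or (gethash key *table*)
          (setf (gethash key *table*) value))))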

Attila Lendvai (attila-lendvai) wrote :

if concurrent access can cause heap corruption and remote (in time) crashes (as opposed to lost/duplicate entries), then ignore the rest.

but i'd say that if the overhead is +50% or more, then please don't make hash-tables thread-safe by default.

my reasoning: there are zillions of thread safety issues, and concurrent access to hash-tables is just one of them. making them fail-safe will not help much, while it imposes a performance penalty on all the other code that was written with threads in mind.

there already is a compile-time :sb-hash-table-debug feature for building an sbcl that catches concurrent access, in return for some performance penalty on each access.

Nikodemus Siivola (nikodemus) wrote :

Heap corruptions should not occur as a direct consequence, but the hash-table structure itself may be corrupted -- which may violate assumptions elsewhere in code, causing other essentially unpredictable issues.

I'll keep performance in mind, and we'll see what the cost turns out to be.

In the worst case, maybe there's a call for *hash-table-synchronized-default* or similar.
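
(If it came to that, the knob would presumably be something like the following hypothetical sketch -- no such variable exists today:)

  ;; Hypothetical: a special variable that MAKE-HASH-TABLE would consult
  ;; when :SYNCHRONIZED is not supplied explicitly.
  (defvar *hash-table-synchronized-default* t)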

James Y Knight (foom) wrote :

This sounds like a *very* bad idea. Synchronizing hash-table access does not magically make your code thread-safe.

Do you have examples of real-life code which would be correct except for hashtables being unsynchronized? (it's easy to construct example cases, of course)

Java made that mistake (java.lang.Hashtable), and then fixed it (java.util.HashMap).

A global variable to modify the behavior sounds even worse...

Roman Marynchak wrote :

Well, the idea itself is not so bad. LispWorks has already implemented it: http://www.lispworks.com/documentation/lw60/LW/html/lw-608.htm

If there is a way to do something like this without serious performance issues, it is a nice feature. It should be very helpful when you have a ton of legacy code which has to be adapted to use SMP. I work with one CL system which has around 1M LOC, and I have seen a lot of very complex synchronization issues. Synchronized data structures make SMP problems easier to analyze, which is critical for large systems and large teams (teams which do not have a know-it-all guy who can fix any SMP-related system bug in a week). When you have 50 hash-tables and 5-10 worker threads accessing them with quite complex logic behind the scenes, every little bit of SMP support from the implementation makes your life significantly easier. At the very least it may help to debug the problem (though I don't claim it solves the issues magically).

Regarding the real-life example, imagine a hash-table which holds a structure called 'relation' as a value, and a relation name as a key. The relation is something like '(A RELATED-TO B), and the name is a symbol. There are 10 rules, each running in its own thread, which are interested in this relation. Two of them can change it, the other eight can only read it. All rules are launched every second; they run for 50ms and then sleep for the rest of the interval. If the hash-table is thread-safe, this works perfectly, because the maximum action delay after a _final_ relation change is 1 second, which is not a problem (relation changes are allowed to overwrite one another without any action). But if the table is not thread-safe and a relation value becomes garbage, we are likely to end up in the debugger.
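
A minimal sketch of that scenario, assuming a synchronized table and SB-THREAD; COMPUTE-RELATION and ACT-ON are invented stand-ins for the actual rule logic:

  (defvar *relations* (make-hash-table :synchronized t))

  ;; A writer rule: overwrites the relation once per second; last write
  ;; wins, which is acceptable in this design.
  (defun writer-rule ()
    (loop (setf (gethash 'some-relation *relations*) (compute-relation))
          (sleep 1)))

  ;; A reader rule: acts on whatever relation value is current.
  (defun reader-rule ()
    (loop (let ((relation (gethash 'some-relation *relations*)))
            (when relation (act-on relation)))
          (sleep 1)))

  ;; Each rule runs in its own thread, e.g.:
  ;; (sb-thread:make-thread #'writer-rule :name "writer rule")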

I could give more examples, but I am not pushing this feature very hard; I just think that it is _quite reasonable_.

If we are fighting for ultimate performance, make this feature a build option. But don't throw it out altogether.

On Wed, Mar 2, 2011 at 11:12 PM, Roman Marynchak
<email address hidden> wrote:

> Regarding the real-life example, imagine a hash-table which holds a
> structure called 'relation' as a value, and a relation name as a key.
> The relation is something like '(A RELATED-TO B), and the name is a
> symbol. There are 10 rules, each running in its own thread, which are
> interested in this relation. Two of them can change it, the other eight
> can only read it. All rules are launched every second; they run for
> 50ms and then sleep for the rest of the interval. If the hash-table is
> thread-safe, this works perfectly, because the maximum action delay
> after a _final_ relation change is 1 second, which is not a problem
> (relation changes are allowed to overwrite one another without any
> action). But if the table is not thread-safe and a relation value
> becomes garbage, we are likely to end up in the debugger.

Just because it's possible to come up with a convoluted situation where
synchronized hash-tables would actually provide a useful guarantee assuming
very strict timing discipline, it doesn't mean that they'd be useful
generally. I maintain that in the real world the synchronization protocol
for a hash table is very often going to be more complicated than just a
lock on the primitive operation. So the effect of synchronization by default
is:

- A false sense of security for users who don't really understand safe concurrent programming.
- A performance loss for users who have to do their own synchronization anyway.
- A performance loss for users who don't need any synchronization.

I think the only principled defaults are either "just don't corrupt the
whole image" or "detect unsafe concurrent accesses and signal an error".

--
Juho Snellman
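
To make Juho's middle point concrete with a hedged example (*COUNTS* and BUMP are invented): even if every GETHASH and (SETF GETHASH) is individually synchronized, the compound read-modify-write below still races, because two threads can both read the old count before either stores the new one.

  (defvar *counts* (make-hash-table :synchronized t))

  ;; Lost updates are possible here despite per-operation locking: the
  ;; increment as a whole needs to be atomic, not just its two halves.
  (defun bump (key)
    (setf (gethash key *counts*)
          (1+ (gethash key *counts* 0))))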

Nikodemus Siivola (nikodemus) wrote :

Any comments re. point 2?

Gábor Melis (melisgl) wrote :

Nikodemus Siivola <email address hidden> writes:

> Any comments re. point 2?

For the record, I agree with James and Juho on point 1.

On point 2, yes, it would be a nice property but at what cost?

Nikodemus Siivola (nikodemus) wrote :

A few different strategies for #2:

- Read/write lock -- block writers during iteration.

- Pushing writes onto an update list if iteration is happening, and having the iterator apply the queued updates when it finishes (see the sketch below).

- A mostly-lockless synchronized table, which can be iterated safely while it is being updated.
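
A rough illustration of the second strategy, at user level rather than inside the hash-table internals where it would really live. All names are invented, and the sketch assumes at most one iterator at a time:

  (defstruct itable
    (storage (make-hash-table :test 'equal))
    (lock (sb-thread:make-mutex))
    (iterating nil)
    (pending '()))

  (defun itable-put (key value table)
    (sb-thread:with-mutex ((itable-lock table))
      (if (itable-iterating table)
          ;; Don't mutate storage under the iterator; queue the write.
          (push (cons key value) (itable-pending table))
          (setf (gethash key (itable-storage table)) value))))

  (defun itable-map (fn table)
    (sb-thread:with-mutex ((itable-lock table))
      (setf (itable-iterating table) t))
    (unwind-protect
         (maphash fn (itable-storage table))
      ;; Iteration finished: apply the queued writes in order.
      (sb-thread:with-mutex ((itable-lock table))
        (setf (itable-iterating table) nil)
        (dolist (update (nreverse (shiftf (itable-pending table) '())))
          (setf (gethash (car update) (itable-storage table))
                (cdr update))))))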

Nikodemus Siivola (nikodemus) wrote :

Additional bits on the agenda:

* Printing a hash-table should show whether it's synchronized or not (see the example below).

* Isolate the most common failure modes of unsynchronized hash-table accesses and tweak the errors that occur to state the likely cause.
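
For the first item, the printed representation could look something like this (hypothetical output, exact format to be decided):

  #<HASH-TABLE :TEST EQL :COUNT 0 :SYNCHRONIZED T {1002C3ECA3}>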
