Reference counting changes cause Pushy to hang

Bug #738188 reported by Andrew Wilkins
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Pushy
Fix Released
Critical
Andrew Wilkins

Bug Description

The changes made for Bug #733026 appear to cause a hang in Pushy.

Related branches

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Finally tracked it down... this is caused by a race condition wherein the server refers to an object that the client has deleted, but the server hasn't yet received notification about. The culprit is in the BaseConnection class: the code for waiting on "pending proxies".

Revision history for this message
Andrew Wilkins (axwalk) wrote :

I've looked into this a little bit further, and found what is going on. Take for example the following program:
    r_os = conn.modules.os
    for i in range(100):
        d = r_os.__dict__ #(1)
        print len(d.items()) #(2)

What occasionally will happen is this: (1) will incur a "getattr" message, which returns a __dict__. The __dict__ is marshalled with its items. Before the items are unmarshalled in one iteration, they are garbage collected from the previous iteration.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

The plan is currently to defer deletion of objects while there are pending unmarshalled messages. This *may* be done by having BaseConnection.delete and BaseConnection.__recv hold a lock, and transmit a set of "references" (object IDs) that will be used to pin an object.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

The above idea wouldn't work, since the object's not available after deletion (using weakref.ref with a callback, not __del__). A better idea (one that should actually be workable) is to have a version attached to objects. Whenever the owner of the object marshals the object, it increments the version by one. When the proxy is deleted, it sends a "delete" message to the owner with the version. If the version matches the latest version, the object is really deleted. Otherwise, the object is resent, resetting the version to zero.

I'm working on this (slowly - pesky day job). It's a bit rough at the moment, and awfully slow (2x slower than the 0.4 release), so I don't expect to be doing much else for the next release.

Changed in pushy:
status: Confirmed → In Progress
Revision history for this message
Andrew Wilkins (axwalk) wrote :

I've more or less got this working. I know of just one last problem to iron out.

A decision I took a long time ago is really biting me, which is working around Windows' MSVCRT standard I/O locking behaviour. This is the reason why requests are never written at the same time as responses are read. I may have to revisit this; perhaps using ReadFile/WriteFile (like in the smb transport) could obviate this problem.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

There were some more race conditions that I hadn't handled, and the solution I had was getting more and more unwieldy. I'm going to have to rethink this. It may mean doing away with "references" in marshalling, and always passing an ID/opmask/args tuple, so the receiver can create a new proxy if the old one is deleted in the mean time.

One thing I'm still not sure about is how to notify the originator of an object that it can release the proxied object. We need it to keep a reference for as long as the client may refer to it, so the originator can use the original object, not a proxy of a proxy. I'll have to think about this some more.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

If anyone is watching this: I am working on this (slowly), but life in the Real World is pretty busy right now. Also, things are going to get more complicated with this solution.

The approach I'm looking into does away with assumptions in the server about the client having knowledge about the object being returned. Previously, if you asked for the same object twice, the second response would be a light-weight reference to the first result. Now we will marshal the object every time. I have tested it, and this is not overly expensive, IF another change is made: removal of dict/list element marshalling.

Previously, as a sort of weak way of supporting dir(), tuple(), etc. (i.e. builtin functions which interrogate internals in CPython) on proxied objects, Pushy would marshal the contents of a dict/list/set. This works in very simple use cases, but falls down when, for example, you make additional changes to the remote container after marshalling it for the first time.

Rather than try to fix this -- I think it's a lost cause without requesting changes to the Python interpreter -- I would rather implement specific work-arounds for common use cases, and otherwise document unsupported usage. Anyway, that's a digression... I'll deal with that after this defect is fixed.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

The Python implementation is fixed. I will make the same changes in the Java code before closing this.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Done. (Hooray!)

Changed in pushy:
status: In Progress → Fix Committed
Revision history for this message
Andrew Wilkins (axwalk) wrote :

For any readers: garbage collection of proxies will be enabled by default, and the peer will be notified of deletions en-masse every 5 seconds by default. These two settings are configurable. The API docs will be updated accordingly.

Andrew Wilkins (axwalk)
Changed in pushy:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.