_p_deactivate must honor first call

Bug #185066 reported by Jacob Holm
2
Affects Status Importance Assigned to Milestone
ZODB
Fix Released
Medium
Unassigned

Bug Description

I am not sure if this is a documentation bug or a code bug, but this recently cost me half a day to figure out.

The _p_deactivate method can be overridden in subclasses of persistent to control the "optimistic ghostification" of objects. The current documentation in persistent.interfaces says:

        """Deactivate the object.

        Possibly change an object in the saved state to the
        ghost state. It may not be possible to make some persistent
        objects ghosts, and, for optimization reasons, the implementation
        may choose to keep an object in the saved state.
        """

What it fails to mention is that _p_deactivate is called when initially creating a ghost in order to set the _p_changed attribute to None. (ZODB/serialize.py:511)
In this call, the state has not yet been set so if the call is ignored, _p_changed remains False. The end result is that the state is never loaded.

Jim Fulton (jim-zope)
Changed in zodb:
importance: Undecided → High
Revision history for this message
Jim Fulton (jim-zope) wrote :

It looks like this has been fixed on the trunk and on the 3.8 branch. At least, there's no _p_deactivate call on line 511 of serialize and there is an explict assignment of _p_changed to None.

Could you please verify the fix or provide more information?

Changed in zodb:
status: New → Incomplete
Revision history for this message
Jacob Holm (jacobholm) wrote :

AFAICT the bug is still there in the trunk (I haven't checked the 3.8 branch), but I may have been unclear about what is actually happening... I'll try again.

The actual call to _p_deactivate happens in persistent/cPersistence.c:903-906, but is caused by the assignment "obj._p_changed = None" in ZODB/serialize.py:519.

The problem is that instances of a class like

  class Foo(persistent.Persistent):
    def _p_deactivate(self): pass

will stay in the UPTODATE state in spite of the "obj._p_changed = None" call. That again means the state of such an object will never be loaded from the database.

A possible fix would be to replace the line in ZODB/serialize.py with

  Persistent._p_deactivate(obj)

This would ensure that the object is properly loaded before the custom _p_deactivate has a chance to screw things up...

Jim Fulton (jim-zope)
Changed in zodb:
status: Incomplete → Confirmed
Revision history for this message
Jim Fulton (jim-zope) wrote :

OK, I just committed a fix to the trunk.

Changed in zodb:
status: Confirmed → Fix Committed
Revision history for this message
Jacob Holm (jacobholm) wrote :

Your change fixes the immediate problem, but I am a bit concerned that custom _p_deactivate and (with the fix) _p_invalidate methods are called on objects marked UPTODATE, but whose state hasn't been loaded. What that means is that any nontrivial custom _p_deactivate/_p_invalidate needs to check whether the state is valid (is there even a reliable way to test that?).

To illustrate:

  class Foo(persistent.Persistent):
    bar = 0
    def __init__(self):
      self.bar = 1
    def _p_deactivate(self):
      assert self.bar

Will fail the assertion when the object is loaded.

Bypassing the custom _p_deactivate as I suggested solves this problem (I think).

Revision history for this message
Jim Fulton (jim-zope) wrote :

I'm backing out my fix because, as you point out, it doesn't go far enough. Calling the base deactivate isn't appropriate because there are some types, especially persistent classes, that can't be deactivated.

The current implementation badly mixes up _p_deactivate and _p_invalidate. Worse, there are some persistent objects that seem to rely, somewhat, on the default implementation of _p_invalidate calling _p_deactivate. :( I think this can be sorted out, but doing so is too risky for 3.9.

Changed in zodb:
importance: High → Medium
status: Fix Committed → Confirmed
Revision history for this message
Jacob Holm (jacobholm) wrote :

If by persistent classes you mean instances of ZODB.persistentclass.PersistentMetaClass, I don't think they are relevant here, because:

a) PersistentMetaClass does not subclass Persistent
b) PersistentMetaClass has a __getnewargs__ method

Each of these (I think) independently ensures that the "obj" in the "obj._p_changed = None" line in ZODB.serialize.ObjectReader.load_persistent is not a persistent class.

Thinking a bit more about it, I am convinced that the object must be either deactivated (ghostified) or fully loaded before calling *any* custom code. Forcing deactivation of an instance of Persistent (without calling custom code) can be done by calling "Persistent._p_deactivate(obj)".

Once that is done, we *could* call either "obj._p_deactivate()" or "obj._p_invalidate()" to give the object a chance to load itself if needed. However, this may have the unintended side-effect of loading, then immediately discarding the object. (See e.g. ZODB.blob.Blob._p_deactivate for the type of code that would be affected.) Also, I think it is unnecessary because the automatic activation on access should take care of things nicely. If the object wants to control whether it can become a ghost again later, fine! I don't see any reason to prevent loading the object as a ghost initially. If there *is* such a reason, that use case would be better handled by a new _p_maybeactivate method.

I agree that _p_deactivate and _p_invalidate are probably entangled a bit too much atm, but I see that as a separate issue. Hopefully we can get the current one fixed before 3.9.

Revision history for this message
Jim Fulton (jim-zope) wrote : Re: [Bug 185066] Re: _p_deactivate must honor first call

On Wed, Aug 26, 2009 at 2:34 PM, Jacob Holm<email address hidden> wrote:
> If by persistent classes you mean instances of
> ZODB.persistentclass.PersistentMetaClass, I don't think they are
> relevant here, because:
>
> a) PersistentMetaClass does not subclass Persistent
> b) PersistentMetaClass has a __getnewargs__ method
>
> Each of these (I think) independently ensures that the "obj" in the
> "obj._p_changed = None" line in
> ZODB.serialize.ObjectReader.load_persistent is not a persistent class.

a. That probably has the same bug.
b. You also need to worry about Connection.get, which also creates ghosts.

PersistentClasses are just one example. The concept of
non-ghostifiable objects is broader.

> Thinking a bit more about it, I am convinced that the object must be
> either deactivated (ghostified) or fully loaded before calling *any*
> custom code.

You might be right, although it's not clear that calling custom code
in the ghost state is correct either. _p_deactivate and _p_invalidate
should be noops in the ghost state.

>  Forcing deactivation of an instance of Persistent (without
> calling custom code) can be done by calling
> "Persistent._p_deactivate(obj)".

But that has other side effects that could break assumptions of the
object implementation. It doesn't make sense to provide the ability
to override _p_deactivate and then go around the overriden
implementation.

What's probably needed is a way to mark an object as being a ghost
independent of _p_deactivate or _p_invalidate. This indicates an
additional API or possibly meta-class tricks.

...

> I agree that _p_deactivate and _p_invalidate are probably entangled a
> bit too much atm, but I see that as a separate issue.

It's a separate issue if we think that invalidation isn't the right
model for initializing ghosts.

> Hopefully we can
> get the current one fixed before 3.9.

No, there are too many issues that need to be worked out and too much
opportunity for breakage.

Jim

--
Jim Fulton

Revision history for this message
Jacob Holm (jacobholm) wrote :
Download full text (3.5 KiB)

Jim Fulton wrote:
>> Each of these (I think) independently ensures that the "obj" in the
>> "obj._p_changed = None" line in
>> ZODB.serialize.ObjectReader.load_persistent is not a persistent class.
>
> a. That probably has the same bug.

I don't understand what you are saying here...

> b. You also need to worry about Connection.get, which also creates ghosts.

Correct. I hadn't noticed that one.

> PersistentClasses are just one example. The concept of
> non-ghostifiable objects is broader.

I understand the use for objects that *once fully loaded* will not again
become a ghost. What you claim is a use for objects that skips the
ghost state entirely and always perform a full load. Can you point me
to an example or a discussion of that use case? I am guessing that
PersistentClasses is one such example, but don't (yet) understand why.

> ...
>
> What's probably needed is a way to mark an object as being a ghost
> independent of _p_deactivate or _p_invalidate. This indicates an
> additional API or possibly meta-class tricks.
>

How about a new method "_p_initghost" (feel free to suggest a better
name if you care), whose default implementation in persistent.Persistent
would be equivalent to the current _p_deactivate. Objects that want to
avoid the ghost state can reimplement it to load the correct state
instead, e.g.:

   def _p_initghost(self):
       # NOTE: The documentation for IPersistentDataManager says that
       # setstate should only be called for objects in the ghost state,
       # but it seems to work anyway. My guess is that it is just a
       # warning against throwing away the current data, which should
       # be OK here because we don't *have* any current data. If I'm
       # wrong, we need another way to load the data...
       self._p_jar.setstate(self)

_p_initghost has well-defined input (whatever klass.__new__(klass, ...)
returns, with _p_oid and _p_jar assigned) and output (either a proper
ghost or a fully loaded object). For objects without a _p_initghost
method, we can fall back to _p_deactivate:

   obj._p_oid = oid
   obj._p_jar = jar
   try:
       meth = obj._p_initghost
   except AttributeError:
       # BBB for non-instances of Persistent
       meth = obj._p_deactivate
   meth()

Code that relies on the current behavior of calling a custom
_p_deactivate while creating a ghost can be made to work in the same in
old and new versions by adding a different override:

   def _p_initghost(self):
       # WARNING, this reintroduces bug 185066 for this class!!!
       self._p_deactivate()

The _p_initghost method could also be defined to take the oid and jar as
arguments and set them on the object. This would allow some additional
error checking in _p_initghost (the call is only valid if _p_oid and
_p_jar are None), but would also add a small extra burden on all
implementations.

> ...
>
>> I agree that _p_deactivate and _p_invalidate are probably entangled a
>> bit too much atm, but I see that as a separate issue.
>
> It's a separate issue if we think that invalidation isn't the right
> model for initializing ghosts.

I don't think invalidation is the right model, because _p_inva...

Read more...

Revision history for this message
Jim Fulton (jim-zope) wrote :

On Fri, Aug 28, 2009 at 11:05 AM, Jacob Holm<email address hidden> wrote:
...
>>> Hopefully we can
>>> get the current one fixed before 3.9.
>>
>> No, there are too many issues that need to be worked out and too much
>> opportunity for breakage.
>
> I see.  Just curious, what is the expected timeline for 3.10?

I have no idea. Hopefully, there won't be nearly as long a delay
between 3.9 and 3.10 as there was between 3.8 and 3.9. I had a lot of
work that had to be done in 3.9 that is now behind us.

I'll try to deal with this issue early in the process. If you don't
see any action in a few weeks, you can bug me if you want. I'm gonna
need a break from ZODB work for a while. :)

Jim

--
Jim Fulton

Revision history for this message
Jim Fulton (jim-zope) wrote :
Changed in zodb:
status: Confirmed → Fix Committed
Jim Fulton (jim-zope)
Changed in zodb:
status: Fix Committed → Fix Released
milestone: none → 3.10.0a1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.