DTMLMethod KeyError: 'validate'

Bug #143905 reported by John Eikenberry
2
Affects Status Importance Assigned to Milestone
Zope 2
Invalid
Medium
Unassigned

Bug Description

First the workaround/fix. Up your object cache-size in your zope.conf. It goes
in the <zodb_db main> section. Upping it from the default 5000 to 8000 fixed it
on our site.

The traceback. This is basically the problem of the temporary instance
attribute 'validate' disappearing midway through the DTMLMethod's __call__
method.

Traceback (innermost last):
  Module ZPublisher.Publish, line 115, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 41, in call_object
  Module OFS.DTMLMethod, line 153, in __call__
   - <DTMLMethod at /www_org/ContentRoot/index_html used for /www_org/ContentRoot/members/user_tools>
   - URL: http://client.base.kavi.com/index_html/manage_main [^]
   - Physical Path: /www_org/ContentRoot/index_html
KeyError: 'validate'

Short version:

It seems the ZODB object cache is flushing/ghosting the DTMLMethod from the
cache while it is running. In doing so, clearing the 'validate' entry from its
__dict__. So when it resumes it un-ghostifies the object and continues its
execution, though with a fresh __dict__.

Long version:

The error occurs on a page (user_tools) that includes summaries of from our
different applications. Including one that makes extensive use of a ZCatalog
and its getObject() method to evaluate _many_ objects for some criteria.

To reproduce the error it was necessary to hit the page in question with 2
mostly simultaneous requests. I'll refer to them as thread A and B below.
Thread A will be the thread in which the error occurs while thread B will
trigger the problem.

In tracking the error down I first followed Thread A. It would make it to the
zcatalog using summary noted above and release its thread lock while calling
getObject(). Didn't matter which object it was getting, the main point was that
this was releasing the lock for thread B to run in the middle of
DTMLMethod.__call__(). This is also important as the page in question is
pulling in a lot of objects and filling the object cache (important below).

So thread B then kicks off. Watching thread A's DTMLMethod instance __dict__
for the disappearing 'validate' key I saw it was gone almost immediately. I
followed it, starting with the first traceback entry, through to where it
bobo_traversed the ZApplicationWrapper which opened a new database connection.
In DB.open() it calls cacheGC() on each connection object in the pool which in
turn calls incrgc() on the PickleCache object for that connection.

Reading through the PickleCache code it is seems that this incrgc() method is
responsible for flushing/ghosting objects in the object cache.

At this point I came up with my theory about the ghosting/un-ghosting of the
in-use object in the cache. Tested with the larger object cache and it worked.
I also ran additional tests with a smaller cache which made the bug much easier
to reproduce (it was a bit of a pain originally).

This combination of ghosting in-use objects and the DTMLMethod's temporary
'validate' attribute seems bug-like to me. Ideally the PickleCache wouldn't
ghost in-use objects but it looks like the code already does what it can with
the persistence flags. The 'validate' attibute is specifically setup to avoid
persistence. So that leaves reworking DTMLMethod.__call__() to eliminate that
temporary 'validate' attribute.

Tags: bug zope
Revision history for this message
Tres Seaver (tseaver) wrote :

This bug is due to a funky micro-optimization in DTMLMethod,
which I don't even want to try to fix at this point: I'd rather
rip it out and think about optimizing the whole thing differently.

Does the attached patch make the problem go away? I'm sorry
to say there are no unit tests for OFS/DTMLMethod.py, wo I don't
know for sure that I haven't borked something, and I don't
have a DTML-based application to test with.

This is essentially a reversion of the two middle stanzas of:

  http://svn.zope.org/Zope/trunk/lib/python/OFS/DTMLMethod.py?rev=21236&r1=20616&r2=21236

Potentially, we should be reverting the third stanza as well; I don't konw.

Revision history for this message
Tres Seaver (tseaver) wrote :

Uploaded: issue_2312-experimental.patch

Oops, forgot the patch file, made against the current tip of
the 2.9 branch.

Revision history for this message
John Eikenberry (jae) wrote :

Yes. That fixes it. It makes sense and I ran several tests with that change in place.

Interesting that objects can go to/from ghost state within a running transaction. Hadn't considered that before.

Revision history for this message
Tres Seaver (tseaver) wrote :

> = Comment - Entry #4 by eikenberry on Apr 17, 2007 6:27 pm
>
> Yes. That fixes it. It makes sense and I ran several tests with that
> change in place.
>
> Interesting that objects can go to/from ghost state within a running
> transaction. Hadn't considered that before.\

Yep. The docs talk about a "Sticky" state for persistent objects,
which prevents ghostification; however, it seems that only C
code tan reliably put an object into that state. It did look
to mer for a while as thosugh setting

  self._p_changed = 0

would have that effect, but I have no confidence in that notion.

Revision history for this message
Tres Seaver (tseaver) wrote :

An alternative fix relies on landing Dieter's '_p_sticky'
patch, as supplied in:

  http://www.zope.org/Collectors/Zope/1596/

That patch is against ZODB 3.2, which makes it pretty hard
to use against any "modern" Zope.

Revision history for this message
Tres Seaver (tseaver) wrote :

Ripping out the micro-optimization still seems the right thing to do. Without tests for the module, however, it is too risky to try (and I don't have time to add proper tests).

Changed in zope2:
status: New → Confirmed
Revision history for this message
Colin Watson (cjwatson) wrote :

The zope2 project on Launchpad has been archived at the request of the Zope developers (see https://answers.launchpad.net/launchpad/+question/683589 and https://answers.launchpad.net/launchpad/+question/685285). If this bug is still relevant, please refile it at https://github.com/zopefoundation/zope2.

Changed in zope2:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.