[DM] ZODB pack optimization

Bug #373622 reported by Andreas Jung on 2009-05-08

Bug Description

2. Introduction

Packing large FileStorages can severely interfere with normal processing -- even affecting the whole system with bad ramifications for other applications:

    * The reachability analysis thrashes the filesystem with an extreme
      number of random accesses to the storage file
    * The copying phase reads from an unbuffered file, with decreased IO
      performance
    * The copying phase holds the commit lock for almost its entire
      duration and releases it only for short periods

Potential effects are bad performance and commit contention.

3. Feature

Implement an idea by Jim Fulton: let the first packing phase (the proper packing) be executed in a separate process with a modified reachability analysis. The modified approach prepares reachability analysis during the pack index determination. During this sequential scan all references outgoing from the scanned state are recorded in memory. The reachability analysis itself is then performed on the recorded information -- without further accesses to the storage file. This almost eliminates random access to the storage file (backpointers may still cause some randomness). This can reduce the time for reachability analysis by up to 90 percent.
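The two-step idea (record references during the sequential scan, then traverse only the recorded table) can be sketched as follows. This is a minimal illustration, not the FileStorage code: the `records` input, the integer oids, and the function names are hypothetical, and it assumes the latest record for an oid wins, whereas a real pack must also account for revisions newer than the pack time and for backpointers.

```python
from collections import deque


def record_references(records):
    """Sequential scan: map each oid to the oids its state refers to.

    `records` is a hypothetical iterable of (oid, referenced_oids) pairs in
    file order; a later record for the same oid replaces the earlier one.
    """
    refs = {}
    for oid, referenced in records:
        refs[oid] = tuple(referenced)
    return refs


def reachable(refs, root):
    """Breadth-first traversal over the in-memory table only (no file IO)."""
    seen = {root}
    todo = deque([root])
    while todo:
        oid = todo.popleft()
        for target in refs.get(oid, ()):
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


# Made-up scan result, in file order.
scan = [
    (0, [1, 2]),
    (1, [3]),
    (2, []),
    (3, []),
    (4, [3]),  # nothing refers to 4
    (1, []),   # newer revision of 1 dropped its reference to 3
]
refs = record_references(scan)
live = reachable(refs, root=0)
garbage = sorted(set(refs) - live)
```

The table holds one entry per object, which is where the extra main memory mentioned under Risks would come from.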

Synchronize the copying phase with storage modification by appropriate buffer flushing such that the copying phase can use buffered rather than unbuffered IO.

Let the copying phase usually run without holding the commit lock. The commit lock is acquired only for very short periods to protect the recognition of the phase end.
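A rough sketch of how the flushing and locking described above could fit together, under simplifying assumptions: it copies raw bytes rather than whole transactions, and `commit_lock`, `commit`, and `copy_tail` are hypothetical names, not ZODB APIs. Writers flush before releasing the lock, so a buffered reader on a separate file handle can see committed data; the copier takes the lock only to confirm that it has really reached the end.

```python
import io
import os
import tempfile
import threading

commit_lock = threading.Lock()  # stands in for the storage's commit lock


def commit(writer, data):
    """Writer side: append under the lock and flush before releasing it."""
    with commit_lock:
        writer.write(data)
        writer.flush()


def copy_tail(reader, out, start, chunk=1 << 16):
    """Copy everything from `start` to the end of the file into `out`."""
    pos = start
    while True:
        # Unlocked: copy whatever is already on disk via buffered reads.
        reader.seek(pos)
        data = reader.read(chunk)
        if data:
            out.write(data)
            pos += len(data)
            continue
        # Apparent end: take the lock briefly, copy anything that was
        # committed in the meantime, and finish while no writer can append.
        with commit_lock:
            reader.seek(pos)
            data = reader.read()
            out.write(data)
            return pos + len(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "ab") as writer, open(path, "rb") as reader:
        commit(writer, b"txn1")
        commit(writer, b"txn2")
        out = io.BytesIO()
        end = copy_tail(reader, out, start=0)
copied = out.getvalue()
```

In this sketch the lock is held only for the final check and the last short read, so commits are blocked for a time that does not grow with the size of the file.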

Packing currently writes the newly determined index at its end. However, any storage modification invalidates the index again, so writing it is useless in almost all situations. Skip writing the index; this shortens the time during which packing holds both the commit and storage locks.

Improve logging of the packing process to better recognize further optimization potential.

Implementing all these measures reduced packing time for Haufe from about 10 hours to about 2 hours.

4. Risks


    * Versions are no longer supported
    * Packing may temporarily need more main memory
    * When incremental index determination is implemented, the index
      written at the end of packing may become useful (but then, it
      could be introduced again)
    * The new reachability analysis abuses fsTree, which can lead to
      size restrictions (we have only 6 rather than 8 bytes)
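To put a number on the last risk: a 6-byte slot can hold values below 2**48, against 2**64 for 8 bytes. The report does not say which quantity ends up in the 6-byte slot, so the sketch below is generic, and `pack6` and `unpack6` are hypothetical helper names rather than anything from fsTree.

```python
import struct


def pack6(value):
    """Encode a non-negative integer in 6 bytes, big-endian."""
    if not 0 <= value < 1 << 48:
        raise ValueError("value does not fit in 6 bytes")
    # Pack as 8 bytes and drop the two most significant (zero) bytes.
    return struct.pack(">Q", value)[2:]


def unpack6(data):
    """Decode a 6-byte big-endian integer."""
    return struct.unpack(">Q", b"\x00\x00" + data)[0]


limit = 1 << 48  # 281474976710656; read as a byte count, that is 256 TiB
```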

Andreas Jung (ajung) wrote :
Jim Fulton (jim-zope) wrote :

Some notes:

zc.FileStorage implemented the gc strategy described here. Unfortunately, the memory requirements turned out to be unacceptable. I'm trying a different strategy with the multi-db gc I'm working on. Keep an eye on zc.zodbdgc.

Another problem with packing is that it wrecks the disk cache. zc.FileStorage has a strategy to avoid using much disk cache while scanning on Linux.

Jim Fulton (jim-zope) wrote :
Changed in zodb:
status: New → Won't Fix