[DM] ZODB pack optimization

Bug #373622 reported by Andreas Jung
Affects    Status       Importance    Assigned to    Milestone
ZODB       Won't Fix    Undecided     Unassigned

Bug Description

2. Introduction

Packing large FileStorages can severely interfere with normal processing -- even affecting the whole system with bad ramifications for other applications:

    * The reachability analysis thrashes the filesystem with an extreme number of random accesses to the storage file
    * The copying phase reads from an unbuffered file, with decreased IO performance
    * The copying phase holds the commit lock for almost its complete duration and only releases it for short periods

Potential effects are bad performance and commit contention.

3. Feature

Implement an idea by Jim Fulton: let the first packing phase (the proper packing) be executed in a separate process with a modified reachability analysis. The modified approach prepares reachability analysis during the pack index determination. During this sequential scan all references outgoing from the scanned state are recorded in memory. The reachability analysis itself is then performed on the recorded information -- without further accesses to the storage file. This almost eliminates random access to the storage file (backpointers may still cause some randomness). This can reduce the time for reachability analysis by up to 90 percent.
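
The two steps described above (record references during the sequential scan, then traverse only the recorded data) can be sketched as follows. This is a minimal illustration, not the code from the proposal: the `records` input, the function names and the plain `dict` are assumptions, and real FileStorage records would first have to be unpickled far enough to extract their references.

```python
from collections import deque


def record_references(records):
    """Sequential scan: remember, per oid, the oids its state references.

    `records` is a hypothetical iterable of (oid, referenced_oids) pairs in
    file order. As a simplification, a later record for the same oid
    replaces the earlier one.
    """
    refs = {}
    for oid, referenced in records:
        refs[oid] = tuple(referenced)
    return refs


def reachable(refs, root):
    """Breadth-first traversal over the in-memory map; no file access."""
    seen = {root}
    todo = deque([root])
    while todo:
        oid = todo.popleft()
        for target in refs.get(oid, ()):
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen
```

Everything not in the returned set is garbage from the point of view of this sketch. The whole reference map lives in memory, which is the source of the memory risk listed under "Risks".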

Synchronize the copying phase with storage modification by appropriate buffer flushing such that the copying phase can use buffered rather than unbuffered IO.
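
One way to picture this synchronization, under the assumption of an append-only file: the writer flushes before it publishes a new end position, and the copier reads only up to that position through an ordinary buffered file object. `AppendLog`, `committed_end` and `copy_committed` are hypothetical names for this sketch and are not part of ZODB.

```python
class AppendLog:
    """Hypothetical writer that flushes before publishing its end position."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self.committed_end = self._file.tell()

    def append(self, data):
        self._file.write(data)
        self._file.flush()  # hand the bytes to the OS before publishing
        self.committed_end = self._file.tell()

    def close(self):
        self._file.close()


def copy_committed(src_path, dst, start, committed_end, chunk=64 * 1024):
    """Copy bytes [start, committed_end) using a buffered reader.

    Returns the position reached in the source file.
    """
    with open(src_path, "rb") as src:  # buffered by default
        src.seek(start)
        remaining = committed_end - start
        while remaining > 0:
            data = src.read(min(chunk, remaining))
            if not data:
                break
            dst.write(data)
            remaining -= len(data)
    return committed_end - remaining
```

Because the reader never goes past a position that was published after a flush, it cannot observe bytes that are still sitting in the writer's buffer.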

Let the copying phase usually run without holding the commit lock. The commit lock is acquired only for very short periods to protect the recognition of the phase end.
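
A sketch of such a loop, assuming hypothetical callables `get_end` (current committed end of the storage file) and `copy_range` (copies a byte range into the packed file). The bulk of the copying runs unlocked; the lock is taken only to check whether the copier has really caught up. Returning with the lock held is a design choice of this sketch, so that the caller can finish the pack before the next commit.

```python
import threading


def copy_until_caught_up(get_end, copy_range, commit_lock, start):
    """Copy without the lock; hold it only while confirming the phase end.

    Returns the final position with commit_lock HELD; the caller must
    release it after completing the pack.
    """
    pos = start
    while True:
        end = get_end()
        if end > pos:
            copy_range(pos, end)  # long-running work, no lock held
            pos = end
            continue
        commit_lock.acquire()     # short critical section
        if get_end() == pos:
            return pos            # caught up; lock stays held
        commit_lock.release()     # a commit slipped in; copy some more
```

For example, `copy_until_caught_up(get_end, copy_range, threading.Lock(), 0)` keeps copying as long as commits extend the file and returns once a locked check finds nothing new.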

Packing currently writes the newly determined index when it finishes. However, any subsequent storage modification invalidates that index again, so writing it is useless in almost all situations. Skip writing the index; this shortens the time during which packing holds both the commit lock and the storage lock.

Improve logging of the packing process to better recognize further optimization potential.

Implementing all these measures reduced packing time for Haufe from about 10 hours to about 2 hours.

4. Risks

    * Versions are no longer supported
    * Packing may temporarily need more main memory
    * When incremental index determination is implemented, the index written at the end of packing may become useful (but then, it could be introduced again).
    * The new reachability analysis abuses fsTree, which can lead to size restrictions (we have only 6 rather than 8 bytes).
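
To put the "6 rather than 8 bytes" restriction in numbers: a 6-byte field holds values up to 2**48 - 1, compared with 2**64 - 1 for 8 bytes. The report does not say which value is squeezed into the 6 bytes, so the helpers below (`pack6` and `unpack6`, both hypothetical names) only illustrate the arithmetic.

```python
import struct

MAX_6_BYTES = 2 ** 48 - 1  # 281474976710655
MAX_8_BYTES = 2 ** 64 - 1


def pack6(value):
    """Encode a non-negative integer in 6 big-endian bytes."""
    if not 0 <= value <= MAX_6_BYTES:
        raise OverflowError("value does not fit in 6 bytes")
    return struct.pack(">Q", value)[2:]  # drop the two high-order bytes


def unpack6(data):
    """Decode 6 big-endian bytes back into an integer."""
    return struct.unpack(">Q", b"\x00\x00" + data)[0]
```

Any value above `MAX_6_BYTES` is rejected here rather than silently truncated.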

Jim Fulton (jim-zope) wrote :

Some notes:

zc.FileStorage implemented the gc strategy described here. Unfortunately, the memory requirements turned out to be unacceptable. I'm trying a different strategy with the multi-db gc I'm working on. Keep an eye on zc.zodbdgc.

Another problem with packing is that it wrecks the disk cache. zc.FileStorage has a strategy to avoid using much disk cache while scanning on Linux.
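
For illustration only: one mechanism available on Linux for this purpose is `posix_fadvise` with `POSIX_FADV_DONTNEED`, which asks the kernel to drop pages that have already been read. The comment above does not say which mechanism zc.FileStorage uses, so the sketch below is an assumption, and `scan_without_caching` is a hypothetical name.

```python
import os


def scan_without_caching(path, handle_chunk, chunk=1 << 20):
    """Read a file sequentially, advising the kernel to drop consumed pages."""
    can_advise = hasattr(os, "posix_fadvise")
    total = 0
    with open(path, "rb", buffering=0) as f:
        fd = f.fileno()
        while True:
            data = f.read(chunk)
            if not data:
                break
            handle_chunk(data)
            if can_advise:
                try:
                    # Advisory only: pages [total, total + len(data)) are
                    # no longer needed by this process.
                    os.posix_fadvise(fd, total, len(data),
                                     os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # the hint is optional; ignore failures
            total += len(data)
    return total
```

The call is only a hint; on platforms without `os.posix_fadvise` the function degrades to a plain sequential read.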

Jim Fulton (jim-zope) wrote :
Changed in zodb:
status: New → Won't Fix