Allow purging old transaction log entries

Bug #1000726 reported by John A Meinel
This bug affects 2 people
Affects  Status     Importance  Assigned to  Milestone
U1DB     Confirmed  Medium      Unassigned

Bug Description

We want to avoid keeping any data that must be maintained forever but is not going to be actively used.

One of the O(history) tables today is the transaction_log table. It should be possible to remove old entries; if a client then connects and tries to sync from a generation that has been purged, we fall back to a 'full sync'.
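A minimal sketch of that fallback decision follows. It assumes the log is modeled as an in-memory list of `(generation, doc_id)` pairs and that the server tracks a hypothetical `purged_upto` value (the highest generation whose entry has been removed). Neither name comes from U1DB; both are illustrative only.

```python
from typing import List, Tuple

# Hypothetical stand-in for the transaction_log table:
# (generation, doc_id) pairs in increasing generation order.
Log = List[Tuple[int, str]]


def plan_sync(log: Log, purged_upto: int, since_gen: int) -> Tuple[str, List[str]]:
    """Decide how to answer a client that last synced at since_gen.

    purged_upto is a hypothetical marker: the highest generation whose
    log entry has been removed (0 if nothing has been purged).
    Returns ("full", []) or ("partial", doc_ids changed after since_gen).
    """
    if since_gen < purged_upto:
        # Some changes after since_gen are no longer in the log, so we
        # cannot list them precisely: fall back to a full sync.
        return ("full", [])
    # Every change after since_gen is still in the log; list each doc once,
    # in the order its first remaining change appears.
    changed: List[str] = []
    seen = set()
    for gen, doc_id in log:
        if gen > since_gen and doc_id not in seen:
            seen.add(doc_id)
            changed.append(doc_id)
    return ("partial", changed)
```

A client whose generation equals `purged_upto` can still get a partial sync, because every change after that point survived the purge.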

Along these lines, when you remove old generations, you need to move the doc_ids that are still current somewhere else so that a full sync can still find them. (This breaks the current notion that each generation corresponds to a single document change.)
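One way to model the purge is sketched below, assuming an in-memory log and a `rollup` set holding the doc_ids whose latest change has been purged. The class and method names are hypothetical, not U1DB's API. Keeping the rollup disjoint from the remaining log means a full sync can simply take the union of the two.

```python
from typing import List, Set, Tuple


class PurgeableLog:
    """Hypothetical in-memory model of transaction_log plus a rollup of
    doc_ids whose most recent change has been purged from the log."""

    def __init__(self) -> None:
        self.generation = 0
        self.entries: List[Tuple[int, str]] = []  # (generation, doc_id)
        self.rollup: Set[str] = set()
        self.purged_upto = 0  # highest generation removed from the log

    def record_change(self, doc_id: str) -> int:
        self.generation += 1
        self.entries.append((self.generation, doc_id))
        # The doc's latest change is in the log again, so it no longer
        # needs a rollup entry.
        self.rollup.discard(doc_id)
        return self.generation

    def purge(self, upto: int) -> None:
        """Remove log entries with generation <= upto."""
        upto = min(upto, self.generation)
        if upto <= self.purged_upto:
            return  # nothing new to purge
        removed = {doc_id for gen, doc_id in self.entries if gen <= upto}
        self.entries = [(g, d) for g, d in self.entries if g > upto]
        still_logged = {d for _, d in self.entries}
        # Only documents with no later entry must move into the rollup;
        # the others are still reachable through the log.
        self.rollup |= removed - still_logged
        self.purged_upto = upto

    def full_sync_doc_ids(self) -> Set[str]:
        """Every doc_id a full sync has to consider."""
        return self.rollup | {d for _, d in self.entries}
```

The rollup entries carry no generation at all, which is exactly the break with "one generation, one document change" noted in the description.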

John A Meinel (jameinel) wrote:

At a basic conceptual level, this should give correct results; we only lose the ability to do a partial synchronization from the generations that have been purged. (If you request a sync from before the history horizon, you just end up with a full sync and receive some documents that you already have.)
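The correctness claim can be checked directly: a full sync is safe as long as it returns a superset of the documents that actually changed since the client's generation, and the excess is what the client already had. The helper below is a hypothetical test aid; it needs the complete, unpurged history, so it only makes sense in a test, not on a live server.

```python
from typing import Iterable, Set, Tuple


def redundant_after_full_sync(
    history: Iterable[Tuple[int, str]], since_gen: int, full_sync_ids: Set[str]
) -> Set[str]:
    """Return doc_ids a full sync resends although the client already has them.

    history is the complete (generation, doc_id) log, as if nothing had been
    purged. Raises ValueError if the full sync would miss a changed document.
    """
    needed = {doc_id for gen, doc_id in history if gen > since_gen}
    missing = needed - full_sync_ids
    if missing:
        raise ValueError(f"full sync would miss {sorted(missing)}")
    return full_sync_ids - needed
```

For example, with history `[(1, "a"), (2, "b"), (3, "a"), (4, "c")]`, a client at generation 2 that receives a full sync of `{"a", "b", "c"}` gets `b` again needlessly, but misses nothing.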

John A Meinel (jameinel) wrote:

The big win comes when you have few documents that each change many times. At one extreme, imagine 1 document changed 10,000 times: the transaction log would hold 10k entries, but after purging you only need to store 1 doc_id.
At the other extreme, imagine 10k documents that were only created (never modified). Purging the log just moves where the doc_ids are stored without actually saving any space (though it may save some index metadata, etc.).
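The two extremes can be put in numbers by counting rows: before purging, the log holds one row per change; after purging everything, the rollup holds one row per distinct doc_id. The function name is hypothetical, and it counts rows only, not bytes or index overhead.

```python
from typing import List, Tuple


def rows_before_and_after_purge(changes: List[str]) -> Tuple[int, int]:
    """changes lists the doc_id touched by each generation, in order.

    Returns (transaction_log rows, rollup rows after purging every entry).
    """
    return len(changes), len(set(changes))


# One document changed 10,000 times.
one_doc = ["doc-0"] * 10_000
# 10,000 documents, each created once and never modified.
many_docs = [f"doc-{i}" for i in range(10_000)]

print(rows_before_and_after_purge(one_doc))    # (10000, 1)
print(rows_before_and_after_purge(many_docs))  # (10000, 10000)
```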

John A Meinel (jameinel) wrote:

If this data is not expected to be accessed often, you could also post-process it. For example, you could store the rollup as a compressed string of delimited doc_ids.
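A minimal sketch of that post-processing, using Python's standard `zlib` module. The newline delimiter is an assumption (it requires that doc_ids never contain a newline), and the function names are hypothetical.

```python
import zlib
from typing import Iterable, List

DELIMITER = "\n"  # assumption: doc_ids never contain a newline


def pack_rollup(doc_ids: Iterable[str]) -> bytes:
    """Store the rollup as one zlib-compressed, newline-delimited string."""
    # Sorting makes the output deterministic and groups shared prefixes,
    # which may help compression.
    ids = sorted(doc_ids)
    for doc_id in ids:
        if DELIMITER in doc_id:
            raise ValueError(f"doc_id contains the delimiter: {doc_id!r}")
    return zlib.compress(DELIMITER.join(ids).encode("utf-8"))


def unpack_rollup(blob: bytes) -> List[str]:
    """Inverse of pack_rollup; returns the doc_ids in sorted order."""
    text = zlib.decompress(blob).decode("utf-8")
    # An empty rollup packs to an empty string, which must not become [""].
    return text.split(DELIMITER) if text else []
```

Reading the rollup back requires decompressing the whole blob, which is acceptable only because, as noted above, this data is expected to be read rarely (during a full sync).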
