Comment 3 for bug 1691648

Revision history for this message
Matthew Oliver (matt-0) wrote :

I've been looking more closely into this. From reading about vacuum on the sqlite3 site: https://www.sqlite.org/lang_vacuum.html, vacuum, when called by itself will rewrite the dbfile elsewhere and the copy it back, and need a write lock.
But it also supports a `vacuum INTO <filename>` variant, this basically does the same thing, but doesn't copy it back. And because of that it doesn't need a write lock.
I wonder if a better mechanism would be, when a vacuum option is enabled we VACUUM INTO and then sync that new vacuumed db, rsync and then remove the vacuumed dbfile.

We don't get the current node vacuumed, BUT everytime we do an rsync or rsync_then_merge we get vacummed dbs. Meaning as re rebalance and rotate hardward in a cluster, we know dbs do eventually get vacuumed _AND_ we'd always be sending a vacuumed db of the network, so less bytes.

Another option we could expore is turn on the auto_vacuum pragma, which doesn't do full file rebuilds, but that way it works does warn that it increases fragmentation in the file: https://www.sqlite.org/pragma.html#pragma_auto_vacuum
Which could bring in additional slowness.. besides to turn this on, existing containers would need to be vacuumed too. And to crazy size of at least our clusters (at nvidia) this would be quite a task.