empty 5GB container DB

Bug #1691648 reported by Hugo Kou
Affects: OpenStack Object Storage (swift)
Status: In Progress
Importance: Medium
Assigned to: Unassigned

Bug Description

Containers that see frequent PUT/DELETE traffic keep growing in on-disk size. The DB file remains large even after all objects have been DELETEd (empty container). Should the replicator vacuum it under specific conditions?

[root@prdd1slzswcon04 94a91c0b60f945ba2054b057cd4f1979]# sqlite3 94a91c0b60f945ba2054b057cd4f1979.db
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from object;
sqlite>

[root@prdd1slzswcon04 94a91c0b60f945ba2054b057cd4f1979]# curl -g -I -XHEAD "http://10.125.229.161:6001/d693/152228/AUTH_ems_prod/7488"
HTTP/1.1 204 No Content
X-Backend-Timestamp: 1414598357.76436
X-Container-Object-Count: 0
X-Put-Timestamp: 1414598357.76549
X-Backend-Put-Timestamp: 1414598357.76549

### Original empty DB size ###
-rw------- 1 root root 5.7G May 17 21:42 94a91c0b60f945ba2054b057cd4f1979.db

### Vacuum Test ###
-rw------- 1 root root 19K May 17 21:44 94a91c0b60f945ba2054b057cd4f1979.db
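
For reference, the vacuum test above can be reproduced with a plain VACUUM on the db file, roughly like this sketch (Python stdlib sqlite3; the exact command used for the test isn't shown in this report):

import os
import sqlite3

db = "94a91c0b60f945ba2054b057cd4f1979.db"
print("before:", os.path.getsize(db))

# isolation_level=None keeps the connection in autocommit mode;
# VACUUM cannot run inside an open transaction.
conn = sqlite3.connect(db, isolation_level=None)
conn.execute("VACUUM")  # rewrites the file, dropping all free pages
conn.close()

print("after:", os.path.getsize(db))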

* Not vacuuming brings several potential issues, including https://bugs.launchpad.net/swift/+bug/1691566:

* Wasted network bandwidth.
* Longer DB lock time.

Hugo

Revision history for this message
Matthew Oliver (matt-0) wrote :

The idea of vacuuming has come up before. In the past we haven't bothered with vacuuming because the normal life cycle of a Swift container in most clusters is heavily PUT-focused, and vacuuming was overhead that wasn't worth it.

I don't think anyone is against it per se, but it would be nice to find out how much overhead we'd add, and see whether the benefits are worth it.

Having said that, if we were to do it, doing it only when we are already going to need to do a bunch of I/O, rather than ticking on some timer, would be better in my opinion. Like you say, it would be nice to send a vacuumed database when we need to push the whole database to a new node, say on a rebalance, or on an rsync_then_merge. If we do it before we send, however, I could imagine an I/O spike on the node replicating the DB (vacuuming before we send) and then again when writing the DB on the recipient's end, so that is the overhead I'm speaking of.

I would think it would look something like the attached patch.

NOTE: the patch is untested; it seems to run, but it's only a demonstration. It could be a starting point to test the effects of vacuuming on a cluster (you can turn it on/off on the db replicators).

clayg (clay-gerrard)
Changed in swift:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Kottur (skottur) wrote :

There is a lot more tilt towards usage of timed (expiring) objects, which are both PUT and DELETE heavy. This will ante up the request for vacuum. Matthew Oliver's patch looks to be a really good starting point.

Revision history for this message
Matthew Oliver (matt-0) wrote :

I've been looking more closely into this. From reading about vacuum on the sqlite3 site (https://www.sqlite.org/lang_vacuum.html): VACUUM, when called by itself, will rewrite the db file elsewhere and then copy it back, and it needs a write lock.
But it also supports a `VACUUM INTO <filename>` variant. This basically does the same thing but doesn't copy the file back, and because of that it doesn't need a write lock.
I wonder if a better mechanism would be: when a vacuum option is enabled, we VACUUM INTO, sync that new vacuumed db, rsync it, and then remove the vacuumed db file.

We don't get the current node vacuumed, BUT every time we do an rsync or rsync_then_merge we get vacuumed dbs. Meaning as we rebalance and rotate hardware in a cluster, we know dbs do eventually get vacuumed _AND_ we'd always be sending a vacuumed db over the network, so fewer bytes.
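
To make that concrete, here is a rough sketch of the "VACUUM INTO, ship the copy, delete it" flow (not the proposed patch; the helper name, temp filename and rsync invocation are made up purely for illustration):

import os
import sqlite3
import subprocess

def vacuum_into_and_ship(db_path, rsync_target):
    # Hypothetical helper, for illustration only.
    # VACUUM INTO (SQLite >= 3.27) writes a compacted copy of the DB
    # without rewriting the source file, so no write lock is needed.
    vacuumed_path = db_path + '.vacuumed'
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("VACUUM INTO ?", (vacuumed_path,))
    finally:
        conn.close()
    try:
        # Send the compacted copy instead of the original, bloated file.
        subprocess.check_call(['rsync', vacuumed_path, rsync_target])
    finally:
        # Remove the temp copy; the DB on this node stays un-vacuumed, as noted above.
        os.unlink(vacuumed_path)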

Another option we could explore is turning on the auto_vacuum pragma, which doesn't do full file rebuilds, but the documentation warns that the way it works increases fragmentation in the file: https://www.sqlite.org/pragma.html#pragma_auto_vacuum
That could bring in additional slowness. Besides, to turn this on, existing containers would need to be vacuumed too, and at the crazy size of at least our clusters (at NVIDIA) that would be quite a task.
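
For completeness, turning auto_vacuum on for an existing database would look roughly like this (the pragma is standard SQLite; the full VACUUM afterwards is what makes the migration expensive for existing containers, as noted above):

import sqlite3

conn = sqlite3.connect("container.db", isolation_level=None)
# auto_vacuum only takes effect on an already-populated database
# after a full VACUUM, which is the costly part at scale.
conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
conn.execute("VACUUM")
conn.close()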

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/916702

Changed in swift:
status: Confirmed → In Progress
Revision history for this message
Matthew Oliver (matt-0) wrote :

Another option is to maybe add some kind of sparse DB file check in the db auditor, which runs the vacuum when some threshold is exceeded. Of course this relies on the assumption that it will be easy to determine the sparseness of the DB.

Via the sqlite3 client I can get pages used etc., so I assume I can get the same details programmatically, and maybe there is enough information there.
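
As a sketch of what such a check could look like (the pragmas are standard SQLite; the function name and threshold are just illustrative, not what the proposed patch does):

import sqlite3

def free_page_ratio(db_path):
    # Fraction of the file made up of free pages; a high ratio
    # suggests the DB is sparse and worth vacuuming.
    conn = sqlite3.connect(db_path)
    try:
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    finally:
        conn.close()
    return free_pages / page_count if page_count else 0.0

# e.g. the auditor could trigger a vacuum when more than half the pages are free:
# if free_page_ratio(db_path) > 0.5: ...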

Thoughts?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/916861

Revision history for this message
Matthew Oliver (matt-0) wrote :

Here is another option: https://review.opendev.org/c/openstack/swift/+/916861
This one uses the db_auditor and only vacuums when the free-page count is over some threshold, which means we can tune it better.
It also means that even if rebalances aren't happening in a cluster, accounts and containers are at least vacuumed when needed.
