delete_partition in swift-object-replicator is not asynchronous
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Object Storage (swift) | New | Wishlist | Unassigned | |
Bug Description
In the following cases:
- after adding a drive to the ring (and rebalancing)
- after setting a drive's weight to 0 (and rebalancing)
- when a drive contains hand-off partitions due to a storage node failure
the partitions are moved (rsync, then delete) from one drive to another.
With thousands of objects in a partition, delete_partition() takes a long time to complete. In one of our internal profiling runs (with client GET/PUT/DELETE I/O on the drives):
- rsync took 9 minutes to copy 6k objects
- delete_partition took 3 minutes to complete
The total time to complete replication could be brought down by 30% if we postpone delete_partition or perform the delete asynchronously.
Note: deleting asynchronously still puts pressure on the drive, which impacts replication performance as well.
So, one proposal is:
- move the partition to a "trash" directory and delete it only at the end of the full replication run.
- if the drive's file system is running out of space (below some configurable free-space threshold), only then delete the partition asynchronously without moving it to the trash directory.
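The two rules above can be sketched roughly as follows. This is only an illustration under assumptions, not Swift's actual code: the function names, the `trash_dir` layout, the `min_free` threshold, and the `free_fraction_fn` hook are all hypothetical, and the holding directory is assumed to live on the same file system as the partition so that `os.rename` is a cheap metadata operation.

```python
import os
import shutil
import threading
import uuid


def free_fraction(path):
    """Fraction of the file system holding `path` that is still free."""
    st = os.statvfs(path)
    return st.f_bavail / st.f_blocks if st.f_blocks else 0.0


def delete_partition(part_path, trash_dir, min_free=0.10,
                     free_fraction_fn=free_fraction):
    """Hypothetical deferred delete.

    Returns ("trashed", new_path) when the partition was parked for later
    removal, or ("deleting", thread) when free space was below `min_free`
    and the partition is being removed right away in a background thread.
    """
    device_root = os.path.dirname(trash_dir)
    if free_fraction_fn(device_root) < min_free:
        # Low on space: reclaim now, but off the replication code path.
        worker = threading.Thread(
            target=shutil.rmtree, args=(part_path,),
            kwargs={"ignore_errors": True})
        worker.start()
        return "deleting", worker
    os.makedirs(trash_dir, exist_ok=True)
    # A unique suffix avoids collisions if the same partition is trashed twice.
    dest = os.path.join(
        trash_dir, "%s-%s" % (os.path.basename(part_path), uuid.uuid4().hex))
    os.rename(part_path, dest)  # same file system assumed
    return "trashed", dest


def empty_trash(trash_dir):
    """Call once at the end of a full replication run; returns entries removed."""
    if not os.path.isdir(trash_dir):
        return 0
    count = 0
    for name in os.listdir(trash_dir):
        shutil.rmtree(os.path.join(trash_dir, name), ignore_errors=True)
        count += 1
    return count
```

The rename keeps the expensive unlink work out of the replication pass, so the same amount of delete I/O is still done, just later.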
That's a really cool idea!
I'd love to see this proposed as a feature/option abstracted under the delete partition function, with an OS/system thread owned/spawned by the replicator/reconstructor that does the actual reaping/unlinking (maybe with some auditor-style ratelimit options?).
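Something like the following is what I have in mind, as a rough sketch only. The `PartitionReaper` class, its `files_per_second` option, and the queue-based hand-off are made up for illustration and are not existing Swift code; it also uses plain `threading`, so how it would sit alongside the replicator's own concurrency model is left open.

```python
import os
import queue
import threading
import time


class PartitionReaper(threading.Thread):
    """Hypothetical background thread that unlinks queued partitions slowly."""

    def __init__(self, files_per_second=100.0):
        super().__init__(daemon=True)
        self.files_per_second = files_per_second
        self._queue = queue.Queue()
        self.unlinked = 0

    def submit(self, path):
        """Hand a partition directory over for later removal."""
        self._queue.put(path)

    def stop(self):
        """Ask the thread to exit once everything queued so far is reaped."""
        self._queue.put(None)

    def run(self):
        while True:
            path = self._queue.get()
            if path is None:
                break
            self._reap(path)

    def _reap(self, path):
        interval = 1.0 / self.files_per_second
        # Bottom-up walk so each directory is empty by the time we rmdir it.
        for dirpath, _dirnames, filenames in os.walk(path, topdown=False):
            for name in filenames:
                try:
                    os.unlink(os.path.join(dirpath, name))
                except OSError:
                    continue
                self.unlinked += 1
                time.sleep(interval)  # crude rate limit on unlink calls
            try:
                os.rmdir(dirpath)
            except OSError:
                pass  # leave non-empty or busy directories for a later pass
```

The replicator would call `submit()` where it calls delete_partition today and `stop()` followed by `join()` at shutdown; the sleep between unlinks is the knob that trades cleanup latency for less contention with client I/O.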
To be clear, you don't think this proposal reduces the total amount of I/O in any way? It's just trying to put off the delete I/O so we can do replication sooner/faster?