duplicity is very slow due to excessive fsync

Bug #1570293 reported by ilia on 2016-04-14
This bug affects 3 people
Affects Status Importance Assigned to Milestone

Bug Description

I've recently installed duplicity v0.7.07. Have not used duplicity before. I have noticed very low performance doing backups. For medium sized files (several MB) throughput was about 8MB/s. I have done a test backup (89.3 MB, 785 small files) which has run for 167 seconds. This is about 0.5MB/s. In both cased both disks and CPU were mostly idle (my HDD throughput is about 130MB/s). Strace revealed an excessive fsync() for files under ~/.cache/duplicity (*.sigtar.part, *.manifest.part), done after each write(). Running with fsync() disabled (via LD_PRELOAD) cut the second test time from 167 to 3 seconds,a 55X improvement.

I have located the following patch as a fix for bug #1538333 (released in v0.7.07):

--- a/duplicity/dup_temp.py
+++ b/duplicity/dup_temp.py
@@ -150,9 +150,11 @@ class FileobjHooked:

     def write(self, buf):
- Write fileobj, return result of write()
+ Write fileobj, flush, return result of write()
- return self.fileobj.write(buf)
+ res = self.fileobj.write(buf)
+ self.flush()
+ return res

     def flush(self):
         Flush fileobj and force sync.

The commit message/changelog says:
    * Fix for bug #1538333 - assert filecount == len(self.files_changed)
      - added flush after every write for all FileobjHooked files which
        should prevent some errors when duplicity is forcibly closed.

It seems the use of fsync after flush is unjustified. Fsync has nothing to do with duplicity being interrupted via signal (Ctrl-C, SIGSEGV, SIGKILL, etc.). There are very few cases where fsync is needed in the applications and this is not among these. Please, remove "os.fsync(self.fileobj.fileno())" line altogether, an application buffer flush should be enough. OS and filesystem will take case of the data.

If you worry about abrupt power failures during backup, or your writes are inconsistent you can create temporary files and rename them to the target ones after they are closed (rename() syscall is atomic). I also would argue that even application-level flush() is not helpful here. It may reduce the probability of inconsistent file data, but not eliminate it. Think what happens if signal is received between write and flush. I'm not familiar with the code, but for me it seems that you can replace the flush with tmp file + rename upon close. (Flushing too is rarely useful besides output to the user or other program). Upon signal you can either close+rename or delete the tmp file, whatever suites best the application logic.

What are you running on? My testing shows that flush/fsync is only responsible for a 5-7% slowdown overall.

Changed in duplicity:
importance: Undecided → Medium
milestone: none → 0.7.08
status: New → In Progress
assignee: nobody → Kenneth Loafman (kenneth-loafman)
Nelson Elhage (nelhage) wrote :

I'm also seeing an immense slowdown in duplicity 0.7.07, which is fixed by running under `eatmydata(1)`. I'm running on ext4 on Ubuntu Trusty (14.04). Without `eatmydata`, duplicity is consuming <5% CPU and is logging a few files per second to stdout. With, it becomes noticeably snappier just watching output, and jumps to 100% CPU.

Happy to provide more details if I can.

ilia (ilia) wrote :

I've observed the slowdown under these configurations:

1. Ubuntu 12.04, ~/.cache is on ext4 on spinning HDD
2. Ubuntu 14.04, ~/.cache is on btrfs on spinning HDD

If you accidentally have nobarrier option in fstab or using SSD for ~/.cache, this could explain the light slowdown you experience, as opposite to @nelhage and my situation.

Changed in duplicity:
assignee: Kenneth Loafman (kenneth-loafman) → nobody
status: In Progress → Fix Committed
Changed in duplicity:
milestone: 0.7.08 →
Changed in duplicity:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers