Could finish faster in some cases if local files were scanned in a different order

Bug #477434 reported by psl
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
zsync (Ubuntu)   Triaged   Wishlist     Unassigned

Bug Description

Binary package hint: zsync

zsync 0.6, Ubuntu 9.10 i386.

# zsync -V
zsync v0.6 (compiled Nov 6 2009 05:52:12)
By Colin Phipps <email address hidden>
Published under the Artistic License v2, see the COPYING file for details.

I believe zsync could handle local files more efficiently (before it tries to fetch data from the HTTP server). Small changes to zsync could speed up the process.

First, I want to show a case where zsync could finish quickly, but it does not recognize the situation and keeps looking for data that is not needed. This fix should be easy. Later I will show other examples of what can be done to improve zsync.

zsync already has 100% of the data, but it reads another seed file anyway. zsync doesn't check whether the target has reached 100%:

TEST A)

# ls -l
-rw------- 1 root root 730136576 2009-11-07 13:29 ubuntu-9.10-alternate-amd64.iso
-rw------- 1 root root 723068928 2009-11-07 12:31 ubuntu-9.10-alternate-i386.iso
# mv ubuntu-9.10-alternate-amd64.iso ubuntu-9.10-alternate-amd64.iso.100
# zsync -i ubuntu-9.10-alternate-amd64.iso.100 http://server/zsync/ubuntu-9.10-alternate-amd64.iso.zsync
Read ubuntu-9.10-alternate-amd64.iso.100. Target 100.0% complete.
Read ubuntu-9.10-alternate-i386.iso. Target 100.0% complete.
used 730136576 local, fetched 0

This scenario shows that zsync has 100% of the target after reading ubuntu-9.10-alternate-amd64.iso.100, but it ignores the fact that it could finish and reads the next file, ubuntu-9.10-alternate-i386.iso. This is a special case that is easy to reproduce and to fix. I filtered the zsync output down to the important messages only, and I fetch data from an HTTP server on the local LAN.

Before I describe a trickier scenario, here are some time measurements. I assume that the more of the target zsync already has, the faster it processes a seed file (fewer lookups for missing blocks).
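
The early exit suggested above could look roughly like the following sketch. It is written in Python rather than zsync's C, and `process_seeds`, `scan_seed`, and the set-of-block-indices representation are hypothetical names chosen for illustration, not zsync internals.

```python
from typing import Callable, Iterable, List, Set


def process_seeds(
    seeds: Iterable[str],
    total_blocks: int,
    scan_seed: Callable[[str, Set[int]], Set[int]],
) -> List[str]:
    """Scan seed files in order, stopping once every target block is known.

    scan_seed(path, missing) is a hypothetical callback that returns the
    subset of `missing` block indices that were found in `path`.
    Returns the list of seed files that were actually read.
    """
    missing = set(range(total_blocks))
    scanned = []
    for path in seeds:
        if not missing:  # target is already 100% complete, skip the rest
            break
        scanned.append(path)
        missing -= scan_seed(path, missing)
    return scanned
```

With a check like this, Test A would stop after the first file, because nothing is missing when the loop reaches ubuntu-9.10-alternate-i386.iso.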

TEST B)
# time zsync -i ubuntu-9.10-alternate-amd64.iso.100 -i ubuntu-9.10-alternate-i386.iso http://server/zsync/ubuntu-9.10-alternate-amd64.iso.zsync
Read ubuntu-9.10-alternate-amd64.iso.100. Target 100.0% complete.
Read ubuntu-9.10-alternate-i386.iso. Target 100.0% complete.
used 730136576 local, fetched 0

real 1m41.726s
user 0m23.885s
sys 0m4.456s

TEST C)
# time zsync -i ubuntu-9.10-alternate-i386.iso -i ubuntu-9.10-alternate-amd64.iso.100 http://nas/zsync/ubuntu-9.10-alternate-amd64.iso.zsync
Read ubuntu-9.10-alternate-i386.iso. Target 37.9% complete.
Read ubuntu-9.10-alternate-amd64.iso.100. Target 100.0% complete.
used 730136576 local, fetched 0

real 2m41.853s
user 0m57.672s
sys 0m6.212s

Test cases B and C have the same result: the ISO file can be built from local sources (no data from the HTTP server is needed). Both tests work with the same input files. Test B finished about 1 minute faster than test C (1m41s versus 2m41s). The order in which zsync processes files is important! It is better to process the file with more useful information first. I think we can guess which file is important for us and should be processed with higher priority.

A more complex example: zsync had already processed some data but was interrupted before it finished the job (a .part file was left on disk). We also have an older version of the file.
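
For reference, the gap between the two runs can be computed directly from the `time` output of tests B and C. This is only a small Python sketch; `parse_time` is a hypothetical helper, not part of zsync.

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` value such as '1m41.726s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times copied from tests B and C
test_b = parse_time("1m41.726s")
test_c = parse_time("2m41.853s")
print(f"real: C - B = {test_c - test_b:.3f} s")  # prints "real: C - B = 60.127 s"
```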

TEST D)

# ls -l
-rw------- 1 root root 671686656 2009-11-07 02:36 ubuntu-9.10-server-i386.iso
-rw------- 1 root root 723070976 2009-11-07 12:25 ubuntu-9.10-alternate-i386.iso.part
-rw------- 1 root root 723068928 2009-11-07 12:31 ubuntu-9.10-alternate-i386.iso

# zsync -i ubuntu-9.10-server-i386.iso http://server/zsync/ubuntu-9.10-alternate-i386.iso.zsync
Read ubuntu-9.10-server-i386.iso. Target 45.3% complete.
Read ubuntu-9.10-alternate-i386.iso. Target 100.0% complete.
Read ubuntu-9.10-alternate-i386.iso.part. Target 100.0% complete.
used 723070976 local, fetched 0

Test D shows the order in which zsync processes files:
1) read the input files specified by the user (-i file; ubuntu-9.10-server-i386.iso)
2) read the target file (if it exists; ubuntu-9.10-alternate-i386.iso)
3) read the .part file (if it exists; ubuntu-9.10-alternate-i386.iso.part)

My suggestion is to change zsync to process input files in this order:
1) Is there a .part file? Process it! (a previous run of zsync was interrupted, so useful blocks should already be in the .part file)
2) Is there a target file? Process it! (it is possible that we already have 100% of the target)
3) read the files specified by the user (-i file)
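
The proposed ordering can be sketched as follows. This is an illustration in Python, not zsync's actual C code; `order_seeds` and its `exists` parameter are hypothetical names introduced here.

```python
import os
from typing import Callable, List, Sequence


def order_seeds(
    target: str,
    user_seeds: Sequence[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return seed files in the order proposed in this report.

    1. <target>.part, if it exists (left by an interrupted run)
    2. <target>, if it exists (may already be complete)
    3. files given by the user with -i, in the order given
    """
    ordered = []
    for candidate in (target + ".part", target):
        if exists(candidate):
            ordered.append(candidate)
    for seed in user_seeds:
        if seed not in ordered:  # avoid scanning the same file twice
            ordered.append(seed)
    return ordered
```

For Test D this would yield the .part file first, then the existing target, then ubuntu-9.10-server-i386.iso. Combined with stopping at 100%, the server ISO would never be read.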

One more note about the .part file. zsync may be able to process this file faster, since its blocks should already be aligned at the correct positions; it doesn't need to search for blocks the way it does with other files. An experienced user can always pass the .part file with the -i switch when it needs to be processed in the standard way.
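
A rough sketch of that aligned check, in Python, is shown below. It assumes the expected per-block checksums are available as a list. SHA-1 is only a stand-in for whatever checksums zsync really uses, and `aligned_blocks` is a hypothetical name, not a zsync function.

```python
import hashlib
import io
from typing import BinaryIO, List, Set


def aligned_blocks(part: BinaryIO, expected: List[bytes], block_size: int) -> Set[int]:
    """Return indices of blocks in `part` that already match `expected`.

    Only block-aligned offsets are checked, so each block costs one read and
    one hash instead of a search across every offset in the file.
    """
    found = set()
    for index, digest in enumerate(expected):
        part.seek(index * block_size)
        data = part.read(block_size)
        if not data:
            break  # the .part file is shorter than the target
        if hashlib.sha1(data).digest() == digest:  # stand-in checksum
            found.add(index)
    return found


# Demo with an in-memory "file": block 1 has not been filled in yet.
target = b"AAAABBBBCCCC"
sums = [hashlib.sha1(target[i:i + 4]).digest() for i in range(0, len(target), 4)]
partial = io.BytesIO(b"AAAA\0\0\0\0CCCC")
print(sorted(aligned_blocks(partial, sums, 4)))  # prints [0, 2]
```

Blocks that fail the aligned check would still need the normal search in other seed files or a download.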

Matt Zimmerman (mdz) wrote :

A, B and C seem to be about skipping further input files when 100% of the data has been found, and are therefore equivalent to bug 605428, which already has a patch, so they require no further consideration.

D seems to be about changing the order in which zsync searches for local data. This is less obvious, and it's not clear to me whether the proposed ordering would be better *in general* than what zsync does today. It depends on how it is typically used, and I don't have any data on that.

Changed in zsync (Ubuntu):
importance: Undecided → Wishlist
status: New → Triaged
summary: - zsync doesn't work optimaly with local files (and this can be improved)
+ Could finish faster in some cases if local files were scanned in a
+ different order
psl (slansky) wrote :

Cases B and C were included to demonstrate that the order in which files are processed matters; with the same input files, B and C differ by about 1 minute in processing time. Case D suggests changing the order in which local files are read; a small and easy change in the code can improve zsync performance, especially when a previous transfer was interrupted (user action, network connection, etc.).
