Consider using xz for compressing images

Bug #683849 reported by Michael Hope
This bug affects 1 person
Affects: Linaro image builds
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

xz gives significantly smaller sizes than gzip when compressing linaro test head images:

michaelh@crucis:~/Downloads/alpha1/xz$ ls -la lin*
-rw-r--r-- 1 michaelh michaelh 67507088 2010-12-02 09:35 linaro-natty-headless-tar-20101201-1.tar.gz
-rw-r--r-- 1 michaelh michaelh 45826972 2010-12-02 09:26 linaro-natty-headless-tar-20101201-1.tar.xz

The xz image is 68% of the size of the gzip image.

xz comes installed as standard in Ubuntu and is widely used.
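
For reference, the 68% figure follows directly from the two file sizes in the listing above. A minimal sketch of the arithmetic (the variable names are illustrative only):

```python
# Sizes in bytes, copied from the `ls -la` listing in the bug description.
GZ_SIZE = 67507088  # linaro-natty-headless-tar-20101201-1.tar.gz
XZ_SIZE = 45826972  # linaro-natty-headless-tar-20101201-1.tar.xz

ratio = XZ_SIZE / GZ_SIZE   # size of the xz file relative to the gzip file
saved = GZ_SIZE - XZ_SIZE   # bytes not downloaded when fetching the xz file

print(f"xz/gzip size ratio: {ratio:.1%}")
print(f"saved per download: {saved} bytes ({saved / 2**20:.1f} MiB)")
```

This covers only this one pair of images; other builds may compress differently.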

Steve Langasek (vorlon) wrote :

I think the main disadvantage is that xz-compressed files aren't rsync-friendly. Rsync is likely to be significantly more useful with our images once we get persistent URLs for "current" builds out of offspring. Is the concern here smaller images, or faster-to-download images?

Michael Hope (michaelh1) wrote :

Smaller for me as that means less of my data cap used up. Hmm. I wonder if rsync -z on the original tarball might be better...

Guilherme Salgado (salgado) wrote :

Well, given the original tarball is already gzipped, what benefit would you get by rsyncing it with -z?

Loïc Minier (lool) wrote :

If your gzipped tarball is properly created with --rsyncable, the resulting tgz can be rsync-ed faster if you have a local copy of a previous tarball because rsync identifies pieces which have not changed (most of the files remain identical, so most of the .tar.gz is identical to previous ones).

This is unrelated to rsync's compression.

Guilherme Salgado (salgado) wrote : Re: [Bug 683849] Re: Consider using xz for compressing images

On Thu, 2011-01-06 at 13:47 +0000, Loïc Minier wrote:
> If your gzipped tarball is properly created with --rsyncable, the
> resulting tgz can be rsync-ed faster if you have a local copy of a
> previous tarball because rsync identifies pieces which have not changed
> (most of the files remain identical, so most of the .tar.gz is identical
> to previous ones).
>
> This is unrelated to rsync's compression.

Right, but what I was wondering was about the -z switch to rsync that
Michael proposed. That shouldn't make a difference, or am I
misunderstanding things here?

To me it seems like the best alternative is to have rsyncable/fixed-name
dailies so that we can easily/cheaply sync them. IOW, mark this bug
invalid.

Loïc Minier (lool) wrote :

/me doesn't think -z will help much since it's already compressed

Still, there's a valid request that the images could be smaller. Presumably, though, downloading a full image daily is going to cost more bytes than updating an rsync-friendly image from a local copy. These are different use cases.

Michael Hope (michaelh1) wrote :

I download at most once a month, so it's less likely that rsync will be able to do its magic as more of the image will have changed.

What I meant by rsync -z is you could supply the original uncompressed tarball and rsync that. rsync will have better data to work on and the -z option will compress things over the wire.

I did a comparison here:
 https://wiki.linaro.org/MichaelHope/Sandbox/RsyncingImages

Short story: rsync of xz was fastest as the file started smaller, rsync -z of the uncompressed tarball was a close second, and rsync of an rsyncable gzip was the worst. This matches some tests in a previous job.

Note that I tested between two images that were a month apart.

I'm not worried - whatever you guys decide is fine.
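
To illustrate the point about rsync having "better data to work on" with an uncompressed tarball, here is a rough sketch. It is not rsync's actual algorithm: rsync uses a rolling checksum and can match blocks at any offset, whereas this only compares fixed-offset blocks. The sample data, block size and helper names are all made up for the example, and plain zlib (deflate, without the `--rsyncable` behaviour) stands in for gzip.

```python
import hashlib
import random
import zlib

BLOCK = 512  # illustrative block size only; rsync chooses its own


def blocks(data, size=BLOCK):
    """Split data into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reusable_fraction(old, new):
    """Fraction of blocks in `new` whose digest also occurs among the blocks of `old`."""
    known = {hashlib.sha256(b).digest() for b in blocks(old)}
    new_blocks = blocks(new)
    hits = sum(hashlib.sha256(b).digest() in known for b in new_blocks)
    return hits / len(new_blocks)


# Hypothetical stand-in for a tarball: compressible pseudo-random text.
rng = random.Random(0)
words = [b"linaro", b"natty", b"headless", b"kernel", b"rootfs", b"armel"]
old_plain = b" ".join(rng.choice(words) for _ in range(40000))

# Simulate a small update near the start, keeping the length unchanged.
new_plain = b"UPDATED!" + old_plain[8:]

# Compress both versions with ordinary deflate.
old_gz = zlib.compress(old_plain, 6)
new_gz = zlib.compress(new_plain, 6)

plain_fraction = reusable_fraction(old_plain, new_plain)
gz_fraction = reusable_fraction(old_gz, new_gz)

print(f"uncompressed: {plain_fraction:.1%} of blocks unchanged")
print(f"compressed:   {gz_fraction:.1%} of blocks unchanged")
```

A small edit leaves nearly every uncompressed block intact, while the compressed stream tends to diverge from the edit onwards. That is the effect `--rsyncable` is meant to limit. Whether this outweighs the smaller starting size of an xz file depends on how much of the image has changed, as the comparison on the wiki page suggests.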

Loïc Minier (lool) wrote :

Moving to linaro-images; one way would be to do both .xz and rsync-able tarballs, but then we'd have to track which one people used.

It basically depends on whether we're optimizing for day-to-day syncs or for month-after-month full re-downloads.

I'm personally more of a day-to-day sync user in general, but due to the lack of a stable URL to rsync against, I currently don't do that. If I consider the angle of releases.linaro.org, people there are probably first-time downloaders, or redownload the whole thing, so .xz would make sense. Maybe we should switch to .xz on releases.linaro.org and hence on snapshots.linaro.org, or maybe we could have .xz on releases and rsyncable .tgz on snapshots.l.o? Thoughts?

affects: linaro-image-tools → linaro-images
Changed in linaro-images:
status: New → Confirmed