rsync fails on large files with compression

Bug #1384503 reported by Joe Harrington on 2014-10-22
This bug affects 35 people
Affects         Importance  Assigned to
rsync (Ubuntu)  Undecided   Ryan Harper
Trusty          Medium      Unassigned

Bug Description

[Status]
This issue is currently Incomplete.

We're currently blocked on obtaining access to a reliable, and publicly
available dataset that can be shared to reproduce the issue. The test
is critical to being able to evaluate the impact of the change on other
users when assessing if the fix can be SRU'ed to trusty.

If you are affected by this bug and can reliably reproduce the issue
with a specific dataset that you are willing and permitted to share please
comment and specify the dataset location, and how you reproduce the issue.

[Original Description]
Copying large (>10GB) files with rsync -z (compression) leads to a long hang and an eventual error after only part of the file has been transferred. The error is consistent. The file copies at normal speed until the partial copy stops growing (at 1.4 GB out of 20 GB for one file, 6.9 GB out of 29 GB for another). Then nothing happens for many minutes. Finally, there is an error:

[....]
jh/.VirtualBox/win7/win7.vbox
jh/.VirtualBox/win7/win7.vbox-prev
jh/.VirtualBox/win7/win7.vdi
rsync: [sender] write error: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(837) [sender=3.1.0]

In this case, 6.9 GB of 29 GB transferred. Without -z, it works.

See the following upstream report, with a comment at the end from the rsync maintainer:

https://bugzilla.samba.org/show_bug.cgi?id=10372

According to this report, version 3.1.0 (included in 14.04) uses a different compression package from prior versions. Prior versions did not have this problem for me using the same command on the same systems. Both hosts ran Ubuntu 11.10 at the time, and all run 14.04 now, in each case with all updates applied, Intel hardware. Network connection between them is gigabit ethernet through one switch. A shell ssh between them in a terminal works and stays up during the failure, so it is not a network issue. There are no relevant entries in syslog on either machine. There is sufficient capacity on the receiving disk. All filesystems are ext4.

rsync command:

/usr/bin/rsync -aHSxvz --delete --stats --exclude=lost+found --exclude=.gvfs --exclude=/nonlaptop /home/ backup.host.edu:/bu/host/home/

(yes, I changed the machine names)

Current release (both hosts):

Description: Ubuntu 14.04.1 LTS
Release: 14.04

Current package (both hosts):

rsync:
  Installed: 3.1.0-2ubuntu0.1
  Candidate: 3.1.0-2ubuntu0.1
  Version table:
 *** 3.1.0-2ubuntu0.1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
        100 /var/lib/dpkg/status
     3.1.0-2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

Thanks,

--jh--

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in rsync (Ubuntu):
status: New → Confirmed
Gerald Villemure (gvillemure) wrote :

I installed rsync_3.1.1-2_amd64.deb from the vivid repo.

It installed without issue on trusty and, fortunately, resolves this major bug.

Gerald

Joe Harrington (joeharr) wrote :

This is a relief, and thanks for giving your attention to this bug. Utopic has 3.1.1-2. Can it be pushed as an update to 14.04?

Thanks!

--jh--

Kevin_Traas (kevin-traas) wrote :

Just discovered this bug report - after fighting with hanging backup scripts that have worked for years, but broke after upgrading to 14.04. I agree with Joe. Can't 3.1.1-2 be back-ported to 14.04 LTS?

This bug has now been open for just shy of 8 months....

Thanks,
Kevin

Robie Basak (racb) on 2015-06-23
Changed in rsync (Ubuntu):
assignee: nobody → Ryan Harper (raharper)
status: Confirmed → Fix Released
Changed in rsync (Ubuntu Trusty):
status: New → Confirmed
assignee: nobody → Ryan Harper (raharper)
Ryan Harper (raharper) wrote :

I've attempted to reproduce this issue, but so far I've not been able to with stock 14.04 rsync versions between two up-to-date 14.04 systems. It may be highly dependent on the actual data; if you can reproduce this with something publicly available (like a large ISO catted together multiple times, or something else that I can create bit-for-bit), that would help me narrow down the fix.

Could someone who can reproduce this issue please run apport-collect 1384503 so I can get more information from the systems where you can reproduce?

W.r.t 3.1.1, we cannot backport the newer package into Trusty, see https://wiki.ubuntu.com/StableReleaseUpdates for the policy and rationale.

However, I will take a look at the changes in 3.1.1 that enable the new-style compression, as well as the logic to detect when there is a mismatch. If that patchset is small and sane, we may be able to SRU just that change to fix this issue in Trusty.

Andrew (awensley) wrote :

I just found this report myself. I've been experiencing the problem for a while but had to rule out other causes. This is a pretty serious bug in an LTS release.

Magnus Lubeck (magnus-lubeck) wrote :

I also have run into this issue on a 3.1.0-2ubuntu0.1 rsync.

maglub@XXX:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.3 LTS"

Can't do an "apport-collect 1384503", as I don't have a browser available on this system. What other information can I supply to help?

A (launch-k1k2) wrote :

It's an ongoing problem with rsync that's been around for about 10 years or so.

Compressed files transferred with -z fail on occasion, though once you find a file that fails, that file will always fail.
I even use --skip-compress=bz2, but that doesn't work either.

Last time I got it was ... today.

1.6G -rw-rw-r-- 1 1004 1004 1.6G Dec 14 14:59 log.bz2

Wed Dec 16 07:01:01 AEDT 2015
receiving incremental file list
log.bz2
inflate returned -3 (0 bytes)
rsync error: error in rsync protocol data stream (code 12) at token.c(557) [receiver=3.1.0]
rsync: connection unexpectedly closed (163300 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [generator=3.1.0]
Wed Dec 16 07:02:01 AEDT 2015

Same LTS version as above post

rsync:
  Installed: 3.1.0-2ubuntu0.1
  Candidate: 3.1.0-2ubuntu0.1
  Version table:
 *** 3.1.0-2ubuntu0.1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
        100 /var/lib/dpkg/status
     3.1.0-2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

Tom Worley (tom-worley) wrote :

I can confirm I still get this error on Ubuntu 14.04 (all up to date)
rsync:
Both machines: 3.1.0-2ubuntu0.1

File is a .sql.gz file, 1.5 GB in size, as above, --skip-compress=gz doesn't help:

.....
dbdump.sql.gz
              0 0% 0.00kB/s 0:00:00
inflate returned -3 (0 bytes)
rsync error: error in rsync protocol data stream (code 12) at token.c(557) [receiver=3.1.0]
rsync: connection unexpectedly closed (4092364 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [generator=3.1.0]

Hi

I think this is fixed in later rsync versions. For Ubuntu 14.04, disable
ALL compression when rsyncing, e.g. change -az to -a.

Regards,
Jan

On 11 January 2016 at 12:56, Tom Worley <email address hidden> wrote:

> I can confirm I still get this error on Ubuntu 14.04 (all up to date)
> rsync:
> Both machines: 3.1.0-2ubuntu0.1
>
>
> File is a .sql.gz file, 1.5 GB in size, as above, --skip-compress=gz
> doesn't help:
>
> .....
> dbdump.sql.gz
> 0 0% 0.00kB/s 0:00:00
> inflate returned -3 (0 bytes)
> rsync error: error in rsync protocol data stream (code 12) at token.c(557)
> [receiver=3.1.0]
> rsync: connection unexpectedly closed (4092364 bytes received so far)
> [generator]
> rsync error: error in rsync protocol data stream (code 12) at io.c(226)
> [generator=3.1.0]
>

Andrew (awensley) wrote :

Excluding all compression options was not an acceptable solution for me.

What has worked since I initially found this problem back in August was to install rsync from the utopic repository where the bug was fixed. That same version (3.1.1-3) is still available in vivid and wily unchanged as far as I can tell.

Example:

wget http://mirrors.kernel.org/ubuntu/pool/main/r/rsync/rsync_3.1.1-3_amd64.deb
sudo dpkg -i rsync_3.1.1-3_amd64.deb

Or for x86:

wget http://mirrors.kernel.org/ubuntu/pool/main/r/rsync/rsync_3.1.1-3_i386.deb
sudo dpkg -i rsync_3.1.1-3_i386.deb

Problem solved.

Robie Basak (racb) wrote :

I'd still like to see this fixed, but based on Ryan's comment from when he investigated this I don't think we can do this without a failure case. I presume most reporters won't want to share their data publicly. But we need a failure case in order to make progress.

Until we have one, I'm marking this bug as Incomplete, as I think that sets a more accurate expectation. If someone can provide us with a failure case, please do so and we can set the bug back to Triaged.

If you don't want to share the failure case publicly but are willing to share it with a Canonical engineer then we can probably arrange something around that. Please get in touch.

Alternatively if someone doesn't want to share the failure case at all but is willing to prepare a patch and drive the SRU then that's fine, too. We can believe you if you report success during pre-upload testing and SRU verification and others who are affected can confirm.

Changed in rsync (Ubuntu Trusty):
status: Confirmed → Invalid
status: Invalid → Incomplete
Charles Peters II (cp) wrote :

I attempted to SRU a patch almost a year ago. And that was after two other people confirmed the patch worked for them.

In reply to my email Alberto Salvia Novella said on Mon, Jan 12, 2015 at 5:14 PM
C Peters:
> https://bugs.launchpad.net/ubuntu/+source/rsync/+bug/1300367
> Can someone nominate this for a SRU?

Done ;)

Please see Comment #6 of bug report https://bugs.launchpad.net/rsync/+bug/1300367. The fixed package has been available in my Launchpad PPA since 2014-10-20.

Robie Basak (racb) wrote :

Sorry Charles, but a patch that is known to work just in one case isn't really sufficient for an SRU. It is useful and I appreciate your contribution, but alongside it we need to analyse and understand the regression risk for other valid use cases. This is the "Regression Potential" section of the SRU paperwork, together with the SRU Verification process that is documented. I asked for this in comment #8 in the same bug, but nobody responded.

This isn't unnecessary or bureaucratic red tape. The process is there to ensure that we do not regress millions of users by trying to fix a bug that doesn't affect them.

I'm not willing to upload your patch for SRU review because I don't know what other behaviour it might regress. In order to perform the required analysis, a complete failure case (which in this case needs the correct test data for a failure case) would be most useful.

Tom Worley (tom-worley) wrote :

Hi Andrew,
Installing that version (rsync_3.1.1-3, from the Vivid repo in this case; no dependency problems) on the machine initiating the rsync (which is also the target) didn't fix it.

However, installing it on the source machine as well seems to have fixed the problem.
Regards,
Tom
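
Since the version on both ends appears to matter, it may help to compare the rsync version on the two machines before and after upgrading. The following is a minimal sketch, not an official diagnostic: the helper names are hypothetical, and treating 3.1.0 as the affected version is an assumption based only on the reports in this bug.

```shell
#!/bin/sh
# Read the output of `rsync --version` on stdin and print the version
# number from the first line, e.g. "rsync  version 3.1.0  protocol version 31".
# (Hypothetical helper name; a leading "v" is stripped if present.)
rsync_version_from_banner() {
    awk 'NR == 1 { sub(/^v/, "", $3); print $3 }'
}

# Assumption: only 3.1.0 is flagged, because that is the version
# reported as failing in this bug.
is_affected_version() {
    [ "$1" = "3.1.0" ]
}

# Example (host name is a placeholder):
# local_v=$(rsync --version | rsync_version_from_banner)
# remote_v=$(ssh backup.host.edu rsync --version | rsync_version_from_banner)
# is_affected_version "$local_v"  && echo "local rsync is $local_v"
# is_affected_version "$remote_v" && echo "remote rsync is $remote_v"
```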

André Freitas (andre-freitas) wrote :

I am also suffering from this bug with a 1GB file with the --compress flag in Ubuntu 14.04 with Rsync 3.1.0. The bug is pretty consistent and after I removed the compress flag I got no more broken pipes.

I will try to make this bug reproducible with Docker and publish here.

André Freitas (andre-freitas) wrote :

Hi, I have just published a docker compose stack to test this issue:
https://github.com/NDrive/rsync-compress-bug

However, I couldn't actually reproduce the bug with it. Maybe it only happens with certain files.

Anakan (planetaz) wrote :

Simply move compression from rsync to ssh: -e "ssh -C"

Alek_A (ackbeat) wrote :

I tried to rsync between Ubuntu 14.04 and 16.04 and got this error. But this:
> Simply move compression from rsync to ssh: -e "ssh -C"
fixed it! Thanks!
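
The workaround above can be sketched as a small wrapper. This is only an illustration: the function name is hypothetical, the host and paths in the example call are placeholders, and -a stands in for whatever other options a real backup command uses. The point is that -z / --compress is left out and compression is requested from ssh with -C instead.

```shell
#!/bin/sh
# Sketch: let ssh compress the stream (-C) instead of rsync (-z).
# sync_via_ssh_compression is a hypothetical helper, not part of rsync.
sync_via_ssh_compression() {
    src=$1
    dest=$2
    # No -z / --compress here; rsync's -e option selects the remote shell.
    rsync -a -e "ssh -C" "$src" "$dest"
}

# Example call (placeholders):
# sync_via_ssh_compression /home/ backup.host.edu:/bu/host/home/
```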

Ryan Harper (raharper) on 2016-06-07
description: updated
Michael Foley (foli) wrote :

We have a few servers that exhibit this behaviour while copying backup sets. I cannot make any of that data available, but I have tried generating data files from /dev/urandom that exhibit the error, and have succeeded in getting something I hope is useful.

While running a loop that generated and rsynced test data, I discovered by accident that interrupting the transfer with ctrl-c produces a failing test case: it works without compression but fails with compression.

I did not test rigorously, but the files do seem to need to be large, although zero-filled content seems to be sufficient. So the test data files I'm attaching are zero-filled, with random data only at the beginning.

Here is a recipe for producing the data:
  # note: needs working ssh to localhost, does not error on a local rsync
  # setup for test
  cd ~/
  mkdir source dest
  # create a large file
  dd if=/dev/zero of=source/random-data-0 bs=1M count=1500
  # only need first part of file with random data
  dd if=/dev/urandom of=source/random-data-0 bs=1M count=50 conv=notrunc
  # copy it once successfully
  rsync -vv -ia --inplace --compress --checksum -P source/ localhost:dest/
  # change the first part of the file with new data
  dd if=/dev/urandom of=source/random-data-0 bs=1M count=50 conv=notrunc
  # copy it again, but break transfer within the first part of file
  # by typing ctrl-c key during this rsync
  rsync -vv -ia --inplace --compress --checksum -P source/ localhost:dest/
  # running rsync again should exhibit the error near the transfer break point
  rsync -vv -ia --inplace --compress --checksum -P source/ localhost:dest/
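
When following the recipe above, checking rsync's exit status after each step makes it easier to tell the deliberate interruption from the failure itself. A minimal sketch, assuming the exit values documented in the rsync man page (12 for a protocol data stream error, 20 for SIGUSR1/SIGINT); the helper name is hypothetical.

```shell
#!/bin/sh
# describe_rsync_exit is a hypothetical helper. The meanings are the ones
# listed under EXIT VALUES in the rsync man page.
describe_rsync_exit() {
    case "$1" in
        0)  echo "success" ;;
        12) echo "error in rsync protocol data stream" ;;
        20) echo "received SIGUSR1 or SIGINT" ;;
        *)  echo "other rsync exit code: $1" ;;
    esac
}

# Example, run from ~/ after the recipe above:
# rsync -vv -ia --inplace --compress --checksum -P source/ localhost:dest/
# describe_rsync_exit "$?"
```

The interrupted second transfer would be expected to report code 20, and the failing third transfer code 12, matching the "(code 12)" messages quoted earlier in this report.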

Michael Foley (foli) wrote :

Attaching the test data set.

If this doesn't work for some reason it is also available here:
http://people.canonical.com/~foley/lp-bug-1384503-rsync-bad-data.tgz

Tom Haddon (mthaddon) on 2016-11-25
Changed in rsync (Ubuntu Trusty):
status: Incomplete → New
status: New → Confirmed
Jon Grimm (jgrimm) on 2016-12-07
Changed in rsync (Ubuntu Trusty):
status: Confirmed → Triaged
importance: Undecided → Medium
Robie Basak (racb) wrote :

Thank you for the test case. Unfortunately Ryan is no longer available to work on this, so I'll unassign him.

Fixing this is non-trivial. I think any fix has a high risk of regression, particularly with the permutations of rsync versions that must interoperate. The full matrix needs to be considered and each permutation tested. Given that this is a significant amount of work, the compatibility issue with older versions will die over time, and 16.04 exists, I'm not sure it's worth our time.

This does not preclude any volunteer from taking this on, however.

Changed in rsync (Ubuntu Trusty):
assignee: Ryan Harper (raharper) → nobody
Haw Loeung (hloeung) wrote :

rsync shipped in Xenial builds with the included zlib (--with-included-zlib=yes and -Izlib). Couldn't we SRU a fix doing the same thing as per https://bugs.launchpad.net/rsync/+bug/1300367/comments/6?
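
One rough way to see which kind of build is installed is to look at whether the rsync binary lists a shared libz among its dynamic dependencies. This is a sketch under assumptions: the helper name is hypothetical, and the absence of libz in the ldd output only suggests, rather than proves, that zlib was compiled in.

```shell
#!/bin/sh
# links_system_zlib is a hypothetical helper. It reads `ldd` output on
# stdin and succeeds if a shared libz is listed.
links_system_zlib() {
    grep -q 'libz\.so'
}

# Example:
# if ldd "$(command -v rsync)" | links_system_zlib; then
#     echo "rsync is dynamically linked against a shared zlib"
# else
#     echo "no shared zlib listed; zlib may be compiled in"
# fi
```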

Robie Basak (racb) wrote :

On Tue, Jan 03, 2017 at 05:00:09AM -0000, Haw Loeung wrote:
> rsync shipped in Xenial builds with the included zlib (--with-included-
> zlib=yes and -Izlib). Couldn't we SRU a fix doing the same thing as per
> https://bugs.launchpad.net/rsync/+bug/1300367/comments/6?

I am -1 on any SRU that does not come with a comprehensive regression
risk analysis. It's not the patch that is the problem; it is the
consideration of all interoperability permutations that needs to be made
if behaviour in this area is to be changed.

I'm not saying we can't do it; just that we need to carefully consider
what behaviour in what parts of the interoperability matrix we'd be
changing, what the regression risk is for each of those permutations,
and how to mitigate that.

It's not sufficient to say "we did it in Xenial so we can do the same in
Trusty". For example: without testing or analysis, I don't know if your
proposal breaks interoperability between Precise and Trusty when perhaps
that combination works today, or between non-updated Trusty and updated
Trusty.

My example is not exhaustive (testing just my example is not enough).

Remember also that each version combination has two directions, further
exploding the number of permutations to check. It would be OK if an
analysis convincingly collapses some of the permutations, but no such
analysis currently exists, AFAIK.
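
To give a feel for how quickly the matrix grows, the sketch below enumerates every sender/receiver direction for a list of release labels. It is purely illustrative: the function name is hypothetical, the labels in the example are taken from the comment above, and the list is not a claim about which combinations actually need testing.

```shell
#!/bin/sh
# list_interop_pairs is a hypothetical helper. It prints every ordered
# sender -> receiver pair, including same-release pairs, for its arguments.
list_interop_pairs() {
    for sender in "$@"; do
        for receiver in "$@"; do
            printf '%s -> %s\n' "$sender" "$receiver"
        done
    done
}

# Example with the releases named in this discussion:
# list_interop_pairs "Precise" "Trusty (non-updated)" "Trusty (updated)" "Xenial"
```

With four labels this prints 16 lines, since each of the n labels is paired with every label as receiver (n × n ordered pairs).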

David Britton (davidpbritton) wrote :

After much discussion, this is too risky for an SRU without much more demonstrated analysis. It seems more appropriate for a backport.

Changed in rsync (Ubuntu Trusty):
status: Triaged → Won't Fix